Personalization of Big Data Analytics: Personal Genome Sequencing

November 24, 2015

“The greatest single generator of wealth over the last 40 years, in the digital revolution, has been a transition from ABC’s to 1’s and 0’s,” says author Juan Enríquez. What will be the single biggest driver of change tomorrow? Enríquez believes the single biggest driver of change, of growth, of industries in the future is the transition to writing in life code. The field known as bioinformatics will be the technological transformation for big data. Today, technology is merely playing on the fringe of its real potential.

What if we could program a cell to make the stuff we want? To store the data we want stored?

Big data analytics, when applied to personal genome sequencing, combines digital code and life code and represents the greatest social driver of change since the invention of the computer: the programmable cell (Enriquez, 2007).

Big Data Analytics Applied

Behavior analytics, biometrics, distributed processing, Hadoop, HANA, HBase, and Hive have all risen in frequency in big data discussions. Big data is a term describing a large volume of data, both structured and unstructured. It also implies that conventional databases can’t handle the processing and analytics needed to turn that data, recorded on a persistent medium, into retrievable and usable information. In short, this data is recorded, stored, processed, and used to make better decisions. Due to the magnitude of the data, standard processing is often ineffective.

Typically, big data is used for decisions around business operations, supply chain management, vendor metrics, or executive dashboards. When the data involved is enormous, big data analytics can be applied for better business outcomes. Here are three examples:

UPS – uses sensor data and big data analytics on delivery vehicles to monitor speed, miles per gallon, number of stops, and engine health. The sensors collect over 200 data points for each vehicle in a fleet of 80,000 vehicles daily.
Macy’s – before every sale, prices are adjusted in practically real time for its 73 million items on sale. Using SAS, the company reduced analytics costs by $500k.
Virgin Atlantic – multiple internet-connected parts produce huge volumes of data. For example, each connected flight can generate more than half a terabyte of data. The goal is to improve flight and fuel efficiency and to fly safer planes through predictive maintenance.
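
To get a feel for the scale in the UPS example, here is a rough back-of-the-envelope sketch in Python. It assumes each of the 200 data points is recorded once per vehicle per day, which is a simplifying reading of the figures above; real sensors likely report far more often, so treat the results as a lower bound. The function and variable names are made up for illustration.

```python
# Rough scale of the UPS example: figures come from the text above,
# the once-per-day sampling rate is an assumption for illustration.
DATA_POINTS_PER_VEHICLE = 200   # "over 200", so this is a lower bound
FLEET_SIZE = 80_000             # vehicles in the fleet


def daily_data_points(points_per_vehicle: int, vehicles: int) -> int:
    """Total readings per day if every point is recorded once per vehicle."""
    return points_per_vehicle * vehicles


daily = daily_data_points(DATA_POINTS_PER_VEHICLE, FLEET_SIZE)
yearly = daily * 365

print(f"{daily:,} data points per day")    # 16,000,000
print(f"{yearly:,} data points per year")  # 5,840,000,000
```

Even under this conservative assumption, one fleet produces billions of readings a year.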

Each is helpful in its own right, but they are several degrees away from personally affecting your health. We’re simply not committed to understanding how to leverage these processes, tools, and techniques in our businesses. Do you know the big data stories of King’s Hawaiian or BC Hydro? Why not? The answer is that they are not personalized. So let’s make it personal and explore big data analytics applied to personal genome sequencing. This is about how big data can impact your personal health and your lifespan. Interested?

Quest to Map the Human Genome

The Human Genome Project officially started in 1990 as a 13-year initiative coordinated by the U.S. Department of Energy (DOE) and the National Institutes of Health (NIH, an agency of the United States Department of Health and Human Services). The goal of the project was to determine the sequence of chemical base pairs that make up human DNA, and to identify and map all of the genes of the human genome from both a physical and a functional standpoint. The initial lofty goal wasn’t fully realized, but the project came very close by the time it closed in 2003. It remains the world’s largest collaborative biological project: at a cost of $3 billion, it sequenced 90% of the human genome (specifically, only the euchromatic regions). Shortly after the project closed, personal genome sequencing gained rapid interest globally. Personal genomics is the branch of genomics concerned with the sequencing and analysis of the genome of an individual. Direct-to-consumer genome sequencing has been slow to gain adoption, despite a number of firms providing this service globally: Australia (Lumigenix), Belgium (Gentle Labs), China (IDNA.com, Mygen23), Finland (Geenitesti), Ireland (Geneplanet), and the UK (Genotek), among others.

According to Snyder, Du and Gerstein in their 2010 paper ‘Personal genome sequencing: current approaches and challenges,’ “the immediate applications of the information derived from a human genome sequence, both on the individual and societal levels, are somewhat limited at present…the promise of personal genomics lies in the future, as we amass a database of personal genomes.” Today, no database of genomes exists that is large enough to support deep analytics. Similarly, no large database of diseases exists to map genomes against. Both databases, the genome and the disease, are required in order to draw correlations to probable individual health outcomes. There are about 30,000 diseases known to man, and about one-third of them have effective treatments. The Centers for Disease Control and Prevention cited that about 7,000 of those diseases are rare, with about 20 new rare diseases being identified every month. Connecting diseases and genetic variations has proven to be incredibly elusive and complicated, despite the positive advancements made under the Human Genome Project.

Everyone has access to the latest advancements in medicine. All diseases are screenable.

Why aren’t these two statements true today? By applying big data and big data analytics, they could be. Steve Jobs believed so. “Steve Jobs, co-founder of Apple Inc., was one of the first 20 people in the world to have his DNA sequenced, for which he paid $100,000. He also had the DNA of his cancer sequenced, in the hope it would provide information about more appropriate treatments for him and for other people with the same cancer” (yourgenome.org, 2015). As in his journey with Apple Computer, Jobs was ahead of his time with genomics. Let’s venture on.

Writing and Rewriting DNA

Author Juan Enríquez is a futurist and a visionary in the space of bioinformatics with some intriguing ideas. The global pace of data generation is staggering and will continue along its exponential growth curve to 1.8 zettabytes (a zettabyte is a trillion gigabytes; in bytes, that’s a 1 with 21 zeros trailing behind it). Where will all this data and information reside? In bigger and bigger computers? No. The answer lies in bioinformatics: in programmable bacteria cells. Bacteria are already being designed to clean toxic waste and emit light. These cells are programmable like a computer chip, and they are changing how the things we want are made and removing the boundaries of where they are made: Exxon is attempting to program algae to generate gasoline, BP is working to extract gas from coal, and Novartis is rapid-prototyping vaccines (Bonnet, Subsoontorn, Endy, 2012). What’s next? Creating a cell to generate energy or produce plastics? It’s a fascinating space to explore. Think about the ideal storage ecosystem. Enríquez elaborates that an ounce of DNA can theoretically hold 300,000 terabytes of data and survive intact for more than 1,000,000 years.
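
The figures above invite a quick sanity check. The Python sketch below works through the arithmetic, assuming decimal (SI) units throughout and taking the 300,000 terabytes per ounce figure at face value as a theoretical density. It ignores redundancy, error correction, and read/write overhead, so it is an illustration, not an engineering estimate. The variable names are made up for this example.

```python
# Unit arithmetic for the storage figures quoted above (decimal SI units).
BYTES_PER_GB = 10**9
BYTES_PER_TB = 10**12
BYTES_PER_ZB = 10**21   # a 1 followed by 21 zeros

# 1.8 zettabytes, kept in integer arithmetic to avoid float rounding.
global_data_bytes = 18 * BYTES_PER_ZB // 10

# Theoretical capacity of one ounce of DNA, per the figure quoted above.
dna_bytes_per_ounce = 300_000 * BYTES_PER_TB

gigabytes_per_zettabyte = BYTES_PER_ZB // BYTES_PER_GB
ounces_needed = global_data_bytes // dna_bytes_per_ounce
pounds_needed = ounces_needed / 16   # 16 ounces to the pound

print(f"{gigabytes_per_zettabyte:,} GB in one ZB")    # 1,000,000,000,000
print(f"{ounces_needed:,} ounces of DNA for 1.8 ZB")  # 6,000
print(f"{pounds_needed:,.0f} pounds")                 # 375
```

Under these assumptions, all 1.8 zettabytes would fit in roughly 375 pounds of DNA.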

“Anything you can store in a computer you can store in bacteria” — Juan Enríquez

This software makes its own hardware and operates on a nanoscale. The future of big data is a world of computers designed to float on a speck of dust, each as powerful as a laptop today. Life is so efficient it can copy itself (reproduce) and make billions of copies. The global benefits of bioinformatics applied to healthcare outcomes will be incredible. Understanding the human genome and personal genome sequencing are the keys to unlocking this mystery (Enriquez, 2007).

Improving Your Health

Each individual has a unique ‘genome,’ and ‘mapping the human genome’ involves sequencing multiple variations of each gene. Genes are stretches of DNA that code for a protein with a specific bodily function, and DNA contains all the genetic material that determines traits such as hair color and eye color.

What does my DNA sequence say about my future health? Can you predict life expectancy based on my sequenced DNA? These are all important questions; however, the Human Genome Project won’t provide this information. Before we can ascertain it, your personal DNA needs to be sequenced.

There are dozens of companies that provide personal genome sequencing in the US, including Sequencing.com (software applications that analyze the data using its patent-pending Real-Time Personalization™ technology), The Genographic Project (a National Geographic Society and IBM effort to collect DNA samples and map historical human migration patterns, which helped create the direct-to-consumer (DTC) genetic testing industry), and The Personal Genome Project (PGP, a long-term, large-cohort study based at Harvard Medical School that aims to sequence and publicize genomes and medical records). The list goes on: SNPedia, deCODEme.com, Navigenics, Pathway Genomics, 23andMe, and others. They all provide personal sequencing of your DNA.

Harvard’s Personal Genome Project

The Personal Genome Project (PGP) is a long-term Harvard study that aims to sequence and publicize the complete genomes and medical records of 100,000 volunteers, in order to enable research into personal genomics and personalized medicine. After spending way too much time reading journals and research findings published over the last five years, one gets quite curious. How could knowing my personal genome sequence improve my health outcomes? Are my days numbered due to a potential future disease? These and many more questions prompted a deeper exploration of the Personal Genome Project consent form. While the 24-page form is amazingly well written, it does proactively disclose several disturbing risks. Allow me to share a few of the more interesting risks of volunteering to participate:

Non-anonymous – your name and personal information are identifiable and available to the global public; read: no confidentiality
Cell lines created from your DNA may be genetically modified and/or used in mixing human and nonhuman cells in animals
Consent makes it possible for someone to make synthetic DNA and plant it at a crime scene, or to implicate you and/or a family member in a crime
The data may accurately or inaccurately reveal the possibility of a disease or a propensity for a disease
Whether legal or not, it may affect the ability of you or a family member to obtain or maintain employment, insurance, or financial services
Inability to control access to cell lines

After reading the risks, it doesn’t take long to grasp why adoption hasn’t been prolific over the last decade.

Big Data Meets Health

Is it worth it to have your personal genome sequenced when the volumes of data required to provide deep analysis don’t exist today? John D. Halamka, MD, MS, Chief Information Officer of the Beth Israel Deaconess Medical Center and Chief Information Officer and Dean for Technology at Harvard Medical School, weighs in on this question. In a November 2015 interview with athenaHealth, Dr. Halamka said that, based on his personal genome sequencing, he will die of prostate cancer. He was also one of the first 10 people to have a personal genome sequence completed through Harvard’s Personal Genome Project. He mentions that the recommended prostate testing frequency for men in the general population is every 4-5 years. He argued that this is fine for the general population, but that because of his genome he would be wise to check yearly. This information wouldn’t have been available without personal genome sequencing. The image below provides a good illustration of genome sequencing, combining a sample (your personal genome sequencing) with reference material (a database of previously sequenced individuals) to produce aggregated information that is specific to an individual’s health.

He also provided an intriguing example: when his wife was diagnosed with breast cancer, her personal genome was mapped. Her genome was compared to the 10,000 other genomes available at the time, and from this information they determined the best course of treatment based on her genes, given the favorable outcomes in the population samples.
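
The idea of comparing one genome against a reference population and looking at which treatments worked for genetically similar people can be sketched in a few lines of Python. Everything here is hypothetical: the variant labels, treatment names, outcomes, and the `rank_treatments` helper are invented for illustration, and real clinical decision support is far more involved than counting shared variants.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceCase:
    """One previously sequenced individual (hypothetical toy record)."""
    variants: frozenset   # genetic variants observed in this person
    treatment: str        # treatment this person received
    favorable: bool       # whether the outcome was favorable


def rank_treatments(patient_variants, cohort, min_shared=1):
    """Rank treatments by favorable-outcome rate among similar cases.

    A reference case counts as similar when it shares at least
    `min_shared` variants with the patient.
    """
    tallies = defaultdict(lambda: [0, 0])  # treatment -> [favorable, total]
    for case in cohort:
        if len(patient_variants & case.variants) >= min_shared:
            tallies[case.treatment][1] += 1
            if case.favorable:
                tallies[case.treatment][0] += 1
    rates = {t: fav / total for t, (fav, total) in tallies.items()}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


# Invented toy cohort; a real reference set would hold thousands of genomes.
cohort = [
    ReferenceCase(frozenset({"VAR_A", "VAR_B"}), "treatment_1", True),
    ReferenceCase(frozenset({"VAR_A"}), "treatment_1", True),
    ReferenceCase(frozenset({"VAR_A", "VAR_C"}), "treatment_2", False),
    ReferenceCase(frozenset({"VAR_A"}), "treatment_2", True),
    ReferenceCase(frozenset({"VAR_D"}), "treatment_2", True),
]
patient = frozenset({"VAR_A"})

ranking = rank_treatments(patient, cohort)
print(ranking)  # [('treatment_1', 1.0), ('treatment_2', 0.5)]
```

The last reference case is ignored because it shares no variant with the patient, which is the point of personalization: only outcomes from genetically similar people inform the ranking.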

Housing population genomes and disease inventories will consume huge amounts of storage. The data available today is already changing patient outcomes. Population genome data and global disease inventories will accelerate amazing advancements in the identification and treatment of disease.

Future of Big Data

Bioinformatics is the future of big data. As it becomes easier to write and rewrite in life code, every business on earth will be changed.

“The places that read and write life code are going to become the centers of the global economic system; the greatest single database that humans have ever built.” — Juan Enríquez

Bioinformatics will refine big data, and society will eventually reach a tipping point: when personal health self-service hits the mainstream, patients will become the ‘CEO of their personal health.’ When will conventional storage be obsolete? How will information security change when the coding is biological? As the population ages, new open source business models will develop, sparking community development. Communities that are not just involved but committed! Passionate communities that have blood in the game, because they are fighting for their own life or that of a loved family member.

Is it worth it to have your personal genome sequenced? Yes, it’s your life — it’s worth it.

References

Ball, M. P. (2012). Seeking Diversity (Especially Families) (image). Retrieved November 23, 2015, from http://blog.personalgenomes.org/2012/11/29/seeking-diversity/

bigthink.com. (2010). Learning to Speak “Life Code” | Big Think. Retrieved November 23, 2015, from http://bigthink.com/videos/learning-to-speak-life-code

Bioengineers create rewritable digital data storage in DNA | KurzweilAI. (n.d.). Retrieved November 23, 2015, from http://www.kurzweilai.net/bioengineers-create-rewritable-digital-data-storage-in-dna

Bonnet, J., Subsoontorn, P., & Endy, D. (2012). Rewritable digital data storage in live cells via engineered control of recombination directionality. Proceedings of the National Academy of Sciences. DOI: 10.1073/pnas.1202344109

Enriquez, J. (2007). Juan Enriquez: The life code that will reshape the future | TED Talk. Retrieved November 23, 2015, from https://www.ted.com/talks/juan_enriquez_on_genomics_and_our_future/transcript?language=en

Ross, A. (2015). Genome sequencing for just $1000 (online image). Retrieved November 22, 2015, from http://www.geeksnack.com/2015/10/05/genome-sequencing-for-just-1000/

Snyder, M., Du, J., & Gerstein, M. (2010). Personal genome sequencing: current approaches and challenges. Retrieved November 22, 2015, from http://genesdev.cshlp.org/content/24/5/423.full

yourgenome.org. (2015). Personal genomics: the future of healthcare? Retrieved November 23, 2015, from http://www.yourgenome.org/stories/personal-genomics-the-future-of-healthcare

Peter Nichol empowers organizations to think different for different results. You can follow Peter on Twitter or on his blog. Peter can be reached at pnichol [dot] spamarrest.com.