Personalization of Big Data Analytics: Personal Genome Sequencing

“The greatest single generator of wealth over the last 40 years, in the digital revolution, has been a transition from ABC’s to 1’s and 0’s,” says author Juan Enríquez. What will be the single biggest driver of change tomorrow? Enríquez believes the single biggest driver of change, of growth, of industries in the future is the transition to writing in life code. The field known as bioinformatics will be the technological transformation for big data. Today, technology is merely playing on the fringe of its real potential.

What if we could program a cell to make the stuff we want? To store the data we want stored?

Big data analytics, when applied to personal genome sequencing, combines digital code and life code. The result, the programmable cell, represents the greatest social driver of change since the invention of the computer (Enriquez, 2007).

Big Data Analytics Applied

Behavior analytics, biometrics, distributed processing, Hadoop, HANA, HBase, and Hive have all come up more frequently in big data discussions. Big data is a term describing a large volume of data, both structured and unstructured. It also implies that conventional databases can’t handle the processing and analytics needed to turn that data into retrievable, usable information recorded on a persistent medium. In short, this data is recorded, stored, processed, and used to make better decisions. Because of the sheer magnitude of the data, standard processing is often ineffective.

Typically, big data is used for decisions around business operations, supply chain management, vendor metrics, or executive dashboards. When the data involved is enormous, big data analytics can be applied for better business outcomes. Here are three examples:

  1. UPS – uses sensor data and big data analytics on delivery vehicles to monitor speed, miles per gallon, number of stops, and engine health. The sensors collect over 200 data points daily for each vehicle in a fleet of 80,000.
  2. Macy’s – before every sale, prices are adjusted in practically real time for their 73 million items on sale. Using SAS, they reduced analytics costs by $500k.
  3. Virgin Atlantic – multiple internet-connected parts on its aircraft produce huge volumes of data. For example, each connected flight can generate more than half a terabyte of data. The goal is to improve flight and fuel efficiency and to fly safer planes through predictive maintenance.

Each is helpful in its own right, but they are several degrees away from personally affecting your health. We’re simply not committed to understanding how to bring these processes, tools, and techniques into our business. Do you know of the big data stories like King’s Hawaiian or BC Hydro? Why not? The answer is that they are not personalized. So let’s make it personal and explore big data analytics applied to personal genome sequencing. This is about how big data can impact your personal health and your lifespan. Interested?

Quest to Map the Human Genome

The Human Genome Project officially started in 1990 as a 13-year initiative coordinated by the U.S. Department of Energy (DOE) and the National Institutes of Health (NIH, an agency of the United States Department of Health and Human Services). The goal of the project was to determine the sequence of chemical base pairs that make up human DNA, and to identify and map all of the genes of the human genome from both a physical and functional standpoint. The initial lofty goal wasn’t fully realized, but the project came very close by the time it closed in 2003. It remains the world’s largest collaborative biological project: at a cost of $3 billion, it sequenced 90% of the human genome (specifically, only the euchromatic regions). Shortly after the project ended, personal genome sequencing rapidly gained interest globally. Personal genomics is the branch of genomics concerned with the sequencing and analysis of the genome of an individual. Direct-to-consumer genome sequencing has nonetheless been slow to gain adoption, despite a number of firms providing this service globally: Australia (Lumigenix), Belgium (Gentle Labs), China (IDNA.com, Mygen23), Finland (Geenitesti), Ireland (Geneplanet), and the UK (Genotek), among others.

According to Snyder, Du and Gerstein in their 2010 paper ‘Personal genome sequencing: current approaches and challenges,’ “the immediate applications of the information derived from a human genome sequence, both on the individual and societal levels, are somewhat limited at present…the promise of personal genomics lies in the future, as we amass a database of personal genomes.” Today, a database of genomes large enough to provide deep analytics doesn’t exist. Similarly, a large database of diseases to map genomes against doesn’t exist. Both databases, the genome and the disease, are required in order to draw correlations to probable individual health outcomes. There are about 30,000 diseases known to man, with about one-third having effective treatments. The Centers for Disease Control and Prevention cited that about 7,000 of those diseases are rare, with about 20 new rare diseases being identified every month. Connecting diseases and genetic variations has proven to be incredibly elusive and complicated, despite the positive advancements made under the Human Genome Project.

Everyone has access to the latest advancements in medicine. All diseases are screenable.

Why aren’t these two statements true today? By applying big data and big data analytics, these two statements could be true. Steve Jobs believed they could be. “Steve Jobs, co-founder of Apple Inc., was one of the first 20 people in the world to have his DNA sequenced, for which he paid $100,000. He also had the DNA of his cancer sequenced, in the hope, it would provide information about more appropriate treatments for him and for other people with the same cancer” (yourgenome.org, 2015). As in his journey with Apple Computer, Jobs was ahead of his time with genomics. Let’s venture on.

Writing and Rewriting DNA

Author Juan Enríquez is a futurist and a visionary in the space of bioinformatics with some intriguing ideas. The global pace of data generation is staggering and will continue along its exponential growth curve to 1.8 zettabytes (a zettabyte is a trillion gigabytes; that’s a 1 with 21 zeros trailing behind it). Where will all this data and information reside? In bigger and bigger computers? No. The answer lies in bioinformatics: in programmable bacteria cells. Bacteria are already being designed to clean toxic waste and emit light. These cells are programmable like a computer chip, and they are changing how the things we want are made and removing the boundaries of where things are made. Exxon is attempting to program algae to generate gasoline, BP is working to extract gas from coal, and Novartis is rapid-prototyping vaccines (Bonnet, Subsoontorn, Endy, 2012). What’s next? Creating a cell to generate energy or produce plastics? It’s a fascinating space to explore. Think about the ideal storage ecosystem. Enríquez elaborates that an ounce of DNA can theoretically hold 300,000 terabytes of data and survive intact for more than 1,000,000 years.
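To put those two figures side by side, here is a small back-of-the-envelope sketch. It assumes decimal (SI) units, which the article does not specify, and it takes the 300,000 terabytes per ounce figure at face value as a theoretical ceiling. The variable names are only illustrative.

```python
# Back-of-the-envelope check of the storage figures quoted above.
# Assumption: decimal (SI) units, i.e. 1 GB = 10**9 bytes, 1 TB = 10**12 bytes,
# 1 ZB = 10**21 bytes. Integer math keeps the results exact.
GIGABYTE = 10**9
TERABYTE = 10**12
ZETTABYTE = 10**21

# "A zettabyte is a trillion gigabytes"
gigabytes_per_zettabyte = ZETTABYTE // GIGABYTE

# 1.8 zettabytes of data, expressed in bytes
projected_bytes = 18 * ZETTABYTE // 10

# Theoretical DNA capacity quoted in the text: 300,000 TB per ounce
DNA_TERABYTES_PER_OUNCE = 300_000
bytes_per_ounce = DNA_TERABYTES_PER_OUNCE * TERABYTE

# Ounces of DNA needed to hold 1.8 ZB at that theoretical density
ounces_needed = projected_bytes // bytes_per_ounce

print(gigabytes_per_zettabyte)  # 1000000000000 (one trillion)
print(ounces_needed)            # 6000
```

Under these assumptions, 1.8 zettabytes would fit in roughly 6,000 ounces of DNA. That is a theoretical figure only; it says nothing about the practical cost or speed of reading and writing the data.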

“Anything you can store in a computer you can store in bacteria” — Juan Enríquez

This software makes its own hardware and operates on a nanoscale. The future of big data is a world where computers are designed that could float on a speck of dust yet be as powerful as a laptop today. Life is so efficient it can copy itself (reproduce) and make billions of copies. The global benefits of bioinformatics applied to healthcare outcomes will be incredible. Understanding the human genome and personal genome sequencing are the keys to unlocking this mystery (Enriquez, 2007).

Improving Your Health

Each individual has a unique ‘genome,’ and ‘mapping the human genome’ involves sequencing multiple variations of each gene. Genes are stretches of DNA that code for a protein with a specific bodily function, and DNA contains all the genetic material that determines traits such as hair color and eye color.

What does my DNA sequencing say about my future health? Can you predict life expectancy based on my sequenced DNA? These are all important questions. However, the Human Genome Project won’t provide this information. Before we can ascertain it, your personal DNA needs to be sequenced.

There are dozens of companies and projects that provide personal genome sequencing in the US, including: Sequencing.com (software applications to analyze the data based on their patent-pending Real-Time Personalization™ technology), The Genographic Project (a National Geographic Society and IBM effort to collect DNA samples to map historical human migration patterns, which helped create the direct-to-consumer (DTC) genetic testing industry), and The Personal Genome Project (PGP, a long-term, large cohort study based at Harvard Medical School which aims to sequence and publicize genomes and medical records). The list goes on: SNPedia, deCODEme.com, Navigenics, Pathway Genomics, 23andMe, and others. They all provide personal sequencing of your DNA.

Harvard’s Personal Genome Project

The Personal Genome Project (PGP) is a long-term Harvard study which aims to sequence and publicize the complete genomes and medical records of 100,000 volunteers, in order to enable research into personal genomics and personalized medicine. After spending way too much time reading journals and research findings published over the last five years, I became quite curious. How could knowing my personal genome sequence improve my health outcomes? Are my days numbered due to a potential future disease? These and many more questions led to a deeper exploration of the Personal Genome Project consent form. While the 24-page form is amazingly well written, it does proactively disclose several disturbing risks. Allow me to share a few of the more interesting risks of volunteering to participate:

  1. Non-anonymity – your name and personal information are identifiable and available to the global public; in other words, no confidentiality
  2. Cell lines created from your DNA may be genetically modified and/or mixed with nonhuman cells in animals
  3. Someone could make synthetic DNA from your data and plant it at a crime scene, implicating you and/or a family member in a crime
  4. The data may accurately or inaccurately reveal the possibility of a disease or a propensity for a disease
  5. Whether legal or not, participation may affect the ability of you or a family member to obtain or maintain employment, insurance, or financial services
  6. Inability to control access to cell lines

After reading the risks, it doesn’t take long to grasp why adoption hasn’t been prolific over the last decade.

Big Data Meets Health

Is it worth it to have your personal genome sequenced when the volumes of data required to provide deep analysis don’t exist today? John D. Halamka, MD, MS, Chief Information Officer of the Beth Israel Deaconess Medical Center and Chief Information Officer and Dean for Technology at Harvard Medical School, weighs in on this question. In a November 2015 interview with athenaHealth, Dr. Halamka said that, based on his personal genome sequencing, he will die of prostate cancer. Dr. Halamka was also one of the first 10 people to have a personal genome sequence completed through Harvard’s Personal Genome Project. He mentions that the recommended prostate testing frequency for men in the general population is every 4-5 years. He argued that this is fine for the general population, but that because of his genome he would be wise to check yearly. This information wouldn’t have been available without personal genome sequencing. The image below provides a good illustration of genome sequencing: combining a sample (your personal genome sequence) with reference material (a database of previously sequenced individuals) to produce aggregated information that is specific to an individual’s health.

He also provided an intriguing example, explaining that when his wife was diagnosed with breast cancer, her personal genome was mapped. Her genome was compared to the 10,000 other genomes available at the time, and from this information they determined the best course of treatment based on her genes, given the favorable outcomes in the population samples.
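The idea of comparing one genome against a reference population to pick a treatment can be sketched in a few lines. This is only a toy illustration: the records, variant labels, and treatment names are hypothetical placeholders, not real clinical data. Real analyses involve far more than a single variant and a simple success rate.

```python
from collections import defaultdict

# Hypothetical cohort records: (genetic variant, treatment given, favorable outcome?)
# All values are made-up placeholders for illustration only.
COHORT = [
    ("variant_A", "treatment_1", True),
    ("variant_A", "treatment_1", True),
    ("variant_A", "treatment_2", False),
    ("variant_A", "treatment_2", True),
    ("variant_B", "treatment_2", True),
]


def best_treatment(variant, cohort):
    """Return (treatment, favorable_rate) with the highest favorable-outcome
    rate among cohort members who share the variant, or None if nobody does."""
    totals = defaultdict(int)
    favorable = defaultdict(int)
    for record_variant, treatment, was_favorable in cohort:
        if record_variant == variant:
            totals[treatment] += 1
            favorable[treatment] += int(was_favorable)
    if not totals:
        return None
    rates = {t: favorable[t] / totals[t] for t in totals}
    best = max(rates, key=rates.get)
    return best, rates[best]


print(best_treatment("variant_A", COHORT))  # ('treatment_1', 1.0)
print(best_treatment("variant_C", COHORT))  # None
```

The sketch also shows why the size of the reference database matters: with only a handful of matching records, a rate like 1.0 means very little, and with no matching records there is nothing to compare against at all.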

Housing population genomes and disease inventories will consume huge amounts of storage. The data available today is already changing patient outcomes. Population genome data and global disease inventories will accelerate amazing advancements in the identification and treatment of disease.

Future of Big Data

Bioinformatics is the future of big data. As it becomes easier to write and rewrite in life code, every business on earth will be changed.

“The places that read and write life code are going to become the centers of the global economic system; the greatest single database that humans have ever built.” — Juan Enríquez

Bioinformatics will refine big data, and society will eventually reach a tipping point: when personal health self-service hits the mainstream, patients will become the ‘CEO of their personal health.’ When will conventional storage be obsolete? How will information security change when the coding is biological? As the population ages, new open source business models will develop, igniting community development. Communities that are not just involved but committed! Passionate communities that have blood in the game, because they are fighting for their life or that of a loved family member.

Is it worth it to have your personal genome sequenced? Yes, it’s your life — it’s worth it.

References

Ball, M. P. (2012). Seeking Diversity (Especially Families) (image). Retrieved November 23, 2015, from http://blog.personalgenomes.org/2012/11/29/seeking-diversity/

bigthink.com. (2010). Learning to Speak “Life Code” | Big Think. Retrieved November 23, 2015, from http://bigthink.com/videos/learning-to-speak-life-code

Bioengineers create rewritable digital data storage in DNA | KurzweilAI. (n.d.). Retrieved November 23, 2015, from http://www.kurzweilai.net/bioengineers-create-rewritable-digital-data-storage-in-dna

Bonnet, J., Subsoontorn, P., & Endy, D. (2012). Rewritable digital data storage in live cells via engineered control of recombination directionality. Proceedings of the National Academy of Sciences. DOI: 10.1073/pnas.1202344109

Enriquez, J. (2007). Juan Enriquez: The life code that will reshape the future | TED Talk. Retrieved November 23, 2015, from https://www.ted.com/talks/juan_enriquez_on_genomics_and_our_future/transcript?language=en

Ross, A. (2015). Genome sequencing for just $1000 (online image). Retrieved November 22, 2015, from http://www.geeksnack.com/2015/10/05/genome-sequencing-for-just-1000/

Snyder, M., Du, J., & Gerstein, M. (2010). Personal genome sequencing: current approaches and challenges. Retrieved November 22, 2015, from http://genesdev.cshlp.org/content/24/5/423.full

yourgenome.org. (2015). Personal genomics: the future of healthcare? Retrieved November 23, 2015, from http://www.yourgenome.org/stories/personal-genomics-the-future-of-healthcare

Peter Nichol empowers organizations to think different for different results. You can follow Peter on Twitter or on his blog. Peter can be reached at pnichol [dot] spamarrest.com.
