January 21, 2014. Where are we today with the promise of personal genomics? To great fanfare, Illumina announced last week that the $1000 genome had arrived (1). This milestone ushers in a future where anyone and everyone will get their genome “done” (i.e. sequenced) and use that information to manage their health, relationships, and daily activities. Or will it? What can we actually do with our genomic information today? Despite recent breakthroughs in sequencing technology, we have yet to develop the tools and methodologies that connect gene sequences to traits in a way that makes the data useful. Without interpretation, one’s genome sequence is meaningless. In a bid to address this challenge, we recently started Molquant, an algorithm-driven enterprise that builds powerful tools to improve genome interpretation, the next big challenge in personal genomics.

    The Promise of Personal Genomics

    With more than 1300 Genome Wide Association Studies (GWAS) completed, one might imagine that we have a whole trove of gene:trait linkages. But what can we learn if we get sequenced today? Not much, really. To promote whole genome sequencing, Illumina hosts personal genome conferences, where you can get your genome sequenced and interpreted for $5000. At a recent conference, one session provided interpretation for 340 genes affecting 140 traits. Out of our roughly 21,000 genes, that is about 1.6%: the state of the art in genome interpretation today links fewer than 2% of our genes to trait data, which is a little disappointing. Even though an individual can get their genome sequenced, there isn’t much information that ties genes to specific traits.

    The consumer genomics leader 23andMe, before being muzzled by the FDA, had a similar offering, reporting on over 240 “health conditions and traits.” While 23andMe isn’t actually sequencing yet (it’s still too expensive), it assays over 1 million sequence variants, or SNPs, covering most of the reported trait associations as well as a few associations from its own research (e.g. SNPs rs4481887 and rs4309013 in the olfactory receptor gene cluster are associated with your ability to smell asparagus pee). 23andMe does a fantastic job of organizing and interpreting the available information; it’s just that not much is known. Given the limited interpretability to date, how can we begin to approach this complex puzzle?

    4001 Gene-linked Disorders

    Some clues to where we should be looking can be found in another dataset, the repository of human genetic disorders: Online Mendelian Inheritance in Man (OMIM). As of December 2013, OMIM tallied 4001 disorders for which the molecular basis (the causal gene or genes) is known. These syndromes are typically extremely rare and appear to represent the extremes of the natural variation in genes seen in “healthy” individuals. These direct links between gene and trait provide important clues to the functions of many more genes than have been identified in GWAS to date.

    Clues in Faces

    A few illuminating examples lie in the genetics of faces. One’s face is arguably the clearest, albeit complex, example of a trait that is highly genetically determined. We all know how difficult it is to distinguish identical twins; we recognize familial resemblances; and we can often guess someone’s ancestral origins from facial features (though these are increasingly blurred as we become more mobile and mix our genes). Individuals with a particular genetic alteration often exhibit specific facial features (e.g. Down syndrome, caused by an extra copy of chromosome 21, produces a recognizable face). Many genetic disorders produce specific, if subtle, facial traits as one feature of what are often complex phenotypes. Peter Hammond at University College London has developed morphometric tools to characterize the facial traits of a number of genetic disorders associated with intellectual disability, such as 22q11 deletion syndrome; Noonan syndrome (PTPN11, mutated in about 50% of cases); Smith-Magenis syndrome (RAI1); and Williams-Beuren syndrome (likely GTF2I).

    Source: Hammond et al., Am. J. Hum. Genet. 77:999, 2005

    In the studied examples, Hammond’s tools can use facial features to diagnose the syndromes, demonstrating the role of each of the affected genes in face shape.

    In 2012, two GWAS papers (2,3) found only five genes linked to three subtle changes in 3D measurements of the face. Notably, mutations in two of these genes had previously been linked to facial dysmorphic traits. One is PAX3, the gene associated with Waardenburg syndrome: in the general population, PAX3 variation was associated with the width of the nasal bridge, while Waardenburg syndrome PAX3 mutations cause a distinctive “wide nasal bridge” phenotype. The other, PRDM16, had previously been identified as causal in a syndrome associated with cleft palate. Both cases again suggest that genetic syndromes may represent extremes of normal variation.

    These clues from human disorders suggest two things: (1) the genome likely contains a wealth of extractable data linking gene variation to a wide variety of human traits and health conditions; and (2) a comprehensive gene:trait catalog built from the monogenic disorders may provide an important tool for interpreting genomic data in the broader population.

    Follow @molquant to receive our news and updates.

    1 Matthew Herper, “The $1000 Genome Arrives -- For Real This Time,” Forbes, January 14, 2014
    2 Paternoster et al., “Genome-wide Association Study of Three-Dimensional Facial Morphology Identifies a Variant in PAX3,” American Journal of Human Genetics 90(3): 478–485, 2012
    3 Liu et al., “A Genome-Wide Association Study Identifies Five Loci Influencing Facial Morphology in Europeans,” PLoS Genetics 8(9): e1002932, 2012

    Top of page: A sampling of various genome visualizations (non-Molquant). Source: Google image search



    Human Ras CAAX Endoprotease (RCE1), initially thought to be essential for Ras function, assembled from EST fragments by homology to yeast Rce1p, 1999.

    For many of us, the field of genomics was born in 1992 when Craig Venter’s group published a landmark paper reporting an astounding 2,375 human brain genes, many of them previously unknown, from the analysis of 871,000 base pairs of sequence (1). (This paper was actually a follow-up to their earlier Science publication describing Expressed Sequence Tags (ESTs) (2), and it demonstrated that the approach was robust and scalable.) At the time, I was working at Amgen, and these papers inspired our head of research, Dan Vapnek, to purchase the entire inventory of ABI (Applied Biosystems, Inc.) sequencers, a total of sixteen machines. We spent the next decade discovering and characterizing new genes. Colleagues at Genetics Institute and Genentech were doing the same. ESTs also jump-started new biotech companies such as Human Genome Sciences (est. 1992) and Millennium (est. 1993).

    I was interested in cancer and spent endless hours searching the ever-expanding sequence databases for new genes linked to cancer biology. Among the genes my lab cloned were BCL2 family members, Ras processing enzymes, cell cycle genes, and, most exciting, the gene for the catalytic subunit of telomerase, the enzyme that enables the immortal growth of tumor cells. This decade of advances by the entire field was summarized in a seminal 2000 review in Cell by Doug Hanahan and Bob Weinberg entitled “The Hallmarks of Cancer.” In retrospect, the 1990s were the decade of discovering the new cancer genes that led to the current crop of approved targeted cancer therapies.

    One Genome Is Not Enough

    In 2001, the Human Genome Project published its draft sequence of the human genome, in essence the genome of one human (the project was declared complete in 2003). Although this was an exciting milestone, a single genome gives us no insight into variation between individuals. With only one genome to study, how can we learn why we are all so different? Genetic variation is the key to connecting the relatively subtle differences in DNA among all of us to the observed variation in traits, health, appearance, physiology, etc. (NOTE: There is plenty of room for environment; myriad examples demonstrate that traits are a combination of genes and environment, and in areas such as behavior, the extent to which genes contribute remains controversial. Yet examples from the monogenic (single-gene) human disorders show that a single change in a single gene can have a dramatic impact on a wide range of traits, including behavior.) So the real business of linking genes to traits needs many genomes.

    New Technology Drives a New Field

    Recent advances in sequencing technology have led to remarkable reductions in the cost and time needed to sequence a genome, prompting an explosion in the number of genomes, exomes, and RNA-seq transcriptomes produced and in the amount of genomic information available.

    Left: Plotted relative to Moore’s law, sequencing costs have plummeted, driving an explosion in the amount of genomic data available for analysis. Right: Genome Wide Association Studies have increased dramatically as genomic data generation has become feasible.

    The generation and availability of genomic data has spurred a subsequent explosion in Genome Wide Association Studies (GWAS). Because we all have essentially the same 21,000 or so genes, researchers focus on the variation between individuals. The most common natural inherited variations are single-letter differences in the DNA itself, called Single Nucleotide Polymorphisms (SNPs). So it should be simple: sequence 1,000 people, sort them by a trait, say height, then find the SNPs that are more common among the tall people than among the shorter ones.
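
    The naive approach just described can be sketched in a few lines of Python. This is a toy illustration on simulated data, not Molquant’s method or a production GWAS pipeline: the cohort size, the allele frequency of 0.3, the 3 cm effect planted at a hypothetical SNP 7, and all function names are our own assumptions. The sketch splits people at the median height and, for each SNP, compares allele counts in the taller and shorter halves with a 2x2 chi-square test.

```python
import math
import random


def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def simulate_cohort(n_people=1000, n_snps=50, causal_snp=7,
                    effect_cm=3.0, allele_freq=0.3, seed=42):
    """Hypothetical toy cohort: genotypes are 0/1/2 copies of the alt allele,
    and only `causal_snp` shifts height (by `effect_cm` per copy)."""
    rng = random.Random(seed)
    genotypes, heights = [], []
    for _ in range(n_people):
        g = [int(rng.random() < allele_freq) + int(rng.random() < allele_freq)
             for _ in range(n_snps)]
        genotypes.append(g)
        heights.append(170.0 + effect_cm * g[causal_snp] + rng.gauss(0.0, 6.0))
    return genotypes, heights


def scan_snps(genotypes, heights):
    """Split people into shorter and taller halves by height, then test every
    SNP for an allele-frequency difference. Returns (snp, chi2, p), best first."""
    order = sorted(range(len(heights)), key=lambda k: heights[k])
    half = len(order) // 2
    short, tall = order[:half], order[-half:]  # middle person dropped if n is odd
    results = []
    for s in range(len(genotypes[0])):
        tall_alt = sum(genotypes[k][s] for k in tall)
        short_alt = sum(genotypes[k][s] for k in short)
        chi2, p = allelic_chi2(tall_alt, 2 * len(tall) - tall_alt,
                               short_alt, 2 * len(short) - short_alt)
        results.append((s, chi2, p))
    return sorted(results, key=lambda r: r[2])


genotypes, heights = simulate_cohort()
results = scan_snps(genotypes, heights)
for snp, chi2, p in results[:3]:
    print(f"SNP {snp}: chi2 = {chi2:.1f}, p = {p:.2e}")
```

    With the planted effect, SNP 7 should rise to the top of the list while the other 49 SNPs show only chance-level differences. Real GWAS usually regress the continuous trait on genotype rather than splitting people into two groups, adjust for ancestry, and test hundreds of thousands to millions of SNPs at once, which is why a stringent genome-wide significance threshold (conventionally p < 5 × 10^-8) is used.
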
    Since 2005, over 1300 independent GWAS have been conducted in which researchers attempt to correlate one or more traits (e.g. height, weight, disease risk, physiology, behavior) with genetic variants. An excellent interactive diagram on the National Human Genome Research Institute website lists more than 300 SNP associations for 18 different trait classes.

    Sequencing the first human genome cost about $1 billion and took 13 years. Today a human genome can be sequenced for $1000 in 1-2 days. GWAS analysis of many human genomes can connect SNPs to any trait that is studied. To date, however, GWAS have explained only a small fraction of the known heritability of most traits. This unexpected challenge now has a name, the “missing heritability” (more on that in a future post). One goal of the Molquant toolkit is to find some of the missing heritability by defining mathematically robust gene networks.

    1 Adams et al., “Sequence Identification of 2,375 Human Brain Genes,” Nature 355: 632, 1992
    2 Adams et al., “Complementary DNA Sequencing: Expressed Sequence Tags and Human Genome Project,” Science 252: 1651, 1991