January 25, 2016. We recently applied our networking and visualization tools to explore the biological underpinnings of genes linked to Intellectual Disability (ID), discovering subsets comprising molecularly related classes of ID genes. Several of these mathematically derived classes fit with mechanisms known to play a role in ID, such as metabolism, neuronal development, mitochondria, epigenetic regulation, and cohesin. This framework enabled the biological annotation of a number of uncharacterized ID genes such as RAI1 (Smith-Magenis syndrome, SMS), which exhibits tight linkage to epigenetic regulation networks.
The largest subset, comprising approximately 16% of ID genes, exhibited strong linkage to the biology of histone modification (epigenetic regulation), a subset of cohesin genes, and the core regulators of circadian rhythm. Recent work reported that small-molecule HDAC inhibitors could reverse some of the deleterious phenotypes in the heterozygous knockout mouse model of Kabuki syndrome, which lacks one copy of the histone methyltransferase MLL2/KMT2D (1). This provides a rationale for exploring small-molecule inhibitors of HDAC or bromodomain proteins for their ability to modulate phenotypes resulting from genetic defects in ID genes linked to histone modification, such as RAI1.
Genes linked to intellectual disability could be mathematically grouped into five biologically distinct subsets (Epigenetic Regulation, Mitochondrial Function, Metabolism, Proliferation, and Neuronal Function).
Gene networks linked to Epigenetic Regulation, the enzymatic modification of chromatin to affect gene transcription, comprised about 16% (63/380) of all ID genes examined.
RAI1, the gene responsible for Smith-Magenis syndrome, exhibited tight linkage to epigenetic regulation, suggesting that this PHD-domain-containing protein plays a role in regulating gene expression through chromatin modulation.
RAI1 and other genes linked to epigenetic regulation also exhibited linkage to genes that control the circadian rhythm oscillator. Several of the syndromes associated with these genes involve sleep disturbances (CDKL5, Williams syndrome-linked GTF2I, MBD5 and RAI1), supporting hypotheses of a causal relationship between syndromic gene dysfunction and control of circadian rhythm.
Intellectual disability encompasses a wide range of human phenotypes that impact the development of intellect, language, social and motor skills (2). Broadly defined, intellectual disability has an estimated prevalence in Western populations of ~2%, with both environmental and genetic factors contributing to the condition.
Using a large set of transcriptome data, we conducted a comprehensive gene expression correlation analysis to create mathematically based correlation networks for about 20,000 transcribed genes. Many networks of tightly linked genes were highly enriched in genes whose biological functions or properties were known (see earlier Molquant work). Figure 1 shows 36 such annotated networks, providing a biological framework for analyses.
Relationships among biologically curated gene networks
Because the network tools rely only on mathematical correlation, as opposed to knowledge-based linkage, any gene can be placed into its relevant network. This enables functional annotation of unknown or poorly characterized genes based on their expression correlation to sets of known genes.
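The annotation idea described above can be illustrated with a minimal sketch. It is not the actual Molquant pipeline: the gene names, expression values, network labels and the `annotate` helper are all invented for illustration, and the scoring rule (mean Pearson correlation to each network's member genes) is an assumption about how such a placement could be made.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def annotate(gene, expression, networks):
    """Rank networks by the gene's mean correlation to their members (hypothetical helper)."""
    scores = {}
    for name, members in networks.items():
        rs = [pearson(expression[gene], expression[m]) for m in members if m != gene]
        scores[name] = sum(rs) / len(rs)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy expression profiles across five samples (invented values, not real data).
expression = {
    "EPI_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "EPI_B": [2.1, 3.9, 6.2, 8.0, 9.8],
    "MITO_A": [5.0, 3.0, 4.0, 1.0, 2.0],
    "MITO_B": [9.0, 7.5, 8.0, 2.0, 3.0],
    "UNKNOWN_1": [0.9, 2.2, 2.8, 4.1, 5.3],
}
# Toy curated networks (placeholder names standing in for annotated networks).
networks = {
    "epigenetic": ["EPI_A", "EPI_B"],
    "mitochondrial": ["MITO_A", "MITO_B"],
}

ranking = annotate("UNKNOWN_1", expression, networks)
best_network, best_score = ranking[0]
print(best_network, round(best_score, 3))
```

In this toy case the uncharacterized gene tracks the "epigenetic" profiles closely, so that network ranks first. No prior knowledge about the gene itself is needed, only its expression profile.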
To apply this analytical framework to ID, we first collated 380 genes, each with a reported relationship to ID, from publications, reviews and meeting reports (Table 1). Cross-gene correlation analysis, clustering and plotting of the genes revealed a complex set of networks among the 380 genes (Figure 2). Five predominant clusters were observed, which together comprised approximately half of the ID genes when a network Pearson correlation (mean of the 10 plotted network genes) of >0.4 was used as the cutoff. These mathematically derived clusters recapitulate several biological processes known to be associated with intellectual disability (3). The biological networks plotted in Figure 1 were clustered together with the ID genes (Figure 2), enabling annotation of the clusters.
Network relationships of 380 Genes linked to Intellectual Disability
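The >0.4 cutoff can be sketched as follows, assuming it is applied to a gene's mean Pearson correlation with the member genes of a network. The correlation values, gene labels and the `assign_clusters` helper are hypothetical, and the toy networks hold two genes each rather than ten to keep the example short.

```python
def mean_network_correlation(gene, members, corr):
    """Mean pairwise correlation between a gene and a network's member genes."""
    values = [corr[frozenset((gene, m))] for m in members if m != gene]
    return sum(values) / len(values)

def assign_clusters(genes, networks, corr, threshold=0.4):
    """Assign each gene to its best-scoring network, or None if below the cutoff."""
    assignments = {}
    for gene in genes:
        scores = {
            name: mean_network_correlation(gene, members, corr)
            for name, members in networks.items()
        }
        best = max(scores, key=scores.get)
        assignments[gene] = best if scores[best] > threshold else None
    return assignments

# Toy networks and precomputed pairwise correlations (invented values).
networks = {"epigenetic": ["E1", "E2"], "metabolism": ["M1", "M2"]}
pairs = {
    ("G1", "E1"): 0.70, ("G1", "E2"): 0.60, ("G1", "M1"): 0.10, ("G1", "M2"): 0.00,
    ("G2", "E1"): 0.20, ("G2", "E2"): 0.10, ("G2", "M1"): 0.50, ("G2", "M2"): 0.45,
    ("G3", "E1"): 0.30, ("G3", "E2"): 0.20, ("G3", "M1"): 0.35, ("G3", "M2"): 0.30,
}
# Unordered keys make the lookup symmetric: corr(a, b) == corr(b, a).
corr = {frozenset(pair): r for pair, r in pairs.items()}

assignments = assign_clusters(["G1", "G2", "G3"], networks, corr)
clustered = [g for g, name in assignments.items() if name is not None]
print(assignments, len(clustered) / len(assignments))
```

Genes that fall below the cutoff for every network stay unassigned, which is how only part of a gene list ends up in the predominant clusters.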
The largest of the mathematically determined clusters contained predominantly genes known to be involved in epigenetic regulation, including ID genes CREBBP, ARID1A, ARID1B, MLL, MLL2, KDM5C, KDM6A, KDM6B, EHMT1, EP300, BRWD3, ATRX, MYST3, MYST4, and a number of others. Also present in the cluster were several ID genes with limited biological characterization, including PRR12, PTPN23 and RAI1. Using a “guilt by association” rationale, we hypothesize that these genes are also involved in epigenetic regulation.
Focusing on RAI1, we then generated a network anchored on RAI1 itself. The top 50 genome-wide expression correlates of RAI1 are clustered in Figure 3. Note that about one third of the top genes in the RAI1 network are also ID genes (yellow), and that most of the genes in the network are linked to epigenetic regulation, further supporting the functional assignment of RAI1.
RAI1-anchored network plotted with biological networks (green) shows tight correlation to epigenetic regulation ID genes. Known ID genes in yellow, RAI1 in red.
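Building an anchored network of this kind amounts to ranking all genes by their correlation to the anchor gene and keeping the top N. The sketch below assumes the correlations to the anchor are already computed. The gene labels, values and both helper functions are placeholders, not data or code from the analysis.

```python
def top_correlates(anchor_corr, n=50):
    """Return the n genes with the highest correlation to the anchor gene."""
    ranked = sorted(anchor_corr.items(), key=lambda kv: kv[1], reverse=True)
    return [gene for gene, _ in ranked[:n]]

def fraction_in_set(genes, reference):
    """Fraction of the listed genes that also appear in a reference gene set."""
    return sum(g in reference for g in genes) / len(genes)

# Toy correlations of every other gene to a hypothetical anchor gene (invented values).
anchor_corr = {
    "GENE_A": 0.82, "GENE_B": 0.79, "GENE_C": 0.75,
    "GENE_D": 0.40, "GENE_E": 0.12, "GENE_F": -0.30,
}
# Toy stand-in for a curated ID gene list.
id_genes = {"GENE_A", "GENE_D", "GENE_F"}

top = top_correlates(anchor_corr, n=3)
id_fraction = fraction_in_set(top, id_genes)
print(top, id_fraction)
```

With real data, `n=50` and the 380-gene ID list would give the share of ID genes among the anchor's top correlates.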