August 10, 2014. We recently used our analysis tools to examine the genetics and genomics of cardiomyopathies and uncovered a new candidate pathway for common variants associated with cardiovascular phenotypes.
Network analysis of sixteen genes associated with Hypertrophic Cardiomyopathy (HCM) and thirty-one genes associated with cardiovascular disease phenotypes in Genome Wide Association Studies (GWAS) identified three major classes of cardiovascular disease-associated genes. Consistent with well-established biology, one major group comprised sarcomere genes; however, the networks further grouped the known HCM genes into subsets by muscle type. Networking generated a list of 250 candidate sarcomere-associated genes for use in exploring sequence variants underlying HCM. Analysis of gene lists from three GWA studies identified two additional classes: one comprising genes associated with endothelium/smooth muscle, and a more speculative brain-expressed group. Many candidate genes from GWAS loci did not fall into any of the three network classes. These analyses demonstrate the utility of network analysis in the prioritization and interpretation of GWAS data, provide candidate genes for assessing sequence variants in HCM, and identify candidate biological processes linked to cardiovascular risk in the general population.
Although recently surpassed by cancer as the leading cause of death in the US, heart disease remains one of the major health issues faced by modern society (1).
Decades of research have provided a good understanding of causes and mechanisms of cardiovascular disease, resulting in effective lifestyle recommendations and pharmacologic agents to reduce risks, especially regarding management of LDL and triglyceride levels.
While environmental contributors are well established, susceptibility to cardiovascular disease also exhibits a strong genetic component. A better understanding of genes and pathways linked to cardiovascular phenotypes may provide new targets for intervention. The recent flood of genomics data provides a great opportunity to develop new insight.
Examples of the power of genomic information in cardiovascular drug discovery include PCSK9 (2, 3), first identified as a target in 2003 through positional cloning of familial cholesterol phenotypes (4, 5). ANGPTL3, another genomically validated target (6), awaits the development of targeted inhibitors. It is currently pursued by antisense and RNAi companies, but is perhaps best targeted with an antibody, as anticipated by this Regeneron patent application (7). Recent functional analysis of 23 GWA studies on blood lipid levels may provide additional drug targets (8).
In addition to lipid homeostasis, the genetics of other biological systems also affect cardiovascular disease. Multisystem genetic disorders often present with cardiomyopathies; examples include some of the lysosomal disorders (LAMP2, PRKAG2, GLA/Fabry) (9) and the RASopathies, Ras/MAPK gene defects such as Noonan Syndrome (10). Genetic mitochondrial disorders also commonly present with cardiomyopathy (11).
The most common cardiovascular genetic disorder, Hypertrophic Cardiomyopathy (HCM), affects approximately 1 in 500 people (0.2% of the population) and typically results from dominant mutations in the structural components of contractile heart muscle (12, 13). Symptoms in affected individuals vary, and many show no overt pathology. This is the disorder responsible for the sudden death of young athletes during or just after vigorous exercise.
Muscle type-specific gene networks
Drawing from the recently expanded GTEx human tissue data set (www.gtexportal.org), we added a new set of networks generated using gene expression-based correlation, this round focusing on organ-type and tissue-type specific networks. Figure 1A shows a bathymetry plot of 12 networks, represented by 10 genes each, including: smooth muscle, skeletal muscle, cardiac muscle, brain, lung, pancreas, adipose tissue, liver, ribosomes, mitochondria, proliferation, endothelium and lysosomal biogenesis (seeded by TFEB). Most networks exhibit little cross-network linkage. To confirm tissue-specific expression for the muscle networks, Figure 1B shows GTEx portal-generated expression plots for the seed genes used in the cardiac (NKX2-5), skeletal (CACNA1S) and smooth muscle (MYH11) networks, demonstrating the extent of tissue-specific expression of the seed genes in the GTEx RNAseq data.
Figure 1A Biological networks bathymetry plot
Figure 1B Muscle subtype network seed gene expression in GTEx dataset
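As a rough illustration of what a seed-based, correlation-driven network can look like, the sketch below ranks genes by Pearson correlation with a seed gene across samples and keeps the top few. This is only one plausible reading of "gene expression-based correlation"; the actual method, correlation measure and cutoffs used for the networks in this post are not described here, and the gene names and expression values are made-up placeholders.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # flat profile: treat as uncorrelated
    return cov / (sx * sy)

def seed_network(expr, seed, top_n=10):
    """Return the top_n genes most correlated with the seed (seed excluded)."""
    scores = {g: pearson(expr[seed], prof) for g, prof in expr.items() if g != seed}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Toy expression matrix (hypothetical): gene -> expression across 5 samples
expr = {
    "SEED":   [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_A": [2.0, 4.1, 6.0, 8.2, 10.0],  # tracks the seed closely
    "GENE_B": [5.0, 4.0, 3.0, 2.0, 1.0],   # anti-correlated with the seed
    "GENE_C": [1.0, 3.0, 2.0, 5.0, 4.0],   # loosely correlated with the seed
}
network = seed_network(expr, "SEED", top_n=2)
print(network)
```

With real data, `expr` would hold one profile per gene across GTEx samples, and `top_n=10` would match the 10 genes per network shown in the plot.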
Hypertrophic Cardiomyopathy gene networks
Sixteen genes linked to HCM were selected from a recent review (14) and used as seeds for network generation. Figure 2 shows a plot of networks generated using each gene as the seed, as well as networks for smooth, cardiac and skeletal muscle. Consistent with the common sarcomere biology attributed to these genes, most networks exhibited considerable cross-network linkage. However, clear subsets emerged, with networks grouping by what appears to be relative expression among the distinct muscle subtypes. MYL2, MYH7, MYL3, MYOZ, MYH6, TCAP, TNNC1, ACTN2 and CSRP3 formed a group with dominant linkage to skeletal muscle, but significant cardiac muscle linkage as well. ACTC1, NEXN, TPM1 and PLN formed a tight group characterized by strong linkage to smooth muscle. MYBPC3, TNNT2 and TNNI3 formed a third tight group characterized by selective linkage to cardiac muscle. We interpret these results as representing the normal, not pathologic, networks for these genes, as most of the samples are likely from individuals without HCM-associated gene mutations.
Figure 2 Hypertrophic Cardiomyopathy gene networks form distinct subgroups linked to muscle type
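One simple way to picture how a seed network can be assigned to a muscle subtype is to score its overlap with each reference network and take the best match. The sketch below uses Jaccard overlap of gene sets as a stand-in for "linkage"; that choice is an assumption made for illustration, not the measure used to draw Figure 2, and the gene identifiers are placeholders rather than real network members.

```python
def linkage(net_a, net_b):
    """Jaccard overlap between two gene sets (assumed stand-in for network linkage)."""
    a, b = set(net_a), set(net_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def dominant_reference(seed_net, references):
    """Name of the reference network that overlaps most with the seed network."""
    return max(references, key=lambda name: linkage(seed_net, references[name]))

# Hypothetical reference networks and one hypothetical seed network
references = {
    "cardiac":  {"g1", "g2", "g3"},
    "skeletal": {"g4", "g5", "g6"},
    "smooth":   {"g7", "g8", "g9"},
}
seed_net = {"g4", "g5", "g1", "g10"}

scores = {name: linkage(seed_net, genes) for name, genes in references.items()}
best = dominant_reference(seed_net, references)
print(best, scores)
```

In this toy case the seed network shares two genes with the skeletal reference and one with the cardiac reference, mirroring the "dominant skeletal linkage with some cardiac linkage" pattern described for the first HCM subgroup.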
Sarcomere biology gene network
There are 190 genes plotted in Figure 2, each of which is likely associated with sarcomere biology. Collating the top 50 network genes for each of the 19 seed genes in Figure 2 resulted in a list of 250 unique genes. A PDF of the gene list is available here. We believe this represents a comprehensive list of sarcomere-associated genes that includes known players, uncharacterized genes, and others with no reported link to muscle biology. Such a list may support the interpretation of sequence variant analysis, given the significant challenges in distinguishing between incidental and pathogenic variants (15). Five additional HCM-associated genes that were not included in the network analysis (JPH2, LDB3, MYLK2, TTN and VCL) are all present on the mathematically derived 250-gene list.
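The collation step can be sketched as a de-duplicating merge of each seed's ranked list, followed by a membership check for genes that were held out of the analysis. With 19 seeds and 50 genes each there are 950 slots, so a final list of 250 unique genes reflects heavy overlap between the seed networks. The function names and the toy gene identifiers below are hypothetical and only show the bookkeeping.

```python
def collate_networks(networks, top_n=50):
    """Merge the top_n genes of each seed's ranked list into one de-duplicated list."""
    seen = set()
    merged = []
    for ranked in networks.values():
        for gene in ranked[:top_n]:
            if gene not in seen:
                seen.add(gene)
                merged.append(gene)
    return merged

def recovered(held_out, gene_list):
    """Return the held-out genes that appear in the collated list."""
    members = set(gene_list)
    return [g for g in held_out if g in members]

# Toy ranked networks keyed by hypothetical seed names (placeholders, not real results)
networks = {
    "seed1": ["gA", "gB", "gC"],
    "seed2": ["gB", "gD", "gE"],
    "seed3": ["gC", "gE", "gF"],
}
gene_list = collate_networks(networks, top_n=2)
hits = recovered(["gD", "gF", "gZ"], gene_list)
print(gene_list, hits)
```

Applied to the real networks, `held_out` would be the five HCM genes left out of seeding, and recovering all five would correspond to the result reported above.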
Cardiovascular GWAS genes link to smooth muscle/endothelial and possible neuronal network
In addition to the relatively rare genetic cardiomyopathies, genetic predisposition to cardiovascular disease across the general population is expected to be both common and multigenic. Several large GWAS have linked a number of cardiovascular phenotypes to genetic loci. The EchoGen consortium recently published 16 loci that exhibited significant association with at least one of five echocardiographic traits, including ventricular hypertrophy (16). Five loci reached significance in the second phase. To further explore all 16 loci for potential biological pathway links, we first re-examined the candidate genes identified at each locus, then generated networks for 18 genes associated with the loci.
For most loci, either the associated SNP fell within the candidate gene or the candidate gene was the only gene in the region. For the 17p13 locus, the top three listed SNPs were most closely associated with SMG6, HN1L and SRR, respectively; however, two of the implicated SNPs in 17p13 are correlated, suggesting there may be only one or two relevant genes. We included all three genes in the networks. Because PLN is a known gene in HCM, we did not pursue the other two candidates in the 6q22 locus (SLC35F1, C6ORF204/CEP85L). SLC35F1 is an uncharacterized brain-specific channel whose close network members are involved in neurogenesis (data not shown). CEP85L is more challenging to interpret: it exhibits broad expression, highest in testis and brain, and its close network members are involved in DNA damage response (ATM, TTBK2, TAOK1) and ataxias (ATM, SACS, QKI). At the 5q23 locus (CCDC100/CEP120), the more recent genomic build 38 now positions the top listed SNP adjacent to PRDM6, so we included it as well.
Figure 3 shows networks and linkages for the 18 selected genes and several relevant biological networks. While half of the genes exhibit only modest network linkages, the other half formed two distinct biologically related groups. PRDM6, PLN, PALMD, MEIS2 and PDE3A formed one meta-network that linked to both endothelium and smooth muscle networks. PRDM6, a smooth muscle-specific transcriptional repressor (17), and PDE3A are both known to be involved in smooth muscle biology, and PLN, an HCM-linked gene, is a member of the smooth muscle sub-network in Figure 2 above. This supports a hypothesis that a subset of the genes associated with echocardiographic phenotypes function in smooth muscle/endothelial cell biology, and further that these genes are the relevant affected genes in their respective loci. While the PRDM6, PLN and PDE3A loci replicated in the study, PALMD (meta-analysis p value of 1.1 × 10⁻⁷) did not, and the MEIS2 SNP is a rare allele. This network supports continued exploration of PALMD and MEIS2, as well as PRDM6, PLN and PDE3A, in future work.
Figure 3 EchoGen GWAS candidate gene networks form subgroups that link to smooth muscle/endothelium and brain expressed networks
A second meta-network shown in Figure 3 comprises NOVA1, SRR, GRID1 and WWOX, which exhibit neuronal expression and modest linkage to a brain-specific network. However, only the SRR locus replicated in the study. In addition, many genes exhibit brain-enriched expression patterns, so in the absence of other strong support, caution is warranted.
Endothelial/Smooth Muscle network seen in a second GWAS gene set
An unpublished study on the Kaiser/UCSF GERA cohort was recently presented (18). Ten genes from loci exhibiting significant association with Left Ventricular Hypertrophy were reported. To further explore potential biological networks associated with cardiomyopathy phenotypes, we generated network plots for the reported genes: CCDC141, CRIM1, CTNNA1, HAND1, NFIA, PROCR, SIPA1L1, TBX3, VGLL2 and ZNF595 (Figure 4). One gene, VGLL2, exhibited good network linkage to skeletal muscle and was present in the sarcomere gene list generated above, aligning most closely with the skeletal-dominant subgroup of HCM genes. VGLL2 is a known muscle differentiation gene, and its association with hypertrophy suggests a role in the proper expression of muscle-specific genes. Of the other listed genes, CRIM1, CTNNA1, TBX3, NFIA, PROCR and SIPA1L1 formed a meta-network that also linked to endothelium and smooth muscle networks. This second linkage of GWAS-identified genes with endothelium and smooth muscle cell networks further supports these networks as relevant in cardiovascular disease.
Figure 4 GERA GWAS gene candidate networks form endothelium/smooth muscle linked subgroup
Sudden Cardiac Death Susceptibility locus 2q24.2 gene TANC1 is also a member of the endothelial/smooth muscle network
A two-stage GWAS for sudden cardiac death employed 4,400 cases and more than 30,000 controls to identify a single locus with candidate genes BAZ2B, TANC1 and WDSUB1 (22). BAZ2B is widely expressed (highest in cerebellum, ovary and testis) and exhibits a strong network association with chromatin modulators (data not shown). WDSUB1, also widely expressed (highest in pituitary, ovary and adrenal gland), exhibits modest network associations with intracellular trafficking (data not shown). TANC1 exhibits highest expression in skin, cervix, lung and artery, and shows strong network association with the GWAS candidate network identified above as well as with the endothelial/smooth muscle networks (Figure 5).
Figure 5 TANC1 network joins GERA and EchoGen Endothelial/smooth muscle gene candidates