Poster abstracts

Poster number 26 submitted by James Li

Global Local Folding of the Human Transcriptome

James Li (The Institute for Genomic Medicine at Nationwide Children's Hospital and The Ohio State University Department of Pediatrics), Jeffrey Gaither, Grant Lammi, David Gordon, Harkness Kuck, Benjamin Kelly, James Fitch, Peter White (The Institute for Genomic Medicine at Nationwide Children's Hospital and The Ohio State University Department of Pediatrics)

Abstract:
Analysis of sequence variants in disease has largely relied on the predicted effects of missense mutations on protein function. Previous in silico RNA folding studies suggest that selection in humans and other mammals may have been influenced by mRNA secondary structure in certain genes. However, a connection between RNA folding and genetic disease has yet to be established at the scale of an entire transcriptome. We therefore performed a whole-transcriptome analysis of the effects of single nucleotide polymorphisms (SNPs) on local RNA folding. We aimed to (1) build a cloud-based big data pipeline to compute RNA folding statistics for every possible SNP in the known human transcriptome (~0.5 billion variants); (2) use population allele frequencies from 138,632 patients and mammalian conservation scores to determine whether SNPs causing large RNA disruptions are under constraint, which would support our hypothesis that RNA stability/structure may play a role in disease; and (3) develop a tool and composite score to analyze patient genomes for highly disruptive SNPs. For every position in all known RefSeq mRNA transcript sequences, we generated 101-nucleotide flanking sequences corresponding to the reference allele and each of the three possible alternate alleles. Next, we used the ViennaRNA Package to obtain 10 RNA folding disruption metrics for each possible variant (445,740,246 SNPs in total). For each of the 10 metrics, we sorted the SNPs and divided them into ten equally sized bins. Bins with higher RNA disruption values had a lower proportion of SNPs with non-zero allele frequencies than bins with lower disruption values. Similarly, median and mean GERP++ scores were greater in higher-disruption bins. The correlation of increased RNA disruption with constrained allele frequencies and higher GERP++ scores across the whole human transcriptome suggests that RNA folding plays an important role in human health and disease.
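The per-position windowing and the equal-size binning described above can be sketched in Python. This is a minimal, hypothetical illustration, not the authors' pipeline. The following are all assumptions: the 101-nt window is centered on the variant (50 nt on each side), windows are truncated at transcript ends, T is converted to U, and positions with non-ACGU bases are skipped. The helper names (`snv_windows`, `equal_size_bins`, `summarize_bins`) are likewise invented for illustration. The ViennaRNA step that turns each reference/alternate window pair into the 10 disruption metrics is omitted. Instead, the sketch takes one precomputed disruption value per SNP as input.

```python
from statistics import mean, median
from typing import Iterator, List, NamedTuple, Sequence, Tuple

FLANK = 50  # assumed layout: 50 nt upstream + variant + 50 nt downstream = 101 nt
BASES = "ACGU"


class VariantWindow(NamedTuple):
    pos: int      # 0-based position of the variant in the transcript
    ref: str      # reference base
    alt: str      # alternate base
    ref_seq: str  # window carrying the reference allele
    alt_seq: str  # same window carrying the alternate allele


def snv_windows(transcript: str, flank: int = FLANK) -> Iterator[VariantWindow]:
    """Yield ref/alt windows for all three possible SNVs at every position.

    Assumptions (hypothetical): T -> U, windows truncated at transcript ends,
    positions with non-ACGU bases (e.g. N) skipped.
    """
    seq = transcript.upper().replace("T", "U")
    for pos, ref in enumerate(seq):
        if ref not in BASES:
            continue
        start = max(0, pos - flank)
        end = min(len(seq), pos + flank + 1)
        ref_seq = seq[start:end]
        offset = pos - start  # index of the variant inside the window
        for alt in BASES:
            if alt != ref:
                alt_seq = ref_seq[:offset] + alt + ref_seq[offset + 1:]
                yield VariantWindow(pos, ref, alt, ref_seq, alt_seq)


def equal_size_bins(values: Sequence[float], n_bins: int = 10) -> List[int]:
    """Assign each value a bin 0..n_bins-1 by rank; bin sizes differ by at most one.

    Ties are broken by input order because Python's sort is stable.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def summarize_bins(
    disruption: Sequence[float],
    allele_freq: Sequence[float],
    gerp: Sequence[float],
    n_bins: int = 10,
) -> List[Tuple[int, int, float, float, float]]:
    """Per bin: (bin, n_snvs, fraction with AF > 0, median GERP++, mean GERP++)."""
    groups: List[List[int]] = [[] for _ in range(n_bins)]
    for i, b in enumerate(equal_size_bins(disruption, n_bins)):
        groups[b].append(i)
    rows = []
    for b, idx in enumerate(groups):
        if not idx:
            continue  # only possible when there are fewer SNVs than bins
        observed = sum(1 for i in idx if allele_freq[i] > 0) / len(idx)
        scores = [gerp[i] for i in idx]
        rows.append((b, len(idx), observed, median(scores), mean(scores)))
    return rows


if __name__ == "__main__":
    windows = list(snv_windows("AUGGCCAUUG" * 20))
    print(len(windows))  # 200 positions x 3 alternates = 600
    # Synthetic toy values, used only to exercise the functions (not study data).
    disruption = [float(i) for i in range(20)]
    af = [0.01 if i < 10 else 0.0 for i in range(20)]
    gerp = [i / 4 for i in range(20)]
    for row in summarize_bins(disruption, af, gerp):
        print(row)
```

Binning by rank rather than by fixed value thresholds keeps the ten bins equal in size even when a metric's distribution is heavily skewed. Each bin can then be compared on the fraction of SNPs observed in the population and on its GERP++ summary.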

Keywords: sequence variant analysis, vienna, population constraint