Poster abstracts
Poster number 15 submitted by Regina Edgington
Long-read transcriptomics reveals sample-specific protein variants missed by reference proteomics
Regina M. Edgington (Department of Chemistry and Biochemistry, The Ohio State University, Columbus, Ohio, USA), Jacob W. Smith (The Ohio State Biochemistry Program, The Ohio State University, Columbus, Ohio, USA), Vladislav Belyy (Department of Chemistry and Biochemistry, The Ohio State University, Columbus, Ohio, USA; Center for RNA Biology, The Ohio State University, Columbus, Ohio, USA), Damien B. Wilburn (Department of Chemistry and Biochemistry, The Ohio State University, Columbus, Ohio, USA; Center for RNA Biology, The Ohio State University, Columbus, Ohio, USA)
Abstract:
The functional complexity of the cellular proteome is driven by genetic variation and further expanded through splicing, translation, and modification, giving rise to diverse protein variants. In mass spectrometry-based proteomics, the peptide search space is typically restricted to the reference proteome, which can overlook individual genetic and transcriptional features essential to biological function and disease. Here we present an integrated transcriptomic–proteomic workflow based on long-read sequencing that enables the quantification of novel proteoforms for any tissue, cell type, or species.
Matched transcriptomics and proteomics data were collected from multiple human monoclonal cell lines that include individual gene deletions within the unfolded protein response pathway. Long-read transcriptome sequencing was performed on an Oxford Nanopore P2 Solo with barcode multiplexing, and proteomics was performed by DIA-MS on a Thermo Fisher Eclipse mass spectrometer. To streamline transcriptome–proteome analysis, we developed a computational pipeline that performs basecalling, isoform grouping, read-based quantification, generation of sample-specific spectral libraries, and DIA database searching.
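The idea behind a sample-specific search space can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the simplified trypsin rule (cleave after K or R unless followed by P, no missed cleavages), the length limits, and the toy sequences are all assumptions made for illustration. It digests sample-derived and reference proteins in silico and keeps the peptides found only in the sample.

```python
import re


def tryptic_peptides(protein, min_len=7, max_len=30):
    """In silico digest with a simplified trypsin rule (hypothetical helper).

    Cleaves after K or R unless the next residue is P; missed cleavages
    are not modeled. Peptides outside [min_len, max_len] are dropped.
    """
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    return {p for p in pieces if min_len <= len(p) <= max_len}


def novel_peptides(sample_proteins, reference_proteins):
    """Return peptides present in the sample digest but absent from the reference digest."""
    reference = set()
    for protein in reference_proteins:
        reference |= tryptic_peptides(protein)
    sample = set()
    for protein in sample_proteins:
        sample |= tryptic_peptides(protein)
    return sample - reference


# Toy sequences (invented for illustration, not real proteins).
reference_seq = "MAAGSTLLKEDLPVNAAGRSGEWTLLQHK"
variant_seq = "MAAGSTLLKEDLPVNTAGRSGEWTLLQHK"  # single A-to-T substitution

print(novel_peptides([variant_seq], [reference_seq]))
```

In this toy case only the peptide spanning the substitution is reported as novel; the unchanged flanking peptides are shared with the reference. A real workflow would also need to handle missed cleavages, isoleucine/leucine ambiguity, and precursor charge states, none of which are modeled here.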
Applied to wild-type and IRE1-knockout U2OS cells, the pipeline identifies 3,546 novel precursors absent from the RefSeq database (5.7% of 61,701 total identifications). Most novel peptide identifications (~82%) arise from single–amino acid substitutions. The remainder include N- and C-terminal variation (~12%), indels (~3%), endogenous peptides (~2%), and complex mutational events (~2%). These results demonstrate that transcriptome-informed searching recovers a layer of proteoform diversity inaccessible to reference-based proteomics. The approach requires ~10–100 ng RNA, with library preparation and sequencing completed in ~1 week for <$2,000 in reagents after instrument setup. The pipeline will be released as open source to enable broad adoption.
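The reported proportions can be checked with a few lines of Python. The sketch below uses only the figures stated above; the variable names and category labels are illustrative. It also shows that the approximate category percentages sum to slightly more than 100, as expected when each value is rounded independently.

```python
# Figures taken directly from the abstract.
novel_precursors = 3546
total_identifications = 61701

novel_share = 100 * novel_precursors / total_identifications
print(f"Novel precursors: {novel_share:.1f}% of identifications")

# Approximate category percentages as reported (each individually rounded).
category_percent = {
    "single amino acid substitutions": 82,
    "N- and C-terminal variation": 12,
    "indels": 3,
    "endogenous peptides": 2,
    "complex mutational events": 2,
}
percent_sum = sum(category_percent.values())
print(f"Sum of rounded category percentages: {percent_sum}%")
```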
Keywords: Long read sequencing, Proteomics, Transcriptomics
