PrEMeR-CG - Probabilistic Extension of Methylated Reads at CpG resolution
==========

Contributors: Mark Murphy, David E Frankhouser

Contact for Questions:
Ralf Bundschuh - bundschuh@mps.ohio-state.edu
David E Frankhouser - david.frankhouser@osumc.edu

-----------
Publication
-----------
PrEMeR-CG: Inferring Nucleotide Level DNA Methylation Values from MethylCap-Seq Data

-----------
Authors
-----------
David E. Frankhouser, Mark Murphy, James S Blachly, Jincheol Park, John Curfman, John C Byrd, Shili Lin, Guido Marcucci, Pearlly Yan, Ralf Bundschuh

-----------
Description
-----------
PrEMeR-CG is a computational approach that harnesses the implicit information associated with library fragment profiles to infer nucleotide-resolution methylation values in addition to read counts data. CpG binning quantifies the methylation for each CpG in the genome using aligned reads and a fragment profile.

-------
Scripts
-------

### main.PrEMeR_CG.py ###
- Takes an aligned read (.sam file), cpg index (preferably serialized with pickle) and a fragment profile (.profile cdf file - See NOTE 2 below) as arguments.
- Requires
    - cgbinner2_readnorm.py (Included. See NOTE 3 below)
    - Python Modules: exceptions, cPickle, sqlite3, csv, gzip, threading, time, argparse
- Produces cgb file

    usage: main.PrEMeR_CG.py [-h] [--out OUT] [--samtype SAMTYPE] [--cdf CDF]
                             [--adapter ADA] [--nthreads NTHREADS] [--text]
                             sam cgindex

    PrEMeR_CG Binner

    positional arguments:
      sam                  SAM file to bin.
      cgindex              Genome reference CpG index marshal file.

    optional arguments:
      -h, --help           show this help message and exit
      --out OUT            Base name for output files.
      --samtype SAMTYPE    SAM file type.
      --cdf CDF            CDF file to extend by.
      --adapter ADA        Length of adapter to trim.
      --nthreads NTHREADS  Number of threads to use.
      --text               Output to text file

### PrEMeR_CG.py ###
- CpG Binner.
- Creates cgb files from cpg index, sam, and fragment length distribution files.
- Main worker program for cgb file generation

-----
NOTES
-----
NOTE 1: main.PrEMeR_CG.py is essentially a wrapper for PrEMeR_CG.py. Importing the binner as a compiled file increases performance.

NOTE 2: The profile required by the main.PrEMeR_CG.py script is a flat text file with a header that gives, for each nucleotide position, the probability that the position is included in a read. It can be generated from the fragment profile distribution produced by any fragment profile analyzer (from the smear analysis). First, determine the total area under the curve of the fragment profile distribution ( Atot ). Second, for a given nucleotide position from the start of the fragment, determine the area under the fragment profile distribution curve from the start of the curve to the nucleotide position being considered ( Anuc ). Finally, the probability of a nucleotide's inclusion in a fragment ( Pnuc ) is given by subtracting the fractional area up to the considered nucleotide ( Anuc ) from the total area ( Atot ) under the fragment profile distribution ( Pnuc = Atot - Anuc, with areas expressed as fractions of the total so that Atot = 1 ). This process can be repeated for every nucleotide contained within the fragment (until the probability of a nucleotide's inclusion falls to zero, or to a defined lower threshold). An example profile (example.profile) that can be used for formatting and testing purposes is included.

NOTE 3: The PrEMeR_CG.py module should be located in the same directory as main.PrEMeR_CG.py. See 'docs.python.org/2/tutorial/modules.html#the-module-search-path' for alternatives.
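
The area calculation described in NOTE 2 can be sketched as below. This is only an illustration and not code from PrEMeR-CG: the function name `inclusion_probabilities`, the choice of a plain list holding one relative-abundance value per fragment length (1, 2, 3, ... nucleotides), and the explicit division by Atot are all assumptions. The sketch does not write a .profile file; take the file layout from the included example.profile.

```python
def inclusion_probabilities(density, threshold=0.0):
    """Hypothetical helper: turn a fragment length distribution into Pnuc values.

    density[i] is assumed to be the relative abundance of fragments that are
    i + 1 nucleotides long. The returned list holds Pnuc for nucleotide
    positions 1, 2, 3, ... counted from the start of the fragment.
    """
    total_area = float(sum(density))  # Atot
    if total_area <= 0:
        raise ValueError("fragment profile has no area under the curve")

    probabilities = []
    area_so_far = 0.0  # Anuc: area from the start of the curve up to this position
    for weight in density:
        # Pnuc = (Atot - Anuc) / Atot; dividing by Atot keeps the value in [0, 1]
        p_nuc = (total_area - area_so_far) / total_area
        if p_nuc <= threshold:
            break  # stop once the probability reaches zero or the lower threshold
        probabilities.append(p_nuc)
        area_so_far += weight
    return probabilities


# Toy distribution (made up for illustration): fragments of length 3, 4 and 5 only.
toy_density = [0, 0, 1, 2, 1]
print(inclusion_probabilities(toy_density))
```

For the toy distribution every fragment is at least 3 nucleotides long, so the first three positions are always included, and the probability then drops as shorter fragments end.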