Talk abstracts

Talk on Saturday 05:00-05:15pm submitted by Marianne Lee

Simple is beautiful: a straightforward approach to improve the delineation of true and false positives in PSI-BLAST searches

Marianne M. Lee (Biophysics Program, The Ohio State University), Michael K. Chan (Departments of Biochemistry and Chemistry, The Ohio State University), Ralf Bundschuh (Department of Physics, The Ohio State University)

Abstract:
Sequence alignment is one of the most widely used techniques in computational biology, particularly for functional annotation of uncharacterized sequences. BLAST and PSI-BLAST (Altschul, et al., 1990 and 1997) are arguably the most popular alignment tools. PSI-BLAST, which uses an iterative profile-based search strategy, is more sensitive than BLAST in detecting weak homologies, making it well suited to remote homolog detection.

In its first iteration, PSI-BLAST identifies relatively close homologs. These close homologs are then used to generate a profile for the next iteration, which can find more remote homologs. In theory, iterating this process yields progressively better models and thus finds increasingly weakly related homologs. In practice, however, non-homologous false positives are frequently incorporated into the model at some point during the iterative process. Once this happens, the model is "corrupted", and many non-homologs are falsely identified as putative homologs. Such model corruption is particularly treacherous if the non-homologous sequence belongs to a large family. A naive remedy is to set a stringent inclusion threshold, but the trade-off is a loss of sensitivity, especially for the more remote homologs, which diminishes the strength and utility of PSI-BLAST.
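The iterative loop and the threshold trade-off described above can be illustrated with a toy sketch. It does not call PSI-BLAST: `iterative_search`, `toy_search`, the hit names, and all E-values and threshold values are hypothetical and chosen only to show how one false positive admitted into the profile can pull in further non-homologs, and how a stricter threshold avoids that at the cost of losing a remote homolog.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical interface: given the sequences currently in the profile,
# return an E-value for every database hit.
SearchFn = Callable[[Set[str]], Dict[str, float]]


def iterative_search(
    search: SearchFn,
    query: str,
    inclusion_threshold: float = 0.005,  # illustrative value only
    max_iterations: int = 5,
) -> Tuple[Set[str], List[Dict[str, float]]]:
    """Repeat the search, adding every hit at or below the threshold to the profile."""
    members: Set[str] = {query}
    history: List[Dict[str, float]] = []
    for _ in range(max_iterations):
        hits = search(members)
        history.append(hits)
        included = {name for name, e in hits.items() if e <= inclusion_threshold}
        updated = members | included
        if updated == members:
            break  # no new sequences were added, so the profile has converged
        members = updated
    return members, history


def toy_search(members: Set[str]) -> Dict[str, float]:
    """Made-up database: which hits appear depends on what is in the profile."""
    hits = {"close_homolog": 1e-30}
    if "close_homolog" in members:
        hits["remote_homolog"] = 1e-4
        hits["false_positive"] = 1e-3
    if "false_positive" in members:
        # The corrupted profile now also matches the false positive's family.
        hits["unrelated_family_member"] = 1e-6
    return hits
```

With these made-up numbers, a threshold of 0.005 admits `false_positive` and then `unrelated_family_member`, a threshold of 5e-4 keeps both out while retaining `remote_homolog`, and a threshold of 1e-5 excludes `remote_homolog` as well.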

Many different approaches have been proposed to improve the discrimination of true and false positives. Despite their better performance in remote homology detection, these approaches are computationally more expensive than PSI-BLAST, thus hindering their wide acceptance by the user community.

We have developed a simple and elegant approach to resolve the problem of model corruption in PSI-BLAST searches. We hypothesized that combining results from the first (least-corrupted) profile with results from later (most sensitive) iterations of PSI-BLAST provides a better discriminator for true and false hits. Accordingly, we have derived a formula that utilizes the E-values from these two iterations to obtain a figure of merit for rank-ordering the hits. Our verification results using a "gold-standard" test set show that our approach does delineate true positives from false positives better than PSI-BLAST E-values. Perhaps what is most notable about this strategy is that it is simple and straightforward to implement.
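As a minimal sketch of the idea of re-ranking hits with E-values from two iterations: the abstract does not give the formula, so the combination below (the mean of the two log10 E-values, equivalent to the log of their geometric mean) is purely a placeholder and not the authors' figure of merit. The names `figure_of_merit` and `rank_hits`, the `missing_e` default for hits absent from the first iteration, and the numerical floor are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple


def figure_of_merit(e_first: float, e_late: float) -> float:
    """Placeholder score: mean of the log10 E-values (lower means more confident).

    This is NOT the formula from the talk; it only shows the shape of the idea.
    """
    floor = 1e-300  # avoid log10(0) when an E-value is reported as 0
    return 0.5 * (math.log10(max(e_first, floor)) + math.log10(max(e_late, floor)))


def rank_hits(
    first_iter: Dict[str, float],
    late_iter: Dict[str, float],
    missing_e: float = 10.0,  # assumed E-value for hits not reported in iteration 1
) -> List[Tuple[str, float]]:
    """Score every hit from the later iteration and sort from best to worst."""
    scored = []
    for name, e_late in late_iter.items():
        e_first = first_iter.get(name, missing_e)
        scored.append((name, figure_of_merit(e_first, e_late)))
    return sorted(scored, key=lambda pair: pair[1])
```

Under this placeholder, a hit that scores well only in the later iteration is penalized relative to one that was already significant against the first, least-corrupted profile; any real use would substitute the published formula.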

References:
Altschul, S.F., Gish, W., Miller, W., Myers, E.W. and Lipman, D.J. (1990) Basic local alignment search tool, J Mol Biol, 215, 403-410.

Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Res, 25, 3389-3402.

Lee, M.M., Chan, M.K. and Bundschuh, R. (2008) Simple is beautiful: a straightforward approach to improving the delineation of true and false positives from a PSI-BLAST search, Bioinformatics, Apr 10.

Keywords: PSI-BLAST, model corruption, distant homology detection