William R. Pearson

Professor of Biochemistry
Ph.D., 1977, California Institute of Technology


Dept. of Biochemistry and Molecular Genetics
Jordan Hall, Rm 6-057
Box 800733
1300 Jefferson Park Ave.
Charlottesville, VA 22908

FAX: (434)-924-5069

Biochem 508 – Computer Analysis of DNA and Protein Sequences – Syllabus


We have a long-standing interest in exploiting protein sequence information, both to better understand how new protein sequences arise and to understand the relationship between protein sequence and protein structure. Since the description of the FASTP program in 1985, our group has been developing more effective methods for identifying distantly related protein sequences. Over the past 10 years, state-of-the-art methods have improved to the point where proteins that diverged from a common ancestor within the past billion years are likely to be detected by sequence similarity searching. We hope to push that threshold back beyond 2 billion years (near the time when prokaryotes and eukaryotes diverged), but it is already possible to identify novel proteins that are likely to have emerged within the last 500-800 million years. If we can identify proteins that emerged within the last 100-250 million years, it may be possible to identify the mechanisms by which new proteins are formed.

The FASTA WWW search page

The FASTA programs can be used to search protein and DNA sequence databases, and to confirm the statistical significance of a match by comparing the alignment score to a distribution of scores produced by shuffled sequences. Programs are also available to display local alignments.
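The shuffle-based significance test described above can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the FASTA implementation: it assumes a simple match/mismatch score (+2/-1) with a linear gap penalty (-2) in place of a substitution matrix with affine gap penalties, and the function names `sw_score` and `shuffle_zscore` are hypothetical. Shuffling the subject sequence preserves its residue composition, so the shuffled scores show what an unrelated sequence with the same composition would score. The sketch reports a simple z-score and an empirical p-value rather than the extreme-value fit a production tool might use.

```python
import random


def sw_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local (Smith-Waterman) alignment score with a linear gap penalty.

    Assumed toy scoring, not a real substitution matrix such as BLOSUM50.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are never allowed to drop below zero.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def shuffle_zscore(query: str, subject: str, n_shuffles: int = 200, seed: int = 0):
    """Compare the real score with scores against shuffled copies of the subject.

    Returns (real_score, z_score, empirical_p). Hypothetical helper for illustration.
    """
    if n_shuffles < 2:
        raise ValueError("need at least two shuffles to estimate a variance")
    rng = random.Random(seed)
    real = sw_score(query, subject)
    letters = list(subject)
    scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # same composition, random order
        scores.append(sw_score(query, "".join(letters)))
    mean = sum(scores) / len(scores)
    var = sum((s - mean) ** 2 for s in scores) / (len(scores) - 1)
    sd = var ** 0.5
    z = (real - mean) / sd if sd > 0 else float("inf")
    # Empirical p-value: fraction of shuffles scoring at least as well (+1 correction).
    p = (1 + sum(s >= real for s in scores)) / (n_shuffles + 1)
    return real, z, p


if __name__ == "__main__":
    print(shuffle_zscore("HEAGAWGHEE", "PAWHEAE"))
```

A high z-score and a small empirical p-value suggest the real alignment scores better than composition alone would explain; with only a few hundred shuffles the p-value cannot fall below 1/(n_shuffles + 1).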

The FASTA package of sequence comparison programs

ISMB 2000 Tutorial on Protein Evolution and Protein Sequence Comparison (PDF file)

Selected Publications

Furnham N, Holliday GL, de Beer TA, Jacobsen JO, Pearson WR, Thornton JM. (2013) "The Catalytic Site Atlas 2.0: cataloging catalytic sites and residues identified in enzymes." Nucleic Acids Res. 2013 Dec 6. [Epub ahead of print] [Entrez] [PDF]

Mills, L. J. and Pearson, W. R. (2013) "Adjusting Scoring Matrices to Correct Overextended Alignments" Bioinformatics 29:3007-3013 doi: 10.1093/bioinformatics/btt517. [Entrez] [PDF]

Pearson, W. R. (2013) "Selecting the Right Similarity-Scoring Matrix" Curr. Protoc. Bioinformatics Chapter 3: Unit 3.5 doi: 10.1002/0471250953.bi0305s43.

Pearson, W. R. (2013) "An Introduction to Similarity ('Homology') Searching" Curr. Protoc. Bioinformatics Chapter 3: Unit 3.1 doi: 10.1002/0471250953.bi0301s42. [Entrez]

Li W, McWilliam H, Goujon M, Cowley A, Lopez R, Pearson WR. (2012) "PSI-Search: iterative HOE-reduced profile SSEARCH searching." Bioinformatics. [Entrez] [PDF]

Holliday GL, Andreini C, Fischer JD, Rahman SA, Almonacid DE, Williams ST, Pearson WR. (2012) "MACiE: exploring the diversity of biochemical reactions." Nucleic Acids Res. 2012 Jan;40(Database issue):D783-9. [Entrez] [PDF]

M. W. Gonzalez and W. R. Pearson (2010) "RefProtDom: A protein database with improved domain boundaries and homology relationships" Bioinformatics 26:2361-2362 [Entrez] [PDF]

M. L. Sierk, M. E. Smoot, E. J. Bass, and W. R. Pearson (2010) "Improving pairwise sequence alignment accuracy using near-optimal alignments" BMC Bioinformatics 11:146 doi:10.1186/1471-2105-11-146 [Entrez] [PDF]

M. W. Gonzalez and W. R. Pearson (2010) "Homologous over-extension: a challenge for iterative similarity searches" Nucleic Acids Res. 38:2177-2189 [Entrez] [PDF]

D. T. Lavelle and W. R. Pearson (2010) "Globally, unrelated protein sequences appear random" Bioinformatics 26:310-318 [Entrez] [PDF]

B. L. Cantarel, H. G. Morrison, and W. R. Pearson (2006) "Exploring the relationship between sequence similarity and accurate phylogenetic trees." Mol Biol Evol. 23(11):2090-100. Epub 2006 Aug 4. [Entrez] [PDF]

W. R. Pearson and M. L. Sierk (2005) "The limits of protein sequence comparison?" Curr Opin Struct Biol. 15:254-260. [Entrez] [PDF]

M. L. Sierk and W. R. Pearson (2004) "Sensitivity and selectivity in protein structure comparison." Protein Sci. 13:773-85. [Entrez] [PDF]

M. E. Smoot, S. A. Guerlain, and W. R. Pearson (2004) "Visualization of near-optimal alignments" Bioinformatics 20:953-958 [Entrez] [PDF]

Reese JT, Pearson WR. (2002) "Empirical determination of effective gap penalties for sequence comparison." Bioinformatics 18:1500-1507. [Entrez] [PDF]

A. J. Mackey, T. A. J. Haystead, and W. R. Pearson (2002) "Getting More from Less: Algorithms for Rapid Protein Identification with Multiple Short Peptide Sequences" Mol. Cell. Proteomics 1:139-147 [Entrez]

W. R. Pearson and T. C. Wood (2001) "Statistical significance in biological sequence comparison" in Handbook of Statistical Genetics, D. J. Balding, M. Bishop, and C. Cannings eds. London: Wiley, pp. 39-65

T. C. Wood and W. R. Pearson (1999) "Evolution of Protein Sequences and Structures" J. Mol. Biol. 291:977-995 [Entrez], also available from http://www.idealibrary.com

W. R. Pearson, G. Robins, and T. Zhang (1999) "Generalized neighbor-joining: more reliable phylogenetic tree reconstruction" Mol. Biol. Evol. 16:806-816 [Entrez]

J. D. Retief, K. R. Lynch, and W. R. Pearson (1999) "Panning for genes - a visual strategy for identifying novel gene orthologs and paralogs" Genome Res. 9:373-382 [Entrez]

Pearson, W. R. (2000) "Flexible sequence similarity searching with the FASTA3 program package" Methods Mol. Biol. 132:185-219 [PDF]

Pearson, W. R. (1998) "Empirical statistical estimates for sequence similarity scores" J. Mol. Biol. 276:71-84 [Entrez]

Ivarsson Y, Mackey AJ, Edalat M, Pearson WR, Mannervik B. (2003) "Identification of residues in glutathione transferase capable of driving functional diversification in evolution. A novel approach to protein redesign." J. Biol. Chem. 278:8733-8738. [Entrez]

Patskovsky YV, Huang MQ, Takayama T, Listowsky I, Pearson WR (1999) "Distinctive structure of the human GSTM3 gene-inverted orientation relative to the mu class glutathione transferase gene cluster." Arch. Biochem. Biophys. 361:85-93 [Entrez]

Xu, S.-j., Wang, Y.-p., Roe, B., Pearson, W. R. (1998) "Characterization of the Human Class Mu Glutathione S-Transferase Gene Cluster and the GSTM1 Deletion." J. Biol. Chem. 273:3517-3527. [Entrez]

Pearson, W. R., Vorachek, W. R., Xu, S., Berger, R., Hart, I., Vannais, D., and Patterson, D. (1993) "Identification of class-mu glutathione transferase genes GSTM1-GSTM5 on human chromosome 1p13." Am. J. Hum. Genet. 53:220-233. [Entrez]

Daly, A. K., Thomas, D. J., Cooper, J., Pearson, W. R., Neal, D. E., and Idle, J. R. (1993) "Homozygous deletion of the glutathione S-transferase M1 (GSTM1) gene is a risk factor in bladder cancer." Brit. Med. J. 307:481-482. [Entrez]

Reprints from the Pearson Lab