XMAn: A Homo sapiens Mutated-Peptide Database for the MS Analysis of Cancerous Cell States
Yang, X.; Lazar, I.M., J. Proteome Res., Epub. Sep 11, 2014. DOI: 10.1021/pr5004467.
To enable the identification of mutated peptide sequences in complex biological samples, in this work, two novel cancer- and disease-related protein databases were developed, with mutation information collected from several public resources such as COSMIC, IARC P53, OMIM, and UniProtKB. In-house Perl scripts were used to search and process the data, and to translate each gene-level mutation into a mutated peptide sequence. The cancer and disease mutation databases comprise a total of 872,125 and 27,148 peptide entries from 25,642 and 2,913 proteins, respectively. A description line for each entry provides the parent protein ID and name, the cDNA- and protein-level mutation site and type, the originating database, and the disease or cancer tissue type with the corresponding number of hits. Both databases are FASTA formatted so that they can be read by commonly used tandem MS search engines. While the largest numbers of mutations were found at the amino acids A/D/E/G/L/P/R/S, the global mutation profiles closely replicate the outcome of the 1000 Genomes Project, which aimed to catalogue natural mutations in the human population. The affected proteins were primarily involved in transcription regulation, splicing, protein synthesis/folding/binding, redox/energy production, and adhesion/motility, and to some extent in DNA damage repair and signaling. The ability of the database to reveal the presence of mutated peptides was investigated with MCF-7 breast cancer cell extracts.
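As a rough illustration of the translation step, the Python sketch below builds a search-ready peptide for a single-residue (missense) substitution. It is not the authors' Perl code. It assumes that trypsin cuts after every K or R (the proline rule is ignored), that positions are 1-based as in p.L1468F, and that each peptide keeps two missed K/R cleavages on each side of the mutated residue, as described for the entries further down this page. The function name `mutated_peptide` and the test protein are hypothetical.

```python
def mutated_peptide(protein: str, position: int, ref: str, alt: str,
                    missed: int = 2) -> str:
    """Return a tryptic peptide carrying a single-residue substitution.

    Hypothetical sketch, not the XMAn Perl scripts: trypsin is assumed to
    cut after every K or R (proline rule ignored), `position` is 1-based,
    and the peptide keeps `missed` uncleaved K/R sites on each side of the
    mutated residue unless a protein terminus is reached first.
    """
    i = position - 1
    if not 0 <= i < len(protein) or protein[i] != ref:
        raise ValueError(f"reference residue {ref} not found at {position}")
    seq = protein[:i] + alt + protein[i + 1:]

    # Walk left: stop just after the (missed + 1)-th K/R upstream of the site.
    start, seen = i, 0
    while start > 0:
        if seq[start - 1] in "KR":
            if seen == missed:
                break
            seen += 1
        start -= 1

    # Walk right: stop at the (missed + 1)-th K/R at or after the site.
    end, seen = i, 0
    while end < len(seq) - 1:
        if seq[end] in "KR":
            if seen == missed:
                break
            seen += 1
        end += 1
    return seq[start:end + 1]
```

For a made-up protein `"GGKAKLYFSRLMGPETTAKTIVLVKNVLSRGGG"`, `mutated_peptide(protein, 11, "L", "F")` returns `"AKLYFSRFMGPETTAKTIVLVKNVLSR"`, which has the same layout as the example entry shown in the instructions (only the flanking G/K residues are invented). Frameshifts, insertions, and deletions would need separate handling.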
Link to publication: http://pubs.acs.org/doi/abs/10.1021/pr5004467
Direct access to the XMAn cancer mutation database: link
Instructions: Open the database in Notepad and confirm that the download is complete by checking that the last entry matches the one shown below (DO NOT open the file in Excel; because of its size, Excel will truncate it):
>CANCER_sp_P35573,GDE_HUMAN Glycogen debranching enzyme|c.4404G>T|p.L1468F|FSRFMGP|Missense|COSMIC|Lung(1)
AKLYFSRFMGPETTAKTIVLVKNVLSR
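The completeness check can also be scripted rather than done by eye. The Python sketch below streams the FASTA file, keeps only the last record, and compares it with the entry shown above; it also splits a description line on `|` into the fields listed in the abstract. The field names, the function names, and the path used in the usage note are hypothetical, and the expected last entry only holds for the release described on this page.

```python
# Expected final record of the database release described on this page.
EXPECTED_LAST = (
    "CANCER_sp_P35573,GDE_HUMAN Glycogen debranching enzyme"
    "|c.4404G>T|p.L1468F|FSRFMGP|Missense|COSMIC|Lung(1)",
    "AKLYFSRFMGPETTAKTIVLVKNVLSR",
)

# Hypothetical names for the '|'-separated fields of a description line.
FIELDS = ("protein", "cdna_change", "protein_change", "window",
          "mutation_type", "source_db", "tissue_hits")


def last_record(lines):
    """Return (header, sequence) of the last FASTA record in an iterable of lines."""
    header, seq_parts = None, []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            header, seq_parts = line[1:], []  # a new record starts
        elif line:
            seq_parts.append(line.strip())    # sequence may wrap over lines
    if header is None:
        raise ValueError("no FASTA records found")
    return header, "".join(seq_parts)


def parse_header(header):
    """Split a description line into FIELDS; the last field keeps any extra '|'."""
    parts = header.split("|", len(FIELDS) - 1)
    if len(parts) != len(FIELDS):
        raise ValueError(f"unexpected header layout: {header!r}")
    return dict(zip(FIELDS, parts))


def download_looks_complete(path):
    """Stream the (large) file and compare its last record with EXPECTED_LAST."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return last_record(handle) == EXPECTED_LAST
```

For example, `download_looks_complete("xman_cancer.fasta")` (hypothetical path) returns `False` for a file cut off partway through the final sequence, and `parse_header(EXPECTED_LAST[0])["protein_change"]` gives `"p.L1468F"`. Reading line by line keeps memory use small, which matters for a file of this size.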
Because each entry includes two missed K/R cleavage sites on each side of the mutated amino acid, the peptide matches returned by a search may include many entries that do not span the mutation site. To eliminate such matches, a Perl script can be written to retain only the mutated peptide sequences. Alternatively, mutated peptide hits listed in an Excel spreadsheet can be selected with simple functions that locate a substring within the text of a cell (e.g., FIND or SEARCH).
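A minimal Python sketch of such a filter is shown here; it is not the Perl script mentioned above. It assumes that the short sequence in the fourth header field (FSRFMGP in the example entry) is centred on the mutated residue, which may not hold when that window is truncated at a protein terminus or occurs more than once in the entry. The function names are hypothetical. A peptide reported by the search engine is kept only if one of its occurrences in the entry sequence covers the mutated residue.

```python
def mutation_index(entry_seq: str, window: str) -> int:
    """0-based index of the mutated residue in an XMAn entry sequence.

    Assumes (hypothetically) that the header window is centred on the
    mutated residue, as FSRFMGP is for p.L1468F.
    """
    start = entry_seq.find(window)
    if start < 0:
        raise ValueError("window not found in entry sequence")
    return start + len(window) // 2


def covers_mutation(peptide: str, entry_seq: str, window: str) -> bool:
    """True if any occurrence of `peptide` in the entry spans the mutated residue."""
    site = mutation_index(entry_seq, window)
    pos = entry_seq.find(peptide)
    while pos >= 0:
        if pos <= site < pos + len(peptide):
            return True
        pos = entry_seq.find(peptide, pos + 1)  # check later occurrences too
    return False
```

With the example entry `AKLYFSRFMGPETTAKTIVLVKNVLSR` and window `FSRFMGP`, the peptide `FMGPETTAK` is kept, while `AKLYFSR` and `TIVLVK` are discarded because neither contains the substituted F.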
Please acknowledge the use of the XMAn database in your publications by citing the following reference:
Xu Yang and Iulia M. Lazar, “XMAn: A Homo sapiens Mutated-Peptide Database for the MS Analysis of Cancerous Cell States,” J. Proteome Res. 2014, 13(12), 5486-5495.