BIOCOMPUTING METHODS FOR PROTEIN ANALYSIS
PROTEIN INFORMATION RESOURCES
1. GENERAL
PIR - Protein Information Resource
UniProt - United Protein Databases
MIPS - Munich Information Center for Protein Sequences
PDB - Protein Data Bank
Proteome - Databases of C. elegans & S. cerevisiae proteins; cross-correlations
ModBase - a database of three-dimensional protein models calculated by comparative modeling.
2. PROTEINS IN SIGNALLING PATHWAYS AND REACTIONS
SPAD - Signalling Pathway Database
CSNDB - Cell Signalling Networks Database
PREDICTION OF PROTEIN FUNCTION ACCORDING TO GENE ONTOLOGY CATEGORIES
ProtFun at Denmark Technical University
PROBLEM: Find the possible functions of the human proteins with Ensembl accession numbers ENSP00000257015 and ENSP00000252184, and compare the results with BLAST searches.
PROTEIN FUNCTION FROM GENOME SEQUENCES - NEW METHODS
1. Domain Fusion & Fission - Rosetta Stone Method
Based on the observation that some pairs of interacting proteins have a homolog in another organism fused into a single protein chain. Many members of these pairs are confirmed as functionally related. Some proteins have links to several other proteins; these coupled links appear to represent functional interactions such as complexes or pathways.
Marcotte EM, Pellegrini M, Thompson MJ, Yeates TO, Eisenberg D. A combined algorithm for genome-wide prediction of protein function. Nature 1999 Nov 4;402(6757):83-6.
Yanai et al. (2001) Genes linked by fusion events are generally of the same functional category .... PNAS 98: 7940 - 7945.
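The domain fusion idea described above can be illustrated with a minimal sketch. It assumes that similarity searches have already been run and reduced to a list of hits, each giving a query protein, the fused ("composite") protein it matches in another genome, and the matched region on that composite protein. The function name, the input layout and the coordinates in the usage note are hypothetical and chosen only for illustration; this is not the published implementation.

```python
from collections import defaultdict
from itertools import combinations


def rosetta_stone_links(hits, max_overlap=0):
    """Return candidate functional links between query proteins.

    hits: iterable of (query_id, composite_id, start, end), where start and
    end are 1-based inclusive coordinates on the composite protein.
    Two different query proteins are linked when they match the same
    composite protein in regions that overlap by at most max_overlap residues.
    """
    by_composite = defaultdict(list)
    for query_id, composite_id, start, end in hits:
        by_composite[composite_id].append((query_id, start, end))

    links = set()
    for composite_id, regions in by_composite.items():
        for (qa, sa, ea), (qb, sb, eb) in combinations(regions, 2):
            if qa == qb:
                continue  # two hits of the same protein are not a link
            # Length of the shared region (zero or negative means no overlap).
            overlap = min(ea, eb) - max(sa, sb) + 1
            if overlap <= max_overlap:
                first, second = sorted((qa, qb))
                links.add((first, second, composite_id))
    return links
```

For example, with made-up coordinates, `rosetta_stone_links([("gyrA", "topoII", 500, 1200), ("gyrB", "topoII", 1, 480)])` returns one link between `gyrA` and `gyrB` through `topoII`. A real analysis would also have to filter out promiscuous domains that occur in many unrelated proteins.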
Internet resources:
FusionDB: database of bacterial and archaeal gene fusion events - also known as Rosetta stones
Proteinpathways - examples & references
E. coli proteins linked by Rosetta Stone proteins
The availability of over 20 fully sequenced genomes has driven the development of new methods to find protein function and interactions. Here we group proteins by correlated evolution, correlated messenger RNA expression patterns and patterns of domain fusion to determine functional relationships among the 6,217 proteins of the yeast Saccharomyces cerevisiae. Using these methods, we discover over 93,000 pairwise links between functionally related yeast proteins. Links between characterized and uncharacterized proteins allow a general function to be assigned to more than half of the 2,557 previously uncharacterized yeast proteins. Examples of functional links are given for a protein family of previously unknown function, a protein whose human homologues are implicated in colon cancer and the yeast prion Sup35.
The Database of Interacting Proteins (DIP) is a database that documents experimentally determined protein-protein interactions. This database is intended to provide the scientific community with a comprehensive and integrated tool for browsing and efficiently extracting information about protein interactions and interaction networks in biological processes. Beyond cataloging details of protein-protein interactions, the DIP is useful for understanding protein function and protein-protein relationships, studying the properties of networks of interacting proteins, benchmarking predictions of protein-protein interactions, and studying the evolution of protein-protein interactions.
2. Phylogenetic Profiling
Based on the assumption that proteins that evolve together in a correlated fashion are likely to function together in a pathway or structural complex. This permits us to assign functions to uncharacterized proteins that have phylogenetic profiles similar to proteins of known function.
Pellegrini M et al. PNAS Vol. 96, Issue 8, 4285-4288, April 13, 1999
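A minimal sketch of the comparison step, assuming each protein has already been reduced to a presence/absence string over the same ordered set of genomes ("1" = homolog present, "0" = absent). The function names, the Hamming-distance measure and the distance cutoff are illustrative assumptions, not necessarily the measure used in the paper cited above.

```python
def profile_distance(profile_a, profile_b):
    """Number of genomes in which two presence/absence profiles differ."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same set of genomes")
    return sum(a != b for a, b in zip(profile_a, profile_b))


def similar_profiles(profiles, query, max_distance=1):
    """Names of proteins whose profile is within max_distance of the query.

    profiles: dict mapping protein name -> profile string such as "1101".
    """
    target = profiles[query]
    return sorted(
        name
        for name, profile in profiles.items()
        if name != query and profile_distance(target, profile) <= max_distance
    )
```

With `profiles = {"P1": "1101", "P2": "1101", "P3": "1001", "P4": "0010"}`, the call `similar_profiles(profiles, "P1")` returns `["P2", "P3"]`; if P2 and P3 had known functions, those would be the candidate annotations for P1.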
3. Chromosomal proximity
Proteins whose genes are in physical proximity on the chromosome often function in the same pathway or interact in complexes. An uncharacterized protein whose gene is physically close to another of known function may be assigned a function on this basis.
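The proximity criterion can be sketched as follows, assuming gene coordinates are available as simple tuples. The 300-base gap cutoff and the function name are hypothetical, and the sketch ignores strand and conservation of gene order across genomes, both of which a real method would normally take into account.

```python
def neighbouring_genes(genes, max_gap=300):
    """Return pairs of adjacent genes separated by at most max_gap bases.

    genes: list of (name, chromosome, start, end) with start <= end.
    Only genes on the same chromosome (or replicon) are compared.
    """
    ordered = sorted(genes, key=lambda g: (g[1], g[2]))
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        if left[1] != right[1]:
            continue  # different chromosomes are never neighbours
        gap = right[2] - left[3] - 1  # bases between the two genes
        if gap <= max_gap:
            pairs.append((left[0], right[0]))
    return pairs
```

Each returned pair is a candidate functional link; an uncharacterized gene paired with a characterized one inherits a tentative annotation that still needs independent support.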
4. Correlated Messenger RNA Expression Patterns
Genes coexpressed in similar ways in different tissues may also interact or function in the same metabolic or signalling pathway. This allows functions to be assigned to uncharacterized proteins.
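One common way to quantify "coexpressed in similar ways" is the Pearson correlation between expression vectors measured over the same tissues or conditions. The sketch below assumes this measure and a hypothetical threshold of 0.9; the methods referenced in this section may use other measures or clustering instead.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


def coexpressed_pairs(expression, threshold=0.9):
    """Gene pairs whose expression profiles correlate at or above threshold.

    expression: dict mapping gene name -> list of expression values,
    one value per tissue or condition, in the same order for every gene.
    """
    names = sorted(expression)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if pearson(expression[a], expression[b]) >= threshold
    ]
```

Anticorrelated genes (coefficient near -1) are not reported by this sketch, although they can also be biologically informative.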
5. Combined methods:
A database of predicted links between the proteins of 44 genomes based on the implementation of three computational methods—chromosomal proximity, phylogenetic profiling and domain fusion—and large-scale experimental screenings of protein–protein interaction data. The combination of data from various predictive methods in one database allows for their comparison with each other, as well as visualization of their correlation with known pathway information.
STRING: Search Tool for the Retrieval of Interacting Genes/Proteins
STRING is a database of known and predicted protein-protein interactions. The interactions include direct (physical) and indirect (functional) associations; they are derived from four sources.
MULTIFUNCTIONAL (MOONLIGHTING) PROTEINS
An increasing number of proteins have been identified as having multiple, distinct functions.
Reviews: Jeffery CJ (2003) Trends Genet and Ann Med. 2003;35(1):28-35
Copley SD (2003) Curr. Opin. Chem. Biol. Enzymes with extra talents: moonlighting functions and catalytic promiscuity.
Recent studies of moonlighting functions and catalytic promiscuity provide insights into the structural and mechanistic bases of these phenomena. Moonlighting proteins that are highlighted include gephyrin, the Neurospora crassa tyrosyl tRNA synthetase, phosphoglucose isomerase, and cytochrome c. New insights into catalytic promiscuity are provided by studies of aminoglycoside kinase (3') type IIIa, tetrachlorohydroquinone dehalogenase, and aldolase antibody 38C2.
The problem of identifying multifunctional proteins is discussed in Gomez A, Bioinformatics. 2003 May 1;19(7):895-6: Do current sequence analysis algorithms disclose multifunctional (moonlighting) proteins?
Table at http://ibb.uab.es/moonlighting
A number of multifunctional, 'moonlighting', proteins were analyzed by different current programs to test whether they identify both functions. PSI-BLAST and PRODOM performed best in predicting the alternative function.
PRIMARY STRUCTURE ANALYSIS
Proteomics Tools at ExPASy : prediction of:
Protein sorting signals, signal peptide cleavage sites,
O-glycosylation sites in mammalian proteins
GPI-anchor and cleavage sites
Ser, Thr and Tyr phosphorylation sites in eukaryotic proteins (also at: Denmark Technical University)
Coiled coil regions in proteins, two- and three-stranded coiled coils
MHC type I (HLA) peptide binding
MHC type I and II peptide binding
Identification of PEST regions (short-lived proteins)
MISCELLANEOUS TOOLS
ProtScale - Amino acid scale representation (Hydrophobicity, other conformational parameters, etc.)
Drawhca - Draw an HCA (Hydrophobic Cluster Analysis) plot of a protein sequence
Protein Colourer - Tool for coloring your amino acid sequence
Colorseq - Tool to highlight (in red) a selected set of residues in a protein sequence
HelixWheel - Representation of a protein fragment as a helical wheel
RandSeq - Random protein sequence generator
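As an illustration of what an amino acid scale representation computes, the sketch below averages the Kyte-Doolittle hydropathy values over a sliding window. The window size of 9 and the function name are assumptions made for this example; the ProtScale tool itself offers many scales and weighting options that are not reproduced here.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Return (centre_position, mean_hydropathy) for each window.

    Positions are 1-based. An odd window is recommended so that the
    reported position is the true centre of the window.
    """
    sequence = sequence.upper()
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence]  # KeyError on non-standard residues
    half = window // 2
    profile = []
    for i in range(len(sequence) - window + 1):
        mean = sum(scores[i:i + window]) / window
        profile.append((i + half + 1, round(mean, 3)))
    return profile
```

Stretches of consistently high positive values suggest hydrophobic segments, for example candidate membrane-spanning regions; the plot, not any single value, is what should be interpreted.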
EXAMPLE: Predict the phosphorylation sites for the human retinoblastoma protein (RB1):
>gi|4506435|ref|NP_000312.1| retinoblastoma 1 [Homo sapiens]
MPPKTPRKTAATAAAAAAEPPAPPPPPPPEEDPEQDSGPEDLPLVRLEFEETEEPDFTALCQKLKIPDHV
RERAWLTWEKVSSVDGVLGGYIQKKKELWGICIFIARVDLDEMSFTLLSYRKTYEISVHKFFNLLKEIDT
STKVDNAMSRLLKKYDVLFALFSKLERTCELIYLTQPSSSISTEINSALVLKVSWITFLLAKGEVLQMED
DLVISFQLMLCVLDYFIKLSPPMLLKEPYKTAVIPINGSPRTPRRGQNRSARIAKQLENDTRIIEVLCKE
HECNIDEVKNVYFKNFIPFMNSLGLVTSNGLPEVENLSKRYEEIYLKNKDLDRRLFLDHDKTLQTDSIDS
FETQRTPRKSNLDEEVNIIPPHTPVRTVMNTIQQLMMILNSASDQPSENLISYFNNCTVNPKESILKRVK
DIGYIFKEKFAKAVGQGCVEIGSQRYKLGVRLYYRVMESMLKSEEERLSIQNFSKLLNDNIFHMSLLACA
LEVVMATYSRSTSQNLDSGTDLSFPWILNVLNLKAFDFYKVIESFIKAEGNLTREMIKHLERCEHRIMES
LAWLSDSPLFDLIKQSKDREGPTDHLESACPLNLPLQNNHTAADMYLSPVRSPKKKGSTTRVNSTANAET
QATSAFQTQKPLKSTSLSLFYKKVYRLAYLRLNTLCERLLSEHPELEHIIWTLFQHTLQNEYELMRDRHL
DQIMMCSMYGICKVKNIDLKFKIIVTAYKDLPHAVQETFKRVLIKEEEYDSIIVFYNSVFMQRLKTNILQ
YASTRPPTLSPIPHIPRSPYKFPSSPLRIPGGNIYISPLKSPYKISEGLPTPTKMTPRSRILVSIGESFG
TSEKFQKINQMVCNSDRVLKRSAEGSNPPKPLKKLRFDIEGSDEADGSKHLPGESKFQQKLAEMTSTRTR
MQKQKMNDSMDTSNKEEK
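Before submitting the sequence to a prediction server, a quick scan can list the serine and threonine residues that are followed by proline, the minimal consensus for proline-directed kinases such as the cyclin-dependent kinases. This is only a motif search written for illustration, with hypothetical function names; it is not the method used by the prediction servers listed above and will report sites that are never phosphorylated.

```python
import re


def read_fasta_sequence(fasta_text):
    """Join the sequence lines of a single-record FASTA string."""
    lines = fasta_text.strip().splitlines()
    return "".join(
        line.strip() for line in lines if not line.startswith(">")
    ).upper()


def proline_directed_sites(sequence):
    """Return (position, residue) for every Ser/Thr followed by Pro.

    Positions are 1-based. The lookahead keeps the proline out of the
    match, so adjacent sites such as "SPTP" are all reported.
    """
    return [
        (match.start() + 1, sequence[match.start()])
        for match in re.finditer(r"[ST](?=P)", sequence)
    ]
```

Pasting the RB1 record above into `read_fasta_sequence` and passing the result to `proline_directed_sites` gives a candidate list that can be compared with the server predictions.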
SECONDARY STRUCTURE PREDICTION
Protein Secondary Structure, Lectures Stanford Computational Biology Course, November 22, 1999
Sequence Blocks and Profiles 2002
Collection of analysis tools at BCM Search Launcher, ExPASy and PredictProtein server.
META PP - a general interface that enables users to simultaneously submit a sequence to a wide variety of prediction services. You submit your sequence once (paste into WWW form), and you get results from currently more than 10 different programs via email.
Other Protein secondary structure prediction servers
PSIpred - Prediction of secondary structure from multiple sequences
DSC: Discrimination of Protein Secondary Structure Class
PHDsec: the PredictProtein server at EMBL
PREDATOR: another EMBL server
NNPREDICT server at UCSF
NSSP server at Baylor College of Medicine
Implementation of GOR method in Leeds
GOR at the University of Southampton
JPRED Secondary structure prediction server at EBI
Pred2ary Secondary structure and class prediction server at UCSF
HELIX-TURN-HELIX MOTIF PREDICTION at EMBOSS
COILED-COIL prediction methods:
COILS at ExPASy
MATCHER at Polytechnic University
NPS@ at PBIL (Lyon)
EXAMPLE: Predict the secondary structure of the S. typhimurium tryptophan synthetase alpha subunit (acc. # P00929).
MERYENLFAQ LNDRREGAFV PFVTLGDPGI EQSLKIIDTL IDAGADALEL GVPFSDPLAD
GPTIQNANLR AFAAGVTPAQ CFEMLALIRE KHPTIPIGLL MYANLVFNNG IDAFYARCEQ
VGVDSVLVAD VPVEESAPFR QAALRHNIAP IFICPPNADD DLLRQVASYG RGYTYLLSRS
GVTGAENRGA LPLHHLIEKL KEYHAAPALQ GFGISSPEQV SAAVRAGAAG AISGSAIVKI
IEKNLASPKQ MLAELRSFVS AMKAASRA
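The sequence above is laid out in blocks of ten residues, a format that some prediction servers reject. A small helper, sketched here with hypothetical function names, strips the spacing and rewrites the sequence in FASTA format before submission.

```python
def clean_sequence(raw_text):
    """Remove spaces, line breaks and position numbers from a sequence."""
    return "".join(ch for ch in raw_text.upper() if ch.isalpha())


def to_fasta(header, sequence, width=60):
    """Format a sequence as a FASTA record with lines of at most width residues."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"
```

For example, `to_fasta("P00929", clean_sequence(blocks))`, where `blocks` holds the text above, yields a record that can be pasted into most of the servers listed in this section.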
3-D PROTEIN ANALYSIS AND COMPARISON
3-D viewing and analysis of proteins can be carried out with several programs including:
Cn3D - available at NCBI Molecular Modeling Database (MMDB) - allows linked sequence and structural displays of 3-D alignments of protein domains using VAST (Vector Alignment Search Tool)
Example: 1TFR - Bacteriophage T4 RNAseH.
Use VAST to compare with 1EXN