PROTEOMICS I

BIOCOMPUTING METHODS FOR PROTEIN ANALYSIS

PROTEIN INFORMATION RESOURCES

1. GENERAL

PIR - Protein Information Resource

UniProt - United Protein Databases

MIPS - Munich Information Center for Protein Sequences

PDB - Protein Data Bank

Human Proteome Organization

Proteome - Databases of C. elegans & S. cerevisiae proteins; cross-correlations

ModBase - a database of three-dimensional protein models calculated by comparative modeling.

2. PROTEINS IN SIGNALLING PATHWAYS AND REACTIONS

Reactome - a database of individual biochemical reactions from humans and non-human systems such as rat, mouse, pufferfish, and zebrafish, obtained either via a literature citation or an electronic inference based on sequence similarity.

BioCarta - interactive graphic models of molecular and cellular pathways.

SPAD - Signalling Pathway Database

CSNDB - Cell Signalling Networks Database

PREDICTION OF PROTEIN FUNCTION ACCORDING TO GENE ONTOLOGY CATEGORIES

ProtFun at the Technical University of Denmark (DTU)

PROBLEM: Find the possible functions of the human proteins with Ensembl accession numbers ENSP00000257015 and ENSP00000252184, and compare the results with BLAST searches.

PROTEIN FUNCTION FROM GENOME SEQUENCES - NEW METHODS

1. Domain Fusion & Fission - Rosetta Stone Method

Based on the observation that some pairs of interacting proteins have homologs in another organism that are fused into a single protein chain (the "Rosetta Stone" protein). Many members of these pairs are confirmed as functionally related. Some proteins have links to several other proteins; these coupled links appear to represent functional interactions such as complexes or pathways.
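The fusion idea above can be sketched in a few lines of Python. This is only an illustration, not the published algorithm: it assumes homology hits have already been computed elsewhere (for example with BLAST) and are supplied as tuples giving the region of the candidate fused protein that each query aligns to. The function name, the `max_overlap` parameter and the toy protein names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def rosetta_stone_links(hits, max_overlap=0):
    """Infer candidate functional links from domain-fusion evidence.

    hits: iterable of (query, target, start, end) tuples, where start/end
    give the aligned region on the target protein (1-based, inclusive).
    Two different queries are linked when both align to the same target
    in regions that overlap by at most max_overlap residues, i.e. the
    target looks like a fusion of the two queries.
    Returns a set of (query_a, query_b, target) with query_a < query_b.
    """
    by_target = defaultdict(list)
    for query, target, start, end in hits:
        by_target[target].append((query, start, end))

    links = set()
    for target, regions in by_target.items():
        for (q1, s1, e1), (q2, s2, e2) in combinations(regions, 2):
            if q1 == q2:
                continue
            # Length of the shared region; zero or negative means disjoint.
            overlap = min(e1, e2) - max(s1, s2) + 1
            if overlap <= max_overlap:
                a, b = sorted((q1, q2))
                links.add((a, b, target))
    return links


if __name__ == "__main__":
    # Toy, made-up alignment coordinates for illustration only.
    toy_hits = [
        ("A", "AB_fused", 1, 200),
        ("B", "AB_fused", 220, 450),
        ("C", "AB_fused", 10, 180),
    ]
    for link in sorted(rosetta_stone_links(toy_hits)):
        print(link)
```

Requiring the two aligned regions to be non-overlapping is what separates a fusion of two different proteins from two paralogs that simply match the same domain.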

Marcotte EM, Pellegrini M, Thompson MJ, Yeates TO, Eisenberg D (1999) A combined algorithm for genome-wide prediction of protein function. Nature 1999 Nov 4;402(6757):83-6

Yanai et al. (2001) Genes linked by fusion events are generally of the same functional category .... PNAS 98: 7940 - 7945.

Internet resources:

FusionDB: database of bacterial and archaeal gene fusion events - also known as Rosetta stones

Proteinpathways - examples & references

E. coli proteins linked by Rosetta Stone proteins

From the abstract of Marcotte et al. (1999), Nature 402(6757):83-6: The availability of over 20 fully sequenced genomes has driven the development of new methods to find protein function and interactions. Here we group proteins by correlated evolution, correlated messenger RNA expression patterns and patterns of domain fusion to determine functional relationships among the 6,217 proteins of the yeast Saccharomyces cerevisiae. Using these methods, we discover over 93,000 pairwise links between functionally related yeast proteins. Links between characterized and uncharacterized proteins allow a general function to be assigned to more than half of the 2,557 previously uncharacterized yeast proteins. Examples of functional links are given for a protein family of previously unknown function, a protein whose human homologues are implicated in colon cancer and the yeast prion Sup35.

The Database of Interacting Proteins (DIP) is a database that documents experimentally determined protein-protein interactions. This database is intended to provide the scientific community with a comprehensive and integrated tool for browsing and efficiently extracting information about protein interactions and interaction networks in biological processes. Beyond cataloging details of protein-protein interactions, the DIP is useful for understanding protein function and protein-protein relationships, studying the properties of networks of interacting proteins, benchmarking predictions of protein-protein interactions, and studying the evolution of protein-protein interactions.

2. Phylogenetic Profiling

Based on the assumption that proteins that evolve together in a correlated fashion are likely to function together in a pathway or structural complex. This permits us to assign functions to uncharacterized proteins that have phylogenetic profiles similar to proteins of known function.

Pellegrini M et al. PNAS Vol. 96, Issue 8, 4285-4288, April 13, 1999
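A phylogenetic profile can be represented as a bit vector with one position per genome (1 if a homolog is present, 0 if absent). The sketch below groups proteins with identical profiles; it assumes the presence/absence calls have already been made by some homology search, and the function names and toy data are hypothetical.

```python
from collections import defaultdict


def build_profile(found_in, genomes):
    """Return a tuple of 0/1 flags, one per genome, in the given order."""
    return tuple(int(g in found_in) for g in genomes)


def hamming(profile_a, profile_b):
    """Number of genomes in which two profiles disagree."""
    return sum(a != b for a, b in zip(profile_a, profile_b))


def group_by_profile(presence, genomes):
    """Group proteins that share exactly the same phylogenetic profile.

    presence: dict mapping protein name -> set of genomes with a homolog.
    Returns a dict profile -> sorted list of proteins, keeping only
    profiles shared by at least two proteins.
    """
    groups = defaultdict(list)
    for protein, found_in in presence.items():
        groups[build_profile(found_in, genomes)].append(protein)
    return {p: sorted(m) for p, m in groups.items() if len(m) > 1}


if __name__ == "__main__":
    genomes = ["g1", "g2", "g3", "g4"]  # made-up genome labels
    presence = {
        "P1": {"g1", "g3"},
        "P2": {"g1", "g3"},
        "P3": {"g2"},
    }
    print(group_by_profile(presence, genomes))
```

An uncharacterized protein that falls into the same group as proteins of known function becomes a candidate for that function. Relaxing the exact match to a small Hamming distance is one possible way to tolerate missed homologs.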

3. Chromosomal proximity

Proteins whose genes are in physical proximity on the chromosome tend to function in the same pathway or interact in complexes. An uncharacterized protein whose gene is physically close to another of known function may be assigned a function on this basis.
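As a rough illustration of the proximity criterion, the sketch below reports adjacent genes on the same chromosome whose intergenic gap is below a threshold. The gene tuple layout, the function name and the 300 bp default are assumptions chosen for the example. In practice the signal is usually trusted only when the proximity is conserved across several genomes; that check is omitted here.

```python
def neighbouring_pairs(genes, max_gap=300):
    """Return pairs of adjacent genes separated by at most max_gap bases.

    genes: list of (name, chromosome, start, end) with start <= end.
    Only consecutive genes on the same chromosome are compared.
    """
    ordered = sorted(genes, key=lambda g: (g[1], g[2]))
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        if left[1] != right[1]:
            continue  # different chromosomes or replicons
        gap = right[2] - left[3] - 1  # bases between the two genes
        if gap <= max_gap:
            pairs.append((left[0], right[0]))
    return pairs


if __name__ == "__main__":
    # Made-up coordinates for illustration only.
    toy_genes = [
        ("geneA", "chr", 1, 1000),
        ("geneB", "chr", 1101, 2000),
        ("geneC", "chr", 5000, 6000),
    ]
    print(neighbouring_pairs(toy_genes))
```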

4. Correlated Messenger RNA Expression Patterns

Genes coexpressed in similar ways in different tissues may also interact or function in the same metabolic or signalling pathway. This allows functions to be assigned to uncharacterized proteins.
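One simple way to quantify "coexpressed in similar ways" is the Pearson correlation between expression profiles measured over the same set of tissues or conditions. The sketch below is a minimal pure-Python version; the 0.9 threshold, the function names and the toy values are assumptions for illustration, and other similarity measures could equally be used.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / (sd_x * sd_y)


def coexpressed_pairs(expression, threshold=0.9):
    """Return (gene1, gene2, r) for gene pairs with r >= threshold.

    expression: dict mapping gene name -> list of expression values,
    one per tissue/condition, in the same order for every gene.
    """
    pairs = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if r >= threshold:
            pairs.append((g1, g2, round(r, 3)))
    return pairs


if __name__ == "__main__":
    # Made-up expression values for illustration only.
    toy = {"geneA": [1, 2, 3, 4], "geneB": [2, 4, 6, 8], "geneC": [4, 3, 2, 1]}
    print(coexpressed_pairs(toy))
```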

5. Combined methods:

Predictome

A database of predicted links between the proteins of 44 genomes based on the implementation of three computational methods—chromosomal proximity, phylogenetic profiling and domain fusion—and large-scale experimental screenings of protein–protein interaction data. The combination of data from various predictive methods in one database allows for their comparison with each other, as well as visualization of their correlation with known pathway information.

STRING: Search Tool for the Retrieval of Interacting Genes/Proteins

STRING is a database of known and predicted protein-protein interactions.
The interactions include direct (physical) and indirect (functional) associations; they are derived from four sources:

Genomic Context

High-throughput Experiments

(Conserved) Coexpression

Previous Knowledge

Prolinks: a database of protein functional linkages derived from coevolution: http://dip.doe-mbi.ucla.edu/pronav

MULTIFUNCTIONAL (MOONLIGHTING) PROTEINS

An increasing number of proteins have been identified as having multiple, distinct functions.

Reviews: Jeffery CJ (2003) Trends Genet and Ann Med. 2003;35(1):28-35

Copley SD (2003) Curr. Opin. Chem. Biol. Enzymes with extra talents: moonlighting functions and catalytic promiscuity.

Recent studies of moonlighting functions and catalytic promiscuity provide insights into the structural and mechanistic bases of these phenomena. Moonlighting proteins that are highlighted include gephyrin, the Neurospora crassa tyrosyl tRNA synthetase, phosphoglucose isomerase, and cytochrome c. New insights into catalytic promiscuity are provided by studies of aminoglycoside kinase (3') type IIIa, tetrachlorohydroquinone dehalogenase, and aldolase antibody 38C2.

The problem of identifying multifunctional proteins is discussed in Gomez A, Bioinformatics. 2003 May 1;19(7):895-6: Do current sequence analysis algorithms disclose multifunctional (moonlighting) proteins?

Table at http://ibb.uab.es/moonlighting

A number of multifunctional, 'moonlighting', proteins were analyzed by different current programs to test whether they identify both functions. PSI-BLAST and PRODOM performed best in predicting the alternative function.

PRIMARY STRUCTURE ANALYSIS

Proteomics Tools at ExPASy, for prediction of:

Protein sorting signals, signal peptide cleavage sites
O-glycosylation sites in mammalian proteins
GPI-anchor and cleavage sites
Ser, Thr and Tyr phosphorylation sites in eukaryotic proteins (also at the Technical University of Denmark)
Coiled coil regions in proteins, two- and three-stranded coiled coils
MHC type I (HLA) peptide binding
MHC type I and II peptide binding
Identification of PEST regions (short-lived proteins)

MISCELLANEOUS TOOLS

      ProtScale - Amino acid scale representation (Hydrophobicity, other conformational parameters, etc.)
      Drawhca - Draw an HCA (Hydrophobic Cluster Analysis) plot of a protein sequence
      Protein Colourer - Tool for coloring your amino acid sequence
      Colorseq - Tool to highlight (in red) a selected set of residues in a protein sequence
      HelixWheel - Representation of a protein fragment as a helical wheel
      RandSeq - Random protein sequence generator
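The kind of amino acid scale plot listed above (ProtScale) can be approximated with a sliding-window average. The sketch below uses the Kyte-Doolittle hydropathy scale; the function name and the window of 9 are assumptions for the example, and it is not a reimplementation of any of the listed servers.

```python
# Kyte & Doolittle (1982) hydropathy values for the 20 standard residues.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Return the mean hydropathy of every full window along the sequence.

    The i-th value covers residues i .. i+window-1 (0-based).
    Raises KeyError for non-standard residue letters.
    """
    seq = sequence.upper()
    if window < 1 or window > len(seq):
        return []
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


if __name__ == "__main__":
    # First residues of the RB1 sequence used in the example on this page.
    for position, value in enumerate(hydropathy_profile("MPPKTPRKTAATAAAAAAEPP"), 1):
        print(position, round(value, 2))
```

Larger windows smooth the profile and are commonly used when looking for membrane-spanning stretches; smaller windows emphasize local surface features.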

EXAMPLE:

Predict the phosphorylation sites for the human retinoblastoma protein (RB1):

>gi|4506435|ref|NP_000312.1| retinoblastoma 1 [Homo sapiens]
MPPKTPRKTAATAAAAAAEPPAPPPPPPPEEDPEQDSGPEDLPLVRLEFEETEEPDFTALCQKLKIPDHV
RERAWLTWEKVSSVDGVLGGYIQKKKELWGICIFIARVDLDEMSFTLLSYRKTYEISVHKFFNLLKEIDT
STKVDNAMSRLLKKYDVLFALFSKLERTCELIYLTQPSSSISTEINSALVLKVSWITFLLAKGEVLQMED
DLVISFQLMLCVLDYFIKLSPPMLLKEPYKTAVIPINGSPRTPRRGQNRSARIAKQLENDTRIIEVLCKE
HECNIDEVKNVYFKNFIPFMNSLGLVTSNGLPEVENLSKRYEEIYLKNKDLDRRLFLDHDKTLQTDSIDS
FETQRTPRKSNLDEEVNIIPPHTPVRTVMNTIQQLMMILNSASDQPSENLISYFNNCTVNPKESILKRVK
DIGYIFKEKFAKAVGQGCVEIGSQRYKLGVRLYYRVMESMLKSEEERLSIQNFSKLLNDNIFHMSLLACA
LEVVMATYSRSTSQNLDSGTDLSFPWILNVLNLKAFDFYKVIESFIKAEGNLTREMIKHLERCEHRIMES
LAWLSDSPLFDLIKQSKDREGPTDHLESACPLNLPLQNNHTAADMYLSPVRSPKKKGSTTRVNSTANAET
QATSAFQTQKPLKSTSLSLFYKKVYRLAYLRLNTLCERLLSEHPELEHIIWTLFQHTLQNEYELMRDRHL
DQIMMCSMYGICKVKNIDLKFKIIVTAYKDLPHAVQETFKRVLIKEEEYDSIIVFYNSVFMQRLKTNILQ
YASTRPPTLSPIPHIPRSPYKFPSSPLRIPGGNIYISPLKSPYKISEGLPTPTKMTPRSRILVSIGESFG
TSEKFQKINQMVCNSDRVLKRSAEGSNPPKPLKKLRFDIEGSDEADGSKHLPGESKFQQKLAEMTSTRTR
MQKQKMNDSMDTSNKEEK
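The prediction servers listed under PRIMARY STRUCTURE ANALYSIS use trained models. As a much cruder first look, one can scan the sequence for a kinase consensus motif. The sketch below searches for the proline-directed pattern [S/T]-P-x-[K/R], the commonly cited cyclin-dependent kinase consensus; this is a plain motif search under that assumption, not the method used by any of the servers, and the function name is hypothetical.

```python
import re

# [ST] followed by P, any residue, then K or R. The lookahead lets
# overlapping candidate sites be reported independently.
CDK_CONSENSUS = r"(?=([ST])P.[KR])"


def find_motif_sites(sequence, pattern=CDK_CONSENSUS):
    """Return (position, residue) for each motif match.

    Positions are 1-based and refer to the candidate phospho-acceptor
    (the S or T). Whitespace and line breaks in the input are ignored.
    """
    seq = "".join(sequence.split()).upper()
    return [(m.start() + 1, m.group(1)) for m in re.finditer(pattern, seq)]


if __name__ == "__main__":
    # First line of the RB1 sequence shown above.
    fragment = (
        "MPPKTPRKTAATAAAAAAEPPAPPPPPPPEEDPEQDSGPEDLPLVRLEFEETEEPDFTALCQKLKIPDHV"
    )
    for position, residue in find_motif_sites(fragment):
        print(residue, position)
```

Matches from such a scan are only candidates; comparing them with the output of a dedicated predictor is a useful exercise.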

SECONDARY STRUCTURE PREDICTION

Protein Secondary Structure, Lectures Stanford Computational Biology Course, November 22, 1999,
Sequence Blocks and Profiles 2002

Collection of analysis tools at BCM Search Launcher, ExPASy and PredictProtein server.

META PP - a general interface that enables users to simultaneously submit a sequence to a wide variety of prediction services. You submit your sequence once (paste into WWW form), and you get results from currently more than 10 different programs via email.

Other Protein secondary structure prediction servers

     PSIpred - Prediction of secondary structure from multiple sequences
     DSC: Discrimination of Protein Secondary Structure Class
     PHDsec: the PredictProtein server at EMBL
     PREDATOR: another EMBL server
     NNPREDICT server at UCSF
     NSSP server at Baylor College of Medicine
     Implementation of GOR method in Leeds
     GOR at the University of Southampton
     JPRED Secondary structure prediction server at EBI
     Pred2ary Secondary structure and class prediction server at UCSF

HELIX-TURN-HELIX MOTIF PREDICTION at EMBOSS

COILED-COIL prediction methods:

COILS at ExPASy

MATCHER at Polytechnic University

NPS@ at PBIL (Lyon)

EXAMPLE: predict the secondary structure of the S. typhimurium tryptophan synthase alpha subunit (acc. # P00929).

     MERYENLFAQ LNDRREGAFV PFVTLGDPGI EQSLKIIDTL IDAGADALEL GVPFSDPLAD
     GPTIQNANLR AFAAGVTPAQ CFEMLALIRE KHPTIPIGLL MYANLVFNNG IDAFYARCEQ
     VGVDSVLVAD VPVEESAPFR QAALRHNIAP IFICPPNADD DLLRQVASYG RGYTYLLSRS
     GVTGAENRGA LPLHHLIEKL KEYHAAPALQ GFGISSPEQV SAAVRAGAAG AISGSAIVKI
     IEKNLASPKQ MLAELRSFVS AMKAASRA
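Many prediction forms expect a plain or FASTA-formatted sequence, whereas the block above is written in groups of ten residues. The small helper below, with hypothetical function names, strips spaces, digits and header lines and re-wraps the sequence as FASTA.

```python
def clean_sequence(block):
    """Return the residues in a sequence block as one uppercase string.

    Lines starting with '>' (FASTA headers) are skipped; whitespace,
    digits and any other non-letter characters are dropped.
    """
    residues = []
    for line in block.splitlines():
        if line.lstrip().startswith(">"):
            continue
        residues.extend(ch for ch in line.upper() if ch.isalpha())
    return "".join(residues)


def to_fasta(name, sequence, width=60):
    """Format a sequence as FASTA text with lines of at most width residues."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + name + "\n" + "\n".join(lines)


if __name__ == "__main__":
    block = """
     MERYENLFAQ LNDRREGAFV PFVTLGDPGI EQSLKIIDTL IDAGADALEL GVPFSDPLAD
    """
    print(to_fasta("P00929_fragment", clean_sequence(block)))
```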

3-D PROTEIN ANALYSIS AND COMPARISON

3-D viewing and analysis of proteins can be carried out with several programs including:

Cn3D - available at NCBI Molecular Modeling Database MMDB

- allows linked sequence and structural displays of 3-D alignments of protein domains using VAST (Vector Alignment Search Tool)

Example: 1TFR - Bacteriophage T4 RNase H. Use VAST to compare with 1EXN.