Transcriptomics

The identification and annotation of many or all of the mRNAs transcribed from a specific complete genome.

Databases are available for research related to drug development, gene hunting, molecular evolution, and comparative genomics

Homo sapiens:

H-Inv DB: Pools current information at the Japan Biological Information Research Center in Tokyo.

Has data on more than 20,000 unique cDNA sequences, including everything known on function, structure, tissue expression patterns, disease relationships, and orthologs, with multiple alignments and phylogenetic trees in common experimental animals.


The German human cDNA Consortium

Mus musculus:

The NIA mouse cDNA Project at the National Institute on Aging

Human, mouse & rat:

Mammalian Gene Collection (MGC)

BLAST searches are available at the above sites.

Transcripts related to cancer at CGAP (Cancer Genome Anatomy Project):

NCI/Affymetrix Human Transcriptome Project

Affymetrix, in collaboration with the National Cancer Institute, initiated the Human Transcriptome Project (HTP) in 2001
to generate the complete collection of transcribed elements of the human genome.

PROBLEM:

Find the relative degrees of expression of the AHR gene among normal tissues and in various neoplasias.



Genome-wide Analysis of Gene Transcription

Analysis of transcription on a genomic scale is now possible using several methodologies, including:
 


1. MICROARRAY TECHNOLOGIES
 

Lecture in NHGRI course, 2006

The Chipping Forecast

Supplements to Nature Genetics, Volume 21 (January 1999) and Volume 32 (December 2002).
The Chipping Forecast is a collection of reviews on different aspects of microarray analysis and is publicly available online.

The Brown Lab Home Page at Stanford - seminal microarray experiments, instructions for building your own microarrayer, etc.

Application of microarrays to analysis of gene expression in human mammary epithelial cells and cancers
Perou et al. PNAS 1999

Gene Expression Patterns in human cancer cell lines
Ross et al. Nature Genetics March 2000
 

APPLICATIONS OF MICROARRAYS
 

1. The identification of genetic signatures and hence cellular targets of drug, pathogen or toxicant exposure.

2. Hypothesis-generating attempts to assign probable functions to newly identified genes, for example by comparison with the expression patterns of known genes: gene products with similar expression profiles (e.g. with respect to time) may interact or act in the same pathway.

3. Identification of therapeutic targets e.g. by comparing the actions of multiple agents on multiple cell lines under multiple conditions

4. The delineation of complex patterns of gene expression that provide a potentially 'pathognomonic' molecular phenotype,
    e.g. to distinguish subtypes of neoplasms.

5. Personalized medicine, in molecular diagnosis of disease and in predicting drug efficacy and toxicity in different individuals (pharmaco- or
  toxicogenomics)

6. Elucidation of global mechanisms by which cells or organisms respond to environmental changes (e.g. temperature, nutrient concentration).
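
Application 2 above rests on comparing expression profiles between genes. A minimal sketch in Python of how such a comparison might look, assuming Pearson correlation as the similarity measure (the text does not name one). The gene names and values are invented for illustration only.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / sqrt(vx * vy)

def most_similar(query, profiles):
    """Rank known genes by correlation with the query profile, best first."""
    scored = [(name, pearson(query, p)) for name, p in profiles.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical log-ratio time courses (illustrative values, not real data).
known = {
    "geneA": [0.1, 0.8, 1.5, 2.1],
    "geneB": [2.0, 1.2, 0.5, 0.1],
    "geneC": [0.0, 0.1, 0.0, 0.1],
}
new_gene = [0.2, 0.9, 1.4, 2.3]
ranking = most_similar(new_gene, known)
```

Here `ranking` lists the known genes from most to least similar to the uncharacterized gene. A high correlation is only a hypothesis about shared function or pathway membership, not evidence of interaction.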
 
  
MICROARRAY DATA ANALYSIS & VISUALIZATION METHODS

The large data sets generated from microarray experiments necessitate methods for visualization and for relating the data to Gene Ontology and cellular pathways (metabolic, signal transduction, etc.).

Tools are also available to convert between disparate gene ids, and design optimal microarrays for a given application.


DATA MINING WITH GENE ONTOLOGY OR LITERATURE ASSOCIATIONS

Tools at Gene Ontology Home Page.

FatiGO - data mining with Gene Ontology, compares response sets. 

PubGene - analyzes expression data in the context of literature or sequence associations.


TOOLS FOR MAPPING GENE LISTS TO CELLULAR PATHWAYS

GenMAPP - visualization of gene expression data on maps representing biological pathways and groupings of genes.

MappFinder - automatically analyzes gene expression/proteomic profiling data in the context of the Gene Ontology hierarchy and GenMAPP biological pathways.

Pathway Miner - Extracts gene association networks from molecular pathways for predicting the biological significance of gene expression microarray data.

Maps to KEGG, BioCarta and GenMAPP pathways and data.


INTEGRATED TOOLS:

BioRag: an online resource for easy access to collective and integrated information from various public biological resources for human and mouse genes.

DAVID (Database for Annotation, Visualization and Integrated Discovery): Suite of microarray data annotation tools.

Includes EASE (the Expression Analysis Systematic Explorer).

The R project: a language and environment for statistical computing and graphics.

Bioconductor: an open source and open development software project for the analysis and comprehension of genomic data.

MedMiner, MatchMiner, GoMiner, CIMMiner

MedMiner extracts and organizes relevant sentences in the literature based on a gene, gene-gene or gene-drug query.

MatchMiner is a set of tools that enables the user to translate between disparate ids for the same gene. It uses data from the UCSC, LocusLink, UniGene, OMIM, Affymetrix and Jackson data sources to determine how different ids relate. Supported id types include gene symbols and names, IMAGE and FISH clones, GenBank accession numbers and UniGene cluster ids.

GoMiner is a tool for biological interpretation of 'omic' data – including data from gene expression microarrays.

CIMMiner generates color-coded Clustered Image Maps (CIMs) ("heat maps") to represent "high-dimensional" data sets such as gene expression profiles.
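
The id translation that MatchMiner performs can be pictured as a lookup over a cross-reference table. A minimal sketch, assuming a flat table of (gene symbol, accession, cluster id) rows; this is not MatchMiner's actual data model, and every identifier below is a made-up placeholder rather than a real database record.

```python
from collections import defaultdict

# Hypothetical cross-reference rows: (gene symbol, accession, cluster id).
# Placeholders only; not real GenBank or UniGene entries.
ROWS = [
    ("GENE1", "ACC0001", "Xx.101"),
    ("GENE1", "ACC0002", "Xx.101"),
    ("GENE2", "ACC0003", "Xx.202"),
]

def build_index(rows, from_col, to_col):
    """Index the table so ids in one column map to the set of ids in another."""
    index = defaultdict(set)
    for row in rows:
        index[row[from_col]].add(row[to_col])
    return index

def translate(ids, index):
    """Map each id to a sorted list of targets; unknown ids map to an empty list."""
    return {i: sorted(index.get(i, ())) for i in ids}

acc_to_symbol = build_index(ROWS, 1, 0)
symbol_to_acc = build_index(ROWS, 0, 1)
result = translate(["ACC0002", "ACC9999"], acc_to_symbol)
```

Returning an empty list for unmatched ids, rather than dropping them, keeps it visible which entries of a gene list could not be translated.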

Onto-Tools - set of four integrated databases:

Onto-Express: translates lists of differentially regulated genes into functional profiles.

Onto-Compare: allows determination of which array or set of arrays best covers the hypothesis under study.

Onto-Design: allows selection of genes that represent given functional categories.

Onto-Translate: allows translation of accession numbers, UniGene clusters and Affymetrix probes into one another.

Discovering Regulatory Modules and their Context Specific Regulators from Gene Expression Data - Stanford course, Eran Segal

Project report "Survey of Computational Algorithms to Analyze Patterns of Gene Expression from cDNA/DNA chip Microarray Data" by Peter Feng. (Contains links to important experiments.)

Analysis of Regulation of Yeast Gene Expression at Church Lab

XENOPUS at the Rockefeller University

GeneSpring

TIGR ArrayViewer

NHGRI Microarray Project

Exercise I: The following genes were found to be up- or downregulated in mouse lung following toxicant exposure:

Upregulated:

AB030252
NM_020570
U58633
AF233333
AF126063
M17243
AJ132098
NM_011580
AK005731
X70298
AF177146
U12785
U03283
AK003950
U19463
X62622
D00208
U75215
X99641
D16464
NM_020510
AK007483
AB031386
AF230074
U89491
AK018332
AJ242954
AK005491
X89749
X70392
AF083215
NM_013713
U58882


Downregulated:

NM_026412
BC005475
X12944
NM_025415
AK019168
BC003903
NM_025754
AB017616
AK015017
NM_007774
U79144
AK005449
AK011341
AK003660
AF088910
NM_025319
AK003661
AK006236
AF155647
AK009100
NM_019671
AK010511
AK014208
AK018311
AF173681
AK003549
AF316872
AK002376
Y18365
NM_026535
AK007567
NM_026065
AF022223

What are the predominant gene ontologies in these sets?

Are some ontologies predominantly represented in one set compared to the other?
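
One way to approach the second question is a 2x2 contingency test for each GO term. A minimal sketch, assuming a one-sided Fisher's exact test (the hypergeometric upper tail) as the statistic. The annotation counts in the example are invented and are not derived from the accession lists above; only the list size of 33 genes per set is taken from the exercise.

```python
from math import comb

def hypergeom_pmf(k, total, annotated, drawn):
    """Probability of exactly k annotated genes among `drawn` genes
    sampled without replacement from `total` genes."""
    return comb(annotated, k) * comb(total - annotated, drawn - k) / comb(total, drawn)

def fisher_one_sided(a, b, c, d):
    """P(X >= a) for the 2x2 table [[a, b], [c, d]].

    a: set-1 genes with the GO term     b: set-1 genes without it
    c: set-2 genes with the GO term     d: set-2 genes without it
    """
    total = a + b + c + d
    annotated = a + c
    drawn = a + b
    upper = min(annotated, drawn)
    return sum(hypergeom_pmf(k, total, annotated, drawn) for k in range(a, upper + 1))

# Hypothetical counts: a term annotating 10 of 33 upregulated genes
# and 2 of 33 downregulated genes.
p_value = fisher_one_sided(10, 23, 2, 31)
```

A small `p_value` suggests the term is over-represented in the first set relative to the second. When many GO terms are tested, the p-values would need a multiple-testing correction, which this sketch omits.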

Exercise II:

Analyze the sample data sets provided in FatiGO with respect to function & ontology.

Translate the lists to alternate ids

Exercise III:

Which caspase genes are on the human oncochip (using MedMiner)?

MICROARRAY EXPRESSION DATABASES
 

EDGE: Environment, Drugs and Gene Expression. Contains databases and analyses of gene expression studies following exposure to a variety of chemicals or physiological changes.

Gene Expression Atlas: Datasets from 45 mouse and 46 human tissue samples and cell lines.

Gene Expression Omnibus at NCBI. A gene expression and hybridization array data repository, as well as an online resource for the retrieval of gene expression data from any organism or artificial source.
 

2. SERIAL ANALYSIS OF GENE EXPRESSION

Based on the isolation of unique sequence tags from individual transcripts. Serial analysis of gene expression, or SAGE, is a technique designed to take advantage of high-throughput sequencing technology to obtain a quantitative profile of cellular gene expression. Essentially, the SAGE technique does not measure the expression level of a gene directly; instead, it quantifies a "tag" that represents the transcription product of that gene. A tag, for the purposes of SAGE, is a nucleotide sequence of a defined length, directly 3'-adjacent to the 3'-most restriction site for a particular restriction enzyme. As originally described, the tag was nine bases long and the restriction enzyme was NlaIII. Current SAGE protocols produce a ten- to eleven-base tag and, although NlaIII remains the most widely used restriction enzyme, enzyme substitutions are possible. The data product of the SAGE technique is a list of tags with their corresponding count values, and is thus a digital representation of cellular gene expression.
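
The tag definition above can be sketched in a few lines of Python. This is an in-silico illustration only, assuming NlaIII (recognition sequence CATG) and a ten-base tag; the function names and the toy transcripts are hypothetical, and a real pipeline would also handle poly(A) tails, linker sequences and sequencing errors.

```python
from collections import Counter

ANCHOR_SITE = "CATG"  # NlaIII recognition sequence

def sage_tag(transcript, tag_length=10, site=ANCHOR_SITE):
    """Return the tag directly 3'-adjacent to the 3'-most site, or None
    if the transcript has no site or too few bases after it."""
    seq = transcript.upper()
    pos = seq.rfind(site)            # 3'-most occurrence of the site
    if pos == -1:
        return None
    start = pos + len(site)
    tag = seq[start:start + tag_length]
    return tag if len(tag) == tag_length else None

def tag_counts(transcripts, tag_length=10):
    """Count tags over a collection of transcripts (the 'digital' profile)."""
    tags = (sage_tag(t, tag_length) for t in transcripts)
    return Counter(t for t in tags if t is not None)

# Toy transcripts (invented sequences): two copies of one, one with no site.
sample = [
    "AACATGTTTCATGACGTACGTTAGGCAAA",
    "AACATGTTTCATGACGTACGTTAGGCAAA",
    "GGGGGGGGGGGG",
]
counts = tag_counts(sample)
```

In `counts`, each key is a tag and each value is the number of transcripts that produced it, which mirrors the list of tags with count values described above.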

[Figure: schematic of the SAGE method]

Serial analysis of Gene Expression (SAGE)

SAGEMap at NCBI
 

3. MASSIVELY PARALLEL SIGNATURE SEQUENCING (MPSS): Megaclone (Lynx Corporation)


Megaclone™ technology uses a proprietary library of approximately 16.7 million short synthetic DNA sequences, called tags, and their complementary anti-tags, to uniquely mark and process each DNA molecule in a sample. Each unique tag is a permanent identifier of the DNA molecule it is attached to, and all of the tagged molecules in a sample are amplified together to create multiple copies of the tagged molecules. Another proprietary process is used to generate five-micron diameter micro-beads, each of which carries multiple copies of a short anti-tag DNA sequence complementary to one of the 16.7 million tags. The amplified tagged DNA molecules are then collected onto the micro-beads through hybridization of the tags to the complementary anti-tags. Each micro-bead carries on its surface enough complementary anti-tags to collect approximately 100,000 identical copies of the corresponding tagged DNA molecule.

By this process, each tagged DNA molecule in the original sample is converted into a micro-bead carrying about 100,000 copies of the same sequence. Therefore, in a few steps, Megaclone™ technology can transform a complex mixture of a million or more identified DNA molecules into a usable format that provides the following:

  1. Substantially all the different DNA molecules present in a sample are represented in the final micro-bead collection.
  2. These million or more DNA molecules can be analyzed simultaneously in various applications.
  3. The need for storing and handling millions of individual DNA clones is eliminated.

    (slightly modified from the Megaclone website)

Pictures at:  http://www.nature.com/nbt/journal/v18/n6/fig_tab/nbt0600_597_F1.html

              http://www.nature.com/nbt/journal/v18/n6/fig_tab/nbt0600_597_F2.html

From:

Taking a census of mRNA populations with microbeads.
Sanjay Tyagi, Nature Biotechnology 18, 597-598 (2000)

4. Random Activation of Gene Expression (RAGE)



The basic strategy of the RAGE technique is to use PCR methods to amplify fragments of the cDNAs present in two populations, e.g. experimental and control treatments of a single cell type, or wild type and transgenic cells, or tumor and normal cells. The amplifications are done under conditions where each cDNA gives rise to a unique amplification product (amplimer) and the amount of product formed is directly proportional to the concentration of template. The amount of amplimer produced in reactions from paired populations is then quantitated and  compared. This gives a measure of the relative expression of the chosen gene in the two cell populations.
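
The final comparison step described above can be sketched as a simple ratio. This is a minimal illustration, assuming the amplimer amounts have already been quantitated in arbitrary units; the function names are hypothetical, and the optional normalization to a reference gene is an added assumption that the description above does not mention.

```python
from math import log2

def relative_expression(experimental, control):
    """Ratio and log2 ratio of amplimer amounts (experimental / control)."""
    if experimental <= 0 or control <= 0:
        raise ValueError("amplimer amounts must be positive")
    ratio = experimental / control
    return ratio, log2(ratio)

def normalized_relative_expression(exp_gene, exp_ref, ctl_gene, ctl_ref):
    """Same comparison after dividing each sample's value by the amplimer
    amount of a reference gene (an assumed normalization step)."""
    if exp_ref <= 0 or ctl_ref <= 0:
        raise ValueError("reference amounts must be positive")
    return relative_expression(exp_gene / exp_ref, ctl_gene / ctl_ref)

# Invented quantitation values in arbitrary units.
ratio, log_ratio = relative_expression(800.0, 200.0)
norm_ratio, norm_log_ratio = normalized_relative_expression(800.0, 100.0, 200.0, 50.0)
```

Because the amount of product is taken to be proportional to template concentration, the ratio of amplimer amounts stands in for the ratio of transcript abundance in the two populations. The log2 form treats up- and downregulation symmetrically.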

Overview of technique from Global Gene Expression Group, MD Anderson Cancer Center, Science Park

Description of RAGE and SAGE experiments

Summary by Athersys Corporation
 

LARGE SCALE TRANSCRIPTOME ANALYSIS

Large-scale analysis of the mouse and human transcriptomes: Gene Expression Atlas and Su et al., PNAS (2002) 99: 4465-4470

Profile of gene expression from 91 human and mouse samples across a diverse array of tissues, organs, and cell lines. Because these samples predominantly come from the normal physiological state in the human and mouse, this dataset represents a preliminary, but substantial, description of the normal mammalian transcriptome. The dataset was used to illustrate methods of mining these data, and to reveal insights into molecular and physiological gene function.

(modified from publication abstract)

TRANSCRIPTION FACTOR BINDING SITE IDENTIFICATION IN RELATION TO CONSERVED NONCODING SEQUENCES

Noncoding sequences conserved across species (Phylogenetic Footprints) often include important regulatory sequences.

Several tools are available for identification of these sites in collections of large genomic sequences:

ConSite: Explores transcription factor binding sites shared by two genomic sequences

PromH: Promoter identification using orthologous genomic sequences

MultiPipMaker and supporting tools: alignments and analysis of multiple genomic DNA sequences

RSAT: Collection of Regulatory Sequence Analysis Tools at Université Libre de Bruxelles.

THEATRE: a software tool for detailed comparative analysis and visualization of genomic sequence
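
The idea behind phylogenetic footprinting can be illustrated with a sliding-window identity scan over a pairwise alignment. This is a minimal sketch and not the method of any tool listed above; the window size, identity threshold and toy sequences are arbitrary assumptions.

```python
def conserved_windows(aln_a, aln_b, window=20, min_identity=0.8):
    """Return (start, identity) for every window of a pairwise alignment
    whose fraction of identical, non-gap columns meets the threshold."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    a, b = aln_a.upper(), aln_b.upper()
    hits = []
    for start in range(0, len(a) - window + 1):
        cols = zip(a[start:start + window], b[start:start + window])
        matches = sum(1 for x, y in cols if x == y and x != "-")
        identity = matches / window
        if identity >= min_identity:
            hits.append((start, identity))
    return hits

# Toy alignment: a conserved block followed by a diverged block.
seq_1 = "ACGTACGTAC" + "TTTTTTTTTT"
seq_2 = "ACGTACGTAC" + "GGGGGGGGGG"
hits = conserved_windows(seq_1, seq_2, window=10, min_identity=0.9)
```

Windows that pass the threshold in noncoding regions are candidates for regulatory elements and can then be scanned for transcription factor binding sites.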

Gene expression: special issue of Science, October 22, 2004
    
OTHER MICROARRAY LINKS


Xenopus project links


MICROARRAY MANUFACTURERS CORPORATE SITES


1.  Affymetrix - includes a movie of chip manufacture, scientific papers, etc.


SELECTIVE SILENCING OF TRANSCRIPTION 


Protein synthesis can be stifled selectively by small RNA strands (miRNAs or siRNAs) that prevent translation of target mRNAs.

Small, noncoding RNA molecules mediate a posttranscriptional gene-silencing mechanism that regulates the expression of developmental genes by inhibiting the translation of target mRNAs. This mechanism is common to plants, fungi, and animals. The generation of these microRNAs (miRNAs, closely related to small interfering RNAs, or siRNAs) involves a series of sequential steps in which primary RNA transcripts (pri-miRNAs) are cleaved in the nucleus to smaller pre-miRNAs. These are transported to the cytosol, where Dicer, a member of the RNase III nuclease family, further processes them to yield mature miRNAs. MiRNAs associate with multicomponent ribonucleoprotein complexes, or RISCs, which effect the silencing of the target mRNA molecules. (The Scientist, September 25, 2003)

Registry at Sanger Institute: miRNA sequences can be identified on submitted sequences or specified chromosomes.

MicroRNA target prediction at Memorial Sloan-Kettering Cancer Center.

GeneACT: identifies TF binding sites and miRNA targets.
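
As a rough illustration of what target prediction involves, the sketch below looks for sites in a 3' UTR that pair perfectly with the miRNA "seed" (nucleotides 2-8). Seed matching is a common heuristic that is assumed here; it is not described in the text above and is far cruder than the tools listed. The miRNA is the commonly listed let-7a sequence and the UTR is an invented toy sequence.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement_rna(seq):
    """Reverse complement of an RNA sequence (A/C/G/U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def seed_sites(mirna, utr, seed_start=1, seed_end=8):
    """Return 0-based UTR positions that pair perfectly with the miRNA
    seed, taken as nucleotides 2-8 of the miRNA by default."""
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    site = reverse_complement_rna(mirna[seed_start:seed_end])
    positions = []
    pos = utr.find(site)
    while pos != -1:
        positions.append(pos)
        pos = utr.find(site, pos + 1)
    return positions

LET_7A = "UGAGGUAGUAGGUUGUAUAGUU"
toy_utr = "AAACUACCUCAGGGCUACCUCAUU"   # invented sequence with two seed sites
sites = seed_sites(LET_7A, toy_utr)
```

Each position in `sites` marks where the seed-complementary motif begins in the UTR. Real predictors additionally weigh pairing beyond the seed, binding energy and cross-species conservation.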

EXONIC SPLICING ENHANCERS

Exonic sequence elements, bound by splicing factors such as SR proteins, that help maintain correct pre-mRNA splicing patterns.

Point mutations frequently cause genetic diseases by disrupting the correct pattern of pre-mRNA splicing. This may occur through inactivation of ESEs, resulting in exon skipping.


ESEfinder: Identification of exonic splicing enhancers.