Analysis of transcription on a genomic scale is now possible
using several methodologies, including:
1. MICROARRAY TECHNOLOGIES
Lecture in NHGRI course, 2006
Supplements to Nature Genetics, Volume 21 (January 1999) and Volume 32 (December 2002)
The Chipping Forecast is a collection of reviews on different aspects of microarray analysis and is publicly available online.
The Brown Lab Home Page at Stanford - seminal microarray experiments, instructions for building your own microarrayer, etc.
Application of microarrays to analysis of gene expression in human mammary epithelial cells and cancers - Perou et al., PNAS 1999
Gene expression patterns in human cancer cell lines - Ross et al., Nature Genetics, March 2000
APPLICATIONS OF MICROARRAYS
1. The identification of genetic signatures and hence cellular targets of drug, pathogen or toxicant exposure.
2. Hypothesis-generating attempts to assign probable functions
to newly identified genes, for example by comparison with the expression
patterns of known genes: gene products with similar expression profiles
(e.g. over a time course) may interact or act in the same pathway
3. Identification of therapeutic targets, e.g. by comparing the actions of multiple agents on multiple cell lines under multiple conditions
4. The delineation of complex patterns of gene expression
that provide a potentially 'pathognomonic' molecular phenotype,
e.g. to distinguish subtypes of neoplasms
5. Personalized medicine: molecular diagnosis of disease
and prediction of drug efficacy and toxicity in different individuals
(pharmacogenomics or toxicogenomics)
6. Elucidation of global mechanisms by which cells or organisms
respond to environmental changes (e.g. temperature, nutrient concentration).
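Item 2 rests on "guilt by association": an uncharacterized gene whose profile tracks a known gene's profile becomes a candidate partner. A minimal sketch, assuming each profile is a plain list of values measured over the same ordered conditions and using Pearson correlation as the similarity measure (a common choice, not one the text prescribes). The function names and the 0.9 cutoff are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2 points)")
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    den = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if den == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sum(a * b for a, b in zip(dx, dy)) / den

def coexpressed_partners(query, profiles, min_r=0.9):
    """Genes whose profile correlates with `query` at r >= min_r, strongest first.

    `profiles` maps a gene name to its expression values over the same ordered
    conditions (e.g. time points); `min_r` is an arbitrary illustrative cutoff.
    """
    target = profiles[query]
    hits = [(gene, pearson(target, prof))
            for gene, prof in profiles.items() if gene != query]
    return sorted([h for h in hits if h[1] >= min_r], key=lambda h: h[1], reverse=True)
```

With a toy time course, a gene whose profile is a scaled copy of the query scores r = 1 and is returned, while an inverted profile (negative r) is not.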
MICROARRAY DATA ANALYSIS & VISUALIZATION METHODS
The large data sets generated by microarray experiments
necessitate methods for visualizing the data and relating them to the Gene Ontology and to cellular pathways
(metabolic, signal transduction, etc.).
Tools at the Gene Ontology Home Page.
FatiGO - data mining with Gene Ontology, compares response sets.
PubGene - analyzes expression data in the context of literature or sequence associations.
GenMAPP - visualization of gene expression data on maps representing biological pathways and groupings of genes.
MappFinder - automatically analyzes gene expression/proteomic
profiling data in the context of the Gene Ontology hierarchy and
GenMAPP biological pathways.
Pathway Miner - extracts gene association networks from molecular pathways
for predicting the biological significance of gene expression microarray data.
Maps to KEGG, BioCarta and GenMAPP pathways and data.
Bioconductor: an open source and open development software project for the analysis and
comprehension of genomic data.
MedMiner, MatchMiner, GoMiner, CIMMiner
MedMiner extracts and organizes relevant sentences in the
literature based on a gene, gene-gene or gene-drug query.
MatchMiner is a set of tools that enables the user to translate between disparate IDs for the same gene. It uses data from the UCSC, LocusLink, UniGene, OMIM, Affymetrix and Jackson data sources to determine how different IDs relate. Supported ID types include gene symbols and names, IMAGE and FISH clones, GenBank accession numbers and UniGene cluster IDs.
GoMiner is a tool for biological interpretation of 'omic' data –
including data from gene expression microarrays.
CIMMiner generates color-coded Clustered Image Maps (CIMs) ("heat
maps") to represent "high-dimensional" data sets such as gene expression
profiles.
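A clustered image map orders its rows (and columns) by hierarchical clustering so that similar profiles sit next to each other. The sketch below is a minimal, stdlib-only average-linkage clustering of matrix rows, assuming 1 - Pearson correlation as the distance; it is not CIMMiner's own code, and both the distance and the linkage are illustrative choices. The returned index order is what would lay out the rows of the heat map; columns can be ordered the same way on the transposed matrix.

```python
from math import sqrt

def correlation_distance(x, y):
    """1 - Pearson r; flat profiles are treated as uncorrelated (an assumption)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    den = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if den == 0:
        return 1.0
    return 1.0 - sum(a * b for a, b in zip(dx, dy)) / den

def average_linkage_order(rows):
    """Agglomerative average-linkage clustering of the rows of an expression
    matrix; returns row indices in dendrogram leaf order."""
    clusters = [[i] for i in range(len(rows))]  # each cluster keeps its leaf order
    d = [[correlation_distance(a, b) for b in rows] for a in rows]

    def linkage(c1, c2):
        # mean pairwise distance between members of the two clusters
        return sum(d[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                dist = linkage(clusters[p], clusters[q])
                if best is None or dist < best[0]:
                    best = (dist, p, q)
        _, p, q = best
        merged = clusters[p] + clusters[q]
        del clusters[q]  # q > p, so index p stays valid
        clusters[p] = merged
    return clusters[0] if clusters else []
```

For a matrix whose rows 0 and 2 rise together and rows 1 and 3 oscillate together, the leaf order places each correlated pair side by side.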
Onto-Tools - set of four integrated
databases:
Onto-Express: translates lists of differentially regulated genes
into functional profiles.
Onto-Compare: determines which array or set of arrays
best covers the hypothesis being studied.
Onto-Design: allows selection of genes that represent given functional
categories.
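Turning a gene list into a functional profile, as Onto-Express and FatiGO do, usually comes down to asking whether a category is over-represented in the list relative to all genes on the array. A minimal sketch using a one-sided hypergeometric test, assuming a hypothetical `annotations` mapping from gene to GO term IDs; the tools above may use other statistics, and this sketch omits any correction for multiple testing.

```python
from math import comb

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, K of which carry the term."""
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)

def term_enrichment(study_genes, population_genes, annotations):
    """Rank terms by over-representation among the study genes.

    annotations: hypothetical mapping gene -> set of GO term IDs.
    Returns (term, k, K, p) tuples sorted by p, where k = annotated study genes
    and K = annotated genes in the whole population (e.g. all genes on the array).
    """
    population = set(population_genes)
    study = set(study_genes) & population
    N, n = len(population), len(study)
    counts = {}  # term -> [K, k]
    for gene in population:
        for term in annotations.get(gene, ()):
            K_k = counts.setdefault(term, [0, 0])
            K_k[0] += 1
            if gene in study:
                K_k[1] += 1
    results = [(term, k, K, hypergeom_upper_tail(k, n, K, N))
               for term, (K, k) in counts.items()]
    return sorted(results, key=lambda r: r[3])
```

Comparing two lists (e.g. up- versus downregulated genes) can reuse the same counting with the second list as the reference set, which is closer to what FatiGO reports.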
Discovering Regulatory Modules and their Context Specific Regulators from Gene Expression Data - Stanford course, Eran Segal
Project report "Survey of Computational Algorithms to Analyze Patterns of Gene Expression from cDNA/DNA chip Microarray Data" by Peter Feng (contains links to important experiments).
Analysis of Regulation of Yeast Gene Expression at Church Lab
XENOPUS at the Rockefeller University
Exercise I: The following genes were found to be up- or downregulated
in mouse lung following toxicant exposure:
Upregulated:
AB030252
NM_020570
U58633
AF233333
AF126063
M17243
AJ132098
NM_011580
AK005731
X70298
AF177146
U12785
U03283
AK003950
U19463
X62622
D00208
U75215
X99641
D16464
NM_020510
AK007483
AB031386
AF230074
U89491
AK018332
AJ242954
AK005491
X89749
X70392
AF083215
NM_013713
U58882
Downregulated:
NM_026412
BC005475
X12944
NM_025415
AK019168
BC003903
NM_025754
AB017616
AK015017
NM_007774
U79144
AK005449
AK011341
AK003660
AF088910
NM_025319
AK003661
AK006236
AF155647
AK009100
NM_019671
AK010511
AK014208
AK018311
AF173681
AK003549
AF316872
AK002376
Y18365
NM_026535
AK007567
NM_026065
AF022223
What are the predominant gene ontologies in these sets?
Are some ontologies predominantly represented in one set compared to the other?
Exercise II:
Analyze the sample data sets provided in FatiGO with respect
to function and ontology.
Translate the lists to alternate IDs.
Exercise III:
Find which caspase genes are on the human oncochip (using MedMiner).
MICROARRAY EXPRESSION DATABASES
EDGE:
Environmental Database for Environmental Gene Expression. Contains databases
and analyses of gene expression studies following
exposure to a variety of chemicals or physiological changes.
Gene Expression Atlas: datasets from 45 mouse and 46 human tissue samples and cell lines.
Gene Expression Omnibus at NCBI: a gene expression and hybridization array data repository,
as well as an online resource for the retrieval of gene
expression data from any organism or artificial source.
2. SERIAL ANALYSIS OF GENE EXPRESSION
Serial analysis of gene expression, or SAGE, is based on the isolation of unique sequence tags from individual transcripts and is designed to take advantage of high-throughput sequencing technology to obtain a quantitative profile of cellular gene expression. Rather than measuring the expression level of a gene directly, SAGE quantifies a "tag" that represents the transcription product of a gene. A tag, for the purposes of SAGE, is a nucleotide sequence of defined length lying directly 3' of the 3'-most site for a particular restriction enzyme (the anchoring enzyme). As originally described, the tag was nine bases long and the restriction enzyme was NlaIII. Current SAGE protocols produce a ten- to eleven-base tag and, although NlaIII remains the most widely used enzyme, enzyme substitutions are possible. The data product of the SAGE technique is a list of tags with their corresponding counts, and thus a digital representation of cellular gene expression.
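The tag definition above can be sketched in silico. This simplified version, assuming sense-strand cDNA sequences given as plain strings, takes the `tag_len` bases immediately 3' of the 3'-most NlaIII site (CATG) in each transcript and counts the resulting tags. Real SAGE reads tags from sequenced ditag concatemers, a step this sketch skips; the function names are hypothetical.

```python
from collections import Counter

ANCHOR_SITE = "CATG"  # NlaIII recognition site (the anchoring enzyme named in the text)

def sage_tag(cdna, tag_len=10, site=ANCHOR_SITE):
    """Return the tag_len bases immediately 3' of the 3'-most `site`, or None."""
    cdna = cdna.upper()
    pos = cdna.rfind(site)  # 3'-most occurrence of the anchoring site
    if pos == -1:
        return None  # no anchoring site: this transcript yields no tag
    start = pos + len(site)
    tag = cdna[start:start + tag_len]
    return tag if len(tag) == tag_len else None  # site too close to the 3' end

def sage_library(transcripts, tag_len=10):
    """Digital expression profile: tag -> number of transcript copies carrying it."""
    counts = Counter()
    for seq in transcripts:
        tag = sage_tag(seq, tag_len)
        if tag is not None:
            counts[tag] += 1
    return counts
```

Counting tags rather than genes is what makes the output digital: two copies of one transcript contribute a count of 2 to the same tag, and mapping tags back to genes is a separate lookup (the job of resources such as SAGEMap).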
Schematic of the SAGE method: Serial Analysis of Gene Expression (SAGE)
SAGEMap at NCBI
3. MASSIVELY PARALLEL SIGNATURE SEQUENCING (MPSS): Megaclone (Lynx Corporation)
Megaclone™ technology uses a proprietary library of approximately 16.7
million short synthetic DNA sequences, called tags, and their complementary
anti-tags to uniquely mark and process each DNA molecule in a sample. Each
unique tag is a permanent identifier of the DNA molecule it is attached to,
and all of the tagged molecules in a sample are amplified together to create
multiple copies of each tagged molecule. Another proprietary process is used
to generate five-micron-diameter micro-beads, each of which carries multiple
copies of a short anti-tag DNA sequence complementary to one of the 16.7 million
tags. The amplified tagged DNA molecules are then collected onto the micro-beads
through hybridization of the tags to the complementary anti-tags. Each micro-bead
carries enough anti-tags on its surface to collect approximately 100,000 identical
copies of the corresponding tagged DNA molecule.
By this process, each tagged DNA molecule in the original sample is converted into a micro-bead carrying about 100,000 copies of the same sequence. Therefore, in a few steps, Megaclone™ technology can transform a complex mixture of a million or more identified DNA molecules into a usable, bead-based format.
(Slightly modified from the Megaclone website.)
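The figure of 16.7 million equals 8^8 (16,777,216), the number of tags obtained if each tag is built from eight positions, each filled by one of eight four-base "words"; that combinatorial construction is assumed here, and the word sequences below are hypothetical, not the proprietary ones. The sketch tags each molecule with a distinct, randomly chosen tag, makes a few "amplified" copies, and collects them on the bead whose anti-tag is the reverse complement of the tag, which is how tag/anti-tag hybridization routes each molecule to its own bead.

```python
import random
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Hypothetical alphabet of eight 4-base "words" (illustrative only).
WORDS = ["TTAC", "AATC", "TACT", "ATCA", "ACAT", "TCTA", "CTTT", "CAAA"]
WORDS_PER_TAG = 8

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def tag_space_size():
    # 8 positions, each holding one of 8 words: 8**8 = 16,777,216 distinct tags
    return len(WORDS) ** WORDS_PER_TAG

def tag_from_index(index):
    """Spell `index` in base len(WORDS), one word per digit (32-base tag)."""
    words = []
    for _ in range(WORDS_PER_TAG):
        index, digit = divmod(index, len(WORDS))
        words.append(WORDS[digit])
    return "".join(words)

def load_beads(molecules, copies=3, seed=0):
    """Tag each molecule with a distinct random tag, make `copies` copies, and
    collect them on the bead whose anti-tag is the tag's reverse complement."""
    rng = random.Random(seed)
    indices = rng.sample(range(tag_space_size()), len(molecules))  # no tag reused
    beads = defaultdict(list)  # anti-tag -> tagged copies captured on that bead
    for molecule, idx in zip(molecules, indices):
        tag = tag_from_index(idx)
        tagged = molecule + tag  # tag appended to the molecule's 3' end
        beads[reverse_complement(tag)].extend([tagged] * copies)
    return dict(beads)
```

Sampling without replacement stands in for "uniquely mark each molecule"; in the real process the copy number per bead is on the order of 100,000 rather than the small `copies` used here.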
Pictures at: http://www.nature.com/nbt/journal/v18/n6/fig_tab/nbt0600_597_F1.html
http://www.nature.com/nbt/journal/v18/n6/fig_tab/nbt0600_597_F2.html
4. Random Activation of Gene Expression (RAGE)
The basic strategy of the RAGE technique
is to use PCR methods to amplify fragments of the cDNAs present in two populations,
e.g. experimental and control treatments of a single cell type, wild-type
and transgenic cells, or tumor and normal cells. The amplifications
are done under conditions where each cDNA gives rise to a unique amplification
product (amplimer) and the amount of product formed is directly proportional
to the concentration of template. The amount of amplimer produced in reactions
from the paired populations is then quantitated and compared, giving
a measure of the relative expression of the chosen gene in the two cell
populations.
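Because amplimer amount is stated to be directly proportional to template concentration, the ratio of amplimer amounts from the paired populations estimates the relative expression of each gene. A minimal sketch, assuming amplimer quantities are given per gene in the same arbitrary units for both populations; the optional pseudocount and the log2 transform are added conventions, not part of the described technique.

```python
from math import log2

def relative_expression(experimental, control, pseudo=0.0):
    """Per-gene (ratio, log2 ratio) of amplimer amounts, experimental / control.

    Only genes quantified in both populations are reported. `pseudo` is an
    optional pseudocount (an assumption) so undetected amplimers can be kept.
    """
    result = {}
    for gene in experimental.keys() & control.keys():
        e = experimental[gene] + pseudo
        c = control[gene] + pseudo
        if e <= 0 or c <= 0:
            continue  # ratio undefined without a pseudocount
        ratio = e / c
        result[gene] = (ratio, log2(ratio))
    return result
```

A log2 ratio of +2 reads as four-fold higher expression in the experimental population and -1 as half, which keeps up- and downregulation symmetric around zero.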
Overview of technique from Global Gene Expression Group, MD Anderson Cancer Center, Science Park
Description of RAGE and SAGE experiments
Summary by Athersys Corporation
LARGE SCALE TRANSCRIPTOME ANALYSIS
Large-scale analysis of the mouse and human transcriptomes: Gene Expression Atlas and Su et al., PNAS (2002) 99: 4465-4470
Profile of gene expression from 91 human and mouse samples across a diverse array of tissues, organs, and cell lines. Because these samples predominantly come from the normal physiological state in human and mouse, this dataset represents a preliminary, but substantial, description of the normal mammalian transcriptome. The dataset was used to illustrate methods of mining these data and to reveal insights into molecular and physiological gene function.
(modified from publication abstract)
TRANSCRIPTION FACTOR BINDING SITE IDENTIFICATION IN RELATION TO CONSERVED NONCODING SEQUENCES
Noncoding sequences conserved across species (Phylogenetic Footprints) often include important regulatory sequences.
Several tools are available for identification of these sites in collections of large genomic sequences:
ConSite: Explores transcription factor binding sites shared by two genomic sequences
PromH: promoter identification using orthologous genomic sequences
MultiPipMaker and supporting tools: alignments and analysis of multiple genomic DNA sequences
RSAT: collection of Regulatory Sequence Analysis Tools at Université Libre de Bruxelles.
THEATRE: a software tool for detailed comparative analysis and visualization of genomic sequence
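Phylogenetic footprints are often located by sliding a window along a pairwise alignment of orthologous sequences and keeping stretches whose percent identity passes a threshold. A minimal sketch, assuming the two sequences are already aligned (equal-length strings with "-" for gaps); the 100-column window and 70% identity defaults are illustrative assumptions, and the tools listed above use their own alignment and scoring methods.

```python
def percent_identity_windows(seq_a, seq_b, window=100, min_identity=0.70):
    """Scan a gapped pairwise alignment and return merged (start, end)
    column intervals (end exclusive) covering every `window`-column window
    whose identity reaches `min_identity`. Gap-gap columns do not count as matches."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    n = len(seq_a)
    if n < window:
        return []
    match = [1 if a == b and a != "-" else 0
             for a, b in zip(seq_a.upper(), seq_b.upper())]
    intervals = []
    current = sum(match[:window])  # matches in the first window
    for start in range(n - window + 1):
        if start > 0:
            # slide by one column: add the entering column, drop the leaving one
            current += match[start + window - 1] - match[start - 1]
        if current / window >= min_identity:
            end = start + window
            if intervals and start <= intervals[-1][1]:
                intervals[-1] = (intervals[-1][0], end)  # overlaps previous: merge
            else:
                intervals.append((start, end))
    return intervals
```

The conserved intervals returned this way are the candidate regions in which binding-site searches (as ConSite does for sites shared by two sequences) would then be concentrated.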
Gene expression: special issue of Science, October 22, 2004
OTHER MICROARRAY LINKS
Xenopus project links
MICROARRAY MANUFACTURERS' CORPORATE SITES
1. Affymetrix - includes a movie of chip manufacture, scientific papers, etc.
SELECTIVE SILENCING OF TRANSCRIPTION
Protein synthesis can be stifled selectively by small
RNA strands (miRNAs or siRNAs) that block translation of
target mRNAs or trigger their degradation.
microRNA Registry at the Sanger Institute: miRNA sequences can be searched for in submitted sequences
or on specified chromosomes.
MicroRNA target prediction at Memorial Sloan-Kettering Cancer Center.
GeneACT: identifies TF binding sites and miRNA targets.
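Many target predictors start from perfect complementarity between a site in the target mRNA and the miRNA "seed" (nucleotides 2-8 counted from the miRNA's 5' end). The sketch below is a simplified seed-match scan, not the scoring used by the prediction tools listed above, assuming the miRNA and the 3' UTR are given as plain RNA or DNA strings; it reports the start of every perfect 7-nucleotide site.

```python
def rna_to_dna(seq):
    return seq.upper().replace("U", "T")

def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def seed_sites(mirna, utr, seed_start=2, seed_end=8):
    """0-based positions in `utr` perfectly complementary to miRNA nucleotides
    seed_start..seed_end (1-based, inclusive; 2-8 by default)."""
    seed = rna_to_dna(mirna)[seed_start - 1:seed_end]
    site = reverse_complement(seed)  # target site pairs antiparallel with the seed
    target = rna_to_dna(utr)
    hits, pos = [], target.find(site)
    while pos != -1:
        hits.append(pos)
        pos = target.find(site, pos + 1)  # allow overlapping occurrences
    return hits
```

Real predictors go further, scoring pairing outside the seed, duplex stability and conservation of the site, which is why a bare seed match yields many false positives.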