VARIANTS OF BLAST SEARCH ALGORITHMS

1. BLAST (Advanced BLAST)

Inserts gaps into the alignments, so the output presents longer regions of homology between the Subject and Query sequences.

Sites: BLAST at NCBI - convenient but very busy at times!  At peak times try:
         WU-BLAST at Washington University; gives gapped and ungapped examples
         BLAST at GenomeNet, Kyoto University, Japan
       
         Sequerome

A web-based sequence profiling tool developed by the Bioinformatics and Computational Biosciences Unit (BCBU) at Georgetown University - http://bioinformatics.georgetown.edu. The tool profiles each of the BLAST results so that users can extract the maximum from sequence alignment reports.

"Sequence homology searches are commonly performed using Basic Local Alignment Search Tool (BLAST; http://www.ncbi.nlm.nih.gov/BLAST). However, the BLAST results interface does not provide direct simple access to useful sequence analysis tools, such as restriction enzyme maps for DNA sequences and secondary structure prediction for protein sequences. Moreover, it is difficult to navigate smoothly between different sequence alignment records. Therefore, there exists a need for a single interface that would allow a user to directly link each of the sequences from alignment reports to different domains/servers offering analysis and manipulation options. To meet this need, we developed SEQUEROME, a Java-based tool that acts as a front-end to BLAST queries and provides simplified access to web-distributed resources for protein and nucleic acid analysis."

2. Position-Specific Iterated BLAST (PSI-BLAST)

Finds remote homologs by searching with profiles, i.e. descriptions of several sequences at once.
A profile describes the probability of each nucleotide or amino acid (Ala, Asp, etc.) at each position in the multiple alignment of the sequence collection.

For instance, consider the four related peptide sequences S1:

123456
VABCDE
VABGDE
CABCDE
LAFCDE

Position 1 of set S1 can be described by the profile: {0.5V, 0.25C, 0.25L}.

What profile describes position 3?

For the sequences S2:

1
PFGHI
AFGHI

What profile describes position 1 of set S2?
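Column profiles like the one given for position 1 of S1 can be computed by counting residues. The following is a minimal sketch, assuming the sequences are already aligned and of equal length; the function name `column_profile` is hypothetical and chosen only for illustration.

```python
from collections import Counter

def column_profile(sequences, position):
    """Return {residue: frequency} for a 1-based column of aligned sequences."""
    column = [seq[position - 1] for seq in sequences]
    counts = Counter(column)
    return {residue: count / len(column) for residue, count in counts.items()}

S1 = ["VABCDE", "VABGDE", "CABCDE", "LAFCDE"]
S2 = ["PFGHI", "AFGHI"]

# Position 1 of S1 reproduces the profile stated in the text.
print(column_profile(S1, 1))  # {'V': 0.5, 'C': 0.25, 'L': 0.25}
```

Calling `column_profile(S1, 3)` or `column_profile(S2, 1)` is one way to check answers to the two questions above.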

Distant homologs of a sequence, or of a set of related sequences such as S1, can be detected by searching with the profile that describes the entire set rather than with a single sequence.

In PSI-BLAST, sequences with homology to the original sequence are identified in the first iteration.

In subsequent iterations, searches are carried out with the profile of the original sequence and other user-specified homologs.

Example: What is the cost of matching the profiles of position 1 for sets S1 and S2 using the PAM250 matrix?

(Refer to the VSNS course, Chapter 3, for a worked example and solution, and for the PAM250 matrix.)
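One common way to score two profile columns against each other is the frequency-weighted average of the pairwise substitution scores. The sketch below assumes that definition, which may differ from the one used in the VSNS course. The names `expected_score`, `pam250` and `PAM250_SUBSET` are hypothetical, and the six PAM250 entries are typed in by hand from the standard table, so they should be checked against the course matrix before the result is relied on.

```python
def expected_score(profile_a, profile_b, score):
    """Frequency-weighted average of score(a, b) over all residue pairs."""
    total = 0.0
    for res_a, freq_a in profile_a.items():
        for res_b, freq_b in profile_b.items():
            total += freq_a * freq_b * score(res_a, res_b)
    return total

# Only the PAM250 entries needed here (assumed values; verify against the course matrix).
PAM250_SUBSET = {
    frozenset("VP"): -1, frozenset("VA"): 0,
    frozenset("CP"): -3, frozenset("CA"): -2,
    frozenset("LP"): -3, frozenset("LA"): -2,
}

def pam250(a, b):
    """Symmetric lookup: the score for (a, b) equals the score for (b, a)."""
    return PAM250_SUBSET[frozenset((a, b))]

s1_position1 = {"V": 0.5, "C": 0.25, "L": 0.25}
s2_position1 = {"P": 0.5, "A": 0.5}

print(expected_score(s1_position1, s2_position1, pam250))
```

Passing the scoring function as an argument keeps the averaging step independent of the matrix, so the same sketch works with any other substitution table.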
 

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.

Example 1:

Examination of the 3-D protein structures of Histidine Triad (HIT) and galactose-1-phosphate uridylyltransferase (GalT) proteins suggests significant similarity. A standard BLAST search, however, reveals homology mainly to other HIT proteins and makes no mention of GalT. Run PSI-BLAST with the human FHIT sequence below as Query to reveal the similarity between HIT and GalT.

>gi|7428051|pir||A58802 probable tumor suppressor FHIT - human
MSFRFGQHLIKPSVVFLKTELSFALVNRKPVVPGHVLVCPLRPVERFHDLRPDEVADLFQTTQRVG
TVVEKHFHGTSLTFSMQDGPEAGQTVKHVHVHVLPRKAGDFHRNDSIYEELQKHDKEDFPASWR
SEEEMAAEAAALRVYFQ

Example 2:

Use the Shope fibroma virus uracil-DNA glycosylase:

>gi|418154|sp|P32941|UNG_SFVKA URACIL-DNA GLYCOSYLASE (UDG)
MRRVFLSHEPYVIEYHEDWENIITRLVDMYNEVAEWILKDDTSPTPDKFFKQLSVSLKDKRVCVCGIDPY
PRDATGVPFESHNFTKKTIKYIAETVSNITGVRYYKGYNLNNVEGVFPWNYYLSCKIGETKSHALHWKRI
SKLLLQHITKYVNVLYCLGKTDFANIRSILETPVTTVIGYHPAAREKQFEKDKGFEIVNVLLEINDKPSI
RWEQGFSY

as Query in a PSI-BLAST search for the human ortholog.

Hint: In iterations use only nonredundant sequences, e.g. avoid:

For homologs in the same species, limit the search to that species and related ones,
e.g. mammals or eukaryotes for human queries.

For homologs in widely separated phyla, use hits from widely diverged organisms in iterations.

Sites: PSI-BLAST at NCBI

3. Pattern-Hit Initiated BLAST (PHI-BLAST)

The pattern-hit initiated BLAST (PHI-BLAST) program takes as input both a protein sequence and a pattern of interest that it contains.  PHI-BLAST searches a protein database for other instances of the input pattern, and uses those found as seeds for the construction of local alignments to the query sequence.

Example: The human cyclin inhibitor p57 (Swiss-Prot accession no.  P49918) contains the N-terminal motif LFGPVDHEEL that is believed to be critical for its function.  Use PHI-BLAST to detect homologs of this protein that contain the same motif. Compare the results with those from a standard BLASTP search.
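The seeding step, locating instances of the pattern in database sequences, can be illustrated with a regular expression. This is a minimal sketch and not PHI-BLAST itself: it only finds pattern occurrences and builds no alignments. It assumes a simplified PROSITE-style pattern syntax, the function names `prosite_to_regex` and `find_seeds` are hypothetical, and the two database sequences are made up for the demonstration.

```python
import re

def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regex string."""
    regex = pattern.replace("-", "")                       # drop position separators
    regex = re.sub(r"\{([A-Z]+)\}", r"[^\1]", regex)       # {P}: any residue except P
    regex = re.sub(r"\((\d+(?:,\d+)?)\)", r"{\1}", regex)  # x(2,3): repeat counts
    return regex.replace("x", ".")                         # x: any residue

def find_seeds(pattern, database):
    """Return (name, 1-based start) for each non-overlapping pattern match."""
    regex = re.compile(prosite_to_regex(pattern))
    hits = []
    for name, sequence in database:
        for match in regex.finditer(sequence):
            hits.append((name, match.start() + 1))
    return hits

# Made-up toy sequences; only the first contains the motif from the example.
toy_database = [
    ("toy_hit", "MSDLFGPVDHEELAAK"),
    ("toy_miss", "MKTAYIAKQRQISFVK"),
]

print(find_seeds("L-F-G-P-V-D-H-E-E-L", toy_database))  # [('toy_hit', 4)]
```

In a real search each seed position would then be extended into a local alignment against the query, which is the part PHI-BLAST adds on top of plain pattern matching.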

Other examples, including CED4-like cell death regulators, are given in: Zhang, Zheng, Alejandro A. Schäffer, Webb Miller, Thomas L. Madden, David J. Lipman, Eugene V. Koonin, and Stephen F. Altschul (1998), "Protein sequence similarity searches using patterns as seeds", Nucleic Acids Res. 26:3986-3990.

Sites: PHI-BLAST at NCBI

4.  BLAST 2 sequences against each other

Performs pairwise alignments using the BLAST algorithm. Compare results with other pairwise alignment tools such as ALION.

5. MEGABLAST

Compares long sequences such as genomes. Accepts more than one query sequence at a time. Works best when the sequences being compared are highly similar.