BLAST tutorial

Setting up a BLAST search
Step 1.  Plan the search.
Decide the goal of the comparison, then choose the most appropriate database and BLAST program.
Goal/Question: Is the query sequence represented in the database?
  • Database: Choose a current nucleic acid database. Select from among organism-specific (e.g., yeast), inclusive (e.g., nonredundant), or specialized (e.g., dbEST, dbSTS, GSS, HTG) databases.
  • BLAST program: blastn

Goal/Question: Are there homologs or evolutionary relatives of the query sequence in the database? Are there proteins whose function is related to the query sequence?
  • Database: Choose a protein database if the query is a protein, or DNA expected to encode a protein, because amino acid searches are more sensitive.
  • BLAST program: blastp for amino acid queries; blastx for translated nucleic acid queries. Use tblastn or tblastx to compare an amino acid query or a translated nucleic acid query, respectively, against a translated nucleic acid database.
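The program choices above can be summarized as a small lookup on query type and database type. This is a minimal Python sketch; the function name `choose_blast_program` and the labels `"protein"` and `"nucleotide"` are hypothetical and not part of any BLAST interface.

```python
# Hypothetical helper summarizing which BLAST program fits a query/database pairing.
PROGRAMS = {
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",    # the nucleic acid query is translated
    ("protein", "nucleotide"): "tblastn",   # the nucleic acid database is translated
}


def choose_blast_program(query_type, database_type, translate=False):
    """Return a BLAST program name.

    query_type, database_type: "protein" or "nucleotide".
    translate: for nucleotide-vs-nucleotide searches, compare translations
    (tblastx) instead of the raw nucleotide sequences (blastn).
    """
    if query_type == "nucleotide" and database_type == "nucleotide":
        return "tblastx" if translate else "blastn"
    try:
        return PROGRAMS[(query_type, database_type)]
    except KeyError:
        raise ValueError(
            f"unsupported pairing: {query_type!r} vs {database_type!r}"
        ) from None
```

For example, a DNA query expected to encode a protein, searched against a protein database, maps to `choose_blast_program("nucleotide", "protein")`, which returns `"blastx"`.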


Step 2.  Enter the query sequence.
  • Manual data entry. Type the data into the window.
  • OR copy and paste a FASTA-formatted sequence, which consists of a greater-than symbol (>) followed by a single-line description, then (starting on a new line) the sequence data.
  • OR enter an NCBI accession number or a GenBank Identification (gi) number.
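The FASTA layout described above is simple enough to produce and read with a few lines of code. This is a minimal Python sketch; the helper names `to_fasta` and `parse_fasta` are hypothetical, and wrapping the sequence at 60 characters is a common convention assumed here, not a requirement of the format.

```python
def to_fasta(description, sequence, width=60):
    """Format one record: '>' plus a single-line description, then the sequence
    on new lines, wrapped at `width` characters (assumed convention)."""
    if "\n" in description:
        raise ValueError("the FASTA description must fit on a single line")
    lines = [">" + description]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"


def parse_fasta(text):
    """Return a list of (description, sequence) pairs from FASTA-formatted text."""
    records = []
    description, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if description is not None:
                records.append((description, "".join(chunks)))
            description, chunks = line[1:].strip(), []
        elif description is not None:
            chunks.append(line)
        else:
            raise ValueError("sequence data found before the first '>' line")
    if description is not None:
        records.append((description, "".join(chunks)))
    return records
```

Round-tripping a record through both functions returns the original description and the unwrapped sequence.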


Step 3.  Choose the appropriate search parameters or use the default settings.
    Choosing Parameters for Protein-Based BLAST Searches.

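    Default settings, followed by the settings suggested for three special cases:

    Parameter      | Default  | Short query                              | Large sequence family | Ungapped BLAST
    Filter         | on       | off                                      | on                    | on
    Scoring matrix | BLOSUM62 | PAM30 for queries of 35 residues or fewer | BLOSUM62             | BLOSUM62
    Word size      | 3        | 3, or reduce to 2                        | 3                     | 3
    E value        | 10       | 1000 or more                             | 10                    | 10
    Gap costs      | 11,1     | 11,1                                     | 11,1                  | 4
    Alignments     | 50       | 50                                       | 2000                  | 50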
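The table can be read as a set of adjustments applied to the defaults. The Python sketch below encodes it that way; the function `suggest_protein_params` and its dictionary keys are hypothetical, and two points are assumptions rather than statements from the table: the 35-residue cutoff is used to decide "short query" for every parameter (the table gives it only for the matrix), and the special cases are allowed to combine. The word size is left at 3 even for short queries, although the table allows reducing it to 2.

```python
def suggest_protein_params(query_length, large_family=False, ungapped=False):
    """Suggest protein BLAST settings following the parameter table.

    Hypothetical helper: key names are illustrative, not BLAST option names.
    """
    params = {
        "filter": True,
        "matrix": "BLOSUM62",
        "word_size": 3,
        "evalue": 10,
        "gap_costs": (11, 1),
        "alignments": 50,
        "gapped": True,
    }
    # Assumption: "short query" means 35 residues or fewer for all parameters.
    if query_length <= 35:
        params.update(filter=False, matrix="PAM30", evalue=1000)
    if large_family:
        # Show more alignments so low-similarity family members are not cut off.
        params["alignments"] = 2000
    if ungapped:
        params["gapped"] = False
    return params
```

A 20-residue peptide, for instance, gets PAM30, filtering off, and an E value threshold of 1000, while a 300-residue query from a large family keeps BLOSUM62 but asks for 2000 alignments.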


    Filter
  • The default setting will filter repetitive or low-complexity sequences from the query using the SEG (protein) or DUST (nucleic acid) programs. If a low complexity region in the query is of interest, filtering will need to be turned off.
  • Other potentially problematic sequences, such as coiled-coil regions and transmembrane domains, are not removed by SEG and should be removed manually. When retained in the query, these sequences may become over-represented among the hits, obscuring more informative ones.
  • If the number of hits returned is small when searching with a short query, it may help to re-search with filtering turned off.
  • The Human repeat filter option masks human repeats such as LINEs and SINEs and is especially useful for human sequences that may contain these repeats. This option is still experimental and under development, so it may change in the near future.
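To make the idea of low-complexity filtering concrete, the sketch below masks every residue covered by a sliding window whose composition has low Shannon entropy. It is a simplified stand-in for illustration only, not the SEG or DUST algorithm; the window length of 12 and threshold of 2.2 bits are assumed values, and the function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Shannon entropy (in bits) of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mask_low_complexity(seq, window=12, threshold=2.2, mask_char="X"):
    """Replace residues lying in low-entropy windows with mask_char.

    Simplified illustration (not SEG): every residue covered by any window
    whose entropy is below `threshold` is masked. Sequences shorter than
    `window` are returned unchanged.
    """
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            for i in range(start, start + window):
                masked[i] = mask_char
    return "".join(masked)
```

Applied to a varied segment followed by a run of glutamines, the poly-Q run is masked while the varied residues at the start are left alone; residues at the boundary may also be masked, because windows straddling it can fall below the threshold.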

    Scoring matrices
  • BLOSUM62 is the default matrix. A BLOSUM matrix assigns a log-odds score to each pair of aligned residues, based on the frequency with which that substitution is observed within conserved blocks of related proteins.
  • BLOSUM62 has been empirically shown to be among the best for detecting weak protein similarities.
  • Other supported options include PAM30, PAM70, BLOSUM80, and BLOSUM45.
  • The matrix should be chosen to suit the length of the query; for example, PAM30 is recommended for queries of 35 residues or fewer.
  • No alternate scoring matrices are available for blastn.
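A log-odds score compares how often a residue pair is observed in aligned blocks with how often it would be expected by chance from the background frequencies of the two residues. The Python sketch below computes one such score; BLOSUM62 reports scores in half-bit units, hence the default scale of 2. The function name and all frequencies in the usage note are hypothetical illustration values, not entries from the real BLOSUM62 data.

```python
import math


def log_odds_score(pair_freq, freq_a, freq_b, identical, scale=2.0):
    """Log-odds substitution score in the style of BLOSUM matrices.

    pair_freq: observed frequency of the unordered residue pair a/b in
      aligned blocks of related proteins.
    freq_a, freq_b: background frequencies of residues a and b.
    identical: True when a and b are the same residue.
    scale: 2.0 expresses the score in half-bit units.
    """
    # Chance frequency of the unordered pair: p_a * p_a when a == b,
    # 2 * p_a * p_b otherwise (either residue may appear on top).
    expected = freq_a * freq_b if identical else 2 * freq_a * freq_b
    return round(scale * math.log2(pair_freq / expected))
```

With background frequencies of 0.1, an identical pair seen four times as often as chance scores `log_odds_score(0.04, 0.1, 0.1, identical=True)`, which is 4; a pair seen exactly as often as chance scores 0; and a pair seen a quarter as often as chance scores -4. Positive scores therefore reward substitutions that related proteins tolerate, and negative scores penalize rare ones.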

    Gap opening and gap extension penalties
  • A gap is a space introduced into an alignment to compensate for insertions and deletions in one sequence relative to another. To prevent the accumulation of too many gaps in an alignment, introducing a gap deducts a fixed amount from the alignment score. Extending the gap to encompass additional nucleotides or amino acids is also penalized in the scoring of an alignment.
  • BLAST 2.0 allows gaps in alignments by default. Permitting gaps lets the program extend a local alignment across insertions and deletions, so it can report longer alignments than an ungapped search; this is suitable for most applications.
  • An ungapped search may be desirable when hits that align to the entire length of the query are most interesting. An ungapped search can be specified by checking the ungapped option box or by increasing the gap existence penalty to -4 (see advanced options in the tutorial).
  • Gap opening and extension penalties are chosen empirically.
  • The default and other supported values are given in this summary table.
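The gap costs in the table are written as a pair such as 11,1: a cost for a gap's existence and a cost per residue of its length, so a gap of length k costs existence + k × extension (12 for a single-residue gap with the defaults). The Python sketch below applies that affine cost while scoring a pairwise alignment given as two gapped strings. The function names are hypothetical, and the flat match/mismatch scores of 5 and -4 are placeholders standing in for a real substitution matrix.

```python
def gap_cost(length, existence=11, extension=1):
    """Affine cost of one gap: existence + length * extension (0 for no gap)."""
    return existence + length * extension if length > 0 else 0


def score_alignment(top, bottom, match=5, mismatch=-4, existence=11, extension=1):
    """Score an alignment given as two equal-length strings using '-' for gaps.

    match/mismatch are placeholder scores; protein BLAST would look pairs up
    in a matrix such as BLOSUM62 instead.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have equal length")
    score = 0
    gap_row, gap_len = None, 0  # which row holds the open gap, and its length
    for a, b in zip(top, bottom):
        row = "top" if a == "-" else "bottom" if b == "-" else None
        if row is not None and row == gap_row:
            gap_len += 1  # extend the open gap
            continue
        score -= gap_cost(gap_len, existence, extension)  # close any open gap
        if row is None:
            gap_row, gap_len = None, 0
            score += match if a == b else mismatch
        else:
            gap_row, gap_len = row, 1  # open a new gap in this row
    score -= gap_cost(gap_len, existence, extension)  # gap reaching the end
    return score
```

Under these placeholder scores, `score_alignment("AC-GT", "ACTGT")` gives four matches (20) minus one single-residue gap (12), or 8. Because existence dominates extension, one long gap is penalized far less than several short ones of the same total length, which is what keeps gapped alignments from fragmenting.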

    E value threshold
  • The E value for an alignment score "S" represents the number of hits with a score equal to or better than "S" that would be "expected" by chance (the background noise) when searching a database of a particular size.
  • In BLAST 2.0 the E value is used instead of a P value (probability) to report the significance of a match.
  • The default E value for blastn, blastp, blastx and tblastn is 10. At this setting, 10 hits with scores equal to or better than the defined alignment score, S, are expected to occur by chance (in a search of the database using a random query with similar length).
  • The E value can be increased or decreased to alter the stringency of the search.
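  • Increase the E value to 1000 or more when searching with a short query. A short query cannot accumulate a high alignment score, and a short sequence is likely to occur many times by chance in a database of any appreciable size, so even genuine matches may receive E values above the default threshold. (See Deciphering the Output for more information on E values.)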
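The dependence of the E value on score and database size follows the Karlin-Altschul relation E = K·m·n·e^(-λS), where m is the query length, n the total database length, and K and λ are statistical parameters that depend on the scoring system. The Python sketch below evaluates it, together with the conversion P = 1 - e^(-E). It is a simplified illustration: real BLAST uses effective lengths corrected for edge effects, and the λ = 0.267 and K = 0.041 used in the usage note are assumed values (approximately those reported for gapped BLOSUM62 with 11,1 gap costs).

```python
import math


def expected_hits(score, query_length, db_length, lam, k):
    """Karlin-Altschul expected number of chance hits scoring >= `score`.

    E = K * m * n * exp(-lambda * S). Simplified: no edge-effect correction
    of the query or database length.
    """
    return k * query_length * db_length * math.exp(-lam * score)


def p_value(e_value):
    """Probability of at least one chance hit scoring >= S: P = 1 - exp(-E)."""
    return 1.0 - math.exp(-e_value)
```

Calling `expected_hits(50, 300, 1e8, 0.267, 0.041)` and then repeating it with a database of `2e8` residues doubles the E value, while raising the score by ln(2)/λ halves it. For small E values, P and E are nearly equal, which is why BLAST can report E in place of P.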

    Alignments
  • 50 alignments are shown by default.
  • If the number of alignments requested (x) is smaller than the number of hits that pass the significance threshold, only the top x hits are reported.
  • When searching with a member of a large sequence family, increase the number of alignments shown so that low-similarity matches are not cut off.
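The interaction between the E value threshold and the number of alignments can be sketched as a filter followed by a truncation. In this Python sketch, hits are represented as hypothetical `(subject_id, e_value)` tuples and the function name `report_hits` is illustrative; the defaults mirror the default E value of 10 and the default of 50 alignments.

```python
def report_hits(hits, evalue_threshold=10.0, max_alignments=50):
    """Return the hits that pass the E value threshold, best (lowest E) first,
    truncated to the requested number of alignments."""
    significant = [hit for hit in hits if hit[1] <= evalue_threshold]
    significant.sort(key=lambda hit: hit[1])
    return significant[:max_alignments]
```

With four hits whose E values are 1e-30, 5.0, 20.0 and 1e-5, the default threshold keeps three of them; requesting only two alignments drops the 5.0 hit as well, even though it passed the threshold. That truncation is why a query from a large family needs a larger alignment count.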


Step 4.  Submit the query.
  • For additional assistance with this step, see the step-by-step Query Tutorial, developed to walk first-time users through the process of submitting a query to the BLAST server.
  • Q-BLAST is a recently implemented result retrieval system that allows users to retrieve results at their convenience and format them multiple times with different formatting options. Upon submission of your BLAST search request, a Request ID is returned on an intermediate page. The formatting parameters may be changed at this point, if desired. Select the "Format results" button to proceed to the Final Output page. The results will appear in another browser window. If they are not yet ready, they will be automatically re-requested at regular intervals. Most results will be held for up to 24 hours. A few very large result files will be deleted after 30 minutes.
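The automatic re-requesting described above is a polling loop: check whether the results for a Request ID are ready, wait, and check again. The Python sketch below shows that pattern in a generic form. The callables `check_status` and `fetch_results` are hypothetical stand-ins for whatever request retrieves the status and the formatted output; no real NCBI endpoint or parameter is assumed, and the 30-second interval is an arbitrary choice.

```python
import time


def wait_for_results(check_status, fetch_results, poll_interval=30.0,
                     timeout=24 * 3600, sleep=time.sleep):
    """Poll a submitted search until its results are ready, then fetch them.

    check_status(): hypothetical callable returning True once results are ready.
    fetch_results(): hypothetical callable returning the formatted results.
    timeout: give up after this many seconds of waiting (results are held
      for a limited time on the server).
    sleep: injectable so the loop can be exercised without real waiting.
    """
    waited = 0.0
    while not check_status():
        if waited >= timeout:
            raise TimeoutError("results were not ready before the timeout")
        sleep(poll_interval)
        waited += poll_interval
    return fetch_results()
```

Passing a fake status function and `sleep=list.append` on a recording list makes it easy to check that the loop waits exactly as many intervals as the status stays "not ready". Keeping the interval generous avoids hammering the server while a large search is still running.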
  • Revised May 1, 2000
