Each section of the results file contains an explanation of how
to interpret it.
When nucleotide sequences are searched, the strand (+ or -) is
indicated. When nucleotide sequences are searched with peptide
motifs, the reading frame (a, b or c) of the best matches
is also indicated. Matches are not all required to be in the
same reading frame, but they must all be on the same strand.
In the ANNOTATED SEQUENCES section of the output, diagrams are shown like this:
When nucleotide databases are searched, all matches must be on
the same strand and the strand (+ or -) is indicated in the output.
When peptide motifs are used to search nucleotide sequences,
the reading frame (a, b or c) of each match is indicated next to the
motif numbers in the motif diagrams found in the ANNOTATED SEQUENCES
section of the output.
For example,
When peptide motifs are used to search nucleotide sequences,
the reading frame (a, b or c) of each match is indicated with the
motif number and the peptide translation of the matching sequence is
shown just above the motif occurrence.
Search using MAST
Confirmation message
The first e-mail message you receive should be a confirmation that
your search request has been received.
It should look something like this:
Subject: MAST confirmation: alcohol dehydrogenase motifs
Your MAST search request 14019 is being processed:
Motif file: adh
Database to search: SwissProt
If you fail to receive the confirmation message, check your e-mail
address and try resubmitting your MAST request.
Search Results
The second e-mail message you should receive contains the results of the MAST
search. It contains:
Match Scores
The match score of a motif at a position in a sequence is found by
aligning the first row of the position-dependent scoring matrix with that
position and summing, over all rows, the score each row gives to the
sequence letter aligned with it. For example, if the sequence
is
TAATGTTGGTGCTGGTTTTTGTGGCATCGGGCGAGAATAGCGC
   ========
and the motif is represented by the position-dependent scoring matrix (where
each row of the matrix corresponds to a position in the motif)
=========|=================================
POSITION |    A       C       G       T
=========|=================================
    1    |  1.447   0.188  -4.025  -4.095
    2    |  0.739   1.339  -3.945  -2.325
    3    |  1.764  -3.562  -4.197  -3.895
    4    |  1.574  -3.784  -1.594  -1.994
    5    |  1.602  -3.935  -4.054  -1.370
    6    |  0.797  -3.647  -0.814   0.215
    7    | -1.280   1.873  -0.607  -1.933
    8    | -3.076   1.035   1.414  -3.913
=========|=================================
then the match score of the fourth position in the sequence (underlined)
would be found by summing the score for T in position 1, G in
position 2 and so on until G in position 8. So
the match score would be
score = -4.095 + -3.945 + -3.895 + -1.994
+ -4.054 + -0.814 + -1.933 + 1.414
= -19.316
The match scores for other positions in the sequence are calculated
in the same way. Match scores are only calculated if the match completely
fits within the sequence. Match scores are not calculated if the
motif would overhang either end of the sequence.
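The worked example above can be reproduced with a short Python sketch. The matrix and sequence are copied from the example; the function names `match_score` and `all_match_scores` are made up for illustration and are not part of MAST.

```python
# Scoring matrix from the example above: one row per motif position,
# columns in the order A, C, G, T.
MATRIX = [
    [ 1.447,  0.188, -4.025, -4.095],
    [ 0.739,  1.339, -3.945, -2.325],
    [ 1.764, -3.562, -4.197, -3.895],
    [ 1.574, -3.784, -1.594, -1.994],
    [ 1.602, -3.935, -4.054, -1.370],
    [ 0.797, -3.647, -0.814,  0.215],
    [-1.280,  1.873, -0.607, -1.933],
    [-3.076,  1.035,  1.414, -3.913],
]
COLUMN = {"A": 0, "C": 1, "G": 2, "T": 3}
SEQUENCE = "TAATGTTGGTGCTGGTTTTTGTGGCATCGGGCGAGAATAGCGC"


def match_score(sequence, start, matrix=MATRIX):
    """Score the match that begins at 1-based position `start`."""
    width = len(matrix)
    if start < 1 or start + width - 1 > len(sequence):
        raise ValueError("motif would overhang an end of the sequence")
    window = sequence[start - 1:start - 1 + width]
    # Row i of the matrix scores the i-th letter of the window.
    return sum(row[COLUMN[letter]] for row, letter in zip(matrix, window))


def all_match_scores(sequence, matrix=MATRIX):
    """Scores for every start position where the motif fits completely."""
    last_start = len(sequence) - len(matrix) + 1
    return [match_score(sequence, s, matrix) for s in range(1, last_start + 1)]


print(round(match_score(SEQUENCE, 4), 3))  # -19.316
```

The 43-letter example sequence has 36 start positions at which the 8-wide motif fits, so `all_match_scores(SEQUENCE)` returns 36 scores.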
P-values
MAST reports all matches of a sequence to a motif or group of
motifs in terms of the p-value of the match. MAST considers four
kinds of match statistics, each defined below: the position p-value,
the sequence p-value, the combined p-value and the E-value.
All p-values are based on a random sequence model that assumes
each position in a random sequence is generated according to the
average letter frequencies of all sequences in the appropriate
(peptide or nucleotide) non-redundant database
(ftp://ncbi.nlm.nih.gov/blast/db/)
on September 22, 1996.
Position p-value
The p-value of a match of a given position
within a sequence to a motif is defined as the probability
of a randomly selected position in a randomly generated sequence
having a
match score at least as large as that of
the given position.
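For a short nucleotide motif, this definition can be evaluated directly by enumerating every possible window and adding up the probabilities of those that score at least as well. The sketch below does that for the 8-wide example matrix from the Match Scores section. It is only an illustration of the definition: the uniform letter frequencies are a placeholder assumption (the database frequencies MAST uses are not given here), and nothing in this document says MAST computes the p-value by enumeration.

```python
from itertools import product

# Example matrix from the Match Scores section (columns A, C, G, T).
MATRIX = [
    [ 1.447,  0.188, -4.025, -4.095],
    [ 0.739,  1.339, -3.945, -2.325],
    [ 1.764, -3.562, -4.197, -3.895],
    [ 1.574, -3.784, -1.594, -1.994],
    [ 1.602, -3.935, -4.054, -1.370],
    [ 0.797, -3.647, -0.814,  0.215],
    [-1.280,  1.873, -0.607, -1.933],
    [-3.076,  1.035,  1.414, -3.913],
]
# ASSUMPTION: uniform background, used only as a placeholder for the
# average database letter frequencies, in column order A, C, G, T.
BACKGROUND = [0.25, 0.25, 0.25, 0.25]


def position_pvalue(score, matrix=MATRIX, background=BACKGROUND):
    """Probability that a random window scores at least `score`."""
    total = 0.0
    # Enumerate all 4**width windows as tuples of column indices.
    for window in product(range(len(background)), repeat=len(matrix)):
        window_score = sum(row[col] for row, col in zip(matrix, window))
        if window_score >= score - 1e-9:  # small tolerance for rounding
            prob = 1.0
            for col in window:
                prob *= background[col]
            total += prob
    return total


best_score = sum(max(row) for row in MATRIX)
print(position_pvalue(best_score))  # probability of the single best window
```

Enumeration is only practical for short nucleotide motifs; the number of windows grows as the alphabet size raised to the motif width.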
Sequence p-value
The p-value of a match of a sequence
to a motif is defined as the probability of a randomly generated
sequence of the same length having a match score at least as large
as the largest match score of any position in the sequence.
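One common way to approximate this quantity is to treat the match scores at the different positions of the sequence as independent, so that the sequence p-value follows from the position p-value of the best match. The sketch below makes that independence assumption explicitly; this document does not state the formula MAST uses, and the function name `sequence_pvalue` is hypothetical.

```python
import math


def sequence_pvalue(best_position_pvalue, sequence_length, motif_width):
    """Approximate P(best score in a random sequence >= observed best).

    ASSUMPTION: the n = length - width + 1 candidate positions are
    treated as independent, giving 1 - (1 - p)**n.
    """
    n_positions = sequence_length - motif_width + 1
    if n_positions < 1:
        raise ValueError("sequence is shorter than the motif")
    if best_position_pvalue >= 1.0:
        return 1.0
    # expm1/log1p keep precision when the position p-value is tiny.
    return -math.expm1(n_positions * math.log1p(-best_position_pvalue))


# A position p-value of 1e-5 in a 43-letter sequence with an 8-wide motif
# (36 candidate positions).
print(sequence_pvalue(1e-5, 43, 8))
```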
Combined p-value
The p-value of a match of a sequence to
a group of motifs is defined as the probability of
a randomly generated sequence of the same length having
sequence p-values whose product
is at least as small as the product of the sequence p-values
of the matches of the motifs to the given sequence.
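If the sequence p-values of the individual motifs are assumed to be independent and uniformly distributed under the random model, the probability that their product is at most x has the closed form x * sum((-ln x)^i / i!) for i from 0 to n-1, where n is the number of motifs. The sketch below applies that standard result; the independence assumption and the function name `combined_pvalue` are not taken from this document.

```python
import math


def combined_pvalue(sequence_pvalues):
    """P(product of n independent uniform p-values <= observed product)."""
    n = len(sequence_pvalues)
    if n == 0:
        raise ValueError("at least one sequence p-value is required")
    observed = math.prod(sequence_pvalues)
    if observed <= 0.0:
        return 0.0
    neg_log = -math.log(observed)
    term = 1.0   # (-ln x)**i / i!, starting at i = 0
    total = 1.0
    for i in range(1, n):
        term *= neg_log / i
        total += term
    return min(1.0, observed * total)


print(combined_pvalue([0.03]))      # one motif: the p-value itself
print(combined_pvalue([0.1, 0.1]))  # two motifs: 0.01 * (1 + ln 100)
```

When motifs are strongly correlated, the independence assumption fails and the result comes out too small.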
E-value
The e-value of the match of a sequence in a database to
a group of motifs is defined as the
expected number of sequences in a random database of the same size
that would match the motifs as well as the sequence does. It is equal
to the combined p-value of the sequence times the number of sequences
in the database.
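As a small illustration of this definition, the numbers below are invented and the function name `e_value` is hypothetical:

```python
def e_value(combined_pvalue, n_database_sequences):
    """Expected number of random sequences matching at least this well."""
    return combined_pvalue * n_database_sequences


# Hypothetical example: a combined p-value of 2.5e-7 in a database
# of 60,000 sequences.
print(e_value(2.5e-7, 60000))
```

Because the e-value is an expected count rather than a probability, it can exceed 1 for weak matches in large databases.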
Database and Motifs
This section shows information on the database that was searched
and the motifs in the search query. The database section gives
the date the database was last updated as well as the number
of sequences and total sequence characters in it.
The motifs are listed by motif number. For each motif, the width and
the subsequence that would be given the best possible score
are shown.
If there is more than one motif in the query, all pairwise
correlations between the motifs are shown. The correlations
can range from -1 to +1, with +1 meaning that the shorter motif
is exactly identical to part or all of the longer motif. High
correlations can cause some combined p-values and e-values
to be inaccurate (too low). It may be advisable to remove enough
motifs from the query to ensure that no pairs of motifs have high
correlations. Any high correlations are indicated along
with the suggestion that one of the motifs be removed from the query.
High-scoring Sequences
MAST lists the names and part of the descriptive text
of all sequences whose e-value is less than E.
Sequences shorter than one or more of the motifs are skipped.
The sequences are sorted by increasing e-value.
The value of E is set to 10 for the web server but is
user-selectable in the downloadable version of MAST.
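The selection rule described above can be sketched as follows. The `(name, length, e_value)` tuple layout and the function name `high_scoring` are assumptions made for this example, not MAST's internal representation.

```python
def high_scoring(hits, motif_widths, max_e=10.0):
    """Select and order sequences for the high-scoring list.

    hits         -- list of (name, length, e_value) tuples (hypothetical layout)
    motif_widths -- widths of all motifs in the query
    max_e        -- the threshold E (10 on the web server)
    """
    longest_motif = max(motif_widths)
    # Skip sequences shorter than one or more of the motifs, then keep
    # only those whose e-value is less than E.
    kept = [h for h in hits if h[1] >= longest_motif and h[2] < max_e]
    # Sort by increasing e-value.
    return sorted(kept, key=lambda h: h[2])


hits = [("seqA", 300, 2.5), ("seqB", 5, 0.001),
        ("seqC", 120, 1e-8), ("seqD", 200, 12.0)]
for name, length, e in high_scoring(hits, motif_widths=[8, 12]):
    print(name, e)
```

Here `seqB` is skipped because it is shorter than the 12-wide motif, and `seqD` is dropped because its e-value is not below 10.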
Motif Diagrams
Motif diagrams show the order and spacing of non-overlapping
matches to the motifs in each high-scoring sequence.
Motif occurrences are determined based on
the position p-value of matches to the motif.
In the MOTIF DIAGRAMS
section of the output, diagrams are shown like this:
27-[3]-44-<4>-99-[1]-7
97-[6b]-17-[4a]-36-[3a]-45-[5a]-96-[7a]-59
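For downstream processing, a diagram string like the two above can be split into its parts. The sketch below handles only the token forms that appear in these two examples: bare numbers (assumed here to be spacer lengths between occurrences), and motif numbers in `[ ]` or `< >` with an optional reading-frame letter. It records which bracket style was used without interpreting it, and it does not handle any strand markers. The function name `parse_motif_diagram` is hypothetical.

```python
import re

# A bare number, or a motif number in [ ] or < > with an optional frame letter.
TOKEN = re.compile(r"(\d+)|\[(\d+)([abc]?)\]|<(\d+)([abc]?)>")


def parse_motif_diagram(diagram):
    """Split a motif diagram into spacer and motif-occurrence tuples."""
    parts = []
    for token in diagram.split("-"):
        m = TOKEN.fullmatch(token)
        if m is None:
            raise ValueError(f"unrecognised diagram token: {token!r}")
        if m.group(1) is not None:
            # ASSUMPTION: a bare number is the spacing between occurrences.
            parts.append(("spacer", int(m.group(1))))
        else:
            bracket = "[]" if m.group(2) is not None else "<>"
            number = m.group(2) or m.group(4)
            frame = m.group(3) or m.group(5) or None
            parts.append(("motif", int(number), frame, bracket))
    return parts


for part in parse_motif_diagram("97-[6b]-17-[4a]-36-[3a]-45-[5a]-96-[7a]-59"):
    print(part)
```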
Annotated Sequences
MAST annotates each high-scoring sequence by printing
the sequence along with the position and strength of all
the non-overlapping motif occurrences.
The four lines above each motif occurrence contain,
respectively,
The best possible match to a motif is the sequence of letters
which would achieve the highest match score.
Sample MAST Search Results
Here is an actual
MAST search results file of a search of a nucleotide database
with peptide motifs. It has been edited
slightly to reduce its size by removing most of the 832 sequences
which matched the motifs.