APPENDIX V
Sequence Typing
As you work with the Wisconsin Package, you will find that some programs accept only nucleotide sequences while others accept only proteins. Many programs allow both nucleotide and protein sequences as input but perform their analysis differently depending on the input sequence type.
You can determine the type of a sequence by looking at the sequence file. Sequences in GCG format contain a dividing line between an optional text heading and the sequence data. Consider the following example of a typical dividing line:
Gamma.Seq Length: 11375 January 1, 1997 10:09 Type: N Checksum: 6474 ..
The sequence type should appear on the dividing line as either Type: N for nucleotide or Type: P for protein. (See "Types of Sequence Files" in Chapter 2, Using Sequence Files and Databases, of the User's Guide for a complete description of sequence file formats.) Sequences created before Version 7.0 of the Wisconsin Package (April 1991) do not have this Type: field on the dividing line.
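The check described above can be sketched in a few lines of Python. This is only an illustration, not part of the Wisconsin Package: the function name sequence_type and the regular expression are hypothetical, and the sketch assumes the dividing line is the one that ends with two periods, as in the example.

```python
import re

# Hypothetical helper for illustration; not a Wisconsin Package program.
# Matches "Type: N" or "Type: P" anywhere on the dividing line.
TYPE_FIELD = re.compile(r"\bType:\s*([NP])\b")


def sequence_type(dividing_line):
    """Return 'N', 'P', or None when the line has no Type: field."""
    # Assumption: a dividing line always ends with "..".
    if not dividing_line.rstrip().endswith(".."):
        raise ValueError("not a GCG dividing line")
    match = TYPE_FIELD.search(dividing_line)
    return match.group(1) if match else None


line = "Gamma.Seq Length: 11375 January 1, 1997 10:09 Type: N Checksum: 6474 .."
print(sequence_type(line))  # prints N
```

A return value of None corresponds to the older files mentioned above, whose dividing lines carry no Type: field.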
If the dividing line doesn't contain a Type: field, the Wisconsin Package infers the sequence type from the characters in the sequence. This inference may not always be correct.
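To see why inference from characters alone can go wrong, consider the following sketch. It is not the rule the Wisconsin Package uses, which is not described here; the function guess_sequence_type, the symbol set, and the 0.85 threshold are all assumptions chosen only to illustrate the idea.

```python
# Hypothetical heuristic for illustration only. The Package's actual
# inference rule is not documented in this appendix.
NUCLEOTIDE_SYMBOLS = set("ACGTUN")


def guess_sequence_type(sequence, threshold=0.85):
    """Guess 'N' if most letters are nucleotide symbols, else 'P'."""
    letters = [c for c in sequence.upper() if c.isalpha()]
    if not letters:
        raise ValueError("no sequence characters to examine")
    # Fraction of letters that could be nucleotide symbols.
    fraction = sum(c in NUCLEOTIDE_SYMBOLS for c in letters) / len(letters)
    return "N" if fraction >= threshold else "P"


print(guess_sequence_type("ACGTTGCAAGCT"))            # prints N
print(guess_sequence_type("MKTAYIAKQRQISFVKSHFSRQ"))  # prints P
print(guess_sequence_type("GATTACA"))                 # prints N
```

The last call shows the ambiguity: G, A, T, and C are also valid amino acid codes, so a short peptide made up of those residues would be guessed as nucleotide. An explicit Type: field on the dividing line avoids this kind of error.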
In previous versions of the Wisconsin Package, you could ensure that programs inferred the correct sequence type by specifying the sequence type on the command line when you ran a program. Starting with Version 8.0 of the Package, however, the sequence type is an inherent part of the sequence; it cannot be changed from the command line.
If the Type: field of any sequence is incorrect or missing, you can correct it with the Reformat program. Type

% reformat /NUCleotide filename

or

% reformat /PROtein filename