APPENDIX V

Sequence Typing

As you work with the Wisconsin Package, you will find that some programs accept only nucleotide sequences while others accept only proteins. Many programs allow both nucleotide and protein sequences as input but perform their analysis differently depending on the input sequence type.

You can determine the type of a sequence by looking at the sequence file. Sequences in GCG format contain a dividing line between an optional text heading and the sequence data. Consider the following example of a typical dividing line:


Gamma.Seq Length: 11375 January 1, 1997 10:09 Type: N Checksum: 6474  ..

The sequence type should appear on the dividing line as either Type: N for nucleotide or Type: P for protein. (See "Types of Sequence Files" in Chapter 2, Using Sequence Files and Databases of the User's Guide for a complete description of sequence file formats.) Sequences created before version 7.0 of the Wisconsin Package (April 1991) do not have this Type: field on the dividing line. If the dividing line doesn't contain a Type: field, the Wisconsin Package infers the sequence type from the characters in the sequence. This inference may not always be correct.
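
As an illustration of reading the Type: field described above, here is a minimal Python sketch. It is not part of the Wisconsin Package. The function name sequence_type is hypothetical, and the sketch assumes that the dividing line is the first line ending in two periods, as in the example shown earlier. It returns None when the Type: field is absent, which is the case where the Package would fall back on inferring the type.

```python
import re
from typing import Iterable, Optional

# Matches a "Type: N" or "Type: P" field anywhere on a line.
TYPE_FIELD = re.compile(r"\bType:\s*([NP])\b")


def sequence_type(lines: Iterable[str]) -> Optional[str]:
    """Return 'N', 'P', or None if no Type: field is found.

    Assumption: the dividing line is the first line that ends
    with '..' once trailing whitespace is removed.
    """
    for line in lines:
        if line.rstrip().endswith(".."):
            match = TYPE_FIELD.search(line)
            return match.group(1) if match else None
    return None  # no dividing line at all


if __name__ == "__main__":
    example = [
        "Optional text heading",
        "",
        "Gamma.Seq Length: 11375 January 1, 1997 10:09 Type: N Checksum: 6474  ..",
    ]
    print(sequence_type(example))
```

A result of None only means that the field is missing from the dividing line; it says nothing about which type the Package would infer.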

In previous versions of the Wisconsin Package, you could ensure that programs inferred the correct sequence type by specifying the sequence type on the command line when you ran a program. Starting with version 8.0 of the Package, however, the sequence type is an inherent part of the sequence and cannot be changed from the command line.

If the Type: field of any sequence is incorrect or missing, you can correct it with the Reformat program. Type

% reformat /NUCleotide filename

or

% reformat /PROtein filename