This appendix lists and briefly describes programs in the Wisconsin Package. Programs are grouped by function and may appear under multiple functional headings. For more information on using these programs, see the Program Manual.
+ - Denotes a program that generates graphics which require a graphics output device.
Pairwise Comparison | |
Gap | Uses the algorithm of Needleman and Wunsch to find the alignment of two complete sequences that maximizes the number of matches and minimizes the number of gaps. |
BestFit | Makes an optimal alignment of the best segment of similarity between two sequences. Optimal alignments are found by inserting gaps to maximize the number of matches using the local homology algorithm of Smith and Waterman. |
FrameAlign | Creates an optimal alignment of the best segment of similarity (local alignment) between a protein sequence and the codons in all possible reading frames on a single strand of a nucleotide sequence. Optimal alignments may include reading frame shifts. |
Compare | Compares two protein or nucleic acid sequences and creates a file of the points of similarity between them for plotting with DotPlot. Compare finds the points using either a window/stringency or a word match criterion. The word comparison is 1,000 times faster than the window/stringency comparison, but somewhat less sensitive. |
DotPlot+ | Makes a dot-plot with the output file from Compare or StemLoop. |
GapShow+ | Displays an alignment by making a graph that shows the distribution of similarities and gaps. The two input sequences should be aligned with either Gap or BestFit before they are given to GapShow for display. |
ProfileGap | Makes an optimal alignment between a profile and one or more sequences. |
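The difference between the global alignment made by Gap (Needleman and Wunsch) and the local alignment made by BestFit (Smith and Waterman) can be illustrated with a small dynamic-programming sketch. This is not the Wisconsin Package implementation: the function name `alignment_score` and the flat scores (match +1, mismatch -1, gap -1) are assumptions chosen for illustration, and the scoring matrices and gap penalties used by the real programs are not reproduced here.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-1, local=False):
    """Return the best global (Needleman-Wunsch style) or local
    (Smith-Waterman style) alignment score for sequences a and b.

    The scoring values are illustrative assumptions, not GCG defaults.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]

    if not local:
        # A global alignment must pay for gaps at the sequence ends.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch
            )
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            cell = max(diagonal, up, left)
            if local:
                # A local alignment may restart anywhere, so never go below 0.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell

    # Global: score of aligning both complete sequences.
    # Local: score of the best-scoring segment pair.
    return best if local else score[-1][-1]


if __name__ == "__main__":
    print(alignment_score("ACGT", "AGT"))  # global: 3 matches, 1 gap
    print(alignment_score("TTTGATTACATTT", "CCCGATTACACCC", local=True))
```

With these scores, the local mode reports only the shared GATTACA segment, while the global mode also charges for the dissimilar flanking regions.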
Multiple Comparison | |
PileUp+ | Creates a multiple sequence alignment from a group of related sequences using progressive, pairwise alignments. It can also plot a tree showing the clustering relationships used to create the alignment. |
SeqLab | Is the graphical user interface for the Wisconsin Package. For additional information, refer to the SeqLab Guide. |
PlotSimilarity+ | Plots the running average of the similarity among the sequences in a multiple sequence alignment. |
Pretty | Displays multiple sequence alignments and calculates a consensus sequence. It does not create the alignment; it simply displays it. |
PrettyBox+ | Displays multiple sequence alignments as shaded boxes in Postscript format for printing or displaying with a Postscript-compatible device. PrettyBox optionally calculates a consensus sequence. The program does not create the alignment; it simply displays it. |
MEME | (Multiple EM for Motif Elicitation) Finds conserved motifs in a group of unaligned sequences. MEME saves these motifs as a set of profiles. You can search a database of sequences with these profiles using the MotifSearch program. |
ProfileMake | Creates a position-specific scoring table, called a profile, that quantitatively represents the information from a group of aligned sequences. The profile can then be used for database searching (ProfileSearch) or sequence alignment (ProfileGap). |
ProfileGap | Makes an optimal alignment between a profile and one or more sequences. |
Overlap | Compares two sets of DNA sequences to each other in both orientations using a WordSearch style comparison. |
NoOverlap | Identifies the places where a group of nucleotide sequences do not share any common subsequences. |
OldDistances | Makes a table of the pairwise similarities within a group of aligned sequences. |
Reference Searching | |
LookUp | Identifies sequence database entries by name, accession number, author, organism, keyword, title, reference, feature, definition, length, or date. The output is a list of sequences. |
StringSearch | Identifies sequences by searching for character patterns such as "globin" or "human" in the sequence documentation. |
Names | Identifies GCG data files and sequence entries by name. It can show you what set of sequences is implied by any sequence specification. |
Sequence Searching | |
BLAST | Searches for sequences similar to a query sequence. The query and the database searched can be either peptide or nucleic acid in any combination. BLAST can search databases on your own computer or databases maintained at the National Center for Biotechnology Information (NCBI) in Bethesda, Maryland, USA. |
NetBLAST | Searches for sequences similar to a query sequence. The query and the database searched can be either peptide or nucleic acid in any combination. NetBLAST can search only databases maintained at the National Center for Biotechnology Information (NCBI) in Bethesda, Maryland, USA. |
FastA | Does a Pearson and Lipman search for similarity between a query sequence and a group of sequences of the same type (nucleic acid or protein). For nucleotide searches, FastA may be more sensitive than BLAST. |
SSearch | Does a rigorous Smith-Waterman search for similarity between a query sequence and a group of sequences of the same type (nucleic acid or protein). This may be the most sensitive method available for similarity searches. Compared to BLAST and FastA, it is very slow. |
TFastA | Does a Pearson and Lipman search for similarity between a protein query sequence and any group of nucleotide sequences. TFastA translates the nucleotide sequences in all six reading frames before performing the comparison. It is designed to answer the question, "What implied protein sequences in a nucleotide sequence database are similar to my protein sequence?" |
TFastX | Does a Pearson and Lipman search for similarity between a protein query sequence and any group of nucleotide sequences, taking frameshifts into account. It is designed to be a replacement for TFastA, and like TFastA, it is designed to answer the question, "What implied protein sequences in a nucleotide sequence database are similar to my protein sequence?" TFastA treats each of the six reading frames of a nucleotide sequence as a separate sequence, resulting in three separate alignments for each strand. TFastX, on the other hand, compares the protein query sequence to only one translated protein per strand of the nucleotide sequence, resulting in one alignment per strand. It calculates a similarity score for alignments that takes frameshifts into account, allowing it to "join" short regions separated by frameshifts into a single long alignment. TFastX may alert you to more meaningful hits than TFastA does when the nucleotide sequences contain frameshift errors. |
FastX | Does a Pearson and Lipman search for similarity between a nucleotide query sequence and a group of protein sequences, taking frameshifts into account. FastX translates both strands of the nucleotide sequence before performing the comparison. It is designed to answer the question, "What implied protein sequences in my nucleotide sequence are similar to sequences in a protein database?" |
FrameSearch+ | Searches a group of protein sequences for similarity to one or more nucleotide query sequences, or searches a group of nucleotide sequences for similarity to one or more protein query sequences. For each sequence comparison, the program finds an optimal alignment between the protein sequence and all possible codons on each strand of the nucleotide sequence. Optimal alignments may include reading frame shifts. |
MotifSearch | Uses a set of profiles (representing similarities within a family of sequences) as a query to either a) search a database for new sequences similar to the original family, or b) annotate the members of the original family with details of the matches between the profiles and each of the members. Normally, the profiles are created with the program MEME. |
ProfileSearch | Uses a profile (representing a group of aligned sequences) as a query to search the database for new sequences with similarity to the group. The profile is created with the program ProfileMake. |
ProfileSegments | Makes optimal alignments showing the segments of similarity found by ProfileSearch. |
FindPatterns | Identifies sequences that contain short patterns like GAATTC or YRYRYRYR. You can define the patterns ambiguously and allow mismatches. You can provide the patterns in a file or simply type them in from the terminal. |
Motifs | Looks for sequence motifs by searching through proteins for the patterns defined in the PROSITE Dictionary of Protein Sites and Patterns. Motifs can display an abstract of the current literature on each of the motifs it finds. |
WordSearch+ | Identifies sequences in the database that share large numbers of common words in the same register of comparison with your query sequence. The output of WordSearch can be displayed with Segments. |
Segments | Aligns and displays the segments of similarity found by WordSearch. |
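The ambiguous patterns that FindPatterns accepts, such as YRYRYRYR, use the standard IUPAC nucleotide codes (for example, Y is C or T and R is A or G). A minimal sketch of that kind of matching is shown below. The helper name `find_pattern` is hypothetical, the sketch does not handle the mismatch option, and it is not the FindPatterns implementation or its pattern file format.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_pattern(pattern, sequence):
    """Return (1-based position, matched text) for every occurrence of an
    IUPAC pattern in a nucleotide sequence, including overlapping ones."""
    body = "".join(IUPAC[symbol] for symbol in pattern.upper())
    # A lookahead lets the regex engine report overlapping matches.
    regex = re.compile("(?=(" + body + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence.upper())]


if __name__ == "__main__":
    print(find_pattern("GAATTC", "AAGAATTCGG"))
    print(find_pattern("YRYR", "CACACA"))
```

Translating the pattern to a regular expression keeps the sketch short; allowing mismatches would need a different approach, such as comparing each window of the sequence against the pattern position by position.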
Sequence Retrieval | |
Fetch | Copies GCG sequences or data files from the GCG database into your directory or displays them on your terminal screen. |
NetFetch | Retrieves entries from NCBI listed in a NetBLAST output file. It can also be used to retrieve entries individually by entry name or accession number. The output of NetFetch is an RSF file. |
Editing and Publication | |
SeqEd | Is an interactive editor for entering and modifying sequences and for assembling parts of existing sequences into new genetic constructs. You can enter sequences from the keyboard or from a digitizer. |
SeqLab | Is the graphical user interface for the Wisconsin Package. For additional information, refer to the SeqLab Guide. |
Assemble | Constructs new sequences from pieces of existing sequences. It concatenates the fragments you specify and writes them out as a new sequence file. SeqEd is a better tool for assembling sequences interactively, but Assemble is best for assembling sequences from fragments defined in a list file. |
Pretty | Displays multiple sequence alignments and calculates a consensus sequence. It does not create the alignment; it simply displays it. |
PrettyBox+ | Displays multiple sequence alignments as shaded boxes in Postscript format for printing or displaying with a Postscript-compatible device. PrettyBox optionally calculates a consensus sequence. The program does not create the alignment; it simply displays it. |
Publish | Arranges sequences for publication. It creates a text file that you can modify to your own needs with a text editor. |
PlasmidMap+ | Draws a circular plot of a plasmid construct. It can display restriction patterns, inserts, and known genetic elements. The plot is suitable for publication, record keeping, or analysis. It is drawn from one or more labeling files such as those written by MapSort. |
LineUp | Is a screen editor for editing multiple sequence alignments. You can edit up to 30 sequences simultaneously. New sequences can be typed in by hand or added from existing sequence files. A consensus sequence identifies places where the sequences are in conflict. |
Figure+ | Makes figures and posters by drawing graphics and text together. You can include output from other Wisconsin Package graphics programs as part of a figure. |
Red | Is a text formatter that creates publication-quality documents on a PostScript printer such as the Apple LaserWriter. You can use 13 different fonts, scaling each font to any size. You can also include figures and graphics from any Wisconsin Package graphics program within the text of the document. |
Evolution | |
PAUPSearch | Provides a GCG interface to the tree-searching options in PAUP (Phylogenetic Analysis Using Parsimony). Starting with a set of aligned sequences, you can search for phylogenetic trees that are optimal according to parsimony, distance, or maximum likelihood criteria; reconstruct a neighbor-joining tree; or perform a bootstrap analysis. The program PAUPDisplay can produce a graphical version of a PAUPSearch trees file. PAUP is the copyrighted property of the Smithsonian Institution. Use the program Fetch to obtain a copy of paup-license.txt to read about rights and limitations for using PAUP. |
PAUPDisplay+ | Provides a GCG interface to tree manipulation, diagnosis, and display options in PAUP (Phylogenetic Analysis Using Parsimony). Starting with a trees file that contains a sequence alignment and one or more trees reconstructed from this alignment (such as the output from PAUPSearch), you can plot the tree(s); compute the score of the tree(s) according to the criteria of parsimony, distance, or maximum likelihood; or calculate a consensus tree (two or more input trees). PAUPDisplay can also plot the trees from a GrowTree trees file. PAUP is the copyrighted property of the Smithsonian Institution. Use the program Fetch to obtain a copy of paup-license.txt to read about rights and limitations for using PAUP. |
Distances | Creates a table of the pairwise distances within a group of aligned sequences. |
GrowTree+ | Creates a phylogenetic tree from a distance matrix created by Distances using either the UPGMA or neighbor-joining method. You can create a text or graphics output file. |
Diverge | Estimates the pairwise number of synonymous and nonsynonymous substitutions per site between two or more aligned nucleic acid sequences that code for proteins. It uses a variant of the method published by Li et al. |
Fragment Assembly | |
GelStart | Begins a fragment assembly session by creating a new fragment assembly project or by identifying an existing project. |
GelEnter | Adds fragment sequences to a fragment assembly project. It accepts sequence data from your terminal keyboard, a digitizer, or existing sequence files. |
GelMerge | Aligns the sequences in a fragment assembly project into assemblies called contigs. You can view and edit these assemblies in GelAssemble. |
GelAssemble | Is a multiple sequence editor for viewing and editing contigs assembled by GelMerge. |
GelView | Displays the structure of the contigs in a fragment assembly project. |
GelDisassemble | Breaks up the contigs in a fragment assembly project into single fragments. |
Gene Finding and Pattern Recognition | |
TestCode+ | Helps you identify protein coding sequences by plotting a measure of the non-randomness of the composition at every third base. The statistic does not require a codon frequency table. |
CodonPreference+ | Is a frame-specific gene finder that tries to recognize protein coding sequences by virtue of the similarity of their codon usage to a codon frequency table or by the bias of their composition (usually GC) in the third position of each codon. |
Frames+ | Shows open reading frames for the six translation frames of a DNA sequence. Frames can superimpose the pattern of rare codon choices if you provide it with a codon frequency table. |
Terminator | Searches for prokaryotic factor-independent RNA polymerase terminators according to the method of Brendel and Trifonov. |
Motifs | Looks for sequence motifs by searching through proteins for the patterns defined in the PROSITE Dictionary of Protein Sites and Patterns. Motifs can display an abstract of the current literature on each of the motifs it finds. |
MEME | (Multiple EM for Motif Elicitation) Finds conserved motifs in a group of unaligned sequences. MEME saves these motifs as a set of profiles. You can search a database of sequences with these profiles using the MotifSearch program. |
Repeat | Finds direct repeats in sequences. You must set the size, stringency, and range within which the repeat must occur; all the repeats of that size or greater are displayed as short alignments. |
FindPatterns | Identifies sequences that contain short patterns like GAATTC or YRYRYRYR. You can define the patterns ambiguously and allow mismatches. You can provide the patterns in a file or simply type them in from the terminal. |
Composition | Determines the composition of sequence(s). For nucleotide sequence(s), Composition also determines dinucleotide and trinucleotide content. |
CodonFrequency | Tabulates codon usage from sequences and/or existing codon usage tables. The output file is correctly formatted for input to the CodonPreference, Correspond, and Frames programs. |
Correspond | Looks for similar patterns of codon usage by comparing codon frequency tables. |
Window | Makes a table of the frequencies of different sequence patterns within a window as it is moved along a sequence. A pattern is any short sequence like GC or R or ATG. You can plot the output with the program StatPlot. |
StatPlot+ | Plots a set of parallel curves from a table of numbers like the table written by the Window program. The statistics in each column of the table are associated with a position in the analyzed sequence. |
FitConsensus | Uses a consensus table written by Consensus as a probe to find the best examples of the consensus in a DNA sequence. You can specify the number of fits you want to see, and FitConsensus tabulates them with their position, frame, and a statistical measure of their quality. |
Consensus | Calculates a consensus sequence for a set of pre-aligned short nucleic acid sequences by tabulating the percent of G, A, T, and C for each position in the set. FitConsensus uses the Consensus output table as a probe to search for the best examples of the derived consensus in other nucleotide sequences. |
Xnu | Replaces statistically significant tandem repeats in protein sequences with X characters. If a resulting protein sequence is used as a query for a BLAST search, the regions with X characters are ignored. |
Seg | Replaces low complexity regions in protein sequences with X characters. If a resulting protein sequence is used as a query for a BLAST search, the regions with X characters are ignored. |
Importing and Exporting | |
Reformat | Rewrites sequence file(s), scoring matrix file(s), or enzyme data file(s) so that they can be read by GCG programs. |
BreakUp | Reads a GCG-format sequence file containing more than 350,000 sequence characters and writes it as a set of separate, shorter, overlapping sequence files that can be analyzed by Wisconsin Package programs. |
ChopUp | Converts a non-GCG sequence file containing lines as long as 32,000 characters into a new file containing lines no longer than 50 characters. The new file can be read by Reformat to create a GCG-format sequence file. |
FromStaden | Changes a sequence from Staden format into GCG format. If the file contains a nucleotide sequence, the ambiguity codes are converted as shown in Appendix III of the Program Manual. |
FromEMBL | Reformats sequences from the distribution (flat file) format of the EMBL database into individual sequence files in GCG format. |
FromGenBank | Reformats one or more sequences in the flat file format of the GenBank database into individual sequence files in GCG format. |
FromPIR | Reformats sequences from the protein database of the Protein Identification Resource (PIR) into individual files in GCG format. |
FromIG | Reformats one or more sequences from IntelliGenetics format into individual files in GCG format. |
FromFasta | Reformats one or more sequences from FastA format into individual files in GCG format. |
ToStaden | Writes a GCG sequence into a file in Staden format. If the file contains a nucleotide sequence, the ambiguity codes are converted as shown in Appendix III of the Program Manual. |
ToPIR | Writes GCG sequence(s) into a single file in PIR format. |
ToIG | Converts GCG sequence file(s) into a single file in IntelliGenetics format. |
ToFastA | Converts GCG sequence(s) into FastA format. |
GetSeq | Reads a sequence from a computer that is acting as a terminal and writes it into a new sequence file in GCG format on the computer running the Wisconsin Package. |
Spew | Sends a GCG sequence from the computer that runs the Wisconsin Package to a personal computer acting as a terminal. |
Mapping | |
Map | Maps a DNA sequence and displays both strands of the mapped sequence with restriction enzyme cut points above the sequence and protein translations below. Map can also create a peptide map of an amino acid sequence. |
MapPlot+ | Displays restriction sites graphically. If you don't have a plotter, MapPlot can write a text file that approximates the graph. |
MapSort | Finds the coordinates of the restriction enzyme cuts in a DNA sequence and sorts the fragments of the resulting digest by size. MapSort can sort the fragments from single or multiple enzyme digests. |
FingerPrint | Identifies the products of T1 ribonuclease digestion. |
PeptideMap | Creates a peptide map of an amino acid sequence. |
PlasmidMap+ | Draws a circular plot of a plasmid construct. It can display restriction patterns, inserts, and known genetic elements. The plot is suitable for publication, record keeping, or analysis. It is drawn from one or more labeling files such as those written by MapSort. |
PeptideSort | Shows the peptide fragments from a digest of an amino acid sequence. It sorts the peptides by weight, position, and HPLC retention at pH 2.1, and shows the composition of each peptide. It also prints a summary of the composition of the whole protein. |
Primer Selection | |
Prime+ | Selects oligonucleotide primers for a template DNA sequence. The primers may be useful for the polymerase chain reaction (PCR) or for DNA sequencing. You can allow Prime to choose primers from the whole template or limit the choices to a particular set of primers listed in a file. |
Protein Analysis | |
Motifs | Looks for sequence motifs by searching through proteins for the patterns defined in the PROSITE Dictionary of Protein Sites and Patterns. Motifs can display an abstract of the current literature on each of the motifs it finds. |
ProfileScan | Uses a database of profiles to find structural and sequence motifs in protein sequences. |
CoilScan | Locates coiled-coil segments in protein sequences. |
HTHScan | Scans protein sequences for the presence of helix-turn-helix motifs, indicative of sequence-specific DNA-binding structures often associated with gene regulation. |
SPScan | Scans protein sequences for the presence of secretory signal peptides (SPs). |
PeptideSort | Shows the peptide fragments from a digest of an amino acid sequence. It sorts the peptides by weight, position, and HPLC retention at pH 2.1, and shows the composition of each peptide. It also prints a summary of the composition of the whole protein. |
Isoelectric+ | Plots the charge as a function of pH for any peptide sequence. |
PeptideMap | Creates a peptide map of an amino acid sequence. |
PepPlot+ | Plots measures of protein secondary structure and hydrophobicity in parallel panels of the same plot. |
PeptideStructure | Makes secondary structure predictions for a peptide sequence. The predictions include (in addition to alpha, beta, coil, and turn) measures for antigenicity, flexibility, hydrophobicity, and surface probability. PlotStructure displays the predictions graphically. |
PlotStructure+ | Plots the measures of protein secondary structure in the output file from PeptideStructure. The measures can be shown on parallel panels of a graph or with a two-dimensional "squiggly" representation. |
Moment+ | Makes a contour plot of the helical hydrophobic moment of a peptide sequence. |
HelicalWheel+ | Plots a peptide sequence as a helical wheel to help you recognize amphiphilic regions. |
Xnu | Replaces statistically significant tandem repeats in protein sequences with X characters. If a resulting protein sequence is used as a query for a BLAST search, the regions with X characters are ignored. |
Seg | Replaces low complexity regions in protein sequences with X characters. If a resulting protein sequence is used as a query for a BLAST search, the regions with X characters are ignored. |
Nucleic Acid Secondary Structure | |
MFold | Predicts optimal and suboptimal secondary structures for an RNA or DNA molecule using the most recent energy minimization method of Zuker. |
PlotFold+ | Displays the optimal and suboptimal secondary structures for an RNA or DNA molecule predicted by MFold. |
StemLoop | Finds stems (inverted repeats) within a sequence. You specify the minimum stem length, minimum and maximum loop sizes, and the minimum number of bonds per stem. All loops or only the best loops can be displayed on your screen or written into a file. |
DotPlot+ | Makes a dot-plot with the output file from Compare or StemLoop. |
Translation | |
Translate | Translates nucleotide sequences into peptide sequences. |
BackTranslate | Backtranslates an amino acid sequence into a nucleotide sequence. The output helps you recognize minimally ambiguous regions that might be good for constructing synthetic probes. |
Map | Maps a DNA sequence and displays both strands of the mapped sequence with restriction enzyme cut points above the sequence and protein translations below. Map can also create a peptide map of an amino acid sequence. |
ExtractPeptide | Writes a peptide sequence from one or more of the translation frames displayed in the output from Map. Translate supersedes ExtractPeptide for most applications. |
PepData | Translates DNA sequence(s) in all six frames. |
Reverse | Reverses and/or complements a sequence. |
DataSet | Creates a GCG data library from any set of sequences in GCG format. To translate nucleotide sequences into peptide sequences, include the ToProt parameter. |
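The operations described for Reverse, Translate, and PepData (reverse-complementing a sequence and translating it in all six frames) can be sketched as follows. This is an illustration only, assuming the standard genetic code and unambiguous A, C, G, T input; the function names are hypothetical and do not correspond to Wisconsin Package parameters or output formats.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), AMINO)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


def translate(sequence, offset=0):
    """Translate DNA starting at a 0-based offset; '*' marks a stop codon
    and 'X' marks any codon that is not in the table."""
    sequence = sequence.upper()
    codons = (sequence[i:i + 3] for i in range(offset, len(sequence) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frames(sequence):
    """Return translations keyed by frame: 1, 2, 3 for the given strand and
    -1, -2, -3 for the reverse-complement strand."""
    reverse = reverse_complement(sequence)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(sequence, offset)
        frames[-(offset + 1)] = translate(reverse, offset)
    return frames


if __name__ == "__main__":
    for frame, peptide in sorted(six_frames("ATGGCCTAA").items()):
        print(frame, peptide)
```

Building the codon table from the two strings avoids typing 64 entries by hand; a table for a different genetic code could be swapped in by changing only `AMINO`.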
Sequence Utilities | |
Reverse | Reverses and/or complements a sequence. |
Shuffle | Randomizes the order of the symbols in a sequence without changing the composition. |
Simplify | Lets you reduce the number of symbols in a sequence. Such a simplification would allow you, for instance, to treat all hydrophobic amino acids as equivalent. |
CompTable | Creates a scoring matrix using equivalences defined in a simplification scheme such as the one used for Simplify. |
Corrupt | Randomly introduces small numbers of substitutions, insertions, and deletions into nucleotide sequence(s). |
Xnu | Replaces statistically significant tandem repeats in protein sequences with X characters. If a resulting protein sequence is used as a query for a BLAST search, the regions with X characters are ignored. |
Seg | Replaces low complexity regions in protein sequences with X characters. If a resulting protein sequence is used as a query for a BLAST search, the regions with X characters are ignored. |
Sample | Extracts sequence fragments randomly from sequence(s). You can set a sampling rate to determine how many fragments Sample extracts. |
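The idea behind Shuffle, randomizing symbol order while leaving composition untouched, is easy to demonstrate. The sketch below is an assumption-laden illustration rather than the program's own method: the function name `shuffle_sequence` and the optional `seed` argument are hypothetical, and it uses Python's standard library shuffle.

```python
import random
from collections import Counter


def shuffle_sequence(sequence, seed=None):
    """Return the symbols of a sequence in random order.

    Only the order changes, so the composition (count of each symbol)
    is identical to that of the input.
    """
    rng = random.Random(seed)  # a seed makes the result reproducible
    symbols = list(sequence)
    rng.shuffle(symbols)
    return "".join(symbols)


if __name__ == "__main__":
    original = "ATGGCCATTGTAATGGGCCGC"
    shuffled = shuffle_sequence(original, seed=1)
    print(shuffled)
    # The composition check holds for any seed.
    print(Counter(shuffled) == Counter(original))
```

A shuffled sequence of this kind is commonly used as a control, for example to judge whether an alignment score is better than would be expected from composition alone.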
Database Utilities | |
DataSet | Creates a GCG data library from any set of sequences in GCG format. |
GCGtoBLAST | Combines any set of GCG sequences into a database that you can search with BLAST. |
Sample | Extracts sequence fragments randomly from sequence(s). You can set a sampling rate to determine how many fragments Sample extracts. |
Printing / Plotting Utilities | |
LPrint | Prints text file(s) on a PostScript printer connected to LPrintPort. |
ListFile | Prints a file on a printer attached to your terminal's pass-through printer port. |
SetPlot | Allows you to choose a plotting configuration from a menu of available graphics devices at your site. |
Figure+ | Makes figures and posters by drawing graphics and text together. You can include output from other Wisconsin Package graphics programs as part of a figure. |
PlotTest+ | Plots a test pattern to test your graphics configuration. The pattern created by PlotTest uses every Wisconsin Package graphics feature. It should resemble the example test pattern in the documentation for PlotTest in the Program Manual. |
File Utilities | |
ChopUp | Converts a non-GCG sequence file containing lines as long as 32,000 characters into a new file containing lines no longer than 50 characters. The new file can be read by Reformat to create a GCG-format sequence file. |
Replace | Makes character string replacements in text file(s). You provide a table of replacements in a file showing each existing string and its replacement. |
CompressText | Removes any or all of the following from files: A) trailing space; B) blank lines; C) extra space between words; D) all space; or E) leading space. |
OneCase | Puts all of the alphabetic characters in a file into lower or UPPER case. It can also capitalize every word. |
ShiftOver | Moves a file to the right or to the left as many columns as you specify. |
Detab | Replaces the tab characters in one or more files with spaces. The files can be written out in card-image format with records of fixed length. |
Miscellaneous Utilities | |
SetKeys | Writes a file in your current directory that redefines your keyboard's keys for easier sequence entry with the SeqEd, LineUp, GelEnter and GelAssemble programs and the SeqLab sequence editor. The output file, called set.keys, can be edited if you want to redefine keys that were not considered by the SetKeys program. |
Reformat | Rewrites sequence file(s), scoring matrix file(s), or enzyme data file(s) so that they can be read by GCG programs. |
Red | Is a text formatter that creates publication-quality documents on a PostScript printer such as the Apple LaserWriter. You can use 13 different fonts, scaling each font to any size. You can also include figures and graphics from any Wisconsin Package graphics program within the text of the document. |
Name | Creates, changes, deletes, or displays GCG logical name(s) from the GCG logical names table. |
Symbol | Creates, changes, deletes, or displays GCG symbol(s) from the GCG symbol table. |
Documentation Comments: doc-comments@gcg.com
Technical Support: help@gcg.com
Copyright (c) 1982, 1983, 1985, 1986, 1987, 1989, 1991, 1994, 1995, 1996, 1997, 1998 Genetics Computer Group Inc., a wholly owned subsidiary of Oxford Molecular Group, Inc. All rights reserved.
Licenses and Trademarks: Wisconsin Package is a trademark of Genetics Computer Group, Inc. GCG and the GCG logo are registered trademarks of Genetics Computer Group, Inc.
All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.