Assemble constructs new sequences from pieces of existing sequences. It concatenates the fragments you specify and writes them out as a new sequence file. SeqEd is better suited to assembling sequences interactively, while Assemble is best for assembling sequences from fragments defined in a list file.
Assemble lets you choose segments from existing sequences. The segments can be of any length and can come from either strand. Unlike most GCG programs, Assemble lets you specify segments that extend across the end and into the beginning of the sequence. Assemble concatenates all of the segments you specify in the order in which you specify them and then writes the resulting construct into a new sequence file.
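The segment rules above can be sketched in Python. This is a hypothetical illustration, not GCG's code. The function names `extract_segment` and `assemble` are invented. Coordinates are assumed to be 1-based and inclusive, as in the Begin and End prompts. A Begin greater than End is assumed to mean a segment that runs across the end of the sequence and into its beginning. Choosing the reverse strand is assumed to produce the reverse complement of the selected range.

```python
# Hypothetical sketch, not GCG's implementation. Assumptions (not from the manual):
#   * Begin/End are 1-based and inclusive;
#   * Begin > End means "run across the end and into the beginning";
#   * the reverse strand is the reverse complement of the selected range.

COMPLEMENT = str.maketrans("ACGTUNacgtun", "TGCAANtgcaan")


def extract_segment(seq: str, begin: int, end: int, reverse: bool = False) -> str:
    """Return positions begin..end of seq, wrapping past the end if begin > end."""
    n = len(seq)
    if not (1 <= begin <= n and 1 <= end <= n):
        raise ValueError("begin and end must lie within 1..len(seq)")
    if begin <= end:
        fragment = seq[begin - 1:end]
    else:
        # Segment extends across the end and into the beginning of the sequence.
        fragment = seq[begin - 1:] + seq[:end]
    if reverse:
        # Complement each base, then reverse the order.
        fragment = fragment.translate(COMPLEMENT)[::-1]
    return fragment


def assemble(seq: str, segments) -> str:
    """Concatenate (begin, end, reverse) segments in the order given."""
    return "".join(extract_segment(seq, b, e, r) for b, e, r in segments)


print(extract_segment("ACGTTGCA", 7, 2))              # CAAC
print(extract_segment("AACG", 1, 4, reverse=True))    # CGTT
```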
You can specify each segment interactively, one after another, in response to the program prompts, or you can identify the segments to be assembled in a list file.
Here is a session using Assemble to assemble a coding sequence from the file gamma.seq interactively:
% assemble

 ASSEMBLE from what sequence ? gamma.seq

                  Begin (* 1 *) ? 2179
                End (* 11375 *) ? 2270
               Reverse (* No *) ?

 That range begins ATGGG and ends GGAAG.
 Is this correct (* Yes *) ?

 That is done, now would you like to:

   A)dd another segment from this sequence
   G)et segments from another sequence
   W)rite out this assembly into a file

 Please choose one (* W *): a

                  Begin (* 1 *) ? 2393
                End (* 11375 *) ? 2615
               Reverse (* No *) ?

 That range begins GCTCC and ends TCAAG.
 Is this correct (* Yes *) ?

 That is done, now would you like to:

   A)dd another segment from this sequence
   G)et segments from another sequence
   W)rite out this assembly to file

 Please choose one (* W *): a

                  Begin (* 1 *) ? 3502
                End (* 11375 *) ? 3630
               Reverse (* No *) ?

 That range begins CTCCT and ends ACTGA.
 Is this correct (* Yes *) ?

 That is done, now would you like to:

   A)dd another segment from this sequence
   G)et segments from another sequence
   W)rite out this assembly to file

 Please choose one (* W *):

 What should I call the output file (* gamma.seg *) ?

%
Here is some of the output file:
!!NA_SEQUENCE 1.0

 ASSEMBLE  October 5, 1998 10:13

 Symbols:     1 to:    92 from: gamma.seq ck: 6474,  2179 to:  2270
 Symbols:    93 to:   315 from: gamma.seq ck: 6474,  2393 to:  2615
 Symbols:   316 to:   444 from: gamma.seq ck: 6474,  3502 to:  3630

 Human fetal beta globins G and A gamma
 from Shen, Slightom and Smithies, Cell 26; 191-203.
 Analyzed by Smithies et al. Cell 26; 345-353.

 gamma.seg  Length: 444  October 5, 1998 10:14  Type: N  Check: 2906  ..

       1  ATGGGTCATT TCACAGAGGA GGACAAGGCT ACTATCACAA GCCTGTGGGG

      51  CAAGGTGAAT GTGGAAGATG CTGGAGGAGA AACCCTGGGA AGGCTCCTGG

 ///////////////////////////////////////////////////////////

     351  CCATTTCGGC AAAGAATTCA CCCCTGAGGT GCAGGCTTCC TGGCAGAAGA

     401  TGGTGACTGG AGTGGCCAGT GCCCTGTCCT CCAGATACCA CTGA
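The Symbols: lines in the header are cumulative: each segment occupies the positions immediately after the previous one, so a segment from Begin to End contributes End - Begin + 1 symbols. The short Python sketch below reproduces the header above from the three ranges entered in the session. The function name `symbol_ranges` is hypothetical, and the sketch handles only forward ranges where Begin is no greater than End.

```python
def symbol_ranges(segments):
    """Map each (begin, end) input range to its 1-based span in the assembly."""
    ranges = []
    start = 1
    for begin, end in segments:
        length = end - begin + 1          # inclusive range length
        ranges.append((start, start + length - 1))
        start += length                   # next segment starts right after this one
    return ranges


print(symbol_ranges([(2179, 2270), (2393, 2615), (3502, 3630)]))
# [(1, 92), (93, 315), (316, 444)]
```

The final span ends at 444, which matches the Length: 444 reported for gamma.seg.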
SeqEd is a general-purpose, screen-oriented sequence editor. Reformat converts a sequence file that has been modified with a text editor into GCG sequence file format.
Assemble accepts multiple (one or more) nucleotide or protein sequences as input. You can specify multiple sequences in a number of ways: by using a list file, for example @project.list; by using an MSF or RSF file, for example project.msf{*}; or by using a sequence specification with an asterisk (*) wildcard, for example GenEMBL:*.
Assemble does not check the sequence type of the segments it concatenates, so it will let you join peptide and nucleotide sequences without warning. Embedded comments from the input segments are lost in the output sequence.
If you choose a single sequence on the command line or in response to the first program prompt, Assemble prompts you for the sequence range and strand. After processing that sequence segment, the program allows you to choose another sequence. You can continue to choose single sequences as input until you decide to write out the entire assembly (interactive mode).
You can specify multiple sequences on the command line or in response to the first program prompt. Assemble then processes all of the sequences and writes out the entire assembly without prompting you for the range and strand of each sequence (non-interactive mode).
If you use a list file to specify multiple input sequences, you can add begin, end, and strand sequence attributes to set the range and strand of each sequence. You can also use the join sequence attribute to create several different assemblies from the sequences in a single list file. Contiguous sequences in the list file that share the same join attribute (that is, the same sequence name following the Join: token) are concatenated into a single assembly, which is named after that join name. Contiguous sequences that have no join attribute are likewise concatenated into a single assembly, which is named after the last input sequence in that assembly. Here is an example of an input list file, dros_cds.list, for Assemble:
!!SEQUENCE_LIST 1.0

Example list file of coding sequences from D. melanogaster used as
input for ASSEMBLE

First 3 exons are for the transformer gene.
Next 4 exons are for the glucose-6-phosphate dehydrogenase gene.
Last 2 exons are for the metallothionein gene. ..

Gb_In:Drotga   Begin:  271  End:  310
Gb_In:Drotga   Begin:  559  End:  962
Gb_In:Drotga   Begin: 1020  End: 1169
Gb_In:M26673   Begin:  438  End:  454  Join: g6pd_drome
Gb_In:M26674   Begin:   53  End:  316  Join: g6pd_drome
Gb_In:M26674   Begin:  377  End:  592  Join: g6pd_drome
Gb_In:M26674   Begin:  655  End: 1729  Join: g6pd_drome
Gb_In:Drometx  Begin:  498  End:  519
Gb_In:Drometx  Begin:  862  End:  962
Using this file as input, Assemble writes three output files. The first file, called drotga.seg, contains the assembly from the first three sequence segments in the list file. The second output file, g6pd_drome.seg, contains the assembly from the next four sequence segments. The third output file, drometx.seg, contains the assembly from the last two sequence segments in the list file. For more information about list files, see "Using List Files" in Chapter 2, Using Sequence Files and Databases in the User's Guide.
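The join grouping rules can be sketched in Python as splitting the list into runs of contiguous entries with the same join value. The names `Entry`, `parse_line`, and `group_assemblies` are hypothetical, and `parse_line` handles only the Begin:/End:/Join: attributes used in dros_cds.list. Two further points are assumptions, not documented behavior. First, output names are lowercased and stripped of the database prefix, which matches drotga.seg and drometx.seg above. Second, the name used with -NOJOIN is the last input sequence; the manual says only that a single file is written.

```python
from itertools import groupby
from typing import List, NamedTuple, Optional, Tuple


class Entry(NamedTuple):
    spec: str                    # sequence specification, e.g. "Gb_In:Drotga"
    begin: int
    end: int
    join: Optional[str] = None   # name following "Join:", if any


def parse_line(line: str) -> Entry:
    """Parse an entry such as 'Gb_In:M26674 Begin: 53 End: 316 Join: g6pd_drome'."""
    tokens = line.split()
    attrs = {tokens[i].rstrip(":").lower(): tokens[i + 1]
             for i in range(1, len(tokens) - 1, 2)}
    return Entry(tokens[0], int(attrs["begin"]), int(attrs["end"]), attrs.get("join"))


def group_assemblies(entries, nojoin: bool = False) -> List[Tuple[str, List[Entry]]]:
    """Split entries into assemblies and name each output file (assumed naming)."""
    entries = list(entries)
    if nojoin:
        runs = [entries] if entries else []      # -NOJOIN: one single assembly
    else:
        # Contiguous entries with the same join value (None = no join) form one run.
        runs = [list(run) for _, run in groupby(entries, key=lambda e: e.join)]
    assemblies = []
    for run in runs:
        if run[0].join and not nojoin:
            name = run[0].join                   # named after the join name
        else:
            name = run[-1].spec.split(":")[-1]   # named after the last sequence
        assemblies.append((name.lower() + ".seg", run))
    return assemblies


DROS_CDS = [
    "Gb_In:Drotga Begin: 271 End: 310",
    "Gb_In:Drotga Begin: 559 End: 962",
    "Gb_In:Drotga Begin: 1020 End: 1169",
    "Gb_In:M26673 Begin: 438 End: 454 Join: g6pd_drome",
    "Gb_In:M26674 Begin: 53 End: 316 Join: g6pd_drome",
    "Gb_In:M26674 Begin: 377 End: 592 Join: g6pd_drome",
    "Gb_In:M26674 Begin: 655 End: 1729 Join: g6pd_drome",
    "Gb_In:Drometx Begin: 498 End: 519",
    "Gb_In:Drometx Begin: 862 End: 962",
]

for name, run in group_assemblies([parse_line(l) for l in DROS_CDS]):
    print(name, len(run))
# drotga.seg 3
# g6pd_drome.seg 4
# drometx.seg 2
```

Grouping by contiguity rather than by join name alone means that two separate runs with no join attribute (the transformer and metallothionein exons here) become two separate assemblies, which is what the three output files above show.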
All parameters for this program may be added to the command line. Use -CHEck to view the summary below and to specify parameters before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.
Minimal Syntax: % assemble -[INfile=]@transformer.list -Default

Prompted Parameters:

[-OUTfile=]drotga.seg      sets the output file name (single seq. output only)

Local Data Files: None

Optional Parameters:

-BEGin=1 -END=100          sets the range of interest for each sequence
-REVerse                   specifies the strand for each sequence
-NOJOIN                    ignores join operators in list file
-LIStfile[=assemble.list]  writes a list file of output sequence names
-NOMONitor                 suppresses the screen monitor
Assemble uses no local data files.
You can set the parameters listed below from the command line. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.
-BEGin=1
sets the beginning position for all input sequences. When the beginning position is set on the command line, Assemble ignores any beginning positions specified for individual sequences in a list file.
-END=100
sets the ending position for all input sequences. When the ending position is set on the command line, Assemble ignores any ending positions specified for individual sequences in a list file.
-REVerse
sets the program to use the reverse strand of each input sequence. When -REVerse or -NOREVerse appears on the command line, Assemble ignores any strand designation for individual sequences in a list file.
-NOJOIN
makes Assemble ignore all join sequence attributes in the input list file, so that all sequence segments specified in the list file are assembled into a single output sequence file.
-LIStfile=assemble.list
writes a list file containing the names of the output sequence files. This list file is suitable as input to other Wisconsin Package programs that support list files (see Chapter 2, Using Sequence Files and Databases in the User's Guide). If you don't specify a file name, Assemble names the list file assemble.list.
-MONitor
This program normally monitors its progress on your screen. However, when you use -Default to suppress all program interaction, you also suppress the monitor. You can turn it back on with this parameter. If you are running the program in batch, the monitor appears in the log file.
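The precedence rules for -BEGin, -END, and -REVerse amount to one principle: a value given on the command line replaces the corresponding list-file attribute for every input sequence. The Python sketch below illustrates that rule. The function name `resolve_segment` is invented, not part of the Wisconsin Package. The sketch models an absent command-line parameter as `None`, and -REVerse and -NOREVerse as `True` and `False`.

```python
from typing import Optional, Tuple


def resolve_segment(list_begin: int, list_end: int, list_reverse: bool,
                    cmd_begin: Optional[int] = None,
                    cmd_end: Optional[int] = None,
                    cmd_reverse: Optional[bool] = None) -> Tuple[int, int, bool]:
    """Return the (begin, end, reverse) used for one input sequence.

    A cmd_* argument of None means the parameter was not on the command line,
    so the list-file attribute is kept; otherwise the command-line value wins.
    cmd_reverse is True for -REVerse and False for -NOREVerse.
    """
    begin = list_begin if cmd_begin is None else cmd_begin
    end = list_end if cmd_end is None else cmd_end
    reverse = list_reverse if cmd_reverse is None else cmd_reverse
    return begin, end, reverse


# List-file entry "Begin: 559 End: 962" when the program is run with -BEGin=1:
print(resolve_segment(559, 962, False, cmd_begin=1))   # (1, 962, False)
```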