Name                     Last modified      Size  Description
Parent Directory                              -
fasta-36.1.1.tar.gz      18-Mar-2010 10:07  706K  GZIP compressed docume>
fasta-36.2.3.tar.gz      05-May-2010 06:17  652K  GZIP compressed docume>
fasta-36.2.5.tar.gz      22-Jul-2010 10:16  698K  GZIP compressed docume>
fasta-36.2.6-macosxu..>  04-Aug-2010 21:38  5.0M  GZIP compressed docume>
fasta-36.2.6.tar.gz      05-Aug-2010 11:48  700K  GZIP compressed docume>
fasta-36.2.6.zip         05-Aug-2010 09:56  6.1M
fasta-36.2.7-macosxu..>  01-Oct-2010 13:09   11M  GZIP compressed docume>
fasta-36.2.7.tar.gz      11-Oct-2010 11:24  703K  GZIP compressed docume>
fasta-36.3.1.tar.gz      05-Jan-2011 17:48  688K  GZIP compressed docume>
fasta-36.3.1.zip         05-Jan-2011 19:51  6.2M
fasta-36.3.2.tar.gz      21-Jan-2011 12:48  695K  GZIP compressed docume>
fasta-36.3.3.tar.gz      10-Feb-2011 11:48  716K  GZIP compressed docume>
fasta-36.3.4.tar.gz      17-May-2011 14:43  870K  GZIP compressed docume>
fasta-36.3.5a.tar.gz     14-Jun-2011 17:07  918K  GZIP compressed docume>
fasta-36.3.5b-macosx..>  18-Nov-2011 15:33  6.9M  GZIP compressed docume>
fasta-36.3.5b.tar.gz     18-Nov-2011 15:12  919K  GZIP compressed docume>
fasta-36.3.5b.zip        18-Nov-2011 16:28  6.7M
fasta-36.3.5c-macosx..>  19-Feb-2012 11:07  4.9M  GZIP compressed docume>
fasta-36.3.5c.tar.gz     27-Jun-2012 07:01  920K  GZIP compressed docume>
fasta-36.3.5d.tar.gz     21-Aug-2012 13:46  921K  GZIP compressed docume>
fasta-36.3.5d.zip        21-Aug-2012 13:36  6.7M
fasta-36.3.5e.tar.gz     20-Mar-2013 11:42  922K  GZIP compressed docume>
fasta-36.3.6.tar.gz      05-Jul-2013 15:59  972K  GZIP compressed docume>
fasta-36.3.6a.tar.gz     20-Jul-2013 09:59  1.0M  GZIP compressed docume>
fasta-36.3.6a.zip        20-Jul-2013 08:59  8.1M
fasta-36.3.6b.tar.gz     09-Aug-2013 07:24  972K  GZIP compressed docume>
fasta-36.3.6c.tar.gz     27-Aug-2013 10:50  973K  GZIP compressed docume>
fasta-36.3.6d.tar.gz     29-Jan-2014 11:35  947K  GZIP compressed docume>
fasta-36.3.6d.zip        10-Jun-2014 13:20  6.9M
$Id: changes_v36.html 1128 2013-03-17 21:14:05Z wrp $ $Revision: 210 $
FASTA version 36.3.6 provides two new features:
(fasta-36.3.5 January 2013) The NCBI's transition from BLAST to BLAST+ several years ago broke the ability of ssearch36 to use PSSMs, because psiblast did not produce the binary ASN.1 PSSMs that ssearch36 could parse. With the January 2013 fasta-36.3.5f release, ssearch36 can read binary ASN.1 PSSM files produced by the NCBI datatool utility. See fasta_guide.pdf for more information (look for the -P option).
Likewise, the score histogram is no longer shown by default; use the -H option to show the histogram (or compile with -DSHOW_HIST for previous behavior).
The _t (fasta36_t) versions of the programs are built automatically on Linux/MacOSX machines and named fasta36, etc. (the programs are threaded by default, and only one program version is built).
Documentation has been significantly revised and updated. See doc/fasta_guide.pdf for a description of the programs and options.
By default, the statistical threshold for alternate alignments (HSPs) is the E()-threshold / 10.0. For proteins, the default expect threshold is E() < 10.0, so the secondary threshold for showing alternate alignments is E() < 1.0. For translated comparisons, the E()-thresholds are 5.0/0.5; for DNA:DNA, 2.0/0.2.
Both the primary and secondary E()-thresholds are set with the -E "prim sec" command line option. If the secondary value is between zero and 1.0, it is taken as the actual threshold. If it is > 1.0, it is taken as a divisor for the primary threshold. If it is negative, alternative alignments are disabled and only the best alignment is shown.
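The interpretation of the secondary value can be sketched in a few lines of Python. This is only an illustration of the rules stated above, not the code used by the FASTA programs; the function name `secondary_threshold` and the `DEFAULT_PRIMARY` table are hypothetical, and the handling of values exactly equal to 0 or 1.0 is an assumption, since the text does not specify those boundaries.

```python
from typing import Optional

# Default primary E()-thresholds quoted in the text above.
DEFAULT_PRIMARY = {"protein": 10.0, "translated": 5.0, "dna": 2.0}


def secondary_threshold(primary: float, secondary: float = 10.0) -> Optional[float]:
    """Return the E() cutoff for alternate alignments, or None if disabled.

    Hypothetical helper: mirrors the -E "prim sec" rules described in the text.
    """
    if secondary < 0:
        # Negative value: alternate alignments are disabled.
        return None
    if secondary > 1.0:
        # Values above 1.0 act as a divisor of the primary threshold.
        return primary / secondary
    # Values from 0 to 1.0 are used directly (boundary handling is assumed).
    return secondary


if __name__ == "__main__":
    for kind, prim in DEFAULT_PRIMARY.items():
        print(kind, prim, secondary_threshold(prim))
```

With the default divisor of 10.0, the sketch yields the protein, translated, and DNA secondary thresholds quoted above (1.0, 0.5, and 0.2).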
(fasta-36.3.4) Alignment option -m B provides BLAST-like alignments (no context, coordinates at the beginning and end of the alignment line, Query/Sbjct labels).
Statistical thresholds can dramatically reduce the number of "optimized" scores, from which statistical estimates are calculated. To address this problem, the statistical estimation procedure has been adjusted to correct for the fraction of scores that were optimized. This process can dramatically improve statistical accuracy for some matrices and gap penalties, e.g. BLOSUM62 -11/-1.
With the new joining thresholds, the -c "E-opt E-join" options have expanded meanings. -c "E-opt E-join" calculates a threshold designed (but not guaranteed) to do band optimization and joining for that fraction of sequences. Thus, -c "0.02 0.1" seeks to do band optimization (E-opt) on 2% of alignments, and joining on 10% of alignments. -c "40 10" sets the gap threshold as in earlier versions.
By default, the program will read up to 2 GB (32-bit systems) or 12 GB (64-bit systems) of the database into memory for multi-query searches. The amount of memory available for databases can be set with the -XM4G option.
In translated sequence comparisons, annotations are only available for the protein sequence.
Add ability to search a subset of a library using a file name and a list of accession/gi numbers. This version introduces a new filetype, 10, which consists of a first line with a target filename, format, and accession number format-type, and optionally the accession number format in the database, followed by a list of accession numbers. For example:
    </slib2/blast/swissprot.lseg 0:2 4|
    3121763
    51701705
    7404340
    74735515
    ...

This tells the program that the target database is swissprot.lseg, which is in FASTA (library type 0) format.
The accession format comes after the ":". Currently, there are four accession formats, two that require ordered accessions (:1, :2), and two that hash the accessions (:3, :4) so they do not need to be ordered. The number and character after the accession format (e.g. "4|") indicate the offset of the beginning of the accession and the character that terminates the accession. Thus, in the typical NCBI Fasta definition line:
    >gi|1170095|sp|P46419|GSTM1_DERPT Glutathione S-transferase (GST class-mu)

The offset is 4 and the termination character is '|'. For databases distributed in FASTA format from the European Bioinformatics Institute, the offset depends on the name of the database, e.g.

    >SW:104K_THEAN Q4U9M9 104 kDa microneme/rhoptry antigen precursor (p104).

and the delimiter is ' ' (space, the default).
Accession formats 1 and 3 expect strings; accession formats 2 and 4 work with integers (e.g. gi numbers).
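As a rough illustration of the accession-list header and the offset/terminator rule, the following Python sketch parses a header line such as `</slib2/blast/swissprot.lseg 0:2 4|` and pulls the accession out of a definition line. The names `SubsetHeader`, `parse_header`, `extract_accession`, and `accession_key` are hypothetical and do not come from the FASTA source. The sketch assumes that the offset is counted from the leading '>' (which is consistent with offset 4 in the gi example) and that a missing offset field means offset 0 with a space terminator.

```python
import re
from typing import NamedTuple, Union


class SubsetHeader(NamedTuple):
    filename: str    # target database file
    lib_type: int    # library format, e.g. 0 for FASTA
    acc_format: int  # accession format 1-4
    offset: int      # where the accession starts in the definition line
    terminator: str  # character that ends the accession


def parse_header(line: str) -> SubsetHeader:
    """Parse the first line of an accession-list file (hypothetical helper)."""
    fields = line.strip().lstrip("<").split()
    filename, formats = fields[0], fields[1]
    lib_type, acc_format = (int(x) for x in formats.split(":"))
    offset, terminator = 0, " "  # assumed defaults when the field is absent
    if len(fields) > 2:
        match = re.fullmatch(r"(\d+)(.?)", fields[2])
        if match is None:
            raise ValueError(f"unrecognized offset field: {fields[2]!r}")
        offset = int(match.group(1))
        terminator = match.group(2) or " "
    return SubsetHeader(filename, lib_type, acc_format, offset, terminator)


def extract_accession(defline: str, offset: int, terminator: str = " ") -> str:
    """Return the text from `offset` up to (not including) `terminator`."""
    rest = defline[offset:]
    end = rest.find(terminator)
    return rest if end < 0 else rest[:end]


def accession_key(accession: str, acc_format: int) -> Union[int, str]:
    """Formats 2 and 4 work with integers; formats 1 and 3 with strings."""
    return int(accession) if acc_format in (2, 4) else accession


if __name__ == "__main__":
    header = parse_header("</slib2/blast/swissprot.lseg 0:2 4|")
    defline = ">gi|1170095|sp|P46419|GSTM1_DERPT Glutathione S-transferase (GST class-mu)"
    acc = extract_accession(defline, header.offset, header.terminator)
    print(header, accession_key(acc, header.acc_format))
```

Under the same counting assumption, an offset of 4 with the default space terminator would select `104K_THEAN` from the `>SW:` definition line shown above.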
    lalign35 -q mchu.aa:1-74 mchu.aa:75-148

Note, however, that the subset range applied to the library will be applied to every sequence in the library - not just the first - and that the same subset range is applied to each sequence. This probably makes sense only if the library contains a single sequence (this is also true for the query sequence file).
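The effect of a subset range such as `mchu.aa:1-74` can be sketched in Python as follows. This is an illustration only: `parse_subset` and `apply_subset` are hypothetical names, and the sketch assumes that the range is 1-based and inclusive, which the text does not state explicitly.

```python
from typing import List, Optional, Tuple

Range = Optional[Tuple[int, int]]


def parse_subset(spec: str) -> Tuple[str, Range]:
    """Split 'file:start-end' into (file, (start, end)); no range gives None."""
    name, sep, rng = spec.rpartition(":")
    if not sep or "-" not in rng:
        return spec, None
    start, end = rng.split("-", 1)
    if not (start.isdigit() and end.isdigit()):
        return spec, None
    return name, (int(start), int(end))


def apply_subset(sequence: str, rng: Range) -> str:
    """Return the residues in the range (assumed 1-based, inclusive)."""
    if rng is None:
        return sequence
    start, end = rng
    return sequence[start - 1:end]


def subset_library(sequences: List[str], rng: Range) -> List[str]:
    """The same range is applied to every sequence, not just the first."""
    return [apply_subset(seq, rng) for seq in sequences]


if __name__ == "__main__":
    name, rng = parse_subset("mchu.aa:75-148")
    print(name, rng)
    print(subset_library(["ABCDEFGHIJ", "KLMNOPQRST"], (3, 5)))
```

The `subset_library` helper makes the caveat above concrete: with more than one sequence in the file, each one is trimmed to the same coordinates.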
Add Mueller and Vingron (2000) J. Comp. Biol. 7:761-776 VT160 matrix, "-s VT160", and OPTIMA_5 (Kann et al. (2000) Proteins 41:498-503).
    lalign35 -m 11 | lav2ps

replaces plalign (from FASTA2).
    >>gi|121716|sp|P10649|GSTM1_MOUSE Glutathione S-transfer (218 aa)
     s-w opt: 1497 Z-score: 1857.5 bits: 350.8 E(): 8.3e-97
    Smith-Waterman score: 1497; 100.0% identity (100.0% similar) in 218 aa overlap (1-218:1-218)
    ^^^^^^^^^^^^^^

where the highlighted text was either "Smith-Waterman" or "banded Smith-Waterman". In fact, scores were calculated in other ways, including global/local for fasts and fastf. With the addition of ggsearch35, glsearch35, and lalign35, there are many more ways to calculate alignments: "Smith-Waterman" (ssearch and protein fasta), "banded Smith-Waterman" (DNA fasta), "Waterman-Eggert", "trans. Smith-Waterman", "global/local", "trans. global/local", "global/global (N-W)". The last option is a global/global alignment, but with the affine gap penalties used in the Smith-Waterman algorithm.
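Because the algorithm label at the start of the score line now varies, a script that reads this output should capture the label rather than match a fixed string. The Python sketch below shows one way to do that with a regular expression. The pattern and the name `parse_score_line` are illustrative assumptions based only on the sample line above; they are not an official description of the output format.

```python
import re
from typing import Dict, Optional

# Pattern modeled on the sample score line; the leading label is captured
# lazily so that any of the algorithm names listed above would be accepted.
SCORE_LINE = re.compile(
    r"^(?P<method>.+?) score:\s*(?P<score>\d+);\s*"
    r"(?P<identity>[\d.]+)% identity \((?P<similar>[\d.]+)% similar\) "
    r"in (?P<overlap>\d+) (?P<unit>\w+) overlap "
    r"\((?P<q_start>\d+)-(?P<q_end>\d+):(?P<s_start>\d+)-(?P<s_end>\d+)\)"
)


def parse_score_line(line: str) -> Optional[Dict[str, str]]:
    """Return the named fields of a score line, or None if it does not match."""
    match = SCORE_LINE.match(line.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = ("Smith-Waterman score: 1497; 100.0% identity (100.0% similar) "
              "in 218 aa overlap (1-218:1-218)")
    fields = parse_score_line(sample)
    print(fields["method"], fields["score"], fields["identity"])
```

Capturing the label as a field lets downstream code distinguish, for example, "banded Smith-Waterman" from "global/global (N-W)" results without a separate pattern for each.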