seqMMLigner
A Minimum Message Length (MML)-based aligner of protein (amino acid) sequences
seqMMLigner Program
Instructions for running seqMMLigner
Synopsis:
seqmmligner [sequence-1 fasta file] [sequence-2 fasta file] [OPTION]...
Description:
Align two amino acid sequences given as FASTA files (default operation).
--criterion [optimal/marginal/marginal-interactive]
--ivalue [SEQUENCE_ALIGNMENT_FASTA_FILE]
--criterion [optimal/marginal/marginal-interactive]
optimal (default): finds the optimal alignment (and infers the associated parameters) under the information-theoretic measure, i.e., the alignment that maximizes the joint probability of the alignment and the two input sequences.
marginal: finds the marginal probability (and infers the associated parameters) that the two input sequences are related, under the information-theoretic measure.
marginal-interactive: same as marginal, but also allows interactive exploration of the marginal probability landscape and probing of competing sequence alignments.
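The difference between the optimal and marginal criteria can be illustrated with a small dynamic program. In message-length terms, the optimal criterion keeps the single cheapest alignment (a Viterbi-style minimum), while the marginal criterion adds up the probabilities of all alignments (a forward-algorithm-style sum). The sketch below is a toy under stated assumptions: the names `message_length`, `P_MATCH`, `P_SAME`, etc. and every probability in it are hypothetical and are NOT seqMMLigner's model or its inferred parameters.

```python
import math

# HYPOTHETICAL toy model: illustrative numbers only, not seqMMLigner's parameters.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
P_MATCH, P_INSERT, P_DELETE, P_END = 0.9, 0.045, 0.045, 0.01
P_SAME = 0.9                       # chance a matched residue is conserved
Q = 1.0 / len(ALPHABET)            # uniform background residue probability


def bits(p: float) -> float:
    """Message length (in bits) of an event with probability p."""
    return -math.log2(p)


def match_cost(x: str, y: str) -> float:
    # P(x, y) = q(x) * P(y | x); substitutions share the leftover mass evenly.
    p_sub = P_SAME if x == y else (1.0 - P_SAME) / (len(ALPHABET) - 1)
    return bits(P_MATCH * Q * p_sub)


DELETE_COST = bits(P_DELETE * Q)   # residue of s aligned against a gap
INSERT_COST = bits(P_INSERT * Q)   # residue of t aligned against a gap


def neg_log2_sum(lengths: list[float]) -> float:
    """-log2(sum(2**-L)): merges alternatives by adding their probabilities."""
    m = min(lengths)
    return m - math.log2(sum(2.0 ** (m - x) for x in lengths))


def message_length(s: str, t: str, criterion: str = "optimal") -> float:
    """Bits to state (s, t) under the toy model.

    'optimal'  -> length of the single best alignment (min over alignments)
    'marginal' -> -log2 of the probability summed over all alignments
    """
    if criterion not in ("optimal", "marginal"):
        raise ValueError("criterion must be 'optimal' or 'marginal'")
    combine = min if criterion == "optimal" else neg_log2_sum
    n, m = len(s), len(t)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            options = []
            if i > 0 and j > 0:
                options.append(D[i - 1][j - 1] + match_cost(s[i - 1], t[j - 1]))
            if i > 0:
                options.append(D[i - 1][j] + DELETE_COST)
            if j > 0:
                options.append(D[i][j - 1] + INSERT_COST)
            D[i][j] = combine(options)
    return D[n][m] + bits(P_END)  # one end-of-alignment event


if __name__ == "__main__":
    s, t = "HEAGAWGHEE", "PAWHEAE"
    opt = message_length(s, t, "optimal")
    marg = message_length(s, t, "marginal")
    print(f"optimal: {opt:.2f} bits, marginal: {marg:.2f} bits")
    # Posterior probability of the single best alignment given both sequences.
    print(f"P(best alignment | s, t) = {2 ** (marg - opt):.3f}")
```

The only change between the two criteria is the `combine` function, so the marginal message length can never exceed the optimal one; the gap between them, 2 ** (marginal - optimal), is the posterior probability of the best alignment under this toy model.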
--ivalue [SEQUENCE_ALIGNMENT_FASTA_FILE]
Score a given sequence alignment using the I-value measure. This changes the default operation: no alignment is computed; instead, the alignment in SEQUENCE_ALIGNMENT_FASTA_FILE (FASTA format) is scored.
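Scoring a fixed alignment, rather than searching for one, amounts to summing the message length of each alignment column. The sketch below assumes only that an I-value is a message length, in bits, of an alignment together with its two sequences; the FASTA parsing, the gap character `-`, the helper names `read_fasta` and `alignment_message_length`, and all probabilities are hypothetical and do not reproduce seqMMLigner's actual I-value computation.

```python
import math

# HYPOTHETICAL toy model: illustrative numbers only, not seqMMLigner's I-value.
ALPHABET_SIZE = 20
P_MATCH, P_INSERT, P_DELETE, P_END = 0.9, 0.045, 0.045, 0.01
P_SAME = 0.9                      # chance a matched residue is conserved
Q = 1.0 / ALPHABET_SIZE           # uniform background residue probability


def read_fasta(text: str) -> list[str]:
    """Return the sequences of a FASTA string in order (headers dropped)."""
    seqs, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if current:
                seqs.append("".join(current))
            current = []
        elif line:
            current.append(line)
    if current:
        seqs.append("".join(current))
    return seqs


def alignment_message_length(aligned_fasta: str) -> float:
    """Bits to state one given pairwise alignment plus its two sequences."""
    s, t = read_fasta(aligned_fasta)  # expects exactly two gapped records
    if len(s) != len(t):
        raise ValueError("aligned sequences must have equal length")
    total = -math.log2(P_END)         # one end-of-alignment event
    for x, y in zip(s, t):
        if x != "-" and y != "-":     # match / substitution column
            p_sub = P_SAME if x == y else (1 - P_SAME) / (ALPHABET_SIZE - 1)
            total += -math.log2(P_MATCH * Q * p_sub)
        elif x != "-":                # first-sequence residue against a gap
            total += -math.log2(P_DELETE * Q)
        elif y != "-":                # second-sequence residue against a gap
            total += -math.log2(P_INSERT * Q)
        # all-gap columns carry no information and are skipped
    return total


if __name__ == "__main__":
    example = ">seq1\nHEAGAWGHEE\n>seq2\n--P-AW-HEAE\n"[:0] or ">seq1\nHEAGAWGHE-E\n>seq2\n--P-AW-HEAE\n"
    print(f"{alignment_message_length(example):.2f} bits")
```

Under this view, a lower score means the alignment explains the two sequences more concisely, so competing alignments of the same pair can be ranked by their scores.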
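Example command-line runs:
Generating an alignment using seqMMLigner:
./seqmmligner seq1.fa seq2.fa
Computing the marginal probability that the two sequences are related:
./seqmmligner seq1.fa seq2.fa --criterion marginal
Scoring an existing alignment with the I-value measure:
./seqmmligner --ivalue alignment.afasta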
Instructions for building seqMMLigner (v2.5-1)
Dependencies: GNU Make (or equivalent) and a modern C++ compiler. seqMMLigner is known to build with g++ (GCC) >= 4.1.2. If these dependencies are met, follow these instructions:
- Download the source code from the link above.
- Extract the archive with: tar -zxf seqmmligner_2.5-1.tgz
- Type: cd seqmmligner_2.5-1/
- Build seqMMLigner with: make
- The built binary, seqmmligner, will appear in the bin/ subdirectory.
Copyright license
seqMMLigner is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License. seqMMLigner is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with seqMMLigner. If not, see http://www.gnu.org/licenses/.
Bug reports
Please contact the following people for bug reports, web page errors, or questions:
- Arun Konagurthu <arun DOT konagurthu AT monash DOT edu>
- Dinithi Sumanaweera <dinithi DOT sumanaweera AT monash DOT edu>
Supplementary Material
Supporting data:
SCOP domain sequence pairs used to infer Dirichlet priors: (click here)
Distribution of the number of alignments versus the sequence-distance parameter (i.e., n of PAM-n): (click here)
Inferred Dirichlet prior parameters for the sequence-distance parameter n in [1, 1000]: (click here)
A selection of marginal probability landscapes: (click here)
Benchmark statistics across the programs seqMMLigner, ClustalW, CONTRAlign, KAlign, MAFFT, MUSCLE, ProbCons, T-Coffee:
- Human fungal mitochondrial proteins (remote ortholog) data set: (click here)
- SABMark "Twilight" zone (twi) data set: (click here)