Changelog
Version 1.7.2:
- Fix building without TBB Malloc library (#98)
Version 1.7.1:
- Fix rounding error in lca classifier (#93)
Version 1.7.0:
- allow multiple output types at once
- add dedicated CSV/TSV output (#10)
- fix loading reference database from running ARB (#76)
- report errors when sequence can’t be read from ARB (#73)
- add --arb-list-fields listing fields available in ARB database
Version 1.6.1:
- fix progress bar not honoring verbosity (#85)
Version 1.6.0:
- make internal kmer engine the default (#23)
- add pretty progress monitor
- run search stage in parallel (#32)
- --num-pts defaults to number of cores available (previous: 1)
- add --search-engine setting search engine for search module
- always run internal engine without thread limit
- split num pt servers evenly between search and align
- use fixed point format for logging (instead of scientific format)
- rewrote family selection (use --fs-oldmatch for old implementation)
- replace boost::mutex with std::mutex (c++11)
- fix --show-dist if alignment widths don’t match
- fix race starting pt servers (library code not threadsafe)
- fix engine type not shown in --show-conf
- fix writing to ARB sequence cache not threadsafe
- use lock free map for ARB sequence cache (speedup)
- add pod buffer to replace std::vector (speedup)
- add FIFO cache for kmer search results (speedup for --search and
--turn)
Version 1.5.0:
- update documentation (#20)
- reinstate --show-dist
- reinstate --fs-msc-max
- add choice exact to --search-iupac
- change default for --search-kmer-len to match --fs-kmer-len
- parallelize launch of background PT servers
- lower memory usage:
  - avoid redundant sequence caching by libARBDB
  - use compact aligned base (50% on internal sequence cache)
- improve internal kmer search performance:
  - add caching of kmer index on disk
  - parallelize kmer index construction
  - add presence/absence optimization
- fix field align_ident_slv added for 100% matches even when not
enabled
- fix crash on overhang past alignment edge
- fix libARBDB writing to stdout, clobbering sequence output
- fix out-of-bounds access on iterator in NAST implementation
- remove dependency on boost serialization library
- build release binaries with GCC 7 and C++11 ABI
- add integration tests watching for accuracy regressions (#25)
Version 1.4.0:
- process sequences in parallel (#17, #31)
- add support for gzipped read/write (#29)
- add support for “-” to read/write using pipes
- remove internal pipeline in favor of TBB
- add --add-relatives; adding search result to output (#19)
- add logging with variable verbosity (#14)
- be smart about locating arb_pt_server binary (#30)
Version 1.3.5:
- report number of references discarded due to configured constraints
- fix crash if no acceptable references found for a query
- fix --search causes a program option error (#28)
- fix race condition in terminating PT server
Version 1.3.4:
- build binary releases for macOS and Linux (#26)
- fix “search.h” missing in source tarball (#27)
Version 1.3.3:
- add option --fasta-write-dots; writes dots on edges
- add option --fasta-write-dna; writes T/t instead of U/u (#24)
- fix PT server fails to build if ARBHOME not set (#15)
- fix psina not installed to $bindir
- fix tab character in sequence causes sequence to be skipped (#21)
- fix last line of input FASTA ignored if missing newline (#16)
- fix --db parameter demanded even if not required due to use of
--prealigned
- fix SIGPIPE race on PT server shutdown (#11)
Version 1.3.2:
- split --help into “common” and advanced options (--help-all)
- add psina wrapper script (runs parallel instances of SINA to align
a single FASTA file)
- fix memory access failure in cseq
- fix memory access failure in mseq
- fix crash on all references removed by filters
- don’t exit(1) on --help (#9)
- added README.md (#5)
Version 1.3.1:
- add OSX support
- change license to GPL
- remove limitation on ARB integration mode
- move revisioning to git
- fix compilation with CLANG
Version 1.3.0:
- dropped support for ARB 5.x
Version 1.2.13:
- uppercase aligned bases if lowercase=unaligned
- fix manual typos (thx to Mohamed El-hadidi)
- search-db defaults to pt-db
- search-port defaults to pt-port if search/align DBs are identical;
fixes unnecessary start of two PT servers (thx to Christian
Wurzbacher)
- change default for lca-quorum to 0.7
- change default for search-min-sim to 0.7
- be smarter about recognizing FASTA format files and creating
output FASTA name (“.frn”, “.fna”, “.fas”, “/dev/stdin” as input,
“.fasta.aligned” and “/dev/stdout” as output)
- write sequence ID in first column of CSV output
- add fasta-block and fasta-idx options allowing to process only
specific smaller blocks of larger fasta files (for parallelization)
Version 1.2.12:
- use same ARB field type for align_ident_slv as SILVA uses
- skip sequences with non-IUPAC characters when building reference
and when loading sequences to be aligned from ARB file (complaint
is issued on stderr)
Version 1.2.11:
- fix --fs-req was ignored
- added option --calc-idty: computes the minimum identity of the
aligned query sequence with any of the reference sequences used for
alignment. The value is exported in align_slv_idty.
- added option --min-idty IDTY: excludes sequences with
align_slv_idty < IDTY from FASTA output. Implies --calc-idty.
Version 1.2.10:
- added option --fs-no-graph: uses a column profile with PSP score as
template (instead of the POA method). This feature is merely for
completeness’ sake and evaluation. With SILVA SSU, the POA-based
method is much more accurate.
- changed default for --fs-cover-gene to 0 (faster). The cover-gene
feature only makes sense if --gene-start and --gene-end are set such
that the reference actually contains sequences touching these
boundaries. If this is not the case, the reference selection
algorithm wastes time with a futile search.
- use unix socket as default for --ptport and --search-port. Using
“/tmp/sina_<PID>.socket” is a more suitable default than
“localhost:4040”, as it runs less risk of accessing a different PT
server than intended.
- fix inconsistencies in generated meta data fields and log output
- updated ARB components to SVN revision 8225
- added option --write-used-rels. The field used_rels is interpreted
by ARB as the field containing the reference sequences that were
used during alignment.
- no longer write full_name content when exporting meta data encoded
in the FASTA header
- re-add clamped align_quality_slv
- fix score normalization (scores > 1 were possible when fs-weight > 0)
- fix calculation of bp score when orig-db not set (default ptdb)
- added option --fs-req-gaps n: ignores reference sequences having
fewer than n gaps before the last base, i.e. ignores “unaligned”
sequences. This is useful when running SINA out of ARB to prevent
accidental alignment against unaligned sequences.
- added options --search-iupac, --search-correction and
--search-cover. These options configure how the “distance”
(identity, similarity, …) is calculated.
- skip FASTA input sequences that contain invalid characters
(i.e. not IUPAC encoded bases, ‘.’, ‘-’ or white space)
Version 1.2.9:
- fixed sequence not filled with gap characters after copying full
alignment
Version 1.2.8:
- made --extra-fields actually load multiple fields from ARB file
- fixed sequence not filled with gap characters after copying
subalignment
- updated ARB components to SVN revision 7985
- added changelog :)