Debian Med Project
Help us to see Debian used by medical practitioners and biomedical researchers! Join us on the Alioth page.
Summary
Next generation sequencing
Debian Med bioinformatics applications usable in Next Generation Sequencing

This task aims to collect packages that specialize in the processing or interpretation of data generated with next- (and later-) generation high-throughput sequencing technologies.

Description

For a better overview of the project's availability as a Debian package, each head row has a color code according to this scheme:

If you discover a project that looks like a good candidate for Debian Med, or if you have prepared an unofficial Debian package, please do not hesitate to send a description of that project to the Debian Med mailing list.


Debian Med Next generation sequencing packages

Official Debian packages with high relevance

Bcftools
genomic variant calling and manipulation of VCF/BCF files
Versions of package bcftools
Release  Version  Architectures
stretch  1.3.1-1  amd64,arm64,armel,mips64el,mipsel,ppc64el
sid      1.6-3    amd64,arm64,armel,armhf,mips64el,mipsel,ppc64el
buster   1.5-4    amd64,arm64,armel,armhf,mips64el,mipsel,ppc64el
sid      1.5-4    kfreebsd-amd64
Popcon: 7 users (13 upd.)*
License: DFSG free

BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed.

Registry entries: Bio.Tools  SciCrunch  OMICtools 
Bedtools
suite of utilities for comparing genomic features
Versions of package bedtools
Release   Version        Architectures
wheezy    2.16.1-1       amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc
stretch   2.26.0+dfsg-3  amd64,arm64,armel,i386,mips64el,mipsel,ppc64el
sid       2.26.0+dfsg-5  amd64,arm64,armel,armhf,hurd-i386,i386,kfreebsd-amd64,kfreebsd-i386,mips,mips64el,mipsel,powerpc,ppc64el,s390x
buster    2.26.0+dfsg-5  amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
jessie    2.21.0-1       amd64,arm64,armhf,i386,mips,mipsel,powerpc,ppc64el,s390x
upstream  2.27.1
Debtags of package bedtools:
field: biology, biology:bioinformatics
interface: commandline
role: program
scope: suite
use: analysing, comparing, converting, filtering
works-with: biological-sequence
Popcon: 40 users (52 upd.)*
Newer upstream!
License: DFSG free

The BEDTools utilities allow one to address common genomics tasks such as finding feature overlaps and computing coverage. The utilities are largely based on four widely-used file formats: BED, GFF/GTF, VCF, and SAM/BAM. Using BEDTools, one can develop sophisticated pipelines that answer complicated research questions by streaming several BEDTools together.
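
As a rough illustration of what "finding feature overlaps and computing coverage" means, the following Python sketch works on 0-based, half-open intervals as used by the BED format. It is not BEDTools code; the names Interval, overlap and coverage are made up for this example.

```python
from typing import List, NamedTuple, Optional


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive


def overlap(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the region shared by a and b, or None if they do not overlap."""
    if a.chrom != b.chrom:
        return None
    start = max(a.start, b.start)
    end = min(a.end, b.end)
    return Interval(a.chrom, start, end) if start < end else None


def coverage(features: List[Interval], window: Interval) -> int:
    """Count the bases of window covered by at least one feature."""
    # Clip every feature to the window and sort the pieces by start position.
    clipped = sorted(o for f in features if (o := overlap(f, window)))
    covered, cur_end = 0, window.start
    for piece in clipped:
        if piece.end > cur_end:
            # Only count the part that extends beyond what is already covered.
            covered += piece.end - max(piece.start, cur_end)
            cur_end = piece.end
    return covered
```

Two intervals that merely touch (one ends where the other starts) do not overlap under the half-open convention, which is why the comparison is a strict start < end.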

The groupBy utility is distributed in the filo package.

Please cite: Aaron R. Quinlan and Ira M. Hall: BEDTools: a flexible suite of utilities for comparing genomic features. (PubMed,eprint) Bioinformatics 26(6):841-842 (2010)
Registry entries: Bio.Tools  RRID  OMICtools 
Blasr
mapping single-molecule sequencing reads
Versions of package blasr
Release  Version                  Architectures
sid      0~20151014+git8e668be-1  hurd-i386,kfreebsd-i386
buster   5.3+0-1                  amd64,arm64,mips64el,ppc64el
sid      5.3+0-1                  amd64,arm64,kfreebsd-amd64,mips64el,ppc64el
stretch  5.3+0-1                  amd64,arm64,mips64el,ppc64el
Popcon: 6 users (3 upd.)*
License: DFSG free

Basic local alignment with successive refinement (BLASR) is a method for mapping single-molecule sequencing reads against a reference genome. Such reads are thousands of bases long, with divergence between them and the genome being dominated by insertion and deletion error.

Registry entries: SciCrunch  OMICtools 
Bowtie
Ultrafast memory-efficient short read aligner
Versions of package bowtie
Release   Version         Architectures
stretch   1.1.2-6         amd64,arm64,mips64el,ppc64el,s390x
wheezy    0.12.7-3        amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,powerpc,s390,s390x,sparc
sid       1.2.1.1+dfsg-1  amd64,arm64,mips64el,ppc64el,s390x
buster    1.2.1.1+dfsg-1  amd64,arm64,mips64el,ppc64el,s390x
sid       1.1.2-6         kfreebsd-amd64
jessie    1.1.1-2         amd64
upstream  1.2.2-beta
Debtags of package bowtie:
biology: nuceleic-acids
field: biology:bioinformatics
interface: commandline
role: program
science: calculation
scope: utility
use: analysing, comparing
works-with: biological-sequence
Popcon: 25 users (14 upd.)*
Newer upstream!
License: DFSG free

This package addresses the problem of interpreting the results from the latest (as of 2010) DNA sequencing technologies. These yield fairly short stretches of sequence that cannot be interpreted directly. The challenge for tools like Bowtie is to assign a chromosomal location to each short stretch of DNA sequenced per run.

Bowtie aligns short DNA sequences (reads) to the human genome at a rate of over 25 million 35-bp reads per hour. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: typically about 2.2 GB for the human genome (2.9 GB for paired-end).
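
To give an idea of how a Burrows-Wheeler index supports read lookup, here is a minimal Python sketch. It builds the transform naively from sorted rotations and counts exact occurrences of a pattern with the standard backward search. This is a teaching example under simplifying assumptions (tiny input, no compression, exact matches only) and not Bowtie's implementation; the function names are invented for the illustration.

```python
from collections import Counter


def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform via sorted rotations (fine for tiny inputs only)."""
    if sentinel in text:
        raise ValueError("text must not contain the sentinel")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def count_occurrences(pattern: str, transformed: str) -> int:
    """Count exact occurrences of pattern in the original text using backward search."""
    # first[c] = number of characters in the text that sort before c.
    counts = Counter(transformed)
    first, total = {}, 0
    for c in sorted(counts):
        first[c] = total
        total += counts[c]

    lo, hi = 0, len(transformed)  # current range of matching rotations
    for c in reversed(pattern):
        if c not in first:
            return 0
        # A real index stores sampled rank tables instead of calling count().
        lo = first[c] + transformed[:lo].count(c)
        hi = first[c] + transformed[:hi].count(c)
        if lo >= hi:
            return 0
    return hi - lo
```

For example, bwt("banana") gives "annb$aa", and searching that string for "ana" reports two occurrences. The pattern is consumed from its last character to its first, narrowing a range of sorted rotations at each step.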

The package is enhanced by the following packages: bowtie-examples
Please cite: Ben Langmead, Cole Trapnell, Mihai Pop and Steven L Salzberg: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. (eprint) Genome Biology 10:R25 (2009)
Registry entries: SciCrunch  OMICtools 
Bwa
Burrows-Wheeler Aligner
Versions of package bwa
Release  Version   Architectures
buster   0.7.17-1  amd64
squeeze  0.5.8c-1  amd64,armel,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,sparc
wheezy   0.6.2-1   amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc
jessie   0.7.10-1  amd64
stretch  0.7.15-2  amd64
sid      0.7.17-1  amd64,kfreebsd-amd64
Debtags of package bwa:
biology: nuceleic-acids, peptidic
field: biology, biology:bioinformatics
interface: commandline, text-mode
role: program
use: analysing, comparing
Popcon: 31 users (22 upd.)*
License: DFSG free

BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM. The first algorithm is designed for Illumina sequence reads up to 100bp, while the other two are designed for longer sequences ranging from 70bp to 1Mbp. BWA-MEM and BWA-SW share similar features such as long-read support and split alignment, but BWA-MEM, which is the latest, is generally recommended for high-quality queries as it is faster and more accurate. BWA-MEM also has better performance than BWA-backtrack for 70-100bp Illumina reads.

Please cite: Heng Li and Richard Durbin: Fast and accurate short read alignment with Burrows-Wheeler transform. (PubMed,eprint) Bioinformatics 25(14):1754-1760 (2009)
Registry entries: Bio.Tools  SciCrunch  OMICtools 
Daligner
local alignment discovery between long nucleotide sequencing reads
Versions of package daligner
Release  Version         Architectures
buster   1.0+20171010-2  amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
sid      1.0+20171010-2  amd64,arm64,armel,armhf,i386,kfreebsd-amd64,kfreebsd-i386,mips,mips64el,mipsel,powerpc,ppc64el,s390x
stretch  1.0+20161119-1  amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
Popcon: 7 users (8 upd.)*
License: DFSG free

These tools permit one to find all significant local alignments between reads encoded in a Dazzler database. The assumption is that the reads are from a Pacific Biosciences RS II long read sequencer. That is, the reads are long and noisy, with an error rate of up to 15% on average.

Please cite: Gene Myers: Efficient Local Alignment Discovery amongst Noisy Long Reads. 8701:52-67 (2014)
Registry entries: OMICtools 
Fastx-toolkit
FASTQ/A short nucleotide reads pre-processing tools
Versions of package fastx-toolkit
Release  Version     Architectures
buster   0.0.14-5    amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
jessie   0.0.14-1    amd64,arm64,armel,armhf,i386,mips,mipsel,ppc64el,s390x
stretch  0.0.14-3    amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
wheezy   0.0.13.2-1  amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc
sid      0.0.14-5    amd64,arm64,armel,armhf,i386,kfreebsd-amd64,kfreebsd-i386,mips,mips64el,mipsel,powerpc,ppc64el,s390x
Debtags of package fastx-toolkit:
role: program
Popcon: 18 users (8 upd.)*
License: DFSG free

The FASTX-Toolkit is a collection of command line tools for preprocessing short nucleotide reads in FASTA and FASTQ formats, usually produced by Next-Generation sequencing machines. The main processing of such FASTA/FASTQ files is mapping (aligning) the sequences to reference genomes or other databases using specialized programs like BWA, Bowtie and many others. However, it is sometimes more productive to preprocess the FASTA/FASTQ files before mapping the sequences to the genome—manipulating the sequences to produce better mapping results. The FASTX-Toolkit tools perform some of these preprocessing tasks.
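
One typical preprocessing step is trimming low-quality bases from the 3' end of each read before mapping. The Python sketch below shows the idea under two stated assumptions: records use the plain four-line FASTQ layout, and qualities are Phred scores with an ASCII offset of 33. The functions read_fastq and trim_3prime are hypothetical names for this example and do not reproduce the toolkit's own programs or options.

```python
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line is ignored
        qual = handle.readline().rstrip("\n")
        yield header[1:], seq, qual  # drop the leading '@'


def trim_3prime(seq: str, qual: str, min_q: int = 20, offset: int = 33) -> Tuple[str, str]:
    """Remove trailing bases whose Phred quality is below min_q."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]
```

With the default threshold, a read "ACGTAC" with qualities "IIII##" is cut back to "ACGT", because '#' encodes a Phred score of 2 while 'I' encodes 40.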

Registry entries: Bio.Tools  RRID  OMICtools 
Last-align
genome-scale comparison of biological sequences
Versions of package last-align
Release   Version  Architectures
buster    885-1    amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
sid       885-1    amd64,arm64,armel,armhf,hurd-i386,i386,kfreebsd-amd64,kfreebsd-i386,mips,mips64el,mipsel,powerpc,ppc64el,s390x
stretch   830-1    amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
jessie    490-1    amd64,arm64,armel,armhf,i386,mips,mipsel,powerpc,ppc64el,s390x
wheezy    199-1    amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc
squeeze   128-1    amd64,armel,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,sparc
upstream  914
Debtags of package last-align:
biology: nuceleic-acids
field: biology, biology:bioinformatics
role: program
Popcon: 10 users (21 upd.)*
Newer upstream!
License: DFSG free

LAST is software for comparing and aligning sequences, typically DNA or protein sequences. LAST is similar to BLAST, but it copes better with very large amounts of sequence data. Here are two things LAST is good at:

  • Comparing large (e.g. mammalian) genomes.
  • Mapping lots of sequence tags onto a genome.

The main technical innovation is that LAST finds initial matches based on their multiplicity, instead of using a fixed size (e.g. BLAST uses 10-mers). This allows one to map tags to genomes without repeat-masking, without becoming overwhelmed by repetitive hits. To find these variable-sized matches, it uses a suffix array (inspired by Vmatch). To achieve high sensitivity, it uses a discontiguous suffix array, analogous to spaced seeds.
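
The idea of choosing matches by multiplicity rather than by a fixed length can be sketched in a few lines of Python. The example builds a plain (contiguous) suffix array naively and, starting at a query position, lengthens the match until it occurs at most max_hits times in the reference. This is only an illustration of the principle; it does not use LAST's data structures, and all function names are assumptions made for the example.

```python
from typing import List, Optional, Tuple


def build_suffix_array(text: str) -> List[int]:
    """Naive suffix array: start positions sorted by the suffix they begin."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def occurrence_range(text: str, sa: List[int], pattern: str) -> Tuple[int, int]:
    """Return the half-open range of suffix-array rows whose suffix starts with pattern."""
    n = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # lower bound
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + n] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first, hi = lo, len(sa)
    while lo < hi:  # upper bound
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + n] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return first, lo


def adaptive_seed(text: str, sa: List[int], query: str, start: int,
                  max_hits: int) -> Optional[Tuple[int, List[int]]]:
    """Shortest match starting at query[start] that occurs at most max_hits times.

    Returns (match length, sorted reference positions), or None if the match
    disappears before it becomes rare enough.
    """
    for length in range(1, len(query) - start + 1):
        first, last = occurrence_range(text, sa, query[start:start + length])
        hits = last - first
        if hits == 0:
            return None
        if hits <= max_hits:
            return length, sorted(sa[first:last])
    return None
```

In a repetitive reference the seed automatically grows longer, and in a unique region it stays short, which is the behaviour the paragraph above describes.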

Please cite: Martin C. Frith, Raymond Wan and Paul Horton: Incorporating sequence quality data into alignment improves DNA read mapping. (PubMed,eprint) Nucl. Acids Res. 38(7):e100 (2010)
Registry entries: Bio.Tools  RRID  OMICtools 
Libvcflib-tools
C++ library for parsing and manipulating VCF files (tools)
Versions of package libvcflib-tools
Release  Version            Architectures
sid      1.0.0~rc1+dfsg1-6  amd64
stretch  1.0.0~rc1+dfsg1-3  amd64
buster   1.0.0~rc1+dfsg1-6  amd64
Popcon: 3 users (7 upd.)*
License: DFSG free

The Variant Call Format (VCF) is a flat-file, tab-delimited textual format intended to concisely describe reference-indexed variations between individuals. VCF provides a common interchange format for the description of variation in individuals and populations of samples, and has become the de facto standard reporting format for a wide array of genomic variant detectors.

vcflib provides methods to manipulate and interpret sequence variation as it can be described by VCF. It is both:

  • an API for parsing and operating on records of genomic variation as it can be described by the VCF format,
  • and a collection of command-line utilities for executing complex manipulations on VCF files.

This package contains several tools using the library.
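
To make the tab-delimited layout of a VCF data line concrete, the following Python sketch splits one record into its eight fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). It is independent of vcflib, whose API is C++; the class and function names are invented for this example, and per-sample genotype columns are ignored.

```python
from typing import Dict, List, NamedTuple, Optional


class VcfRecord(NamedTuple):
    chrom: str
    pos: int                 # 1-based position on the reference
    id: str
    ref: str
    alts: List[str]          # comma-separated ALT alleles, split into a list
    qual: Optional[float]    # None when the field is "."
    filter: str
    info: Dict[str, str]     # flag-style keys map to an empty string


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse the eight fixed columns of one VCF data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, flt, info = fields[:8]
    info_dict: Dict[str, str] = {}
    if info != ".":
        for item in info.split(";"):
            key, _, value = item.partition("=")
            info_dict[key] = value
    return VcfRecord(chrom, int(pos), vid, ref, alt.split(","),
                     None if qual == "." else float(qual), flt, info_dict)
```

Header lines, which start with "#", are not handled here and would need to be skipped before calling parse_vcf_line.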

Registry entries: Bio.Tools  SciCrunch  OMICtools 
Maq
maps short fixed-length polymorphic DNA sequence reads to reference sequences
Versions of package maq
Release  Version  Architectures
squeeze  0.7.1-3  amd64,armel,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,sparc
stretch  0.7.1-7  amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
sid      0.7.1-7  amd64,arm64,armel,armhf,hurd-i386,i386,kfreebsd-amd64,kfreebsd-i386,mips,mips64el,mipsel,powerpc,ppc64el,s390x
wheezy   0.7.1-5  amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc
buster   0.7.1-7  amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
jessie   0.7.1-5  amd64,arm64,armel,armhf,i386,mips,mipsel,powerpc,ppc64el,s390x
Debtags of package maq:
biology: nuceleic-acids
field: biology, biology:bioinformatics
interface: commandline
role: program
scope: utility
use: analysing, comparing, searching
works-with-format: plaintext
Popcon: 22 users (9 upd.)*
License: DFSG free

Maq (short for Mapping and Assembly with Quality) builds mapping assemblies from short reads generated by next-generation sequencing machines. It was designed particularly for the Illumina-Solexa 1G Genetic Analyzer, and has preliminary functionality to handle ABI SOLiD data. Maq was previously known as mapass2.

Development of Maq stopped in 2008. Its successors are BWA and SAMtools.

Please cite: Heng Li, Jue Ruan and Richard Durbin: Mapping short DNA sequencing reads and calling variants using mapping quality scores. (PubMed,eprint) Genome Research 18(11):1851-1858 (2008)
Registry entries: Bio.Tools  SciCrunch  OMICtools 
Mhap
locality-sensitive hashing to detect long-read overlaps
Versions of package mhap
Release  Version       Architectures
stretch  2.1.1+dfsg-1  all
sid      2.1.1+dfsg-1  all
buster   2.1.1+dfsg-1  all
Popcon: 6 users (1 upd.)*
License: DFSG free

The MinHash Alignment Process (MHAP--pronounced MAP) is a reference implementation of a probabilistic sequence overlapping algorithm. It is designed to efficiently detect all overlaps in noisy long-read sequence data. It efficiently estimates Jaccard similarity by compressing sequences to their representative fingerprints composed of min-mers (minimum k-mers).
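
The way a MinHash fingerprint approximates Jaccard similarity can be shown with a small Python sketch. For each of several seeded hash functions it keeps the smallest hash value over all k-mers of a sequence, and the fraction of positions where two fingerprints agree estimates the Jaccard similarity of the underlying k-mer sets. This is a generic MinHash illustration, not MHAP's code (MHAP is written in Java); the hash choice (BLAKE2b with a per-seed salt) and all names are assumptions of this example.

```python
import hashlib
from typing import List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all k-mers contained in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash_kmer(kmer: str, seed: int) -> int:
    """64-bit hash of a k-mer; each seed acts as an independent hash function."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8,
                             salt=seed.to_bytes(8, "little")).digest()
    return int.from_bytes(digest, "little")


def sketch(seq: str, k: int, num_hashes: int) -> List[int]:
    """MinHash fingerprint: the minimum hash value per hash function."""
    ks = kmers(seq, k)
    if not ks:
        raise ValueError("sequence is shorter than k")
    return [min(hash_kmer(km, seed) for km in ks) for seed in range(num_hashes)]


def estimate_jaccard(a: List[int], b: List[int]) -> float:
    """Fraction of fingerprint positions on which the two sketches agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def exact_jaccard(a: str, b: str, k: int) -> float:
    """Exact Jaccard similarity of the two k-mer sets, for comparison."""
    ka, kb = kmers(a, k), kmers(b, k)
    return len(ka & kb) / len(ka | kb)
```

The estimate gets closer to exact_jaccard as num_hashes grows, while the fingerprints stay much smaller than the full k-mer sets.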

Please cite: Konstantin Berlin, Sergey Koren, Chen-Shan Chin, James P Drake, Jane M Landolin and Adam M Phillippy: Assembling large genomes with single-molecule sequencing and locality-sensitive hashing. (PubMed) Nature Biotechnology 33(6):623–630 (2015)
Registry entries: OMICtools 
Picard-tools
Command line tools to manipulate SAM and BAM files
Versions of package picard-tools
Release   Version       Architectures
sid       2.8.1+dfsg-3  all
buster    2.8.1+dfsg-3  all
stretch   2.8.1+dfsg-1  all
squeeze   1.27-1        all
wheezy    1.46-1        all
jessie    1.113-1       all
upstream  2.16.0
Popcon: 22 users (8 upd.)*
Newer upstream!
License: DFSG free

SAM (Sequence Alignment/Map) format is a generic format for storing large nucleotide sequence alignments. Picard Tools includes these utilities to manipulate SAM and BAM files:

 AddCommentsToBam                  FifoBuffer
 AddOrReplaceReadGroups            FilterSamReads
 BaitDesigner                      FilterVcf
 BamIndexStats                     FixMateInformation
 BamToBfq                          GatherBamFiles
 BedToIntervalList                 GatherVcfs
 BuildBamIndex                     GenotypeConcordance
 CalculateHsMetrics                IlluminaBasecallsToFastq
 CalculateReadGroupChecksum        IlluminaBasecallsToSam
 CheckIlluminaDirectory            LiftOverIntervalList
 CheckTerminatorBlock              LiftoverVcf
 CleanSam                          MakeSitesOnlyVcf
 CollectAlignmentSummaryMetrics    MarkDuplicates
 CollectBaseDistributionByCycle    MarkDuplicatesWithMateCigar
 CollectGcBiasMetrics              MarkIlluminaAdapters
 CollectHiSeqXPfFailMetrics        MeanQualityByCycle
 CollectIlluminaBasecallingMetrics MergeBamAlignment
 CollectIlluminaLaneMetrics        MergeSamFiles
 CollectInsertSizeMetrics          MergeVcfs
 CollectJumpingLibraryMetrics      NormalizeFasta
 CollectMultipleMetrics            PositionBasedDownsampleSam
 CollectOxoGMetrics                QualityScoreDistribution
 CollectQualityYieldMetrics        RenameSampleInVcf
 CollectRawWgsMetrics              ReorderSam
 CollectRnaSeqMetrics              ReplaceSamHeader
 CollectRrbsMetrics                RevertOriginalBaseQualitiesAndAddMateCigar
 CollectSequencingArtifactMetrics  RevertSam
 CollectTargetedPcrMetrics         SamFormatConverter
 CollectVariantCallingMetrics      SamToFastq
 CollectWgsMetrics                 ScatterIntervalsByNs
 CompareMetrics                    SortSam
 CompareSAMs                       SortVcf
 ConvertSequencingArtifactToOxoG   SplitSamByLibrary
 CreateSequenceDictionary          SplitVcfs
 DownsampleSam                     UpdateVcfSequenceDictionary
 EstimateLibraryComplexity         ValidateSamFile
 ExtractIlluminaBarcodes           VcfFormatConverter
 ExtractSequences                  VcfToIntervalList
 FastqToSam                        ViewSam
Registry entries: Bio.Tools  SciCrunch  OMICtools 
R-bioc-hilbertvis
GNU R package to visualise long vector data
Versions of package r-bioc-hilbertvis
Release  Version   Architectures
squeeze  1.5.0-2   amd64,armel,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,sparc
wheezy   1.14.0-1  amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc
jessie   1.24.0-1  amd64,arm64,armel,armhf,i386,mips,mipsel,powerpc,ppc64el,s390x
stretch  1.32.0-1  amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
buster   1.36.0-1  amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
sid      1.36.0-1  amd64,arm64,armel,armhf,hurd-i386,i386,kfreebsd-amd64,kfreebsd-i386,mips,mips64el,mipsel,powerpc,ppc64el,s390x
Debtags of package r-bioc-hilbertvis:
biology: nuceleic-acids
field: biology, biology:bioinformatics
use: analysing
Popcon: 13 users (23 upd.)*
License: DFSG free

This tool allows one to display very long data vectors in a space-efficient manner, by organising them along a 2D Hilbert curve. The user can then visually judge the large-scale structure and distribution of features simultaneously with the rough shape and intensity of individual features.

In bioinformatics, a typical use case is ChIP-Chip and ChIP-Seq, or essentially any kind of genomic data that is conventionally displayed as a quantitative track ("wiggle data") in genome browsers such as those provided by Ensembl or UCSC.
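
The mapping from a position in a long vector to a pixel on a Hilbert curve can be sketched as follows. The Python code uses the well-known iterative index-to-coordinate conversion for a curve on an n by n grid (n a power of two) and then places vector values on that grid. It illustrates the layout principle only and is not the R package's interface; d2xy and hilbert_image are names chosen for this example.

```python
from typing import List, Optional, Sequence, Tuple


def d2xy(n: int, d: int) -> Tuple[int, int]:
    """Convert distance d along a Hilbert curve to (x, y) on an n x n grid.

    n must be a power of two and 0 <= d < n * n.
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Flip the quadrant built so far.
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x  # rotate by swapping the axes
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_image(values: Sequence[float], order: int) -> List[List[Optional[float]]]:
    """Place the first 4**order values on a 2**order x 2**order grid along the curve."""
    n = 2 ** order
    grid: List[List[Optional[float]]] = [[None] * n for _ in range(n)]
    for d, value in enumerate(values[:n * n]):
        x, y = d2xy(n, d)
        grid[y][x] = value
    return grid
```

Because consecutive vector positions always land on neighbouring pixels, features that are close together in the vector stay close together in the image.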

Please cite: Simon Anders: Visualization of genomic data with the Hilbert curve. (PubMed,eprint) Bioinformatics 25(10):1231-1235 (2009)
Registry entries: Bio.Tools  SciCrunch  OMICtools 
Rna-star
ultrafast universal RNA-seq aligner
Versions of package rna-star
Release  Version        Architectures
stretch  2.5.2b+dfsg-1  amd64,arm64,mips64el,ppc64el
buster   2.5.3a+dfsg-3  amd64,arm64,mips64el,ppc64el
sid      2.5.3a+dfsg-3  amd64,arm64,mips64el,ppc64el
Popcon: 4 users (6 upd.)*
License: DFSG free

Spliced Transcripts Alignment to a Reference (STAR) is software based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays, followed by a seed clustering and stitching procedure. STAR outperforms other aligners by a factor of >50 in mapping speed, aligning to the human genome 550 million 2 × 76 bp paired-end reads per hour on a modest 12-core server, while at the same time improving alignment sensitivity and precision. In addition to unbiased de novo detection of canonical junctions, STAR can discover non-canonical splices and chimeric (fusion) transcripts, and is also capable of mapping full-length RNA sequences. Using Roche 454 sequencing of reverse transcription polymerase chain reaction amplicons, the authors experimentally validated 1960 novel intergenic splice junctions with an 80-90% success rate, corroborating the high precision of the STAR mapping strategy.

Please cite: Alexander Dobin, Carrie A. Davis, Felix Schlesinger, Jorg Drenkow, Chris Zaleski, Sonali Jha, Philippe Batut, Mark Chaisson and Thomas R. Gingeras: STAR: ultrafast universal RNA-seq aligner. (PubMed,eprint) Bioinformatics 29(1):15-21 (2012)
Registry entries: Bio.Tools  OMICtools 
Samtools
processing sequence alignments in SAM and BAM formats
Versions of package samtools
Release  Version   Architectures
wheezy   0.1.18-1  amd64,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390
sid      1.6-3     amd64,arm64,armel,hurd-i386,i386,mips64el,powerpc,ppc64el,s390x
sid      1.5-1     armhf,kfreebsd-amd64,kfreebsd-i386,mips,mipsel
stretch  1.3.1-3   amd64,arm64,armel,i386,mips64el,mipsel,ppc64el
squeeze  0.1.8-1   amd64,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390
buster   1.5-1     amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
jessie   0.1.19-1  amd64,arm64,armhf,i386,mips,mipsel,powerpc,ppc64el,s390x
Debtags of package samtools:
field: biology
interface: commandline
network: client
role: program
scope: utility
uitoolkit: ncurses
use: analysing, calculating, filtering
works-with: biological-sequence
Popcon: 75 users (87 upd.)*
License: DFSG free

Samtools is a set of utilities that manipulate nucleotide sequence alignments in the binary BAM format. It imports from and exports to the ASCII SAM (Sequence Alignment/Map) format, does sorting, merging and indexing, and allows one to retrieve reads in any region swiftly. It is designed to work on a stream, and is able to open a BAM (not SAM) file on a remote FTP or HTTP server.

The package is enhanced by the following packages: libbio-samtools-perl
Please cite: Heng Li, Bob Handsaker, Alec Wysoker, Tim Fennell, Jue Ruan, Nils Homer, Gabor Marth, Goncalo Abecasis, Richard Durbin and 1000 Genome Project Data Processing Subgroup: The Sequence Alignment/Map (SAM) Format and SAMtools. (PubMed,eprint) Bioinformatics 25(16):2078-2079 (2009)
Registry entries: RRID  OMICtools 
Sra-toolkit
utilities for the NCBI Sequence Read Archive
Versions of package sra-toolkit
Release   Version         Architectures
sid       2.8.2-3+dfsg-1  amd64,i386
wheezy    2.1.7a-1        amd64,i386,kfreebsd-amd64,kfreebsd-i386
jessie    2.3.5-2+dfsg-1  amd64,i386
stretch   2.8.1-2+dfsg-2  amd64,i386
buster    2.8.2-3+dfsg-1  amd64,i386
upstream  2.8.2-5
Popcon: 19 users (4 upd.)*
Newer upstream!
License: DFSG free

Tools for reading the SRA archive, generally by converting individual runs into some commonly used format such as fastq.

The textual dumpers "sra-dump" and "vdb-dump" are provided in this release as an aid in visual inspection. It is likely that their actual output formatting will be changed in the near future to a stricter, more formalized representation[s]. PLEASE DO NOT RELY UPON THE OUTPUT FORMAT SEEN IN THIS RELEASE.

The "help" information will be improved in near future releases, and the tool options will become standardized across the set. More documentation will also be provided on the NCBI web site.

Tool options may change in the next release. Version 1 tool options will remain supported wherever possible in order to preserve operation of any existing scripts.

Please cite: Rasko Leinonen, Ruth Akhtar, Ewan Birney, James Bonfield, Lawrence Bower, Matt Corbett, Ying Cheng, Fehmi Demiralp, Nadeem Faruque, Neil Goodgame, Richard Gibson, Gemma Hoad, Christopher Hunter, Mikyung Jang, Steven Leonard, Quan Lin, Rodrigo Lopez, Michael Maguire, Hamish McWilliam, Sheila Plaister, Rajesh Radhakrishnan, Siamak Sobhany, Guy Slater, Petra Ten Hoopen, Franck Valentin, Robert Vaughan, Vadim Zalunin, Daniel Zerbino and Guy Cochrane: Improvements to services at the European Nucleotide Archive. (PubMed,eprint) Nucleic Acids Research 38(Database issue):D39-45 (2010)
Registry entries: Bio.Tools 
Ssake
genomics application for assembling millions of very short DNA sequences
Versions of package ssake
Release  Version  Architectures
sid      3.8.5-1  all
jessie   3.8.2-1  all
buster   3.8.5-1  all
wheezy   3.8-2    all
squeeze  3.5-1    all
stretch  3.8.4-1  all
Debtags of package ssake:
biology: nuceleic-acids
field: biology
interface: shell
role: program
scope: utility
use: analysing
Popcon: 12 users (4 upd.)*
License: DFSG free

The Short Sequence Assembly by K-mer search and 3′ read Extension (SSAKE) is a genomics application for aggressively assembling millions of short nucleotide sequences by progressively searching for perfect 3′-most k-mers using a DNA prefix tree. SSAKE is designed to help leverage the information from short sequence reads by stringently clustering them into contigs that can be used to characterize novel sequencing targets.
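
The principle of extending a contig at its 3′ end by searching for reads that begin with its 3′-most k-mer can be sketched in Python as below. To stay short, the example scans a plain collection of reads instead of using a prefix tree, only considers the forward strand, and stops as soon as more than one read fits; these simplifications and the name extend_contig are assumptions of this illustration, not a description of SSAKE's actual code.

```python
from typing import Iterable


def extend_contig(seed: str, reads: Iterable[str], min_overlap: int) -> str:
    """Greedily extend seed at its 3' end using perfectly overlapping reads."""
    contig = seed
    unused = set(reads) - {seed}
    extended = True
    while extended:
        extended = False
        longest = max((len(r) for r in unused), default=0)
        # Try the longest possible overlap first, then progressively shorter ones.
        for k in range(min(len(contig), longest - 1), min_overlap - 1, -1):
            suffix = contig[-k:]
            candidates = [r for r in unused if len(r) > k and r.startswith(suffix)]
            if len(candidates) == 1:
                read = candidates[0]
                contig += read[k:]        # append only the overhanging bases
                unused.discard(read)
                extended = True
                break
            if candidates:                # ambiguous extension: stop here
                return contig
    return contig
```

Starting from the seed "ATGCGT" with the reads "GCGTAC" and "GTACCA" and a minimum overlap of 3, the contig grows to "ATGCGTACCA" in two extension steps.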

Please cite: Rene L. Warren, Granger G. Sutton, Steven J. M. Jones and Robert A. Holt: Assembling millions of short DNA sequences using SSAKE. (PubMed,eprint) Bioinformatics 23(4):500-501 (2007)
Registry entries: Bio.Tools  RRID  OMICtools 
Tabix
generic indexer for TAB-delimited genome position files
Versions of package tabix
Release  Version  Architectures
jessie   0.2.6-2  armhf,mips,powerpc,s390x
sid      1.6-2    amd64,arm64,armel,armhf,hurd-i386,i386,mips,mips64el,mipsel,powerpc,ppc64el,s390x
sid      1.5-6    kfreebsd-amd64,kfreebsd-i386
jessie   1.1-1    amd64,arm64,armel,i386,mipsel,ppc64el
buster   1.5-4    amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
stretch  1.3.2-2  amd64,arm64,armel,i386,mips64el,mipsel,ppc64el
wheezy   0.2.6-1  amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc
Debtags of package tabix:
role: program
works-with-format: html
Popcon: 29 users (14 upd.)*
License: DFSG free

Tabix indexes files where some columns indicate sequence coordinates: name (usually a chromosome), start and stop. The input data file must be position sorted and compressed by bgzip (provided in this package), which has a gzip-like interface. After indexing, tabix is able to quickly retrieve data lines by chromosomal coordinates. Fast data retrieval also works over the network if a URI is given as a file name.
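
Why position-sorted input makes fast retrieval possible can be shown with a small in-memory Python sketch. It groups rows by sequence name, checks that start positions are sorted, and uses binary search to discard everything that starts at or after the end of the queried region. The real tabix index works on bgzip-compressed blocks and is considerably more elaborate; build_index and fetch are hypothetical names, and coordinates are assumed to be 0-based and half-open.

```python
from bisect import bisect_left
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, int, int, str]  # (sequence name, start, end, payload)
Index = Dict[str, Tuple[List[int], List[Tuple[int, int, str]]]]


def build_index(rows: Iterable[Row]) -> Index:
    """Group position-sorted rows by sequence name, keeping start positions aside."""
    index: Index = {}
    for chrom, start, end, payload in rows:
        starts, records = index.setdefault(chrom, ([], []))
        if starts and start < starts[-1]:
            raise ValueError("input must be position sorted")
        starts.append(start)
        records.append((start, end, payload))
    return index


def fetch(index: Index, chrom: str, q_start: int, q_end: int) -> List[str]:
    """Return payloads of all rows overlapping [q_start, q_end) on chrom."""
    if chrom not in index:
        return []
    starts, records = index[chrom]
    # Rows starting at or beyond q_end cannot overlap the query.
    stop = bisect_left(starts, q_end)
    return [payload for start, end, payload in records[:stop] if end > q_start]
```

The binary search bounds the right-hand side of the scan; a production index would also bound the left-hand side rather than filtering row by row.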

This version of tabix is built from the HTSlib source.

Please cite: Heng Li: Tabix: fast retrieval of sequence features from generic TAB-delimited files. (PubMed,eprint) Bioinformatics 27(5):718-719 (2011)
Registry entries: OMICtools 
Tophat
fast splice junction mapper for RNA-Seq reads
Versions of package tophat
Release  Version        Architectures
jessie   2.0.13+dfsg-1  amd64
stretch  2.1.1+dfsg-2   amd64
buster   2.1.1+dfsg1-1  amd64,arm64,mips64el,ppc64el,s390x
sid      2.1.1+dfsg1-1  amd64,arm64,kfreebsd-amd64,mips64el,ppc64el,s390x
Popcon: 13 users (4 upd.)*
License: DFSG free

TopHat aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons. TopHat is a collaborative effort between the University of Maryland Center for Bioinformatics and Computational Biology and the University of California, Berkeley Departments of Mathematics and Molecular and Cell Biology.

The package is enhanced by the following packages: cufflinks
Please cite: Cole Trapnell, Lior Pachter and Steven L. Salzberg: TopHat: discovering splice junctions with RNA-Seq. (PubMed,eprint) Bioinformatics 25(9):1105-1111 (2009)
Registry entries: Bio.Tools  RRID  OMICtools 
Vcftools
Collection of tools to work with VCF files
Versions of package vcftools
Release   Version        Architectures
buster    0.1.14+dfsg-4  amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
wheezy    0.1.9-1        amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc
stretch   0.1.14+dfsg-4  amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
sid       0.1.14+dfsg-4  amd64,arm64,armel,armhf,hurd-i386,i386,kfreebsd-amd64,kfreebsd-i386,mips,mips64el,mipsel,powerpc,ppc64el,s390x
jessie    0.1.12+dfsg-1  amd64,arm64,armel,armhf,i386,mips,mipsel,powerpc,ppc64el,s390x
upstream  0.1.15
Debtags of package vcftools:
role: program
Popcon: 28 users (9 upd.)*
Newer upstream!
License: DFSG free

VCFtools is a program package designed for working with VCF files, such as those generated by the 1000 Genomes Project. The aim of VCFtools is to provide methods for working with VCF files: validating, merging, comparing, and calculating some basic population genetic statistics.

Please cite: Petr Danecek, Adam Auton, Goncalo Abecasis, Cornelis A. Albers, Eric Banks, Mark A. DePristo, Robert E. Handsaker, Gerton Lunter, Gabor T. Marth, Stephen T. Sherry, Gilean McVean and Richard Durbin: The variant call format and VCFtools. (PubMed,eprint) Bioinformatics 27(15):2156-8 (2011)
Registry entries: Bio.Tools  RRID  OMICtools 
Velvet
Nucleic acid sequence assembler for very short reads
Versions of package velvet
Release       Version              Architectures
buster        1.2.10+dfsg1-3       amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
squeeze       1.0.02~nozlibcopy-1  amd64,armel,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,sparc
sid           1.2.10+dfsg1-3       amd64,arm64,armel,armhf,hurd-i386,i386,kfreebsd-amd64,kfreebsd-i386,mips,mips64el,mipsel,powerpc,ppc64el,s390x
experimental  1.2.10+dfsg1-4       amd64,arm64,armel,armhf,hurd-i386,i386,kfreebsd-amd64,kfreebsd-i386,mips,mips64el,mipsel,powerpc,ppc64el,s390x
stretch       1.2.10+dfsg1-3       amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
jessie        1.2.10+dfsg1-1       amd64,arm64,armel,armhf,i386,mips,mipsel,powerpc,ppc64el,s390x
wheezy        1.2.03~nozlibcopy-1  amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc
Debtags of package velvet:
biology: nuceleic-acids
field: biology, biology:bioinformatics
interface: commandline
role: program
use: analysing
Popcon: 13 users (13 upd.)*
License: DFSG free

Velvet is a de novo genomic assembler specially designed for short read sequencing technologies, such as Solexa or 454, developed by Daniel Zerbino and Ewan Birney at the European Bioinformatics Institute (EMBL-EBI), near Cambridge, in the United Kingdom.

Velvet currently takes in short read sequences, removes errors, and then produces high-quality unique contigs. It then uses paired read information, if available, to retrieve the repeated areas between contigs.
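
The publication cited below describes assembly with de Bruijn graphs. As a toy illustration of that data structure, the Python sketch here turns reads into a graph whose nodes are (k-1)-mers and whose edges are k-mers, and then follows a path for as long as each node has exactly one successor. It leaves out everything that makes a real assembler useful (error removal, reverse complements, paired reads), and the function names are assumptions of this example rather than Velvet's interface.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def de_bruijn(reads: Iterable[str], k: int) -> Dict[str, Set[str]]:
    """Build a de Bruijn graph: each k-mer links its (k-1)-prefix to its (k-1)-suffix."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unique_path(graph: Dict[str, Set[str]], start: str) -> str:
    """Spell out the contig obtained by following single-successor nodes from start."""
    contig, node = start, start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:        # stop instead of looping forever on a cycle
            break
        contig += nxt[-1]      # each step adds exactly one new base
        seen.add(nxt)
        node = nxt
    return contig
```

With the overlapping reads "ATGGCG", "GGCGTG" and "CGTGCA" and k = 4, walking from the node "ATG" spells out the contig "ATGGCGTGCA".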

Please cite: Daniel R. Zerbino and Ewan Birney: Velvet: Algorithms for de novo short read assembly using de Bruijn graphs. (PubMed,eprint) Genome Research 18(5):821-829 (2008)
Registry entries: Bio.Tools  SciCrunch  OMICtools 
*Popularity contest results: number of people who use this package regularly (number of people who upgraded this package recently) out of 203532