Debian Med Project
Help us to see Debian used by medical practitioners and biomedical researchers! Join us on the Salsa page.
Summary
Biology
Debian Med bioinformatics packages

This metapackage will install Debian packages for use in molecular biology, structural biology and other biological sciences.

Description

For a better overview of the project's availability as a Debian package, each head row has a color code according to this scheme:

If you discover a project that looks like a good candidate for Debian Med, or if you have prepared an unofficial Debian package, please do not hesitate to send a description of that project to the Debian Med mailing list.


Debian Med Biology packages

Official Debian packages with high relevance

Abacas
close gaps in genomic alignments from short reads
Versions of package abacas
Release | Version | Architectures
stretch | 1.3.1-3 | all
jessie | 1.3.1-2 | all
sid | 1.3.1-8 | all
wheezy | 1.3.1-1 | all
bullseye | 1.3.1-8 | all
buster | 1.3.1-5 | all
Debtags of package abacas:
role: program
Popcon: 2 users (7 upd.)*
Versions and Archs
License: DFSG free
Git

ABACAS (Algorithm Based Automatic Contiguation of Assembled Sequences) is intended to rapidly contiguate (align, order, orientate) shotgun-assembled contigs based on a reference sequence, to visualize them, and to design primers to close the gaps between them.

ABACAS uses MUMmer to find alignment positions and identify syntenies of assembled contigs against the reference. The output is then processed to generate a pseudomolecule, taking overlapping contigs and gaps into account. ABACAS generates a comparison file that can be used to visualize ordered and oriented contigs in ACT. Synteny is represented by red bars whose colour intensity decreases with lower percent identity between comparable blocks. Information on contigs, such as orientation, percent identity, coverage and overlap with other contigs, can also be visualized by loading the output feature file into ACT.
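As a rough illustration of the ordering and orientation step, the sketch below builds a pseudomolecule from contig placements that are assumed to be known already (for example, parsed from MUMmer hits). The function names and the fixed-size run of N used as a spacer are hypothetical simplifications; ABACAS itself also handles overlapping contigs and gap sizes, which this sketch does not attempt.

```python
from typing import Dict, List, Tuple

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse-complement a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def build_pseudomolecule(
    contigs: Dict[str, str],
    placements: List[Tuple[str, int, str]],
    gap_size: int = 100,
) -> str:
    """Order contigs by reference start, orient them, and join them with runs of N.

    placements: (contig name, start position on the reference, strand "+" or "-").
    """
    pieces = []
    for name, _start, strand in sorted(placements, key=lambda p: p[1]):
        seq = contigs[name]
        # Contigs matching the reverse strand are flipped to the reference orientation.
        pieces.append(revcomp(seq) if strand == "-" else seq)
    return ("N" * gap_size).join(pieces)


if __name__ == "__main__":
    contigs = {"c1": "AAAC", "c2": "GGTT", "c3": "CCCA"}
    placements = [("c2", 500, "+"), ("c1", 10, "+"), ("c3", 900, "-")]
    print(build_pseudomolecule(contigs, placements, gap_size=3))  # prints AAACNNNGGTTNNNTGGG
```

Sorting by reference start gives the order; the strand decides the orientation. Everything else in the real tool (overlap trimming, gap estimation, primer design) is left out on purpose.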

The package is enhanced by the following packages: abacas-examples
Please cite: Samuel Assefa, Thomas M. Keane, Thomas D. Otto, Chris Newbold and Matthew Berriman: ABACAS: algorithm-based automatic contiguation of assembled sequences. (PubMed,eprint) Bioinformatics 25(15):1968-1969 (2009)
Registry entries: OMICtools 
Topics: Probes and primers
Abyss
de novo, parallel, sequence assembler for short reads
Versions of package abyss
Release | Version | Architectures
wheezy | 1.3.4-3 (non-free) | amd64
bullseye | 2.2.4-1 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid | 2.2.4-1 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
jessie | 1.5.2-1 (non-free) | amd64
stretch | 2.0.2-2 | amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
buster | 2.1.5-7 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Debtags of package abyss:
role: program
Popcon: 4 users (2 upd.)*
Versions and Archs
License: DFSG free
Git

ABySS is a de novo, parallel, sequence assembler that is designed for short reads. It may be used to assemble genome or transcriptome sequence data. Parallelization is achieved using MPI, OpenMP and pthread.

Please cite: Shaun D. Jackman, Benjamin P. Vandervalk, Hamid Mohamadi, Justin Chu, Sarah Yeo, S. Austin Hammond, Golnaz Jahesh, Hamza Khan, Lauren Coombe, Rene L. Warren and İnanç Birol: "ABySS 2.0: resource-efficient assembly of large genomes using a Bloom filter". (PubMed,eprint) Genome Research 27(5):768-777 (2017)
Registry entries: Bio.tools  SciCrunch  OMICtools  Bioconda 
Topics: Sequence assembly
Ampliconnoise
removal of noise from 454 sequenced PCR amplicons
Versions of package ampliconnoise
Release | Version | Architectures
stretch | 1.29-6 | amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
sid | 1.29-8 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bullseye | 1.29-8 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
jessie | 1.29-2 | amd64,armel,armhf,i386
buster | 1.29-8 | amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
wheezy | 1.25-1 | amd64,armel,armhf,i386,kfreebsd-amd64,kfreebsd-i386,powerpc,sparc
Debtags of package ampliconnoise:
role: program
Popcon: 3 users (4 upd.)*
Versions and Archs
License: DFSG free
Git

AmpliconNoise is a package of applications to clean up high-throughput sequence data. It consists of three main parts:

  • Pyronoise: does flowgram-based clustering to spot misreads
  • SeqNoise: removes PCR point mutations
  • Perseus: removes PCR chimeras without the need for a set of reference sequences

Previously there was a standalone "Pyronoise" by the same authors and this package includes an updated version. There is also a "Denoiser" in Qiime which is related but distinct.

Please cite: Christopher Quince, Anders Lanzen, Russell J Davenport and Peter J Turnbaugh: Removing Noise From Pyrosequenced Amplicons. (PubMed,eprint) BMC Bioinformatics 12:38 (2011)
Registry entries: Bio.tools  SciCrunch  OMICtools 
Topics: Sequencing
Beads
2-DE electrophoresis gel image spot detection
Versions of package beads
Release | Version | Architectures
buster | 1.1.18+dfsg-3 | amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
bullseye | 1.1.18+dfsg-4 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid | 1.1.20-2 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 2 users (4 upd.)*
Versions and Archs
License: DFSG free
Git

Beads is a program for spot detection on 2-D gel images. It is based on an analogy with beads flowing uphill on the surface of the gel image and on the analysis of their paths (Langella & Zivy, 2008).

Please cite: Olivier Langella and Michel Zivy: A method based on bead flows for spot detection on 2-D gel images. (PubMed) Proteomics 8(23-24):4914-8 (2008)
Registry entries: OMICtools 
Canu
single molecule sequence assembler for genomes
Versions of package canu
Release | Version | Architectures
sid | 2.0+dfsg-1 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el
buster | 1.8+dfsg-2 | amd64
bullseye | 2.0+dfsg-1 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el
Popcon: 2 users (3 upd.)*
Versions and Archs
License: DFSG free
Git

Canu is a fork of the Celera Assembler, designed for high-noise single-molecule sequencing (such as the PacBio RS II or Oxford Nanopore MinION).

Canu is a hierarchical assembly pipeline which runs in four steps:

  • Detect overlaps in high-noise sequences using MHAP
  • Generate corrected sequence consensus
  • Trim corrected sequences
  • Assemble trimmed corrected sequences
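MHAP, used in the first step, estimates overlaps between reads with MinHash sketches. The following is a minimal, hypothetical illustration of the MinHash idea, not MHAP's code: each read is reduced to the minimum of several salted hashes over its k-mers, and the fraction of sketch positions where two reads agree estimates the Jaccard similarity of their k-mer sets. The k-mer size, sketch size and hash construction are arbitrary choices made for this sketch.

```python
import hashlib
from typing import List, Set


def kmer_set(seq: str, k: int) -> Set[str]:
    """All k-mers of a read (reverse complements ignored for simplicity)."""
    if len(seq) < k:
        raise ValueError("read shorter than k")
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def minhash_sketch(seq: str, k: int = 12, num_hashes: int = 64) -> List[int]:
    """For each salted hash function, keep the minimum hash value over all k-mers."""
    kmers = kmer_set(seq, k)
    sketch = []
    for salt in range(num_hashes):
        sketch.append(min(
            int.from_bytes(hashlib.blake2b(f"{salt}:{km}".encode(), digest_size=8).digest(), "big")
            for km in kmers
        ))
    return sketch


def estimated_jaccard(a: List[int], b: List[int]) -> float:
    """Fraction of sketch positions where the two minima agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def exact_jaccard(s1: str, s2: str, k: int = 12) -> float:
    """Exact Jaccard similarity of the two k-mer sets, for comparison."""
    a, b = kmer_set(s1, k), kmer_set(s2, k)
    return len(a & b) / len(a | b)
```

Comparing two sketches costs time proportional to the sketch size rather than the read length, which is why this kind of estimate is attractive for all-versus-all overlap detection on long reads.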
Please cite: Sergey Koren, Brian P. Walenz, Konstantin Berlin, Jason R. Miller and Adam M. Phillippy: Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. bioRxiv (2016)
Registry entries: Bio.tools  SciCrunch  OMICtools  Bioconda 
Remark of Debian Med team: Genome assembly and large-scale genome alignment (http://www.cbcb.umd.edu/software/)
Changeo
Repertoire clonal assignment toolkit (Python 3)
Versions of package changeo
Release | Version | Architectures
sid | 1.0.0-1 | all
bullseye | 1.0.0-1 | all
buster | 0.4.5-1 | all
Popcon: 3 users (2 upd.)*
Versions and Archs
License: DFSG free
Git

Change-O is a collection of tools for processing the output of V(D)J alignment tools, assigning clonal clusters to immunoglobulin (Ig) sequences, and reconstructing germline sequences.

Dramatic improvements in high-throughput sequencing technologies now enable large-scale characterization of Ig repertoires, defined as the collection of trans-membrane antigen-receptor proteins located on the surface of B cells and T cells. Change-O is a suite of utilities to facilitate advanced analysis of Ig and TCR sequences following germline segment assignment. Change-O handles output from IMGT/HighV-QUEST and IgBLAST, and provides a wide variety of clustering methods for assigning clonal groups to Ig sequences. Record sorting, grouping, and various database manipulation operations are also included.
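To make the idea of clonal assignment concrete, here is a minimal sketch under stated assumptions: sequences are first partitioned by V gene, J gene and junction length, and within each partition two sequences are linked when the length-normalized Hamming distance between their junctions is at most a threshold (single linkage). The record fields, gene names and threshold are hypothetical, and Change-O's own clustering methods offer more options than this.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Record(NamedTuple):
    seq_id: str
    v_call: str
    j_call: str
    junction: str


def normalized_hamming(a: str, b: str) -> float:
    """Mismatches divided by length (sequences assumed non-empty and of equal length)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def define_clones(records: List[Record], threshold: float) -> Dict[str, int]:
    """Map each sequence id to a clone number using single-linkage grouping."""
    partitions: Dict[tuple, List[Record]] = defaultdict(list)
    for rec in records:
        partitions[(rec.v_call, rec.j_call, len(rec.junction))].append(rec)

    clone_of: Dict[str, int] = {}
    next_clone = 0
    for members in partitions.values():
        parent = list(range(len(members)))  # union-find forest for this partition

        def find(x: int) -> int:
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        for i in range(len(members)):
            for j in range(i + 1, len(members)):
                if normalized_hamming(members[i].junction, members[j].junction) <= threshold:
                    parent[find(i)] = find(j)

        labels: Dict[int, int] = {}
        for i, rec in enumerate(members):
            root = find(i)
            if root not in labels:
                labels[root] = next_clone
                next_clone += 1
            clone_of[rec.seq_id] = labels[root]
    return clone_of
```

Partitioning first keeps the pairwise comparisons small and guarantees that junctions being compared have equal length, so a plain Hamming distance is well defined.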

This package installs the library for Python 3.

Please cite: Namita T. Gupta, Jason A. Vander Heiden, Mohamed Uduman, Daniel Gadala-Maria, Gur Yaari and Steven H. Kleinstein: Link to publication (PubMed,eprint) Bioinformatics 31(20):3356-3358 (2015)
Registry entries: OMICtools  Bioconda 
Circos
plotter for visualizing data
Versions of package circos
Release | Version | Architectures
wheezy | 0.61-3 | all
sid | 0.69.9+dfsg-2 | all
bullseye | 0.69.9+dfsg-2 | all
buster | 0.69.6+dfsg-2 | all
stretch | 0.69.4+dfsg-1 | all
jessie | 0.66-1 | all
Debtags of package circos:
field: biology:bioinformatics
role: program
use: viewing
Popcon: 12 users (21 upd.)*
Versions and Archs
License: DFSG free
Git

Circos visualizes data in a circular layout, which makes it ideal for exploring relationships between objects or positions and for creating highly informative, publication-quality graphics.

This package provides the Circos plotting engine, which is command-line driven (like gnuplot) and fully scriptable.

Please cite: Martin I Krzywinski, Jacqueline E Schein, Inanc Birol, Joseph Connors, Randy Gascoyne, Doug Horsman, Steven J Jones and Marco A Marra: Circos: An information aesthetic for comparative genomics. (PubMed,eprint) Genome Research 19(9):1639-45 (2009)
Registry entries: Bio.tools  SciCrunch  OMICtools  Bioconda 
Concavity
predictor of protein ligand binding sites from structure and conservation
Versions of package concavity
Release | Version | Architectures
buster | 0.1+dfsg.1-4 | amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
jessie | 0.1-2 | amd64,armel,armhf,i386
bullseye | 0.1+dfsg.1-4 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid | 0.1+dfsg.1-4 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
stretch | 0.1+dfsg.1-1 | amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
Popcon: 5 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

ConCavity predicts protein ligand binding sites by combining evolutionary sequence conservation and 3D structure.

ConCavity takes as input a PDB format protein structure and optionally files that characterize the evolutionary sequence conservation of the chains in the structure file.

The following result files are produced by default:

  • Residue ligand binding predictions for each chain (*.scores).
  • Residue ligand binding predictions in a PDB format file (residue scores placed in the temp. factor field, *_residue.pdb).
  • Pocket prediction locations in a DX format file (*.dx).
  • PyMOL script to visualize the predictions (*.pml).
The package is enhanced by the following packages: conservation-code
Please cite: John A. Capra, Roman A. Laskowski, Janet M. Thornton, Mona Singh and Thomas A. Funkhouser: Predicting Protein Ligand Binding Sites by Combining Evolutionary Sequence Conservation and 3D Structure. (PubMed) PLoS Computational Biology 5(12):e1000585 (2009)
Registry entries: SciCrunch  OMICtools 
Conservation-code
protein sequence conservation scoring tool
Versions of package conservation-code
Release | Version | Architectures
sid | 20110309.0-8 | all
jessie | 20110309.0-3 | all
bullseye | 20110309.0-8 | all
buster | 20110309.0-7 | all
stretch | 20110309.0-5 | all
Popcon: 7 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

This package provides score_conservation(1), a tool to score protein sequence conservation.

The following conservation scoring methods are implemented:

  • sum of pairs
  • weighted sum of pairs
  • Shannon entropy
  • Shannon entropy with property groupings (Mirny and Shakhnovich 1995, Valdar and Thornton 2001)
  • relative entropy with property groupings (Williamson 1995)
  • von Neumann entropy (Caffrey et al. 2004)
  • relative entropy (Samudrala and Wang 2006)
  • Jensen-Shannon divergence (Capra and Singh 2007)

A window-based extension that incorporates the estimated conservation of sequentially adjacent residues into the score for each column is also given. This window approach can be applied to any of the conservation scoring methods.
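Two of the ideas above, Shannon-entropy scoring and the window extension, can be sketched as follows. This is a hypothetical illustration, not score_conservation's code: the normalization (1 minus entropy divided by log2 of 20), the treatment of gaps (simply ignored), and the defaults window=3 and lam=0.5 for blending a column with the mean of its neighbours are all assumptions made for this sketch.

```python
import math
from collections import Counter
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def columns(alignment: List[str]) -> List[str]:
    """Turn equal-length aligned sequences into a list of alignment columns."""
    return ["".join(seq[i] for seq in alignment) for i in range(len(alignment[0]))]


def shannon_conservation(column: str) -> float:
    """1 - (Shannon entropy / log2(20)); gaps and unknown symbols are ignored."""
    residues = [c for c in column.upper() if c in AMINO_ACIDS]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((c / n) * math.log2(c / n) for c in Counter(residues).values())
    return 1.0 - entropy / math.log2(len(AMINO_ACIDS))


def window_scores(scores: List[float], window: int = 3, lam: float = 0.5) -> List[float]:
    """Blend each column score with the mean score of up to `window` neighbours per side."""
    smoothed = []
    for i, s in enumerate(scores):
        lo, hi = max(0, i - window), min(len(scores), i + window + 1)
        neighbours = [scores[j] for j in range(lo, hi) if j != i]
        avg = sum(neighbours) / len(neighbours) if neighbours else 0.0
        smoothed.append((1 - lam) * s + lam * avg)
    return smoothed
```

For example, `window_scores([shannon_conservation(c) for c in columns(["MKV-L", "MKIAL", "MRVAL"])], window=1)` scores a toy alignment and then smooths each column with its immediate neighbours. Because the smoothing only consumes a list of per-column numbers, any of the scoring methods listed above could be plugged in before it.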

The program accepts alignments in the CLUSTAL and FASTA formats.

The sequence-specific output can be used as the conservation input for concavity.

Conservation is highly predictive in identifying catalytic sites and residues near bound ligands.

Please cite: John A. Capra and Mona Singh: Predicting functionally important residues from sequence conservation. (PubMed) Bioinformatics 23(15):1875-82 (2007)
Registry entries: OMICtools 
Daligner
local alignment discovery between long nucleotide sequencing reads
Versions of package daligner
Release | Version | Architectures
buster | 1.0+git20180524.fd21879-1 | amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
bullseye | 1.0+git20200608.c18a2fb-1 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
stretch | 1.0+20161119-1 | amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
sid | 1.0+git20200608.c18a2fb-1 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 2 users (6 upd.)*
Versions and Archs
License: DFSG free
Git

These tools find all significant local alignments between reads encoded in a Dazzler database. The reads are assumed to come from a Pacific Biosciences RS II long-read sequencer; that is, they are long and noisy, with an error rate of up to 15% on average.

Please cite: Gene Myers: Efficient Local Alignment Discovery amongst Noisy Long Reads. 8701:52-67 (2014)
Registry entries: SciCrunch  OMICtools  Bioconda 
Discosnp
discovering Single Nucleotide Polymorphism from raw set(s) of reads
Versions of package discosnp
Release | Version | Architectures
bullseye | 2.4.3-1 | amd64,arm64,i386,mips64el,ppc64el,s390x
stretch | 1.2.6-1 | amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
sid | 2.4.3-1 | amd64,arm64,i386,mips64el,ppc64el,s390x
buster | 2.3.0-2 | amd64,arm64,i386,mips64el,ppc64el,s390x
jessie | 1.2.5-1 | amd64,armel,armhf,i386
upstream | 4.4.4 | -
Popcon: 3 users (4 upd.)*
Newer upstream!
License: DFSG free
Git

The discoSnp software is designed to discover Single Nucleotide Polymorphisms (SNPs) from raw set(s) of reads obtained with Next Generation Sequencers (NGS).

The number of input read sets is not constrained: it can be one, two, or more. No other data, such as a reference genome or annotations, are needed.

The software is composed of two modules. The first module, kissnp2, detects SNPs from read sets. The second module, kissreads, enhances the kissnp2 results by computing, for each read set and each SNP found:

 1) its mean read coverage
 2) the (phred) quality of reads generating the polymorphism.
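The two per-read-set statistics above can be illustrated with a small, hypothetical sketch. It assumes that per-position read coverage along the SNP path is already available and that base qualities use the common Phred+33 FASTQ encoding; neither the functions nor their inputs come from discoSnp itself.

```python
from typing import List, Tuple


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode FASTQ quality characters (Phred+33 assumed) into integer Phred scores."""
    return [ord(ch) - offset for ch in quality]


def snp_stats(coverage: List[int], snp_qualities: str) -> Tuple[float, float]:
    """Return (mean read coverage, mean Phred quality at the SNP) for one read set.

    coverage: hypothetical per-position read coverage along the SNP path.
    snp_qualities: one quality character per read covering the SNP position.
    """
    mean_cov = sum(coverage) / len(coverage) if coverage else 0.0
    scores = phred_scores(snp_qualities)
    mean_q = sum(scores) / len(scores) if scores else 0.0
    return mean_cov, mean_q


def error_probability(q: float) -> float:
    """Phred definition: Q = -10 * log10(P_error)."""
    return 10 ** (-q / 10)
```

Calling `snp_stats` once per read set yields one (coverage, quality) pair per set for each SNP, which is the shape of the result described above.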

This program is superseded by DiscoSnp++.

Registry entries: Bio.tools  SciCrunch  OMICtools  Bioconda 
Disulfinder
cysteines disulfide bonding state and connectivity predictor
Versions of package disulfinder
Release | Version | Architectures
bullseye | 1.2.11-8 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid | 1.2.11-8 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
stretch | 1.2.11-6 | amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
wheezy | 1.2.11-2 | amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc
jessie | 1.2.11-4 | amd64,armel,armhf,i386
buster | 1.2.11-8 | amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
Debtags of package disulfinder:
role: program
Popcon: 4 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

'disulfinder' is for predicting the disulfide bonding state of cysteines and their disulfide connectivity starting from sequence alone. Disulfide bridges play a major role in the stabilization of the folding process for several proteins. Prediction of disulfide bridges from sequence alone is therefore useful for the study of structural and functional properties of specific proteins. In addition, knowledge about the disulfide bonding state of cysteines may help the experimental structure determination process and may be useful in other genomic annotation tasks.

'disulfinder' predicts disulfide patterns in two computational stages: (1) the disulfide bonding state of each cysteine is predicted by a BRNN-SVM binary classifier; (2) cysteines that are known to participate in the formation of bridges are paired by a Recursive Neural Network to obtain a connectivity pattern.

Please cite: Alessio Ceroni, Andrea Passerini, Alessandro Vullo and Paolo Frasconi: DISULFIND: a disulfide bonding state and cysteine connectivity prediction server. (PubMed) Nucleic Acids Res 34(Web Server issue):W177-181 (2006)
Registry entries: Bio.tools  SciCrunch  OMICtools 
Dnaclust
tool for clustering millions of short DNA sequences
Versions of package dnaclust
Release | Version | Architectures
stretch | 3-4 | amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
jessie | 3-2 | amd64,armel,armhf,i386
sid | 3-7 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bullseye | 3-7 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster | 3-6 | amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
Popcon: 6 users (6 upd.)*
Versions and Archs
License: DFSG free
Git

dnaclust is a tool for clustering large numbers of short DNA sequences. The clusters are created in such a way that the "radius" of each cluster is no more than the specified threshold.

The input sequences to be clustered should be in FASTA format. The ID of each sequence is the first word of its FASTA header, that is, the prefix of the header up to the first whitespace character.
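The radius guarantee and the ID rule above can be illustrated with a simple greedy, center-based procedure. This is a sketch under assumptions, not dnaclust's algorithm: it uses plain Levenshtein distance, considers longer sequences first so that they tend to become cluster centers, and all function names are hypothetical. Every member ends up within `radius` edits of its center, which is the property described above.

```python
from typing import Dict, List, Tuple


def read_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (id, sequence); the id is the header's first whitespace-delimited word."""
    records: List[Tuple[str, str]] = []
    current_id, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current_id is not None:
                records.append((current_id, "".join(chunks)))
            fields = line[1:].split(maxsplit=1)
            current_id, chunks = (fields[0] if fields else ""), []
        else:
            chunks.append(line)
    if current_id is not None:
        records.append((current_id, "".join(chunks)))
    return records


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a single rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def greedy_cluster(records: List[Tuple[str, str]], radius: int) -> Dict[str, List[str]]:
    """Assign each sequence to the first center within `radius`, else make it a new center."""
    centers: List[Tuple[str, str]] = []
    clusters: Dict[str, List[str]] = {}
    # Longer sequences are considered first so they tend to become centers (an assumption).
    for seq_id, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        for center_id, center_seq in centers:
            if edit_distance(seq, center_seq) <= radius:
                clusters[center_id].append(seq_id)
                break
        else:
            centers.append((seq_id, seq))
            clusters[seq_id] = [seq_id]
    return clusters
```

For example, with the records `s1 ACGTACGT`, `s2 ACGTACGA` and `s3 TTTTGGGG` and a radius of 1, the first two sequences fall into one cluster centered on `s1`, and `s3` starts its own cluster.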

Please cite: Mohammadreza Ghodsi, Bo Liu and Mihai Pop: DNACLUST: accurate and efficient clustering of phylogenetic marker genes. (PubMed,eprint) BMC Bioinformatics 12:271 (2011)
Registry entries: Bio.tools  SciCrunch  OMICtools 
Estscan
ORF-independent detector of coding DNA sequences
Versions of package estscan
Release | Version | Architectures
buster | 3.0.3-3 | amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
sid | 3.0.3-3 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bullseye | 3.0.3-3 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 2 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

ESTScan is a program that can detect coding regions in DNA sequences, even if they are of low quality. ESTScan will also detect and correct sequencing errors that lead to frameshifts. ESTScan is neither a gene prediction program nor an open reading frame detector. Its strength lies in the fact that it does not require an open reading frame to detect a coding region. As a result, the program may miss a few translated amino acids at either the N or the C terminus, but it detects coding regions with high selectivity and sensitivity.

ESTScan takes advantage of the bias in hexanucleotide usage found in coding regions relative to non-coding regions. This bias is formalized as an inhomogeneous 3-periodic fifth-order Hidden Markov Model (HMM). Additionally, the HMM of ESTScan has been extended to allow insertions and deletions when these improve the coding region statistics.

Please cite: C. Lottaz, C. Iseli, CV. Jongeneel and Philipp Bucher: Modeling sequencing errors by combining Hidden Markov models. Bioinformatics 19:103-112 (2003)
Registry entries: Bio.tools  SciCrunch  OMICtools  Bioconda 
Remark of Debian Med team: This package ships with BioLinux http://envgen.nox.ac.uk/biolinux.html
Gatb-core
Genome Analysis Toolbox with de-Bruijn graph
Versions of package gatb-core
Release | Version | Architectures
bullseye | 1.4.2+dfsg-2 | amd64,arm64,i386,mips64el,ppc64el,s390x
sid | 1.4.2+dfsg-2 | amd64,arm64,i386,mips64el,ppc64el,s390x
buster | 1.4.1+git20181225.44d5a44+dfsg-3 | amd64,arm64,i386,mips64el,ppc64el,s390x
Popcon: 4 users (5 upd.)*
Versions and Archs
License: DFSG free
Git

The GATB-CORE project provides a set of highly efficient algorithms to analyse NGS data sets. These methods enable the analysis of data sets of any size on multi-core desktop computers, including very large amounts of read data from any kind of organism, such as bacteria, plants and animals, and even complex samples (e.g. metagenomes). Read more about GATB at https://gatb.inria.fr/. By itself, GATB-CORE is not an NGS data analysis tool, but it can be used to create such tools. A set of ready-to-use tools relying on the GATB-CORE library already exists: see https://gatb.inria.fr/software/
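As the package tagline says, the central data structure is a de Bruijn graph. The sketch below shows the basic construction in a deliberately naive form: nodes are (k-1)-mers, and every k-mer in a read adds an edge from its prefix to its suffix. It is a hypothetical illustration only; GATB-CORE is a C++ library with its own API, and it uses compact representations (such as Bloom filters) and canonical k-mers, whereas this sketch uses a plain dictionary and ignores reverse complements.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def de_bruijn(reads: Iterable[str], k: int) -> Dict[str, Set[str]]:
    """Nodes are (k-1)-mers; each k-mer in a read adds an edge prefix -> suffix."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def spell_path(graph: Dict[str, Set[str]], start: str) -> str:
    """Follow edges while the current node has exactly one successor, spelling out the sequence."""
    seq, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop on cycles
            break
        seq += nxt[-1]
        seen.add(nxt)
        node = nxt
    return seq
```

Walking non-branching paths like this is the basic step behind assembling unitigs. A branch, for example two reads that diverge after a shared k-mer, stops the walk at the node with two successors.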

Please cite: Erwan Drezen, Guillaume Rizk, Rayan Chikhi, Charles Deltel, Claire Lemaitre, Pierre Peterlongo and Dominique Lavenier: GATB: Genome Assembly & Analysis Tool Box. Bioinformatics 30(20):2959-2961 (2014)
Registry entries: OMICtools  Bioconda 
Genometester
toolkit for performing set operations on k-mer lists
Versions of package genometester
Release | Version | Architectures
bullseye | 4.0+git20200511.91cecb5+dfsg-1 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid | 4.0+git20200511.91cecb5+dfsg-1 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster | 4.0+git20180508.a9c14a6+dfsg-1 | amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
Popcon: 2 users (3 upd.)*
Versions and Archs
License: DFSG free
Git

Toolkit for performing set operations - union, intersection and complement - on k-mer lists.

The GenomeTester4 toolkit contains a novel tool, GListCompare, for performing union, intersection and complement (difference) set operations on k-mer lists. It also contains examples of how these general operations can be combined to solve a variety of biological analysis tasks.
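One reason to keep k-mer lists sorted is that all three set operations can then be computed in a single linear merge pass. The sketch below demonstrates that idea on small in-memory lists; it is a hypothetical illustration, not GListCompare's implementation or file format, and it ignores k-mer counts.

```python
from typing import List, Tuple


def kmer_list(seq: str, k: int) -> List[str]:
    """Sorted, de-duplicated k-mers of a sequence."""
    return sorted({seq[i:i + k] for i in range(len(seq) - k + 1)})


def set_ops(a: List[str], b: List[str]) -> Tuple[List[str], List[str], List[str]]:
    """Union, intersection and difference (a minus b) of two sorted lists in one linear merge."""
    i = j = 0
    union: List[str] = []
    inter: List[str] = []
    diff: List[str] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            union.append(a[i])
            inter.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            union.append(a[i])
            diff.append(a[i])
            i += 1
        else:
            union.append(b[j])
            j += 1
    # Whatever remains in either list is outside the intersection.
    union.extend(a[i:])
    diff.extend(a[i:])
    union.extend(b[j:])
    return union, inter, diff
```

The merge never needs to hold a hash table, so in principle the same loop works on lists streamed from disk, which matters when k-mer lists are larger than memory.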

Please cite: Lauris Kaplinski, Maarja Lepamets and Maido Remm: GenomeTester4: a toolkit for performing basic set operations - union, intersection and complement on k-mer lists. (PubMed,eprint) GigaScience 4(1):58 (2015)
Registry entries: Bio.tools  OMICtools  Bioconda 
Genomethreader
software tool to compute gene structure predictions
Versions of package genomethreader
Release | Version | Architectures
sid | 1.7.3+dfsg-5 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bullseye | 1.7.3+dfsg-5 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 0 users (0 upd.)*
Versions and Archs
License: DFSG free
Git

GenomeThreader is a software tool to compute gene structure predictions. The gene structure predictions are calculated using a similarity-based approach where additional cDNA/EST and/or protein sequences are used to predict gene structures via spliced alignments. GenomeThreader was motivated by disabling limitations in GeneSeqer, a popular gene prediction program which is widely used for plant genome annotation.

Please cite: G. Gremme, V. Brendel, M.E. Sparks and S. Kurtz: Engineering a software tool for gene structure prediction in higher organisms. Information and Software Technology 47(15):965-978 (2005)
Registry entries: Bio.tools  OMICtools  Bioconda 
Genometools
versatile genome analysis toolkit
Versions of package genometools
Release | Version | Architectures
bullseye | 1.6.1+ds-2 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid | 1.6.1+ds-2 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
stretch | 1.5.9+ds-4 | amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
buster | 1.5.10+ds-3 | amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
jessie | 1.5.3-2 | amd64,armel,armhf,i386
Debtags of package genometools:
biology: nuceleic-acids
field: biology, biology:bioinformatics
interface: commandline
role: program
uitoolkit: ncurses
Popcon: 8 users (2 upd.)*
Versions and Archs
License: DFSG free
Git

GenomeTools is a collection of useful tools for biological sequence analysis and presentation, combined into a single binary.

The toolkit contains binaries for sequence and annotation handling, sequence compression, index structure generation and access, annotation visualization, and much more.

Please cite: Gordon Gremme, Sascha Steinbiss and Stefan Kurtz: GenomeTools: a comprehensive software library for efficient processing of structured genome annotations. (PubMed) IEEE/ACM Transactions on Computational Biology and Bioinformatics 10(3):645-656 (2013)
Registry entries: Bio.tools  OMICtools 
Gffread
GFF/GTF format conversions, region filtering, FASTA sequence extraction
Versions of package gffread
Release | Version | Architectures
bullseye | 0.11.8-1 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid | 0.11.8-1 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 1 users (5 upd.)*
Versions and Archs
License: DFSG free
Git

Gffread is a GFF/GTF parsing utility providing format conversions, region filtering, FASTA sequence extraction and more.
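The three operations named above (parsing, region filtering and FASTA sequence extraction) can be sketched in a few lines. This is a hypothetical illustration, not gffread's code or command-line interface. It relies only on the standard GFF/GTF layout of nine tab-separated columns with 1-based, inclusive start and end coordinates, and it keeps the genome in a plain dictionary of sequences.

```python
from typing import Dict, Iterable, Iterator, NamedTuple

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


class Feature(NamedTuple):
    seqid: str
    ftype: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    strand: str
    attributes: str


def parse_gff(lines: Iterable[str]) -> Iterator[Feature]:
    """Yield features from tab-separated GFF/GTF lines, skipping comments and malformed rows."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        yield Feature(cols[0], cols[2], int(cols[3]), int(cols[4]), cols[6], cols[8])


def overlaps(f: Feature, seqid: str, start: int, end: int) -> bool:
    """True if the feature overlaps the 1-based inclusive region seqid:start-end."""
    return f.seqid == seqid and f.start <= end and f.end >= start


def extract(f: Feature, genome: Dict[str, str]) -> str:
    """Return the feature's sequence, reverse-complemented on the minus strand."""
    seq = genome[f.seqid][f.start - 1:f.end]
    return seq.translate(COMPLEMENT)[::-1] if f.strand == "-" else seq
```

The `f.start - 1:f.end` slice is where the 1-based inclusive GFF coordinates are converted to Python's 0-based half-open indexing, a common source of off-by-one errors.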

Registry entries: Bio.tools  OMICtools  Bioconda 
Grabix
wee tool for random access into BGZF files
Versions of package grabix
Release | Version | Architectures
buster | 0.1.7-1 | amd64,arm64,armel,armhf,mips,mips64el,mipsel,ppc64el,s390x
bullseye | 0.1.7-2 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el
sid | 0.1.7-2 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el
Popcon: 2 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

In biomedical research, it is increasingly common practice to study the genetic basis of disease. This now frequently involves sequencing human DNA. The output of the sequencing machine, however, is redundant, and the real sequence is the one that best explains this redundancy. Data are exchanged only as compressed files, since they would otherwise be too large and redundant. Decompression should therefore be avoided whenever possible.

grabix leverages the fantastic BGZF library of the samtools package to provide random access into text files that have been compressed with bgzip. grabix creates its own index (.gbi) of the bgzipped file. Once indexed, one can extract arbitrary lines from the file with the grab command, or choose random lines with the, well, random command.
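The principle that makes this random access possible is block compression: the file is a concatenation of independently compressed gzip members, so one block can be decompressed without touching the rest. The sketch below demonstrates only that principle with Python's standard gzip module. It is a hypothetical simplification: real BGZF blocks carry an extra header field and are addressed through virtual file offsets, and grabix's .gbi index format is not reproduced here; the fixed number of lines per block is an assumption.

```python
import gzip
from typing import List, Tuple


def write_blocks(lines: List[str], lines_per_block: int) -> Tuple[bytes, List[int]]:
    """Compress groups of lines as independent gzip members; return data and block start offsets."""
    data = bytearray()
    offsets: List[int] = []
    for first in range(0, len(lines), lines_per_block):
        chunk = "".join(line + "\n" for line in lines[first:first + lines_per_block])
        offsets.append(len(data))
        data += gzip.compress(chunk.encode())
    return bytes(data), offsets


def grab(data: bytes, offsets: List[int], lines_per_block: int, n: int) -> str:
    """Return line n (0-based) by decompressing only the block that contains it."""
    block = n // lines_per_block
    start = offsets[block]
    end = offsets[block + 1] if block + 1 < len(offsets) else len(data)
    text = gzip.decompress(data[start:end]).decode()
    return text.splitlines()[n % lines_per_block]
```

Because the concatenated members still form a valid multi-member gzip stream, the whole file also remains readable by ordinary gzip tools, which is the same compatibility property bgzip files have.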

Registry entries: OMICtools  Bioconda 
Graphlan
circular representations of taxonomic and phylogenetic trees
Versions of package graphlan
Release | Version | Architectures
bullseye | 1.1.3-2 | all
stretch | 1.1-2 | all
sid | 1.1.3-2 | all
buster | 1.1.3-1 | all
Popcon: 4 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

GraPhlAn is a software tool for producing high-quality circular representations of taxonomic and phylogenetic trees. It focuses on concise, integrative, informative, and publication-ready representations of phylogenetically- and taxonomically-driven investigation.

Registry entries: OMICtools  Bioconda 
Gubbins
phylogenetic analysis of genome sequences
Versions of package gubbins
Release | Version | Architectures
buster | 2.3.4-1 | amd64,i386
stretch | 2.2.0-1 | amd64,i386
sid | 2.4.1-2 | amd64,i386
bullseye | 2.4.1-2 | amd64,i386
Popcon: 2 users (3 upd.)*
Versions and Archs
License: DFSG free
Git

Gubbins supports rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences.

Gubbins (Genealogies Unbiased By recomBinations In Nucleotide Sequences) is an algorithm that iteratively identifies loci containing elevated densities of base substitutions while concurrently constructing a phylogeny based on the putative point mutations outside of these regions. Simulations demonstrate the algorithm generates highly accurate reconstructions under realistic models of short-term bacterial evolution, and can be run in only a few hours on alignments of hundreds of bacterial genome sequences.

Please cite: Nicholas J. Croucher, Andrew J. Page, Thomas R. Connor, Aidan J. Delaney, Jacqueline A. Keane, Stephen D. Bentley, Julian Parkhill and Simon R. Harris: Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins. (PubMed,eprint) Nucleic Acids Research 43(3):e15 (2014)
Registry entries: OMICtools  Bioconda 
Gwama
Genome-Wide Association Meta Analysis
Versions of package gwama
Release | Version | Architectures
buster | 2.2.2+dfsg-2 | amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
bullseye | 2.2.2+dfsg-2 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid | 2.2.2+dfsg-2 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
stretch | 2.2.2+dfsg-1 | amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
Popcon: 3 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

GWAMA (Genome-Wide Association Meta Analysis) software performs meta-analysis of the results of GWA studies of binary or quantitative phenotypes. Fixed- and random-effect meta-analyses are performed for both directly genotyped and imputed SNPs using estimates of the allelic odds ratio and 95% confidence interval for binary traits, and estimates of the allelic effect size and standard error for quantitative phenotypes. GWAMA can be used for analysing the results of all different genetic models (multiplicative, additive, dominant, recessive). The software incorporates error trapping facilities to identify strand alignment errors and allele flipping, and performs tests of heterogeneity of effects between studies.
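The fixed-effect part of such a meta-analysis can be written with the standard inverse-variance formulas. The sketch below is a textbook illustration under stated assumptions, not GWAMA's code. For a binary trait, it assumes the effect is the log odds ratio, with its standard error recovered from the 95% confidence interval. Heterogeneity is reported as Cochran's Q and I². Random-effects pooling and GWAMA's strand and allele checks are not covered.

```python
import math
from typing import List, Tuple

Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile


def se_from_odds_ratio_ci(lower: float, upper: float) -> float:
    """Standard error of log(OR) recovered from a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)


def fixed_effect_meta(betas: List[float], ses: List[float]) -> Tuple[float, float, float, float, float]:
    """Inverse-variance fixed-effect meta-analysis.

    Returns (pooled beta, pooled SE, two-sided p-value, Cochran's Q, I^2).
    """
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    p = math.erfc(abs(beta / se) / math.sqrt(2))  # two-sided normal p-value
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return beta, se, p, q, i2
```

For binary traits, each study's odds ratio would be converted with `math.log` and `se_from_odds_ratio_ci` before pooling. The pooled log odds ratio can then be exponentiated back to an odds ratio.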

Please cite: Reedik Mägi and Andrew P. Morris: GWAMA: software for genome-wide association meta-analysis. (eprint) BMC Bioinformatics 11(May):288 (2010)
Registry entries: Bio.tools  SciCrunch  OMICtools  Bioconda 
Harvest-tools
archiving and postprocessing for reference-compressed genomic multi-alignments
Versions of package harvest-tools
Release | Version | Architectures
stretch | 1.3-1 | amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
buster | 1.3-4 | amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
bullseye | 1.3-5 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid | 1.3-5 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 2 users (4 upd.)*
Versions and Archs
License: DFSG free
Git

HarvestTools is a utility for creating and interfacing with Gingr files, which are efficient archives that the Harvest Suite uses to store reference-compressed multi-alignments, phylogenetic trees, filtered variants and annotations. Though designed for use with Parsnp and Gingr, HarvestTools can also be used for generic conversion between standard bioinformatics file formats.

Please cite: Todd J. Treangen, Brian D. Ondov, Sergey Koren and Adam M. Phillippy: Rapid Core-Genome Alignment and Visualization for Thousands of Intraspecific Microbial Genomes. (PubMed,eprint) bioRxiv 15(11):524 (2014)
Registry entries: OMICtools  Bioconda 
Hilive
realtime alignment of Illumina reads
Versions of package hilive
Release | Version | Architectures
bullseye | 2.0a-3 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
stretch | 0.3-2 | amd64,arm64,armel,i386,mips64el,mipsel,ppc64el
buster | 1.1-2 | amd64,arm64,armel,armhf,mips,mips64el,mipsel,ppc64el,s390x
sid | 2.0a-3 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 3 users (7 upd.)*
Versions and Archs
License: DFSG free
Git

HiLive is a read mapping tool that maps Illumina HiSeq (or comparable) reads to a reference genome at the very moment they are produced. This means that read mapping is finished as soon as the sequencer has finished generating the data.

Please cite: Martin S. Lindner, Benjamin Strauch, Jakob M. Schulze, Simon H. Tausch, Piotr W. Dabrowski, Andreas Nitsche and Bernhard Y. Renard: HiLive: real-time mapping of illumina reads while sequencing. (PubMed) Bioinformatics 33(6):917-919 (2017)
Registry entries: OMICtools 
Hisat2
graph-based alignment of short nucleotide reads to many genomes
Versions of package hisat2
Release | Version | Architectures
buster | 2.1.0-2 | amd64
stretch | 2.0.5-1 | amd64
bullseye | 2.2.0-2 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid | 2.2.0-2 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 6 users (7 upd.)*
Versions and Archs
License: DFSG free
Git

HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes (as well as against a single reference genome). Based on an extension of the BWT to graphs, a graph FM index (GFM) was designed and implemented. In addition to using one global GFM index that represents a population of human genomes, HISAT2 uses a large set of small GFM indexes that collectively cover the whole genome (each index representing a genomic region of 56 Kbp, with 55,000 indexes needed to cover the human population). These small indexes (called local indexes), combined with several alignment strategies, enable rapid and accurate alignment of sequencing reads. This new indexing scheme is called a Hierarchical Graph FM index (HGFM).
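The graph FM index extends the ordinary Burrows-Wheeler transform (BWT) and FM index. For orientation, the sketch below shows the ordinary linear-text case: the BWT built naively from sorted rotations, and the backward search that counts pattern occurrences. This is an illustration only; it does not implement HISAT2's graph or hierarchical indexes. It also assumes that '$' is absent from the text and computes rank queries naively rather than with sampled tables.

```python
from typing import Dict


def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations ('$' terminator, assumed absent from text)."""
    text += "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


def count_occurrences(bwt_str: str, pattern: str) -> int:
    """FM-index backward search: number of occurrences of pattern in the original text."""
    # C[c] = number of characters in the text that sort before c.
    first_column = sorted(bwt_str)
    C: Dict[str, int] = {}
    for i, ch in enumerate(first_column):
        C.setdefault(ch, i)

    def occ(ch: str, i: int) -> int:
        """Occurrences of ch in bwt_str[:i] (naive; real indexes use sampled rank tables)."""
        return bwt_str.count(ch, 0, i)

    lo, hi = 0, len(bwt_str)
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo, hi = C[ch] + occ(ch, lo), C[ch] + occ(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo
```

For example, `bwt("banana")` gives `"annb$aa"`, and searching that string for `"ana"` narrows the row interval three times, once per character processed from right to left, and ends with a count of 2.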

The package is enhanced by the following packages: multiqc
Please cite: Daehwan Kim, Joseph M. Paggi, Chanhee Park, Christopher Bennett and Steven L. Salzberg: Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nature Biotechnology 37(8):907-915 (2019)
Registry entries: Bio.tools  SciCrunch  OMICtools  Bioconda 
Hmmer2
profile hidden Markov models for protein sequence analysis
Versions of package hmmer2
ReleaseVersionArchitectures
sid2.3.2+dfsg-6amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bullseye2.3.2+dfsg-6amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster2.3.2+dfsg-6amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
stretch2.3.2-13amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
jessie2.3.2-8amd64,armel,armhf,i386
Popcon: 5 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

HMMER is an implementation of profile hidden Markov model methods for sensitive searches of biological sequence databases using multiple sequence alignments as queries.

Given a multiple sequence alignment as input, HMMER builds a statistical model called a "hidden Markov model" which can then be used as a query into a sequence database to find (and/or align) additional homologues of the sequence family.
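
HMMER's profile HMMs have match, insert and delete states and carefully estimated priors. As a much simpler sketch of how an alignment becomes a query model, the Python below estimates per-column match-state emission probabilities from a small protein alignment using add-one pseudocounts, then scores an ungapped sequence in bits against a uniform background. The pseudocount scheme, the uniform background and the function names are assumptions for illustration, not HMMER2's estimator.

```python
import math

AMINO = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def column_profile(msa, pseudocount=1.0):
    """Per-column emission probabilities (match states only; '-' gaps are skipped)."""
    profile = []
    for column in zip(*msa):
        counts = {a: pseudocount for a in AMINO}
        for residue in column:
            if residue in counts:
                counts[residue] += 1
        total = sum(counts.values())
        profile.append({a: n / total for a, n in counts.items()})
    return profile


def log_odds(profile, seq):
    """Ungapped log-odds score (bits) of seq versus a uniform background model."""
    if len(seq) != len(profile):
        raise ValueError("sequence length must match profile length")
    background = 1.0 / len(AMINO)
    return sum(math.log2(col[a] / background) for col, a in zip(profile, seq))


if __name__ == "__main__":
    profile = column_profile(["HAGEY", "HAGDY", "HSGEY"])
    print(log_odds(profile, "HAGEY"), log_odds(profile, "WWWWW"))
```

Sequences resembling the family score positive and unrelated ones score negative; a real profile HMM additionally allows insertions and deletions relative to the model.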

Please cite: Eddy, Sean R.: Profile hidden Markov models. (PubMed) Bioinformatics 14(9):755-763 (1998)
Registry entries: Bio.tools  SciCrunch  OMICtools  Bioconda 
Remark of Debian Med team: This older version of HMMER is used in some applications

While Debian has shipped HMMER 3 for some time, some users of HMMER 2 are interested in having this old version available, so the package has been reintroduced.

Idba
iterative De Bruijn Graph short read assemblers
Versions of package idba
ReleaseVersionArchitectures
bullseye1.1.3-7amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
jessie1.1.2-1amd64,armel,armhf,i386
sid1.1.3-7amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster1.1.3-3amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
stretch1.1.3-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
Popcon: 3 users (4 upd.)*
Versions and Archs
License: DFSG free
Git

IDBA stands for iterative de Bruijn graph assembler. In computational sequence biology, an assembler solves the puzzle posed by large sequencing machines that produce many gigabytes of short reads from a large genome.

This package provides several flavours of the IDBA assembler, as they all share the same source tree but serve different purposes and evolved over time.

IDBA is the basic iterative de Bruijn graph assembler for second-generation sequencing reads. IDBA-UD, an extension of IDBA, is designed to use paired-end reads to assemble low-depth regions and to use progressive depth on contigs to reduce errors in high-depth regions. It is a general-purpose assembler and is especially good for single-cell and metagenomic sequencing data. IDBA-Hybrid is another updated version of IDBA-UD, which can make use of a similar reference genome to improve the assembly result. IDBA-Tran is an iterative de Bruijn graph assembler for RNA-Seq data.
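
The de Bruijn graph at the heart of all IDBA flavours can be sketched in a few lines. The minimal Python below, which assumes error-free reads on a single strand, builds a graph whose nodes are (k-1)-mers and whose edges are k-mers, then spells out maximal non-branching paths as contigs, repeating this for several values of k to mimic the "iterative" idea. Real IDBA carries contigs from one k to the next, handles reverse complements, sequencing errors and depth, none of which is modelled here; the function names are hypothetical.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Successor and predecessor sets of (k-1)-mer nodes, one edge per distinct k-mer."""
    succ, pred = defaultdict(set), defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            a, b = kmer[:-1], kmer[1:]
            succ[a].add(b)
            pred[b].add(a)
    return succ, pred


def contigs(reads, k):
    """Spell out maximal non-branching paths (isolated cycles are ignored)."""
    succ, pred = de_bruijn(reads, k)
    nodes = set(succ) | set(pred)

    def one_in_one_out(v):
        return len(pred[v]) == 1 and len(succ[v]) == 1

    result = []
    for v in nodes:
        if one_in_one_out(v):
            continue  # interior node: it is covered by a path starting elsewhere
        for w in succ[v]:
            path = v + w[-1]
            while one_in_one_out(w):
                w = next(iter(succ[w]))
                path += w[-1]
            result.append(path)
    return sorted(result)


if __name__ == "__main__":
    reads = ["ACGTTG", "GTTGCA"]
    for k in (3, 4, 5):  # iterate from small to large k
        print(k, contigs(reads, k))
```

Small k connects reads with short overlaps but creates branches at repeats; larger k resolves repeats but breaks in low-coverage regions, which is the trade-off iterating over k is meant to balance.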

Please cite: Yu Peng, Henry C. M. Leung, S. M. Yiu and Francis Y. L. Chin: IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. (PubMed,eprint) Bioinformatics 28(11):1420-1428 (2012)
Registry entries: Bio.tools  SciCrunch  OMICtools  Bioconda 
Igdiscover
analyzes antibody repertoires to find new V genes
Versions of package igdiscover
ReleaseVersionArchitectures
bullseye0.11-3all
sid0.11-3all
upstream0.12.3
Popcon: users ( upd.)*
Newer upstream!
License: DFSG free
Git

IgDiscover analyzes antibody repertoires and discovers new V genes from high-throughput sequencing reads. Heavy chains, kappa and lambda light chains are supported (to discover VH, VK and VL genes).

Please cite: Martin M. Corcoran, Ganesh E. Phad, Néstor Vázquez Bernat, Christiane Stahl-Hennig, Noriyuki Sumida, Mats A.A. Persson, Marcel Martin and Gunilla B. Karlsson Hedestam: Production of individualized V gene databases reveals high levels of immunoglobulin genetic diversity. (eprint) Nature Communications 7:13642 (2016)
Registry entries: Bio.tools  OMICtools  Bioconda 
Igor
infers V(D)J recombination processes from sequencing data
Versions of package igor
ReleaseVersionArchitectures
sid1.4.0+dfsg-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster1.3.0+dfsg-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
bullseye1.4.0+dfsg-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 1 users (4 upd.)*
Versions and Archs
License: DFSG free
Git

IGoR (Inference and Generation of Repertoires) is versatile software for analyzing and modeling immune receptor generation, selection, mutation and all other processes.

Please cite: Quentin Marcou, Thierry Mora and Aleksandra M. Walczak: High-throughput immune repertoire analysis with IGoR. (PubMed,eprint) Nature Communications 9(1):561 (2018)
Registry entries: OMICtools  Bioconda 
Indelible
powerful and flexible simulator of biological evolution
Versions of package indelible
ReleaseVersionArchitectures
sid1.03-5amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bullseye1.03-5amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
stretch1.03-2amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
buster1.03-4amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
Popcon: 5 users (9 upd.)*
Versions and Archs
License: DFSG free
Git

INDELible is a new, portable, and flexible application for biological sequence simulation that combines many features in the same place for the first time. Using a length-dependent model of indel formation it can simulate evolution of multi-partitioned nucleotide, amino-acid, or codon data sets through the processes of insertion, deletion, and substitution in continuous time.

Nucleotide simulations may use the general unrestricted model or the general time reversible model and its derivatives, and amino-acid simulations can be conducted using fifteen different empirical rate matrices. Substitution rate heterogeneity can be modeled via the continuous and discrete gamma distributions, with or without a proportion of invariant sites. INDELible can also simulate under non-homogeneous and non-stationary conditions where evolutionary models are permitted to change across a phylogeny.

Unique among indel simulation programs, INDELible offers the ability to simulate using codon models that exhibit nonsynonymous/synonymous rate ratio heterogeneity among sites and/or lineages.

Please cite: William Fletcher and Ziheng Yang: INDELible: A Flexible Simulator of Biological Sequence Evolution. (eprint) Molecular Biology and Evolution 26(8):1879-1888 (2009)
Registry entries: OMICtools 
Topics: Sequencing
Iqtree
efficient phylogenetic software by maximum likelihood
Versions of package iqtree
ReleaseVersionArchitectures
stretch1.5.3+dfsg-2amd64,i386
buster1.6.9+dfsg-1amd64,i386
bullseye1.6.12+dfsg-1amd64,i386
sid1.6.12+dfsg-1amd64,i386
upstream2.0.7
Popcon: 3 users (1 upd.)*
Newer upstream!
License: DFSG free
Git

IQ-TREE is very efficient maximum likelihood phylogenetic software with the following key features, among others:

  • A novel fast and effective stochastic algorithm to estimate maximum likelihood trees. IQ-TREE outperforms both RAxML and PhyML in terms of likelihood while requiring a similar amount of computing time (see Nguyen et al., 2015).
  • An ultrafast bootstrap approximation to assess branch supports (see Minh et al., 2013).
  • A wide range of substitution models for binary, DNA, protein, codon, and morphological alignments.
  • Ultrafast model selection for all data types, 10 to 100 times faster than jModelTest and ProtTest.
  • Finding the best partition scheme, like PartitionFinder.
  • Partitioned models with mixed data types for phylogenomic (multi-gene) alignments, allowing for separate, proportional, or joint branch lengths among genes.
  • Support for the Phylogenetic Likelihood Library (PLL) (see Flouri et al., 2014).
Please cite: Lam Tung Nguyen, Heiko A. Schmidt, Arndt von Haeseler and Bui Quang Minh: IQ-TREE: A fast and effective stochastic algorithm for estimating maximum likelihood phylogenies. (PubMed,eprint) Mol. Biol. Evol. 32(1):268-274 (2015)
Registry entries: Bio.tools  OMICtools  Bioconda 
Iva
iterative virus sequence assembler
Versions of package iva
ReleaseVersionArchitectures
buster1.0.9+ds-6amd64,arm64,mips64el,ppc64el
sid1.0.9+ds-10amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el
stretch1.0.8+ds-1amd64,arm64,mips64el,ppc64el
bullseye1.0.9+ds-10amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el
Popcon: 3 users (0 upd.)*
Versions and Archs
License: DFSG free
Git

IVA is a de novo assembler designed to assemble virus genomes that have no repeat sequences, using Illumina read pairs sequenced from mixed populations at extremely high depth.

IVA's main algorithm works by iteratively extending contigs using aligned read pairs. Its input can be just read pairs, or additionally you can provide an existing set of contigs to be extended. Alternatively, it can take reads together with a reference sequence.

Please cite: M. Hunt, A. Gall, S. H. Ong, J. Brener, B. Ferns, P. Goulder, E. Nastouli, J. A. Keane, P. Kellam and T. D. Otto: IVA: accurate de novo assembly of RNA virus genomes. (PubMed) Bioinformatics 31(14):2374-2376 (2015)
Registry entries: Bio.tools  OMICtools  Bioconda 
Jaligner
Smith-Waterman algorithm with Gotoh's improvement
Versions of package jaligner
ReleaseVersionArchitectures
sid1.0+dfsg-6all
bullseye1.0+dfsg-6all
stretch1.0+dfsg-4all
buster1.0+dfsg-6all
Popcon: 5 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

JAligner is an open source Java implementation of the Smith-Waterman algorithm with Gotoh's improvement for biological local pairwise sequence alignment with the affine gap penalty model.
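
Gotoh's improvement keeps Smith-Waterman at O(mn) time under an affine gap penalty (opening a gap costs more than extending it) by tracking three dynamic-programming matrices. JAligner itself is written in Java; the minimal Python sketch below only computes the best local alignment score without traceback, and its scoring values (match +2, mismatch -1, gap open 3, gap extend 1) are assumptions, not JAligner defaults.

```python
def smith_waterman_gotoh(a, b, match=2, mismatch=-1, gap_open=3, gap_extend=1):
    """Best local alignment score; a gap of length L costs gap_open + (L - 1) * gap_extend."""
    NEG = float("-inf")
    m, n = len(a), len(b)
    H = [[0.0] * (n + 1) for _ in range(m + 1)]  # best score of any alignment ending at (i, j)
    E = [[NEG] * (n + 1) for _ in range(m + 1)]  # ... ending with a gap in a (consumes b[j-1])
    F = [[NEG] * (n + 1) for _ in range(m + 1)]  # ... ending with a gap in b (consumes a[i-1])
    best = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # either open a new gap from H or extend the gap already running in E/F
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_extend)
            s = match if a[i - 1] == b[j - 1] else mismatch
            # the 0 term is what makes the alignment local
            H[i][j] = max(0.0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    print(smith_waterman_gotoh("ACGTACGT", "ACGTTACGT"))
```

Here the best alignment bridges the extra T with a single one-base gap: 8 matches (+16) minus one gap opening (3).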

Registry entries: OMICtools 
Jalview
multiple alignment editor
Versions of package jalview
ReleaseVersionArchitectures
sid2.7.dfsg-5all
wheezy2.7.dfsg-2all
wheezy-security2.7.dfsg-2+deb7u1all
jessie2.7.dfsg-4all
upstream2.11.1.0
Popcon: 1 users (0 upd.)*
Newer upstream!
License: DFSG free
Git

JalView is a Java alignment editor that can work with sequence alignments produced by programs implementing alignment algorithms such as clustalw, kalign and t-coffee.

It has lots of features, is actively developed, and compares favorably with BioEdit, while being free as in free speech!

Please cite: Andrew M. Waterhouse, James B. Procter, David M. A. Martin, Michèle Clamp and Geoffrey J. Barton: Jalview Version 2-a multiple sequence alignment editor and analysis workbench. (PubMed,eprint) Bioinformatics 25:1189-1191 (2009)
Registry entries: Bio.tools  SciCrunch  OMICtools  Bioconda 
Jmodeltest
HPC selection of models of nucleotide substitution
Versions of package jmodeltest
ReleaseVersionArchitectures
buster2.1.10+dfsg-7all
sid2.1.10+dfsg-8all
bullseye2.1.10+dfsg-8all
stretch2.1.10+dfsg-5all
Popcon: 2 users (2 upd.)*
Versions and Archs
License: DFSG free
Git

jModelTest is a tool to carry out statistical selection of best-fit models of nucleotide substitution. It implements five different model selection strategies: hierarchical and dynamical likelihood ratio tests (hLRT and dLRT), Akaike and Bayesian information criteria (AIC and BIC), and a decision theory method (DT). It also provides estimates of model selection uncertainty, parameter importances and model-averaged parameter estimates, including model-averaged tree topologies. jModelTest 2 includes High Performance Computing (HPC) capabilities and additional features like new strategies for tree optimization, model-averaged phylogenetic trees (both topology and branch length), heuristic filtering and automatic logging of user activity.
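
Two of the criteria listed above have simple closed forms: AIC = 2K - 2 ln L and BIC = K ln n - 2 ln L, where L is the maximized likelihood, K the number of free parameters and n the sample size (for alignments, often taken as the number of sites). The minimal Python sketch below ranks candidate models by both criteria and turns AIC differences into Akaike weights, one common way to express model selection uncertainty. The log-likelihoods and parameter counts are made-up placeholders, not jModelTest output, and this is not jModelTest's code.

```python
import math


def aic(lnL, k):
    return 2 * k - 2 * lnL


def bic(lnL, k, n):
    return k * math.log(n) - 2 * lnL


def akaike_weights(scores):
    """Relative weights (summing to 1) from AIC scores: exp(-delta/2), normalized."""
    best = min(scores.values())
    rel = {m: math.exp(-0.5 * (s - best)) for m, s in scores.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}


if __name__ == "__main__":
    # hypothetical (max log-likelihood, free substitution parameters) per model
    candidates = {"JC": (-2500.0, 0), "HKY": (-2410.0, 4), "GTR": (-2405.0, 8)}
    n_sites = 1000
    aics = {m: aic(lnL, k) for m, (lnL, k) in candidates.items()}
    bics = {m: bic(lnL, k, n_sites) for m, (lnL, k) in candidates.items()}
    print(min(aics, key=aics.get), min(bics, key=bics.get))  # prints: GTR HKY
    print(akaike_weights(aics))
```

With these numbers AIC prefers GTR while BIC, which penalizes each extra parameter by ln n instead of 2, prefers the simpler HKY.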

Please cite: Diego Darriba, Guillermo L Taboada, Ramón Doallo and David Posada: jModelTest 2: more models, new heuristics and parallel computing. (PubMed) Nature Methods 9(8):772 (2012)
Registry entries: SciCrunch  OMICtools 
Kallisto
near-optimal RNA-Seq quantification
Versions of package kallisto
ReleaseVersionArchitectures
bullseye0.46.2+dfsg-2amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid0.46.2+dfsg-2amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 0 users (5 upd.)*
Versions and Archs
License: DFSG free
Git

Kallisto is a program for quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads. It is based on the novel idea of pseudoalignment for rapidly determining the compatibility of reads with targets, without the need for alignment. On benchmarks with standard RNA-Seq data, kallisto can quantify 30 million human reads in less than 3 minutes on a Mac desktop computer using only the read sequences and a transcriptome index that itself takes less than 10 minutes to build. Pseudoalignment of reads preserves the key information needed for quantification, and kallisto is therefore not only fast, but also as accurate as existing quantification tools. In fact, because the pseudoalignment procedure is robust to errors in the reads, in many benchmarks kallisto significantly outperforms existing tools.
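
The core idea of pseudoalignment can be shown without any alignment: look up each k-mer of a read in an index that maps k-mers to the transcripts containing them, and report the intersection of those transcript sets. The minimal Python sketch below uses a plain dictionary as the index and simply skips k-mers that are not indexed (for example, ones hit by a sequencing error). kallisto itself indexes a transcriptome de Bruijn graph and handles reverse complements and redundant k-mers, none of which is modelled here; the toy transcripts and k are assumptions.

```python
def build_index(transcripts, k):
    """Map every k-mer to the set of transcript names that contain it."""
    index = {}
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index.setdefault(seq[i:i + k], set()).add(name)
    return index


def pseudoalign(read, index, k):
    """Transcripts compatible with every indexed k-mer of the read (empty set if none)."""
    compatible = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k])
        if hits is None:
            continue  # k-mer not in the index: ignore it rather than fail the read
        compatible = set(hits) if compatible is None else compatible & hits
    return compatible or set()


if __name__ == "__main__":
    idx = build_index({"t1": "ACGTACGGTTCA", "t2": "ACGTACGGAAGT"}, 5)
    print(pseudoalign("ACGTACGG", idx, 5), pseudoalign("CGGTTCA", idx, 5))
```

The first read is compatible with both transcripts (a shared exon, say), while the second can only have come from t1; quantification then works on these compatibility classes rather than on base-level alignments.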

The package is enhanced by the following packages: multiqc
Please cite: Nicolas L Bray, Harold Pimentel, Páll Melsted and Lior Pachter: Near-optimal probabilistic RNA-seq quantification. (PubMed) Nature Biotechnology 34(5):525–527 (2016)
Registry entries: Bio.tools  OMICtools  Bioconda 
Kaptive
obtain information about K and O types for Klebsiella genome assemblies
Versions of package kaptive
ReleaseVersionArchitectures
sid0.7.0-2all
bullseye0.7.0-2all
Popcon: 0 users (0 upd.)*
Versions and Archs
License: DFSG free
Git

Kaptive reports information about K and O types for Klebsiella genome assemblies.

Given a novel genome and a database of known loci (K or O), Kaptive will help a user to decide whether their sample has a known or novel locus. It carries out the following for each input assembly:

  • BLAST for all known locus nucleotide sequences (using blastn) to identify the best match ('best' defined as having the highest coverage).
  • Extract the region(s) of the assembly which correspond to the BLAST hits (i.e. the locus sequence in the assembly) and save it to a FASTA file.
  • BLAST for all known locus genes (using tblastn) to identify which expected genes (genes in the best matching locus) are present/missing and whether any unexpected genes (genes from other loci) are present.
  • Output a summary to a table file.
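
The first step above picks the best-matching locus by coverage. A minimal Python sketch of that selection rule, assuming BLAST hits have already been parsed into hypothetical (locus, start, end) tuples on the locus coordinates: merge overlapping hit intervals per locus, compute the fraction of the locus covered, and pick the highest. Kaptive's own implementation may apply further criteria not reproduced here, and the function names are illustrative.

```python
from collections import defaultdict


def coverage_by_locus(hits, locus_lengths):
    """Fraction of each locus covered by the union of its hits (1-based, inclusive)."""
    intervals = defaultdict(list)
    for locus, start, end in hits:
        intervals[locus].append((min(start, end), max(start, end)))  # minus-strand hits
    coverage = {}
    for locus, ivs in intervals.items():
        ivs.sort()
        covered = 0
        cur_start, cur_end = ivs[0]
        for s, e in ivs[1:]:
            if s <= cur_end + 1:  # overlapping or adjacent: extend the current block
                cur_end = max(cur_end, e)
            else:
                covered += cur_end - cur_start + 1
                cur_start, cur_end = s, e
        covered += cur_end - cur_start + 1
        coverage[locus] = covered / locus_lengths[locus]
    return coverage


def best_locus(hits, locus_lengths):
    cov = coverage_by_locus(hits, locus_lengths)
    return max(cov, key=cov.get)


if __name__ == "__main__":
    lengths = {"KL1": 1000, "KL2": 800}
    hits = [("KL1", 1, 400), ("KL1", 300, 700), ("KL2", 1, 600), ("KL2", 800, 601)]
    print(coverage_by_locus(hits, lengths), best_locus(hits, lengths))
```

Merging intervals first avoids double-counting bases hit by several overlapping BLAST alignments.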

In cases where your input assembly closely matches a known locus, Kaptive should make that obvious. When your assembly has a novel type, that too should be clear. However, Kaptive cannot reliably extract or annotate locus sequences for totally novel types - if it indicates a novel locus is present then extracting and annotating the sequence is up to you! Very poor assemblies can confound the results, so be sure to closely examine any case where the locus sequence in your assembly is broken into multiple pieces.

The package is enhanced by the following packages: kaptive-data kaptive-example
Please cite: Kelly L. Wyres, Ryan R. Wick, Claire Gorrie, Adam Jenney, Rainer Follador, Nicholas R. Thomson and Kathryn E. Holt: Identification of Klebsiella capsule synthesis loci from whole genome data. (PubMed) Microbial Genomics 2(12):e000102 (2016)
Registry entries: OMICtools  Bioconda 
Khmer
in-memory DNA sequence kmer counting, filtering & graph traversal
Versions of package khmer
ReleaseVersionArchitectures
buster2.1.2+dfsg-6amd64,arm64
stretch2.0+dfsg-10amd64,arm64,mips64el,ppc64el
bullseye2.1.2+dfsg-7amd64,arm64
sid2.1.2+dfsg-7amd64,arm64
Popcon: 3 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

khmer is a library and suite of command line tools for working with DNA sequence. It is primarily aimed at short-read sequencing data such as that produced by the Illumina platform. khmer takes a k-mer-centric approach to sequence analysis, hence the name.
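
khmer is known for counting k-mers in fixed memory with a probabilistic data structure. The minimal Python sketch below shows a generic Count-Min sketch of that kind counting canonical k-mers (the lexicographically smaller of a k-mer and its reverse complement); hash collisions can inflate an estimate but never lower it. The table sizes, the use of hashlib.blake2b with a per-row salt, and all names are assumptions for illustration, not khmer's implementation or API.

```python
import hashlib

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


class CountMinSketch:
    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _buckets(self, item):
        # one independent-looking hash per row, obtained by salting blake2b with the row index
        for row in range(self.depth):
            digest = hashlib.blake2b(item.encode(), digest_size=8,
                                     salt=row.to_bytes(16, "little")).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, item):
        for row, col in self._buckets(item):
            self.table[row][col] += 1

    def estimate(self, item):
        # the smallest counter is the one least inflated by collisions
        return min(self.table[row][col] for row, col in self._buckets(item))


def count_kmers(reads, k, sketch):
    for read in reads:
        for i in range(len(read) - k + 1):
            sketch.add(canonical(read[i:i + k]))


if __name__ == "__main__":
    sketch = CountMinSketch()
    count_kmers(["ACGTACGTAC"], 4, sketch)
    print(sketch.estimate(canonical("TACG")))
```

Memory is fixed by width and depth regardless of how many distinct k-mers are inserted, which is what makes this style of counting attractive for large short-read data sets.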

Please cite: Michael R. Crusoe, Hussien F. Alameldin, Sherine Awad, Elmar Bucher, Adam Caldwell, Reed Cartwright, Amanda Charbonneau, Bede Constantinides, Greg Edvenson, Scott Fay, Jacob Fenton, Thomas Fenzl, Jordan Fish, Leonor Garcia-Gutierrez, Phillip Garland, Jonathan Gluck, Iván González, Sarah Guermond, Jiarong Guo, Aditi Gupta, Joshua R. Herr, Adina Howe, Alex Hyer, Andreas Härpfer, Luiz Irber, Rhys Kidd, David Lin, Justin Lippi, Tamer Mansour, Pamela McA'Nulty, Eric McDonald, Jessica Mizzi, Kevin D. Murray, Joshua R. Nahum, Kaben Nanlohy, Alexander Johan Nederbragt, Humberto Ortiz-Zuazaga, Jeramia Ory, Jason Pell, Charles Pepe-Ranney, Zachary N Russ, Erich Schwarz, Camille Scott, Josiah Seaman, Scott Sievert, Jared Simpson, Connor T. Skennerton, James Spencer, Ramakrishnan Srinivasan, Daniel Standage, James A. Stapleton, Joe Stein, Susan R Steinman, Benjamin Taylor, Will Trimble, Heather L. Wiencko, Michael Wright, Brian Wyss, Qingpeng Zhang, en zyme and C. Titus Brown: The khmer software package: enabling efficient sequence analysis. (2015)
Registry entries: Bio.tools  SciCrunch  OMICtools  Bioconda 
Kineticstools
detection of DNA modifications
Versions of package kineticstools
ReleaseVersionArchitectures
bullseye0.6.1+git20200325.3558942+dfsg-1all
sid0.6.1+git20200325.3558942+dfsg-1all
stretch0.6.1+20161222-1all
buster0.6.1+git20180425.27a1878-2all
Popcon: 2 users (0 upd.)*
Versions and Archs
License: DFSG free
Git

Tools for detecting DNA modifications from single molecule, real-time (SMRT®) sequencing data. This tool implements the P_ModificationDetection module in SMRT® Portal, used by the RS_Modification_Detection and RS_Modifications_and_Motif_Detection protocol. Researchers interested in understanding or extending the modification detection algorithms can use these tools as a starting point.

This package is part of the SMRTAnalysis suite.

Registry entries: OMICtools 
Kissplice
Detection of various kinds of polymorphisms in RNA-seq data
Versions of package kissplice
ReleaseVersionArchitectures
jessie2.2.1-3amd64
bullseye2.5.3-2amd64,arm64,mips64el,ppc64el
sid2.5.3-2amd64,arm64,mips64el,ppc64el
buster2.4.0-p1-4amd64,arm64,mips64el,ppc64el
stretch2.4.0-p1-1amd64,arm64,mips64el,ppc64el
Debtags of package kissplice:
biologynuceleic-acids
fieldbiology, biology:bioinformatics
interfacecommandline
roleprogram
useanalysing
works-withbiological-sequence
Popcon: 2 users (7 upd.)*
Versions and Archs
License: DFSG free
Git

KisSplice is a piece of software that enables the analysis of RNA-seq data with or without a reference genome. It is an exact local transcriptome assembler that allows one to identify SNPs, indels and alternative splicing events. It can deal with an arbitrary number of biological conditions, and will quantify each variant in each condition. It has been tested on Illumina datasets of up to 1 billion reads. Its memory consumption is around 5 GB for 100 million reads.

Please cite: Gustavo AT Sacomoto, Janice Kielbassa, Rayan Chikhi, Raluca Uricaru, Pavlos Antoniou, Marie-France Sagot, Pierre Peterlongo and Vincent Lacroix: KISSPLICE: de-novo calling alternative splicing events from RNA-seq data. (PubMed,eprint) BMC Bioinformatics 13(Suppl 6):S5 (2012)
Registry entries: SciCrunch  OMICtools  Bioconda 
Topics: RNA-seq; RNA splicing; Gene structure
Kleborate
tool to screen Klebsiella genome assemblies
Versions of package kleborate
ReleaseVersionArchitectures
sid1.0.0-3amd64,arm64,mips64el,ppc64el,s390x
bullseye1.0.0-3amd64,arm64,mips64el,ppc64el,s390x
Popcon: 0 users (0 upd.)*
Versions and Archs
License: DFSG free
Git

Kleborate is a tool to screen Klebsiella genome assemblies for:

  • MLST sequence type
  • species (e.g. K. pneumoniae, K. quasipneumoniae, K. variicola, etc.)
  • ICEKp associated virulence loci: yersiniabactin (ybt), colibactin (clb)
  • virulence plasmid associated loci: salmochelin (iro), aerobactin (iuc), hypermucoidy (rmpA, rmpA2)
  • antimicrobial resistance genes, including quinolone resistance SNPs and colistin resistance truncations
  • K (capsule) and O antigen (LPS) serotype prediction, via wzi alleles and Kaptive
Please cite: Margaret M. C. Lam, Ryan R. Wick, Kelly L. Wyres, Claire L. Gorrie, Louise M. Judd, Adam W. J. Jenney, Sylvain Brisse and Kathryn E. Holt: Genetic diversity, mobilisation and spread of the yersiniabactin-encoding mobile element ICEKp in Klebsiella pneumoniae populations. (PubMed) Microbiology Society 4(9) (2018)
Registry entries: Bioconda 
Kma
mapping genomic sequences to raw reads directly against redundant databases
Versions of package kma
ReleaseVersionArchitectures
sid1.2.26-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bullseye1.2.26-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 0 users (4 upd.)*
Versions and Archs
License: DFSG free
Git

KMA is a mapping method designed to map raw reads directly against redundant databases in an ultra-fast manner, using seed and extend. KMA is particularly good at aligning high-quality reads against highly redundant databases, where unique matches often do not exist. It works for long, low-quality reads as well, such as those from Nanopore. Non-unique matches are resolved using the "ConClave" sorting scheme, and a consensus sequence is output in addition to other common attributes.

Please cite: Philip T. L. C. Clausen, Frank M. Aarestrup and Ole Lund: Rapid and precise alignment of raw reads against redundant databases with KMA. (PubMed,eprint) BMC Bioinformatics 19:307 (2018)
Registry entries: OMICtools  Bioconda 
Kmer
suite of tools for DNA sequence analysis
Versions of package kmer
ReleaseVersionArchitectures
buster0~20150903+r2013-6all
sid0~20150903+r2013-8all
stretch0~20150903+r2013-3all
bullseye0~20150903+r2013-8all
Popcon: 0 users (0 upd.)*
Versions and Archs
License: DFSG free
Git

The kmer package is a suite of tools for DNA sequence analysis. It provides tools for searching (ESTs, mRNAs, sequencing reads); aligning (ESTs, mRNAs, whole genomes); and a variety of analyses based on kmers.

This is a metapackage depending on the executable components of the kmer suite.

Please cite: B. Walenz and L. Florea: Sim4db and leaff: Utilities for fast batched spliced alignment and sequence indexing. (PubMed) Bioinformatics 27(13):1869-1870 (2011)
Registry entries: Bio.tools  SciCrunch  OMICtools  Bioconda 
Kmerresistance
correlates mapped genes with the predicted species of WGS samples
Versions of package kmerresistance
ReleaseVersionArchitectures
bullseye2.2.0-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid2.2.0-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 0 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

KmerResistance correlates mapped genes with the predicted species of WGS samples. This allows identification of genes in poorly sequenced samples, or high-accuracy predictions for samples with contamination. KmerResistance has one dependency, namely KMA to perform the mapping, which is also freely available.

Please cite: Philip T. L. C. Clausen, Ea Zankari, Frank M. Aarestrup and Ole Lund: Benchmarking of methods for identification of antimicrobial resistance genes in bacterial whole genome data. (PubMed,eprint) Journal of Antimicrobial Chemotherapy 71(9):2484-8 (2016)
Kraken
assigning taxonomic labels to short DNA sequences
Versions of package kraken
ReleaseVersionArchitectures
buster1.1-3amd64,arm64,armel,armhf,i386,mips64el,ppc64el,s390x
sid1.1.1-1amd64,arm64,armel,armhf,i386,mips64el,ppc64el,s390x
bullseye1.1.1-1amd64,arm64,armel,armhf,i386,mips64el,ppc64el,s390x
stretch0.10.5~beta-2amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
Popcon: 1 users (3 upd.)*
Versions and Archs
License: DFSG free
Git

Kraken is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies. Previous attempts by other bioinformatics software to accomplish this task have often used sequence alignment or machine learning techniques that were quite slow, leading to the development of less sensitive but much faster abundance estimation programs. Kraken aims to achieve high sensitivity and high speed by utilizing exact alignments of k-mers and a novel classification algorithm.

In its fastest mode of operation, for a simulated metagenome of 100 bp reads, Kraken processed over 4 million reads per minute on a single core, over 900 times faster than Megablast and over 11 times faster than the abundance estimation program MetaPhlAn. Kraken's accuracy is comparable with Megablast, with slightly lower sensitivity and very high precision.

The package is enhanced by the following packages: jellyfish1 multiqc
Please cite: Derrick E Wood and Steven L Salzberg: Kraken: ultrafast metagenomic sequence classification using exact alignments. (PubMed,eprint) Genome Biol. 15(3):R46 (2014)
Registry entries: Bio.tools  OMICtools  Bioconda 
Kraken2
taxonomic classification system using exact k-mer matches
Versions of package kraken2
ReleaseVersionArchitectures
sid2.0.8~beta-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bullseye2.0.8~beta-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
upstream2.0.9~beta
Popcon: 1 users (1 upd.)*
Newer upstream!
License: DFSG free
Git

Kraken 2 is the newest version of Kraken, a taxonomic classification system using exact k-mer matches to achieve high accuracy and fast classification speeds. This classifier matches each k-mer within a query sequence to the lowest common ancestor (LCA) of all genomes containing the given k-mer. The k-mer assignments inform the classification algorithm. See Kraken 1's webpage for more details.
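
Mapping a k-mer to the lowest common ancestor (LCA) of all genomes containing it only requires a parent pointer for each taxon. The minimal Python sketch below computes that LCA on a hypothetical toy taxonomy (the names and tree are made up for illustration); Kraken uses NCBI taxonomy IDs and precomputes these assignments when the database is built.

```python
def lineage(taxon, parent):
    """Path from taxon up to the root, inclusive."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path


def lca(a, b, parent):
    ancestors = set(lineage(a, parent))
    for t in lineage(b, parent):  # first shared ancestor walking up from b
        if t in ancestors:
            return t
    raise ValueError("taxa do not share a root")


def lca_of_all(taxa, parent):
    """LCA of every taxon whose genome contains a given k-mer."""
    it = iter(taxa)
    result = next(it)
    for t in it:
        result = lca(result, t, parent)
    return result


if __name__ == "__main__":
    parent = {  # hypothetical child -> parent links
        "E_coli": "Escherichia", "E_fergusonii": "Escherichia",
        "S_enterica": "Salmonella", "Escherichia": "Enterobacteriaceae",
        "Salmonella": "Enterobacteriaceae", "Enterobacteriaceae": "root",
    }
    print(lca_of_all(["E_coli", "E_fergusonii"], parent))
    print(lca_of_all(["E_coli", "S_enterica"], parent))
```

A k-mer shared only by two Escherichia species is thus labeled at genus level, while one also found in Salmonella can only be labeled at family level.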

Kraken 2 provides significant improvements over Kraken 1, with faster database build times, smaller database sizes, and faster classification speeds. These improvements were achieved through the following updates to the Kraken classification program:

 1. Storage of Minimizers: Instead of storing/querying entire k-mers,
    Kraken 2 stores minimizers (l-mers) of each k-mer. The length of
    each l-mer must be ≤ the k-mer length. Each k-mer is treated by
    Kraken 2 as if its LCA is the same as its minimizer's LCA.
 2. Introduction of Spaced Seeds: Kraken 2 also uses spaced seeds to
    store and query minimizers to improve classification accuracy.
 3. Database Structure: While Kraken 1 saved an indexed and sorted list
    of k-mer/LCA pairs, Kraken 2 uses a compact hash table. This hash
    table is a probabilistic data structure that allows for faster
    queries and lower memory requirements. However, this data structure
    does have a <1% chance of returning the incorrect LCA or returning
    an LCA for a non-inserted minimizer. Users can compensate for this
    possibility by using Kraken's confidence scoring thresholds.
 4. Protein Databases: Kraken 2 allows for databases built from amino
    acid sequences. When queried, Kraken 2 performs a six-frame
    translated search of the query sequences against the database.
 5. 16S Databases: Kraken 2 also provides support for databases not
    based on NCBI's taxonomy. Currently, these include the 16S
    databases: Greengenes, SILVA, and RDP.
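
Item 1 above can be made concrete: the minimizer of a k-mer is its smallest l-mer under some ordering. The minimal Python sketch below uses plain lexicographic order and ignores canonicalization and spaced seeds; Kraken 2's actual ordering and masking are not reproduced, and the function names are hypothetical. It shows why storing minimizers shrinks the database: consecutive k-mers frequently share one.

```python
def minimizer(kmer, l):
    """Smallest l-mer of a k-mer under lexicographic order (requires 0 < l <= k)."""
    if not 0 < l <= len(kmer):
        raise ValueError("need 0 < l <= k")
    return min(kmer[i:i + l] for i in range(len(kmer) - l + 1))


def minimizers_of(seq, k, l):
    """Minimizer of every k-mer in seq, in order."""
    return [minimizer(seq[i:i + k], l) for i in range(len(seq) - k + 1)]


if __name__ == "__main__":
    mins = minimizers_of("GATTACAGT", 7, 3)
    print(mins, len(set(mins)))
```

Here three overlapping 7-mers all share the minimizer ACA, so a single stored entry stands in for all of them.
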
Please cite: Derrick E Wood and Steven L Salzberg: Kraken: ultrafast metagenomic sequence classification using exact alignments. (PubMed,eprint) Genome Biol. 15(3):R46 (2014)
Registry entries: Bio.tools  OMICtools  Bioconda 
Lagan
highly parametrizable pairwise global genome sequence aligner
Versions of package lagan
ReleaseVersionArchitectures
sid2.0-6amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bullseye2.0-6amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster2.0-3amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
Popcon: 7 users (8 upd.)*
Versions and Archs
License: DFSG free
Git

Lagan takes local alignments generated by CHAOS as anchors, and limits the search area of the Needleman-Wunsch algorithm around these anchors.

Multi-LAGAN is a generalization of the pairwise algorithm to multiple sequence alignment. M-LAGAN performs progressive pairwise alignments, guided by a user-specified phylogenetic tree. Alignments are aligned to other alignments using the sum-of-pairs metric.
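
The sum-of-pairs metric scores a multiple alignment column by adding up the scores of every pair of rows in it. The minimal Python sketch below assumes simple illustrative scores (match +1, mismatch -1, residue against gap -2, gap against gap 0); these are not M-LAGAN's actual parameters.

```python
from itertools import combinations


def pair_score(x, y, match=1, mismatch=-1, gap=-2):
    if x == "-" and y == "-":
        return 0  # gap aligned to gap is conventionally not penalised
    if x == "-" or y == "-":
        return gap
    return match if x == y else mismatch


def sum_of_pairs(alignment):
    """Sum-of-pairs score of a gapped multiple alignment (rows of equal length)."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("rows must have equal length")
    total = 0
    for column in zip(*alignment):
        for x, y in combinations(column, 2):  # every unordered pair of rows
            total += pair_score(x, y)
    return total


if __name__ == "__main__":
    print(sum_of_pairs(["ACG-T", "ACGAT", "A-GAT"]))
```

Because the score decomposes over pairs, aligning one alignment to another can reuse the same pairwise scoring across all row combinations.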

Please cite: Michael Brudno, Chuong Do, Gregory Cooper, Michael F. Kim, Eugene Davydov, Eric D. Green, Arend Sidow and Serafim Batzoglou: LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA. (PubMed,eprint) Genome Research 13(4):721-31 (2003)
Registry entries: Bio.tools  SciCrunch  OMICtools 
Lamarc
Likelihood Analysis with Metropolis Algorithm using Random Coalescence
Versions of package lamarc
ReleaseVersionArchitectures
sid2.1.10.1+dfsg-4amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster2.1.10.1+dfsg-3amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
bullseye2.1.10.1+dfsg-4amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 2 users (2 upd.)*
Versions and Archs
License: DFSG free
Git

LAMARC is a program which estimates population-genetic parameters such as population size, population growth rate, recombination rate, and migration rates. It approximates a summation over all possible genealogies that could explain the observed sample, which may be sequence, SNP, microsatellite, or electrophoretic data. LAMARC and its sister program Migrate are successor programs to the older programs Coalesce, Fluctuate, and Recombine, which are no longer being supported. The programs are memory-intensive but can run effectively on workstations.

Please cite: Mary K. Kuhner: Coalescent genealogy samplers: windows into population history. (PubMed) Trends in Ecology & Evolution 24(2):86-93 (2009)
Registry entries: SciCrunch  OMICtools 
Lambda-align
Local Aligner for Massive Biological DatA
Versions of package lambda-align
ReleaseVersionArchitectures
sid1.0.3-6amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
stretch1.0.1-2amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
buster1.0.3-5amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
bullseye1.0.3-6amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 2 users (2 upd.)*
Versions and Archs
License: DFSG free
Git

Lambda is a local biosequence aligner optimized for many query sequences and searches in protein space. It is compatible with the de facto standard tool BLAST, but often outperforms the best currently available alternatives at reproducing BLAST’s results and is the fastest compared with the current state of the art at comparable levels of sensitivity.

Please cite: Hannes Hauswedell, Jochen Singer and Knut Reinert: Lambda: the local aligner for massive biological data. (PubMed,eprint) Bioinformatics 30(17):i349-i355 (2014)
Registry entries: OMICtools  Bioconda 
Lambda-align2
Local Aligner for Massive Biological DatA - v2
Versions of package lambda-align2
ReleaseVersionArchitectures
bullseye2.0.0-7amd64
sid2.0.0-7amd64
buster2.0.0-6amd64
Popcon: 2 users (3 upd.)*
Versions and Archs
License: DFSG free
Git

Lambda2 is a local biosequence aligner optimized for many query sequences and searches in protein space. It is compatible with the de facto standard tool BLAST, but often outperforms the best currently available alternatives at reproducing BLAST’s results and is the fastest compared with the current state of the art at comparable levels of sensitivity.

This package is for the Lambda (align) v2.x series, whose command-line interface and on-disk format are incompatible with those of Lambda (align) v1.x.

Please cite: Hannes Hauswedell, Jochen Singer and Knut Reinert: Lambda: the local aligner for massive biological data. (PubMed,eprint) Bioinformatics 30(17):i349-i355 (2014)
Registry entries: OMICtools  Bioconda 
Lastz
pairwise aligning DNA sequences
Versions of package lastz
ReleaseVersionArchitectures
sid1.04.03-2amd64,i386,mips64el,mipsel
bullseye1.04.03-2amd64,i386,mips64el,mipsel
Popcon: 0 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

LASTZ is a drop-in replacement for BLASTZ, and is backward compatible with BLASTZ’s command-line syntax. That is, it supports all of BLASTZ’s options but also has additional ones, and may produce slightly different alignment results.

Registry entries: Bioconda 
Leaff
biological sequence library utilities and applications
Versions of package leaff
ReleaseVersionArchitectures
stretch0~20150903+r2013-3amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
sid0~20150903+r2013-8amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bullseye0~20150903+r2013-8amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster0~20150903+r2013-6amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
Popcon: 3 users (5 upd.)*
Versions and Archs
License: DFSG free
Git

LEAFF (Let's Extract Anything From Fasta) is a utility program for working with multi-fasta files. In addition to providing random access to the base level, it includes several analysis functions.

This package is part of the Kmer suite.

Please cite: B. Walenz and L. Florea: Sim4db and leaff: Utilities for fast batched spliced alignment and sequence indexing. (PubMed) Bioinformatics 27(13):1869-1870 (2011)
Registry entries: Bio.tools  SciCrunch  OMICtools  Bioconda 
Lefse
determine features of organisms, clades, taxonomic units, genes
Versions of package lefse
ReleaseVersionArchitectures
buster1.0.8-2all
bullseye1.0.8-3all
sid1.0.8-3all
stretch1.0+20160802-1all
Popcon: 1 users (2 upd.)*
Versions and Archs
License: DFSG free
Git

LEfSe (Linear discriminant analysis Effect Size) determines the features (organisms, clades, operational taxonomic units, genes, or functions) most likely to explain differences between classes by coupling standard tests for statistical significance with additional tests encoding biological consistency and effect relevance.

Registry entries: SciCrunch  OMICtools  Bioconda 
Librg-utils-perl
parsers and format conversion utilities used by (e.g.) profphd
Versions of package librg-utils-perl
ReleaseVersionArchitectures
sid1.0.43-6all
wheezy1.0.43-1all
jessie1.0.43-2all
stretch1.0.43-4all
buster1.0.43-6all
bullseye1.0.43-6all
Debtags of package librg-utils-perl:
devellang:perl, library
Popcon: 6 users (2 upd.)*
Versions and Archs
License: DFSG free
Git

This package contributes to the PredictProtein server for the automated structural annotation of protein sequences. It provides a series of conversion tools, such as:

  • blast2saf.pl
  • blastpgp_to_saf.pl
  • conv_hssp2saf.pl
  • copf.pl
  • hssp_filter.pl
  • safFilterRed.pl

which are supported by the modules:

  • RG::Utils::Conv_hssp2saf
  • RG::Utils::Copf
  • RG::Utils::Hssp_filter
Ltrsift
postprocessing and classification of LTR retrotransposons
Versions of package ltrsift
ReleaseVersionArchitectures
stretch1.0.2-7amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
jessie1.0.2-1amd64,armel,armhf,i386
buster1.0.2-8amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
bullseye1.0.2-9amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid1.0.2-9amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Debtags of package ltrsift:
uitoolkitgtk
Popcon: 3 users (4 upd.)*
Versions and Archs
License: DFSG free
Git

LTRsift is a graphical desktop tool for semi-automatic postprocessing of de novo predicted LTR retrotransposon annotations, such as the ones generated by LTRharvest and LTRdigest. Its user-friendly interface displays LTR retrotransposon candidates, their putative families and their internal structure in a hierarchical fashion, allowing the user to "sift" through the sometimes large results of de novo prediction software. It also offers customizable filtering and classification functionality.

Registry entries: OMICtools 
Topics: Mobile genetic elements
Screenshots of package ltrsift
Lucy
DNA sequence quality and vector trimming tool
Versions of package lucy
ReleaseVersionArchitectures
buster1.20-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
bullseye1.20-3amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid1.20-3amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 2 users (3 upd.)*
Versions and Archs
License: DFSG free
Git

Lucy is a utility that prepares raw DNA sequence fragments for sequence assembly, possibly using the TIGR Assembler. The cleanup process includes quality assessment, confidence reassurance, vector trimming and vector removal. The primary advantage of Lucy over other similar utilities is that it is a fully integrated, standalone program.

Lucy was designed and written at The Institute for Genomic Research (TIGR, now the J. Craig Venter Institute), where it was used for several years to clean sequence data from automated DNA sequencers prior to sequence assembly and other downstream uses. The quality trimming portion of Lucy makes use of phred quality scores, such as those produced by many automated sequencers based on the Sanger sequencing method. As such, Lucy’s quality trimming may not be appropriate for sequence data produced by some of the new “next-generation” sequencers.
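Phred scores map to error probabilities as P = 10^(-Q/10), so Q20 means a 1% chance that the base call is wrong. The sketch below shows that conversion plus a generic sliding-window trim. The window size, the 1% threshold and the function names are illustrative assumptions; this is not Lucy's own trimming algorithm.

```python
from typing import List, Tuple


def phred_to_error(q: int) -> float:
    """Convert a phred quality score Q into an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def trim_range(quals: List[int], window: int = 10, max_err: float = 0.01) -> Tuple[int, int]:
    """Return the half-open base range (start, end) covered by the longest run of
    consecutive windows whose mean error probability is at most max_err.

    Hypothetical sliding-window rule for illustration, not Lucy's algorithm.
    Returns (0, 0) when no window passes.
    """
    errs = [phred_to_error(q) for q in quals]
    n_windows = len(errs) - window + 1
    # good[i] is True when the window starting at base i passes the threshold
    good = [sum(errs[i:i + window]) / window <= max_err for i in range(max(n_windows, 0))]
    best = (0, 0)
    i = 0
    while i < len(good):
        if not good[i]:
            i += 1
            continue
        j = i
        while j < len(good) and good[j]:
            j += 1
        # windows i..j-1 pass; together they cover bases i up to (j - 1) + window - 1
        start, end = i, j - 1 + window
        if end - start > best[1] - best[0]:
            best = (start, end)
        i = j
    return best
```

For a read with five Q10 bases at each end of twenty Q30 bases, a 5-base window keeps exactly the Q30 core.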

Registry entries: Bio.tools  OMICtools 
Macs
Model-based Analysis of ChIP-Seq on short reads sequencers
Versions of package macs
ReleaseVersionArchitectures
jessie2.0.9.1-1amd64,armel,armhf,i386
buster2.1.2.1-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
bullseye2.2.7.1-2amd64,arm64,armel,armhf,i386,ppc64el,s390x
stretch2.1.1.20160309-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
sid2.2.7.1-2amd64,arm64,armel,armhf,i386,ppc64el,s390x
Popcon: 9 users (5 upd.)*
Versions and Archs
License: DFSG free
Git

MACS empirically models the length of the sequenced ChIP fragments, which tends to be shorter than sonication or library construction size estimates, and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a dynamic Poisson distribution to effectively capture local biases in the genome sequence, allowing for more sensitive and robust prediction. MACS compares favorably to existing ChIP-Seq peak-finding algorithms, is publicly available open source, and can be used for ChIP-Seq with or without control samples.
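The MACS paper describes the dynamic Poisson model as using a local rate, λ_local, taken as the maximum of the genome-wide background rate and rates estimated from windows around the candidate peak (for example 1 kb, 5 kb and 10 kb in the control). A minimal sketch of that idea follows. The function names and inputs are hypothetical, and real MACS also handles fragment-size shifting, tag scaling and FDR estimation, which are omitted here.

```python
import math
from typing import Iterable


def poisson_sf(k: int, lam: float) -> float:
    """Return P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term            # add P(X = i)
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


def local_lambda(lam_bg: float, window_lams: Iterable[float]) -> float:
    """Dynamic lambda: the largest of the background rate and the local window rates."""
    return max([lam_bg, *window_lams])


def peak_pvalue(count: int, lam_bg: float, window_lams: Iterable[float]) -> float:
    """Poisson p-value of observing `count` reads under the dynamic local lambda."""
    return poisson_sf(count, local_lambda(lam_bg, window_lams))
```

Because the largest rate wins, a region with a local bias (a raised λ_local) needs more reads to reach the same p-value, which is how the model absorbs local biases in the genome.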

Please cite: Yong Zhang, Tao Liu, Clifford A Meyer, Jérôme Eeckhoute, David S. Johnson, Bradley E. Bernstein, Chad Nussbaum, Richard M. Myers, Myles Brown, Wei Li and X Shirley Liu: Model-based Analysis of ChIP-Seq (MACS). (PubMed,eprint) Genome Biol. 9(9):R137 (2008)
Registry entries: Bio.tools  SciCrunch  OMICtools  Bioconda 
Macsyfinder
detection of macromolecular systems in protein datasets
Versions of package macsyfinder
ReleaseVersionArchitectures
stretch1.0.2-3all
buster1.0.5-2all
sid1.0.5-3all
Popcon: 8 users (0 upd.)*
Versions and Archs
License: DFSG free
Git

MacSyFinder is a program to model and detect macromolecular systems, genetic pathways and similar features in protein datasets. In prokaryotes, these systems often have evolutionarily conserved properties: they are made of conserved components and are encoded in compact loci (conserved genetic architecture). The user models these systems with MacSyFinder to reflect these conserved features and to allow their efficient detection.

MacSyFinder is written in Python and searches for system components with HMMER protein profiles.

Please cite: Sophie S. Abby, Bertrand Néron, Hervé Ménager, Marie Touchon and Eduardo P. C. Rocha: MacSyFinder: A Program to Mine Genomes for Molecular Systems with an Application to CRISPR-Cas System. (PubMed,eprint) PLOS ONE 9(10):e110726 (2014)
Registry entries: Bio.tools  OMICtools 
Maffilter
process genome alignment in the Multiple Alignment Format
Versions of package maffilter
ReleaseVersionArchitectures
stretch1.1.0-1+dfsg-2amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
sid1.3.1+dfsg-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster1.3.1+dfsg-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
bullseye1.3.1+dfsg-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 3 users (7 upd.)*
Versions and Archs
License: DFSG free
Git

MafFilter applies a series of "filters" to a MAF file in order to clean it, extract data and compute statistics, while keeping track of the associated metadata such as genome coordinates and quality scores.

  • It can process the alignment to remove low-quality / ambiguous / masked regions.
  • It can export data into one or more alignment files in formats such as FASTA or Clustal.
  • It can read annotation data in GFF or GTF format, and extract the corresponding alignment.
  • It can perform sliding windows calculations.
  • It can reconstruct phylogeny/genealogy along the genome alignment.
  • It can compute population genetics statistics, such as site frequency spectrum, number of fixed/polymorphic sites, etc.
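As an illustration of a sliding-window calculation over a genome alignment, the sketch below counts polymorphic columns per window in a block of aligned rows. The statistic, the gap/ambiguity handling and the function names are assumptions chosen for brevity; MafFilter's own configuration syntax is not shown here.

```python
from typing import Iterator, List, Tuple

GAP_OR_AMBIGUOUS = set("-Nn?")


def is_polymorphic(column: str) -> bool:
    """True if the column holds more than one distinct base, ignoring gaps and N."""
    bases = {b.upper() for b in column if b not in GAP_OR_AMBIGUOUS}
    return len(bases) > 1


def sliding_windows(alignment: List[str], size: int, step: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (start, end, n_polymorphic) for each window over equal-length aligned rows."""
    length = len(alignment[0])
    columns = ["".join(row[i] for row in alignment) for i in range(length)]
    flags = [is_polymorphic(c) for c in columns]
    for start in range(0, length - size + 1, step):
        yield start, start + size, sum(flags[start:start + size])
```

For the rows ACGTAC, ACGAAC and ATGTAC, columns 1 and 3 are polymorphic, so two non-overlapping windows of size 3 each report one polymorphic site.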
The package is enhanced by the following packages: maffilter-examples
Please cite: Julien Y Dutheil, Sylvain Gaillard and Eva H Stukenbrock: MafFilter: a highly flexible and extensible multiple genome alignment files processor. (PubMed,eprint) BMC Genomics 15:53 (2014)
Registry entries: OMICtools 
Mapdamage
tracking and quantifying damage patterns in ancient DNA sequences
Versions of package mapdamage
ReleaseVersionArchitectures
bullseye2.2.0+dfsg-1all
stretch2.0.6+dfsg-2all
buster2.0.9+dfsg-1all
sid2.2.1+dfsg-1all
Popcon: 11 users (0 upd.)*
Versions and Archs
License: DFSG free
Git

MapDamage is a computational framework written in Python and R, which tracks and quantifies DNA damage patterns among ancient DNA sequencing reads generated by Next-Generation Sequencing platforms.

MapDamage is developed at the Centre for GeoGenetics by the Orlando Group.

Please cite: Hákon Jónsson, Aurélien Ginolhac, Mikkel Schubert, Philip Johnson and Ludovic Orlando: mapDamage2.0: fast approximate Bayesian estimates of ancient DNA damage parameters. (PubMed,eprint) Bioinformatics 29(13):1682-4 (2013)
Registry entries: SciCrunch  OMICtools  Bioconda 
Mapsembler2
bioinformatics targeted assembly software
Versions of package mapsembler2
ReleaseVersionArchitectures
bullseye2.2.4+dfsg-4amd64,arm64,armel,armhf,i386,ppc64el,s390x
sid2.2.4+dfsg-4amd64,arm64,armel,armhf,i386,ppc64el,s390x
stretch2.2.3+dfsg-3amd64,arm64,armel,armhf,i386,ppc64el,s390x
buster2.2.4+dfsg-3amd64,arm64,armel,armhf,i386,ppc64el,s390x
jessie2.1.6+dfsg-1amd64,armel,armhf,i386
Popcon: 4 users (8 upd.)*
Versions and Archs
License: DFSG free
Git

Mapsembler2 is a targeted assembly software. It takes as input a set of NGS raw reads (fasta or fastq, gzipped or not) and a set of input sequences (starters).

It first determines whether each starter is read-coherent, i.e. whether reads confirm the presence of the starter in the original sequence. Then, for each read-coherent starter, Mapsembler2 outputs its sequence neighborhood as a linear sequence or as a graph, depending on the user's choice.
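A simplified picture of the read-coherence test: here a starter is treated as read-coherent if every one of its k-mers occurs in at least one read. That definition, the choice of k and the function names are assumptions for illustration; Mapsembler2's actual criterion and data structures may differ, and reverse complements are ignored.

```python
from typing import Iterable, Set


def kmer_set(seq: str, k: int) -> Set[str]:
    """All k-mers of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def is_read_coherent(starter: str, reads: Iterable[str], k: int) -> bool:
    """Simplified check: every k-mer of the starter occurs in at least one read."""
    if len(starter) < k:
        return False
    read_kmers: Set[str] = set()
    for read in reads:
        read_kmers |= kmer_set(read, k)
    return kmer_set(starter, k) <= read_kmers
```

With the reads ACGTAC and GTACGG and k = 4, the starter ACGTACGG is covered k-mer by k-mer, while ACGTTT is not.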

Mapsembler2 may be used to (among other things):

  • validate an assembled sequence (input as starter), e.g. from a de Bruijn graph assembly where read-coherence was not enforced;
  • check whether a gene (input as starter) has a homolog in a set of reads;
  • check whether a known enzyme is present in a metagenomic NGS read set;
  • enrich unmappable reads by extending them, possibly making them mappable;
  • check what happens at the extremities of a contig;
  • remove contaminant or symbiont reads from a read set.
Please cite: Pierre Peterlongo and Rayan Chikhi: Mapsembler, targeted and micro assembly of large NGS datasets on a desktop computer. (PubMed) BMC Bioinformatics 13:48 (2012)
Registry entries: Bio.tools  OMICtools  Bioconda 
Mash
fast genome and metagenome distance estimation using MinHash
Versions of package mash
ReleaseVersionArchitectures
bullseye2.2.2+dfsg-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid2.2.2+dfsg-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
stretch1.1.1-2amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
buster2.1+dfsg-2amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
Popcon: 2 users (4 upd.)*
Versions and Archs
License: DFSG free
Git

Mash uses MinHash locality-sensitive hashing to reduce large biosequences to a representative sketch and rapidly estimate pairwise distances between genomes or metagenomes. Mash sketch databases effectively delineate known species boundaries, allow construction of approximate phylogenies, and can be searched in seconds using assembled genomes or raw sequencing runs from Illumina, Pacific Biosciences, and Oxford Nanopore. For metagenomics, Mash scales to thousands of samples and can replicate Human Microbiome Project and Global Ocean Survey results in a fraction of the time.
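A minimal sketch of the MinHash approach: each sequence is reduced to its s smallest k-mer hash values, the Jaccard index j is estimated from the s smallest hashes of the union of two sketches, and j is converted to the Mash distance D = -(1/k) ln(2j / (1 + j)) described in the Mash paper. SHA-1 stands in for Mash's MurmurHash3 and canonical (strand-independent) k-mers are skipped, so sketches from this code are not compatible with real Mash sketch files; the function names are hypothetical.

```python
import hashlib
import math
from typing import List, Set


def kmer_hashes(seq: str, k: int) -> Set[int]:
    """64-bit hashes of all k-mers (SHA-1 prefix as a stand-in for MurmurHash3)."""
    return {
        int.from_bytes(hashlib.sha1(seq[i:i + k].encode()).digest()[:8], "big")
        for i in range(len(seq) - k + 1)
    }


def sketch(seq: str, k: int, s: int) -> List[int]:
    """Bottom-s MinHash sketch: the s smallest k-mer hash values."""
    return sorted(kmer_hashes(seq, k))[:s]


def jaccard_estimate(sk_a: List[int], sk_b: List[int], s: int) -> float:
    """Estimate the Jaccard index from the s smallest hashes of the union of both sketches."""
    a, b = set(sk_a), set(sk_b)
    union_bottom = sorted(a | b)[:s]
    if not union_bottom:
        return 0.0
    shared = a & b
    return sum(h in shared for h in union_bottom) / len(union_bottom)


def mash_distance(j: float, k: int) -> float:
    """Mash distance D = -(1/k) * ln(2j / (1 + j)); taken as 1.0 when j == 0."""
    if j == 0:
        return 1.0
    return -math.log(2 * j / (1 + j)) / k
```

Only the fixed-size sketches need to be stored and compared, which is what lets large collections of genomes be searched quickly.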

Please cite: Brian D. Ondov, Todd J. Treangen, Páll Melsted, Adam B. Mallonee, Nicholas H. Bergman, Sergey Koren and Adam M. Phillippy: Mash: fast genome and metagenome distance estimation using MinHash. (PubMed,eprint) Genome Biology 17:132 (2016)
Registry entries: OMICtools  Bioconda 
Mauve-aligner
multiple genome alignment
Versions of package mauve-aligner
ReleaseVersionArchitectures
stretch2.4.0+4734-3amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
buster2.4.0+4736-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
bullseye2.4.0+4736-2amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid2.4.0+4736-2amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 3 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

Mauve is a system for efficiently constructing multiple genome alignments in the presence of large-scale evolutionary events such as rearrangement and inversion. Multiple genome alignment provides a basis for research into comparative genomics and the study of evolutionary dynamics. Aligning whole genomes is a fundamentally different problem than aligning short sequences.

Mauve has been developed with the idea that a multiple genome aligner should require only modest computational resources. It employs algorithmic techniques that scale well in the amount of sequence being aligned. For example, a pair of Y. pestis genomes can be aligned in under a minute, while a group of 9 divergent Enterobacterial genomes can be aligned in a few hours.

Mauve computes and interactively visualizes genome sequence comparisons. Using FastA or GenBank sequence data, Mauve constructs multiple genome alignments that identify large-scale rearrangement, gene gain, gene loss, indels, and nucleotide substitution.

Mauve is developed at the University of Wisconsin.

The package is enhanced by the following packages: progressivemauve
Please cite: Aaron C. E. Darling, Bob Mau, Frederick R. Blattner and Nicole T. Perna: Mauve: multiple alignment of conserved genomic sequence with rearrangements. (PubMed,eprint) Genome Research 14(7):1394-1403 (2004)
Registry entries: Bio.tools  SciCrunch  OMICtools 
Meryl
in- and out-of-core kmer counting and utilities
Versions of package meryl
ReleaseVersionArchitectures
bullseye0~20150903+r2013-8amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
stretch0~20150903+r2013-3amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
buster0~20150903+r2013-6amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
sid0~20150903+r2013-8amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 3 users (4 upd.)*
Versions and Archs
License: DFSG free
Git

meryl computes the kmer content of genomic sequences. Kmer content is represented as a list of kmers and the number of times each occurs in the input sequences. The kmer can be restricted to only the forward kmer, only the reverse kmer, or the canonical kmer (lexicographically smaller of the forward and reverse kmer at each location). Meryl can report the histogram of counts, the list of kmers and their counts, or can perform mathematical and set operations on the processed data files.
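The three k-mer modes and the count histogram can be sketched in a few lines. This in-memory Counter is an illustration only, with hypothetical function names; meryl itself is built for out-of-core counting of much larger inputs. Here "reverse" is taken to mean the reverse complement, and k-mers containing N are skipped; both are assumptions of this sketch.

```python
from collections import Counter
from typing import Iterable

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(kmer: str) -> str:
    """Reverse complement of an ACGT string."""
    return kmer.translate(_COMPLEMENT)[::-1]


def count_kmers(seqs: Iterable[str], k: int, mode: str = "canonical") -> Counter:
    """Count forward, reverse-complement or canonical k-mers across all sequences."""
    counts: Counter = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            fwd = seq[i:i + k]
            if set(fwd) - set("ACGT"):
                continue  # skip k-mers containing N or other ambiguity codes
            if mode == "forward":
                key = fwd
            elif mode == "reverse":
                key = revcomp(fwd)
            elif mode == "canonical":
                key = min(fwd, revcomp(fwd))  # lexicographically smaller strand
            else:
                raise ValueError(f"unknown mode: {mode}")
            counts[key] += 1
    return counts


def count_histogram(counts: Counter) -> Counter:
    """For each n, how many distinct k-mers occur exactly n times."""
    return Counter(counts.values())
```

In canonical mode, AC and its reverse complement GT collapse into one entry, so ACGT with k = 2 gives AC twice and CG once.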

This package is part of the Kmer suite.

Please cite: B. Walenz and L. Florea: Sim4db and leaff: Utilities for fast batched spliced alignment and sequence indexing. (PubMed) Bioinformatics 27(13):1869-1870 (2011)
Registry entries: Bio.tools  SciCrunch  OMICtools  Bioconda 
Metaphlan2
Metagenomic Phylogenetic Analysis
Versions of package metaphlan2
ReleaseVersionArchitectures
buster2.7.8-1all
bullseye2.9.22-1all
stretch2.6.0+ds-2all
sid2.9.22-1all
Popcon: 2 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

MetaPhlAn is a computational tool for profiling the composition of microbial communities (Bacteria, Archaea, Eukaryotes and Viruses) from metagenomic shotgun sequencing data with species-level resolution. From version 2.0, MetaPhlAn is also able to identify specific strains (in the not-so-frequent cases in which the sample contains a previously sequenced strain) and to track strains across samples for all species.

MetaPhlAn 2.0 relies on ~1M unique clade-specific marker genes (the marker information file can be found at usr/share/metaphlan2/utils/markers_info.txt.bz2) identified from ~17,000 reference genomes (~13,500 bacterial and archaeal, ~3,500 viral, and ~110 eukaryotic), allowing:

  • unambiguous taxonomic assignments;
  • accurate estimation of organismal relative abundance;
  • species-level resolution for bacteria, archaea, eukaryotes and viruses;
  • strain identification and tracking;
  • orders-of-magnitude speedups compared to existing methods;
  • metagenomic strain-level population genomics.
Please cite: Duy Tin Truong, Eric A Franzosa, Timothy L Tickle, Matthias Scholz, George Weingart, Edoardo Pasolli, Adrian Tett, Curtis Huttenhower and Nicola Segata: MetaPhlAn2 for enhanced metagenomic taxonomic profiling. (PubMed) Nature Methods 12(10):902–903 (2015)
Registry entries: Bio.tools  SciCrunch  OMICtools  Bioconda 
Metastudent
predictor of Gene Ontology terms from protein sequence
Versions of package metastudent
ReleaseVersionArchitectures
bullseye2.0.1-8all
stretch2.0.1-4all
jessie1.0.11-2all
sid2.0.1-8all
buster2.0.1-6all
Popcon: 16 users (0 upd.)*
Versions and Archs
License: DFSG free
Git

Often, only the sequence of a protein is known, but not its functions. Metastudent will try to predict missing functional annotations through homology searches (BLAST).

All predicted functions correspond to Gene Ontology (GO) terms from the Molecular Function (MFO), the Biological Process (BPO) and the Cellular Component Ontology (CCO) and are associated with a reliability score.

Please cite: Tobias Hamp, Rebecca Kassner, Stefan Seemayer, Esmeralda Vicedo, Christian Schaefer, Dominik Achten, Florian Auer, Ariane Boehm, Tatjana Braun, Maximilian Hecht, Mark Heron, Peter Hönigschmid, Thomas A. Hopf, Stefanie Kaufmann, Michael Kiening, Denis Krompass, Cedric Landerer, Yannick Mahlich, Manfred Roos and Burkhard Rost: Homology-based inference sets the bar high for protein function prediction. (PubMed) BMC Bioinformatics 14(Suppl 3):S7 (2013)
Registry entries: Bio.tools  OMICtools 
Microbegps
explorative taxonomic profiling tool for metagenomic data
Versions of package microbegps
ReleaseVersionArchitectures
bullseye1.0.0-5all
stretch1.0.0-2all
buster1.0.0-3all
sid1.0.0-5all
Popcon: 4 users (0 upd.)*
Versions and Archs
License: DFSG free
Git

MicrobeGPS is a bioinformatics tool for the analysis of metagenomic sequencing data. The goal is to profile the composition of metagenomic communities as accurately as possible and present the results to the user in a convenient manner. One main focus is reliability: the tool calculates quality metrics for the estimated candidates and allows the user to identify false candidates easily.

Please cite: Martin S. Lindner and Bernhard Y. Renard: Metagenomic Profiling of Known and Unknown Microbes with MicrobeGPS. (PubMed,eprint) PLoS One 10(2):e0117711 (2015)
Registry entries: Bio.tools  OMICtools 
Mindthegap
performs detection and assembly of DNA insertion variants in NGS read datasets
Versions of package mindthegap
ReleaseVersionArchitectures
sid2.2.2-2amd64,arm64,i386,mips64el,ppc64el,s390x
bullseye2.2.2-2amd64,arm64,i386,mips64el,ppc64el,s390x
Popcon: users ( upd.)*
Versions and Archs
License: DFSG free
Git

MindTheGap is designed to call insertions of any size, whether they are novel or duplicated, homozygous or heterozygous in the donor genome. It takes as input a set of reads and a reference genome. It outputs two sets of FASTA sequences: one is the set of breakpoints of the detected insertion sites, the other is the set of assembled insertions for each breakpoint. MindTheGap can also be used as a genome assembly finishing tool: it can fill the gaps between a set of input contigs without any prior knowledge of their relative order and orientation, and it outputs the results in GFA format.

Registry entries: Bio.tools  OMICtools  Bioconda 
Miniasm
ultrafast de novo assembler for long noisy DNA sequencing reads
Versions of package miniasm
ReleaseVersionArchitectures
buster0.3+dfsg-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
stretch0.2+dfsg-2amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
sid0.3+dfsg-2amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bullseye0.3+dfsg-2amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 3 users (4 upd.)*
Versions and Archs
License: DFSG free
Git

Miniasm is an experimental, very fast OLC-based de novo assembler for noisy long reads. It takes all-vs-all read self-mappings (typically by minimap) as input and outputs an assembly graph in the GFA format. Unlike mainstream assemblers, miniasm does not have a consensus step. It simply concatenates pieces of read sequences to generate the final unitig sequences. Thus the per-base error rate is similar to that of the raw input reads.

Registry entries: OMICtools  Bioconda 
Topics: Sequence assembly
Minimac4
Fast Imputation Based on State Space Reduction HMM
Versions of package minimac4
ReleaseVersionArchitectures
bullseye1.0.2-2amd64
buster1.0.0-2amd64
sid1.0.2-2amd64
Popcon: 2 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

Minimac4 is a lower-memory and more computationally efficient implementation of "minimac2/3". It is an algorithm for genotype imputation that works on phased genotypes (e.g. from MaCH).

Minimac4 is designed to handle very large reference panels more efficiently with no loss of accuracy. The algorithm analyzes only the unique sets of haplotypes in small genomic segments, thereby reducing run time and memory use without sacrificing accuracy.
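The state-space reduction can be illustrated by grouping reference haplotypes that are identical within a genomic block: an HMM over that block then needs one state per distinct allele string rather than one per panel haplotype. The panel encoding (0/1 strings) and the function name below are hypothetical, and the HMM itself is not shown.

```python
from typing import Dict, List


def unique_haplotypes(panel: List[str], start: int, end: int) -> Dict[str, List[int]]:
    """Group reference haplotypes by their allele string in the block [start, end).

    Keys are the distinct allele strings; values list the panel haplotypes carrying them,
    so len(result) is the number of states an HMM needs for this block.
    """
    groups: Dict[str, List[int]] = {}
    for idx, hap in enumerate(panel):
        groups.setdefault(hap[start:end], []).append(idx)
    return groups
```

For example, with the panel ["0101", "0100", "0101", "1100"], the block [0, 3) yields {"010": [0, 1, 2], "110": [3]}, so four haplotypes collapse into two states.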

Please cite: Sayantan Das, Lukas Forer, Sebastian Schönherr, Carlo Sidore, Adam E Locke, Alan Kwong, Scott I Vrieze, Emily Y Chew, Shawn Levy, Matt McGue, David Schlessinger, Dwight Stambolian, Po-Ru Loh, William G Iacono, Anand Swaroop, Laura J Scott, Francesco Cucca, Florian Kronenberg, Michael Boehnke, Gonçalo R Abecasis and Christian Fuchsberger: Next-generation genotype imputation service and methods. Nature Genetics 48(10):1284-1287 (2016)
Registry entries: Bio.tools  SciCrunch  OMICtools 
Minimap
tool for approximate mapping of long biosequences such as DNA reads
Versions of package minimap
ReleaseVersionArchitectures
bullseye0.2-5amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster0.2-4amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
stretch0.2-3amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
sid0.2-5amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 3 users (4 upd.)*
Versions and Archs
License: DFSG free
Git

Minimap is an experimental tool to efficiently find multiple approximate mapping positions between two sets of long biological sequences, such as between DNA reads and reference genomes, between genomes and between long noisy reads. Minimap does not generate alignments as of now and because of this, it is usually tens of times faster than mainstream aligners. It does not replace mainstream aligners, but it can be useful when you want to quickly identify long approximate matches at moderate divergence among a huge collection of sequences. For this task, it is much faster than most existing tools.
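Minimap finds approximate matches by indexing minimizers, a sampled subset of k-mers: in every window of w consecutive k-mers only the smallest is kept, and sequences that share minimizers become candidate matches. The sketch below uses plain lexicographic ordering and forward-strand k-mers for readability; minimap itself orders k-mers by a hash and uses canonical k-mers, and the function names here are hypothetical.

```python
from typing import List, Set, Tuple


def minimizers(seq: str, k: int, w: int) -> List[Tuple[str, int]]:
    """(w, k)-minimizers: the smallest (k-mer, position) in every window of w k-mers.

    Ordering is plain lexicographic here, a simplification of minimap's hash order.
    """
    kmers = [(seq[i:i + k], i) for i in range(len(seq) - k + 1)]
    chosen = {min(kmers[s:s + w]) for s in range(len(kmers) - w + 1)}
    return sorted(chosen, key=lambda m: m[1])


def shared_minimizers(a: str, b: str, k: int, w: int) -> Set[str]:
    """Minimizer k-mers present in both sequences: candidate seeds for approximate matches."""
    return {m for m, _ in minimizers(a, k, w)} & {m for m, _ in minimizers(b, k, w)}
```

For example, minimizers("ACGTTGCA", 3, 2) keeps ACG, CGT, GTT, TGC and GCA but drops TTG, which is never the smallest k-mer in its windows.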

Please cite: Heng Li: Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences. (eprint) Bioinformatics 32(14):2103-2110 (2016)
Registry entries: OMICtools  Bioconda 
Topics: Mapping
Minimap2
versatile pairwise aligner for genomic and spliced nucleotide sequences
Versions of package minimap2
ReleaseVersionArchitectures
bullseye2.17+dfsg-9amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster2.15+dfsg-1amd64,i386
sid2.17+dfsg-9amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 6 users (5 upd.)*
Versions and Archs
License: DFSG free
Git

Minimap2 is a versatile sequence alignment program that aligns DNA or mRNA sequences against a large reference database. Typical use cases include: (1) mapping PacBio or Oxford Nanopore genomic reads to the human genome; (2) finding overlaps between long reads with error rate up to ~15%; (3) splice-aware alignment of PacBio Iso-Seq or Nanopore cDNA or Direct RNA reads against a reference genome; (4) aligning Illumina single- or paired-end reads; (5) assembly-to-assembly alignment; (6) full-genome alignment between two closely related species with divergence below ~15%.

For ~10 kb noisy read sequences, minimap2 is tens of times faster than mainstream long-read mappers such as BLASR, BWA-MEM, NGMLR and GMAP. It is more accurate on simulated long reads and produces biologically meaningful alignments ready for downstream analyses. For >100 bp Illumina short reads, minimap2 is three times as fast as BWA-MEM and Bowtie2, and as accurate on simulated data. Detailed evaluations are available from the minimap2 paper or the preprint.

Please cite: Heng Li: Minimap2: pairwise alignment for nucleotide sequences. (PubMed,eprint) Bioinformatics 34(18):3094-3100 (2018)
Registry entries: OMICtools  Bioconda 
Mirtop
annotate miRNAs with a standard mirna/isomir naming
Versions of package mirtop
ReleaseVersionArchitectures
bullseye0.4.23-1all
sid0.4.23-1all
Popcon: users ( upd.)*
Versions and Archs
License: DFSG free
Git

The main goal of this project is to create a reflection group on metazoan microRNAs (miRNAs), open to all interested researchers, to identify blockages and develop standards and guidelines to improve miRNA research, resources and communication. This can be achieved through standardized file formats, gene and variant nomenclature guidelines, and advances in the understanding of miRNA biology. The group will eventually also aim at expanding its scope to the development of novel tools, data resources, and best-practice guidelines that benefit the scientific community by providing high-confidence, validated research and analysis strategies, regardless of the level of expertise in this field. This package provides the command-line interface to mirtop.

The package is enhanced by the following packages: multiqc
Please cite: Thomas Desvignes, Karen Eilbeck, Ioannis S. Vlachos, Bastian Fromm, Yin Lu, Marc K. Halushka, Michael Hackenberg, Gianvito Urgese, Elisa Ficarra, Shruthi Bandyadka, Jason Sydes, Peter Batzel, John H. Postlethwait, Phillipe Loher, Eric Londin, Aristeidis G. Telonis, Isidore Rigoutsos and Lorena Pantano Rubino: miRTOP: An open source community project for the development of a unified format file for miRNA data [version 1; not peer reviewed]. (eprint) F1000Research 7(ISCB Comm. J.):953 (Slides) (2018)
Registry entries: Bio.tools  OMICtools  Bioconda 
Mmseqs2
ultra fast and sensitive protein search and clustering
Versions of package mmseqs2
ReleaseVersionArchitectures
bullseye11-e1a1c+ds-3amd64,arm64,armel,armhf,mips64el,mipsel,ppc64el,s390x
sid11-e1a1c+ds-3amd64,arm64,armel,armhf,mips64el,mipsel,ppc64el,s390x
Popcon: 0 users (3 upd.)*
Versions and Archs
License: DFSG free
Git

MMseqs2 (Many-against-Many sequence searching) is a software suite to search and cluster huge protein and nucleotide sequence sets. MMseqs2 is open source GPL-licensed software implemented in C++ for Linux, MacOS, and (as beta version, via cygwin) Windows. The software is designed to run on multiple cores and servers and exhibits very good scalability. MMseqs2 can run 10000 times faster than BLAST. At 100 times its speed, it achieves almost the same sensitivity. It can perform profile searches with the same sensitivity as PSI-BLAST at over 400 times its speed.

Please cite: Martin Steinegger and Johannes Söding: Clustering huge protein sequence sets in linear time. Nature Communications 9(1) (2018)
Registry entries: Bio.tools  OMICtools  Bioconda 
Mptp
single-locus species delimitation
Versions of package mptp
ReleaseVersionArchitectures
buster0.2.4-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
bullseye0.2.4-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid0.2.4-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 2 users (5 upd.)*
Versions and Archs
License: DFSG free
Git

Implementation of a fast species delimitation method, based on PTP (Zhang et al. 2013) with a 64-bit multi-threaded design that handles very large datasets.

The tool mPTP can handle very large biodiversity datasets. It implements a fast method to compute the ML delimitation from an inferred phylogenetic tree of the samples. Using MCMC, it also computes the support values for each clade, which can be used to assess the confidence of the ML delimitation.

ML delimitation: mPTP implements two flavours of the point-estimate solution. First, it implements the original method from (Zhang et al. 2013), where all within-species processes are modelled with a single exponential distribution. mPTP uses a dynamic programming implementation which estimates the ML delimitation faster and more accurately than the original PTP. The dynamic programming implementation has similar properties to (Gulek et al. 2010). See the wiki for more information. The second method assumes a distinct exponential distribution for the branching events of each of the delimited species, allowing it to fit a wider range of empirical datasets.

MCMC method: mPTP generates support values for each clade. They represent the ratio of the number of samples in which a particular node was in the between-species process to the total number of samples.
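The support value described above is a simple frequency. Assuming each MCMC sample is recorded as the set of node identifiers assigned to the between-species process (a hypothetical representation, not mPTP's file format), it can be computed as:

```python
from typing import Dict, Hashable, Iterable, List, Set


def clade_support(samples: List[Set[Hashable]], nodes: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Support per node = (# MCMC samples in which the node is in the
    between-species process) / (total # samples)."""
    if not samples:
        raise ValueError("need at least one MCMC sample")
    return {node: sum(node in s for s in samples) / len(samples) for node in nodes}
```

A node that is in the between-species process in three of four samples receives a support of 0.75.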

Please cite: Paschalia Kapli, Sarah Lutteropp, Jiajie Zhang, Kassian Kobert, Pavlos Pavlidis, Alexandros Stamatakis and Tomas Flouri: Multi-rate Poisson Tree Processes for single-locus species delimitation under Maximum Likelihood and Markov Chain Monte Carlo. (PubMed,eprint) bioRxiv (2016)
Registry entries: OMICtools 
Multiqc
output integration for RNA sequencing across tools and samples
Versions of package multiqc
ReleaseVersionArchitectures
bullseye1.9+dfsg-2all
sid1.9+dfsg-2all
Popcon: 3 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

The sequencing of DNA or RNA with current high-throughput technologies involves an array of tools, applied over a range of samples. It is easy to lose track. Gathering the data and forwarding it in a readable form to the individuals who took the samples is a challenge for a tool in itself. Well, here it is: MultiQC aggregates the output of multiple tools into a single report.

Reports are generated by scanning given directories for recognised log files. These are parsed and a single HTML report is generated summarising the statistics for all logs found. MultiQC reports can describe multiple analysis steps and large numbers of samples within a single plot, and multiple analysis tools, making it ideal for routine fast quality control.

Please cite: Philip Ewels, Måns Magnusson, Sverker Lundin and Max Käller: MultiQC: summarize analysis results for multiple tools and samples in a single report. (PubMed,eprint) Bioinformatics 32(19):3047-8 (2016)
Registry entries: Bio.tools  SciCrunch  OMICtools  Bioconda 
Murasaki
homology detection tool across multiple large genomes
Versions of package murasaki
ReleaseVersionArchitectures
bullseye1.68.6-9amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
stretch1.68.6-6amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
buster1.68.6-8amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
sid1.68.6-9amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 2 users (4 upd.)*
Versions and Archs
License: DFSG free
Git

Murasaki is a scalable, fast, language theory-based homology detection tool across multiple large genomes. It enables whole-genome-scale global alignment of multiple genomes and supports gapped-seed patterns of unlimited length as well as unique TF-IDF-based filtering.

Murasaki is anchor alignment software that is:

  • extremely fast (17 CPU hours for the whole human x mouse genome comparison; 52 wall-clock minutes with 40 nodes)
  • scalable (arbitrarily parallelizable across multiple nodes using MPI; even a single node with 16 GB of RAM can handle over 1 Gbp of sequence)
  • unlimited pattern length
  • repeat tolerant
  • intelligent noise reduction
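The gapped-seed idea mentioned above can be illustrated with a small sketch: a pattern such as `101` samples only the positions marked `1`, so two sequences can share a seed key despite a mismatch at a "don't care" position. The function names and the dictionary index are hypothetical; Murasaki's own hashing and TF-IDF filtering are not reproduced.

```python
def spaced_seeds(sequence, pattern):
    """Yield (position, key) for every placement of a gapped seed pattern.

    Positions marked '1' in `pattern` contribute to the key; '0' positions are ignored.
    """
    care = [i for i, c in enumerate(pattern) if c == "1"]
    span = len(pattern)
    for start in range(len(sequence) - span + 1):
        key = "".join(sequence[start + i] for i in care)
        yield start, key


def seed_hits(seq_a, seq_b, pattern):
    """Return (pos_a, pos_b) pairs where both sequences share a gapped-seed key."""
    index = {}
    for pos, key in spaced_seeds(seq_a, pattern):
        index.setdefault(key, []).append(pos)
    return [
        (pos_a, pos_b)
        for pos_b, key in spaced_seeds(seq_b, pattern)
        for pos_a in index.get(key, [])
    ]
```

With the pattern `101`, `ACGT` and `AGGT` produce an anchor at position 0 even though they differ at the middle base.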
Please cite: Kris Popendorf, Hachiya Tsuyoshi, Yasunori Osana and Yasubumi Sakakibara: Murasaki: A Fast, Parallelizable Algorithm to Find Anchors from Multiple Genomes. (PubMed,eprint) PLoS ONE 5(9):e12651 (2010)
Registry entries: OMICtools 
Murasaki-mpi
homology detection tool across multiple large genomes (MPI-version)
Versions of package murasaki-mpi
ReleaseVersionArchitectures
stretch1.68.6-6amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
buster1.68.6-8amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
bullseye1.68.6-9amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid1.68.6-9amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 0 users (0 upd.)*
Versions and Archs
License: DFSG free
Git

Murasaki is a scalable, fast, language theory-based homology detection tool across multiple large genomes. It enables whole-genome-scale global alignment of multiple genomes and supports gapped-seed patterns of unlimited length as well as unique TF-IDF-based filtering.

Murasaki is anchor alignment software that is:

  • extremely fast (17 CPU hours for the whole human x mouse genome comparison; 52 wall-clock minutes with 40 nodes)
  • scalable (arbitrarily parallelizable across multiple nodes using MPI; even a single node with 16 GB of RAM can handle over 1 Gbp of sequence)
  • unlimited pattern length
  • repeat tolerant
  • intelligent noise reduction

This package provides the MPI-enabled binary for murasaki. It speeds up operation on multi-processor machines but slows it down on a single processor.

Please cite: Kris Popendorf, Hachiya Tsuyoshi, Yasunori Osana and Yasubumi Sakakibara: Murasaki: A Fast, Parallelizable Algorithm to Find Anchors from Multiple Genomes. (PubMed,eprint) PLoS ONE 5(9):e12651 (2010)
Registry entries: OMICtools 
Nanofilt
filtering and trimming of long read sequencing data
Versions of package nanofilt
ReleaseVersionArchitectures
sid2.6.0-2amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bullseye2.6.0-2amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: users ( upd.)*
Versions and Archs
License: DFSG free
Git

NanoFilt filters and trims long-read sequencing data. It filters on quality and/or read length and optionally trims reads that pass the filters. It reads from stdin and writes to stdout, or optionally reads directly from an uncompressed file specified on the command line.

Intended to be used:

 1. directly after FASTQ extraction.
 2. prior to mapping.
 3. in a stream between extraction and mapping.
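A stdin-to-stdout filter of the kind described above can be sketched in a few lines. This is not NanoFilt's code or command-line interface: the function names and the illustrative thresholds are hypothetical, quality strings are assumed to be Phred+33, and the mean quality is assumed to be computed by averaging per-base error probabilities and converting back to a Phred score.

```python
import math
import sys


def mean_quality(qual, offset=33):
    """Average Phred score, computed by averaging per-base error probabilities."""
    if not qual:
        return 0.0
    probs = [10 ** (-(ord(c) - offset) / 10) for c in qual]
    return -10 * math.log10(sum(probs) / len(probs))


def filter_fastq(lines, min_length=0, min_quality=0.0):
    """Yield 4-line FASTQ records that pass the length and quality filters."""
    lines = iter(lines)
    for header in lines:
        try:
            seq, plus, qual = next(lines), next(lines), next(lines)
        except StopIteration:
            raise ValueError("truncated FASTQ record") from None
        if len(seq.rstrip("\n")) >= min_length and mean_quality(qual.rstrip("\n")) >= min_quality:
            yield [header, seq, plus, qual]


def main(min_length=500, min_quality=7.0):
    """Stream FASTQ from stdin to stdout (thresholds here are only illustrative)."""
    for record in filter_fastq(sys.stdin, min_length, min_quality):
        sys.stdout.writelines(record)
```

Calling `main()` from a script lets it sit in a pipe between extraction and mapping, matching the third use case above.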
Please cite: Wouter De Coster, Svenn D'Hert, Darrin T. Schultz, Christine Van Broeckhoven: NanoPack: visualizing and processing long-read sequencing data. Bioinformatics 34 (2018)
Registry entries: Bioconda 
Nanook
pre- and post-alignment analysis of nanopore sequencing data
Versions of package nanook
ReleaseVersionArchitectures
bullseye1.33+dfsg-1all
sid1.33+dfsg-1all
buster1.33+dfsg-1all
Popcon: 1 users (3 upd.)*
Versions and Archs
License: DFSG free
Git

NanoOK is a flexible, multi-reference software tool for pre- and post-alignment analysis of nanopore sequencing data, quality and error profiles.

NanoOK (pronounced na-nook) is a tool for extraction, alignment and analysis of Nanopore reads. NanoOK will extract reads as FASTA or FASTQ files, align them (with a choice of alignment tools), then generate a comprehensive multi-page PDF report containing yield, accuracy and quality analysis. Along the way, it generates plain text files which can be used for further analysis, as well as graphs suitable for inclusion in presentations and papers.

Please cite: Richard M. Leggett, Darren Heavens, Mario Caccamo, Matthew D. Clark and Robert P. Davey: NanoOK: multi-reference alignment analysis of nanopore sequencing data, quality and error profiles. (PubMed,eprint) Bioinformatics 32(1):142-144 (2016)
Registry entries: Bio.tools  OMICtools 
Nanopolish
consensus caller for nanopore sequencing data
Versions of package nanopolish
ReleaseVersionArchitectures
sid0.13.2-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,s390x
stretch0.5.0-1amd64,arm64,armel,i386,mips64el,mipsel,ppc64el
buster0.11.0-2amd64
bullseye0.13.2-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,s390x
Popcon: 3 users (3 upd.)*
Versions and Archs
License: DFSG free
Git

Nanopolish uses a signal-level hidden Markov model for consensus calling of nanopore genome sequencing data. It can perform signal-level analysis of Oxford Nanopore sequencing data. Nanopolish can calculate an improved consensus sequence for a draft genome assembly, detect base modifications, call SNPs and indels with respect to a reference genome and more.

Registry entries: Bio.tools  SciCrunch  OMICtools  Bioconda 
Nanosv
structural variant caller for nanopore data
Versions of package nanosv
ReleaseVersionArchitectures
bullseye1.2.4+git20190409.c1ae30c-2all
sid1.2.4+git20190409.c1ae30c-2all
Popcon: 0 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

NanoSV is a software package that can be used to identify structural genomic variations in long-read sequencing data, such as data produced by Oxford Nanopore Technologies’ MinION, GridION or PromethION instruments, or Pacific Biosciences RSII or Sequel sequencers. NanoSV has been extensively tested using Oxford Nanopore MinION sequencing data.

Please cite: Mircea Cretu Stancu, Markus J. van Roosmalen, Ivo Renkens, Marleen M. Nieboer, Sjors Middelkamp, Joep de Ligt, Giulia Pregno, Daniela Giachino, Giorgia Mandrile, Jose Espejo Valle-Inclan, Jerome Korzelius, Ewart de Bruijn, Edwin Cuppen, Michael E. Talkowski, Tobias Marschall, Jeroen de Ridder and Wigard P. Kloosterman: Mapping and phasing of structural variation in patient genomes using nanopore sequencing. (eprint) Nature Communications 8:1326 (2017)
Registry entries: OMICtools  Bioconda 
Ncbi-acc-download
download genome files from NCBI by accession
Versions of package ncbi-acc-download
ReleaseVersionArchitectures
bullseye0.2.6-2all
sid0.2.6-2all
Popcon: 0 users (5 upd.)*
Versions and Archs
License: DFSG free
Git

This package provides a script to download sequences from GenBank/RefSeq by accession through the NCBI Entrez API.
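For illustration, a download by accession through NCBI's E-utilities comes down to an `efetch` request. The sketch below only builds the URL; it uses the public E-utilities `efetch.fcgi` endpoint and its `db`, `id`, `rettype` and `retmode` parameters, and does not claim to mirror how ncbi-acc-download itself issues requests. The helper name is hypothetical.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, fmt="fasta", db="nuccore"):
    """Build an E-utilities efetch URL that returns one record as plain text."""
    params = {"db": db, "id": accession, "rettype": fmt, "retmode": "text"}
    return f"{EFETCH}?{urlencode(params)}"
```

Passing the result to `urllib.request.urlopen` fetches the record; `fmt="gb"` requests a GenBank flat file instead of FASTA.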

Registry entries: Bioconda 
Ncbi-entrez-direct
NCBI Entrez utilities on the command line
Versions of package ncbi-entrez-direct
ReleaseVersionArchitectures
buster10.9.20190219+ds-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
bullseye12.0.20190816+ds-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid13.7.20200615+dfsg-2amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
stretch6.10.20170123+ds-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
Popcon: 6 users (6 upd.)*
Versions and Archs
License: DFSG free
Git

Entrez Direct (EDirect) is an advanced method for accessing NCBI's set of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a terminal window or script. Functions take search terms from command-line arguments. Individual operations are combined to build multi-step queries. Record retrieval and formatting normally complete the process.

EDirect also provides an argument-driven function that simplifies the extraction of data from document summaries or other results that are returned in structured XML format. This can eliminate the need for writing custom software to answer ad hoc questions. Queries can move seamlessly between EDirect commands and UNIX utilities or scripts to perform actions that cannot be accomplished entirely within Entrez.

Ncbi-seg
tool to mask segments of low compositional complexity in amino acid sequences
Versions of package ncbi-seg
ReleaseVersionArchitectures
jessie0.0.20000620-2amd64,armel,armhf,i386
bullseye0.0.20000620-5amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid0.0.20000620-5amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
stretch0.0.20000620-3amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
buster0.0.20000620-5amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
Popcon: 4 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

ncbi-seg (a.k.a. SEG) is a program for identifying and masking segments of low compositional complexity in amino acid sequences.

ncbi-seg divides sequences into contrasting segments of low-complexity and high-complexity. Low-complexity segments defined by the algorithm represent "simple sequences" or "compositionally-biased regions".

This program is inappropriate for masking nucleotide sequences and may, in fact, strip some nucleotide ambiguity codes from nucleotide sequences as they are being read.
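The notion of low compositional complexity can be sketched with a sliding-window Shannon entropy. This is a deliberately simplified stand-in, not the SEG algorithm: SEG's two-stage trigger/extension procedure and its optimisation of segment boundaries are not reproduced, and the window size, cutoff and mask character below are illustrative assumptions.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Shannon entropy (bits) of the residue composition of a window."""
    n = len(window)
    counts = Counter(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mask_low_complexity(seq, window=12, cutoff=2.2, mask_char="x"):
    """Mask every residue covered by a window whose entropy falls below cutoff."""
    masked = [False] * len(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < cutoff:
            for i in range(start, start + window):
                masked[i] = True
    return "".join(mask_char if m else aa for aa, m in zip(seq, masked))
```

A homopolymer run scores zero bits and is masked, while a window of twelve distinct residues (about 3.6 bits) is left untouched.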

Please cite: John C. Wootton and Scott Federhen: Statistics of local complexity in amino acid sequences and sequence databases. Computers & Chemistry 17:149-163 (1993)
Obitools
programs to analyze NGS data in a DNA metabarcoding context
Versions of package obitools
ReleaseVersionArchitectures
buster1.2.12+dfsg-2amd64
sid1.2.13+dfsg-3amd64
bullseye1.2.13+dfsg-3amd64
Popcon: 6 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

The OBITools programs aim to help you manipulate various data and sequence files conveniently using the Unix command-line interface. They follow the standard Unix conventions for command-line programs, allowing a set of commands to be chained using the pipe mechanism.

Please cite: Frédéric Boyer, Céline Mercier, Aurélie Bonin, Yvan Le Bras, Pierre Taberlet and Eric Coissac: obitools: a unix-inspired software package for DNA metabarcoding. (PubMed,eprint) Mol. Ecol. Resour. 16(1):176-182 (2016)
Registry entries: Bio.tools  OMICtools  Bioconda 
Optimir
Integrating genetic variations in miRNA alignment
Versions of package optimir
ReleaseVersionArchitectures
sid1.0-2all
bullseye1.0-2all
Popcon: 1 users (0 upd.)*
Versions and Archs
License: DFSG free
Git

OptimiR is a miRSeq data alignment workflow. It integrates genetic information to assess the impact of variants on miRNA expression.

OptimiR is a bioinformatics pipeline designed to detect and quantify miRNAs, isomiRs and polymiRs from miRSeq data, and to study the impact of genetic variations on polymiRs' expression.

Please cite: Florian Thibord, Claire Perret, Maguelonne Roux, Pierre Suchon, Marine Germain, Jean-François Deleuze, Pierre-Emmanuel Morange and David-Alexandre Trégouët: OPTIMIR, a novel algorithm for integrating available genome-wide genotype data into miRNA sequence alignment analysis. RNA (2019)
Registry entries: Bio.tools  OMICtools  Bioconda 
Pal2nal
converts proteins to genomic DNA alignment
Versions of package pal2nal
ReleaseVersionArchitectures
bullseye14.1-2all
sid14.1-2all
buster14.1-2all
Popcon: 4 users (5 upd.)*
Versions and Archs
License: DFSG free
Git

PAL2NAL is a program that converts a multiple sequence alignment of proteins and the corresponding DNA (or mRNA) sequences into a codon-based DNA alignment. The program automatically assigns the corresponding codon sequence even if the input DNA sequence has mismatches with the input protein sequence, or contains UTRs or polyA tails. It can also deal with frame shifts in the input alignment, which makes it suitable for the analysis of pseudogenes. The resulting codon-based DNA alignment can then be used to calculate synonymous (Ks) and non-synonymous (Ka) substitution rates.
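The core of the conversion is threading each gapped protein row onto its coding sequence, three bases per residue and `---` per gap. This sketch assumes the DNA encodes the ungapped protein exactly (no UTRs, mismatches or frame shifts), which is precisely the simple case PAL2NAL goes beyond; the function name is hypothetical.

```python
def protein_to_codon_alignment(aligned_protein, dna):
    """Thread a gapped protein sequence onto its coding DNA.

    Assumes `dna` starts at the first codon and encodes the ungapped protein
    exactly, three bases per residue; any trailing bases (e.g. a stop codon)
    are ignored.
    """
    residues = aligned_protein.replace("-", "")
    if len(dna) < 3 * len(residues):
        raise ValueError("DNA is too short for the protein sequence")
    codons = []
    pos = 0
    for aa in aligned_protein:
        if aa == "-":
            codons.append("---")  # an alignment gap becomes a codon-sized gap
        else:
            codons.append(dna[pos:pos + 3])
            pos += 3
    return "".join(codons)
```

Applying it to every row of a protein alignment keeps codons in register across sequences, which is what Ka/Ks calculations require.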

Please cite: Mikita Suyama, David Torrents and Peer Bork: PAL2NAL: robust conversion of protein sequence alignment into the corresponding codon alignments. (PubMed,eprint) Nucleic Acids Research 34:W609-W612 (2006)
Registry entries: Bio.tools  OMICtools  Bioconda 
Paleomix
pipelines and tools for the processing of ancient and modern HTS data
Versions of package paleomix
ReleaseVersionArchitectures
buster1.2.13.3-1amd64
sid1.2.14-1amd64
Popcon: 3 users (0 upd.)*
Versions and Archs
License: DFSG free
Git

The PALEOMIX pipelines are a set of pipelines and tools designed to aid the rapid processing of High-Throughput Sequencing (HTS) data: The BAM pipeline processes de-multiplexed reads from one or more samples, through sequence processing and alignment, to generate BAM alignment files useful in downstream analyses; the Phylogenetic pipeline carries out genotyping and phylogenetic inference on BAM alignment files, either produced using the BAM pipeline or generated elsewhere; and the Zonkey pipeline carries out a suite of analyses on low coverage equine alignments, in order to detect the presence of F1-hybrids in archaeological assemblages. In addition, PALEOMIX aids in metagenomic analysis of the extracts.

The pipelines have been designed with ancient DNA (aDNA) in mind and include several features especially useful for the analysis of ancient samples, but they can all be used for processing modern samples, ensuring consistent data processing.

Please cite: Mikkel Schubert, Luca Ermini, Clio Der Sarkissian, Hákon Jónsson, Aurélien Ginolhac, Robert Schaefer, Michael D Martin, Ruth Fernández, Martin Kircher, Molly McCue, Eske Willerslev and Ludovic Orlando: Characterization of ancient and modern genomes by SNP detection and phylogenomic and metagenomic analysis using PALEOMIX. (PubMed) Nature Protocols 9(5):1056-82 (2014)
Registry entries: Bio.tools  SciCrunch  OMICtools 
Paraclu
Parametric clustering of genomic and transcriptomic features
Versions of package paraclu
ReleaseVersionArchitectures
buster9-2amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
jessie9-1amd64,armel,armhf,i386
bullseye9-2amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid9-2amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
stretch9-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
Popcon: 6 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

Paraclu finds clusters in data attached to sequences. It was first applied to transcription start counts in genome sequences, but it could be applied to other things too.

Paraclu is intended to explore the data, imposing minimal prior assumptions, and letting the data speak for itself.

One consequence of this is that paraclu can find clusters within clusters. Real data sometimes exhibits clustering at multiple scales: there may be large, rarefied clusters; and within each large cluster there may be several small, dense clusters.

Please cite: Martin C. Frith, Eivind Valen, Anders Krogh, Yoshihide Hayashizaki, Piero Carninci and Albin Sandelin: A code for transcription initiation in mammalian genomes. (eprint) Genome Research 18(1):1-12 (2008)
Registry entries: OMICtools 
Parasail
Aligner based on libparasail3
Versions of package parasail
ReleaseVersionArchitectures
sid2.4.2+dfsg-2amd64,arm64,armel,armhf,i386,ppc64el
bullseye2.4.2+dfsg-2amd64,arm64,armel,armhf,i386,ppc64el
Popcon: 0 users (0 upd.)*
Versions and Archs
License: DFSG free
Git

This package contains a command-line aligner based on libparasail3. Parasail is a SIMD C library containing implementations of the Smith-Waterman, Needleman-Wunsch, and various semi-global pairwise sequence alignment algorithms.
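As a reference point for what parasail accelerates with SIMD, here is a plain scalar Smith-Waterman score computation. It is a minimal sketch with a linear gap penalty and illustrative scores; parasail's own routines use affine gap penalties and substitution matrices, and this is not its API.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score with a linear gap penalty, O(len(a) * len(b))."""
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

The clamp at zero is what makes the alignment local; removing it and initialising the first row and column with gap penalties would give Needleman-Wunsch instead.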

Parsinsert
Parsimonious Insertion of unclassified sequences into phylogenetic trees
Versions of package parsinsert
ReleaseVersionArchitectures
stretch1.04-2amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
bullseye1.04-7amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
jessie1.04-1amd64,armel,armhf,i386
sid1.04-7amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster1.04-4amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
Popcon: 3 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

ParsInsert efficiently produces both a phylogenetic tree and a taxonomic classification of sequences for microbial community sequence analysis. This is a C++ implementation of the Parsimonious Insertion algorithm.

The package is enhanced by the following packages: parsinsert-testdata
Registry entries: OMICtools 
Parsnp
rapid core genome multi-alignment
Versions of package parsnp
ReleaseVersionArchitectures
stretch1.2+dfsg-3amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
sid1.2.1+dfsg-2amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster1.2+dfsg-5amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
bullseye1.2.1+dfsg-2amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
upstream1.5.1
Popcon: 2 users (4 upd.)*
Newer upstream!
License: DFSG free
Git

Parsnp was designed to align the core genome of hundreds to thousands of bacterial genomes within a few minutes to a few hours. Input can be draft assemblies or finished genomes, and output includes variant (SNP) calls, core genome phylogeny and multi-alignments. Parsnp leverages contextual information provided by multi-alignments surrounding SNP sites for filtration/cleaning, in addition to existing tools for recombination detection/filtration and phylogenetic reconstruction.

Please cite: Todd J. Treangen, Brian D. Ondov, Sergey Koren and Adam M. Phillippy: The Harvest suite for rapid core-genome alignment and visualization of thousands of intraspecific microbial genomes. (PubMed,eprint) Genome Biology 15(11):524 (2014)
Registry entries: OMICtools  Bioconda 
Patman
rapid alignment of short sequences to large databases
Versions of package patman
ReleaseVersionArchitectures
buster1.2.2+dfsg-5amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
bullseye1.2.2+dfsg-6amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid1.2.2+dfsg-6amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 2 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

Patman searches for short patterns in large DNA databases, allowing for approximate matches. It is optimized for searching for many short patterns at the same time, for example microarray probes.
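To make the kind of result concrete, the naive scan below reports every ungapped placement of each pattern with at most a given number of mismatches. It only illustrates what "approximate match" means here: Patman's own search is far more efficient and can also allow gaps. The function name is hypothetical.

```python
def approximate_matches(text, patterns, max_mismatches=1):
    """Return (pattern, position, mismatches) for every ungapped hit within the limit."""
    hits = []
    for pattern in patterns:
        m = len(pattern)
        for pos in range(len(text) - m + 1):
            mismatches = 0
            for x, y in zip(text[pos:pos + m], pattern):
                if x != y:
                    mismatches += 1
                    if mismatches > max_mismatches:
                        break  # stop early once the limit is exceeded
            if mismatches <= max_mismatches:
                hits.append((pattern, pos, mismatches))
    return hits
```

This brute-force approach costs time proportional to text length times total pattern length, which is the cost a dedicated tool avoids when searching many probes at once.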

Please cite: Kay Prüfer, Udo Stenzel, Michael Dannemann, Richard E Green, Michael Lachmann and Janet Kelso: PatMaN: rapid alignment of short sequences to large databases. (PubMed,eprint) Bioinformatics 24(13):1530-1 (2008)
Registry entries: Bio.tools  SciCrunch  OMICtools 
Pbdagcon
sequence consensus using directed acyclic graphs
Versions of package pbdagcon
ReleaseVersionArchitectures
buster0.3+git20161121.0000000+ds-1.1amd64,arm64,mips64el,ppc64el
stretch0.3+20161121+ds-1amd64,arm64,mips64el,ppc64el
sid0.3+git20180411.c14c422+dfsg-1amd64,arm64,mips64el,ppc64el
bullseye0.3+git20180411.c14c422+dfsg-1amd64,arm64,mips64el,ppc64el
Popcon: 1 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

pbdagcon is a tool that implements DAGCon (Directed Acyclic Graph Consensus) which is a sequence consensus algorithm based on using directed acyclic graphs to encode multiple sequence alignment.

It uses alignment information from blasr to align sequence reads to a "backbone" sequence. Based on the underlying alignment directed acyclic graph (DAG), it uses the new information from the reads to find the discrepancies between the reads and the "backbone" sequence. A dynamic programming process is then applied to the DAG to find the optimum sequence of bases as the consensus. The new consensus can be used as a new backbone sequence to iteratively improve the consensus quality.

While the code was developed for processing PacBio(TM) raw sequence data, the algorithm can be used for general consensus purposes. Currently, it only accepts FASTA input. For shorter read sequences, the blasr alignment parameters may need adjusting to obtain the alignment string properly.

The code and the underlying graphical data structure have been used for some algorithm development prototyping including phasing reads and pre-assembly.

Registry entries: OMICtools  Bioconda 
Pbgenomicconsensus
Pacific Biosciences variant and consensus caller
Versions of package pbgenomicconsensus
ReleaseVersionArchitectures
buster2.3.2-5all
sid2.3.2-5all
stretch2.1.0-1all
upstream2.3.3
Popcon: 2 users (0 upd.)*
Newer upstream!
License: DFSG free
Git

The GenomicConsensus package provides Quiver, Pacific Biosciences' flagship consensus and variant caller. Quiver is an algorithm that finds the maximum likelihood template sequence given PacBio reads of the template. These reads are modeled using a conditional random field approach that prescribes a probability to a read given a template sequence. In addition to the base sequence of each read, Quiver uses several additional quality value covariates that the base caller provides.

This package is part of the SMRTAnalysis suite.

Registry entries: OMICtools 
Pbhoney
genomic structural variation discovery
Versions of package pbhoney
ReleaseVersionArchitectures
stretch15.8.24+dfsg-2all
sid15.8.24+dfsg-5all
buster15.8.24+dfsg-3all
Popcon: 0 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

PBHoney is an implementation of two variant-identification approaches designed to exploit the high mappability of long reads (i.e., greater than 10,000 bp). PBHoney considers both intra-read discordance and soft-clipped tails of long reads to identify structural variants.
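The soft-clipped-tail signal mentioned above can be read straight from a read's CIGAR string. This sketch only flags reads whose leading or trailing soft clip exceeds a hypothetical minimum length; PBHoney's further processing of those tails is not reproduced.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_tails(cigar, min_clip=100):
    """Return (left, right) soft-clip lengths if either reaches min_clip, else None."""
    # Hard clips are dropped so that a soft clip next to them counts as the tail.
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    if not ops:
        return None
    left = ops[0][0] if ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    if max(left, right) >= min_clip:
        return left, right
    return None
```

With long reads, a clipped tail of several hundred bases is often the part of the read that spans a breakpoint, which is why it is a useful structural-variant signal.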

PBHoney is part of the PBSuite.

Registry entries: OMICtools 
Pbjelly
genome assembly upgrading tool
Versions of package pbjelly
ReleaseVersionArchitectures
buster15.8.24+dfsg-3all
stretch15.8.24+dfsg-2all
sid15.8.24+dfsg-5all
Popcon: 0 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

PBJelly is a highly automated pipeline that aligns long sequencing reads (such as PacBio RS reads or long 454 reads in FASTA format) to high-confidence draft assemblies. PBJelly fills or reduces as many captured gaps as possible to produce upgraded draft genomes.

PBJelly is part of the PBSuite.

Registry entries: OMICtools 
Pbsim
simulator for PacBio sequencing reads
Versions of package pbsim
ReleaseVersionArchitectures
stretch1.0.3-2amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
buster1.0.3+git20180330.e014b1d+dfsg-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
bullseye1.0.3+git20180330.e014b1d+dfsg-2amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid1.0.3+git20180330.e014b1d+dfsg-2amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 2 users (4 upd.)*
Versions and Archs
License: DFSG free
Git

PacBio DNA sequencers produce two types of characteristic reads: CCS (short and low error rate) and CLR (long and high error rate), both of which could be useful for de novo assembly of genomes. PBSIM simulates those PacBio reads from a reference sequence by using either a model-based or sampling-based simulation. Simulated reads are useful, for example, when developing or evaluating sequence assemblers targeted at PacBio data.
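A toy version of read simulation helps show what such a simulator does: sample a fragment of the reference and inject substitutions, insertions and deletions. This sketch uses a uniform, position-independent error model with arbitrary default rates; it is not PBSIM's model-based or sampling-based simulation, and the function name is hypothetical.

```python
import random


def simulate_read(reference, length, sub_rate=0.01, ins_rate=0.01, del_rate=0.01, rng=None):
    """Return (start, read) sampled from `reference` with uniform random errors."""
    if rng is None:
        rng = random.Random()
    if length > len(reference):
        raise ValueError("read length exceeds reference length")
    start = rng.randrange(len(reference) - length + 1)
    out = []
    for base in reference[start:start + length]:
        r = rng.random()
        if r < del_rate:
            continue  # deletion: drop this base
        if r < del_rate + ins_rate:
            out.append(rng.choice("ACGT"))  # insertion before this base
            out.append(base)
        elif r < del_rate + ins_rate + sub_rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))  # substitution
        else:
            out.append(base)
    return start, "".join(out)
```

Passing a seeded `random.Random` makes runs reproducible, which matters when the simulated reads are used to benchmark assemblers.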

Please cite: Yukiteru Ono, Kiyoshi Asai and Michiaki Hamada: PBSIM: PacBio reads simulator - toward accurate genome assembly. (PubMed,eprint) Bioinformatics 29(1):119-121 (2013)
Registry entries: Bio.tools  SciCrunch  OMICtools  Bioconda 
Topics: Sequence analysis
Pbsuite
software for Pacific Biosciences sequencing data
Versions of package pbsuite
ReleaseVersionArchitectures
sid15.8.24+dfsg-5all
buster15.8.24+dfsg-3all
stretch15.8.24+dfsg-2all
Popcon: 0 users (0 upd.)*
Versions and Archs
License: DFSG free
Git

The PBSuite contains two projects created for analysis of Pacific Biosciences long-read sequencing data.

  • PBJelly - genome upgrading tool
  • PBHoney - structural variation discovery
Registry entries: OMICtools 
Perm
efficient mapping of short reads with periodic spaced seeds
Versions of package perm
ReleaseVersionArchitectures
sid0.4.0-4amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bullseye0.4.0-4amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster0.4.0-4amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
stretch0.4.0-2amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
jessie0.4.0-1amd64,armel,armhf,i386
Popcon: 3 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

PerM is a software package designed to perform highly efficient genome-scale alignments for hundreds of millions of short reads produced by the ABI SOLiD and Illumina sequencing platforms. PerM provides full sensitivity for alignments with up to 4 mismatches for 50 bp SOLiD reads and up to 9 mismatches for 100 bp Illumina reads.

Please cite: Yangho Chen, Tade Souaiaia and Ting Chen: PerM: efficient mapping of short sequencing reads with periodic full sensitive spaced seeds. (PubMed,eprint) Bioinformatics 25(19):2514-21 (2009)
Registry entries: Bio.tools  SciCrunch  OMICtools 
Pftools
build and search protein and DNA generalized profiles
Versions of package pftools
ReleaseVersionArchitectures
bullseye3+dfsg-3amd64
sid3+dfsg-3amd64
buster3+dfsg-3amd64
Popcon: 6 users (5 upd.)*
Versions and Archs
License: DFSG free
Git

The pftools package contains all the software necessary to build protein and DNA generalized profiles and use them to scan and align sequences, and search databases.

File formats used by the pftools

  • Generalized profiles format and syntax.
  • The multiple sequence alignment format (PSA).
  • The extended header multiple sequence alignment format (XPSA).

Programs to build generalized profiles

 pfmake
   Build a profile from a multiple sequence alignment.
 pfscale
   Fit parameters of an extreme-value distribution to a profile score list.
 pfw
   Weight sequences of a multiple sequence alignment to correct for
   sampling bias.

Programs to search with generalized profiles

 pfsearch / pfsearchV3
   Search a protein or DNA sequence library for sequence segments matching
   a profile (V3 is the new version of this tool).
 pfscan
   Scan a protein or DNA sequence with a profile library.

Conversion programs

 psa2msa
   Reformat PSA file to Pearson/Fasta multiple sequence alignment file.
 ptof
   Convert a protein profile into a frame-search profile to search DNA
   sequences. To be used with 2ft.
 2ft
   Converts both strands of DNA into so-called interleaved
   frame-translated DNA sequences to search with protein profiles. To be
   used with ptof.
 6ft
   Translates all six reading frames of a double-stranded DNA sequence
   into individual protein sequences.
 pfgtop
   Convert a profile in GCG format into PROSITE format.
 pfhtop
   Convert a HMMER1 ASCII-formatted HMM into an equivalent PROSITE profile.
 ptoh
   Converts a generalized profile into an approximately equivalent HMM
   profile in HMMER1 format (can be read by the hmmconvert program from
   the HMMER2 package).
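The six-frame translation performed by 6ft can be sketched with the standard genetic code (NCBI translation table 1). The frame labels and the function names are assumptions for illustration, and 6ft's actual output format is not reproduced; codons containing ambiguity codes are translated as `X`.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated in TCAG order for each position.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate_codons(seq):
    """Translate complete codons; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translation(dna):
    """Return the three forward and three reverse-strand translations."""
    dna = dna.upper()
    rev = dna.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate_codons(dna[offset:])
        frames[f"-{offset + 1}"] = translate_codons(rev[offset:])
    return frames
```

Stop codons appear as `*`, so each frame's output can be split on `*` to obtain candidate open reading frames.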
Please cite: Christian J. A. Sigrist, Lorenzo Cerutti, Nicolas Hulo, Alexandre Gattiker, Laurent Falquet, Marco Pagni, Amos Bairoch and Philipp Bucher: PROSITE: a documented database using patterns and profiles as motif descriptors. (PubMed,eprint) Briefings in Bioinformatics 3(3):265-74 (2002)
Registry entries: Bio.tools  OMICtools 
Phast
phylogenetic analysis with space/time models
Versions of package phast
ReleaseVersionArchitectures
sid1.5+dfsg-1amd64,i386,mips64el,mipsel
bullseye1.5+dfsg-1amd64,i386,mips64el,mipsel
buster1.4+dfsg-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
Popcon: 2 users (3 upd.)*
Versions and Archs
License: DFSG free
Git

PHAST is a software package for comparative and evolutionary genomics. It consists of about half a dozen major programs, plus more than a dozen utilities for manipulating sequence alignments, phylogenetic trees, and genomic annotations. For the most part, PHAST focuses on two kinds of applications: the identification of novel functional elements, including protein-coding exons and evolutionarily conserved sequences; and statistical phylogenetic modeling, including estimation of model parameters, detection of signatures of selection, and reconstruction of ancestral sequences.

PHAST does not support phylogeny reconstruction or sequence alignment, and it is designed for use with DNA sequences only.

Please cite: Melissa J. Hubisz, Katherine S. Pollard and Adam Siepel: PHAST and RPHAST: phylogenetic analysis with space/time models. (PubMed,eprint) Briefings in Bioinformatics 12(1):41-51 (2011)
Registry entries: SciCrunch  OMICtools  Bioconda 
Phipack
PHI test and other tests of recombination
Versions of package phipack
ReleaseVersionArchitectures
sid0.0.20160614-3amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
stretch0.0.20160614-2amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
buster0.0.20160614-3amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
bullseye0.0.20160614-3amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 1 users (2 upd.)*
Versions and Archs
License: DFSG free
Git

The PhiPack software package implements several tests for recombination and can also produce refined incompatibility matrices. Specifically, PhiPack implements the 'Pairwise Homoplasy Index', Maximum Chi2 and the 'Neighbour Similarity Score'. The program Phi can be run to produce a p-value for recombination within a data set, and the program profile can be run to determine the regions exhibiting the strongest evidence of mosaicism.
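Pairwise incompatibility between sites underlies these tests. The sketch below builds the simple binary version of an incompatibility matrix using the four-gamete test on biallelic columns; PhiPack's refined incompatibility scores, gap handling and multi-allelic sites are not reproduced, and the function names are hypothetical.

```python
from itertools import combinations


def incompatible(site_a, site_b):
    """Four-gamete test: two biallelic sites conflict if all four allele pairs occur."""
    if len(set(site_a)) != 2 or len(set(site_b)) != 2:
        return False  # only biallelic sites are tested in this sketch
    return len(set(zip(site_a, site_b))) == 4


def incompatibility_matrix(alignment):
    """Symmetric 0/1 matrix over alignment columns (alignment = list of equal-length rows)."""
    columns = list(zip(*alignment))
    n = len(columns)
    matrix = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if incompatible(columns[i], columns[j]):
            matrix[i][j] = matrix[j][i] = 1
    return matrix
```

Without recurrent mutation, an incompatible pair of sites cannot arise on a single tree, so clusters of nearby incompatibilities hint at recombination.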

Please cite: Trevor C. Bruen, Hervé Philippe and David Bryant: A Simple and Robust Statistical Test for Detecting the Presence of Recombination. (PubMed,eprint) Genetics 172(4):2665-2681 (2006)
Registry entries: OMICtools  Bioconda 
Phybin
binning/clustering newick trees by topology
Versions of package phybin
ReleaseVersionArchitectures
sid0.3-3amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bullseye0.3-3amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
stretch0.3-1amd64,arm64,armel,armhf,i386,mips64el,ppc64el,s390x
buster0.3-3amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
Popcon: 2 users (3 upd.)*
Versions and Archs
License: DFSG free
Git

PhyBin is a simple command line tool that classifies a set of Newick tree files by their topology. The purpose of it is to take a large set of tree files and browse through the most common tree topologies.

It can do simple binning of identical trees or more complex clustering based on an all-to-all Robinson-Foulds distance matrix.

phybin produces output files that characterize the size and contents of each bin or cluster (including generating GraphViz-based visual representations of the tree topologies).
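The all-to-all clustering above relies on the Robinson-Foulds distance, the number of bipartitions found in one tree but not the other. In this sketch, trees are written as nested Python tuples of leaf names rather than parsed from Newick (a hypothetical input format chosen for brevity), both trees are assumed to share the same leaf set, and splits are compared as unrooted bipartitions.

```python
def clusters(tree, out):
    """Append the leaf set below every internal node to `out`; return this node's leaves."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(clusters(child, out) for child in tree))
    out.append(leaves)
    return leaves


def splits(tree):
    """Non-trivial unrooted bipartitions, each stored as the side without a reference leaf."""
    found = []
    all_leaves = clusters(tree, found)
    ref = min(all_leaves)
    n = len(all_leaves)
    result = set()
    for side in found:
        if ref in side:
            side = all_leaves - side
        if 2 <= len(side) <= n - 2:
            result.add(side)
    return result


def robinson_foulds(t1, t2):
    """Number of bipartitions present in exactly one of the two trees."""
    return len(splits(t1) ^ splits(t2))
```

Trees with a distance of zero share a topology and would fall into the same bin; larger distances drive the clustering.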

Please cite: Ryan R. Newton and Irene L.G. Newton: PhyBin: binning trees by topology. (PubMed,eprint) PeerJ 1:e187 (2013)
Registry entries: OMICtools 
Physamp
sample sequence alignment corresponding to phylogeny
Versions of package physamp
ReleaseVersionArchitectures
bullseye1.1.0-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid1.1.0-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster1.1.0-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
stretch0.2.0-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
Popcon: 2 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

The PhySamp package currently contains two programs: bppphysamp, which samples sequences according to their similarity, and bppalnoptim, which samples a sequence alignment by removing sequences in order to maximize the number of sites suitable for a given analysis. The bppalnoptim program has three running modes:

  • Interactive: the user is iteratively offered a set of choices for sequence removal, with their corresponding site gains. The procedure stops when the user does not want to remove more sequences, and the resulting filtered alignment is written.
  • Automatic: the user enters an a priori criterion for stopping the filtering procedure (for instance a minimum number of sequences to keep).
  • Diagnostic: this mode plots the trade-off curve, showing the site gain as a function of the number of removed sequences.
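The trade-off between the number of sequences kept and the number of usable sites can be illustrated with a greedy loop. This is a minimal sketch, not bppalnoptim's actual algorithm. It rests on these assumptions:

- A "suitable" site is a gap-free column.
- At each step, the sequence whose removal frees the most such columns is dropped.
- The stopping rule is a minimum number of sequences, as in the automatic mode.
- The function names are hypothetical.

```python
from typing import List, Optional, Tuple


def complete_sites(alignment: List[str], gap: str = "-") -> int:
    """Count alignment columns that contain no gap in any sequence."""
    if not alignment:
        return 0
    return sum(1 for column in zip(*alignment) if gap not in column)


def greedy_filter(alignment: List[str], min_sequences: int) -> Tuple[List[int], List[int]]:
    """Greedily remove sequences to maximize gap-free columns.

    Returns the removed indices (in removal order) and the number of complete
    sites before and after each removal, i.e. a simple trade-off curve.
    """
    kept = list(range(len(alignment)))
    removed: List[int] = []
    curve = [complete_sites(alignment)]
    while len(kept) > min_sequences:
        best_index: Optional[int] = None
        best_sites = -1
        for i in kept:
            candidate = [alignment[j] for j in kept if j != i]
            sites = complete_sites(candidate)
            if sites > best_sites:  # first best wins on ties
                best_index, best_sites = i, sites
        kept.remove(best_index)
        removed.append(best_index)
        curve.append(best_sites)
    return removed, curve
```

Plotting `curve` against the number of removed sequences gives the kind of diagnostic view described above. A real tool would also weigh which sites matter for the downstream analysis.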
Please cite: Julien Y. Dutheil and Emeric Figuet: Optimization of sequence alignments according to the number of sequences vs. number of sites trade-off. (PubMed,eprint) BMC Bioinformatics 16:160 (2015)
Registry entries: OMICtools 
Phyutility
simple analyses or modifications on both phylogenetic trees and data matrices
Versions of package phyutility
ReleaseVersionArchitectures
jessie2.7.3-1amd64,armel,armhf,i386
sid2.7.3+dfsg-2amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bullseye2.7.3+dfsg-2amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster2.7.3+dfsg-2amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
stretch2.7.3-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
Popcon: 4 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

Phyutility (fyoo-til-i-te) is a command line program that performs simple analyses or modifications on both trees and data matrices.

Currently it performs the following functions (to suggest another feature, submit an Issue and use the label Type-Enhancement):

Trees

  • rerooting
  • pruning
  • type conversion
  • consensus
  • leaf stability
  • lineage movement
  • tree support

Data Matrices

  • concatenate alignments
  • GenBank parsing
  • trimming alignments
  • search NCBI
  • fetch NCBI
Please cite: Stephen A. Smith and Casey W. Dunn: Phyutility: a phyloinformatics utility for trees, alignments, and molecular data. (PubMed,eprint) Bioinformatics 24(5):715-716 (2008)
Registry entries: OMICtools 
Phyx
UNIX-style phylogenetic analyses on trees and sequences
Versions of package phyx
ReleaseVersionArchitectures
sid1.01+ds-2amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster0.999+ds-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
bullseye1.01+ds-2amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 2 users (3 upd.)*
Versions and Archs
License: DFSG free
Git

phyx provides a convenient, lightweight and inclusive toolkit of programs spanning the wide breadth of tasks performed by researchers doing phylogenomic analyses. Modeled after Unix/GNU/Linux command line tools, individual programs perform a single task and operate on standard I/O streams. A result of this stream-centric approach is that, for most programs, only a single sequence or tree is in memory at any moment. Thus, large datasets can be processed with minimal memory requirements. phyx’s ever-growing complement of programs consists of over 35 programs focused on exploring, manipulating, analyzing and simulating phylogenetic objects (alignments, trees and MCMC logs). As with standard Unix command line tools, these programs can be piped together (also with non-phyx tools), allowing the easy construction of efficient analytical pipelines.
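The memory argument above rests on reading one record at a time. The Python generator below is a minimal sketch of that streaming idea for FASTA input and is not taken from phyx. The function name is hypothetical, and lines before the first header are simply ignored.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs, holding only one record in memory."""
    name: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)  # emit the previous record
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line.strip())
    if name is not None:
        yield name, "".join(chunks)  # emit the final record
```

Used as a filter, `for name, seq in read_fasta(sys.stdin): print(name, len(seq))` reads from a pipe and writes to standard output. Memory use is bounded by the longest single sequence rather than by the whole file.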

Registry entries: OMICtools 
Picopore
lossless compression of Nanopore files
Versions of package picopore
ReleaseVersionArchitectures
sid1.2.0-1all
Popcon: 0 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

The Nanopore is a device that determines the sequences of single DNA molecules, without amplification. Its output is very large, and tools like this one help to reduce it.

Over time, other approaches have replaced the need for this tool, and upstream has halted development. Some Nanopore tutorials and pipelines still refer to it, though.

Registry entries: Bioconda 
Piler
genomic repeat analysis
Versions of package piler
ReleaseVersionArchitectures
buster0~20140707-2amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
stretch0~20140707-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
bullseye0~20140707-2amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid0~20140707-2amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 4 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

PILER (Parsimonious Inference of a Library of Elementary Repeats) searches a genome sequence for repetitive elements. It implements search algorithms that identify characteristic patterns of local alignments induced by certain classes of repeats.

Please cite: Robert C. Edgar and Eugene W. Myers: PILER: identification and classification of genomic repeats. (PubMed,eprint) Bioinformatics 21(suppl 1):i152-i158 (2005)
Registry entries: OMICtools  Bioconda 
Pilercr
software for finding CRISPR repeats
Versions of package pilercr
ReleaseVersionArchitectures
bullseye1.06+dfsg-2amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid1.06+dfsg-2amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 0 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

CRISPR elements are short, highly conserved repeats in prokaryotic genomes separated by unique sequences of similar length. PILERCR is designed for the identification and analysis of CRISPR repeats.

Please cite: R. C. Edgar: PILER-CR: fast and accurate identification of CRISPR repeats. (PubMed,eprint) BMC Bioinformatics 8:18 (2007)
Registry entries: OMICtools 
Pilon
automated genome assembly improvement and variant detection tool
Versions of package pilon
ReleaseVersionArchitectures
sid1.23+dfsg-2all
bullseye1.23+dfsg-2all
buster1.23+dfsg-1all
Popcon: 2 users (9 upd.)*
Versions and Archs
License: DFSG free
Git

Pilon is a software tool which can be used to:

  • Automatically improve draft assemblies
  • Find variation among strains, including large event detection

Pilon requires as input a FASTA file of the genome along with one or more BAM files of reads aligned to the input FASTA file. Pilon uses read alignment analysis to identify inconsistencies between the input genome and the evidence in the reads. It then attempts to make improvements to the input genome, including:

  • Single base differences
  • Small indels
  • Larger indel or block substitution events
  • Gap filling
  • Identification of local misassemblies, including optional opening of new gaps
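The simplest of these improvements, correcting single base differences from read evidence, can be sketched as a majority vote over a pileup. This is a toy illustration, not Pilon's method. It makes several assumptions:

- Reads are already aligned without gaps and given as (0-based start, sequence) pairs instead of BAM records.
- Only substitutions are handled.
- The depth and fraction thresholds, like the function name, are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def pileup_consensus(reference: str,
                     aligned_reads: List[Tuple[int, str]],
                     min_depth: int = 3,
                     min_fraction: float = 0.8) -> Tuple[str, List[Tuple[int, str, str]]]:
    """Replace reference bases that are contradicted by a clear read majority.

    Returns the corrected sequence and a list of (position, old, new) changes.
    """
    columns: Dict[int, Counter] = {}
    for start, bases in aligned_reads:
        for offset, base in enumerate(bases):
            pos = start + offset
            if 0 <= pos < len(reference):  # ignore overhang past the contig
                columns.setdefault(pos, Counter())[base] += 1

    corrected = list(reference)
    changes: List[Tuple[int, str, str]] = []
    for pos, counts in sorted(columns.items()):
        depth = sum(counts.values())
        base, n = counts.most_common(1)[0]
        if depth >= min_depth and n / depth >= min_fraction and base != reference[pos]:
            corrected[pos] = base
            changes.append((pos, reference[pos], base))
    return "".join(corrected), changes
```

Positions with too little coverage are left untouched. Requiring both a minimum depth and a strong majority is one simple way to avoid "correcting" sequencing errors into the assembly.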
Please cite: Bruce J. Walker, Thomas Abeel, Terrance Shea, Margaret Priest, Amr Abouelliel, Sharadha Sakthikumar, Christina A. Cuomo, Qiandong Zeng, Jennifer Wortman, Sarah K. Young and Ashlee M. Earl: Pilon: An Integrated Tool for Comprehensive Microbial Variant Detection and Genome Assembly Improvement. (PubMed,eprint) PLoS ONE 9(11):e112963 (2014)
Registry entries: Bio.tools  SciCrunch  OMICtools  Bioconda 
Pirs
Profile based Illumina pair-end Reads Simulator
Versions of package pirs
ReleaseVersionArchitectures
sid2.0.2+dfsg-8amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bullseye2.0.2+dfsg-8amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster2.0.2+dfsg-8amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
stretch2.0.2+dfsg-5.1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
Popcon: 2 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

The program pIRS simulates Illumina paired-end (PE) reads with a series of characteristics of the Illumina sequencing platform, such as insert size distribution, sequencing errors (substitution, insertion, deletion), quality scores and GC content-coverage bias.

The insert size follows a normal distribution, so users should set the mean value and standard deviation; usually the standard deviation is set to 1/20 of the mean value. The normal distribution is simulated with the Box-Muller method.
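The Box-Muller transform turns two independent uniform numbers into a standard normal deviate. The sketch below uses it to draw insert sizes with the 1/20 rule of thumb mentioned above. It is a Python illustration, not pIRS's implementation. The function names and the clamp to a minimum of 1 bp are assumptions.

```python
import math
import random
from typing import List, Optional


def box_muller(rng: random.Random) -> float:
    """Return one standard normal deviate via the basic Box-Muller transform."""
    u1 = 1.0 - rng.random()  # in (0, 1], so log(u1) is always defined
    u2 = rng.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)


def insert_sizes(mean: float, n: int, sd: Optional[float] = None, seed: int = 0) -> List[int]:
    """Draw n integer insert sizes from N(mean, sd**2); sd defaults to mean / 20."""
    if sd is None:
        sd = mean / 20.0
    rng = random.Random(seed)  # seeded for reproducible simulations
    return [max(1, round(mean + sd * box_muller(rng))) for _ in range(n)]
```

For a 500 bp library, `insert_sizes(500, 20000)` yields values centred near 500 with a spread near 25 bp.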

The program simulates sequencing errors, quality scores and GC content-coverage bias according to empirical distribution profiles. Some default profiles, derived from large amounts of real sequencing data, are provided.

To simulate reads from a diploid genome, users should first simulate the diploid genome sequence by setting the rates of heterozygous SNPs, heterozygous indels and structural variation.

Please cite: Xuesong Hu, Jianying Yuan, Yujian Shi, Jianliang Lu, Binghang Liu, Zhenyu Li, Yanxiang Chen, Desheng Mu, Hao Zhang, Nan Li, Zhen Yue, Fan Bai, Heng Li and Wei Fan: pIRS: Profile-based Illumina pair-end reads simulator. (PubMed,eprint) Bioinformatics 28(11):1533-5 (2012)
Registry entries: OMICtools  Bioconda 
Placnet
Plasmid Constellation Network project
Versions of package placnet
ReleaseVersionArchitectures
sid1.03-3all
stretch1.03-2all
bullseye1.03-3all
buster1.03-3all
Popcon: 3 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

Placnet is a new tool for plasmid analysis in NGS projects. Placnet is optimized to work with Illumina sequences, but it also works with 454, Ion Torrent or any other current sequencing technology.

The input of placnet is a set of contigs and one or more SAM files with the mapping of the reads against the contigs. Placnet produces a set of files that can easily be opened in Cytoscape or other network tools.
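One way to turn read mappings into a contig network is to link two contigs whenever enough read pairs have their mates on different contigs, then look at connected components. The sketch below shows only that idea and is not Placnet's method. It makes several assumptions:

- Placnet also uses reference homology, which is omitted here.
- Mate pairs are given as (contig_of_mate1, contig_of_mate2) tuples rather than parsed from SAM.
- The support threshold and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def link_graph(pair_hits: Iterable[Tuple[str, str]], min_links: int = 3) -> Dict[str, Set[str]]:
    """Undirected contig graph; an edge needs at least min_links supporting pairs."""
    support: Counter = Counter()
    for a, b in pair_hits:
        if a != b:  # mates on the same contig say nothing about links
            support[tuple(sorted((a, b)))] += 1
    graph: Dict[str, Set[str]] = defaultdict(set)
    for (a, b), n in support.items():
        if n >= min_links:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def components(graph: Dict[str, Set[str]], nodes: Iterable[str]) -> List[Set[str]]:
    """Connected components by iterative depth-first search, in node order."""
    seen: Set[str] = set()
    result: List[Set[str]] = []
    for start in nodes:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            comp.add(node)
            stack.extend(graph.get(node, ()))
        result.append(comp)
    return result
```

Each edge could then be written out as a simple edge list for a network viewer such as Cytoscape. Contigs that end up in a component apart from the chromosome would be the candidates for plasmid inspection.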

Please cite: Val F. Lanza, María de Toro, M. Pilar Garcillán-Barcia, Azucena Mora, Jorge Blanco, Teresa M. Coque and Fernando de la Cruz: Plasmid Flux in Escherichia coli ST131 Sublineages, Analyzed by Plasmid Constellation Network (PLACNET), a New Method for Plasmid Reconstruction from Whole Genome Sequences. (PubMed,eprint) PLoS Genetics 10(12):e1004766 (2014)
Registry entries: OMICtools 
Plasmidseeker
identification of known plasmids from whole-genome sequencing reads
Versions of package plasmidseeker
ReleaseVersionArchitectures
sid1.0+dfsg-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bullseye1.0+dfsg-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster1.0+dfsg-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
Popcon: 1 users (2 upd.)*
Versions and Archs
License: DFSG free
Git

PlasmidSeeker is a k-mer based program for the identification of known plasmids from bacterial whole genome sequencing reads.

PlasmidSeeker enables the detection of plasmids from bacterial WGS data without read assembly. Its algorithm is based on k-mers and uses k-mer abundance to distinguish between plasmid and bacterial sequences. The performance of PlasmidSeeker was tested on a set of simulated and real bacterial WGS samples, resulting in 100% sensitivity and 99.98% specificity.
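The core idea, comparing the abundance of plasmid-specific k-mers in the reads with the abundance of chromosomal k-mers, can be sketched in a few lines. This is not PlasmidSeeker's algorithm or database format. It makes these assumptions:

- The chromosome sequence is supplied directly as background.
- Medians are used as the abundance summary.
- All names are hypothetical.

```python
from collections import Counter
from statistics import median
from typing import Dict, Iterable, Iterator


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count k-mers across all reads (no reverse complements, for brevity)."""
    counts: Counter = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def plasmid_report(plasmid: str, chromosome: str, counts: Counter, k: int) -> Dict[str, float]:
    """Fraction of plasmid-specific k-mers seen, and their median abundance
    relative to the median abundance of chromosomal k-mers."""
    chrom = set(kmers(chromosome, k))
    specific = set(kmers(plasmid, k)) - chrom  # k-mers shared with the chromosome are uninformative
    if not specific:
        return {"found_fraction": 0.0, "relative_abundance": 0.0}
    hits = [counts[km] for km in specific if counts[km] > 0]
    chrom_counts = [counts[km] for km in chrom if counts[km] > 0]
    background = median(chrom_counts) if chrom_counts else 0
    relative = median(hits) / background if hits and background else 0.0
    return {"found_fraction": len(hits) / len(specific), "relative_abundance": relative}
```

A high `found_fraction` suggests the plasmid is present. A `relative_abundance` above 1 hints at a multi-copy plasmid, since plasmid k-mers then occur more often than chromosomal ones.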

Please cite: Märt Roosaare, Mikk Puustusmaa, Märt Möls, Mihkel Vaher and Maido Remm: PlasmidSeeker: identification of known plasmids from bacterial whole genome sequencing reads. (PubMed,eprint) PeerJ - Life & Environment 6:e4588 (2018)
Registry entries: OMICtools 
Plip
fully automated protein-ligand interaction profiler
Versions of package plip
ReleaseVersionArchitectures
sid2.1.2~beta+dfsg-2all
buster1.4.3~b+dfsg-2all
stretch1.3.3+dfsg-1all
bullseye2.1.2~beta+dfsg-2all
Popcon: 5 users (4 upd.)*
Versions and Archs
License: DFSG free
Git

The Protein-Ligand Interaction Profiler (PLIP) is a tool to analyze and visualize protein-ligand interactions in PDB files.

Features include:

  • Detection of eight different types of noncovalent interactions
  • Automatic detection of relevant ligands in a PDB file
  • Direct download of PDB structures from wwPDB server if valid PDB ID is given
  • Processing of custom PDB files containing protein-ligand complexes (e.g. from docking)
  • No need for special preparation of a PDB file, works out of the box
  • Atom-level interaction reports in rST and XML formats for easy parsing
  • Generation of PyMOL session files (.pse) for each pairing, enabling easy preparation of images for publications and talks
  • Rendering of preview image for each ligand and its interactions with the protein
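Interaction detection of this kind is largely geometric: atom pairs are tested against distance (and sometimes angle) criteria. The sketch below shows only the simplest case, hydrophobic contacts as carbon-carbon pairs within a distance cutoff. It makes these assumptions:

- The 4.0 Å default and the carbon-only rule are simplifications chosen here. PLIP's actual rules are more detailed.
- Atoms are given as a hypothetical `Atom` tuple instead of being read from a PDB file.

```python
import math
from typing import List, NamedTuple, Tuple


class Atom(NamedTuple):
    name: str
    element: str
    residue: str
    xyz: Tuple[float, float, float]


def hydrophobic_contacts(protein: List[Atom], ligand: List[Atom],
                         max_dist: float = 4.0) -> List[Tuple[str, str, float]]:
    """Return (residue:atom, ligand atom, distance) for carbon pairs within max_dist."""
    contacts = []
    for p in protein:
        if p.element != "C":
            continue
        for lig in ligand:
            if lig.element != "C":
                continue
            d = math.dist(p.xyz, lig.xyz)  # Euclidean distance (Python 3.8+)
            if d <= max_dist:
                contacts.append((f"{p.residue}:{p.name}", lig.name, round(d, 2)))
    return contacts
```

The same double loop, with other element filters and cutoffs, extends to other interaction types. Larger structures would use a spatial index instead of checking every pair.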
Please cite: Sebastian Salentin, Sven Schreiber, V. Joachim Haupt, Melissa F. Adasme and Michael Schroeder: PLIP: fully automated protein–ligand interaction profiler. (eprint) Nucleic Acids Research (W1) (2015)
Registry entries: Bio.tools  OMICtools 
Populations
population genetic software
Versions of package populations
ReleaseVersionArchitectures
buster1.2.33+svn0120106+dfsg-2amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
jessie1.2.33+svn0120106-2.1amd64,armel,armhf,i386
stretch1.2.33+svn0120106-2.1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
wheezy1.2.33+svn0120106-2.1amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc
bullseye1.2.33+svn0120106+dfsg-3amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid1.2.33+svn0120106+dfsg-3amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Debtags of package populations:
roleprogram
uitoolkitqt
Popcon: 4 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

Populations is a population genetics program. It computes genetic distances between populations or individuals and builds phylogenetic trees (NJ or UPGMA) with bootstrap values.
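UPGMA, one of the two tree-building methods named above, repeatedly merges the closest pair of clusters. It then sets the distance from the new cluster to every other cluster to the size-weighted average of the merged clusters' distances. The sketch below is a minimal Python illustration, not the Populations code. It assumes a precomputed symmetric distance matrix and returns a Newick string without branch lengths.

```python
from typing import Dict, List, Tuple


def upgma(names: List[str], dist: List[List[float]]) -> str:
    """Build a UPGMA tree and return it as Newick (topology only)."""
    clusters: Dict[int, Tuple[str, int]] = {i: (n, 1) for i, n in enumerate(names)}
    # Distances keyed by (smaller id, larger id).
    d: Dict[Tuple[int, int], float] = {
        (i, j): dist[i][j] for i in range(len(names)) for j in range(i + 1, len(names))
    }
    next_id = len(names)
    while len(clusters) > 1:
        (a, b), _ = min(d.items(), key=lambda item: item[1])  # closest pair
        label_a, size_a = clusters.pop(a)
        label_b, size_b = clusters.pop(b)
        # Keep distances that do not involve the merged clusters.
        new_d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        for k in clusters:
            da = d[(min(a, k), max(a, k))]
            db = d[(min(b, k), max(b, k))]
            # Size-weighted average; next_id is always the largest id.
            new_d[(k, next_id)] = (size_a * da + size_b * db) / (size_a + size_b)
        clusters[next_id] = (f"({label_a},{label_b})", size_a + size_b)
        d = new_d
        next_id += 1
    label, _ = next(iter(clusters.values()))
    return label + ";"
```

UPGMA assumes a constant rate of change across lineages. That assumption is the usual reason to prefer NJ when rates are likely to vary.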

Registry entries: OMICtools 
Porechop
adapter trimmer for Oxford Nanopore reads
Versions of package porechop
ReleaseVersionArchitectures
bullseye0.2.4+dfsg-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid0.2.4+dfsg-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster0.2.4+dfsg-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
Popcon: 3 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

Porechop is a tool for finding and removing adapters from Oxford Nanopore reads. Adapters on the ends of reads are trimmed off, and when a read has an adapter in its middle, it is treated as chimeric and chopped into separate reads. Porechop performs thorough alignments to effectively find adapters, even at low sequence identity. Porechop also supports demultiplexing of Nanopore reads that were barcoded with the Native Barcoding Kit, PCR Barcoding Kit or Rapid Barcoding Kit.
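The two behaviours described above can be illustrated with a toy version: trim adapters at the read ends, then split at internal adapters. Porechop itself uses proper alignment, so this sketch is only illustrative. It makes these assumptions:

- End matches are scored by ungapped identity over an adapter-length window.
- Internal adapters are found only as exact matches.
- The same adapter string is used at both ends.
- The threshold and names are hypothetical.

```python
from typing import List


def identity(a: str, b: str) -> float:
    """Ungapped fraction of matching positions (over the longer length)."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / max(len(a), len(b))


def trim_and_split(read: str, adapter: str, min_identity: float = 0.8) -> List[str]:
    """Trim an adapter from either end, then split on internal adapters (chimeras)."""
    n = len(adapter)
    start, end = 0, len(read)
    if len(read) >= n and identity(read[:n], adapter) >= min_identity:
        start = n  # adapter at the start of the read
    if end - start >= n and identity(read[end - n:end], adapter) >= min_identity:
        end -= n  # adapter at the end of the read
    core = read[start:end]
    # An exact internal adapter marks a chimera junction: chop the read there.
    return [piece for piece in core.split(adapter) if piece]
```

Ungapped identity breaks down when a read carries insertions or deletions, which are common in Nanopore data. That weakness is why a real trimmer needs alignment rather than simple string comparison.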

Registry entries: OMICtools  Bioconda 
Poretools
toolkit for nanopore nucleotide sequencing data
Versions of package poretools
ReleaseVersionArchitectures
buster0.6.0+dfsg-3all
bullseye0.6.0+dfsg-5all
stretch0.6.0+dfsg-2all
sid0.6.0+dfsg-5all
Popcon: 4 users (3 upd.)*
Versions and Archs
License: DFSG free
Git

poretools is a flexible toolkit for exploring datasets generated by MinION nanopore sequencing devices, for the purposes of quality control and downstream analysis. Poretools operates directly on the native FAST5 file format (a variant of the HDF5 standard) produced by ONT and provides a wealth of format conversion utilities and data exploration and visualization tools.

Please cite: Nicholas Loman and Aaron Quinlan: Poretools: a toolkit for analyzing nanopore sequence data. (PubMed,eprint) Bioinformatics 30(23):3399-3401 (2014)
Registry entries: OMICtools  Bioconda 
Prank
Probabilistic Alignment Kit for DNA, codon and amino-acid sequences
Versions of package prank
ReleaseVersionArchitectures
buster0.0.170427+dfsg-2amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
sid0.0.170427+dfsg-2amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bullseye0.0.170427+dfsg-2amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
stretch0.0.150803-2amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
jessie0.0.140110-1amd64,armel,armhf,i386
Popcon: 17 users (7 upd.)*
Versions and Archs
License: DFSG free
Git

PRANK is a probabilistic multiple alignment program for DNA, codon and amino-acid sequences. It's based on a novel algorithm that treats insertions correctly and avoids over-estimation of the number of deletion events. In addition, PRANK borrows ideas from maximum likelihood methods used in phylogenetics and correctly takes into account the evolutionary distances between sequences. Lastly, PRANK allows for defining a potential structure for sequences to be aligned and then, simultaneously with the alignment, predicts the locations of structural units in the sequences.

PRANK is a command-line program for UNIX-style environments but the same sequence alignment engine is implemented in the graphical program PRANKSTER. In addition to providing a user-friendly interface to those not familiar with Unix systems, PRANKSTER is an alignment browser for alignments saved in the HSAML format. The novel format allows for storing all the information generated by the aligner and the alignment browser is a convenient way to analyse and manipulate the data.

PRANK aims at an evolutionarily correct sequence alignment and often the result looks different from ones generated with other alignment methods. There are, however, cases where the different look is caused by violations of the method's assumptions. To understand why things may go wrong and how to avoid that, read this explanation of differences between PRANK and traditional progressive alignment methods.

Please cite: Ari Löytynoja: Phylogeny-aware alignment with PRANK. (PubMed) Methods Mol. Biol. 1079:155-170 (2014)
Registry entries: Bio.tools  OMICtools  Bioconda 
Predictprotein
suite of protein sequence analysis tools
Versions of package predictprotein
ReleaseVersionArchitectures
jessie1.1.06-1all
sid1.1.09-2all
buster1.1.08-1all
stretch1.1.07-2all
Popcon: 3 users (0 upd.)*
Versions and Archs
License: DFSG free
Git

PredictProtein is a sequence analysis suite providing prediction of protein structure and function.

PredictProtein takes a protein sequence as input and provides the following per-residue, or whole protein annotations:

  • secondary structure
  • solvent accessibility
  • multiple sequence alignments
  • PROSITE sequence motifs
  • low-complexity regions
  • nuclear localisation signals
  • regions lacking regular structure (NORS)
  • unstructured loops
  • transmembrane helices
  • transmembrane beta barrels
  • coiled-coil regions
  • disulfide bonds
  • disordered regions
  • B-value flexibility
  • protein-protein interaction sites
  • Gene Ontology terms
Please cite: Burkhard Rost, Guy Yachdav and Jinfeng Liu: The PredictProtein server. (PubMed,eprint) Nucleic Acids Research 32(2):W321-W326 (2004)
Registry entries: Bio.tools  SciCrunch  OMICtools 
Presto
toolkit for processing B and T cell sequences
Versions of package presto
ReleaseVersionArchitectures
bullseye0.6.0-3all
sid0.6.0-3all
Popcon: 0 users (2 upd.)*
Versions and Archs
License: DFSG free
Git

pRESTO is a toolkit for processing raw reads from high-throughput sequencing of B cell and T cell repertoires.

Dramatic improvements in high-throughput sequencing technologies now enable large-scale characterization of lymphocyte repertoires, defined as the collection of trans-membrane antigen-receptor proteins located on the surface of B cells and T cells. The REpertoire Sequencing TOolkit (pRESTO) is composed of a suite of utilities to handle all stages of sequence processing prior to germline segment assignment. pRESTO is designed to handle either single reads or paired-end reads. It includes features for quality control, primer masking, annotation of reads with sequence embedded barcodes, generation of unique molecular identifier (UMI) consensus sequences, assembly of paired-end reads and identification of duplicate sequences. Numerous options for sequence sorting, sampling and conversion operations are also included.
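Building a UMI consensus amounts to grouping reads that share a molecular barcode and collapsing each group into one sequence. The sketch below does this with a column-wise majority vote and is not pRESTO's implementation. It makes these assumptions:

- Reads are given as (umi, sequence) pairs with the UMI already extracted.
- Consensus is computed over the shortest read length in each group.
- Groups smaller than `min_count` are dropped.
- The function name is hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def umi_consensus(reads: Iterable[Tuple[str, str]], min_count: int = 2) -> Dict[str, str]:
    """Collapse reads sharing a UMI into a column-wise majority consensus."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)
    consensus: Dict[str, str] = {}
    for umi, seqs in groups.items():
        if len(seqs) < min_count:
            continue  # too few reads to outvote sequencing errors
        length = min(len(s) for s in seqs)
        bases = []
        for i in range(length):
            base, _ = Counter(s[i] for s in seqs).most_common(1)[0]
            bases.append(base)
        consensus[umi] = "".join(bases)
    return consensus
```

Because all reads with one UMI derive from one original molecule, the majority vote removes most errors introduced by PCR and sequencing.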

Registry entries: Bio.tools  SciCrunch  OMICtools  Bioconda 
Proalign
Probabilistic multiple alignment program
Versions of package proalign
ReleaseVersionArchitectures
sid0.603-5amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bullseye0.603-5amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
jessie0.603-1amd64,armel,armhf,i386
stretch0.603-2amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
buster0.603-4amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
Popcon: 3 users (3 upd.)*
Versions and Archs
License: DFSG free
Git

ProAlign performs probabilistic sequence alignments using hidden Markov models (HMM). It includes a graphical user interface (GUI) that allows users to (i) perform alignments of nucleotide or amino-acid sequences, (ii) view the quality of solutions, (iii) filter unreliable alignment regions and (iv) export alignments to other software.

ProAlign uses a progressive method, such that multiple alignment is created stepwise by performing pairwise alignments in the nodes of a guide tree. Sequences are described with vectors of character probabilities, and each pairwise alignment reconstructs the ancestral (parent) sequence by computing the probabilities of different characters according to an evolutionary model.
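To make "computing the probabilities of different characters according to an evolutionary model" concrete, the sketch below combines the character probability vectors of two child sequences at one aligned nucleotide site into a parent vector. ProAlign's actual model is more involved, and this is not its code. It makes these assumptions:

- The substitution model is Jukes-Cantor.
- The prior on parent states is uniform.
- Gaps are ignored.
- The function names are hypothetical.

```python
import math
from typing import List


def jc_transition(t: float) -> List[List[float]]:
    """Jukes-Cantor matrix P[a][b] = P(child state b | parent state a, branch length t)."""
    decay = math.exp(-4.0 * t / 3.0)
    same = 0.25 + 0.75 * decay
    diff = 0.25 - 0.25 * decay
    return [[same if a == b else diff for b in range(4)] for a in range(4)]


def parent_probabilities(left: List[float], t_left: float,
                         right: List[float], t_right: float) -> List[float]:
    """Posterior over parent states A, C, G, T given two children's probability vectors."""
    pl, pr = jc_transition(t_left), jc_transition(t_right)
    scores = []
    for a in range(4):
        from_left = sum(pl[a][b] * left[b] for b in range(4))
        from_right = sum(pr[a][b] * right[b] for b in range(4))
        scores.append(0.25 * from_left * from_right)  # uniform prior on the parent state
    total = sum(scores)
    return [s / total for s in scores]
```

If one child is certainly `A` and the other certainly `C`, with equal branch lengths, the parent vector gives `A` and `C` equal, higher probability than `G` and `T`. Such vectors can then be aligned further up the guide tree.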

Please cite: Ari Löytynoja and Michel C Milinkovitch: A hidden Markov model for progressive multiple alignment. (PubMed,eprint) Bioinformatics 19(12):1505-13 (2003)
Registry entries: OMICtools 
Prodigal
Microbial (bacterial and archaeal) gene finding program
Versions of package prodigal
ReleaseVersionArchitectures
jessie2.6.1-1amd64,armel,i386
buster2.6.3-4amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
stretch2.6.3-1amd64,arm64,armel,i386,mips,mips64el,mipsel,ppc64el,s390x
sid2.6.3-4amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bullseye2.6.3-4amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 7 users (3 upd.)*
Versions and Archs
License: DFSG free
Git

Prodigal (Prokaryotic Dynamic Programming Genefinding Algorithm) is a microbial (bacterial and archaeal) gene finding program developed at Oak Ridge National Laboratory and the University of Tennessee. Key features of Prodigal include:

Speed: Prodigal is an extremely fast gene recognition tool (written in plain C). It can analyze an entire microbial genome in 30 seconds or less.

Accuracy: Prodigal is a highly accurate gene finder. It correctly locates the 3' end of every gene in the experimentally verified Ecogene data set (except those containing introns). It possesses a very sophisticated ribosomal binding site scoring system that enables it to locate the translation initiation site with great accuracy (96% of the 5' ends in the Ecogene data set are located correctly).

Specificity: Prodigal's false positive rate compares favorably with other gene identification programs, and usually falls under 5%.

GC-Content Indifferent: Prodigal performs well even in high-GC genomes, with over 90% perfect matches (both 5' and 3' ends) to the Pseudomonas aeruginosa curated annotations.

Metagenomic Version: Prodigal can run in metagenomic mode and analyze sequences even when the organism is unknown.

Ease of Use: Prodigal can be run in one step on a single genomic sequence or on a draft genome containing many sequences. It does not need to be supplied with any knowledge of the organism, as it learns all the properties it needs to on its own.
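For contrast with Prodigal's approach, the naive baseline it improves on is a plain open reading frame scan. The scan takes each ATG and runs to the next in-frame stop codon. This is explicitly not Prodigal's algorithm, which adds dynamic programming, ribosomal binding site scoring and learned GC frame bias. It makes these assumptions:

- Only the forward strand is scanned.
- Only ATG starts are considered.
- The length cutoff is hypothetical.

```python
from typing import List, Optional, Tuple

STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 100) -> List[Tuple[int, int]]:
    """Return 0-based half-open (start, end) ORFs on the forward strand,
    from the first ATG to the next in-frame stop codon (stop included)."""
    seq = seq.upper()
    orfs: List[Tuple[int, int]] = []
    for frame in range(3):
        start: Optional[int] = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)
```

A scan like this overpredicts badly, since many long ORFs in a genome are not genes. It also cannot choose between alternative start codons. Those two weaknesses are what the accuracy and specificity figures above address.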

Please cite: Doug Hyatt, Gwo-Liang Chen, Philip F. Locascio, Miriam L. Land, Frank W. Larimer and Loren J. Hauser: Prodigal: prokaryotic gene recognition and translation initiation site identification. (PubMed,eprint) BMC Bioinformatics 11:119 (2010)
Registry entries: SciCrunch  OMICtools  Bioconda 
Profnet-bval
neural network architecture for profbval
Versions of package profnet-bval
ReleaseVersionArchitectures
wheezy1.0.21-1+wheezy1amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc
stretch1.0.22-4amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
jessie1.0.22-2amd64,armel,armhf,i386
buster1.0.22-6amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
sid1.0.22-6amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 4 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

Profnet is a component of the prediction methods that make up the Predict Protein service by the lab of Burkhard Rost. It provides the neural network component to a variety of predictors that perform protein feature prediction directly from sequence. This neural network implementation has to be compiled for every different network architecture.

This package contains the neural network architecture for profbval.

Please cite: Avner Schlessinger, Guy Yachdav and Burkhard Rost: PROFbval: predict flexible and rigid residues in proteins. (PubMed,eprint) Bioinformatics 22(7):891-893 (2006)
Registry entries: OMICtools 
Profnet-chop
neural network architecture for profchop
Versions of package profnet-chop
ReleaseVersionArchitectures
jessie1.0.22-2amd64,armel,armhf,i386
sid1.0.22-6amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster1.0.22-6amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
stretch1.0.22-4amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
wheezy1.0.21-1+wheezy1amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc
Popcon: 3 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

Profnet is a component of the prediction methods that make up the Predict Protein service by the lab of Burkhard Rost. It provides the neural network component to a variety of predictors that perform protein feature prediction directly from sequence. This neural network implementation has to be compiled for every different network architecture.

This package contains the neural network architecture for profchop.

Please cite: Avner Schlessinger, Guy Yachdav and Burkhard Rost: PROFbval: predict flexible and rigid residues in proteins. (PubMed,eprint) Bioinformatics 22(7):891-893 (2006)
Registry entries: OMICtools 
Profnet-con
neural network architecture for profcon
Versions of package profnet-con
ReleaseVersionArchitectures
buster1.0.22-6amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
jessie1.0.22-2amd64,armel,armhf,i386
stretch1.0.22-4amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
sid1.0.22-6amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
wheezy1.0.21-1+wheezy1amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc
Popcon: 3 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

Profnet is a component of the prediction methods that make up the Predict Protein service by the lab of Burkhard Rost. It provides the neural network component to a variety of predictors that perform protein feature prediction directly from sequence. This neural network implementation has to be compiled for every different network architecture.

This package contains the neural network architecture for profcon.

Please cite: Avner Schlessinger, Guy Yachdav and Burkhard Rost: PROFbval: predict flexible and rigid residues in proteins. (PubMed,eprint) Bioinformatics 22(7):891-893 (2006)
Registry entries: OMICtools 
Profnet-isis
neural network architecture for profisis
Versions of package profnet-isis
ReleaseVersionArchitectures
sid1.0.22-6amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
stretch1.0.22-4amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
wheezy1.0.21-1+wheezy1amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc
buster1.0.22-6amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
jessie1.0.22-2amd64,armel,armhf,i386
Popcon: 4 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

Profnet is a component of the prediction methods that make up the Predict Protein service by the lab of Burkhard Rost. It provides the neural network component to a variety of predictors that perform protein feature prediction directly from sequence. This neural network implementation has to be compiled for every different network architecture.

This package contains the neural network architecture for profisis.

Please cite: Avner Schlessinger, Guy Yachdav and Burkhard Rost: PROFbval: predict flexible and rigid residues in proteins. (PubMed,eprint) Bioinformatics 22(7):891-893 (2006)
Registry entries: OMICtools 
Profnet-md
neural network architecture for metadisorder
Versions of package profnet-md
ReleaseVersionArchitectures
jessie1.0.22-2amd64,armel,armhf,i386
stretch1.0.22-4amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
sid1.0.22-6amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
wheezy1.0.21-1+wheezy1amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc
buster1.0.22-6amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
Popcon: 3 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

Profnet is a component of the prediction methods that make up the Predict Protein service by the lab of Burkhard Rost. It provides the neural network component to a variety of predictors that perform protein feature prediction directly from sequence. This neural network implementation has to be compiled for every different network architecture.

This package contains the neural network architecture for metadisorder.

Please cite: Avner Schlessinger, Guy Yachdav and Burkhard Rost: PROFbval: predict flexible and rigid residues in proteins. (PubMed,eprint) Bioinformatics 22(7):891-893 (2006)
Registry entries: OMICtools 
Profnet-norsnet
neural network architecture for norsnet
Versions of package profnet-norsnet
ReleaseVersionArchitectures
stretch1.0.22-4amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
jessie1.0.22-2amd64,armel,armhf,i386
sid1.0.22-6amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
wheezy1.0.21-1+wheezy1amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc
buster1.0.22-6amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
Popcon: 4 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

Profnet is a component of the prediction methods that make up the Predict Protein service by the lab of Burkhard Rost. It provides the neural network component to a variety of predictors that perform protein feature prediction directly from sequence. This neural network implementation has to be compiled for every different network architecture.

This package contains the neural network architecture for norsnet.

Please cite: Avner Schlessinger, Guy Yachdav and Burkhard Rost: PROFbval: predict flexible and rigid residues in proteins. (PubMed,eprint) Bioinformatics 22(7):891-893 (2006)
Registry entries: OMICtools 
Profnet-prof
neural network architecture for profacc
Versions of package profnet-prof
ReleaseVersionArchitectures
sid1.0.22-6amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster1.0.22-6amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
jessie1.0.22-2amd64,armel,armhf,i386
wheezy1.0.21-1+wheezy1amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc
stretch1.0.22-4amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
Popcon: 7 users (3 upd.)*
Versions and Archs
License: DFSG free
Git

Profnet is a component of the prediction methods that make up the Predict Protein service by the lab of Burkhard Rost. It provides the neural network component to a variety of predictors that perform protein feature prediction directly from sequence. This neural network implementation has to be compiled for every different network architecture.

This package contains the neural network architecture for profsec and profacc.

Please cite: Avner Schlessinger, Guy Yachdav and Burkhard Rost: PROFbval: predict flexible and rigid residues in proteins. (PubMed,eprint) Bioinformatics 22(7):891-893 (2006)
Registry entries: OMICtools 
Profnet-snapfun
neural network architecture for snapfun
Versions of package profnet-snapfun
ReleaseVersionArchitectures
stretch1.0.22-4amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
buster1.0.22-6amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
jessie1.0.22-2amd64,armel,armhf,i386
sid1.0.22-6amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
wheezy1.0.21-1+wheezy1amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc
Popcon: 3 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

Profnet is a component of the prediction methods that make up the Predict Protein service by the lab of Burkhard Rost. It provides the neural network component to a variety of predictors that perform protein feature prediction directly from sequence. This neural network implementation has to be compiled for every different network architecture.

This package contains the neural network architecture for snapfun.

Please cite: Avner Schlessinger, Guy Yachdav and Burkhard Rost: PROFbval: predict flexible and rigid residues in proteins. (PubMed,eprint) Bioinformatics 22(7):891-893 (2006 Apr 1)
Registry entries: OMICtools 
Profphd
secondary structure and solvent accessibility predictor
Versions of package profphd
ReleaseVersionArchitectures
wheezy1.0.39-1all
buster1.0.42-3all
stretch1.0.42-1all
jessie1.0.40-1all
sid1.0.42-3all
Popcon: 6 users (2 upd.)*
Versions and Archs
License: DFSG free
Git

This package provides prof(1), the protein secondary structure, accessibility and transmembrane helix predictor from Burkhard Rost. Prediction is either done from protein sequence alone or from an alignment - the latter should be used for optimal performance.

How well does prof(1) perform?

  • Secondary structure is predicted at an expected average accuracy > 72% for the three states helix, strand and loop.

  • Solvent accessibility is predicted at a correlation coefficient (correlation between experimentally observed and predicted relative solvent accessibility) of 0.54.

  • Transmembrane helix prediction has an expected per-residue accuracy of about 95%. The number of false positives, i.e., transmembrane helices predicted in globular proteins, is about 2%.
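The three-state figure above is conventionally the percentage of residues whose predicted state matches the observed one (often called Q3), and the accessibility figure is a Pearson correlation. The sketch below computes both on toy data; the evaluation protocol behind prof's published numbers is not reproduced, and the function names are illustrative only.

```python
from math import sqrt
from typing import Sequence

def q3_accuracy(observed: str, predicted: str) -> float:
    """Percentage of residues whose predicted state (H, E or L) equals the observed state."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("state strings must be non-empty and of equal length")
    matches = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * matches / len(observed)

def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation, e.g. between observed and predicted relative accessibility."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired values")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy example: two of twelve residues are mispredicted
print(q3_accuracy("HHHHLLEEEELL", "HHHLLLEEELLL"))  # about 83.3
```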

Please cite: Burkhard Rost and Chris Sander: Combining evolutionary information and neural networks to predict protein secondary structure. (PubMed) Proteins 19(1):55-72 (1994)
Profphd-net
neural network architecture for profphd
Versions of package profphd-net
ReleaseVersionArchitectures
sid1.0.22-6amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
jessie1.0.22-2amd64,armel,armhf,i386
wheezy1.0.21-1+wheezy1amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc
buster1.0.22-6amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
stretch1.0.22-4amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
Popcon: 7 users (3 upd.)*
Versions and Archs
License: DFSG free
Git

Profnet is a component of the prediction methods that make up the Predict Protein service by the lab of Burkhard Rost. It provides the neural network component to a variety of predictors that perform protein feature prediction directly from sequence. This neural network implementation has to be compiled for every different network architecture.

This package contains the neural network architecture for profphd.

Please cite: Avner Schlessinger, Guy Yachdav and Burkhard Rost: PROFbval: predict flexible and rigid residues in proteins. (PubMed,eprint) Bioinformatics 22(7):891-893 (2006 Apr 1)
Registry entries: OMICtools 
Profphd-utils
profphd helper utilities convert_seq and filter_hssp
Versions of package profphd-utils
ReleaseVersionArchitectures
jessie1.0.10-1amd64,armel,armhf,i386
bullseye1.0.10-5amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
stretch1.0.10-3amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
buster1.0.10-5amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
sid1.0.10-5amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
wheezy1.0.9-1amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc
Popcon: 7 users (2 upd.)*
Versions and Archs
License: DFSG free
Git

The package provides the following binary utilities: convert_seq, filter_hssp. These are used by prof from the profphd package: a secondary structure, accessibility and transmembrane helix predictor from Burkhard Rost.

Progressivemauve
multiple genome alignment algorithms
Versions of package progressivemauve
ReleaseVersionArchitectures
sid1.2.0+4713+dfsg-4amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bullseye1.2.0+4713+dfsg-4amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster1.2.0+4713+dfsg-4amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
stretch1.2.0+4713-2amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
Popcon: 4 users (8 upd.)*
Versions and Archs
License: DFSG free
Git

The mauveAligner and progressiveMauve alignment algorithms have been implemented as command-line programs included with the downloadable Mauve software. When run from the command-line, these programs provide options not yet available in the graphical interface.

Mauve is a system for efficiently constructing multiple genome alignments in the presence of large-scale evolutionary events such as rearrangement and inversion. Multiple genome alignment provides a basis for research into comparative genomics and the study of evolutionary dynamics. Aligning whole genomes is a fundamentally different problem than aligning short sequences.

Mauve has been developed with the idea that a multiple genome aligner should require only modest computational resources. It employs algorithmic techniques that scale well in the amount of sequence being aligned. For example, a pair of Y. pestis genomes can be aligned in under a minute, while a group of 9 divergent Enterobacterial genomes can be aligned in a few hours.

Mauve computes and interactively visualizes genome sequence comparisons. Using FastA or GenBank sequence data, Mauve constructs multiple genome alignments that identify large-scale rearrangement, gene gain, gene loss, indels, and nucleotide substitution.

Mauve is developed at the University of Wisconsin.

Please cite: Aaron E. Darling, Bob Mau and Nicole T. Perna: progressiveMauve: multiple genome alignment with gene gain, loss and rearrangement. (PubMed,eprint) PloS one 5(6):e11147 (2010)
Registry entries: Bio.tools  SciCrunch  OMICtools 
Prokka
rapid annotation of prokaryotic genomes
Versions of package prokka
ReleaseVersionArchitectures
sid1.14.6+dfsg-1all
Popcon: 0 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

A typical 4 Mbp genome can be fully annotated in less than 10 minutes on a quad-core computer, and Prokka scales well to 32-core SMP systems. It produces GFF3, GBK and SQN files that are ready for editing in Sequin and ultimately for submission to GenBank/DDBJ/ENA.

The package is enhanced by the following packages: multiqc
Please cite: Torsten Seemann: Prokka: rapid prokaryotic genome annotation. (PubMed,eprint) Bioinformatics 30(14):2068-2069 (2014)
Registry entries: Bio.tools  SciCrunch  OMICtools  Bioconda 
Proteinortho
Detection of (Co-)orthologs in large-scale protein analysis
Versions of package proteinortho
ReleaseVersionArchitectures
sid6.0.16+dfsg-1amd64,arm64,mips64el,ppc64el,s390x
bullseye6.0.16+dfsg-1amd64,arm64,mips64el,ppc64el,s390x
buster5.16.b+dfsg-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
stretch5.15+dfsg-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
upstream6.0.18
Popcon: 2 users (7 upd.)*
Newer upstream!
License: DFSG free
Git

Proteinortho is a stand-alone tool that is geared towards large datasets and makes use of distributed computing techniques when run on multi-core hardware. It implements an extended version of the reciprocal best alignment heuristic. Proteinortho was applied to compute orthologous proteins in the complete set of all 717 eubacterial genomes available at NCBI at the beginning of 2009; the authors succeeded in identifying thirty proteins present in 99% of all bacterial proteomes.
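The plain reciprocal best hit criterion that Proteinortho extends can be sketched as follows: a pair is called orthologous when each protein is the other's highest-scoring hit. The score dictionaries stand in for all-against-all alignment results (for example bit scores) and are hypothetical; Proteinortho's extended heuristic is not shown.

```python
from typing import Dict, Set, Tuple

# (query, subject) -> alignment score; higher is better
Scores = Dict[Tuple[str, str], float]

def best_hits(scores: Scores) -> Dict[str, str]:
    """Map each query to the subject with the highest score (ties keep the first seen)."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}

def reciprocal_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> Set[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit in proteome B and a is b's best hit in proteome A."""
    ab = best_hits(a_vs_b)
    ba = best_hits(b_vs_a)
    return {(a, b) for a, b in ab.items() if ba.get(b) == a}

# Hypothetical scores for two tiny proteomes
a_vs_b = {("a1", "b1"): 200.0, ("a1", "b2"): 150.0, ("a2", "b2"): 90.0, ("a2", "b1"): 80.0}
b_vs_a = {("b1", "a1"): 198.0, ("b2", "a2"): 95.0, ("b2", "a1"): 140.0}
print(reciprocal_best_hits(a_vs_b, b_vs_a))  # {('a1', 'b1')}
```

Here a2's best hit is b2, but b2's best hit is a1, so (a2, b2) is not reciprocal and is dropped.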

Please cite: Marcus Lechner, Sven Findeiß, Lydia Steiner, Manja Marz, Peter F Stadler and Sonja J Prohaska: Proteinortho: Detection of (Co-)orthologs in large-scale analysis. (PubMed,eprint) BMC Bioinformatics 12:124 (2011)
Registry entries: OMICtools 
Prottest
selection of best-fit models of protein evolution
Versions of package prottest
ReleaseVersionArchitectures
stretch3.4.2+dfsg-2all
sid3.4.2+dfsg-3all
bullseye3.4.2+dfsg-3all
buster3.4.2+dfsg-3all
Popcon: 2 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

PROTTEST (ModelTest's relative) is a program for selecting the model of protein evolution that best fits a given set of sequences (alignment). This Java program is based on the Phyml program (for maximum likelihood calculations and optimization of parameters) and uses the PAL library as well. Models included are empirical substitution matrices (such as WAG, LG, mtREV, Dayhoff, DCMut, JTT, VT, Blosum62, CpREV, RtREV, MtMam, MtArt, HIVb, and HIVw) that indicate relative rates of amino acid replacement, and specific improvements (+I: invariable sites, +G: rate heterogeneity among sites, +F: observed amino acid frequencies) to account for the evolutionary constraints imposed by conservation of protein structure and function. ProtTest uses the Akaike Information Criterion (AIC) and other statistics (AICc and BIC) to find which of the candidate models best fits the data at hand.
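The three criteria trade goodness of fit against model complexity. A minimal sketch, assuming each candidate model has already been fitted (for instance by PhyML) to give a maximized log-likelihood ln L and a count k of free parameters, and taking n as the sample size (commonly the number of alignment sites, an assumption here): AIC = 2k - 2 ln L, AICc = AIC + 2k(k+1)/(n - k - 1), BIC = k ln n - 2 ln L, and the lowest value wins. The fits below are made-up numbers, not ProtTest output.

```python
import math
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class ModelFit:
    name: str
    log_likelihood: float  # maximized ln L for this model
    k: int                 # number of free parameters

def aic(fit: ModelFit) -> float:
    return 2 * fit.k - 2 * fit.log_likelihood

def aicc(fit: ModelFit, n: int) -> float:
    """Small-sample corrected AIC; n is the sample size."""
    if n - fit.k - 1 <= 0:
        raise ValueError("AICc needs n > k + 1")
    return aic(fit) + 2 * fit.k * (fit.k + 1) / (n - fit.k - 1)

def bic(fit: ModelFit, n: int) -> float:
    return fit.k * math.log(n) - 2 * fit.log_likelihood

def best_model(fits: List[ModelFit], criterion: Callable[[ModelFit], float]) -> ModelFit:
    """The model with the lowest criterion value fits best."""
    return min(fits, key=criterion)

# Made-up fits for illustration only
fits = [
    ModelFit("WAG", -5000.0, 10),
    ModelFit("WAG+G", -4950.0, 11),
    ModelFit("LG+G", -4940.0, 11),
    ModelFit("LG+I+G", -4939.5, 12),
]
n_sites = 300  # assumed sample size
for label, crit in [("AIC", aic), ("AICc", lambda f: aicc(f, n_sites)), ("BIC", lambda f: bic(f, n_sites))]:
    print(label, best_model(fits, crit).name)
```

With these numbers, LG+I+G improves ln L by only 0.5 at the cost of one extra parameter, so all three criteria prefer LG+G.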

Please cite: Diego Darriba, Guillermo L. Taboada, Ramón Doallo and David Posada: ProtTest 3: fast selection of best-fit models of protein evolution. (PubMed,eprint) Bioinformatics 27(8):1164-5 (2011)
Registry entries: Bio.tools  SciCrunch  OMICtools 
Pscan-chip
ChIP-based identification of TF binding sites
Versions of package pscan-chip
ReleaseVersionArchitectures
sid1.1-2amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster1.1-2amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
bullseye1.1-2amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 2 users (5 upd.)*
Versions and Archs
License: DFSG free
Git

Regulation of transcription is one of the main checkpoints of gene expression regulation and plays a key role in fundamental processes like cellular differentiation and dynamic molecular responses to stimuli. The transcriptional activity of genes is finely regulated by the interaction of sequence elements on the DNA (transcription factor binding sites or TFBSs) and particular proteins called Transcription Factors (TFs). TFBSs are usually clustered in specific regulatory genomic regions called promoters and enhancers. TFs usually recognize TFBSs in a loose, sequence-specific fashion, but there is no computational way to determine whether any given sequence motif on the DNA is actually bound in vivo by a TF, even when the motif is an instance of the sequences typically bound by the TF itself.

Tools like Pscan and PscanChIP analyse a set of regulatory sequences to detect motif enrichment. The rationale is that if a given TFBS is present in a "surprisingly high" number of instances, then there is a good chance that the TF that recognizes that motif is a common regulator of the input sequences; these tools thus use redundancy as an information source.
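The "surprisingly high number of instances" idea can be made concrete with a simple count-based test: if a motif is found in k of n input sequences while its background frequency is p, the binomial upper tail P(X >= k) measures how surprising that count is. This is a simplified illustration of over-representation, not necessarily the statistic Pscan or PscanChIP compute, and the counts are hypothetical.

```python
from math import comb

def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    if not (0 <= k <= n) or not (0.0 <= p <= 1.0):
        raise ValueError("require 0 <= k <= n and 0 <= p <= 1")
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical: a motif occurs in 30 of 100 input regions,
# but only in 10% of background regions (about 10 expected).
print(binomial_upper_tail(30, 100, 0.1))
```

A tiny tail probability points to over-representation; when many motifs are scanned at once, the p-values also need a multiple-testing correction.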

While Pscan (of the pscan-tfbs package) is tailored to work on promoters, that is the regulatory regions upstream of transcription start sites, PscanChIP is suited to work on more general regulatory genomic regions like the ones identified through ChIP-Seq experiments.

Registry entries: Bio.tools  SciCrunch  OMICtools 
Pscan-tfbs
search for transcription factor binding sites
Versions of package pscan-tfbs
ReleaseVersionArchitectures
sid1.2.2-3amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster1.2.2-3amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
bullseye1.2.2-3amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 2 users (6 upd.)*
Versions and Archs
License: DFSG free
Git

Pscan finds Over-represented Transcription Factor Binding Site Motifs in Sequences from Co-Regulated or Co-Expressed Genes.

Pscan is a software tool that scans a set of sequences (e.g. promoters) from co-regulated or co-expressed genes with motifs describing the binding specificity of known transcription factors and assesses which motifs are significantly over- or under-represented, thus providing hints on which transcription factors could be common regulators of the genes studied, together with the location of their candidate binding sites in the sequences. Pscan does not resort to comparisons with orthologous sequences, and experimental results show that it compares favorably to other tools for the same task in terms of false positive predictions and computation time. The website is free and open to all users and there is no login requirement.

Please cite: Federico Zambelli, Graziano Pesole and Giulio Pavesi: Pscan: Finding Over-represented Transcription Factor Binding Site Motifs in Sequences from Co-Regulated or Co-Expressed Genes. (PubMed,eprint) Nucleic Acids Research 37(Web Server Issue):W247-W252 (2009)
Registry entries: Bio.tools  OMICtools 
Psortb
bacterial localization prediction tool
Versions of package psortb
ReleaseVersionArchitectures
sid3.0.6+dfsg-2amd64
buster3.0.6+dfsg-1amd64
bullseye3.0.6+dfsg-2amd64
Popcon: 2 users (3 upd.)*
Versions and Archs
License: DFSG free
Git

PSORTb enables prediction of bacterial protein subcellular localization (SCL) and provides a quick and inexpensive means for gaining insight into protein function, verifying experimental results, annotating newly sequenced bacterial genomes, detecting potential cell surface/secreted drug targets, as well as identifying biomarkers for microbes.

Please cite: Nancy Y. Yu, James R. Wagner, Matthew R. Laird, Gabor Melli, Sébastien Rey, Raymond Lo, Phuong Dao, S. Cenk Sahinalp, Martin Ester, Leonard J. Foster and F. S. Brinkman: PSORTb 3.0: improved protein subcellular localization prediction with refined localization subcategories and predictive capabilities for all prokaryotes. (PubMed,eprint) Bioinformatics 26(13):1608-1615 (2010)
Registry entries: OMICtools 
Pycoqc
computes metrics and generates Interactive QC plots
Versions of package pycoqc
ReleaseVersionArchitectures
sid2.5.0.21+dfsg-3all
bullseye2.5.0.21+dfsg-3all
Popcon: users ( upd.)*
Versions and Archs
License: DFSG free
Git

PycoQC computes metrics and generates interactive QC plots for Oxford Nanopore Technologies sequencing data.

PycoQC relies on the sequencing_summary.txt file generated by Albacore and Guppy, but if needed it can also generate a summary file from basecalled fast5 files. The package supports 1D and 1D2 runs generated with MinION, GridION and PromethION devices and basecalled with Albacore 1.2.1+ or Guppy 2.1.3+.

The package is enhanced by the following packages: multiqc
Registry entries: Bioconda 
Pycorrfit
tool for fitting correlation curves on a logarithmic plot
Versions of package pycorrfit
ReleaseVersionArchitectures
jessie0.8.3-2all
buster1.1.5+dfsg-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
stretch0.9.9+dfsg-2amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
sid1.1.7+dfsg-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Debtags of package pycorrfit:
fieldbiology, mathematics, physics
interfacex11
roleprogram
sciencemodelling, plotting, visualisation
scopeapplication
uitoolkitwxwidgets
useanalysing, learning, organizing, viewing
x11application
Popcon: 5 users (2 upd.)*
Versions and Archs
License: DFSG free
Git

PyCorrFit is a general-purpose FCS evaluation software that, amongst other formats, supports the established Zeiss ConfoCor3 ~.fcs file format. PyCorrFit comes with several built-in model functions, covering a wide range of applications in standard confocal FCS. In addition, it contains equations dealing with different excitation geometries like total internal reflection (TIR).

Please cite: Paul Müller, Petra Schwille and Thomas Weidemann: PyCorrFit—generic data evaluation for fluorescence correlation spectroscopy. (PubMed) Bioinformatics 30(17):2532–2533 (2014)
Registry entries: OMICtools 
Other screenshots of package pycorrfit
Version 0.8.1-1: https://screenshots.debian.net/screenshots/000/010/540/large.png
Pyscanfcs
scientific tool for perpendicular line scanning FCS
Versions of package pyscanfcs
ReleaseVersionArchitectures
bullseye0.3.5+ds-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
jessie0.2.2-2amd64,armel,armhf,i386
stretch0.2.3-3amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
sid0.3.5+ds-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster0.3.2+ds-2amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
Popcon: 21 users (3 upd.)*
Versions and Archs
License: DFSG free
Git

When a membrane is scanned perpendicularly to its surface, the fluorescence signal originating from the membrane itself must be separated from the signal of the surrounding medium for an FCS analysis. PyScanFCS interactively extracts the fluctuating fluorescence signal from such measurements and applies a multiple-tau algorithm. The obtained correlation curves can be evaluated using PyCorrFit.
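For orientation, one common normalization of the FCS autocorrelation is G(tau) = <dF(t) dF(t + tau)> / <F>^2, where dF is the deviation of the intensity F from its mean. The sketch below evaluates it on a uniformly sampled trace at linearly spaced lags; it is not the multiple-tau algorithm PyScanFCS applies (which coarsens the time resolution as the lag grows), and it skips the extraction of the membrane signal.

```python
from typing import List, Sequence

def autocorrelation(trace: Sequence[float], max_lag: int) -> List[float]:
    """G(lag) = <dF(t) dF(t+lag)> / <F>^2 for lag = 1..max_lag (lags in samples)."""
    n = len(trace)
    if not 0 < max_lag < n:
        raise ValueError("max_lag must be between 1 and len(trace) - 1")
    mean = sum(trace) / n
    if mean == 0:
        raise ValueError("mean intensity must be non-zero")
    deviations = [f - mean for f in trace]
    result = []
    for lag in range(1, max_lag + 1):
        pairs = n - lag  # number of (t, t + lag) pairs available
        cov = sum(deviations[t] * deviations[t + lag] for t in range(pairs)) / pairs
        result.append(cov / mean ** 2)
    return result

# A strictly alternating trace is anti-correlated at odd lags
print(autocorrelation([1.0, 3.0] * 50, 2))  # [-0.25, 0.25]
```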

The package provides the Python module pyscanfcs and its graphical user interface, which is written in wxPython.

Registry entries: OMICtools 
Python3-deeptools
platform for exploring biological deep-sequencing data
Versions of package python3-deeptools
ReleaseVersionArchitectures
sid3.4.3-1all
bullseye3.4.3-1all
Popcon: 0 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

Aiming for compatibility with the Galaxy workflow environment, but also independently contributing to a series of workflows in genomics, this package provides a series of tools to address common tasks for the processing of high-throughput DNA/RNA sequencing.

Please cite: Fidel Ramirez, Devon P. Ryan, Björn Grüning, Sarah Diehl, Vivek Bhardwaj, Fabian Kilpert, Andreas S Richter, Steffen Heyne, Friederike Dündar and Thomas Manke: deepTools2: a next generation web server for deep-sequencing data analysis. (eprint) Nucleic Acids Research :W160–W165 (2016)
Registry entries: Bio.tools  OMICtools  Bioconda 
Python3-deeptoolsintervals
handling GTF-like sequence-associated interval annotation
Versions of package python3-deeptoolsintervals
ReleaseVersionArchitectures
bullseye0.1.9-2amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid0.1.9-2amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 0 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

Regions in biological sequences are described (annotated) as genes, transcription factor binding sites, low-complexity regions, and whatever else biological research brings.

This package supports efficient operations on this information.

Python3-geneimpacts
wraps command line tools to assess variants in gene sequences
Versions of package python3-geneimpacts
ReleaseVersionArchitectures
bullseye0.3.7-2all
sid0.3.7-2all
Popcon: 2 users (0 upd.)*
Versions and Archs
License: DFSG free
Git

Interpersonal differences in DNA are responsible for variations in the response to external stimuli and in the efficiency of metabolism, and may even cause what is referred to as a genetic disorder.

A range of tools have been created to predict the importance of differences (polymorphisms) in genetic sequences at single nucleotides (SNPs). This Python class wraps and represents findings provided by any of the tools snpEff, VEP and BCFT.

Registry entries: Bioconda 
Python3-gffutils
Work with GFF and GTF files in a flexible database framework
Versions of package python3-gffutils
ReleaseVersionArchitectures
bullseye0.10.1-2all
sid0.10.1-2all
buster0.9-1all
Popcon: 3 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

A Python package for working with and manipulating the GFF and GTF format files typically used for genomic annotations. Files are loaded into a sqlite3 database, allowing much more complex manipulation of hierarchical features (e.g., genes, transcripts, and exons) than is possible with plain-text methods alone.
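The idea of storing features in sqlite3 so that hierarchies (gene, transcript, exon) can be queried may be sketched with the standard library alone. This is not the gffutils API: the table layout, the functions load_gff3 and descendants, and the requirement that every feature carry an ID attribute are illustrative simplifications (real GFF3 files also need ID handling for ID-less features, attribute URL-decoding and more).

```python
import sqlite3
from typing import Iterable, List, Optional

def load_gff3(lines: Iterable[str]) -> sqlite3.Connection:
    """Load GFF3 lines into an in-memory database (simplified: every feature needs an ID)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE features (id TEXT PRIMARY KEY, seqid TEXT, featuretype TEXT,"
                " start_pos INTEGER, end_pos INTEGER, strand TEXT)")
    con.execute("CREATE TABLE relations (parent TEXT, child TEXT)")
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        seqid, _source, ftype, start, end, _score, strand, _phase, attrs = line.rstrip("\n").split("\t")
        attributes = dict(kv.split("=", 1) for kv in attrs.split(";") if kv)
        fid = attributes["ID"]
        con.execute("INSERT INTO features VALUES (?, ?, ?, ?, ?, ?)",
                    (fid, seqid, ftype, int(start), int(end), strand))
        for parent in attributes.get("Parent", "").split(","):
            if parent:
                con.execute("INSERT INTO relations VALUES (?, ?)", (parent, fid))
    return con

def descendants(con: sqlite3.Connection, feature_id: str,
                featuretype: Optional[str] = None) -> List[str]:
    """All features below feature_id in the hierarchy, optionally restricted to one type."""
    query = """
        WITH RECURSIVE tree(child) AS (
            SELECT child FROM relations WHERE parent = ?
            UNION
            SELECT r.child FROM relations AS r JOIN tree AS t ON r.parent = t.child
        )
        SELECT f.id FROM features AS f JOIN tree AS t ON f.id = t.child
        WHERE (? IS NULL OR f.featuretype = ?)
        ORDER BY f.start_pos, f.end_pos DESC, f.id
    """
    return [row[0] for row in con.execute(query, (feature_id, featuretype, featuretype))]

gff = [
    "##gff-version 3",
    "chr1\t.\tgene\t100\t900\t.\t+\t.\tID=gene1",
    "chr1\t.\tmRNA\t100\t900\t.\t+\t.\tID=tx1;Parent=gene1",
    "chr1\t.\texon\t100\t300\t.\t+\t.\tID=exon1;Parent=tx1",
    "chr1\t.\texon\t600\t900\t.\t+\t.\tID=exon2;Parent=tx1",
]
con = load_gff3(gff)
print(descendants(con, "gene1", "exon"))  # ['exon1', 'exon2']
```

The recursive query walks gene to transcript to exon in one statement, which is the kind of hierarchical lookup that is awkward with line-by-line text processing.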

Registry entries: Bio.tools  OMICtools  Bioconda 
Python3-pybedtools
Python 3 wrapper around BEDTools for bioinformatics work
Versions of package python3-pybedtools
ReleaseVersionArchitectures
bullseye0.8.0-5amd64,arm64,mips64el,ppc64el
sid0.8.0-5amd64,arm64,mips64el,ppc64el
buster0.8.0-1amd64,arm64,mips64el,ppc64el
Popcon: 3 users (2 upd.)*
Versions and Archs
License: DFSG free
Git

The BEDTools suite of programs is widely used for genomic interval manipulation or “genome algebra”. pybedtools wraps and extends BEDTools and offers feature-level manipulations from within Python.

This is the Python 3 version.
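The core "genome algebra" operation, intersecting two sets of intervals, can be sketched in plain Python using BED's zero-based, half-open coordinates. This illustrates the concept only and is not the pybedtools API; a real implementation would sort or index the intervals instead of comparing every pair as the nested loop below does.

```python
from typing import List, NamedTuple

class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive

def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Overlapping portions of every pair of intervals from a and b on the same chromosome."""
    result = []
    for x in sorted(a):
        for y in sorted(b):
            if x.chrom != y.chrom:
                continue
            start, end = max(x.start, y.start), min(x.end, y.end)
            if start < end:  # half-open intervals overlap only if start < end
                result.append(Interval(x.chrom, start, end))
    return result

peaks = [Interval("chr1", 100, 200), Interval("chr1", 300, 400), Interval("chr2", 0, 50)]
genes = [Interval("chr1", 150, 350), Interval("chr2", 60, 80)]
print(intersect(peaks, genes))
```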

Please cite: R. K. Dale, B. S. Pedersen and A. R. Quinlan: Pybedtools: a flexible Python library for manipulating genomic datasets and annotations. Bioinformatics 27(24):3423-3424 (2011)
Registry entries: Bio.tools  OMICtools  Bioconda 
Python3-pybel
Biological Expression Language
Versions of package python3-pybel
ReleaseVersionArchitectures
buster0.12.1-1all
bullseye0.14.9-1all
sid0.14.9-1all
upstream0.14.10
Popcon: 4 users (1 upd.)*
Newer upstream!
License: DFSG free
Git

PyBEL is a pure Python package for parsing and handling biological networks encoded in the Biological Expression Language (BEL) version 2. It also facilitates data interchange between common formats and databases such as NetworkX, JSON, CSV, SIF, Cytoscape, CX, NDEx, SQL, and Neo4J.

This package installs the library for Python 3.

Please cite: Charles Tapley Hoyt, Andrej Konotopez and Christian Ebeling: PyBEL: a computational framework for Biological Expression Language. (eprint) Bioinformatics 34(4):703–704 (2018)
Registry entries: Bio.tools  Bioconda 
Python3-sqt
SeQuencing Tools for biological DNA/RNA high-throughput data
Versions of package python3-sqt
ReleaseVersionArchitectures
bullseye0.8.0-4amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el
sid0.8.0-4amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el
buster0.8.0-3amd64,arm64,mips64el,ppc64el
Popcon: 2 users (2 upd.)*
Versions and Archs
License: DFSG free
Git

sqt is a collection of command-line tools for working with high-throughput sequencing data. Although conceptually not tied to any particular language, many sqt subcommands are currently implemented in Python. For them, a Python package is available with functions for reading and writing FASTA/FASTQ files, computing alignments, quality trimming, etc.

The following tools are offered:

  • sqt-coverage -- Compute per-reference statistics such as coverage and GC content
  • sqt-fastqmod -- FASTQ modifications: shorten, subset, reverse complement, quality trimming.
  • sqt-fastastats -- Compute N50, min/max length, GC content etc. of a FASTA file
  • sqt-qualityguess -- Guess quality encoding of one or more FASTQ files.
  • sqt-globalalign -- Compute a global or semiglobal alignment of two strings.
  • sqt-chars -- Count length of the first word given on the command line.
  • sqt-sam-cscq -- Add the CS and CQ tags to a SAM file with colorspace reads.
  • sqt-fastamutate -- Add substitutions and indels to sequences in a FASTA file.
  • sqt-fastaextract -- Efficiently extract one or more regions from an indexed FASTA file.
  • sqt-translate -- Replace characters in FASTA files (like the 'tr' command).
  • sqt-sam-fixn -- Replace all non-ACGT characters within reads in a SAM file.
  • sqt-sam-insertsize -- Mean and standard deviation of paired-end insert sizes.
  • sqt-sam-set-op -- Set operations (union, intersection, ...) on SAM/BAM files.
  • sqt-bam-eof -- Check for the End-Of-File marker in compressed BAM files.
  • sqt-checkfastqpe -- Check whether two FASTQ files contain correctly paired paired-end data.
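Two of the statistics sqt-fastastats reports can be sketched in a few lines: N50 is the length L such that sequences of length L or longer contain at least half of all bases, and GC content is the fraction of G and C among the bases. These functions are illustrative and are not sqt's Python API; how sqt treats N or lowercase bases is not assumed here (this sketch uppercases and counts G/C over A/C/G/T only).

```python
from typing import Sequence

def n50(lengths: Sequence[int]) -> int:
    """Length L such that sequences of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("no sequences")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable")

def gc_content(seq: str) -> float:
    """Fraction of G and C among A, C, G and T (other characters are ignored)."""
    s = seq.upper()
    acgt = sum(s.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("no A/C/G/T bases")
    return (s.count("G") + s.count("C")) / acgt

print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8 (10 + 9 + 8 = 27 of 54 bases)
```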
Registry entries: Bioconda 
Python3-treetime
inference of time stamped phylogenies and ancestral reconstruction (Python 3)
Versions of package python3-treetime
ReleaseVersionArchitectures
sid0.7.5-1all
bullseye0.7.5-1all
buster0.5.3-1all
Popcon: 1 users (3 upd.)*
Versions and Archs
License: DFSG free
Git

TreeTime provides routines for ancestral sequence reconstruction and the maximum likelihood inference of molecular-clock phylogenies, i.e., a tree where all branches are scaled such that the locations of terminal nodes correspond to their sampling times and internal nodes are placed at the most likely time of divergence.

TreeTime aims at striking a compromise between sophisticated probabilistic models of evolution and fast heuristics. It implements GTR models of ancestral inference and branch length optimization, but takes the tree topology as given. To optimize the likelihood of time-scaled phylogenies, treetime uses an iterative approach that first infers ancestral sequences given the branch length of the tree, then optimizes the positions of unconstrained nodes on the time axis, and then repeats this cycle. The only topology optimization is the (optional) resolution of polytomies in a way that is most (approximately) consistent with the sampling time constraints on the tree. The package is designed to be used as a stand-alone tool or as a library used in larger phylogenetic analysis workflows.

Features

  • ancestral sequence reconstruction (marginal and joint maximum likelihood)
  • molecular clock tree inference (marginal and joint maximum likelihood)
  • inference of GTR models
  • rerooting to obtain best root-to-tip regression
  • auto-correlated relaxed molecular clock (with normal prior)
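The root-to-tip regression mentioned above fits root-to-tip distance against sampling date: the slope estimates the substitution rate and the x-intercept estimates the date of the root. A minimal least-squares sketch, assuming the distances have already been measured on a rooted tree; TreeTime's own implementation (including its search over root positions) is not reproduced.

```python
from typing import Sequence, Tuple

def root_to_tip_regression(dates: Sequence[float],
                           distances: Sequence[float]) -> Tuple[float, float, float]:
    """Fit distance = rate * (date - t_root); return (rate, t_root, r_squared)."""
    n = len(dates)
    if n < 2 or n != len(distances):
        raise ValueError("need at least two tips with matching dates and distances")
    mean_t = sum(dates) / n
    mean_d = sum(distances) / n
    sxx = sum((t - mean_t) ** 2 for t in dates)
    syy = sum((d - mean_d) ** 2 for d in distances)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, distances))
    if sxx == 0:
        raise ValueError("all tips have the same sampling date")
    rate = sxy / sxx                       # substitutions per site per time unit
    intercept = mean_d - rate * mean_t
    t_root = -intercept / rate if rate != 0 else float("nan")  # date where distance is 0
    r_squared = sxy * sxy / (sxx * syy) if syy > 0 else float("nan")
    return rate, t_root, r_squared

# Toy tips that follow a perfect clock of 0.001 per year starting in 1990
print(root_to_tip_regression([2000, 2005, 2010, 2015], [0.010, 0.015, 0.020, 0.025]))
```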

This package provides the Python 3 module.

Registry entries: OMICtools  Bioconda 
Pyvcf
helper scripts for Variant Call Format (VCF) parser
Versions of package pyvcf
ReleaseVersionArchitectures
buster0.6.8+git20170215.476169c-1all
stretch0.6.8-1all
sid0.6.8+git20170215.476169c-7all
bullseye0.6.8+git20170215.476169c-7all
Popcon: 4 users (2 upd.)*
Versions and Archs
License: DFSG free
Git

The Variant Call Format (VCF) specifies the format of a text file used in bioinformatics for storing gene sequence variations. The format has been developed with the advent of large-scale genotyping and DNA sequencing projects, such as the 1000 Genomes Project.

The intent of this module is to mimic the csv module in the Python stdlib, as opposed to more flexible serialization formats like JSON or YAML. vcf will attempt to parse the content of each record based on the data types specified in the meta-information lines -- specifically the ##INFO and ##FORMAT lines. If these lines are missing or incomplete, it will check against the reserved types mentioned in the spec. Failing that, it will just return strings.
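A minimal sketch of the type-driven parsing described above, using only the standard library: types are read from ##INFO lines and used to convert the values in a record's INFO column. It is not the pyvcf API; it assumes ID, Number and Type appear first and in that order in each ##INFO line, and it skips the check against reserved spec types, falling back directly to strings for undeclared keys.

```python
import re
from typing import Any, Dict, Iterable, Tuple

# Assumes ID, Number and Type are the first three keys of each ##INFO line.
INFO_LINE = re.compile(r"^##INFO=<ID=([^,]+),Number=([^,]+),Type=([^,>]+)")
CASTS = {"Integer": int, "Float": float, "String": str, "Character": str}

def parse_info_types(header_lines: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Map each INFO key to its declared (Number, Type)."""
    types = {}
    for line in header_lines:
        match = INFO_LINE.match(line)
        if match:
            key, number, typ = match.groups()
            types[key] = (number, typ)
    return types

def parse_info(field: str, types: Dict[str, Tuple[str, str]]) -> Dict[str, Any]:
    """Convert an INFO column such as 'DP=14;AF=0.5;DB' using the declared types."""
    result: Dict[str, Any] = {}
    if field == ".":
        return result
    for entry in field.split(";"):
        key, has_value, raw = entry.partition("=")
        number, typ = types.get(key, (".", "String"))  # undeclared key: keep strings
        if typ == "Flag" or not has_value:
            result[key] = True  # flags carry no value
            continue
        cast = CASTS.get(typ, str)
        values = [None if v == "." else cast(v) for v in raw.split(",")]
        result[key] = values[0] if number == "1" else values
    return result

header = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">',
    '##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">',
    '##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership">',
]
types = parse_info_types(header)
print(parse_info("DP=14;AF=0.5,0.25;DB;XX=foo", types))
```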

This package provides helper scripts using python3-pyvcf.

Registry entries: OMICtools  Bioconda 
Qcat
demultiplexing Oxford Nanopore reads from FASTQ files
Versions of package qcat
ReleaseVersionArchitectures
sid1.1.0-2all
bullseye1.1.0-2all
Popcon: users ( upd.)*
Versions and Archs
License: DFSG free
Git

Qcat is a command-line tool for demultiplexing Oxford Nanopore reads from FASTQ files. It accepts basecalled FASTQ files and splits the reads into separate FASTQ files based on their barcode. Qcat makes the demultiplexing algorithms used in albacore/guppy and EPI2ME available to be used locally with FASTQ files. Currently qcat implements the EPI2ME algorithm.
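To illustrate what assigning a read to a barcode involves, the sketch below searches the start of each read for the closest match to each barcode by semi-global edit distance and rejects ambiguous or distant matches. This is a simplified stand-in, not the EPI2ME algorithm qcat implements; the barcode sequences, the 100-base search window and the distance cut-off are hypothetical.

```python
from typing import Dict, Optional

def best_match_distance(barcode: str, text: str) -> int:
    """Minimum edit distance between barcode and any substring of text (semi-global)."""
    prev = [0] * (len(text) + 1)  # row 0: the barcode may start anywhere in text
    for i in range(1, len(barcode) + 1):
        curr = [i] + [0] * len(text)
        for j in range(1, len(text) + 1):
            cost = 0 if barcode[i - 1] == text[j - 1] else 1
            curr[j] = min(prev[j - 1] + cost,  # match or substitution
                          prev[j] + 1,         # barcode base missing from the read
                          curr[j - 1] + 1)     # extra base in the read
        prev = curr
    return min(prev)  # the barcode may end anywhere in text

def assign_barcode(read: str, barcodes: Dict[str, str],
                   window: int = 100, max_dist: int = 2) -> Optional[str]:
    """Name of the closest barcode in the read's first `window` bases, or None."""
    prefix = read[:window]
    scored = sorted((best_match_distance(seq, prefix), name) for name, seq in barcodes.items())
    best_dist, best_name = scored[0]
    if best_dist > max_dist:
        return None  # no barcode close enough: leave the read unclassified
    if len(scored) > 1 and scored[1][0] == best_dist:
        return None  # two barcodes equally close: ambiguous
    return best_name

barcodes = {"BC01": "ACGTACGTAC", "BC02": "TTGGCCAATT"}  # hypothetical barcodes
print(assign_barcode("GGG" + "ACGTACGAAC" + "T" * 9, barcodes))  # BC01 (one mismatch)
```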

The package is enhanced by the following packages: qcat-examples
Registry entries: Bioconda 
Qcumber
quality control of genomic sequences
Versions of package qcumber
ReleaseVersionArchitectures
sid1.0.14+dfsg-1all
bullseye1.0.14+dfsg-1all
buster1.0.14+dfsg-1all
Popcon: 2 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

QCPipeline is a tool for quality control. The workflow is as follows:

 1. Quality control with FastQC
 2. Trim reads with Trimmomatic
 3. Quality control of trimmed reads with FastQC
 4. Map reads against reference using bowtie2
 5. Classify reads with Kraken
Registry entries: OMICtools  Bioconda 
Qiime
Quantitative Insights Into Microbial Ecology
Versions of package qiime
ReleaseVersionArchitectures
wheezy1.4.0-2amd64,armel,armhf,i386,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc
sid2019.10.0-1all
jessie1.8.0+dfsg-4amd64,armel,armhf,i386
upstream2020.6.0
Debtags of package qiime:
roleprogram
Popcon: 3 users (0 upd.)*
Newer upstream!
License: DFSG free
Git

QIIME 2 is a powerful, extensible, and decentralized microbiome analysis package with a focus on data and analysis transparency. QIIME 2 enables researchers to start an analysis with raw DNA sequence data and finish with publication-quality figures and statistical results. Key features:

  • Integrated and automatic tracking of data provenance
  • Semantic type system
  • Plugin system for extending microbiome analysis functionality
  • Support for multiple types of user interfaces (e.g. API, command line, graphical)

QIIME 2 is a complete redesign and rewrite of the QIIME 1 microbiome analysis pipeline. QIIME 2 will address many of the limitations of QIIME 1, while retaining the features that make QIIME 1 a powerful and widely-used analysis pipeline.

QIIME 2 currently supports an initial end-to-end microbiome analysis pipeline. New functionality will regularly become available through QIIME 2 plugins. You can view a list of plugins that are currently available on the QIIME 2 plugin availability page. The future plugins page lists plugins that are being developed.

Please cite: Evan Bolyen, Jai Ram Rideout, Matthew R Dillon, Nicholas A Bokulich, Christian Abnet, Gabriel A Al-Ghalith, Harriet Alexander, Eric J Alm, Manimozhiyan Arumugam, Francesco Asnicar, Yang Bai, Jordan E Bisanz, Kyle Bittinger, Asker Brejnrod, Colin J Brislawn, C Titus Brown, Benjamin J Callahan, Andrés Mauricio Caraballo-Rodríguez, John Chase, Emily Cope, Ricardo Da Silva, Pieter C Dorrestein, Gavin M Douglas, Daniel M Durall, Claire Duvallet, Christian F Edwardson, Madeleine Ernst, Mehrbod Estaki, Jennifer Fouquier, Julia M Gauglitz, Deanna L Gibson, Antonio Gonzalez, Kestrel Gorlick, Jiarong Guo, Benjamin Hillmann, Susan Holmes, Hannes Holste, Curtis Huttenhower, Gavin Huttley, Stefan Janssen, Alan K Jarmusch, Lingjing Jiang, Benjamin Kaehler, Kyo Bin Kang, Christopher R Keefe, Paul Keim, Scott T Kelley, Dan Knights, Irina Koester, Tomasz Kosciolek, Jorden Kreps, Morgan GI Langille, Joslynn Lee, Ruth Ley, Yong-Xin Liu, Erikka Loftfield, Catherine Lozupone, Massoud Maher, Clarisse Marotz, Bryan D Martin, Daniel McDonald, Lauren J McIver, Alexey V Melnik, Jessica L Metcalf, Sydney C Morgan, Jamie Morton, Ahmad Turan Naimey, Jose A Navas-Molina, Louis Felix Nothias, Stephanie B Orchanian, Talima Pearson, Samuel L Peoples, Daniel Petras, Mary Lai Preuss, Elmar Pruesse, Lasse Buur Rasmussen, Adam Rivers, Michael S Robeson, Patrick Rosenthal, Nicola Segata, Michael Shaffer, Arron Shiffer, Rashmi Sinha, Se Jin Song, John R Spear, Austin D Swafford, Luke R Thompson, Pedro J Torres, Pauline Trinh, Anupriya Tripathi, Peter J Turnbaugh, Sabah Ul-Hasan, Justin JJ van der Hooft, Fernando Vargas, Yoshiki Vázquez-Baeza, Emily Vogtmann, Max von Hippel, William Walters, Yunhu Wan, Mingxun Wang, Jonathan Warren, Kyle C Weber, Chase HD Williamson, Amy D Willis, Zhenjiang Zech Xu, Jesse R Zaneveld, Yilong Zhang, Qiyun Zhu, Rob Knight and J Gregory Caporaso: Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. (PubMed,eprint) Nature Biotechnology 37:852-857 (2019)
Registry entries: Bio.tools  OMICtools  Bioconda 
Topics: Microbial ecology
Qtltools
Tool set for molecular QTL discovery and analysis
Versions of package qtltools
ReleaseVersionArchitectures
stretch1.1+dfsg-1amd64,arm64,armel,i386,mips64el,mipsel,ppc64el
buster1.1+dfsg-3amd64,arm64,armel,armhf,mips,mips64el,mipsel,ppc64el,s390x
sid1.2+dfsg-2amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bullseye1.2+dfsg-2amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 3 users (8 upd.)*
Versions and Archs
License: DFSG free
Git

QTLtools is a tool set for molecular Quantitative Trait Loci (QTL) discovery and analysis. It allows users to go from raw sequence data to a collection of molecular QTL in a few easy-to-perform steps. QTLtools contains multiple methods to prepare the data, to discover proximal and distal molecular QTL and to finally integrate them with GWAS variants and functional annotations of the genome.
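At its simplest, testing one molecular phenotype against one variant means regressing phenotype values on genotype dosages (0, 1 or 2 copies of the alternative allele) and judging significance, for example by permuting the phenotypes. The sketch below does this for a single variant-phenotype pair with made-up data; it is not how QTLtools is implemented and omits covariates, normalization and genome-wide multiple-testing correction.

```python
import random
from math import sqrt
from typing import Sequence, Tuple

def regress(genotypes: Sequence[float], phenotypes: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope of phenotype on genotype dosage, and the Pearson r."""
    n = len(genotypes)
    mg, mp = sum(genotypes) / n, sum(phenotypes) / n
    sgg = sum((g - mg) ** 2 for g in genotypes)
    spp = sum((p - mp) ** 2 for p in phenotypes)
    sgp = sum((g - mg) * (p - mp) for g, p in zip(genotypes, phenotypes))
    if sgg == 0 or spp == 0:
        raise ValueError("monomorphic variant or constant phenotype")
    return sgp / sgg, sgp / sqrt(sgg * spp)

def permutation_pvalue(genotypes, phenotypes, n_perm: int = 1000, seed: int = 0) -> float:
    """Fraction of phenotype shufflings with |r| at least as large as observed."""
    rng = random.Random(seed)
    _, r_obs = regress(genotypes, phenotypes)
    shuffled = list(phenotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        _, r = regress(genotypes, shuffled)
        if abs(r) >= abs(r_obs):
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one avoids reporting p = 0

dosages = [0, 0, 1, 1, 2, 2]                    # hypothetical genotypes
expression = [1.0, 1.2, 2.0, 2.1, 3.1, 2.9]     # hypothetical phenotype values
print(regress(dosages, expression), permutation_pvalue(dosages, expression, seed=1))
```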

The package is enhanced by the following packages: qtltools-example
Please cite: Olivier Delaneau, Halit Ongen, Andrew A. Brown, Alexandre Fort, Nikolaos I. Panousis and Emmanouil T. Dermitzakis: A complete tool set for molecular QTL discovery and analysis. (eprint) Nature Communications (2017)
Registry entries: Bio.tools  OMICtools 
Quicktree
Neighbor-Joining algorithm for phylogenies
Versions of package quicktree
ReleaseVersionArchitectures
bullseye2.5-2amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid2.5-2amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 0 users (3 upd.)*
Versions and Archs
License: DFSG free
Git

QuickTree is an efficient implementation of the Neighbor-Joining algorithm (PMID: 3447015), capable of reconstructing phylogenies from huge alignments in time less than the age of the universe.

QuickTree accepts both distance-matrix and multiple-sequence-alignment inputs. The former should be in PHYLIP format; the latter should be in Stockholm format, the native alignment format of the Pfam database. Alignments in various formats can be converted to Stockholm format with the sreformat program, which is part of the HMMER package (hmmer.org).

The trees are written to stdout in the Newick/New Hampshire format used by PHYLIP and many other programs.
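To illustrate the Neighbor-Joining algorithm and the Newick output described above, here is a minimal Python sketch. It is a plain textbook-style implementation written for this page, not QuickTree's code; the function name, the list-of-lists input and the small example matrix are assumptions, and at least three taxa are required.

```python
from typing import List


def neighbor_joining(dist: List[List[float]], labels: List[str]) -> str:
    """Build an unrooted Neighbor-Joining tree and return it as Newick.

    Illustrative sketch only (not QuickTree's implementation).
    dist is a symmetric distance matrix (list of lists) for >= 3 taxa.
    """
    d = [list(map(float, row)) for row in dist]  # work on a copy
    nodes = list(labels)  # Newick fragment for each current cluster
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]  # net divergence of each cluster
        # Choose the pair minimising Q(i, j) = (n - 2) * d(i, j) - r_i - r_j.
        bi, bj = min(
            ((i, j) for i in range(n) for j in range(i + 1, n)),
            key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]],
        )
        # Branch lengths from the new internal node to the joined pair.
        li = 0.5 * d[bi][bj] + (r[bi] - r[bj]) / (2 * (n - 2))
        lj = d[bi][bj] - li
        joined = f"({nodes[bi]}:{li:g},{nodes[bj]}:{lj:g})"
        keep = [k for k in range(n) if k not in (bi, bj)]
        # Distance from the new node to each remaining cluster.
        new_row = [0.5 * (d[bi][k] + d[bj][k] - d[bi][bj]) for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[idx]] for idx, a in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    # Three clusters left: attach them to one central node.
    x, y, z = nodes
    d01, d02, d12 = d[0][1], d[0][2], d[1][2]
    lx = (d01 + d02 - d12) / 2
    ly = (d01 + d12 - d02) / 2
    lz = (d02 + d12 - d01) / 2
    return f"({x}:{lx:g},{y}:{ly:g},{z}:{lz:g});"


labels = ["a", "b", "c", "d", "e"]
matrix = [[0, 5, 9, 9, 8], [5, 0, 10, 10, 9], [9, 10, 0, 8, 7],
          [9, 10, 8, 0, 3], [8, 9, 7, 3, 0]]
print(neighbor_joining(matrix, labels))
```

Each iteration recomputes the row sums, so this sketch costs O(n³) overall; speeding up exactly this loop on large inputs is what a dedicated tool such as QuickTree is for.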

Quorum
QUality Optimized Reads of genomic sequences
Versions of package quorum
ReleaseVersionArchitectures
sid1.1.1-2amd64
bullseye1.1.1-2amd64
buster1.1.1-2amd64,arm64,mips64el,ppc64el
Popcon: 2 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

QuorUM produces trimmed and error-corrected reads that result in assemblies with longer contigs and fewer errors. On several metrics, QuorUM outperforms other published error correctors. It is implemented efficiently for current multi-core computing architectures and is suitable for large data sets (1 billion bases checked and corrected per day per core). The third-party assembler SOAPdenovo benefits significantly from QuorUM's error-corrected reads: for the data sets investigated, they yield a 1.1- to 4-fold improvement in N50 contig size compared with using the original reads with SOAPdenovo.
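The N50 contig size quoted above is the length L such that contigs of length at least L together contain at least half of all assembled bases. A minimal Python sketch of that definition follows. The function name and the example contig lengths are hypothetical and are not part of QuorUM.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of an assembly given its contig lengths (hypothetical helper)."""
    lengths = sorted(contig_lengths, reverse=True)  # longest contigs first
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:  # first contig at which half the bases are covered
            return length
    raise ValueError("no contigs given")


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

An improvement "by a factor of 1.1 to 4" then simply means that the ratio n50(corrected) / n50(original) fell in that range for the data sets studied.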

Please cite: Guillaume Marçais, James A. Yorke and Aleksey Zimin: QuorUM: An Error Corrector for Illumina Reads. (PubMed,eprint) PLoS One 10(6):e0130821 (2015)
Registry entries: SciCrunch  OMICtools 
R-bioc-annotate
BioConductor annotation for microarrays
Versions of package r-bioc-annotate
ReleaseVersionArchitectures
sid1.66.0+dfsg-1all
buster1.60.0+dfsg-1all
stretch1.52.1+dfsg-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
bullseye1.66.0+dfsg-1all
Popcon: 27 users (18 upd.)*
Versions and Archs
License: DFSG free
Git

This BioConductor module provides methods for annotating microarrays.

Registry entries: Bioconda 
R-bioc-biostrings
GNU R string objects representing biological sequences
Versions of package r-bioc-biostrings
ReleaseVersionArchitectures
buster2.50.2-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
stretch2.42.1-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
jessie2.32.1-1amd64,armel,armhf,i386
sid2.56.0-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bullseye2.56.0-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 29 users (15 upd.)*
Versions and Archs
License: DFSG free
Git

Memory-efficient string containers, string matching algorithms, and other utilities for fast manipulation of large biological sequences or sets of sequences.

Registry entries: Bio.tools  OMICtools  Bioconda 
R-bioc-bitseq
transcript expression inference and analysis for RNA-seq data
Versions of package r-bioc-bitseq
ReleaseVersionArchitectures
sid1.32.0+dfsg-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bullseye1.32.0+dfsg-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster1.26.1+dfsg-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
Popcon: 4 users (5 upd.)*
Versions and Archs
License: DFSG free
Git

The BitSeq package is targeted at transcript expression analysis and differential expression analysis of RNA-seq data in a two-stage process. In the first stage, it uses Bayesian inference to infer the expression of individual transcripts from individual RNA-seq experiments. The second stage performs differential expression analysis on these transcript expression estimates: given estimates from replicates of multiple conditions, a log-normal model of the estimates is used to infer the mean transcript expression of each condition and to rank transcripts by the likelihood of differential expression.

Please cite: Peter Glaus, Antti Honkela and Magnus Rattray: Identifying differentially expressed transcripts from RNA-seq data with biological variation. (PubMed,eprint) Bioinformatics 28(13):1721–1728 (2012)
Registry entries: Bio.tools  SciCrunch  OMICtools  Bioconda 
R-bioc-cner
CNE Detection and Visualization
Versions of package r-bioc-cner
ReleaseVersionArchitectures
buster1.18.1+dfsg-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
bullseye1.24.0+dfsg-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid1.24.0+dfsg-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 4 users (4 upd.)*
Versions and Archs
License: DFSG free
Git

Large-scale identification and advanced visualization of sets of conserved noncoding elements.

Registry entries: Bio.tools  OMICtools  Bioconda 
R-bioc-cummerbund
tool for analysis of Cufflinks RNA-Seq output
Versions of package r-bioc-cummerbund
ReleaseVersionArchitectures
stretch2.16.0-2all
jessie2.6.1-1all
wheezy1.2.0-1all
buster2.24.0-2all
bullseye2.30.0-1all
sid2.30.0-1all
Popcon: 8 users (4 upd.)*
Versions and Archs
License: DFSG free
Git

Allows for persistent storage, access, exploration, and manipulation of Cufflinks high-throughput sequencing data. In addition, provides numerous plotting functions for commonly used visualizations.

Please cite: L. Goff and C. Trapnell: cummeRbund: Analysis, exploration, manipulation, and visualization of Cufflinks high-throughput sequencing data (2012)
Registry entries: Bio.tools  SciCrunch  OMICtools  Bioconda 
R-bioc-deseq2
R package for RNA-Seq Differential Expression Analysis
Versions of package r-bioc-deseq2
ReleaseVersionArchitectures
sid1.28.1+dfsg-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster1.22.2+dfsg-1amd64,arm64,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
stretch1.14.1-1amd64,arm64,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
bullseye1.28.1+dfsg-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 11 users (9 upd.)*
Versions and Archs
License: DFSG free
Git

Differential gene expression analysis based on the negative binomial distribution. Estimate variance-mean dependence in count data from high-throughput sequencing assays and test for differential expression based on a model using the negative binomial distribution.
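The negative binomial model mentioned above ties the variance of a gene's counts to its mean, Var(K) = μ + αμ², where α is the dispersion. The sketch below shows that variance-mean relation through a crude method-of-moments estimate of α for a single gene. This is only an illustration: DESeq2 itself fits gene-wise dispersions by likelihood and shrinks them toward a fitted trend. The function name and the counts are hypothetical.

```python
import statistics
from typing import Sequence


def nb_dispersion_mom(counts: Sequence[float]) -> float:
    """Method-of-moments dispersion for one gene (illustrative, not DESeq2's estimator).

    Under Var = mu + alpha * mu**2, alpha = (s^2 - mu) / mu^2. Estimates below
    zero (less variance than a Poisson model) are clipped to 0.
    """
    mu = statistics.fmean(counts)
    var = statistics.variance(counts)  # sample variance, n - 1 denominator
    if mu == 0:
        return 0.0
    return max(0.0, (var - mu) / mu ** 2)


print(nb_dispersion_mom([10, 20, 30, 40]))  # overdispersed replicate counts
print(nb_dispersion_mom([5, 5, 5, 5]))      # no extra-Poisson variation
```

An α of zero reduces the model to a Poisson distribution. Larger α values describe the extra biological variability between replicates that the negative binomial model accounts for.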

Please cite: Michael I Love, Wolfgang Huber and Simon Anders: Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. (eprint) Genome Biol 15(12) (2014)
Registry entries: Bio.tools  SciCrunch  OMICtools  Bioconda 
R-bioc-dnacopy
R package: DNA copy number data analysis
Versions of package r-bioc-dnacopy
ReleaseVersionArchitectures
bullseye1.62.0-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster1.56.0-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
sid1.62.0-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
stretch1.48.0-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
Popcon: 11 users (10 upd.)*
Versions and Archs
License: DFSG free
Git

Implements the circular binary segmentation (CBS) algorithm to segment DNA copy number data and identify genomic regions with abnormal copy number.

This package is for analyzing array DNA copy number data, which is usually (but not always) called array Comparative Genomic Hybridization (array CGH) data. It implements a methodology for finding change-points in these data, which are points after which the (log) test-over-reference ratio has changed location. The model assumes that change-points correspond to positions where the underlying DNA copy number has changed, so change-points can be used to identify regions of gained and lost copy number. A function for making relevant plots of these data is also provided.
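Circular binary segmentation treats the probe series as a circle and searches for the arc whose mean differs most from the rest. The Python sketch below shows only that single search step, using a CBS-style t statistic. It makes several assumptions: the noise level is a crude global standard deviation, and the data must not be constant. A full CBS run would also test significance by permutation and recurse on the resulting segments, and both steps are omitted here. The function name and data are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def best_arc(x: Sequence[float]) -> Tuple[int, int, float]:
    """One CBS-style search step (illustrative, not the package's code).

    For every interval (i, j] with k = j - i points (its complement is the
    rest of the circle), compute
        T_ij = (mean_in - mean_out) / (s * sqrt(1/k + 1/(n - k)))
    and return (i, j, T) for the largest |T|.
    """
    n = len(x)
    s = statistics.stdev(x)  # assumption: crude global noise estimate
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)
    total = prefix[-1]
    best = (0, 0, 0.0)
    for i in range(n):
        for j in range(i + 1, n + 1):
            k = j - i
            if k == n:  # the whole sequence has no "outside"
                continue
            inside = prefix[j] - prefix[i]
            t = (inside / k - (total - inside) / (n - k)) / (
                s * math.sqrt(1 / k + 1 / (n - k)))
            if abs(t) > abs(best[2]):
                best = (i, j, t)
    return best


log_ratios = [0.0] * 5 + [2.0] * 4 + [0.0] * 5  # a gained region in the middle
print(best_arc(log_ratios))
```

Because the complement of an interval has the same |T| with the opposite sign, searching plain intervals is enough to cover every arc on the circle.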

Registry entries: Bio.tools  SciCrunch  OMICtools  Bioconda 
R-bioc-ebseq
R package for RNA-Seq Differential Expression Analysis
Versions of package r-bioc-ebseq
ReleaseVersionArchitectures
stretch1.14.0-1all
bullseye1.28.0-1all
buster1.22.1-2all
sid1.28.0-1all
Popcon: 6 users (7 upd.)*
Versions and Archs
License: DFSG free
Git

r-bioc-ebseq is an R package for identifying genes and isoforms differentially expressed (DE) across two or more biological conditions in an RNA-seq experiment.

Please cite: Ning Leng, John A. Dawson, James A. Thomson, Victor Ruotti, Anna I. Rissman, Bart M. G. Smits, Jill D. Haag, Michael N. Gould, Ron M. Stewart and Christina Kendziorski: EBSeq: an empirical Bayes hierarchical model for inference in RNA-seq experiments. (eprint) Bioinformatics 29(8):1035-1043 (2013)
Registry entries: Bio.tools  SciCrunch  OMICtools  Bioconda 
R-bioc-edger
Empirical analysis of digital gene expression data in R
Versions of package r-bioc-edger
ReleaseVersionArchitectures
sid3.30.3+dfsg-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
jessie3.8.2+dfsg-1amd64,armel,armhf,i386
wheezy2.6.1~dfsg-1all
stretch3.14.0+dfsg-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
bullseye3.30.3+dfsg-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 4 users (5 upd.)*
Versions and Archs
License: DFSG free
Git

Bioconductor package for differential expression analysis of whole transcriptome sequencing (RNA-seq) and digital gene expression profiles with biological replication. It uses empirical Bayes estimation and exact tests based on the negative binomial distribution. It is also useful for differential signal analysis with other types of genome-scale count data.

Please cite: Mark D. Robinson, Davis J. McCarthy and Gordon K. Smyth: edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. (PubMed,eprint) Bioinformatics 26(1):139-140 (2010)
Registry entries: Bio.tools  SciCrunch  OMICtools  Bioconda 
R-bioc-genefilter
methods for filtering genes from microarray experiments
Versions of package r-bioc-genefilter
ReleaseVersionArchitectures
sid1.70.0-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster1.64.0-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
bullseye1.70.0-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
stretch1.56.0-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
Popcon: 28 users (17 upd.)*
Versions and Archs
License: DFSG free
Git

This BioConductor module provides methods for filtering genes from microarray experiments. It contains some basic functions for filtering genes.

Registry entries: Bio.tools  OMICtools  Bioconda 
R-bioc-geneplotter
R package of functions for plotting genomic data
Versions of package r-bioc-geneplotter
ReleaseVersionArchitectures
bullseye1.66.0-1all
stretch1.52.0-2all
sid1.66.0-1all
buster1.60.0-1all
Popcon: 11 users (9 upd.)*
Versions and Archs
License: DFSG free
Git

geneplotter contains plotting functions for microarrays.

Registry entries: Bio.tools  OMICtools  Bioconda 
R-bioc-geoquery
Get data from NCBI Gene Expression Omnibus (GEO)
Versions of package r-bioc-geoquery
ReleaseVersionArchitectures
bullseye2.56.0+dfsg-1all
sid2.56.0+dfsg-1all
Popcon: 1 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

The NCBI Gene Expression Omnibus (GEO) is a public repository of microarray data. Given the rich and varied nature of this resource, it is only natural to want to apply BioConductor tools to these data. GEOquery is the bridge between GEO and BioConductor.

Please cite: Sean Davis and Paul Meltzer: GEOquery: a bridge between the Gene Expression Omnibus (GEO) and BioConductor. Bioinformatics 23(14):1846-1847 (2007)
Registry entries: Bio.tools  OMICtools  Bioconda 
R-bioc-gviz
Plotting data and annotation information along genomic coordinates
Versions of package r-bioc-gviz
ReleaseVersionArchitectures
stretch1.18.1-1all
jessie1.8.4-1all
sid1.32.0+dfsg-2all
buster1.26.4-1all
bullseye1.32.0+dfsg-2all
Popcon: 10 users (5 upd.)*
Versions and Archs
License: DFSG free
Git

Genomic data analyses require integrated visualization of known genomic information and new experimental data. Gviz uses the biomaRt and rtracklayer packages to perform live annotation queries to Ensembl and UCSC and translates the results into, e.g., gene/transcript structures in viewports of the grid graphics package. This results in genomic information plotted together with your data.

Please cite: Michael Lawrence, Robert Gentleman and Vincent Carey: rtracklayer: an R package for interfacing with genome browsers. (PubMed,eprint) Bioinformatics 25(14):1841-1842 (2009)
Registry entries: Bio.tools  OMICtools  Bioconda 
R-bioc-htsfilter
GNU R filter replicated high-throughput transcriptome sequencing data
Versions of package r-bioc-htsfilter
ReleaseVersionArchitectures
sid1.28.0-1all
bullseye1.28.0-1all
Popcon: 1 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

This package implements a filtering procedure for replicated transcriptome sequencing data based on a global Jaccard similarity index in order to identify genes with low, constant levels of expression across one or more experimental conditions.
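The Jaccard-based idea above can be sketched as follows. For each candidate threshold s, a gene counts as "on" in a replicate when its normalized count exceeds s. The Jaccard index for a pair of replicates is the number of genes on in both divided by the number on in either, and the threshold giving the most similar replicates is kept. This is a simplified illustration under assumed inputs (genes × replicates lists, a hand-picked threshold grid). The function names are hypothetical, and HTSFilter's exact procedure may differ.

```python
from itertools import combinations
from typing import List, Sequence


def mean_pairwise_jaccard(counts: List[Sequence[float]], s: float) -> float:
    """counts[g][r] is the normalized count of gene g in replicate r (hypothetical layout)."""
    n_rep = len(counts[0])
    scores = []
    for a, b in combinations(range(n_rep), 2):
        both = sum(1 for g in counts if g[a] > s and g[b] > s)
        either = sum(1 for g in counts if g[a] > s or g[b] > s)
        if either:
            scores.append(both / either)
    return sum(scores) / len(scores) if scores else 0.0


def choose_threshold(counts: List[Sequence[float]], thresholds: Sequence[float]) -> float:
    """Pick the threshold at which replicates agree best on which genes are expressed."""
    return max(thresholds, key=lambda s: mean_pairwise_jaccard(counts, s))


genes = [[0, 3], [3, 0], [1, 4], [4, 1], [50, 60], [40, 45]]  # two replicates
print(choose_threshold(genes, [0, 2, 5]))
```

Genes at or below the chosen threshold would then be the candidates for filtering as having low, constant expression.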

R-bioc-impute
Imputation for microarray data
Versions of package r-bioc-impute
ReleaseVersionArchitectures
buster1.56.0-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
sid1.62.0-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bullseye1.62.0-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 10 users (8 upd.)*
Versions and Archs
License: DFSG free
Git

R package providing a function to impute missing values in microarray data (currently KNN only).
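K-nearest-neighbour (KNN) imputation fills in a missing value for a gene by averaging that column over the k most similar genes, with similarity measured only on the columns that both genes have observed. The minimal sketch below is an assumption-laden illustration. It uses a root-mean-square distance over shared columns, and the function name and data are hypothetical. The package's own implementation differs in details.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]


def knn_impute(data: List[Row], k: int = 2) -> List[List[float]]:
    """Replace None entries of each row (gene) by the mean of that column over
    the k nearest rows that observe it (illustrative sketch)."""
    def dist(a: Row, b: Row) -> float:
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    result = []
    for i, row in enumerate(data):
        filled = []
        for j, value in enumerate(row):
            if value is not None:
                filled.append(value)
                continue
            # Candidate neighbours must have column j observed.
            candidates = sorted(
                (dist(row, other), idx) for idx, other in enumerate(data)
                if idx != i and other[j] is not None)
            neighbours = [data[idx][j] for dd, idx in candidates[:k] if dd < math.inf]
            if not neighbours:
                raise ValueError(f"cannot impute row {i}, column {j}")
            filled.append(sum(neighbours) / len(neighbours))
        result.append(filled)
    return result


genes = [[1.0, 2.0, None], [1.1, 2.1, 3.0], [0.9, 1.9, 3.2], [10.0, 20.0, 30.0]]
print(knn_impute(genes, k=2)[0])  # missing value taken from the two similar genes
```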

Registry entries: Bio.tools  OMICtools  Bioconda 
R-bioc-limma
linear models for microarray data
Versions of package r-bioc-limma
ReleaseVersionArchitectures
bullseye3.44.2+dfsg-2amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid3.44.3+dfsg-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster3.38.3+dfsg-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
jessie3.22.1+dfsg-1amd64,armel,armhf,i386
wheezy3.12.0~dfsg-1amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc
stretch3.30.8+dfsg-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
Popcon: 17 users (15 upd.)*
Versions and Archs
License: DFSG free
Git

Microarrays are microscopic plates with carefully arranged short DNA strands and/or chemically prepared surfaces to which other DNA preferentially binds. The amount of DNA binding at different locations of these chips, typically determined by a fluorescent dye, is to be interpreted. The technology is typically used with DNA that is derived from RNA, i.e. to determine the activity of a gene and/or its splice variants. But the technology is also used to determine sequence variations in genomic DNA.

This Bioconductor package supports the analysis of gene expression microarray data, especially the use of linear models for analysing designed experiments and the assessment of differential expression. The package includes pre-processing capabilities for two-colour spotted arrays. The differential expression methods apply to all array platforms and treat Affymetrix, single channel and two channel experiments in a unified way.
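limma's assessment of differential expression relies on a moderated t-statistic. Each gene's residual variance s² (with d degrees of freedom) is shrunk toward a prior value s₀² carrying d₀ degrees of freedom: s̃² = (d₀s₀² + ds²)/(d₀ + d). The statistic is then t̃ = β̂ / (s̃√v). The sketch below shows this for the simplest case, one group of log-ratios (v = 1/n). In limma the prior d₀ and s₀² are estimated from all genes. Here they are passed in by hand as hypothetical values, and the function name is illustrative.

```python
import math
import statistics
from typing import Sequence


def moderated_t(log_ratios: Sequence[float], d0: float, s0_sq: float) -> float:
    """One-sample moderated t for a single gene (illustrative sketch).

    beta = mean log-ratio, s^2 = sample variance with d = n - 1 df,
    s_tilde^2 = (d0 * s0^2 + d * s^2) / (d0 + d), t = beta / sqrt(s_tilde^2 / n).
    """
    n = len(log_ratios)
    beta = statistics.fmean(log_ratios)
    s_sq = statistics.variance(log_ratios)
    d = n - 1
    s_tilde_sq = (d0 * s0_sq + d * s_sq) / (d0 + d)
    return beta / math.sqrt(s_tilde_sq / n)


print(moderated_t([1.0, 2.0, 3.0], d0=0, s0_sq=0.25))  # d0 = 0: ordinary t
print(moderated_t([1.0, 2.0, 3.0], d0=4, s0_sq=0.25))  # variance shrunk toward 0.25
```

With d₀ = 0 the ordinary t-statistic is recovered. A larger d₀ pulls noisy per-gene variances toward the common value, which stabilizes inference when there are few arrays.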

Please cite: Gordon K. Smyth: Limma: linear models for microarray data. (eprint) :397-420 (2005)
Registry entries: Bio.tools  SciCrunch  OMICtools  Bioconda 
R-bioc-mergeomics
Integrative network analysis of omics data
Versions of package r-bioc-mergeomics
ReleaseVersionArchitectures
stretch1.2.0-1all
buster1.10.0-1all
sid1.16.0-1all
bullseye1.16.0-1all
Popcon: 6 users (7 upd.)*
Versions and Archs
License: DFSG free
Git

The Mergeomics pipeline serves as a flexible framework for integrating multidimensional omics-disease associations, functional genomics, canonical pathways and gene-gene interaction networks to generate mechanistic hypotheses. It includes two main parts: 1) Marker set enrichment analysis (MSEA); 2) Weighted Key Driver Analysis (wKDA).

Please cite: Le Shu, Yuqi Zhao, Zeyneb Kurt, Sean Geoffrey Byars, Taru Tukiainen, Johannes Kettunen, Luz D. Orozco, Matteo Pellegrini, Aldons J. Lusis, Samuli Ripatti, Bin Zhang, Michael Inouye, Ville-Petteri Mäkinen and Xia Yang: Mergeomics: multidimensional data integration to identify pathogenic perturbations to biological systems. (eprint) BMC Genomics (2016)
Registry entries: Bio.tools  OMICtools 
R-bioc-metagenomeseq
GNU R statistical analysis for sparse high-throughput sequencing
Versions of package r-bioc-metagenomeseq
ReleaseVersionArchitectures
buster1.24.1-1all
sid1.30.0-1all
bullseye1.30.0-1all
stretch1.16.0-2all
Popcon: 6 users (3 upd.)*
Versions and Archs
License: DFSG free
Git

MetagenomeSeq is designed to determine features (be they Operational Taxonomic Units (OTUs), species, etc.) that are differentially abundant between two or more groups of multiple samples. It addresses the effects of both normalization and under-sampling of microbial communities on disease association detection and on the testing of feature correlations.

Registry entries: Bio.tools  OMICtools  Bioconda 
R-bioc-mofa
Multi-Omics Factor Analysis (MOFA)
Versions of package r-bioc-mofa
ReleaseVersionArchitectures
sid1.4.0+dfsg-1all
bullseye1.4.0+dfsg-1all
Popcon: 1 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

Multi-Omics Factor Analysis: an unsupervised framework for the integration of multi-omics data sets.

Please cite: Ricard Argelaguet, Britta Velten, Damien Arnol, Sascha Dietrich, Thorsten Zenz, John C Marioni, Florian Buettner, Wolfgang Huber and Oliver Stegle: Multi-Omics Factor Analysis - a framework for unsupervised integration of multi-omics data sets. Mol Syst Biol 14:e8124 (2018)
Registry entries: OMICtools  Bioconda 
R-bioc-multiassayexperiment
Software for integrating multi-omics experiments in BioConductor
Versions of package r-bioc-multiassayexperiment
ReleaseVersionArchitectures
bullseye1.14.0+dfsg-1all
sid1.14.0+dfsg-1all
Popcon: 1 users (4 upd.)*
Versions and Archs
License: DFSG free
Git

MultiAssayExperiment harmonizes data management of multiple assays performed on an overlapping set of specimens. It provides a familiar Bioconductor user experience by extending concepts from SummarizedExperiment, supporting an open-ended mix of standard data classes for individual assays, and allowing subsetting by genomic ranges or rownames.

Registry entries: Bio.tools  OMICtools  Bioconda 
R-bioc-mutationalpatterns
GNU R comprehensive genome-wide analysis of mutational processes
Versions of package r-bioc-mutationalpatterns
ReleaseVersionArchitectures
sid2.0.0-2all
bullseye2.0.0-2all
Popcon: 1 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

This BioConductor package provides an extensive toolset for the characterization and visualization of a wide range of mutational patterns in base substitution catalogs.

R-bioc-pcamethods
BioConductor collection of PCA methods
Versions of package r-bioc-pcamethods
ReleaseVersionArchitectures
sid1.80.0-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster1.74.0-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
bullseye1.80.0-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 4 users (7 upd.)*
Versions and Archs
License: DFSG free
Git

Provides Bayesian PCA, Probabilistic PCA, Nipals PCA, Inverse Non-Linear PCA and the conventional SVD PCA. A cluster based method for missing value estimation is included for comparison. BPCA, PPCA and NipalsPCA may be used to perform PCA on incomplete data as well as for accurate missing value estimation. A set of methods for printing and plotting the results is also provided. All PCA methods make use of the same data structure (pcaRes) to provide a common interface to the PCA results. Initiated at the Max-Planck Institute for Molecular Plant Physiology, Golm, Germany.
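For reference, the "conventional SVD PCA" listed above can be sketched in a few lines with NumPy. The matrix is column-centred, decomposed with `numpy.linalg.svd`, and the singular values give the fraction of variance explained by each component. This sketch assumes complete data, which is exactly the limitation that BPCA, PPCA and NIPALS address in the package. The function name and the toy matrix are hypothetical.

```python
import numpy as np


def svd_pca(x: np.ndarray, n_components: int = 2):
    """Return (scores, loadings, explained variance ratio) for the rows of x.

    Illustrative sketch: rows are samples, columns are variables, and no
    missing values are allowed.
    """
    centred = x - x.mean(axis=0)  # PCA requires column-centred data
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # sample coordinates
    loadings = vt[:n_components]  # principal axes
    explained = s ** 2 / np.sum(s ** 2)
    return scores, loadings, explained[:n_components]


data = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]])  # points on a line
scores, loadings, ratio = svd_pca(data)
print(ratio)
```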

Please cite: Wolfram Stacklies, Henning Redestig, Matthias Scholz, Dirk Walther and Joachim Selbig: pcaMethods — a bioconductor package providing PCA methods for incomplete data. (PubMed,eprint) Bioinformatics 23(9):1164–1167 (2007)
Registry entries: Bio.tools  OMICtools  Bioconda 
R-bioc-phyloseq
GNU R handling and analysis of high-throughput microbiome census data
Versions of package r-bioc-phyloseq
ReleaseVersionArchitectures
buster1.26.1+dfsg-1all
sid1.32.0+dfsg-1all
stretch1.19.1-2all
bullseye1.32.0+dfsg-1all
Popcon: 9 users (10 upd.)*
Versions and Archs
License: DFSG free
Git

The Bioconductor module phyloseq provides a set of classes and tools to facilitate the import, storage, analysis, and graphical display of microbiome census data.

Please cite: Paul J. McMurdie and Susan Holmes: phyloseq: An R package for reproducible interactive analysis and graphics of microbiome census data. PLoS ONE 8(4):e61217 (2013)
Registry entries: Bio.tools  SciCrunch  OMICtools  Bioconda 
R-bioc-rtracklayer
GNU R interface to genome browsers and their annotation tracks
Versions of package r-bioc-rtracklayer
ReleaseVersionArchitectures
buster1.42.1-2amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
jessie1.24.2-1amd64,armel,armhf,i386
sid1.48.0-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bullseye1.48.0-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
stretch1.34.1-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
Popcon: 16 users (7 upd.)*
Versions and Archs
License: DFSG free
Git

Extensible framework for interacting with multiple genome browsers (currently UCSC built-in) and manipulating annotation tracks in various formats (currently GFF, BED, bedGraph, BED15, WIG, BigWig and 2bit built-in). The user may export/import tracks to/from the supported browsers, as well as query and modify the browser state, such as the current viewport.

Please cite: Michael Lawrence, Robert Gentleman and Vincent Carey: rtracklayer: an R package for interfacing with genome browsers. (PubMed,eprint) Bioinformatics 25(14):1841-1842 (2009)
Registry entries: Bio.tools  OMICtools  Bioconda 
R-bioc-tfbstools
GNU R Transcription Factor Binding Site (TFBS) Analysis
Versions of package r-bioc-tfbstools
ReleaseVersionArchitectures
buster1.20.0+dfsg-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
sid1.26.0+dfsg-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bullseye1.26.0+dfsg-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 4 users (4 upd.)*
Versions and Archs
License: DFSG free
Git

TFBSTools is a package for the analysis and manipulation of transcription factor binding sites. It includes conversion between Position Frequency Matrix (PFM), Position Weight Matrix (PWM) and Information Content Matrix (ICM) representations. It can also scan sequences and alignments for putative TFBS, query the JASPAR database, and provides a wrapper for de novo motif discovery software.
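The matrix conversions mentioned above follow standard formulas. Each PFM column is turned into probabilities, with a pseudocount distributed according to the background. The PWM entry is the log2 ratio of that probability to the background. The ICM entry is the probability scaled by the column's information content in bits. The sketch below uses a uniform background and a pseudocount that are both assumptions, and the function names are hypothetical. TFBSTools' own defaults may differ.

```python
import math
from typing import Dict, List, Optional

Matrix = Dict[str, List[float]]
BASES = "ACGT"


def to_probabilities(pfm: Matrix, bg: Dict[str, float], pseudocount: float) -> Matrix:
    """Column-wise probabilities, pseudocounts split according to the background."""
    ncol = len(pfm["A"])
    out: Matrix = {b: [] for b in BASES}
    for i in range(ncol):
        total = sum(pfm[b][i] for b in BASES)
        for b in BASES:
            out[b].append((pfm[b][i] + pseudocount * bg[b]) / (total + pseudocount))
    return out


def to_pwm(pfm: Matrix, bg: Optional[Dict[str, float]] = None,
           pseudocount: float = 1.0) -> Matrix:
    """Log-odds (base 2) position weight matrix; needs pseudocount > 0 if any count is 0."""
    bg = bg or {b: 0.25 for b in BASES}
    p = to_probabilities(pfm, bg, pseudocount)
    return {b: [math.log2(v / bg[b]) for v in p[b]] for b in BASES}


def to_icm(pfm: Matrix, bg: Optional[Dict[str, float]] = None,
           pseudocount: float = 1.0) -> Matrix:
    """Information content matrix: probabilities scaled by column information (bits)."""
    bg = bg or {b: 0.25 for b in BASES}
    p = to_probabilities(pfm, bg, pseudocount)
    ncol = len(pfm["A"])
    ic = [sum(p[b][i] * math.log2(p[b][i] / bg[b]) for b in BASES if p[b][i] > 0)
          for i in range(ncol)]
    return {b: [p[b][i] * ic[i] for i in range(ncol)] for b in BASES}


pfm = {"A": [10, 0], "C": [0, 5], "G": [0, 5], "T": [0, 0]}
print(to_icm(pfm, pseudocount=0.0))
print(to_pwm(pfm))
```

In the example, a fully conserved column carries 2 bits of information, while a column split evenly between two bases carries 1 bit.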

Please cite: Ge Tan and Boris Lenhard: TFBSTools: an R/bioconductor package for transcription factor binding site analysis. (PubMed,eprint) Bioinformatics 32(10):1555–1556 (2016)
Registry entries: Bio.tools  OMICtools  Bioconda 
R-cran-adegenet
GNU R exploratory analysis of genetic and genomic data
Versions of package r-cran-adegenet
ReleaseVersionArchitectures
sid2.1.3-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster2.1.1-2amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
stretch2.0.1-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
bullseye2.1.3-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 13 users (9 upd.)*
Versions and Archs
License: DFSG free
Git

Toolset for the exploration of genetic and genomic data. Adegenet provides formal (S4) classes for storing and handling various genetic data, including genetic markers with varying ploidy and hierarchical population structure ('genind' class), allele counts by population ('genpop'), and genome-wide SNP data ('genlight'). It also implements original multivariate methods (DAPC, sPCA), graphics, statistical tests, simulation tools, distance and similarity measures, and several spatial methods. A range of both empirical and simulated datasets is also provided to illustrate various methods.

Please cite: Thibaut Jombart: adegenet: a R package for the multivariate analysis of genetic markers. (PubMed,eprint) Bioinformatics 24(11):1403-5 (2008)
Registry entries: SciCrunch  OMICtools 
R-cran-adephylo
GNU R exploratory analyses for the phylogenetic comparative method
Versions of package r-cran-adephylo
ReleaseVersionArchitectures
stretch1.1-10-2amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
sid1.1-11-3amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster1.1-11-3amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
bullseye1.1-11-3amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 11 users (7 upd.)*
Versions and Archs
License: DFSG free
Git

This GNU R package provides multivariate tools to analyze comparative data, i.e. a phylogeny and some traits measured for each taxon.

Please cite: Thibaut Jombart, François Balloux and Stéphane Dray: adephylo: new tools for investigating the phylogenetic signal in biological traits. (PubMed,eprint) Bioinformatics 26(15):1907-1909 (2010)
Registry entries: OMICtools  Bioconda 
R-cran-alakazam
Immunoglobulin Clonal Lineage and Diversity Analysis
Versions of package r-cran-alakazam
ReleaseVersionArchitectures
sid1.0.1-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bullseye1.0.1-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster0.2.11-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
Popcon: 6 users (6 upd.)*
Versions and Archs
License: DFSG free
Git

Alakazam is part of the Immcantation analysis framework for Adaptive Immune Receptor Repertoire sequencing (AIRR-seq) and provides a set of tools to investigate lymphocyte receptor clonal lineages, diversity, gene usage, and other repertoire level properties, with a focus on high-throughput immunoglobulin (Ig) sequencing.

Alakazam serves five main purposes:

  • Providing core functionality for other R packages in the Immcantation framework. This includes common tasks such as file I/O, basic DNA sequence manipulation, and interacting with V(D)J segment and gene annotations.
  • Providing an R interface for interacting with the output of the pRESTO and Change-O tool suites.
  • Performing lineage reconstruction on clonal populations of Ig sequences and analyzing the topology of the resultant lineage trees.
  • Performing clonal abundance and diversity analysis on lymphocyte repertoires.
  • Performing physicochemical property analyses of lymphocyte receptor sequences.
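Repertoire diversity of the kind listed above is commonly summarized with Hill numbers. For clone frequencies pᵢ, D_q = (Σ pᵢ^q)^(1/(1−q)), and q = 1 is defined as the limit exp(Shannon entropy). A small q emphasizes rare clones and a large q emphasizes dominant ones. The minimal sketch below implements only the formula. Alakazam's own diversity functions add resampling and other steps that are not shown, and the function name and abundances are hypothetical.

```python
import math
from typing import Sequence


def hill_diversity(abundances: Sequence[float], q: float) -> float:
    """Hill number of order q for clone abundances (illustrative sketch)."""
    total = sum(abundances)
    p = [a / total for a in abundances if a > 0]  # clone frequencies
    if q == 1:  # limit case: exponential of Shannon entropy
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1 / (1 - q))


even = [5, 5, 5, 5]     # four equally sized clones
skewed = [97, 1, 1, 1]  # one dominant clone
for q in (0, 1, 2):
    print(q, hill_diversity(even, q), hill_diversity(skewed, q))
```

For the even repertoire every order gives 4 effective clones. For the skewed one, diversity drops steeply as q increases, which is the kind of curve a diversity-versus-q plot visualizes.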
Please cite: Namita T. Gupta, Jason A. Vander Heiden, Mohamed Uduman, Daniel Gadala-Maria, Gur Yaari and Steven H. Kleinstein: Change-O: a toolkit for analyzing large-scale B cell immunoglobulin repertoire sequencing data. (eprint) 31:3356–3358 (2017)
Registry entries: OMICtools 
R-cran-ape
GNU R package for Analyses of Phylogenetics and Evolution
Versions of package r-cran-ape
ReleaseVersionArchitectures
stretch4.0-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
jessie3.1-4-1amd64,armel,armhf,i386
buster5.2-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
bullseye5.4-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid5.4-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 30 users (29 upd.)*
Versions and Archs
License: DFSG free
Git

This package provides functions for reading, writing, plotting, and manipulating phylogenetic trees, analyses of comparative data in a phylogenetic framework, ancestral character analyses, analyses of diversification and macroevolution, computing distances from DNA sequences, reading and writing nucleotide sequences as well as importing from BioConductor, and several tools such as Mantel's test, generalized skyline plots, graphical exploration of phylogenetic data (alex, trex, kronoviz), estimation of absolute evolutionary rates and clock-like trees using mean path lengths and penalized likelihood, dating trees with non-contemporaneous sequences, translating DNA into AA sequences, and assessing sequence alignments. Phylogeny estimation can be done with the NJ, BIONJ, ME, MVR, SDM, and triangle methods, and several methods handling incomplete distance matrices (NJ, BIONJ, MVR*, and the corresponding triangle method). Some functions call external applications (PhyML, Clustal, T-Coffee, Muscle) whose results are returned into R.

Please cite: Emmanuel Paradis and Klaus Schliep: ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics (2018)
Registry entries: OMICtools 
R-cran-distory
GNU R distance between phylogenetic histories
Versions of package r-cran-distory
ReleaseVersionArchitectures
buster1.4.3-2amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
stretch1.4.2-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
sid1.4.4-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bullseye1.4.4-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 12 users (9 upd.)*
Versions and Archs
License: DFSG free
Git

This GNU R package enables calculation of geodesic distance between phylogenetic trees and associated functions.

Registry entries: OMICtools 
R-cran-kaos
Encoding of Sequences Based on Frequency Matrix Chaos
Versions of package r-cran-kaos
ReleaseVersionArchitectures
bullseye0.1.2-2all
sid0.1.2-2all
Popcon: 3 users (3 upd.)*
Versions and Archs
License: DFSG free
Git

Encodes sequences using the chaos game representation (Löchel et al. 2019).
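The chaos game representation places each letter of the alphabet at a corner of a polygon. Starting from the centre, it moves halfway toward the corner of each successive letter, and counting the visited points on a grid gives a frequency matrix. The sketch below shows the classic four-letter (DNA) case. The corner layout and grid resolution are assumptions, the function names are hypothetical, and this is not the kaos implementation, which per the cited paper is also applied to protein sequences.

```python
from typing import Dict, List, Tuple

# Assumed corner layout for DNA; other conventions exist.
CORNERS: Dict[str, Tuple[float, float]] = {
    "A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq: str) -> List[Tuple[float, float]]:
    """Chaos game points: start at the centre, move halfway to each base's corner."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def cgr_frequency_matrix(seq: str, res: int = 4) -> List[List[int]]:
    """Count CGR points in a res x res grid (row 0 covers the lowest y values)."""
    grid = [[0] * res for _ in range(res)]
    for x, y in cgr_points(seq):
        col = min(int(x * res), res - 1)
        row = min(int(y * res), res - 1)
        grid[row][col] += 1
    return grid


print(cgr_points("AG"))
print(cgr_frequency_matrix("ACGTTGCA", res=4))
```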

Please cite: Hannah F. Löchel, Dominic Eger, Theodor Sperlea and Dominik Heider: Deep learning on chaos game representation for proteins. Bioinformatics (2019)
Registry entries: Bioconda 
R-cran-metamix
GNU R bayesian mixture analysis for metagenomic community profiling
Versions of package r-cran-metamix
ReleaseVersionArchitectures
buster0.3-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
bullseye0.3-2amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid0.3-2amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 9 users (5 upd.)*
Versions and Archs
License: DFSG free
Git

Resolves complex metagenomic mixtures by analysing deep sequencing data with a mixture-model-based approach. Using parallel Markov chain Monte Carlo (MCMC) chains to explore the species space enables identification of the set of species most likely to contribute to the mixture.

Please cite: Sofia Morfopoulou and Vincent Plagnol: Bayesian mixture analysis for metagenomic community profiling. (eprint) Bioinformatics 31(18):2930-2938 (2015)
Registry entries: Bio.tools  OMICtools 
R-cran-phangorn
GNU R package for phylogenetic analysis
Versions of package r-cran-phangorn
ReleaseVersionArchitectures
bullseye2.5.5-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster2.4.0-2amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
stretch2.1.1-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
sid2.5.5-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 15 users (9 upd.)*
Versions and Archs
License: DFSG free
Git

phangorn is a tool for reconstructing phylogenies, using distance-based methods, maximum parsimony or maximum likelihood, and performing Hadamard conjugation. It also offers functions for comparing trees, phylogenetic models or splits, simulating character data and performing congruence analysis.
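Maximum parsimony, one of the criteria listed above, scores a tree by the minimum number of character changes it requires. Fitch's algorithm is the classic way to compute this for one character on a rooted binary tree. It intersects the children's state sets where possible and takes their union at a cost of one change otherwise. The sketch below is a general illustration, not phangorn's code. The nested-tuple tree encoding and the function name are assumptions.

```python
from typing import Dict, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # a leaf name or a (left, right) pair


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Return (candidate root states, minimum number of changes) for one site."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:  # children can agree: no extra change needed
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one change required


site = {"a": "A", "b": "A", "c": "G", "d": "G"}
print(fitch((("a", "b"), ("c", "d")), site)[1])
print(fitch((("a", "c"), ("b", "d")), site)[1])
```

Summing this score over all alignment columns gives a tree's parsimony score. A search for the most parsimonious tree compares such scores across candidate topologies: here the first topology needs one change and the second needs two.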

Please cite: K.P. Schliep: phangorn: phylogenetic analysis in R. (PubMed) Bioinformatics 27(4):592-593 (2011)
Registry entries: OMICtools 
R-cran-phytools
GNU R phylogenetic tools for comparative biology
Versions of package r-cran-phytools
ReleaseVersionArchitectures
buster0.6-60-1all
sid0.7-20-2all
bullseye0.7-20-2all
upstream0.7-47
Popcon: 8 users (8 upd.)*
Newer upstream!
License: DFSG free
Git

A wide range of functions for phylogenetic analysis. Functionality is concentrated in phylogenetic comparative biology, but also includes a diverse array of methods for visualizing, manipulating, reading or writing, and even inferring phylogenetic trees and data. The functions for phylogenetic comparative biology include various methods for ancestral state reconstruction, model-fitting, simulation of phylogenies and data, and multivariate analysis. There is a broad range of plotting methods for phylogenies and comparative data, including, but not restricted to, methods for mapping trait evolution on trees, for projecting trees into phenotypic space or onto a geographic map, and for visualizing correlated speciation between trees. Finally, there are a number of functions for reading, writing, analyzing, inferring, simulating, and manipulating phylogenetic trees and comparative data not covered by other packages. For instance, there are functions for randomly or non-randomly attaching species or clades to a phylogeny, for estimating supertrees or consensus phylogenies from a set, for simulating trees and phylogenetic data under a range of models, and for a wide variety of other manipulations and analyses that phylogenetic biologists might find useful in their research.

Please cite: Liam J. Revell: phytools: an R package for phylogenetic comparative biology (and other things). (eprint) Methods in Ecology and Evolution 3(2):217-223 (2012)
Registry entries: SciCrunch  OMICtools 
R-cran-pscbs
R package: Analysis of Parent-Specific DNA Copy Numbers
Versions of package r-cran-pscbs
ReleaseVersionArchitectures
buster0.64.0-1all
stretch0.62.0-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
bullseye0.65.0-3all
sid0.65.0-3all
Popcon: 10 users (8 upd.)*
Versions and Archs
License: DFSG free
Git

Segmentation of allele-specific DNA copy number data and detection of regions with abnormal copy number within each parental chromosome. Both tumor-normal paired and tumor-only analyses are supported.

Please cite: Adam B. Olshen, Henrik Bengtsson, Pierre Neuvial, Paul T. Spellman, Richard A. Olshen and Venkatraman E. Seshan: Parent-specific copy number in paired tumor-normal studies using circular binary segmentation. (PubMed,eprint) Bioinformatics 27(15):2038-2046 (2011)
Registry entries: SciCrunch  OMICtools  Bioconda 
R-cran-rotl
GNU R interface to the 'Open Tree of Life' API
Versions of package r-cran-rotl
ReleaseVersionArchitectures
sid3.0.10-2all
bullseye3.0.10-2all
stretch3.0.1-1all
buster3.0.6-1all
Popcon: 14 users (10 upd.)*
Versions and Archs
License: DFSG free
Git

An interface to the 'Open Tree of Life' API to retrieve phylogenetic trees, information about studies used to assemble the synthetic tree, and utilities to match taxonomic names to 'Open Tree identifiers'. The 'Open Tree of Life' aims at assembling a comprehensive phylogenetic tree for all named species.

Registry entries: OMICtools 
R-cran-samr
GNU R significance analysis of microarrays
Versions of package r-cran-samr
ReleaseVersionArchitectures
bullseye3.0-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid3.0-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster3.0-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
Popcon: 9 users (6 upd.)*
Versions and Archs
License: DFSG free
Git

This GNU R package provides significance analysis of microarrays (SAM). A microarray is a multiplex lab-on-a-chip: a 2D array on a solid substrate (usually a glass slide or silicon thin-film cell) that assays large amounts of biological material using miniaturized, multiplexed and parallel high-throughput screening, processing and detection methods.

This package helps with analysing the data produced by such microarrays.
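
SAM ranks genes by a regularised t-like statistic, the relative difference d = (mean1 - mean2) / (s + s0), where s is the pooled within-group standard error and s0 is a small positive "fudge factor". A minimal Python sketch of that per-gene score, assuming s0 is supplied as a fixed parameter (SAM chooses it from the data) and omitting the permutation-based false discovery rate estimate; the function names are hypothetical.

```python
import math
from statistics import mean


def sam_d(group1, group2, s0=0.0):
    """Relative difference d for one gene (group1 minus group2).

    A positive s0 keeps genes with tiny variance from dominating the
    ranking and avoids division by zero for constant genes.
    """
    n1, n2 = len(group1), len(group2)
    m1, m2 = mean(group1), mean(group2)
    # Pooled within-group sum of squares.
    ss = sum((x - m1) ** 2 for x in group1) + sum((x - m2) ** 2 for x in group2)
    a = (1 / n1 + 1 / n2) / (n1 + n2 - 2)
    s = math.sqrt(a * ss)
    return (m1 - m2) / (s + s0)


def rank_genes(expr, n1, s0=0.0):
    """Rank genes by |d|; `expr` maps gene -> values, the first n1 being group 1."""
    scores = {gene: sam_d(v[:n1], v[n1:], s0) for gene, v in expr.items()}
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)


print(rank_genes({"g1": [3, 5, 1, 3], "g2": [1, 2, 1, 2]}, n1=2))
```

With s0 = 0 the score reduces to the ordinary two-sample t statistic with pooled variance; increasing s0 shrinks the scores of low-variance genes.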

Registry entries: Bio.tools  SciCrunch  OMICtools  Bioconda 
R-cran-sdmtools
Species Distribution Modelling Tools
Versions of package r-cran-sdmtools
ReleaseVersionArchitectures
sid1.1-221.2-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster1.1-221-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
bullseye1.1-221.2-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 10 users (6 upd.)*
Versions and Archs
License: DFSG free
Git

This package provides a set of tools for post-processing the outcomes of species distribution modeling exercises. It includes novel methods for comparing models and tracking changes in distributions through time. It further includes methods for visualizing outcomes, selecting thresholds, and calculating measures of accuracy and landscape fragmentation statistics.

This package was made possible in part by financial support from the Australian Research Council & ARC Research Network for Earth System Science.

Registry entries: OMICtools 
R-cran-seqinr
GNU R biological sequences retrieval and analysis
Versions of package r-cran-seqinr
ReleaseVersionArchitectures
sid3.6-1-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
stretch3.3-3-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
bullseye3.6-1-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster3.4-5-2amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
Popcon: 19 users (11 upd.)*
Versions and Archs
License: DFSG free
Git

Exploratory data analysis and data visualization for biological sequence (DNA and protein) data. Also includes utilities for sequence data management under the ACNUC system.
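
GC content is a typical example of the exploratory statistics meant here. A minimal Python sketch that counts only unambiguous A/C/G/T bases and uses hypothetical `window` and `step` parameters for a sliding-window profile; it is not seqinr's API, which is R.

```python
def gc_content(seq):
    """Fraction of G or C among unambiguous A/C/G/T bases (NaN if none)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return float("nan")
    return (seq.count("G") + seq.count("C")) / acgt


def sliding_gc(seq, window, step):
    """GC content of each full window, moving `step` bases at a time."""
    return [gc_content(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]


print(sliding_gc("AAGGCCTT", window=4, step=2))
```

Ambiguity codes such as N are excluded from the denominator, so a window consisting mostly of N reports the GC fraction of its few called bases.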

Registry entries: OMICtools 
R-cran-seurat
Tools for Single Cell Genomics
Versions of package r-cran-seurat
ReleaseVersionArchitectures
bullseye3.1.5-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid3.1.5-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 1 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

A toolkit for quality control, analysis, and exploration of single cell RNA sequencing data. 'Seurat' aims to enable users to identify and interpret sources of heterogeneity from single cell transcriptomic measurements, and to integrate diverse types of single cell data. See Satija R, Farrell J, Gennert D, et al. (2015), Macosko E, Basu A, Satija R, et al. (2015), and Butler A and Satija R (2017) for more details.

Please cite: Rahul Satija, Jeffrey A. Farrell, David Gennert, Alexander F. Schier and Aviv Regev: Spatial reconstruction of single-cell gene expression data. (PubMed) Nature Biotechnology 33:495–502 (2015)
Registry entries: OMICtools  Bioconda 
R-cran-shazam
Immunoglobulin Somatic Hypermutation Analysis
Versions of package r-cran-shazam
ReleaseVersionArchitectures
sid1.0.0-1all
buster0.1.11-1all
bullseye1.0.0-1all
Popcon: 6 users (5 upd.)*
Versions and Archs
License: DFSG free
Git

Provides a computational framework for Bayesian estimation of antigen-driven selection in immunoglobulin (Ig) sequences, providing an intuitive means of analyzing selection by quantifying the degree of selective pressure. Also provides tools to profile mutations in Ig sequences, build models of somatic hypermutation (SHM) in Ig sequences, and make model-dependent distance comparisons of Ig repertoires.

SHazaM is part of the Immcantation analysis framework for Adaptive Immune Receptor Repertoire sequencing (AIRR-seq) and provides tools for advanced analysis of somatic hypermutation (SHM) in immunoglobulin (Ig) sequences. SHazaM focuses on the following analysis topics:

  • Quantification of mutational load: SHazaM includes methods for determining the rate of observed and expected mutations under various criteria. Mutational profiling criteria include rates under SHM targeting models, mutations specific to CDR and FWR regions, and physicochemical-property-dependent substitution rates.
  • Statistical models of SHM targeting patterns: SHM models can be divided into two independent components, 1) a mutability model that defines where mutations occur and 2) a nucleotide substitution model that defines the resulting mutation. Together, these two components define an SHM targeting model. SHazaM provides empirically derived SHM 5-mer context mutation models for both humans and mice, as well as tools to build SHM targeting models from data.
  • Analysis of selection pressure using BASELINe: the Bayesian Estimation of Antigen-driven Selection in Ig Sequences (BASELINe) method is a novel method for quantifying antigen-driven selection in high-throughput Ig sequence data. BASELINe uses SHM targeting models to estimate the null distribution of expected mutation frequencies and provides measures of selection pressure informed by known AID targeting biases.
  • Model-dependent distance calculations: SHazaM provides methods to compute evolutionary distances between sequences or sets of sequences based on SHM targeting models. This information is particularly useful for understanding and defining clonal relationships.
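
The simplest measure of mutational load is the fraction of positions at which a sequence differs from its germline. A minimal Python sketch, assuming both sequences are already aligned in IMGT-gapped form ('.' gaps) and skipping gaps and N positions; unlike SHazaM, it does not separate replacement from silent mutations or CDR from FWR regions, and the function name is hypothetical.

```python
def mutation_frequency(observed, germline, ignore=".-N"):
    """Mismatches per compared position between two aligned sequences."""
    if len(observed) != len(germline):
        raise ValueError("sequences must be aligned to the same length")
    compared = mutated = 0
    for obs, germ in zip(observed.upper(), germline.upper()):
        if obs in ignore or germ in ignore:
            continue  # gap or ambiguous base: not informative
        compared += 1
        if obs != germ:
            mutated += 1
    return mutated / compared if compared else float("nan")


print(mutation_frequency("AC..T", "AG..T"))
```

Restricting the same count to codon positions within CDRs or FWRs, or classifying each mismatch as silent or replacement, gives the region- and criterion-specific rates described above.
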
Please cite: Namita T. Gupta, Jason A. Vander Heiden, Mohamed Uduman, Daniel Gadala-Maria, Gur Yaari and Steven H. Kleinstein: Change-O: a toolkit for analyzing large-scale B cell immunoglobulin repertoire sequencing data. (PubMed,eprint) Bioinformatics 31(20):3356-3358 (2015)
Registry entries: OMICtools  Bioconda 
R-cran-spp
GNU R ChIP-seq processing pipeline
Versions of package r-cran-spp
ReleaseVersionArchitectures
bullseye1.16.0-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster1.15.5-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
sid1.16.0-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 8 users (5 upd.)*
Versions and Archs
License: DFSG free
Git

R package for analysis of ChIP-seq and other functional sequencing data

  • Assess overall DNA-binding signals in the data and select an appropriate tag alignment quality.
  • Discard or restrict positions with an abnormally high number of tags.
  • Calculate genome-wide profiles of smoothed tag density and save them in WIG files for viewing in genome browsers.
  • Calculate genome-wide profiles providing conservative statistical estimates of fold enrichment ratios along the genome. These can be exported for browser viewing, or thresholded to determine regions of significant enrichment/depletion.
  • Determine statistically significant point binding positions.
  • Assess whether the set of point binding positions detected at the current sequencing depth meets saturation criteria and, if it does not, estimate what sequencing depth would be required to do so.
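
The smoothed tag density and WIG export in the list above can be sketched as follows, assuming a Gaussian kernel with a fixed bandwidth and 1-based tag positions on a single chromosome. The kernel, the bandwidth and the function names are assumptions for illustration, not spp's documented defaults; the header line follows the UCSC fixedStep WIG format.

```python
import math


def smoothed_density(positions, start, end, step=10, bandwidth=50.0):
    """Gaussian kernel density of tag positions, evaluated every `step` bp."""
    norm = bandwidth * math.sqrt(2 * math.pi)
    density = []
    for x in range(start, end + 1, step):
        total = sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in positions)
        density.append((x, total / norm))
    return density


def to_fixed_step_wig(chrom, density, step):
    """Render (position, value) pairs as a fixedStep WIG track."""
    start = density[0][0]
    lines = [f"fixedStep chrom={chrom} start={start} step={step}"]
    lines += [f"{value:.4g}" for _, value in density]
    return "\n".join(lines)


print(to_fixed_step_wig("chr1", smoothed_density([100, 120], 1, 201, 50), 50))
```

A wider bandwidth gives smoother profiles at the cost of spatial resolution; the fixedStep layout stores only one value per line because positions are implied by `start` and `step`.
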
Please cite: Peter V Kharchenko, Michael Y Tolstorukov and Peter J Park: Design and analysis of ChIP-seq experiments for DNA-binding proteins. (PubMed) Nature Biotechnology 26(12):1351–1359 (2008)
R-cran-tcr
Advanced Data Analysis of Immune Receptor Repertoires
Versions of package r-cran-tcr
ReleaseVersionArchitectures
buster2.2.3-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
sid2.3.2+ds-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bullseye2.3.2+ds-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 7 users (9 upd.)*
Versions and Archs
License: DFSG free
Git

Cells of the immune system are the grand exception to the rule that all cells of an individual carry (mostly exact) copies of the same DNA. B cells (which produce antibodies) and T cells (which communicate with other cells), however, have a section of their DNA in which genes of the V, D and J groups are rearranged within the genomic DNA, giving them the flexibility to deal with as-yet-unknown pathogens.

This package provides a platform for the advanced analysis of T cell receptor repertoire data and its visualisations.

Caveat: This package is soon to be replaced by http://github.com/immunomind/immunarch which is not yet available as a Debian package.

Please cite: Vadim I. Nazarov, Mikhail V. Pogorelyy, Ekaterina A. Komech, Ivan V. Zvyagin, Dmitry A. Bolotin, Mikhail Shugay, Dmitry M. Chudakov, Yury B. Lebedev and Ilgar Z. Mamedov: tcR: an R package for T cell receptor repertoire advanced data analysis. (eprint) BMC Bioinformatics 16:175 (2015)
Registry entries: Bio.tools  OMICtools  Bioconda 
R-cran-tigger
Infers new Immunoglobulin alleles from Rep-Seq Data
Versions of package r-cran-tigger
ReleaseVersionArchitectures
sid1.0.0-1all
buster0.3.1-1all
bullseye1.0.0-1all
Popcon: 6 users (5 upd.)*
Versions and Archs
License: DFSG free
Git

Summary: Infers the V genotype of an individual from immunoglobulin (Ig) repertoire-sequencing (Rep-Seq) data, including detection of any novel alleles. This information is then used to correct existing V allele calls from among the sample sequences.

High-throughput sequencing of B cell immunoglobulin receptors is providing unprecedented insight into adaptive immunity. A key step in analyzing these data involves assignment of the germline V, D and J gene segment alleles that comprise each immunoglobulin sequence by matching them against a database of known V(D)J alleles. However, this process will fail for sequences that utilize previously undetected alleles, whose frequency in the population is unclear.

TIgGER is a computational method that significantly improves V(D)J allele assignments by first determining the complete set of gene segments carried by an individual (including novel alleles) from V(D)J-rearranged sequences. TIgGER can then infer a subject’s genotype from these sequences, and use this genotype to correct the initial V(D)J allele assignments.

The application of TIgGER continues to identify a surprisingly high frequency of novel alleles in humans, highlighting the critical need for this approach. TIgGER, however, can be, and has been, used with data from other species.

Core Abilities:

  • Detecting novel alleles
  • Inferring a subject’s genotype
  • Correcting preliminary allele calls

Required Input

  • A table of sequences from a single individual, with columns containing the following:
      • V(D)J-rearranged nucleotide sequence (in IMGT-gapped format)
      • Preliminary V allele calls
      • Preliminary J allele calls
      • Length of the junction region
  • Germline Ig sequences in IMGT-gapped FASTA format (e.g., those downloaded from IMGT/GENE-DB)

The sequence table can be created using IMGT/HighV-QUEST and Change-O.

Please cite: Namita T. Gupta, Jason A. Vander Heiden, Mohamed Uduman, Daniel Gadala-Maria, Gur Yaari and Steven H. Kleinstein: Change-O: a toolkit for analyzing large-scale B cell immunoglobulin repertoire sequencing data. (eprint) Bioinformatics 31(20):3356-3358 (2015)
Registry entries: OMICtools  Bioconda 
R-cran-treescape
GNU R Statistical Exploration of Landscapes of Phylogenetic Trees
Versions of package r-cran-treescape
ReleaseVersionArchitectures
buster1.10.18+dfsg-1amd64,arm64,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
bullseye1.10.18+dfsg-2amd64,arm64,armhf,i386,mips64el,mipsel,s390x
sid1.10.18+dfsg-2amd64,arm64,armhf,i386,mips64el,mipsel,s390x
stretch1.10.18-6amd64,arm64,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
Popcon: 11 users (7 upd.)*
Versions and Archs
License: DFSG free
Git

This GNU R package provides tools for the exploration of distributions of phylogenetic trees. This package includes a shiny interface which can be started from R using 'treescapeServer()'.

Registry entries: OMICtools 
R-cran-tsne
t-distributed stochastic neighbor embedding for R (t-SNE)
Versions of package r-cran-tsne
ReleaseVersionArchitectures
bullseye0.1-3-3all
sid0.1-3-3all
Popcon: 1 users (2 upd.)*
Versions and Archs
License: DFSG free
Git

A "pure R" implementation of the t-SNE algorithm.
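
This package implements the whole algorithm in R. As an illustration of its first stage only, the Python sketch below computes the perplexity-calibrated input affinities of standard t-SNE: for each point, a Gaussian bandwidth is found by bisection so that the entropy of its neighbour distribution equals log(perplexity), and the conditional probabilities are then symmetrised. The gradient-descent optimisation of the low-dimensional embedding is omitted, and the function names are hypothetical.

```python
import math


def conditional_probabilities(dist_sq, perplexity, tol=1e-5, max_iter=100):
    """p(j|i) for one point i, given squared distances to all other points.

    Bisects beta = 1 / (2 * sigma_i**2) until the Shannon entropy (in nats)
    of the distribution matches log(perplexity).
    """
    target = math.log(perplexity)
    beta, lo, hi = 1.0, 0.0, math.inf
    d_min = min(dist_sq)
    for _ in range(max_iter):
        # Subtracting d_min avoids underflow; it cancels on normalisation.
        weights = [math.exp(-beta * (d - d_min)) for d in dist_sq]
        total = sum(weights)
        probs = [w / total for w in weights]
        entropy = -sum(p * math.log(p) for p in probs if p > 0)
        if abs(entropy - target) < tol:
            break
        if entropy > target:
            # Too flat: narrow the kernel (larger beta).
            lo = beta
            beta = beta * 2 if hi == math.inf else (beta + hi) / 2
        else:
            # Too peaked: widen the kernel (smaller beta).
            hi = beta
            beta = (beta + lo) / 2
    return probs


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def joint_affinities(points, perplexity):
    """Symmetrised affinities P[i][j] = (p(j|i) + p(i|j)) / (2n)."""
    n = len(points)
    cond = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        dists = [squared_distance(points[i], points[j]) for j in others]
        for j, p in zip(others, conditional_probabilities(dists, perplexity)):
            cond[i][j] = p
    return [[(cond[i][j] + cond[j][i]) / (2 * n) for j in range(n)]
            for i in range(n)]
```

The perplexity must be smaller than the number of neighbours of each point, since log(n - 1) is the largest entropy a point's distribution can reach.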

R-cran-vegan
Community Ecology Package for R
Versions of package r-cran-vegan
ReleaseVersionArchitectures
jessie2.0-10-1amd64,armel,armhf,i386
buster2.5-4+dfsg-3amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
bullseye2.5-6+dfsg-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid2.5-6+dfsg-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
stretch2.4-2-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
wheezy2.0-3-1amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc
Popcon: 24 users (18 upd.)*
Versions and Archs
License: DFSG free
Git

R package for community ecologists. It contains most of the multivariate analysis methods needed for analysing ecological communities, as well as tools for diversity analysis. Most diversity methods assume that the data are counts of individuals.

These tools are sometimes used outside the field of ecology, for instance to study populations of white blood cells or RNA molecules.
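
Since most of these diversity methods work on counts of individuals per species (or per cell type or molecule), here is a minimal Python sketch of two common indices computed from such counts: Shannon's H = -Σ p_i ln p_i and the Gini-Simpson index 1 - Σ p_i². vegan exposes indices of this kind through its `diversity()` function; this sketch does not reproduce its options, and the function names are hypothetical.

```python
import math


def shannon(counts, base=math.e):
    """Shannon diversity H from abundance counts (zero counts are skipped)."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total, base) for c in counts if c > 0)


def simpson(counts):
    """Gini-Simpson index 1 - sum(p_i**2): chance two draws differ in type."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)


print(shannon([10, 10]), simpson([10, 10]))
```

Both indices grow with the number of types and with evenness; Shannon weights rare types more strongly than Simpson does.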

Registry entries: SciCrunch  OMICtools 
R-cran-webgestaltr
find over-represented properties in gene lists
Versions of package r-cran-webgestaltr
ReleaseVersionArchitectures
bullseye0.4.3-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid0.4.3-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 3 users (4 upd.)*
Versions and Archs
License: DFSG free
Git

The web version WebGestalt http://www.webgestalt.org supports 12 organisms, 354 gene identifiers and 321,251 function categories. Users can upload the data and functional categories with their own gene identifiers. In addition to Over-Representation Analysis, WebGestalt also supports Gene Set Enrichment Analysis and Network Topology Analysis. The user-friendly output report allows interactive and efficient exploration of enrichment results. The WebGestaltR package not only supports all the functions above but can also be integrated into other pipelines or analyze multiple gene lists simultaneously.
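
Over-representation analysis is commonly framed as a one-sided hypergeometric test: given a reference universe, how surprising is the overlap between the user's gene list and one functional category? A minimal Python sketch under that assumption, with a Benjamini-Hochberg adjustment across categories; WebGestalt's exact statistics and defaults may differ, and the function names are hypothetical.

```python
from math import comb


def ora_pvalue(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the reference universe, K: universe genes in the category,
    n: genes in the user's list, k: list genes that fall in the category.
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def enrichment_ratio(k, n, K, N):
    """Observed overlap divided by the overlap expected by chance."""
    return k / (n * K / N)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(ora_pvalue(k=5, n=20, K=50, N=10000), enrichment_ratio(5, 20, 50, 10000))
```

The adjustment matters because each category is tested separately, so with hundreds of categories some small raw p-values are expected by chance alone.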

Registry entries: OMICtools 
R-cran-wgcna
Weighted Correlation Network Analysis
Versions of package r-cran-wgcna
ReleaseVersionArchitectures
sid1.69-1amd64,arm64,armhf,i386,mips64el,mipsel,ppc64el,s390x
bullseye1.69-1amd64,arm64,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 1 users (2 upd.)*
Versions and Archs
License: DFSG free
Git

Functions necessary to perform Weighted Correlation Network Analysis on high-dimensional data as originally described in Horvath and Zhang (2005) and Langfelder and Horvath (2008). Includes functions for rudimentary data cleaning, construction of correlation networks, module identification, summarization, and relating of variables and modules to sample traits. Also includes a number of utility functions for data manipulation and visualization.
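
The construction of a weighted correlation network can be sketched with two formulas for an unsigned network: soft thresholding, a_ij = |cor(x_i, x_j)|^β, which keeps all edges but shrinks weak correlations, and the topological overlap TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), where l_ij = Σ_u a_iu a_uj and k_i is the connectivity. A dependency-free Python sketch, assuming β = 6 purely for illustration (in practice β is chosen per data set, e.g. by a scale-free topology criterion); it is not the package's implementation, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation (undefined for constant profiles)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def soft_threshold_adjacency(profiles, beta=6):
    """Unsigned adjacency |cor|**beta, with zeros on the diagonal."""
    n = len(profiles)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                adj[i][j] = abs(pearson(profiles[i], profiles[j])) ** beta
    return adj


def topological_overlap(adj):
    """TOM from an adjacency matrix with a zero diagonal."""
    n = len(adj)
    k = [sum(row) for row in adj]  # connectivity of each gene
    tom = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
                tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
    return tom
```

For example, a correlation of magnitude about 0.447 (r² = 0.2) becomes an adjacency of 0.2³ = 0.008 at β = 6, while perfectly correlated genes keep an adjacency of 1.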

Please cite: Peter Langfelder and Steve Horvath: WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9:559 (2008)
Registry entries: Bio.tools  OMICtools  Bioconda 
R-other-ascat
Allele-Specific Copy Number Analysis of Tumours
Versions of package r-other-ascat
ReleaseVersionArchitectures
sid2.5.2-2all
Popcon: 0 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

ASCAT (allele-specific copy number analysis of tumors) is a method for allele-specific copy number analysis of the in vivo breast cancer genome. It can be used to accurately dissect the allele-specific copy number of solid tumors, simultaneously estimating and adjusting for both tumor ploidy and non-aberrant cell admixture.

Please cite: Peter Van Loo, Silje H Nordgard, Ole Christian Lingjærde, Hege G Russnes, Inga H Rye, Wei Sun, Victor J Weigman, Peter Marynen, Anders Zetterberg, Bjørn Naume, Charles M Perou, Anne-Lise Børresen-Dale and Vessela N Kristensen: Allele-specific Copy Number Analysis of Tumors. (PubMed) PNAS 107(39):16910-5 (2010)
Registry entries: Bioconda 
Racon
consensus module for raw de novo DNA assembly of long uncorrected reads
Versions of package racon
ReleaseVersionArchitectures
buster1.3.2-1amd64
bullseye1.4.13-1amd64,arm64,armel,armhf,mips64el,mipsel,ppc64el,s390x
sid1.4.13-1amd64,arm64,armel,armhf,mips64el,mipsel,ppc64el,s390x
Popcon: 2 users (4 upd.)*
Versions and Archs
License: DFSG free
Git

Racon is intended as a standalone consensus module to correct raw contigs generated by rapid assembly methods which do not include a consensus step. The goal of Racon is to generate genomic consensus which is of similar or better quality compared to the output generated by assembly methods which employ both error correction and consensus steps, while providing a speedup of several times compared to those methods. It supports data produced by both Pacific Biosciences and Oxford Nanopore Technologies.

Racon can be used as a polishing tool after assembly with either Illumina data or data produced by third-generation sequencing. The type of input data is detected automatically.

Racon takes as input only three files: contigs in FASTA/FASTQ format, reads in FASTA/FASTQ format and overlaps/alignments between the reads and the contigs in MHAP/PAF/SAM format. Output is a set of polished contigs in FASTA format printed to stdout. All input files can be compressed with gzip.
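
Of the three inputs, the PAF overlap file is the least self-explanatory. Below is a minimal Python reader for the 12 mandatory PAF columns, as documented for minimap2, that groups alignments by target contig. It is illustrative only, not Racon's own parser; optional SAM-style tags after column 12 are ignored, and the names are hypothetical.

```python
from typing import NamedTuple


class PafRecord(NamedTuple):
    """The 12 mandatory PAF columns (0-based, end-exclusive coordinates)."""
    query_name: str
    query_len: int
    query_start: int
    query_end: int
    strand: str  # "+" or "-"
    target_name: str
    target_len: int
    target_start: int
    target_end: int
    matches: int  # number of residue matches
    block_len: int  # alignment block length
    mapq: int  # mapping quality (255 = missing)

    @property
    def identity(self):
        return self.matches / self.block_len if self.block_len else 0.0


INT_COLUMNS = {1, 2, 3, 6, 7, 8, 9, 10, 11}


def parse_paf_line(line):
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError(f"PAF line has {len(fields)} columns, expected >= 12")
    values = [int(v) if i in INT_COLUMNS else v for i, v in enumerate(fields[:12])]
    return PafRecord(*values)


def overlaps_by_target(lines):
    """Group PAF records by target (e.g. contig) name, skipping blank lines."""
    groups = {}
    for line in lines:
        if line.strip():
            record = parse_paf_line(line)
            groups.setdefault(record.target_name, []).append(record)
    return groups
```

Grouping by target mirrors what a consensus step needs: all reads aligned to one contig are collected before that contig is polished.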

Racon can also be used as a read error-correction tool. In this scenario, the MHAP/PAF/SAM file needs to contain pairwise overlaps between reads including dual overlaps.

A wrapper script is also available to make Racon easier to use on large datasets. It has the same interface as racon but adds two features on top of it: sequences can be subsampled to decrease the total execution time (accuracy might be lower), and target sequences can be split into smaller chunks and run sequentially to decrease memory consumption. Both features can be used at the same time.

Please cite: Robert Vaser, Ivan Sovic, Niranjan Nagarajan and Mile Sikic: Fast and accurate de novo genome assembly from long uncorrected reads. (PubMed,eprint) Genome Research 27(5):737-746 (2017)
Registry entries: OMICtools  Bioconda 
Radiant
explore hierarchical metagenomic data with zoomable pie charts
Versions of package radiant
ReleaseVersionArchitectures
sid2.7.1+dfsg-1all
bullseye2.7.1+dfsg-1all
buster2.7+dfsg-2all
Popcon: 2 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

Krona allows hierarchical data to be explored with zoomable pie charts. Krona charts include support for several bioinformatics tools and raw data formats. The charts can be viewed with a recent version of any major web browser.

Please cite: Brian D Ondov, Nicholas H Bergman and Adam M Phillippy: Interactive metagenomic visualization in a Web browser. (PubMed,eprint) BMC Bioinformatics 12:385 (2011)
Registry entries: Bio.tools  SciCrunch  OMICtools 
Ragout
Reference-Assisted Genome Ordering UTility
Versions of package ragout
ReleaseVersionArchitectures
bullseye2.3-2amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid2.3-2amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 0 users (0 upd.)*
Versions and Archs
License: DFSG free
Git

Ragout (Reference-Assisted Genome Ordering UTility) is a tool for chromosome-level scaffolding using multiple references. Given initial assembly fragments (contigs/scaffolds) and one or multiple related references (complete or draft), it produces a chromosome-scale assembly (as a set of scaffolds).

The approach is based on the analysis of genome rearrangements (like inversions or chromosomal translocations) between the input genomes and reconstructing the most parsimonious structure of the target genome.

Ragout now supports both small and large genomes (of mammalian scale and complexity). The assembly of highly polymorphic genomes is currently limited.

Please cite: Mikhail Kolmogorov, Joel Armstrong, Brian J. Raney, Ian Streeter, Matthew Dunn, Fengtang Yang, Duncan Odom, Paul Flicek, Thomas M. Keane, David Thybert, Benedict Paten and Son Pham: Chromosome assembly of large and complex genomes using multiple references. (PubMed,eprint) Genome Research 28(11):1720-1732 (2018)
Registry entries: OMICtools  Bioconda 
Rambo-k
Read Assignment Method Based On K-mers