Debian Science Project
Summary
Data Management
Debian Science Data Management packages

This metapackage will install packages to assist with data management tasks, such as obtaining data from remote resources, keeping data under version control, etc.

Description

If you discover a project that looks like a good candidate for Debian Science, or if you have prepared an unofficial Debian package, please send a description of that project to the Debian Science mailing list.


Official Debian packages with high relevance

datalad
data files management and distribution platform
Versions of package datalad
Release   Version   Architectures
sid       0.19.6-2  all
stretch   0.4.1-1   all
trixie    0.19.6-2  all
bookworm  0.18.1-2  all
bullseye  0.14.0-1  all
buster    0.11.2-2  all
upstream  1.0.2

Popcon: 41 users (5 upd.)*
Newer upstream!
License: DFSG free

DataLad is a data management and distribution platform providing access to a wide range of data resources already available online. Using git-annex as its backend for data logistics, it provides the following facilities, either built in or available through additional extensions:

  • command line and Python interfaces for manipulating collections of datasets (install, uninstall, update, publish, save, etc.) and separate files/directories (add, get)
  • extract, aggregate, and search through various sources of metadata (XMP, EXIF, etc.; install datalad-neuroimaging for DICOM, BIDS, NIfTI support)
  • crawl web sites to automatically prepare and update git-annex repositories with content from online websites, S3, etc. (install datalad-crawler)

datalad-container
DataLad extension for working with containerized environments
Maintainer: Yaroslav Halchenko
Versions of package datalad-container
Release   Version  Architectures
sid       1.2.5-1  all
buster    0.2.2-2  all
bullseye  1.1.2-1  all
bookworm  1.1.9-1  all
trixie    1.2.5-1  all

Popcon: 4 users (3 upd.)*
License: DFSG free

This extension enhances DataLad (http://datalad.org) for working with computational containers.

git-annex
manage files with git, without checking their contents into git
Versions of package git-annex
Release             Version                   Architectures
sid                 10.20240129-1             amd64,arm64,i386,mips64el,ppc64el,riscv64,s390x
bullseye            8.20210223-2              amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster-backports    8.20200330-1~bpo10+1      amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
buster              7.20190129-3              amd64,arm64,armhf,i386
stretch-backports   7.20190129-2~bpo9+1       amd64,arm64,armel,armhf,i386,mips64el,ppc64el,s390x
stretch-backports   7.20181211-2~bpo9+1       mips
stretch-backports   6.20180913-1~bpo9+1       mipsel
stretch             6.20170101-1+deb9u2       amd64,arm64,i386,mips,mips64el,mipsel,ppc64el,s390x
stretch-security    6.20170101-1+deb9u1       amd64,i386
jessie-security     5.20141125+oops-1+deb8u2  amd64,armel,armhf,i386
jessie              5.20141125+deb8u1         amd64,armel,armhf,i386
trixie              10.20240129-1             amd64,arm64,i386,mips64el,ppc64el,s390x
bookworm-backports  10.20240129-1~bpo12+1     amd64,arm64,armel,i386,mips64el,mipsel,ppc64el,s390x
bookworm            10.20230126-3             amd64,arm64,i386,mips64el,mipsel,ppc64el,s390x

Debtags of package git-annex:
devel: rcs
role: program
works-with: file

Popcon: 427 users (33 upd.)*
License: DFSG free

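git-annex allows managing large files with git, without storing the file contents in git. It can sync, back up, and archive your data, offline and online. Checksums and encryption keep your data safe and secure. With git-annex, the power and distributed nature of git can be applied to large files.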
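The checksum idea can be illustrated without git-annex itself. The following is a minimal sketch, assuming a key layout modelled on git-annex's SHA256E backend (backend name, size, digest, file extension). The helper names `annex_style_key` and `verify` are hypothetical and are not part of git-annex.

```python
import hashlib
import os
import tempfile


def annex_style_key(path):
    """Build a content-derived key for a file (hypothetical helper).

    The layout imitates git-annex's SHA256E keys:
    SHA256E-s<size>--<hex digest><extension>
    """
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
            size += len(chunk)
    extension = os.path.splitext(path)[1]
    return "SHA256E-s{}--{}{}".format(size, digest.hexdigest(), extension)


def verify(path, key):
    """Return True if the file content still matches the recorded key."""
    return annex_style_key(path) == key


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "sample.dat")
        with open(sample, "wb") as handle:
            handle.write(b"hello")
        key = annex_style_key(sample)
        print(key)
        print(verify(sample, key))
```

Because the key depends only on the content, any silent change to the file makes `verify` fail, which is the property that lets a tool detect corrupted copies.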

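It can store large files in many places, from local hard drives to a large number of cloud storage services, including S3, WebDAV, and rsync, with dozens of cloud storage providers usable via plugins. Files can be stored encrypted with gpg, so that the cloud storage provider cannot see their contents. git-annex keeps track of where each file is stored, so it knows how many copies are available, and it offers many facilities to ensure your data is preserved.

git-annex can also be used to keep a folder in sync between computers, noticing when files change, automatically committing them to git, and transferring them to the other computers. The git-annex webapp makes git-annex easy to use in this way.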

The package is enhanced by the following packages: elpa-git-annex, elpa-magit-annex, keysafe

hdf5-filter-plugin
external filters for HDF5: LZ4, BZip2, Bitshuffle
Versions of package hdf5-filter-plugin
Release   Version                      Architectures
bookworm  0.0~git20221111.49e3b65-4    amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
trixie    0.0~git20221111.49e3b65-4    amd64,arm64,armel,armhf,i386,mips64el,ppc64el,s390x
sid       0.0~git20221111.49e3b65-4    amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x

Popcon: 0 users (0 upd.)*
License: DFSG free

The external filter mechanism introduced with HDF5 1.8.12 allows applications to use custom filters that are not shipped with the HDF5 core library, without recompiling the application. This package provides external HDF5 filters for:

  • the LZ4 compression algorithm
  • BZip2 compression

hdf5-filter-plugin-blosc-serial
blocking, shuffling and lossless compression library
Versions of package hdf5-filter-plugin-blosc-serial
Release   Version                      Architectures
sid       0.0~git20220616.9683f7d-5    amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
trixie    0.0~git20220616.9683f7d-5    amd64,arm64,armel,armhf,i386,mips64el,ppc64el,s390x
bookworm  0.0~git20220616.9683f7d-5    amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x

Popcon: 0 users (13 upd.)*
License: DFSG free

This package contains a filter for HDF5 that uses the Blosc compressor. By installing this filter, you can read and write HDF5 files with Blosc-compressed datasets.
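Why shuffling helps a compressor can be shown with the standard library alone. This is a minimal sketch of a byte shuffle followed by a general-purpose compressor. It uses zlib as a stand-in and does not call Blosc or HDF5; the function names `shuffle` and `unshuffle` are hypothetical.

```python
import struct
import zlib


def shuffle(data, itemsize):
    """Group the i-th byte of every item together (a byte transpose)."""
    if len(data) % itemsize:
        raise ValueError("data length must be a multiple of itemsize")
    return b"".join(data[offset::itemsize] for offset in range(itemsize))


def unshuffle(data, itemsize):
    """Invert shuffle(): restore the original item-by-item byte order."""
    if len(data) % itemsize:
        raise ValueError("data length must be a multiple of itemsize")
    count = len(data) // itemsize
    out = bytearray(len(data))
    for offset in range(itemsize):
        out[offset::itemsize] = data[offset * count:(offset + 1) * count]
    return bytes(out)


if __name__ == "__main__":
    # 1000 slowly varying 32-bit integers, stored little-endian.
    raw = struct.pack("<1000I", *range(1000))
    shuffled = shuffle(raw, 4)
    assert unshuffle(shuffled, 4) == raw
    print("plain bytes after zlib   :", len(zlib.compress(raw)))
    print("shuffled bytes after zlib:", len(zlib.compress(shuffled)))
```

For numeric arrays whose high-order bytes rarely change, the shuffled layout places long runs of similar bytes next to each other, which typically gives the compressor more to work with.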

hdf5-filter-plugin-zfp-serial
Compression plugin for the HDF5 library using ZFP compression
Versions of package hdf5-filter-plugin-zfp-serial
Release       Version                    Architectures
sid           1.1.1-2                    amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64
bookworm      1.1.0+git20221021-4        amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el
experimental  1.1.0+git20230428-0+exp2   amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
trixie        1.1.1-2                    amd64,arm64,armel,armhf,i386,mips64el,ppc64el

Popcon: 0 users (0 upd.)*
License: DFSG free

H5Z-ZFP is a compression filter for HDF5 using the ZFP compression library, supporting lossy and lossless compression of floating point and integer data to meet bitrate, accuracy, and/or precision targets.

nexus-tools
NeXus scientific data file format - applications
Versions of package nexus-tools
Release   Version          Architectures
sid       4.4.3-6          amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
bullseye  4.4.3-5          amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
jessie    4.3.2-svn1921-2  amd64,armel,armhf,i386
trixie    4.4.3-6          amd64,arm64,armel,armhf,i386,mips64el,ppc64el,s390x
bookworm  4.4.3-5          amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x

Popcon: 4 users (6 upd.)*
License: DFSG free

NeXus is a common data format for neutron, X-ray, and muon science. It is being developed as an international standard by scientists and programmers representing major scientific facilities in Europe, Asia, Australia, and North America in order to facilitate greater cooperation in the analysis and visualization of neutron, X-ray, and muon data.

This package contains applications for reading and writing NeXus files.

plfit
fitting power-law distributions to empirical data -- interfaces
Versions of package plfit
Release   Version     Architectures
sid       0.9.6+ds-1  amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
trixie    0.9.4+ds-1  amd64,arm64,armel,armhf,i386,mips64el,ppc64el
bookworm  0.9.4+ds-1  amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el

Popcon: 2 users (2 upd.)*
License: DFSG free

The plfit software fits power-law distributions to empirical (discrete or continuous) data, according to the method of Clauset, Shalizi and Newman [SIAM Review 51, 661-703 (2009)].
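The core of that method, for continuous data with a known lower cutoff, is a closed-form maximum-likelihood estimate of the exponent. The sketch below illustrates only that step, assuming the cutoff `xmin` is given, whereas the published method also selects it from the data. The function names are hypothetical and are not the plfit API.

```python
import math
import random


def sample_power_law(alpha, xmin, n, rng):
    """Draw n samples from a continuous power law p(x) ~ x**(-alpha), x >= xmin."""
    # Inverse-transform sampling: x = xmin * (1 - u) ** (-1 / (alpha - 1)).
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def fit_alpha(data, xmin):
    """Maximum-likelihood exponent for observations x >= xmin (xmin is given)."""
    tail = [x for x in data if x >= xmin]
    if not tail:
        raise ValueError("no observations at or above xmin")
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum == 0.0:
        raise ValueError("all observations equal xmin; exponent is undefined")
    alpha = 1.0 + len(tail) / log_sum
    # Approximate standard error of the estimate for large samples.
    sigma = (alpha - 1.0) / math.sqrt(len(tail))
    return alpha, sigma


if __name__ == "__main__":
    rng = random.Random(42)
    data = sample_power_law(alpha=2.5, xmin=1.0, n=20000, rng=rng)
    alpha_hat, sigma = fit_alpha(data, xmin=1.0)
    print("estimated alpha:", round(alpha_hat, 3), "+/-", round(sigma, 3))
```

With synthetic data drawn from a known exponent, the estimate should land close to the true value, which makes this a convenient sanity check before fitting real measurements.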

This package provides two command line utilities, plfit and plgen.

The package is enhanced by the following packages: plfit-doc

python3-jdata
JData encoder/decoder for Python 3
Versions of package python3-jdata
Release   Version  Architectures
bullseye  0.3.6-1  amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid       0.3.6-1  amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
trixie    0.3.6-1  amd64,arm64,armel,armhf,i386,mips64el,ppc64el,s390x
bookworm  0.3.6-1  amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x

Popcon: 1 users (2 upd.)*
License: DFSG free

The JData Specification (https://github.com/fangq/jdata/) defines a lightweight, language-independent data annotation interface for storing and sharing complex data structures across programming languages such as MATLAB, JavaScript, and Python. Using JData formats, a complex Python data structure can be encoded as a dict object that is easily serialized to a JSON or binary JSON file and shared between programs written in different languages.
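The annotation idea can be sketched by hand with the standard json module. This is an illustration only: it assumes the `_ArrayType_`, `_ArraySize_`, and `_ArrayData_` keywords with row-major element order, and the helpers `encode_matrix` and `decode_matrix` are hypothetical rather than functions of the jdata module.

```python
import json


def encode_matrix(rows, array_type="double"):
    """Annotate a rectangular list of lists as a plain dict (illustrative)."""
    width = len(rows[0]) if rows else 0
    if any(len(row) != width for row in rows):
        raise ValueError("all rows must have the same length")
    return {
        "_ArrayType_": array_type,
        "_ArraySize_": [len(rows), width],
        # Elements are flattened in row-major order.
        "_ArrayData_": [value for row in rows for value in row],
    }


def decode_matrix(obj):
    """Rebuild the list of lists from the annotated dict."""
    n_rows, n_cols = obj["_ArraySize_"]
    flat = obj["_ArrayData_"]
    if len(flat) != n_rows * n_cols:
        raise ValueError("array size does not match the number of elements")
    return [flat[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


if __name__ == "__main__":
    matrix = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    text = json.dumps(encode_matrix(matrix))
    print(text)
    print(decode_matrix(json.loads(text)))
```

Since the annotated form is an ordinary dict of strings, numbers, and lists, any language with a JSON parser can read it back and reconstruct the array shape.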

python3-mdp
Modular toolkit for Data Processing
Versions of package python3-mdp
Release   Version  Architectures
jessie    3.3-2    all
stretch   3.5-1    all
sid       3.6-7    all
bookworm  3.6-2    amd64,arm64,mips64el,ppc64el
bullseye  3.6-1.1  all

Popcon: 10 users (5 upd.)*
License: DFSG free

Python data processing framework for building complex data processing software by combining widely used machine learning algorithms into pipelines and networks. Implemented algorithms include: Principal Component Analysis (PCA), Independent Component Analysis (ICA), Slow Feature Analysis (SFA), Independent Slow Feature Analysis (ISFA), Growing Neural Gas (GNG), Factor Analysis, Fisher Discriminant Analysis (FDA), and Gaussian Classifiers.

The package is enhanced by the following packages: python3-sklearn

python3-nxs
NeXus scientific data file format - Python 3 binding
Versions of package python3-nxs
Release   Version  Architectures
bullseye  4.4.1-3  all
trixie    4.4.1-4  all
sid       4.4.1-4  all
bookworm  4.4.1-4  all

Popcon: 1 users (1 upd.)*
License: DFSG free

NeXus is a common data format for neutron, X-ray, and muon science. It is being developed as an international standard by scientists and programmers representing major scientific facilities in Europe, Asia, Australia, and North America in order to facilitate greater cooperation in the analysis and visualization of neutron, X-ray, and muon data.

This package contains the Python 3 bindings.

python3-pyzoltan
Wrapper for the Zoltan data management library
Versions of package python3-pyzoltan
Release   Version          Architectures
bullseye  1.0.1-2+deb11u1  amd64,arm64,ppc64el,s390x
bookworm  1.0.1-5+deb12u1  amd64,arm64,ppc64el,s390x
trixie    1.0.1-9          amd64,arm64,ppc64el,s390x
sid       1.0.1-9          amd64,arm64,ppc64el,riscv64,s390x

Popcon: 4 users (9 upd.)*
License: DFSG free

PyZoltan is, as the name suggests, a Python wrapper for the Zoltan data management library.

In PyZoltan, only specific routines and objects are wrapped. The following features of Zoltan are currently supported:

  • Dynamic load balancing using geometric algorithms
  • Unstructured point-to-point communication
  • Distributed data directories
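One well-known geometric approach to load balancing is recursive coordinate bisection. The sketch below is a plain-Python illustration of that idea for 2-D points, assuming equal weight per point. It does not use PyZoltan or Zoltan, and the function name `rcb` is hypothetical.

```python
import random


def rcb(points, parts):
    """Recursive coordinate bisection: split points into up to `parts` groups.

    `points` is a list of (x, y) tuples; the result is a list of point lists.
    """
    if parts <= 1 or len(points) <= 1:
        return [list(points)]
    # Cut along the axis with the larger extent.
    spans = [
        max(p[axis] for p in points) - min(p[axis] for p in points)
        for axis in (0, 1)
    ]
    axis = 0 if spans[0] >= spans[1] else 1
    ordered = sorted(points, key=lambda p: p[axis])
    left_parts = parts // 2
    # Each side receives a share of points proportional to its number of parts.
    cut = len(ordered) * left_parts // parts
    return rcb(ordered[:cut], left_parts) + rcb(ordered[cut:], parts - left_parts)


if __name__ == "__main__":
    rng = random.Random(0)
    cloud = [(rng.random(), rng.random()) for _ in range(100)]
    for index, group in enumerate(rcb(cloud, 4)):
        print("part", index, "holds", len(group), "points")
```

Each group is spatially compact and holds a similar number of points, so assigning one group per process balances the work while keeping neighbouring points together.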

virtuoso-opensource
high-performance database
Versions of package virtuoso-opensource
Release       Version             Architectures
bookworm      7.2.5.1+dfsg1-0.3   all
sid           7.2.5.1+dfsg1-0.8   all
experimental  7.2.12+dfsg-0.1     all
jessie        6.1.6+dfsg2-2       all
buster        6.1.6+dfsg2-4       all
stretch       6.1.6+dfsg2-4       all
bullseye      7.2.5.1+dfsg1-0.1   all
upstream      7.2.12

Debtags of package virtuoso-opensource:
role: metapackage, program
works-with: db

Popcon: 0 users (0 upd.)*
Newer upstream!
License: DFSG free

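OpenLink Virtuoso is a high-performance object-relational SQL database. It provides transactions, a smart SQL compiler, hot backup, SQL:1999 support, and a powerful stored-procedure language with support for server-side Java or .NET. It offers all major data-access interfaces, including ODBC, JDBC, ADO.NET, and OLE/DB.

Virtuoso supports SPARQL embedded in SQL for querying RDF data stored in its database. SPARQL benefits from low-level support in the engine itself, such as SPARQL-aware type-casting rules and a dedicated IRI data type.

Install this metapackage for the full set of packages that make up Virtuoso OSE ("Open-Source Edition").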

visidata
rapidly explore columnar data in the terminal
Versions of package visidata
Release   Version  Architectures
bullseye  2.2.1-1  all
bookworm  2.11-1   all
sid       3.0.2-1  all
buster    1.5.2-1  all
trixie    3.0.2-1  all

Popcon: 34 users (13 upd.)*
License: DFSG free

VisiData is a multipurpose terminal utility for exploring, cleaning, restructuring and analysing tabular data. Currently supported sources are TSV, CSV, fixed-width text, JSON, SQLite, HTTP, HTML, .xls, and .xlsx (Microsoft Excel).

Official Debian packages with lower relevance

libnexus-dev
NeXus scientific data file format - development libraries
Versions of package libnexus-dev
Release   Version  Architectures
sid       4.4.3-6  amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
bookworm  4.4.3-5  amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
trixie    4.4.3-6  amd64,arm64,armel,armhf,i386,mips64el,ppc64el,s390x
bullseye  4.4.3-5  amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x

Popcon: 0 users (0 upd.)*
License: DFSG free

NeXus is a common data format for neutron, X-ray, and muon science. It is being developed as an international standard by scientists and programmers representing major scientific facilities in Europe, Asia, Australia, and North America in order to facilitate greater cooperation in the analysis and visualization of neutron, X-ray, and muon data.

This package contains the development libraries.

libnexus-java
NeXus scientific data file format - Java libraries
Versions of package libnexus-java
Release   Version  Architectures
sid       4.4.3-6  all
bullseye  4.4.3-5  all
bookworm  4.4.3-5  all
trixie    4.4.3-6  all

Popcon: 0 users (0 upd.)*
License: DFSG free

NeXus is a common data format for neutron, X-ray, and muon science. It is being developed as an international standard by scientists and programmers representing major scientific facilities in Europe, Asia, Australia, and North America in order to facilitate greater cooperation in the analysis and visualization of neutron, X-ray, and muon data.

This package contains the Java libraries.

libplfit-dev
fitting power-law distributions to empirical data -- development
Versions of package libplfit-dev
Release   Version     Architectures
trixie    0.9.4+ds-1  amd64,arm64,armel,armhf,i386,mips64el,ppc64el
bookworm  0.9.4+ds-1  amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el
sid       0.9.6+ds-1  amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x

Popcon: 0 users (1 upd.)*
License: DFSG free

The plfit software fits power-law distributions to empirical (discrete or continuous) data, according to the method of Clauset, Shalizi and Newman [SIAM Review 51, 661-703 (2009)].

This package contains the header files, static libraries and symbolic links that developers using the plfit library will need.

The package is enhanced by the following packages: plfit-doc

python3-openpyxl
Python 3 module to read/write OpenXML xlsx/xlsm files
Versions of package python3-openpyxl
Release   Version       Architectures
trixie    3.1.2+dfsg-6  all
bullseye  3.0.3-1       all
bookworm  3.0.9-1       all
buster    2.4.9-1       all
sid       3.1.2+dfsg-6  all
stretch   2.3.0-3       all

Popcon: 247 users (294 upd.)*
License: DFSG free

Openpyxl is a pure Python 3 module to read/write Excel 2007 (OpenXML) xlsx/xlsm files.

This package contains the module itself.

python3-opentsne
t-Distributed Stochastic Neighbor Embedding algorithm
Versions of package python3-opentsne
Release   Version  Architectures
sid       1.0.0-1  amd64,arm64,armel,armhf,mips64el,ppc64el,riscv64,s390x
sid       0.5.0-2  i386
upstream  1.0.1

Popcon: 0 users (0 upd.)*
Newer upstream!
License: DFSG free

Modular Python implementation of t-Distributed Stochastic Neighbor Embedding (t-SNE), a popular dimensionality-reduction algorithm for visualizing high-dimensional data sets. openTSNE incorporates the latest improvements to the t-SNE algorithm, including the ability to add new data points to existing embeddings, massive speed improvements that enable t-SNE to scale to millions of data points, and various tricks to improve global alignment of the resulting visualizations.

python3-plfit
fitting power-law distributions to empirical data -- Python
Versions of package python3-plfit
Release   Version     Architectures
sid       0.9.6+ds-1  amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
trixie    0.9.4+ds-1  amd64,arm64,armel,armhf,i386,mips64el,ppc64el
bookworm  0.9.4+ds-1  amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el

Popcon: 0 users (0 upd.)*
License: DFSG free

The plfit software fits power-law distributions to empirical (discrete or continuous) data, according to the method of Clauset, Shalizi and Newman [SIAM Review 51, 661-703 (2009)].

This package provides a Python module.

The package is enhanced by the following packages: plfit-doc

*Popularity contest results: number of people who use this package regularly (number of people who upgraded this package recently) out of 236226.