Archive DICOMs
As the “true” raw data, DICOMs are rarely (re)accessed and hardly ever need to change.
However, they need to be stored somewhere.
Tracking DICOMs in DataLad datasets makes it possible to record them as a dependency of the conversion to NIfTI.
At the same time, DataLad should remain optional: the DICOMs should be accessible both with and without it.
Historical precedent: ICF at FZJ
The following solution has been proposed for the Imaging Core Facility at FZJ:
- DICOMs are packed into tar files (tarballs)
- the tarballs are placed on a web server (intranet only), organized
by project (HTTP Basic Authentication for access management)
- DataLad datasets record availability via the archivist and uncurl special remotes, which translates to:
  - a file is available from a tarball (archivist special remote)
  - a tarball is available from a given URL, pointing to the web server (uncurl special remote)
- Only the Git repository (no annex) is stored by the consuming institute;
the ICF web server is the permanent DICOM storage.
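The packing step in the list above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used by the ICF tools: the function names `pack_dicoms` and `sha256sum`, the directory layout, and the choice of an uncompressed tarball with a SHA-256 checksum are all hypothetical.

```python
import hashlib
import tarfile
from pathlib import Path


def pack_dicoms(src_dir: Path, tar_path: Path) -> list[str]:
    """Pack all files below src_dir into an uncompressed tarball.

    Returns the member names, relative to the parent of src_dir.
    tar_path is assumed to be located outside of src_dir.
    """
    src_dir = Path(src_dir)
    # Sort the files so that repeated runs produce the same member order
    files = sorted(p for p in src_dir.rglob("*") if p.is_file())
    members = []
    with tarfile.open(tar_path, "w") as tar:
        for path in files:
            name = path.relative_to(src_dir.parent).as_posix()
            tar.add(path, arcname=name)
            members.append(name)
    return members


def sha256sum(path: Path) -> str:
    """Checksum of a file, e.g. to verify a tarball after download."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned member names are the kind of information a dataset needs in order to state that a given file is available from a given tarball.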
The system is documented at https://inm-icf-utilities.readthedocs.io/en/latest/
and the tools for generating tarballs and datasets are implemented in https://github.com/psychoinformatics-de/inm-icf-utilities.
TRR reimplementation
One of the TRR sites indicated intent to use a Forgejo instance for DICOM storage.
A particular challenge for the underlying system was a limit on the number of available inodes.
For this reason, an adaptation of the ICF system has been proposed:
- a dataset is generated upfront, and the DICOM tarball is stored with the dataset in Forgejo (as annex)
- we keep using the archivist remote (file in tarball) to avoid using up thousands of inodes for individual files
(Git can pack its repository into several files, so we only add one more for the tarball).
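The inode argument can be illustrated with a short Python sketch: individual files stay addressable inside a tarball, while the filesystem only has to hold a single file. The helper names `count_files` and `read_member` are hypothetical, and the sketch only mirrors the idea behind the archivist remote, not its implementation.

```python
import tarfile
from pathlib import Path


def count_files(root: Path) -> int:
    """Number of regular files below root (each one occupies an inode)."""
    return sum(1 for p in Path(root).rglob("*") if p.is_file())


def read_member(tar_path: Path, member: str) -> bytes:
    """Return the content of one archive member without unpacking the rest."""
    with tarfile.open(tar_path, "r") as tar:
        handle = tar.extractfile(member)
        if handle is None:
            # extractfile() returns None for members that are not regular files
            raise ValueError(f"{member} is not a regular file in {tar_path}")
        return handle.read()
```

A directory that holds one tarball with thousands of DICOM files yields `count_files(...) == 1`, and `read_member` can still retrieve any single image on demand.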
A proof of principle for dataset generation (using rewritten ICF code) has been proposed in https://hub.trr379.de/q02/dicom-utilities.
See its README for more detailed explanations, and the commit messages for further detail.
BIDS conversion
Converting the heterogeneous, site-specific raw MRI data acquisitions into a standardized dataset is an essential precondition for the collaborative work in TRR379.
It readies the data for processing with established pipelines, and applies pseudonymization as a safeguard for the responsible use of this personal data.
TRR379 uses the Brain Imaging Data Structure (BIDS) as the standard for its datasets.
Conversion to BIDS
The conversion of raw MRI data in DICOM format to a BIDS-compliant dataset is a largely automated process.
The recommended software to be used for conversion is Heudiconv.
Heudiconv uses dcm2niix as the actual DICOM→NIfTI converter.
In our experience, dcm2niix is the most robust and most correct tool available for this task.
Heudiconv does the job of mapping DICOM series to BIDS entities (i.e., determining BIDS-compliant file names).
A key heudiconv concept is a heuristic: a Python program (function) that inspects the properties of each DICOM series and matches the series with a file naming pattern.
A heuristic typically relies on DICOM series naming (set at the scanner console), but it can also use other properties such as number of images or acquisition parameters.
Because TRR379 uses its own conventions, a matching heuristic needs to be provided (possibly one for each TRR379 site).
An implementation of such a heuristic has been created, and was tested on phantom MRI acquisitions from all sites (see below).
Using this heuristic, MRI data from all sites can be BIDS-standardized.
As with any automation, caution and oversight are needed for edge cases (e.g. repeated / discarded acquisitions).
Heudiconv tutorials further illustrate the process and capabilities of the software.
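The heuristic concept described above can be sketched as follows. The structure (a `create_key` helper and an `infotodict(seqinfo)` function that receives one record per DICOM series) follows the convention used in heudiconv heuristic files. The series names, the task label, and the volume threshold are hypothetical and do not reflect the actual TRR379 heuristic.

```python
def create_key(template, outtype=("nii.gz",), annotation_classes=None):
    """Helper conventionally found in heudiconv heuristics: one output target."""
    if not template:
        raise ValueError("Template must be a valid format string")
    return template, outtype, annotation_classes


def infotodict(seqinfo):
    """Map each DICOM series to a BIDS file naming pattern.

    seqinfo is an iterable with one record per DICOM series; only the
    attributes series_id, series_description, and dim4 are used here.
    """
    t1w = create_key("sub-{subject}/anat/sub-{subject}_T1w")
    rest = create_key("sub-{subject}/func/sub-{subject}_task-rest_bold")
    info = {t1w: [], rest: []}

    for s in seqinfo:
        name = s.series_description.lower()
        # Hypothetical series naming, as set at the scanner console
        if "t1" in name and "mprage" in name:
            info[t1w].append(s.series_id)
        # Also use the number of volumes (dim4) to skip aborted runs;
        # the threshold of 100 is an arbitrary illustration
        elif "rest" in name and s.dim4 >= 100:
            info[rest].append(s.series_id)
    return info
```

Series that match no rule (localizers, aborted runs) are simply left out of the result, which is one reason edge cases still need manual review.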
Good practices
- Use heudiconv as a containerized application. Q02 provides a readily usable utility dataset with a configured container. See that repository for example usage.
- DICOMs as subdatasets helps with provenance, even if those DICOMs are never accessed outside
- Heudiconv takes paths and (optionally) intended subject IDs as input
  - if paths contain identifying information, this would leak into DataLad run records
  - having a helper script / lookup table in the (private) DICOM dataset can hide this information
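The helper script / lookup table idea can be sketched as follows: the script is called with a pseudonymous subject ID only, and resolves the (possibly identifying) DICOM path internally, so that a recorded command line contains just the ID. The CSV column names `subject` and `dicom_path` and the function names are hypothetical. The heudiconv options used (`--files`, `-o`, `-f`, `-s`, `-c`, `-b`) should be checked against the installed heudiconv version.

```python
import csv
from pathlib import Path


def load_lookup(table_path: Path) -> dict[str, str]:
    """Read a CSV with the columns `subject` and `dicom_path` into a dict."""
    with open(table_path, newline="") as f:
        return {row["subject"]: row["dicom_path"] for row in csv.DictReader(f)}


def heudiconv_args(
    subject: str, lookup: dict[str, str], outdir: str, heuristic: str
) -> list[str]:
    """Build a heudiconv command line for one pseudonymous subject ID."""
    dicom_path = lookup[subject]  # raises KeyError for unknown IDs
    return [
        "heudiconv",
        "--files", dicom_path,  # resolved here, never typed by the caller
        "-o", outdir,
        "-f", heuristic,
        "-s", subject,
        "-c", "dcm2niix",
        "-b",
    ]
```

A wrapper script could pass the resulting list to `subprocess.run`. Because the lookup table stays in the private DICOM dataset, the mapping between IDs and paths is not exposed in the BIDS dataset.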
Caveats
Demonstrators and resources
TRR phantom DICOMs
Scans of MRI phantoms were carried out using the intended sequences (presumably - see caveats section below).
These were shared with Q02 and uploaded to the TRR Hub Forgejo instance:
Note: Aachen did a re-scan, which was shared by e-mail / cloud (June 03, 2025).
This has not been uploaded to Forgejo (due to permissions and size).
TRR phantom BIDS
The conversion of the re-scanned Aachen phantom is in https://hub.trr379.de/q02/tmp-phantom-bids (kept separate from the above
because the input data is not available as a DataLad dataset).
Data consistency
Conversion: technical issues
These are open questions: