Work with raw MRI data

Magnetic resonance imaging (MRI) is an essential data acquisition method for TRR379. Four sites acquire such data, each with its own established routines and conventions. This documentation collects resources and best practices that can be adopted by TRR379 members.

Archive DICOMs

As the “true” raw data, DICOMs are rarely (re)accessed and hardly ever need to change, but they do need to be stored somewhere. Tracking DICOMs in DataLad datasets enables dependency tracking for the conversion to NIfTI. That said, it is good to keep DataLad optional, i.e. to allow both DataLad and non-DataLad access.

Historical precedent: ICF at FZJ

The following solution has been proposed for the Imaging Core Facility at FZJ:

  • DICOMs are packed into tar files (tarballs) [1]
  • the tarballs are placed on a web server (intranet only), organized by project (HTTP Basic Authentication for access management)
  • DataLad datasets record availability via the archivist and uncurl special remotes, which translates to:
    • a file is available from a tarball (archivist special remote)
    • a tarball is available from a given URL pointing to the web server (uncurl special remote) [2]
  • Only the Git repository (no annex) is stored by the consuming institute; the ICF web server is the permanent DICOM storage (see the consumer-side sketch after this list).
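
For illustration, here is a minimal consumer-side sketch using DataLad's Python API. The URL and file path are hypothetical, and the sketch assumes that the special remotes described above have already been configured in the dataset (e.g. by the ICF tooling) and that HTTP Basic Auth credentials for the web server are available to DataLad:

```python
import datalad.api as dl

# Clone transfers only the Git repository; no annexed file content yet.
ds = dl.clone(
    source="https://icf.example.org/project-x/visit-1.git",  # hypothetical URL
    path="visit-1",
)

# On `get`, git-annex consults the special remotes: uncurl retrieves the
# tarball from the web server, and archivist extracts the requested file.
ds.get("sub-01/anat/series-002.dcm")  # hypothetical file path
```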

The system is documented at https://inm-icf-utilities.readthedocs.io/en/latest/, and the implementation of the tarball and dataset generation tools is at https://github.com/psychoinformatics-de/inm-icf-utilities.

TRR reimplementation

One of the TRR sites indicated its intent to use a Forgejo instance for DICOM storage. A particular challenge for the underlying storage system is its inode limitation. For this reason, an adaptation of the ICF system has been proposed:

  • a dataset is generated upfront, and the DICOM tarball is stored with the dataset in Forgejo (as annexed content)
  • we keep using the archivist special remote (file in tarball) to avoid using up thousands of inodes for individual files (Git can pack its repository into several files, so we only add one more for the tarball); see the generation-side sketch after this list.
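
Below is a minimal sketch of the generation side, again using DataLad's Python API. All names and paths are hypothetical; it assumes a Forgejo sibling named "forgejo" has already been configured, and it omits the registration of individual file availability via the archivist special remote (which the dataset generation tooling handles):

```python
import shutil
import datalad.api as dl

# Create a new dataset for one acquisition (path is hypothetical).
ds = dl.create(path="study-visit-1")

# Store the single DICOM tarball as annexed content; individual DICOM
# files are not unpacked, keeping the inode footprint minimal.
shutil.copy("incoming/study-visit-1_dicom.tar", ds.pathobj / "dicom.tar")
ds.save(message="Add DICOM tarball")

# Publish the Git history and the annexed tarball to the Forgejo sibling.
ds.push(to="forgejo")
```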

A proof of principle for dataset generation (using re-written ICF code) is available at https://hub.trr379.de/q02/dicom-utilities. See the README for more detailed explanations (and the commit messages for even more detail).


  1. Timestamps are normalized to ensure that re-packing the same data does not change tarball checksums.

  2. uncurl is chosen because it allows rewriting URL patterns with just configuration, e.g., should the base URL change.

BIDS conversion

Converting the heterogeneous, site-specific raw MRI data acquisitions into a standardized dataset is an essential precondition for the collaborative work in TRR379. It readies the data for processing with established pipelines and applies pseudonymization as a safeguard for the responsible use of this personal data.

TRR379 uses the Brain Imaging Data Structure (BIDS) as the standard for its datasets.

Conversion to BIDS

The conversion of raw MRI data in DICOM format to a BIDS-compliant dataset is a largely automated process. The recommended conversion software is Heudiconv.

Heudiconv uses dcm2niix as the actual DICOM→NIfTI converter. In our experience, dcm2niix is the most robust and most correct tool available for this task.

Heudiconv does the job of mapping DICOM series to BIDS entities (i.e. determining BIDS-compliant file names). A key Heudiconv concept is the heuristic: a Python program (function) which looks at the properties of a DICOM series and matches them with a file naming pattern. A heuristic typically relies on DICOM series naming (set at the scanner console), but it can also use other properties, such as the number of images or acquisition parameters.
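
For illustration, a minimal heuristic could look like the following sketch, in the module format Heudiconv expects. The matched protocol names and the volume-count threshold are hypothetical and do not reflect the actual TRR379 conventions:

```python
def create_key(template, outtype=("nii.gz",), annotation_classes=None):
    """Pair a BIDS file name template with the desired output types."""
    if not template:
        raise ValueError("Template must be a valid format string")
    return template, outtype, annotation_classes


def infotodict(seqinfo):
    """Map DICOM series to BIDS-compliant file name templates."""
    t1w = create_key("sub-{subject}/anat/sub-{subject}_T1w")
    rest = create_key("sub-{subject}/func/sub-{subject}_task-rest_bold")

    info = {t1w: [], rest: []}
    for s in seqinfo:
        # Match primarily on the series name set at the scanner console;
        # other properties (e.g. s.dim4, the number of volumes) are
        # available for disambiguation.
        protocol = s.protocol_name.lower()
        if "t1" in protocol:
            info[t1w].append(s.series_id)
        elif "rest" in protocol and s.dim4 > 100:
            info[rest].append(s.series_id)
    return info
```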

Because TRR379 uses its own conventions, a matching heuristic needs to be provided (possibly one per TRR379 site). An implementation of such a heuristic has been created and tested on phantom MRI acquisitions from all sites (see below). Using this heuristic, MRI data from all sites can be BIDS-standardized. As with any automation, caution and oversight are needed for edge cases (e.g. repeated or discarded acquisitions).

Heudiconv tutorials further illustrate the process and capabilities of the software.

Good practices

  • Use heudiconv as a containerized application. Q02 provides a readily usable utility dataset with a configured container. See that repository for an example usage.
  • Tracking DICOMs as subdatasets helps with provenance, even if those DICOMs are never accessed outside the originating site
  • Heudiconv takes paths and (optionally) intended subject IDs as input
    • if the paths contain identifying information, it would leak into DataLad run records
    • a helper script / lookup table in the (private) DICOM dataset can hide this information (see the sketch after this list)
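
A hypothetical sketch of such a helper is shown below; the lookup table location, file name, and column names are assumptions:

```python
import csv
import sys


def resolve_dicom_path(subject_id, table="code/dicom_lookup.csv"):
    """Return the DICOM path recorded for a pseudonymous subject ID."""
    with open(table, newline="") as f:
        # Expected columns (an assumption): subject_id, dicom_path
        for row in csv.DictReader(f):
            if row["subject_id"] == subject_id:
                return row["dicom_path"]
    raise KeyError(f"No DICOM path recorded for {subject_id}")


if __name__ == "__main__":
    # Usage: python resolve_dicom.py sub-001
    print(resolve_dicom_path(sys.argv[1]))
```

Calling Heudiconv with the path returned by such a helper keeps identifying directory names off the recorded command line, and hence out of DataLad run records.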

Caveats

Demonstrators and resources

TRR phantom DICOMs

Scans of MRI phantoms were carried out using the intended sequences (presumably; see the caveats section below). These were shared with Q02 and uploaded to the TRR Hub Forgejo instance.

Note: Aachen did a re-scan, which was shared by e-mail / cloud (June 03, 2025). It has not been uploaded to Forgejo (permissions + size).

TRR phantom BIDS

Conversion of the re-scanned Aachen phantom is available at https://hub.trr379.de/q02/tmp-phantom-bids (kept separate from the above because the input data is not available as a DataLad dataset).

Data consistency

Conversion: technical issues

These are open questions: