Identifiers

Identifiers are an essential component of the TRR379 research data management (RDM) approach. This is reflected in the visible organization of information on the consortium website, but also in the schemas that define the structure of metadata on TRR379 outputs.

Many systems for identifying particular types of entities have been developed. A well-known example is DOI for digital objects, most commonly used for publications. However, many others exist, like ROR for research organizations, or Cognitive Atlas for concepts, tasks, and phenotypes related to human cognition.

RDM in the TRR379 aims to employ and align with existing systems as much as possible to maximize interoperability with other efforts and solutions. However, no particular identifier system is required or exclusively adopted by TRR379.

Instead, anything and everything that is relevant for TRR379 has an identifier in a TRR379-specific namespace.

Identifier persistence

TRR379 RDM relies heavily on persistent identifiers. Virtually every entity has, and must have, a persistent identifier. This key constraint makes it possible for multiple actors to contribute metadata on arbitrary aspects collaboratively and simultaneously, without having to wait and query for finished metadata records on associated entities.

TRR379 identifier namespace

TRR379 uses URIs as identifiers that map onto the structure of the main consortium website. For example, the full TRR379 identifier for the spokesperson Ute Habel is https://trr379.de/contributors/ute-habel. In this URI, https://trr379.de is the unique TRR379-specific namespace prefix, contributors/ute-habel is the TRR379-specific identifier for Ute Habel (where contributors is a sub-namespace for agents that in some way contribute to the consortium).
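The decomposition described above can be sketched in a few lines of Python. This is only an illustration: the function name `split_identifier` and the constant `NAMESPACE_PREFIX` are hypothetical and not part of any TRR379 tooling, and the sketch assumes that every identifier consists of exactly a sub-namespace followed by a local name.

```python
from urllib.parse import urlsplit

# Namespace prefix as given in the text above.
NAMESPACE_PREFIX = "https://trr379.de"


def split_identifier(uri: str) -> tuple[str, str]:
    """Split a TRR379 URI into (sub-namespace, local name).

    Hypothetical helper for illustration only.
    """
    if not uri.startswith(NAMESPACE_PREFIX + "/"):
        raise ValueError(f"not in the TRR379 namespace: {uri}")
    # Drop the prefix and surrounding slashes, then split at the first "/".
    path = urlsplit(uri).path.strip("/")
    sub_namespace, _, local_name = path.partition("/")
    if not sub_namespace or not local_name:
        raise ValueError(f"missing sub-namespace or local name: {uri}")
    return sub_namespace, local_name


parts = split_identifier("https://trr379.de/contributors/ute-habel")
```

Here `parts` holds the sub-namespace `contributors` and the local name `ute-habel`, while a URI outside the TRR379 namespace is rejected.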

Even though Ute Habel can also be identified by the ORCID 0000-0003-0703-7722, via the quasi-standard identifier system for researchers, this is considered an optional, alternative identifier rather than a requirement for TRR379 RDM.

The reasons for this approach are simplicity and flexibility.

An identifier in TRR379 RDM is a simple text label in a self-managed namespace. This self-managed namespace can cover any and all entity types that require identification within TRR379. In many cases, an identifier directly maps to a page on the main consortium website. This is a simple strategy to document the nature of any entity. It also establishes the main website as a central, straightforward instrument for communicating and deduplicating identifiers in a distributed research consortium.

Alignment with other identifiers

Even though any relevant entity can receive a TRR379-specific identifier with the approach described above, the utility of these identifiers is limited to TRR379-specific procedures and activities. However, a TRR379 metadata record on a research site (e.g., https://trr379.de/sites/aachen) can be annotated with any alternative identifier for the same entity (e.g., https://ror.org/04xfq0f34). This makes it possible to combine the benefits of a self-governed, project-specific identifier namespace with the superior discoverability and interoperability of established identification systems for particular entities.
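As a rough illustration of such an annotation, the sketch below models a record that carries its TRR379 identifier alongside a list of alternative identifiers. The class `Record`, its field names, and the lookup function are assumptions made for this example; they do not reproduce the actual TRR379 metadata schemas.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Record:
    """Hypothetical metadata record; field names are illustrative only."""
    identifier: str  # TRR379-specific identifier
    alternative_identifiers: list[str] = field(default_factory=list)


def find_record(records: list[Record], identifier: str) -> Optional[Record]:
    """Return the record known under the given primary or alternative identifier."""
    for record in records:
        if identifier == record.identifier:
            return record
        if identifier in record.alternative_identifiers:
            return record
    return None


aachen = Record(
    identifier="https://trr379.de/sites/aachen",
    alternative_identifiers=["https://ror.org/04xfq0f34"],
)
by_ror = find_record([aachen], "https://ror.org/04xfq0f34")
```

With this arrangement the same record can be found through either identifier, so external systems that only know the ROR identifier can still be aligned with TRR379 metadata.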

Identifiers for particular entities

The additional documentation linked below provides more information on particular identifiers used by TRR379.

Participants

This page is a more in-depth description of the rationale behind the SOP for participant identifiers used by TRR379.

Q01 participant identifiers

Q01 is the central recruitment project. Any participant included in the core TRR379 dataset is registered with Q01 and receives an identifier. This identifier is unique within TRR379 and stable across the entire lifespan of TRR379.

The dataset acquired by TRR379 is longitudinal in nature. Therefore participants need to be reliably identified and re-identified for follow-up visits. Because participants are not expected to remember their TRR379 identifier, it is necessary to store personal data on a participant for the time of their participation in data acquisition activities.

In order to avoid needlessly widespread distribution of this personal data, participant registration and personal data retention are done only at the site where a person participates in TRR379 data acquisitions. Each site:

issues TRR379-specific participant identifiers that are unique and valid throughout the runtime of TRR379
uses secure systems for this purpose, for example, existing patient information systems
is responsible for linking all relevant information that is required for reporting and data analysis within TRR379 to the issued Q01 identifier, so that all data can be identified and delivered upon request (e.g., link to brain imaging facility subject ID).

The site-issued identifiers have a unique, site-specific prefix (e.g., a letter like A for Aachen), such that each site can self-organize their own identifier namespace without having to synchronize with all other sites to avoid duplication.

The identifiers must not have any other information encoded in them.
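A minimal sketch of site-local identifier issuance under these constraints is shown below. The concrete format (site letter followed by six random digits) and the class name `SiteRegistry` are assumptions chosen for illustration; the text above only prescribes a site-specific prefix and the absence of any other encoded information.

```python
import secrets


class SiteRegistry:
    """Hypothetical per-site issuer of participant identifiers."""

    def __init__(self, site_prefix: str) -> None:
        self.site_prefix = site_prefix
        self._issued: set[str] = set()

    def issue(self) -> str:
        """Issue an identifier that is unique within this site.

        The numeric part is random, so it carries no information about
        the participant or the order of registration.
        """
        while True:
            candidate = f"{self.site_prefix}{secrets.randbelow(10**6):06d}"
            if candidate not in self._issued:
                self._issued.add(candidate)
                return candidate


aachen = SiteRegistry("A")
first = aachen.issue()
second = aachen.issue()
```

Because the prefix differs between sites, two registries can never issue the same identifier, and no synchronization between sites is needed. A random rather than sequential number is used here so that the identifier does not reveal the order of enrollment; this is a design choice of the sketch, not a stated TRR379 requirement.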

Responsible use and anonymization of identifiers

The TRR379 participant identifiers, as described above, are pseudonymous. Using only these TRR379-specific identifiers for any TRR379-specific communication and implementation is advised for compliance with the GDPR principles of necessity and proportionality of personal data handling. This includes, for example, data analysis scripts that can be expected to become part of a more widely accessible documentation or publication.

Any TRR379 site that issues identifiers is responsible for strictly separating personal data used for (re-)identifying a participant, such as health insurance ID, government ID card numbers, or name and date of birth. This information is linked to TRR379-specific identifiers in a dedicated mapping table. Access to this table is limited to specifically authorized personnel.

When a participant withdraws, or when a study’s data acquisition is completed, the mapping of the TRR379 identifier to personal identifying information is destroyed by removing the associated record (row) from the mapping table. At this point, the TRR379 identifier itself can be considered anonymous. Consequently, occurrences of such identifiers in any published or otherwise shared records or computer scripts need not be redacted.

The validity of the statement above critically depends on the identifier-issuing sites maintaining a strictly separate, confidential mapping of identifier to personal identifying information, and not encoding participant-specific information into the identifier itself.
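The life cycle of a mapping-table row can be illustrated with the following sketch. The class `MappingTable`, its methods, the choice of name and date of birth as personal data, and the identifier `A000123` are all made up for this example; a real site would use a secured system such as an existing patient information system rather than an in-memory dictionary.

```python
from typing import Optional


class MappingTable:
    """Hypothetical site-local table linking identifiers to personal data."""

    def __init__(self) -> None:
        # identifier -> (name, date of birth); kept separate from research data
        self._rows: dict[str, tuple[str, str]] = {}

    def add(self, identifier: str, name: str, date_of_birth: str) -> None:
        self._rows[identifier] = (name, date_of_birth)

    def reidentify(self, name: str, date_of_birth: str) -> Optional[str]:
        """Look up the identifier of a returning participant."""
        for identifier, person in self._rows.items():
            if person == (name, date_of_birth):
                return identifier
        return None

    def destroy(self, identifier: str) -> None:
        """Remove the row; the identifier can no longer be linked to a person."""
        self._rows.pop(identifier, None)


table = MappingTable()
table.add("A000123", "Jane Doe", "1990-01-01")
follow_up = table.reidentify("Jane Doe", "1990-01-01")  # follow-up visit
table.destroy("A000123")  # withdrawal or end of data acquisition
after_removal = table.reidentify("Jane Doe", "1990-01-01")
```

After `destroy`, the lookup yields nothing, while the identifier string itself remains usable in shared records and scripts.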

Participant identifiers in A/B/C projects

For each project or study that is covered by its own ethics documentation and approval, separate and dedicated participant identifiers are used that are different from a Q01-identifier for a person. This is done to enable such projects to fulfill their individual requirements regarding responsible use of personal data. In particular, it enables any individual project to share and publish data without enabling trivial, undesired, and unauthorized cross-referencing of data on an individual person, acquired in different studies.

These project-specific identifiers are managed and issued in the same way as described above.

A project requests a project-specific identifier from the local site representative of Q01, by presenting personal identifying information.
This information is matched to any existing Q01-identifier, and a project-specific identifier is created and/or reported.
Any created project-specific identifier is linked to the Q01-identifier, using the same technical systems and procedures that also link other identifiers (e.g., patient information system record identifier).

Importantly, the mapping of the Q01-identifier and a project-specific identifier is typically not shared with the requesting project. This is done to prevent accidental and undesired co-occurrence of the two different identifiers in a way that enables unauthorized agents to reconstruct an identifier mapping that violates the boundaries of specific research ethics.
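The request procedure above might be sketched as follows. Everything here is hypothetical: the class `Q01SiteOffice`, the use of a single string `person_key` as a stand-in for personal identifying information, the identifier format, and the decision to reject requests for persons without an existing Q01 identifier are assumptions of the sketch, not documented TRR379 behavior.

```python
import secrets


class Q01SiteOffice:
    """Hypothetical local Q01 representative issuing project-specific identifiers."""

    def __init__(self, site_prefix: str) -> None:
        self.site_prefix = site_prefix
        self._q01_by_person: dict[str, str] = {}  # personal data -> Q01 identifier
        self._project_ids: dict[tuple[str, str], str] = {}  # (Q01 id, project) -> id
        self._issued: set[str] = set()

    def _new_identifier(self) -> str:
        # Random token so that no two identifiers of one person resemble each other.
        while True:
            candidate = f"{self.site_prefix}{secrets.token_hex(4)}"
            if candidate not in self._issued:
                self._issued.add(candidate)
                return candidate

    def register(self, person_key: str) -> str:
        """Register a participant with Q01 and return the Q01 identifier."""
        if person_key not in self._q01_by_person:
            self._q01_by_person[person_key] = self._new_identifier()
        return self._q01_by_person[person_key]

    def request_project_id(self, project: str, person_key: str) -> str:
        """Create or report the project-specific identifier for a person.

        Only the project-specific identifier is returned; the link to the
        Q01 identifier stays inside this object.
        """
        if person_key not in self._q01_by_person:
            raise LookupError("person is not registered with Q01")
        key = (self._q01_by_person[person_key], project)
        if key not in self._project_ids:
            self._project_ids[key] = self._new_identifier()
        return self._project_ids[key]


office = Q01SiteOffice("A")
q01_id = office.register("jane-doe-1990-01-01")
project_id = office.request_project_id("B02", "jane-doe-1990-01-01")
```

Repeating the request for the same project reports the identifier created earlier, while a request from a different project yields a different identifier, so data from the two projects cannot be trivially cross-referenced.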

Special-purpose identifiers

Sometimes it is necessary to generate participant identifiers that are not compliant with the procedures and properties described above. For example, an external service provider may require particular information to be encoded in an identifier (e.g., sex, age, date of acquisition).

If this is the case, an additional identifier must be generated for that specific purpose. Its use must be limited in time and it must not be reused for other purposes.

Identifier generation and linkage to the standard Q01-participant identifiers are done using the procedure described for project-specific identifiers above.