The image-identifier-lookup gear performs NACCID lookups for DICOM images uploaded to the NACC Data Platform. The gear processes a DICOM image file (or a zip archive containing DICOM files), extracts patient identifiers from the DICOM metadata, queries the identifier database, and updates the Flywheel subject metadata with the corresponding NACCID.
This gear is designed to run on individual DICOM image files as they are uploaded to the platform.
On re-runs where the NACCID and DICOM metadata are already stored in subject custom info, the gear skips opening the DICOM file entirely.
This gear uses the AWS SSM parameter store for API key management and S3 for event capture. It expects that AWS credentials are available in environment variables within the Flywheel runtime:
- AWS_SECRET_ACCESS_KEY
- AWS_ACCESS_KEY_ID
- AWS_DEFAULT_REGION

The gear needs to be added to the allow list for these variables to be shared.
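A pre-flight check for these variables could look like the following minimal sketch. The helper name `missing_aws_vars` is hypothetical and is not part of the gear; it only illustrates failing early when the allow list has not been configured.

```python
import os

# Variable names as listed above.
REQUIRED_AWS_VARS = (
    "AWS_SECRET_ACCESS_KEY",
    "AWS_ACCESS_KEY_ID",
    "AWS_DEFAULT_REGION",
)


def missing_aws_vars(environ=None):
    """Return the required AWS variables that are unset or empty.

    Hypothetical helper: pass a mapping to check something other than
    the process environment (useful in tests).
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]
```

A caller might log the returned names and stop before any SSM or S3 request is attempted.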
Gear configs are defined in manifest.json.
- dry_run (boolean, default: false): Whether to perform a dry run without making changes
  - true: Performs all lookups and validations but does not update subject metadata
  - false: Updates subject metadata with the looked-up NACCID
- database_mode (string, default: "prod"): Which identifier database to query
  - "prod": Query the production identifier database
  - "dev": Query the development identifier database
- naccid_field_name (string, default: "naccid"): Field name for storing NACCID in subject metadata
- event_environment (string, required): Environment for visit event capture
  - "prod" or "dev"
- event_bucket (string, required): S3 bucket name for event capture
- apikey_path_prefix (string, default: "/prod/flywheel/gearbot"): AWS SSM parameter path prefix for API keys

The gear requires a single DICOM image file as input:
- input_file: A DICOM format image file or a zip archive containing DICOM files
  - dicom in Flywheel
  - .dcm/.dicom extension, or extensionless files)

The gear extracts the following information:
ADCID (Center ID): Retrieved from the project’s pipeline configuration
Study Date: Retrieved from DICOM StudyDate tag (0008,0020)
Initialization: Validates that event capture is configured (both event_environment and event_bucket must be provided)
dicom_metadata in subject custom info (if available from a prior run)
Short-Circuit Check: If PTID, study date, and modality are all available from custom info, the gear skips opening the DICOM file. This avoids unnecessary file I/O on re-runs.
File Resolution (if needed): If any required data is missing from custom info, the gear opens the input file. If the input is a zip archive, it extracts the first DICOM file to a temporary location.
DICOM Enrichment (if needed): Fills in missing fields (PTID, study date, modality) from the DICOM file and builds visit metadata.
Idempotency Check: If the subject already has a NACCID in metadata, the gear skips the lookup step.
Identifier Lookup: Queries the identifier database with ADCID and PTID to retrieve the NACCID.
Metadata Update: Updates the subject metadata with the NACCID and comprehensive DICOM metadata (unless in dry run mode).
QC Logging: Creates a QC status log file at the project level.
Event Capture: Logs a submission event to S3 for tracking.
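The short-circuit and idempotency checks in the steps above can be sketched as a small decision function. This is an illustration only: the names `RunPlan` and `plan_run` are hypothetical, and the layout of subject custom info (a top-level NACCID field plus a `dicom_metadata` object with `patient_id`, `study_date`, and `modality`) is assumed from the subject metadata example in this document.

```python
from dataclasses import dataclass

# Fields the short-circuit check needs; key names are assumed from the
# dicom_metadata example in this document.
REQUIRED_FIELDS = ("patient_id", "study_date", "modality")


@dataclass(frozen=True)
class RunPlan:
    """Which expensive steps a run still has to perform (hypothetical)."""

    open_dicom: bool
    run_lookup: bool


def plan_run(custom_info: dict, naccid_field_name: str = "naccid") -> RunPlan:
    """Decide whether the DICOM file and the identifier lookup are needed."""
    dicom = custom_info.get("dicom_metadata") or {}
    # Short-circuit check: skip file I/O when every required field is stored.
    have_all = all(dicom.get(key) for key in REQUIRED_FIELDS)
    # Idempotency check: skip the lookup when a NACCID is already present.
    has_naccid = bool(custom_info.get(naccid_field_name))
    return RunPlan(open_dicom=not have_all, run_lookup=not has_naccid)
```

On a first run both flags are true; on a re-run with complete custom info both are false, which matches the behavior described for re-runs.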
The gear produces the following outputs:
The subject’s metadata is updated with the NACCID and DICOM metadata (unless in dry run mode):
{
  "naccid": "NACC123456",
  "dicom_metadata": {
    "patient_id": "110001",
    "study_date": "20240115",
    "modality": "MR",
    "study_instance_uid": "1.2.840.113619.2.1.1.1",
    "series_instance_uid": "1.2.840.113619.2.1.1.2",
    "series_number": "5",
    "manufacturer": "Siemens",
    "manufacturer_model_name": "Skyra",
    "series_description": "T1 MPRAGE",
    "magnetic_field_strength": "3.0",
    "images_in_acquisition": "176"
  }
}
A project-level log file is created with the naming pattern:
{ptid}_{date}_{modality}_qc-status.log

Example: 12345_2024-03-15_MR_qc-status.log
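The naming pattern can be reproduced with a few lines of Python. This sketch assumes the date component comes from the DICOM StudyDate value (formatted YYYYMMDD) and is rewritten with dashes, as in the example; the function name `qc_log_name` is hypothetical.

```python
from datetime import datetime


def qc_log_name(ptid: str, study_date: str, modality: str) -> str:
    """Build a QC status log name from PTID, study date, and modality.

    Assumption: study_date is a DICOM date string (YYYYMMDD). An invalid
    date raises ValueError rather than producing a malformed file name.
    """
    date = datetime.strptime(study_date, "%Y%m%d").strftime("%Y-%m-%d")
    return f"{ptid}_{date}_{modality}_qc-status.log"
```

For instance, `qc_log_name("12345", "20240315", "MR")` yields the example name shown above.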
After processing, the gear updates the input DICOM file with the following metadata. See the QC Conventions reference for details on the data models and conventions used.
- name: "validation"
- state: "PASS" or "FAIL" depending on processing outcome
- data: Error details (from FileErrorList) if any errors occurred

Validation Timestamp: The file's .info metadata is updated with a validated_timestamp field set to the current UTC time. This allows tracking when the file was last processed.
GearTags mechanism:
- gear-PASS or gear-FAIL tag (prefixed with the gear name) based on processing status

Note: Failures in metadata updates are logged but do not fail the gear, as these updates are considered non-critical.
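The file updates described above can be sketched as plain data plus a wrapper that treats failures as non-critical. Everything here is an assumption for illustration: the function names, the dictionary layout, the ISO 8601 timestamp format, and the exact tag spelling are not taken from the gear's code.

```python
import logging
from datetime import datetime, timezone

log = logging.getLogger(__name__)


def build_file_updates(gear_name: str, errors: list, now: datetime = None) -> dict:
    """Assemble the QC result, timestamp, and tag for the input file."""
    now = now or datetime.now(timezone.utc)
    state = "FAIL" if errors else "PASS"
    return {
        "qc": {"name": "validation", "state": state, "data": list(errors)},
        # Assumed format: ISO 8601 string in UTC.
        "info": {"validated_timestamp": now.isoformat()},
        # Assumed spelling: gear name joined to the state with a hyphen.
        "tag": f"{gear_name}-{state}",
    }


def apply_non_critical(update, *args) -> bool:
    """Run a metadata update; log a failure instead of raising it."""
    try:
        update(*args)
        return True
    except Exception as error:  # non-critical by design, so catch broadly
        log.warning("metadata update failed: %s", error)
        return False
```

Returning a boolean from `apply_non_critical` lets the caller record that an update was skipped without changing the gear's exit status.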
When event capture is configured, the gear creates a submission event in the configured S3 bucket. Events include:
Event capture failures do not affect the primary identifier lookup functionality.
The gear implements comprehensive error handling:
event_environment and event_bucket required)

All errors are logged to:
{
  "dry_run": false,
  "database_mode": "prod",
  "event_environment": "prod",
  "event_bucket": "nacc-event-logs"
}

{
  "dry_run": true,
  "database_mode": "dev",
  "event_environment": "dev",
  "event_bucket": "nacc-event-logs-dev"
}
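Configurations like the two examples above could be validated with a small dataclass. This is a sketch, not the gear's actual parsing code: `GearConfig` and `parse_config` are hypothetical names, and only the defaults and required fields come from the configuration list in this document.

```python
from dataclasses import dataclass

ENVIRONMENTS = ("prod", "dev")


@dataclass(frozen=True)
class GearConfig:
    """Gear options with the documented defaults (hypothetical model)."""

    event_environment: str
    event_bucket: str
    dry_run: bool = False
    database_mode: str = "prod"
    naccid_field_name: str = "naccid"
    apikey_path_prefix: str = "/prod/flywheel/gearbot"


def parse_config(raw: dict) -> GearConfig:
    """Validate a raw config mapping and fill in defaults."""
    required = ("event_environment", "event_bucket")
    missing = [key for key in required if not raw.get(key)]
    if missing:
        raise ValueError(f"missing required config: {', '.join(missing)}")
    config = GearConfig(**raw)  # unknown keys raise TypeError
    for field in ("database_mode", "event_environment"):
        value = getattr(config, field)
        if value not in ENVIRONMENTS:
            raise ValueError(f"{field} must be 'prod' or 'dev', got {value!r}")
    return config
```

Rejecting a missing `event_environment` or `event_bucket` up front mirrors the initialization step, which requires both before any lookup runs.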
This gear is typically used in conjunction with:
| Tag | Name | Purpose |
|---|---|---|
| (0010,0020) | PatientID | Patient identifier (PTID fallback) |
| (0008,0020) | StudyDate | Date of the study |
| (0008,0060) | Modality | Imaging modality (MR, CT, PET, etc.) |
| (0020,000D) | StudyInstanceUID | Unique study identifier |
| (0020,000E) | SeriesInstanceUID | Unique series identifier |
| (0020,0011) | SeriesNumber | Series number within study |
| (0008,0021) | SeriesDate | Date of the series |
| (0018,0087) | MagneticFieldStrength | Scanner field strength |
| (0008,0070) | Manufacturer | Scanner manufacturer |
| (0008,1090) | ManufacturerModelName | Scanner model |
| (0008,103E) | SeriesDescription | Series description |
| (0020,1002) | ImagesInAcquisition | Number of images |
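The tag names in the table map naturally onto the snake_case keys shown in the subject metadata example. The sketch below illustrates that mapping under stated assumptions: the helper names are hypothetical, the input is a plain dictionary standing in for values already read from the DICOM file, and converting every value to a string follows the example output rather than confirmed gear behavior.

```python
import re

# DICOM keywords from the table above.
DICOM_KEYWORDS = (
    "PatientID", "StudyDate", "Modality", "StudyInstanceUID",
    "SeriesInstanceUID", "SeriesNumber", "SeriesDate",
    "MagneticFieldStrength", "Manufacturer", "ManufacturerModelName",
    "SeriesDescription", "ImagesInAcquisition",
)


def to_snake_case(keyword: str) -> str:
    """Convert a DICOM keyword such as StudyInstanceUID to snake_case."""
    # Insert an underscore wherever a lowercase letter meets an uppercase one.
    return re.sub(r"(?<=[a-z])(?=[A-Z])", "_", keyword).lower()


def build_dicom_metadata(values: dict) -> dict:
    """Keep known keywords that have a value, keyed in snake_case."""
    return {
        to_snake_case(keyword): str(values[keyword])
        for keyword in DICOM_KEYWORDS
        if values.get(keyword) not in (None, "")
    }
```

Tags that are absent or empty are simply omitted, so the resulting object only contains fields the file actually provided.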
- pydicom: DICOM file parsing
- flywheel-sdk: Flywheel platform integration
- fw-gear: Gear development framework
- boto3: AWS S3 and SSM integration
- pydantic: Data validation for LookupContext model

The gear follows a clean architecture pattern:

- run.py: Gear interface, Flywheel context management, file resolution, and DICOM extraction
- main.py: ImageIdentifierLookup class orchestrating the lookup workflow
- processor.py: Business logic for identifier lookup and subject metadata updates
- extraction.py: LookupContext model for accumulating workflow data, DICOM metadata extraction
- file_resolver.py: Zip archive detection and DICOM file extraction
- dicom_utils.py: Low-level DICOM tag reading via pydicom
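To make the file resolution step concrete, here is a standard-library sketch of zip detection and first-DICOM extraction. It is not the contents of file_resolver.py: the function names are hypothetical, and the detection rule (an accepted extension plus the "DICM" marker that DICOM Part 10 files carry after a 128-byte preamble) is an assumption chosen so that extensionless files can be recognized.

```python
import os
import zipfile

# Extensions accepted for DICOM members; "" covers extensionless files.
DICOM_EXTENSIONS = {".dcm", ".dicom", ""}


def looks_like_dicom(name: str, head: bytes) -> bool:
    """Check the extension and the DICM marker at byte offset 128."""
    extension = os.path.splitext(name)[1].lower()
    return extension in DICOM_EXTENSIONS and head[128:132] == b"DICM"


def resolve_dicom(path: str, workdir: str) -> str:
    """Return a path to a DICOM file, extracting from a zip if needed.

    Non-zip inputs are returned unchanged. For zip inputs, the first
    member that looks like DICOM is extracted into workdir.
    """
    if not zipfile.is_zipfile(path):
        return path
    with zipfile.ZipFile(path) as archive:
        for member in archive.infolist():
            if member.is_dir():
                continue
            with archive.open(member) as handle:
                head = handle.read(132)  # preamble plus marker
            if looks_like_dicom(member.filename, head):
                return archive.extract(member, workdir)
    raise ValueError(f"no DICOM file found in {path}")
```

A caller would typically pass a directory created with `tempfile.mkdtemp()` as `workdir` and remove it once metadata extraction is done.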