The identifier-lookup gear reads participant IDs from a tabular (CSV) input file and performs identifier lookups in two directions:
- NACCID lookup (direction: "nacc"): Looks up NACCIDs from center identifiers (adcid + ptid)
- Center lookup (direction: "center"): Looks up center identifiers (adcid + ptid) from NACCIDs

The gear outputs a CSV file with the looked-up identifiers appended. If any rows fail lookup, an error file is produced listing the failures.
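As a rough illustration of the "nacc" direction, the sketch below uses a hypothetical in-memory table in place of the real identifier database. The function name `lookup_naccids`, the `KNOWN_IDS` table, and the sample IDs are invented for this example and are not the gear's actual API.

```python
import csv
import io

# Hypothetical stand-in for the identifier database: (adcid, ptid) -> naccid.
KNOWN_IDS = {("42", "P001"): "NACC000001"}


def lookup_naccids(csv_text, known_ids):
    """Split input rows into rows with a NACCID appended and lookup failures."""
    found, errors = [], []
    reader = csv.DictReader(io.StringIO(csv_text))
    # The header is line 1, so data rows start at line 2.
    for line_num, row in enumerate(reader, start=2):
        naccid = known_ids.get((row["adcid"], row["ptid"]))
        if naccid is None:
            errors.append({"line": line_num, "ptid": row["ptid"]})
        else:
            found.append({**row, "naccid": naccid})
    return found, errors
```

Calling `lookup_naccids("adcid,ptid\n42,P001\n42,P999\n", KNOWN_IDS)` would place the first row in the output list and the second in the error list, mirroring the output CSV and error file described above.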
This gear uses the AWS SSM parameter store, and expects that AWS credentials are available in environment variables within the Flywheel runtime.
The variables used are AWS_SECRET_ACCESS_KEY, AWS_ACCESS_KEY_ID, AWS_DEFAULT_REGION.
The gear needs to be added to the allow list for these variables to be shared.
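A quick way to confirm that the three variables are visible to a process is a check like the following. This is a minimal sketch; `missing_aws_vars` is a hypothetical helper and not part of the gear.

```python
import os

REQUIRED_AWS_VARS = (
    "AWS_SECRET_ACCESS_KEY",
    "AWS_ACCESS_KEY_ID",
    "AWS_DEFAULT_REGION",
)


def missing_aws_vars(environ=None):
    """Return the names of required AWS variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]
```

An empty list from `missing_aws_vars()` suggests the variables were shared with the gear; any names returned point to a missing allow-list entry or credential.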
Gear configs are defined in manifest.json.
The gear supports three main usage scenarios:
Use case: Process form data submissions from a single center with full validation and QC logging.
Configuration:
{
  "direction": "nacc",
  "single_center": true,
  "module": "uds",
  "event_environment": "prod",
  "event_bucket": "nacc-event-logs"
}
Inputs:
- input_file: CSV file with adcid and ptid columns
- form_configs_file: JSON file with module configurations (required)

Behavior:
- All rows must have the same adcid, matching the project’s pipeline ADCID
- Output includes naccid and module columns

Event Capture:
When event capture is configured (by providing event_environment and event_bucket), the gear will create submission events for each valid visit row. These events are captured to an S3 bucket for tracking data submissions in the NACC event system.
Event capture behavior:
Event capture configuration parameters:
- event_environment (string, required for event capture): Environment for event capture. Valid values are “prod” or “dev”. This determines the environment prefix used when storing events in S3.
- event_bucket (string, required for event capture): S3 bucket name where submission events will be stored. The gear must have write access to this bucket.

When to use: Standard form data processing pipelines where data comes from a single center.
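The two event capture parameters could be checked along these lines. This is a sketch only: `event_capture_settings` is a hypothetical helper, and treating a half-configured pair as an error is an assumption, not documented gear behavior.

```python
VALID_EVENT_ENVIRONMENTS = {"prod", "dev"}


def event_capture_settings(config):
    """Return (environment, bucket) if event capture is configured, else None."""
    environment = config.get("event_environment")
    bucket = config.get("event_bucket")
    if not environment and not bucket:
        return None  # event capture was not requested
    if not environment or not bucket:
        # Assumption: both parameters must be provided together.
        raise ValueError("event_environment and event_bucket must both be set")
    if environment not in VALID_EVENT_ENVIRONMENTS:
        raise ValueError(f"event_environment must be 'prod' or 'dev', got {environment!r}")
    return environment, bucket
```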
Use case: Look up NACCIDs for data from multiple centers without center-specific validation.
Configuration:
{
  "direction": "nacc",
  "single_center": false
}
Inputs:
- input_file: CSV file with adcid and ptid columns (can have different ADCIDs per row)
- form_configs_file: Optional - if provided, enables field validation without center validation

Behavior:
- Rows may have different adcid values
- If form_configs_file is provided, module field validation is still performed
- Output includes a naccid column (and module if configs provided)

When to use: Processing aggregated data from multiple centers, or when you need identifier lookup without strict center validation.
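The difference between the single-center and multi-center scenarios can be sketched as a per-row ADCID check that is skipped when single_center is false. The helper below is hypothetical; `pipeline_adcid` and `has_form_configs` are illustrative parameter names, and the gear's real validation may differ.

```python
def rows_failing_center_check(rows, pipeline_adcid, single_center=True, has_form_configs=True):
    """Return the indexes of rows whose adcid differs from the pipeline ADCID.

    The check only applies when single_center is true and a form configs
    file was provided; otherwise no row is flagged.
    """
    if not (single_center and has_form_configs):
        return []
    return [
        index
        for index, row in enumerate(rows)
        if str(row.get("adcid")) != str(pipeline_adcid)
    ]
```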
Use case: Look up center identifiers from NACCIDs.
Configuration:
{
  "direction": "center"
}
Inputs:
- input_file: CSV file with naccid column

Behavior:
- Looks up adcid and ptid for each naccid
- Output includes adcid and ptid columns

When to use: When you have NACCIDs and need to determine which center and participant they correspond to.
- direction (string, default: “nacc”): Direction of identifier mapping
  - "nacc": Look up NACCIDs from center identifiers (adcid + ptid)
  - "center": Look up center identifiers from NACCIDs
- single_center (boolean, default: true): Whether to enforce single-center validation
  - true: All rows must have the same adcid matching the project’s pipeline ADCID (only applies when form_configs_file is provided)
  - false: Allows rows with different adcid values, disables center validation
- module (string, optional): Module name for form processing (e.g., “uds”, “lbd”)
  - Can be inferred from the input filename suffix (e.g., data-uds.csv → module “UDS”)
  - Required when form_configs_file is provided, unless the filename has a module suffix
- preserve_case (boolean, default: true): Whether to preserve the case of header keys in the input file
- database_mode (string, default: “prod”): Whether to look up identifiers from the “dev” or “prod” database

The input is a single CSV file. It must have adcid and ptid columns when the direction is nacc, or a naccid column when the direction is center.
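The parameters and defaults listed above can be collected into a small configuration object. This is an illustrative sketch: `LookupConfig` and `required_columns` are hypothetical names, and the authoritative definitions live in manifest.json.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LookupConfig:
    """Hypothetical container mirroring the documented options and defaults."""

    direction: str = "nacc"
    single_center: bool = True
    module: Optional[str] = None
    preserve_case: bool = True
    database_mode: str = "prod"

    def __post_init__(self):
        if self.direction not in ("nacc", "center"):
            raise ValueError(f"unknown direction: {self.direction!r}")
        if self.database_mode not in ("dev", "prod"):
            raise ValueError(f"unknown database_mode: {self.database_mode!r}")

    def required_columns(self):
        """Input columns needed for the configured lookup direction."""
        return ("adcid", "ptid") if self.direction == "nacc" else ("naccid",)
```

For example, `LookupConfig(direction="center").required_columns()` yields the naccid column needed for a center lookup.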
The gear has two output files.
- A CSV file containing the input rows with an additional naccid column if the direction is nacc, or additional adcid and ptid columns if the direction is center
  - Unless preserve_case is set to true, all header keys will also be forced to lower case and spaces replaced with _
- An error file listing any rows that failed lookup

Note: Event capture, when enabled, does not produce additional output files. Events are captured directly to the configured S3 bucket and do not affect the standard CSV output files described above.
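The header handling tied to preserve_case can be sketched as follows, assuming that normalization (lower-casing and replacing spaces with underscores) applies only when preserve_case is false. `normalize_headers` is a hypothetical helper, not the gear's own function.

```python
def normalize_headers(headers, preserve_case=True):
    """Lower-case headers and replace spaces with underscores unless preserving case."""
    if preserve_case:
        return list(headers)
    return [header.lower().replace(" ", "_") for header in headers]
```

With this sketch, `normalize_headers(["ADCID", "Visit Date"], preserve_case=False)` gives `["adcid", "visit_date"]`, while the default leaves the headers unchanged.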