The identifier-lookup gear reads participant IDs from a tabular (CSV) input file and performs identifier lookups in two directions:
- NACCID lookup (direction: "nacc"): Looks up NACCIDs from center identifiers (adcid + ptid)
- Center lookup (direction: "center"): Looks up center identifiers (adcid + ptid) from NACCIDs
The gear outputs a CSV file with the looked-up identifiers appended. If any rows fail lookup, an error file is produced listing the failures.
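The two directions can be pictured with a minimal Python sketch. It is not the gear's implementation: the in-memory `KNOWN` table stands in for the real identifier database, and `lookup_rows` and the sample identifiers are hypothetical names used only for illustration.

```python
import csv
import io

# Hypothetical stand-in for the identifier database: (adcid, ptid) -> naccid.
KNOWN = {("0", "P001"): "NACC000001", ("0", "P002"): "NACC000002"}


def lookup_rows(csv_text, direction="nacc"):
    """Split input rows into (rows with identifiers appended, rows that failed)."""
    # Reverse index used for the "center" direction: naccid -> (adcid, ptid).
    by_naccid = {naccid: key for key, naccid in KNOWN.items()}
    found, failed = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if direction == "nacc":
            naccid = KNOWN.get((row["adcid"], row["ptid"]))
            extra = {"naccid": naccid} if naccid else None
        else:
            center = by_naccid.get(row["naccid"])
            extra = {"adcid": center[0], "ptid": center[1]} if center else None
        if extra is None:
            failed.append(row)  # would be reported in the error file
        else:
            found.append({**row, **extra})  # original columns plus looked-up ones
    return found, failed
```

Calling `lookup_rows("adcid,ptid\n0,P001\n0,P999\n")` puts the first row in `found` with a `naccid` column appended and the second row in `failed`, mirroring the split between the identifiers file and the error file.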
This gear uses the AWS SSM parameter store, and expects that AWS credentials are available in environment variables within the Flywheel runtime.
The variables used are AWS_SECRET_ACCESS_KEY, AWS_ACCESS_KEY_ID, AWS_DEFAULT_REGION.
The gear needs to be added to the allow list for these variables to be shared.
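When debugging a run, it can help to confirm that these variables are actually visible to the process. The following is a small sketch, not part of the gear; `missing_aws_vars` is a hypothetical helper name.

```python
import os

# The three variables named above.
REQUIRED_AWS_VARS = ("AWS_SECRET_ACCESS_KEY", "AWS_ACCESS_KEY_ID", "AWS_DEFAULT_REGION")


def missing_aws_vars(environ=None):
    """Return the names of required AWS variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_aws_vars()
    if missing:
        print("Missing AWS variables:", ", ".join(missing))
```

An empty result means all three variables are set; it does not verify that the credentials are valid.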
Gear configs are defined in manifest.json.
The gear supports three main usage scenarios:
Use case: Process form data submissions from a single center with full validation and QC logging.
Configuration:
{
  "direction": "nacc",
  "single_center": true,
  "module": "uds",
  "event_environment": "prod",
  "event_bucket": "nacc-event-logs"
}
Inputs:
- input_file: CSV file with adcid and ptid columns
- form_configs_file: JSON file with module configurations (required)
Behavior:
- All rows must have the same adcid, matching the project’s pipeline ADCID
- The output CSV includes naccid and module columns
Event Capture:
In single center mode with form_configs_file, the gear captures submission events for each valid visit row. These events are stored in an S3 bucket for tracking data submissions in the NACC event system. The event_environment and event_bucket parameters are required in this scenario.
Event capture behavior:
When to use: Standard form data processing pipelines where data comes from a single center.
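The requirement on event_environment and event_bucket in this scenario can be expressed as a pre-flight check. This is a sketch under assumptions: `config_problems` is a hypothetical helper, the config is taken to be a plain dict with the keys shown in the configuration above, and the gear's own validation may differ.

```python
def config_problems(config, has_form_configs):
    """List problems with a gear config dict for the event capture scenario."""
    problems = []
    direction = config.get("direction", "nacc")          # documented default
    single_center = config.get("single_center", True)    # documented default
    # Event capture applies to nacc direction, single center, with form configs.
    if direction == "nacc" and single_center and has_form_configs:
        if config.get("event_environment") not in ("prod", "dev"):
            problems.append("event_environment must be 'prod' or 'dev'")
        if not config.get("event_bucket"):
            problems.append("event_bucket is required")
    return problems
```

For the configuration shown in this scenario the function returns an empty list; dropping event_bucket from that configuration yields one problem.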
Use case: Look up NACCIDs for data from multiple centers without center-specific validation.
Configuration:
{
  "direction": "nacc",
  "single_center": false
}
Inputs:
- input_file: CSV file with adcid and ptid columns (can have different ADCIDs per row)
- form_configs_file: Optional. If provided, enables field validation without center validation
Behavior:
- Rows may have different adcid values
- If form_configs_file is provided, module field validation is still performed
- The output CSV includes a naccid column (and module if configs are provided)
When to use: Processing aggregated data from multiple centers, or when you need identifier lookup without strict center validation.
Use case: Look up center identifiers from NACCIDs.
Configuration:
{
  "direction": "center"
}
Inputs:
- input_file: CSV file with naccid column
Behavior:
- Looks up adcid and ptid for each naccid
- The output CSV includes adcid and ptid columns
When to use: When you have NACCIDs and need to determine which center and participant they correspond to.
direction (string, default: “nacc”): Direction of identifier mapping
- "nacc": Look up NACCIDs from center identifiers (adcid + ptid)
- "center": Look up center identifiers from NACCIDs
single_center (boolean, default: true): Whether to enforce single-center validation
- true: All rows must have the same adcid matching the project’s pipeline ADCID (only applies when form_configs_file is provided). Enables QC logging and event capture.
- false: Allows rows with different adcid values; disables center validation, QC logging, and event capture
module (string, optional): Module name for form processing (e.g., “uds”, “lbd”)
- Can be taken from the input filename suffix (e.g., data-uds.csv → module “UDS”)
- Required with form_configs_file unless the filename has a module suffix
preserve_case (boolean, default: true): Whether to preserve the case of header keys in the input file
database_mode (string, default: “prod”): Whether to look up identifiers from the “dev” or “prod” database
event_environment (string, optional): Environment for visit event capture. Valid values are “prod” or “dev”. Required when using nacc direction with form_configs_file in single center mode.
event_bucket (string, optional): S3 bucket name for event capture. Required when using nacc direction with form_configs_file in single center mode. The gear must have write access to this bucket.
dry_run (boolean, default: false): Whether to do a dry run
admin_group (string, default: “nacc”): Name of the admin group
apikey_path_prefix (string, default: “/prod/flywheel/gearbot”): The instance-specific AWS parameter gearbot path prefix
The input is a single CSV file, which must have columns adcid and ptid for the nacc direction, or a naccid column for the center direction.
The gear produces up to two output files.
{input_basename}_identifiers.{ext} — a CSV file consisting of the rows of the input file for which an identifier was found, with an additional naccid column if the direction is nacc or additional adcid and ptid columns if the direction is center. For example, an input file named data-uds.csv produces data-uds_identifiers.csv. This file is only written if at least one row has a successful lookup.
Unless preserve_case is set to true, all header keys are also forced to lower case and spaces are replaced with _.
Note: Event capture, when enabled, does not produce additional output files. Events are captured directly to the configured S3 bucket and do not affect the standard CSV output files described above.
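The output naming pattern and the header-key transformation can be sketched in Python as follows. Both function names are hypothetical and this is not the gear's code; it only illustrates the `{input_basename}_identifiers.{ext}` pattern and the lower-casing with spaces replaced by `_`.

```python
from pathlib import Path


def identifiers_filename(input_name):
    """Build the identifiers file name from the input name, keeping the extension."""
    path = Path(input_name)
    return f"{path.stem}_identifiers{path.suffix}"


def normalize_header(key):
    """Lower-case a header key and replace spaces with underscores."""
    return key.lower().replace(" ", "_")


print(identifiers_filename("data-uds.csv"))  # data-uds_identifiers.csv
print(normalize_header("Visit Date"))        # visit_date
```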
After processing, the gear updates the input CSV file with the following metadata. See the QC Conventions reference for details on the data models and conventions used.
The gear writes file.info.qc metadata with:
- name: "validation"
- state: "PASS" or "FAIL" depending on whether all identifier lookups succeeded
- data: List of FileError objects with error details for any rows where identifiers were not found
The gear name ("identifier-lookup") is added as a simple tag to the input file, indicating the file has been processed by this gear.
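As a rough illustration of the three fields, the sketch below builds such an entry as a plain dict. It makes several assumptions: `qc_validation_entry` is a hypothetical helper, the errors are represented as plain dicts rather than actual FileError objects (the fields shown in the usage comment are invented for the example), and the exact nesting under file.info.qc is defined by the QC Conventions reference, not by this code.

```python
def qc_validation_entry(errors):
    """Build a validation entry; state is PASS only when there are no errors."""
    errors = list(errors)
    return {
        "name": "validation",
        "state": "PASS" if not errors else "FAIL",
        "data": errors,  # one item per row whose identifier was not found
    }


# Hypothetical error record for a row that failed lookup.
entry = qc_validation_entry([{"line": 3, "message": "NACCID not found"}])
print(entry["state"])  # FAIL
```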