flywheel-gear-extensions

Identifier Lookup

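The identifier-lookup gear reads participant IDs from a tabular (CSV) input file and performs identifier lookups in one of two directions, selected by the direction setting: from center identifiers (ADCID and PTID) to NACCID ("nacc"), or from NACCID to center identifiers ("center").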

The gear outputs a CSV file with the looked-up identifiers appended. If any rows fail lookup, an error file is produced listing the failures.
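The split between looked-up rows and failed rows can be illustrated with a minimal sketch. It is not the gear's implementation: the in-memory `naccid_table` dictionary stands in for the real identifier service, and the function name `lookup_rows`, the appended `naccid` column name, and the shape of the failure records are all assumptions made for illustration.

```python
import csv
import io


def lookup_rows(csv_text, naccid_table):
    """Split CSV rows into (found, failed) using a stand-in lookup table.

    naccid_table is a hypothetical dict mapping (adcid, ptid) -> NACCID.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    found, failed = [], []
    # Line 1 is the header, so data rows start at line 2.
    for line_number, row in enumerate(reader, start=2):
        naccid = naccid_table.get((row["adcid"], row["ptid"]))
        if naccid is None:
            # Hypothetical failure record; the gear's error file format may differ.
            failed.append({"line": line_number, "adcid": row["adcid"], "ptid": row["ptid"]})
        else:
            # Append the looked-up identifier to the original row.
            found.append({**row, "naccid": naccid})
    return found, failed
```

Rows in `found` correspond to the output CSV, and a non-empty `failed` list corresponds to the case where an error file would be produced.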

Environment

This gear uses the AWS SSM Parameter Store and expects AWS credentials to be available as environment variables within the Flywheel runtime. The variables used are AWS_SECRET_ACCESS_KEY, AWS_ACCESS_KEY_ID, and AWS_DEFAULT_REGION. The gear must be added to the allow list for these variables to be shared with it.
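A quick way to confirm that the three variables are visible to a process is a check like the following. This is a sketch for troubleshooting only; the helper name `missing_aws_variables` is hypothetical and is not part of the gear.

```python
import os

# The three variables named in this section.
REQUIRED_AWS_VARS = (
    "AWS_SECRET_ACCESS_KEY",
    "AWS_ACCESS_KEY_ID",
    "AWS_DEFAULT_REGION",
)


def missing_aws_variables(environ=None):
    """Return the required AWS variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]
```

An empty list means all three variables are set. A non-empty list may indicate that the gear has not yet been added to the allow list.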

Configuration

Gear configs are defined in manifest.json.

Supported Scenarios

The gear supports three main usage scenarios:

Scenario 1: Single Center Form Submission (Default)

Use case: Process form data submissions from a single center with full validation and QC logging.

Configuration:

{
  "direction": "nacc",
  "single_center": true,
  "module": "uds",
  "event_environment": "prod",
  "event_bucket": "nacc-event-logs"
}

Inputs:

Behavior:

Event Capture:

In single center mode with form_configs_file, the gear captures submission events for each valid visit row. These events are stored in an S3 bucket for tracking data submissions in the NACC event system. The event_environment and event_bucket parameters are required in this scenario.
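The requirement on event_environment and event_bucket can be expressed as a small pre-flight check on the configuration. The sketch below is an illustration, not the gear's own validation: the function name `missing_event_settings` is hypothetical, defaults from manifest.json are not applied (a missing single_center is treated as false here), and the form_configs_file input is not checked.

```python
def missing_event_settings(config):
    """Return event-capture keys missing from a single-center "nacc" config.

    Returns an empty list when the config is not in single-center "nacc"
    mode, since the event settings are only required in that scenario.
    """
    if config.get("direction") != "nacc" or not config.get("single_center"):
        return []
    required = ("event_environment", "event_bucket")
    return [key for key in required if not config.get(key)]
```

Applied to the Scenario 1 configuration shown above, the function returns an empty list. Removing event_bucket from that configuration would return a list containing only "event_bucket".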

Event capture behavior:

When to use: Standard form data processing pipelines where data comes from a single center.

Scenario 2: Multi-Center Identifier Lookup

Use case: Look up NACCIDs for data from multiple centers without center-specific validation.

Configuration:

{
  "direction": "nacc",
  "single_center": false
}

Inputs:

Behavior:

When to use: Processing aggregated data from multiple centers, or when you need identifier lookup without strict center validation.

Scenario 3: Reverse Lookup (NACCID to Center IDs)

Use case: Look up center identifiers from NACCIDs.

Configuration:

{
  "direction": "center"
}

Inputs:

Behavior:

When to use: When you have NACCIDs and need to determine which center and participant they correspond to.

Configuration Parameters

Input

The input is a single CSV file, which must have columns adcid and ptid.
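Before submitting a file, the header can be checked for the required columns. The following is a minimal sketch using only the standard library; the helper name `missing_columns` is hypothetical, and the check is case-sensitive, which may or may not match the gear's own behavior.

```python
import csv
import io

# Columns this section says the input must contain.
REQUIRED_COLUMNS = {"adcid", "ptid"}


def missing_columns(csv_text):
    """Return the required column names absent from the CSV header, sorted."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # fieldnames is None for an empty file, so fall back to an empty list.
    header = set(reader.fieldnames or [])
    return sorted(REQUIRED_COLUMNS - header)
```

Additional columns in the file are ignored by this check; only the absence of adcid or ptid is reported.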

Output

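The gear produces up to two output files: a CSV file with the looked-up identifiers appended and, if any rows fail lookup, an error file listing the failures.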

Note: Event capture, when enabled, does not produce additional output files. Events are captured directly to the configured S3 bucket and do not affect the standard CSV output files described above.

File Metadata and Tagging

After processing, the gear updates the input CSV file with the following metadata. See the QC Conventions reference for details on the data models and conventions used.

  1. QC Result: A validation QC result is added to the file’s file.info.qc metadata with:
    • name: "validation"
    • state: "PASS" or "FAIL" depending on whether all identifier lookups succeeded
    • data: List of FileError objects with error details for any rows where identifiers were not found
  2. File Tag: The gear name (e.g., "identifier-lookup") is added as a simple tag to the input file, indicating the file has been processed by this gear.
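The shape of the validation QC result can be sketched as a plain dictionary. This is an illustration under assumptions: the function name `build_validation_qc` is hypothetical, the error entries are shown as simple dictionaries rather than actual FileError objects (whose fields are not described here), and the exact nesting under file.info.qc is defined by the QC Conventions reference, not by this example.

```python
def build_validation_qc(errors):
    """Build a QC result dict with the three fields listed above.

    errors is a list of per-row error records (stand-ins for FileError
    objects); an empty list means every identifier lookup succeeded.
    """
    return {
        "name": "validation",
        "state": "PASS" if not errors else "FAIL",
        "data": list(errors),
    }
```

A file whose rows all resolve yields state "PASS" with an empty data list, while any failed lookup yields "FAIL" with one entry per failed row.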