flywheel-gear-extensions

Identifier Lookup

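The identifier-lookup gear reads participant IDs from a tabular (CSV) input file and performs identifier lookups in one of two directions, selected by the direction config value: "nacc" looks up NACCIDs from center identifiers (ADCID and PTID), and "center" looks up center identifiers from NACCIDs.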

The gear outputs a CSV file with the looked-up identifiers appended. If any rows fail lookup, an error file is produced listing the failures.
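The flow described above can be illustrated with a minimal sketch. It is not the gear's implementation: the function name lookup_naccids, the in-memory id_table standing in for the real identifier service, the naccid output column name, and the shape of the failure records are all assumptions made for illustration.

```python
import csv
import io


def lookup_naccids(csv_text, id_table):
    """Split input rows into successful lookups and failures.

    id_table is a hypothetical dict mapping (adcid, ptid) -> naccid;
    the real gear queries an identifier service instead.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    found, failures = [], []
    # start=2 because line 1 of the file is the header row
    for line_num, row in enumerate(reader, start=2):
        naccid = id_table.get((row["adcid"], row["ptid"]))
        if naccid is None:
            # Hypothetical failure record; the real error file format may differ
            failures.append(
                {"line": line_num, "adcid": row["adcid"], "ptid": row["ptid"]}
            )
        else:
            # Append the looked-up identifier to the original row
            found.append({**row, "naccid": naccid})
    return found, failures
```

In this sketch, rows that resolve go to the output list with the identifier appended, and rows that do not resolve are collected separately so they can be written to an error file.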

Environment

This gear uses the AWS SSM parameter store, and expects AWS credentials to be available as environment variables within the Flywheel runtime. The variables used are AWS_SECRET_ACCESS_KEY, AWS_ACCESS_KEY_ID, and AWS_DEFAULT_REGION. The gear must be added to the allow list before these variables are shared with it.
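When debugging a gear run, it can help to confirm that the three variables actually reached the runtime. The following is a small sketch of such a check; the helper name missing_aws_vars is hypothetical and is not part of the gear.

```python
import os

# Variable names taken from the Environment section above
REQUIRED_AWS_VARS = (
    "AWS_SECRET_ACCESS_KEY",
    "AWS_ACCESS_KEY_ID",
    "AWS_DEFAULT_REGION",
)


def missing_aws_vars(environ=None):
    """Return the required AWS variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]
```

An empty list means all three variables are present. A non-empty list usually suggests the gear has not been added to the allow list.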

Configuration

Gear configs are defined in manifest.json.

Supported Scenarios

The gear supports three main usage scenarios:

Scenario 1: Single Center Form Submission (Default)

Use case: Process form data submissions from a single center with full validation and QC logging.

Configuration:

{
  "direction": "nacc",
  "single_center": true,
  "module": "uds",
  "event_environment": "prod",
  "event_bucket": "nacc-event-logs"
}

Inputs:

Behavior:

Event Capture:

When event capture is configured (by providing event_environment and event_bucket), the gear will create submission events for each valid visit row. These events are captured to an S3 bucket for tracking data submissions in the NACC event system.
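The enabling condition can be expressed as a simple check on the gear config. This sketch assumes that both parameters must be non-empty for event capture to be active; the function name event_capture_enabled is hypothetical, and the gear's actual logic may differ.

```python
def event_capture_enabled(config):
    """Return True when both event capture parameters are provided.

    Assumes config is the gear's configuration as a plain dict.
    """
    return bool(config.get("event_environment")) and bool(
        config.get("event_bucket")
    )
```

With the Scenario 1 configuration shown above this returns True. With a configuration that omits either parameter it returns False.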

Event capture behavior:

Event capture configuration parameters:

When to use: Standard form data processing pipelines where data comes from a single center.

Scenario 2: Multi-Center Identifier Lookup

Use case: Look up NACCIDs for data from multiple centers without center-specific validation.

Configuration:

{
  "direction": "nacc",
  "single_center": false
}

Inputs:

Behavior:

When to use: Processing aggregated data from multiple centers, or when you need identifier lookup without strict center validation.

Scenario 3: Reverse Lookup (NACCID to Center IDs)

Use case: Look up center identifiers from NACCIDs.

Configuration:

{
  "direction": "center"
}

Inputs:

Behavior:

When to use: When you have NACCIDs and need to determine which center and participant they correspond to.

Configuration Parameters

Input

The input is a single CSV file, which must have columns adcid and ptid.
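A file can be checked for the required columns before it is submitted to the gear. The sketch below uses Python's standard csv module; the helper name missing_columns is hypothetical and is not part of the gear.

```python
import csv
import io

# Column names taken from the Input section above
REQUIRED_COLUMNS = {"adcid", "ptid"}


def missing_columns(csv_text):
    """Return the required columns absent from the CSV header, sorted."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # fieldnames is None when the input has no header row at all
    header = set(reader.fieldnames or [])
    return sorted(REQUIRED_COLUMNS - header)
```

Extra columns are ignored by this check, so a file that carries additional form fields alongside adcid and ptid still passes.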

Output

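The gear has two output files: a CSV file with the looked-up identifiers appended, and, if any rows fail lookup, an error file listing the failures.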

Note: Event capture, when enabled, does not produce additional output files. Events are captured directly to the configured S3 bucket and do not affect the standard CSV output files described above.