Scrapes existing QC status log files from a Flywheel project to generate historical submission and pass-qc events. This gear is designed to backfill event data for visits that were processed before event capture was implemented.
The transactional event scraper reconstructs the event history for form submissions by analyzing QC status log files that were created during the form processing pipeline. This allows NACC to:
- QC status log files (`*-qc-status-log.json`)
- `start_date` and `end_date` parameters

A submit event is created for every valid QC status log file found. These events record the initial submission of form data.
Event structure:
- `submit`
- `transactional-event-scraper`

A pass-qc event is created only for log files where the QC status is "pass". These events record successful completion of QC validation.
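
The rule described above (one submit event per valid log, plus a pass-qc event only when the QC status is "pass") can be sketched as follows. This is a minimal illustration, not the gear's actual code: the `Event` class, its field names, and the `events_for_log` helper are hypothetical, and the case-insensitive status comparison is an assumption.

```python
from dataclasses import dataclass

GEAR_NAME = "transactional-event-scraper"


@dataclass(frozen=True)
class Event:
    # Field names are hypothetical; the real event schema may differ.
    action: str       # "submit" or "pass-qc"
    gear_name: str    # name of the gear that produced the event
    source_file: str  # QC status log file the event was derived from


def events_for_log(file_name: str, qc_status: str) -> list[Event]:
    """Return the events implied by a single QC status log file."""
    # Every valid QC status log yields a submit event.
    events = [Event("submit", GEAR_NAME, file_name)]
    # Only logs whose QC status is "pass" also yield a pass-qc event.
    # Assumption: the status is compared case-insensitively.
    if qc_status.strip().lower() == "pass":
        events.append(Event("pass-qc", GEAR_NAME, file_name))
    return events
```

A log with any status other than "pass" therefore contributes a single submit event, while a passing log contributes two events.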
Event structure:
- `pass-qc`
- `transactional-event-scraper`

This gear takes the following configuration parameters:
| Parameter | Default | Description |
|---|---|---|
| `dry_run` | `false` | Whether to perform a dry run without capturing events. When enabled, the gear processes files and logs what would be done, but does not write events to S3. |
| `event_bucket` | `"submission-events"` | S3 bucket name for event storage. The gear must have write access to this bucket. |
| `event_environment` | `"prod"` | Environment prefix used when storing events in S3. Valid values are `"prod"` and `"dev"`. |
| `start_date` | (optional) | Start date for filtering files, in YYYY-MM-DD format. Only files created on or after this date are processed. |
| `end_date` | (optional) | End date for filtering files, in YYYY-MM-DD format. Only files created on or before this date are processed. |
| `apikey_path_prefix` | `"/prod/flywheel/gearbot"` | The instance-specific AWS parameter path prefix for the API key. |
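
The file selection implied by `start_date` and `end_date` can be illustrated with a small sketch. It assumes that files are matched by name against `*-qc-status-log.json` and that the date bounds are compared against the calendar day of the file's creation timestamp; the `in_scope` helper is hypothetical, and how the gear handles time zones is not specified here.

```python
from datetime import date, datetime
from fnmatch import fnmatchcase
from typing import Optional

LOG_PATTERN = "*-qc-status-log.json"


def in_scope(name: str, created: datetime,
             start_date: Optional[str] = None,
             end_date: Optional[str] = None) -> bool:
    """Decide whether a file should be processed (hypothetical helper)."""
    # Only QC status log files are considered at all.
    if not fnmatchcase(name, LOG_PATTERN):
        return False
    created_day = created.date()
    # Both bounds are inclusive: "on or after" and "on or before".
    if start_date is not None and created_day < date.fromisoformat(start_date):
        return False
    if end_date is not None and created_day > date.fromisoformat(end_date):
        return False
    return True
```

Omitting both dates leaves every QC status log file in scope, which corresponds to a full backfill.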
This gear requires only the Flywheel API key input:
| Input | Description |
|---|---|
| `api-key` | Flywheel API key for accessing the project and its files. |
This gear uses AWS S3 for event storage and the AWS SSM parameter store for API key retrieval. It expects that AWS credentials are available in environment variables within the Flywheel runtime:
- `AWS_SECRET_ACCESS_KEY`
- `AWS_ACCESS_KEY_ID`
- `AWS_DEFAULT_REGION`

The gear needs to be added to the allow list for these variables to be shared.
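
When debugging a run, it can help to confirm that these variables actually reached the gear's environment. The following is a minimal sketch of such a check; the `missing_aws_vars` helper is hypothetical and is not part of the gear.

```python
import os
from typing import Mapping

REQUIRED_AWS_VARS = (
    "AWS_SECRET_ACCESS_KEY",
    "AWS_ACCESS_KEY_ID",
    "AWS_DEFAULT_REGION",
)


def missing_aws_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required AWS variables that are unset or empty."""
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]
```

An empty result means all three variables are present; a non-empty result suggests the gear may not be on the allow list.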
To scrape all QC status log files in a project:
- Set `event_bucket` and `event_environment` as needed

To preview what would be processed without capturing events:
- Set `dry_run` to `true`

To process only files within a specific date range:
- Set `start_date` to the beginning of your desired range (e.g., "2024-01-01")
- Set `end_date` to the end of your desired range (e.g., "2024-12-31")

This is useful for:
The gear produces log output with processing statistics:
No output files are created. All events are captured directly to the configured S3 bucket.
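
The dry-run behavior can be pictured as a guard around the write step. The sketch below is an assumption about the general shape, not the gear's implementation: `capture_events`, the `write` callable (a stand-in for whatever stores an event in S3), and the statistic names are all hypothetical.

```python
from typing import Callable, Iterable


def capture_events(events: Iterable[dict],
                   write: Callable[[dict], None],
                   dry_run: bool = False) -> dict:
    """Write each event unless dry_run is set, and return simple counts."""
    # Statistic names are illustrative; the gear's log output may differ.
    stats = {"seen": 0, "written": 0, "skipped_dry_run": 0}
    for event in events:
        stats["seen"] += 1
        if dry_run:
            # Log what would be done, but do not touch storage.
            print(f"[dry run] would capture {event}")
            stats["skipped_dry_run"] += 1
            continue
        write(event)
        stats["written"] += 1
    return stats
```

Passing the writer in as a parameter keeps the example independent of any particular S3 client and makes the dry-run path easy to exercise with an in-memory list.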
When first enabling event capture, run this gear to populate historical events for all existing QC status logs:
{
"dry_run": false,
"event_bucket": "submission-events",
"event_environment": "prod"
}
To add events for a specific time period (e.g., after fixing an issue):
{
"dry_run": false,
"event_bucket": "submission-events",
"event_environment": "prod",
"start_date": "2024-06-01",
"end_date": "2024-06-30"
}
To test the scraper on development data:
{
"dry_run": true,
"event_bucket": "submission-events-dev",
"event_environment": "dev"
}
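
Before launching a run, configurations like the ones above can be sanity-checked locally. The following sketch validates only what this document states (the two valid environments and the YYYY-MM-DD date format); the `validate_config` helper is hypothetical, and the rule that `start_date` must not be after `end_date` is an assumption rather than documented gear behavior.

```python
from datetime import datetime

VALID_ENVIRONMENTS = {"prod", "dev"}


def validate_config(config: dict) -> list[str]:
    """Return a list of problems found in a gear configuration dict."""
    problems = []
    # "prod" is the documented default for event_environment.
    if config.get("event_environment", "prod") not in VALID_ENVIRONMENTS:
        problems.append("event_environment must be 'prod' or 'dev'")
    parsed = {}
    for key in ("start_date", "end_date"):
        value = config.get(key)
        if value is None:
            continue  # both dates are optional
        try:
            parsed[key] = datetime.strptime(value, "%Y-%m-%d").date()
        except (TypeError, ValueError):
            problems.append(f"{key} must be in YYYY-MM-DD format")
    # Assumption: an inverted range is treated as a configuration error.
    if len(parsed) == 2 and parsed["start_date"] > parsed["end_date"]:
        problems.append("start_date must not be after end_date")
    return problems
```

An empty list means no problems were found by these checks; it does not guarantee that the bucket exists or that the gear has write access to it.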