flywheel-gear-extensions

Regression Curator

Runs a regression test between curated projects and the QAF and MQT baseline files.

Workflow

This gear uses the same framework as the attribute_curator, but runs common/src/curator/regression_curator.py instead. It currently runs only over *UDS.json files, since the QAF is UDS-specific, and it assumes the project has already been curated by attribute_curator.

Preparation steps:

Localize the QAF/MQT baseline files from S3, and convert each to a dict mapping NACCID_VISITDATE (QAF) or NACCID (MQT) to a dict representation of the CSV record
1. In the case of the QAF, this filters out any columns that are not NACC*, NGDS*, or listed in the keep_fields config
The QAF is used as the baseline for the file-level derived variables (only those under file.info.derived), whereas the MQT is used as the baseline for all subject-level variables
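
The conversion described above can be sketched roughly as follows. This is a minimal illustration, not the gear's actual code: the function names, the `VISITDATE` column, and the case-insensitive prefix match are all assumptions made for the example, and the S3 download is left out (the CSV is passed in as text).

```python
import csv
import io
from typing import Dict, Iterable, Optional


def keep_column(name: str, keep_fields: Iterable[str]) -> bool:
    """Keep NACC*/NGDS* columns plus anything listed in keep_fields.

    Matching the prefix case-insensitively is an assumption of this sketch.
    """
    return name.upper().startswith(("NACC", "NGDS")) or name in keep_fields


def load_baseline(
    csv_text: str,
    key_fields: Iterable[str],
    keep_fields: Optional[Iterable[str]] = None,
) -> Dict[str, Dict[str, str]]:
    """Map a key built from key_fields to a dict of the CSV record.

    keep_fields=None means no column filtering (MQT-style); any other
    value turns on the QAF-style column filter.
    """
    key_fields = list(key_fields)
    keep = None if keep_fields is None else set(keep_fields)
    baseline: Dict[str, Dict[str, str]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Build the key before filtering, since key columns may be dropped.
        key = "_".join(row[field] for field in key_fields)
        if keep is not None:
            row = {k: v for k, v in row.items() if keep_column(k, keep)}
        baseline[key] = dict(row)
    return baseline
```

With this shape, a QAF baseline would be built with something like `load_baseline(text, ["NACCID", "VISITDATE"], keep_fields=[...])` and an MQT baseline with `load_baseline(text, ["NACCID"])`.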

Regression curation steps:

Create a data view of the project, filtering on the provided filename_pattern config (defaults to *UDS.json for all UDS files)
For each subject, aggregate its files into a MinHeap ordered by file type (scope) and date
Multiprocess by subject. For each subject:
1. Run a regression test against subject.info using the MQT baseline
2. Run a regression test against file.info for each file in the heap using the QAF baseline
Keep track of errors. If any are found, write them to regression_errors.csv (defined by the error_outfile config) and upload the file to the curated project
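
The per-subject part of the steps above could look something like the sketch below. It is only an illustration under stated assumptions: the `SCOPE_ORDER` table, the shape of the `files` entries, the error-row columns, and the string-based comparison are all hypothetical, and the data view, multiprocessing, and upload are omitted.

```python
import csv
import heapq
import io
from typing import Any, Dict, List

# Hypothetical scope ranking; the real ordering is defined by the curator.
SCOPE_ORDER = {"UDS": 0}


def compare(found: Dict[str, Any], baseline: Dict[str, Any], label: str) -> List[Dict[str, str]]:
    """Return one error row per baseline field whose curated value differs."""
    errors = []
    for field, expected in baseline.items():
        actual = found.get(field)
        # Comparing as strings is a simplification made for this sketch.
        if actual is None or str(actual) != str(expected):
            errors.append({
                "label": label,
                "field": field,
                "expected": str(expected),
                "found": "" if actual is None else str(actual),
            })
    return errors


def check_subject(naccid, subject_info, files, mqt, qaf) -> List[Dict[str, str]]:
    """Test subject.info against the MQT, then each file against the QAF."""
    errors = compare(subject_info, mqt.get(naccid, {}), naccid)

    # Min-heap of (scope rank, date, index); the index breaks ties and
    # lets us look the file back up after popping.
    heap: list = []
    for index, f in enumerate(files):
        rank = SCOPE_ORDER.get(f["scope"], len(SCOPE_ORDER))
        heapq.heappush(heap, (rank, f["date"], index))

    while heap:
        _, date, index = heapq.heappop(heap)
        f = files[index]
        record = qaf.get(f"{naccid}_{date}", {})
        errors.extend(compare(f["derived"], record, f["name"]))
    return errors


def errors_to_csv(errors: List[Dict[str, str]]) -> str:
    """Serialize the error rows in the form they might be written to disk."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["label", "field", "expected", "found"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(errors)
    return out.getvalue()
```

Because `check_subject` only depends on its arguments, it could be handed to a process pool one subject at a time, with the returned rows merged before the CSV is written.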
