020 — Rule-based QC
This tutorial applies the multi-stage QC pipeline to the unified monthly data from notebook 010: physical-bounds checks, IQR outliers, breakpoint detection (ruptures), drift detection, sensor-exclusion masks, and combined flag synthesis.
Requires real data. The bundled synthetic sample ships post-QC parquet, so this stage is skipped for it. The synthetic generator (scripts/make_sample_data.py) injects realistic edge cases (NaN bursts, drift segments, out-of-bounds (OOB) spikes) that exercise the QC code paths when this notebook is run against real data.
Future work will thin this notebook further by adding palmwtc.pipeline.step_qc_full, which wraps the joblib-parallel process_variable_qc loop.
from palmwtc.config import DataPaths
from palmwtc.qc import (
    QCProcessor,
    apply_iqr_flags,
    apply_physical_bounds_flags,
    detect_breakpoints_ruptures,
)
paths = DataPaths.resolve()
print(paths.describe())
DataPaths (source=sample (bundled synthetic), site=libz):
raw_dir = /home/runner/work/palmwtc/palmwtc/src/palmwtc/data/sample/synthetic
processed_dir = /home/runner/work/palmwtc/palmwtc/src/palmwtc/data/sample/Data/Integrated_QC_Data
exports_dir = /home/runner/work/palmwtc/palmwtc/src/palmwtc/data/sample/exports
config_dir = /home/runner/work/palmwtc/palmwtc/src/palmwtc/data/sample/config
extras = <none>
QC components (preview)
The full notebook iterates over each variable in config/variable_config.json and runs:
result = QCProcessor(paths).run("CO2_C1")
which under the hood calls (in order):
1. apply_physical_bounds_flags: out-of-range values
2. apply_iqr_flags: IQR outliers per rolling window
3. apply_rate_of_change_flags: implausible jumps
4. apply_persistence_flags: stuck-sensor detection
5. apply_battery_proxy_flags: datalogger health proxy
6. apply_sensor_exclusion_flags: manual + auto exclusion windows
7. combine_qc_flags: synthesis into one composite flag
8. detect_breakpoints_ruptures: step-change detection
9. detect_drift_windstats: slow-baseline-drift detection
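To make the flagging idea concrete, here is a minimal sketch of the bounds check, the IQR check, and the flag synthesis listed above. It is not the palmwtc implementation. The function names, the 0/1/2 flag codes, the bounds, and the sample values are all assumptions made up for illustration. The sketch uses plain Python lists, and it computes one IQR over the whole series instead of per rolling window.

```python
from statistics import quantiles

# Hypothetical flag codes for this sketch only (not palmwtc's convention).
GOOD, SUSPECT, BAD = 0, 1, 2


def bounds_flags(values, lower, upper):
    """Flag missing values and values outside [lower, upper] as BAD."""
    return [
        BAD if v is None or not (lower <= v <= upper) else GOOD
        for v in values
    ]


def iqr_flags(values, k=1.5):
    """Flag values further than k * IQR from the quartiles as SUSPECT.

    Simplification: one IQR over the whole series, not a rolling window.
    """
    present = [v for v in values if v is not None]
    q1, _, q3 = quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [
        SUSPECT if v is not None and not (low <= v <= high) else GOOD
        for v in values
    ]


def combine_flags(*flag_lists):
    """Composite flag per sample: the worst flag any check assigned."""
    return [max(flags) for flags in zip(*flag_lists)]


# Made-up CO2-like series with one spike, one impossible value, one gap.
co2 = [410.0, 412.0, 411.0, 409.0, 413.0, 480.0, 411.5, -5.0, 410.5, None]

b = bounds_flags(co2, lower=300.0, upper=1000.0)
q = iqr_flags(co2)
combined = combine_flags(b, q)
print(combined)
```

Taking the maximum is one simple way to synthesize a composite flag: a sample that any check marks as bad stays bad, and a sample that is only an IQR outlier stays suspect.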
Joblib parallelism (process_variable_qc) keeps the loop fast over many variables.
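The per-variable loop pattern can be sketched as follows. This is an illustration under stated assumptions, not the process_variable_qc code: it uses concurrent.futures from the standard library as a stand-in for joblib so it runs without extra dependencies. The function qc_one_variable, the variable names other than CO2_C1, and all values and bounds are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def qc_one_variable(name, values, lower, upper):
    """Hypothetical per-variable QC: count values outside [lower, upper]."""
    n_bad = sum(1 for v in values if not (lower <= v <= upper))
    return name, {"n": len(values), "n_bad": n_bad}


# Made-up variables and bounds, for illustration only.
variables = {
    "CO2_C1": ([410.0, 412.0, -5.0, 411.0], 300.0, 1000.0),
    "H2O_C1": ([18.0, 19.5, 120.0, 18.7], 0.0, 60.0),
    "Temp_C1": ([27.1, 27.4, 27.2, 27.3], -10.0, 60.0),
}

# Each variable is independent, so the work can be submitted as separate tasks.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [
        pool.submit(qc_one_variable, name, *args)
        for name, args in variables.items()
    ]
    results = dict(f.result() for f in futures)

print(results)
```

With joblib, the same pattern is usually written as Parallel(n_jobs=...)(delayed(qc_one_variable)(name, *args) for name, args in variables.items()). Threads keep this sketch portable. CPU-bound QC generally benefits more from process-based workers.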