022: ML-enhanced QC (optional)
This opt-in branch of the QC pipeline supplements the rule-based QC (notebook 020) with ML outlier detection: an IsolationForest fitted on a per-variable feature matrix. It catches contextual outliers that physical-bounds and IQR checks would miss (e.g. a CO2 reading that is in range but inconsistent with the trend).
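To make the "in range but inconsistent with the trend" case concrete, here is a minimal sketch using only the standard library. The series, the bounds, the window size, and the 3 ppm residual threshold are all made-up values for illustration, and the rolling-median residual simply stands in for the kind of contextual feature an IsolationForest could consume; it is not the palmwtc feature matrix.

```python
import statistics

# Synthetic CO2 series (ppm): a slow linear rise. Values are illustrative only.
co2 = [400.0 + 0.5 * i for i in range(40)]
co2[30] = 403.0  # physically plausible, but far below the local trend (415.0)

# Rule 1: physical bounds (hypothetical limits).
LOW, HIGH = 300.0, 1000.0
bounds_flags = [not (LOW <= v <= HIGH) for v in co2]

# Rule 2: global IQR fence (1.5 x IQR beyond the quartiles).
q1, _, q3 = statistics.quantiles(co2, n=4)
iqr = q3 - q1
iqr_flags = [v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr for v in co2]


def rolling_median_residual(values, half_window=3):
    """Residual of each point from the median of its centered window."""
    residuals = []
    for i, v in enumerate(values):
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        residuals.append(v - statistics.median(values[lo:hi]))
    return residuals


# Contextual check: large deviation from the local median (assumed threshold).
residuals = rolling_median_residual(co2)
context_flagged = [i for i, r in enumerate(residuals) if abs(r) > 3.0]

print("bounds flags:", sum(bounds_flags))
print("IQR flags:", sum(iqr_flags))
print("contextual flags at indices:", context_flagged)
```

Both rule-based checks pass every point, while the contextual feature isolates the single inconsistent reading. A fixed threshold is used here only to keep the sketch dependency-free; the ML branch replaces that hand-picked cutoff with a fitted model.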
Requires
The palmwtc[ml] extra (sklearn). Real chamber data recommended.
from palmwtc.config import DataPaths
from palmwtc.qc import QCProcessor
from palmwtc.hardware.gpu import DEVICE, get_isolation_forest
paths = DataPaths.resolve()
print(paths.describe())
print(f"Compute device: {DEVICE}")
DataPaths (source=sample (bundled synthetic), site=libz):
raw_dir = /home/runner/work/palmwtc/palmwtc/src/palmwtc/data/sample/synthetic
processed_dir = /home/runner/work/palmwtc/palmwtc/src/palmwtc/data/sample/Data/Integrated_QC_Data
exports_dir = /home/runner/work/palmwtc/palmwtc/src/palmwtc/data/sample/exports
config_dir = /home/runner/work/palmwtc/palmwtc/src/palmwtc/data/sample/config
extras = <none>
Compute device: cpu
# Build an IsolationForest with the package's GPU-aware factory.
# Falls back to CPU sklearn if [gpu] (cuML) isn't installed.
try:
    iso = get_isolation_forest(n_estimators=100, contamination=0.05, random_state=42)
    print(f"IsolationForest backend: {type(iso).__module__}")
except ImportError as e:
    print(f"[skip] sklearn not installed: {e}")
    print("Install with: pip install 'palmwtc[ml]'")
[skip] sklearn not installed: No module named 'sklearn'
Install with: pip install 'palmwtc[ml]'
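The fallback behaviour described in the cell comments (GPU backend if available, otherwise CPU sklearn, otherwise skip) can be sketched as a simple availability check. This is not the implementation of get_isolation_forest; pick_backend is a hypothetical helper, and the preference order is an assumption based on the comments above.

```python
import importlib.util


def pick_backend():
    """Return the name of the first importable backend, or None.

    Hypothetical preference order: cuML (GPU) first, then scikit-learn (CPU).
    """
    for module_name in ("cuml", "sklearn"):
        # find_spec returns None when a top-level module cannot be found,
        # so nothing heavy is imported just to probe availability.
        if importlib.util.find_spec(module_name) is not None:
            return module_name
    return None


backend = pick_backend()
if backend is None:
    print("[skip] no IsolationForest backend found")
    print("Install with: pip install 'palmwtc[ml]'")
else:
    print(f"IsolationForest backend: {backend}")
```

Probing with find_spec keeps the check cheap and lets a notebook decide up front whether to run or skip the ML cells, rather than failing partway through.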
Wiring into QCProcessor
The ML branch is invoked by setting enable_ml=True in the QCProcessor
config. On real data this would look like:
processor = QCProcessor(paths)
result = processor.run("CO2_C1", enable_ml=True)
This notebook documents the API surface; production ML thresholds are tuned in notebook 035 (sensitivity sweep).
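For a feel of what a contamination sensitivity sweep involves, the following sketch calls scikit-learn's IsolationForest directly rather than going through palmwtc. The synthetic series, the two-column feature matrix (value and absolute first difference), and the contamination grid are assumptions chosen for illustration; they are not the features or thresholds used in notebook 035.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic series: slow trend plus small noise, with one injected jump.
n = 200
values = 400.0 + 0.05 * np.arange(n) + rng.normal(0.0, 0.2, n)
spike_idx = 100
values[spike_idx] += 4.0

# Assumed feature matrix: the value and its absolute first difference.
abs_diff = np.abs(np.diff(values, prepend=values[0]))
X = np.column_stack([values, abs_diff])

flag_counts = {}
spike_flagged = {}
for contamination in (0.01, 0.05, 0.10):
    model = IsolationForest(
        n_estimators=100, contamination=contamination, random_state=42
    )
    labels = model.fit_predict(X)  # -1 marks outliers, 1 marks inliers
    flag_counts[contamination] = int((labels == -1).sum())
    spike_flagged[contamination] = bool(labels[spike_idx] == -1)
    print(
        f"contamination={contamination:.2f}: "
        f"{flag_counts[contamination]} flagged, "
        f"injected jump flagged={spike_flagged[contamination]}"
    )
```

Because contamination only moves the score threshold, raising it flags more points from the same ranking. A sweep like this shows how many extra readings each setting would discard, which is the trade-off a production threshold has to settle.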