palmwtc.windows#
palmwtc.windows — high-confidence calibration window selection.
This subpackage selects contiguous date ranges (“windows”) of oil-palm chamber cycles whose per-cycle quality scores are high enough to use as training data for the XPalm digital-twin model.
Main entry point#
WindowSelector: Multi-criteria selector that scores cycles, detects instrument drift, and packages qualifying windows as a cycle CSV and JSON manifest.
Module-level helper#
merge_sensor_qc_onto_cycles(): Vectorized interval-join that appends per-cycle mean CO₂/H₂O sensor QC flags from the high-frequency 021 parquet onto the cycle DataFrame.
Configuration#
DEFAULT_CONFIG: Dict of all tunable thresholds with documented physical meaning. Pass config={"key": value} to WindowSelector to override individual keys.
Typical usage:
from palmwtc.windows import WindowSelector
ws = WindowSelector(cycles_df, config={"min_window_days": 7})
ws.detect_drift()
ws.score_cycles()
ws.identify_windows()
filtered_df, manifest = ws.export()
Submodules#
Attributes#
Classes#
WindowSelector | Select high-confidence calibration windows from per-cycle flux quality scores.
Functions#
merge_sensor_qc_onto_cycles() | Aggregate per-cycle mean sensor QC flags from the high-frequency QC parquet.
Package Contents#
- palmwtc.windows.DEFAULT_CONFIG: dict#
- class palmwtc.windows.WindowSelector(cycles_df: pandas.DataFrame, config: dict | None = None)#
Select high-confidence calibration windows from per-cycle flux quality scores.
A window is a contiguous date range of oil-palm chamber cycles whose per-cycle confidence scores are high enough to use as training data for the XPalm digital-twin model. The selector walks the scored cycles, identifies qualifying spans, and packages them as a cycle CSV + JSON manifest.
Parameters#
- cycles_df : pd.DataFrame
Cycle-level data from notebook 030 (01_chamber_cycles.csv). Required columns: flux_datetime, Source_Chamber. Optional but used when present: cycle_end, co2_r2, co2_nrmse, co2_snr, co2_outlier_frac, slope_diff_pct, delta_aicc, sensor_co2_qc_mean, sensor_h2o_qc_mean, flux_intercept, anomaly_ensemble_score, closure_confidence, co2_qc.
- config : dict, optional
Key-value overrides merged on top of DEFAULT_CONFIG. Pass only the keys you want to change; all others keep their defaults.
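The override behaviour described for config can be pictured as a dictionary merge. The sketch below is only an illustration: DEFAULTS is a hypothetical stand-in for DEFAULT_CONFIG (only min_window_days = 5 comes from the doctest on this page, the other values are made up), and merge_config is not a palmwtc function. It assumes a shallow merge; whether the library merges nested keys such as score_weights more deeply is not stated here.

```python
# Hypothetical stand-in for palmwtc.windows.DEFAULT_CONFIG.
# Only min_window_days = 5 is taken from this page; the rest is illustrative.
DEFAULTS = {
    "min_window_days": 5,
    "window_flexibility_buffer": 2,      # made-up value
    "confidence_good_threshold": 0.65,   # made-up value
}


def merge_config(overrides=None):
    """Return a new dict: defaults first, user overrides layered on top."""
    merged = dict(DEFAULTS)          # copy so the shared defaults stay untouched
    merged.update(overrides or {})   # shallow merge: nested dicts are replaced whole
    return merged


cfg = merge_config({"min_window_days": 7})
print(cfg["min_window_days"], cfg["window_flexibility_buffer"])
```

With a shallow merge like this one, overriding a nested dictionary replaces it entirely, so a partial nested override would drop the other nested defaults.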
Attributes#
- cycles_df : pd.DataFrame
Working copy of the input cycles. After score_cycles() this gains per-component score columns (score_regression, score_robustness, etc.) and the composite cycle_confidence column.
- config : dict
Merged configuration (your overrides + DEFAULT_CONFIG fallbacks).
- drift_df : pd.DataFrame or None
Per (date, Source_Chamber) drift summary, set by detect_drift(). Columns: date, Source_Chamber, drift_severity, z-score columns.
- regime_agreement : dict or None
Date → cross-chamber agreement score from the 026 regime audit. Set by load_regime_diagnostics(); None if the file was not found.
- windows_df : pd.DataFrame or None
Window summary table, set by identify_windows(). One row per window; columns include window_id, start_date, end_date, n_cycles, window_score, qualifies_for_export.
- approved_windows : dict
{window_id: {"approved": bool, "notes": str}}, populated by the interactive inspector in the calibration notebook. Persisted via export().
Methods#
- load_regime_diagnostics(path)
Load cross-chamber agreement scores from the 026 audit CSV.
- detect_drift()
Compute per-day rolling drift severity per chamber.
- score_cycles()
Add cycle_confidence and per-component sub-scores to cycles_df.
- identify_windows()
Find high-confidence date windows per chamber.
- export(approved_only, exclude_list)
Filter cycles to approved windows, write CSV + JSON, return both.
- summary()
Print a brief text overview of selection results.
Examples#
Build a selector on a small fixture and inspect the result:
>>> import pandas as pd
>>> from palmwtc.windows import WindowSelector
>>> cycles = pd.DataFrame({
...     "flux_datetime": pd.date_range("2024-01-01", periods=4, freq="6h"),
...     "Source_Chamber": ["Chamber 1"] * 4,
... })
>>> ws = WindowSelector(cycles)
>>> len(ws.cycles_df)
4
>>> ws.config["min_window_days"]
5
Full pipeline (needs a real cycles DataFrame with flux columns):
>>> ws.detect_drift().score_cycles().identify_windows()
>>> filtered_df, manifest = ws.export()
- config#
- cycles_df#
- drift_df: pandas.DataFrame | None = None#
- regime_agreement: dict | None = None#
- windows_df: pandas.DataFrame | None = None#
- approved_windows: dict#
- load_regime_diagnostics(path: pathlib.Path | str | None = None) WindowSelector#
Load cross-chamber agreement scores from the 026 regime audit CSV.
Each CO₂ regime is assigned an agreement score based on the inter-chamber regression (slope proximity to 1.0 and R²). The score is stored as a per-date lookup in self.regime_agreement.
If the audit file does not exist (026 not run), this is a silent no-op and the cross_chamber component defaults to neutral in score_cycles.
Parameters#
- path : Path or str, optional
Override for the audit CSV path. Falls back to config["regime_audit_path"].
Returns#
self
- detect_drift() WindowSelector#
Compute per-day rolling drift severity for each chamber.
Active drift signals (configurable via config["drift_signals"]):
- night_intercept: seasonally detrended baseline shift of flux_intercept (nighttime cycles only); detects zero-point / calibration drift.
- slope_divergence: seasonally detrended z-score of slope_diff_pct (OLS vs Theil-Sen disagreement); detects noise inflation.
Signals not active by default (confounded by seasonal biology):
- co2_slope: raw z-score of co2_slope flags seasonal phenology (leaf flush, drought) as drift; only valid if seasonally detrended externally.
- h2o_slope: same issue; VPD-driven seasonal stomatal variation dominates.
Seasonal detrending: before computing the short-term rolling z-score (drift_window_days), a long-term rolling median (seasonal_detrend_days, default 90 days) is subtracted from each signal. This removes the seasonal biological baseline, leaving only residual instrument drift in the z-score.
Results are stored in self.drift_df with columns:
date, Source_Chamber, drift_severity, co2_slope_zscore, night_intercept_zscore, h2o_slope_zscore, slope_div_zscore
drift_severity = max across active signals, mapped to 0.0 (clean) / 0.5 (moderate) / 1.0 (severe).
Returns#
- self : WindowSelector
Returns self to allow method chaining.
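The detrend-then-z-score idea described for detect_drift() can be sketched with plain pandas. This is not the library's implementation: the function names, the 14-day short window, the z-score definition, and the 2.0 / 3.0 severity cut-offs are all assumptions chosen for illustration. Only the 90-day seasonal median and the 0.0 / 0.5 / 1.0 severity levels come from the text above.

```python
import numpy as np
import pandas as pd


def drift_zscore(daily, seasonal_days=90, window_days=14):
    """Seasonally detrend a daily signal, then z-score the residual.

    `daily` is a Series with one value per day on a DatetimeIndex.
    window_days=14 is an assumed stand-in for config["drift_window_days"].
    """
    # Long-term rolling median approximates the seasonal biological baseline.
    baseline = daily.rolling(seasonal_days, min_periods=1).median()
    residual = daily - baseline
    # Short-term rolling statistics of the residual (assumed z-score definition).
    roll = residual.rolling(window_days, min_periods=3)
    spread = roll.std().replace(0.0, np.nan)  # avoid dividing by zero
    return (residual - roll.mean()) / spread


def severity(z, moderate=2.0, severe=3.0):
    """Map |z| to 0.0 / 0.5 / 1.0; the two thresholds are hypothetical."""
    magnitude = z.abs()
    level = pd.Series(0.0, index=z.index)
    level[magnitude >= moderate] = 0.5
    level[magnitude >= severe] = 1.0
    return level


def drift_severity(zscores):
    """Combine several signals by taking the worst (max) severity per day."""
    return pd.concat([severity(z) for z in zscores], axis=1).max(axis=1)


days = pd.date_range("2024-01-01", periods=60, freq="D")
signal = pd.Series(0.0, index=days)
signal.iloc[50] = 10.0          # a single abrupt baseline jump
z = drift_zscore(signal)
print(severity(z).iloc[48:53])
```

Days where the z-score is undefined (too little data, or zero spread) are treated as clean in this sketch; a real implementation may prefer to propagate the missing value.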
- _regression_score(r2, nrmse, snr, outlier) float#
- _robustness_score(slope_diff, delta_aicc) float#
- _closure_score(closure_confidence) float#
- _sensor_qc_score(co2_flag_mean, h2o_flag_mean) float#
- _anomaly_score(ensemble_score) float#
- _drift_score_lookup(date, chamber, drift_lookup: dict) float#
- score_cycles() WindowSelector#
Add cycle_confidence (0–1) and per-component sub-scores to cycles_df.
New columns added to self.cycles_df (all 0–1):
- score_regression: R², NRMSE, SNR, outlier fraction (4 components; monotonicity is intentionally excluded because non-monotonic CO₂ traces in a tree chamber under variable irradiance reflect real photosynthesis).
- score_robustness: OLS vs Theil-Sen slope agreement, AICc curvature test.
- score_sensor_qc: CO₂/H₂O sensor flag mean from 021 parquet.
- score_drift: seasonally detrended instrument drift score.
- score_cross_chamber: cross-chamber agreement from 026 regime diagnostics (NaN when the 026 audit file was not loaded).
- score_closure: diagnostic only, not in composite; CO₂/H₂O ratio is a biological variable, not a physical leakage indicator.
- score_anomaly: diagnostic only, not in composite; anomaly detectors flag drought stress and rapid leaf flush that have calibration value.
- cycle_confidence: weighted composite of the five active components (see score_weights in DEFAULT_CONFIG).
Nighttime cycles carry full weight (nighttime_weight = 1.0) because dark respiration is the primary constraint for Ra and Q10 calibration in XPalm.
When cross-chamber data is unavailable, its weight (0.10 by default) is redistributed proportionally across the remaining four components.
Returns#
- self : WindowSelector
Returns self to allow method chaining.
Raises#
- (no explicit raises)
Silently proceeds even when optional score columns are absent from cycles_df; missing columns default to NaN → neutral score.
Notes#
Call detect_drift() first. If not called, drift component defaults to 1.0 (no drift assumed), which gives slightly optimistic scores.
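The proportional redistribution of the cross-chamber weight mentioned for score_cycles() amounts to dropping that weight and rescaling the rest so they sum to 1. A minimal sketch follows; the weight values are hypothetical except for cross_chamber = 0.10, which the text states, and redistribute / composite are illustrative helpers rather than palmwtc functions.

```python
# Hypothetical component weights; only cross_chamber = 0.10 comes from the text.
WEIGHTS = {
    "regression": 0.35,
    "robustness": 0.20,
    "sensor_qc": 0.20,
    "drift": 0.15,
    "cross_chamber": 0.10,
}


def redistribute(weights, missing):
    """Drop one component and rescale the others proportionally to sum to 1."""
    kept = {name: w for name, w in weights.items() if name != missing}
    total = sum(kept.values())
    return {name: w / total for name, w in kept.items()}


def composite(scores, weights):
    """Weighted sum of component scores (all assumed to lie in 0-1)."""
    return sum(scores[name] * w for name, w in weights.items())


four_way = redistribute(WEIGHTS, "cross_chamber")
scores = {"regression": 0.9, "robustness": 0.8, "sensor_qc": 1.0, "drift": 1.0}
print(four_way)
print(round(composite(scores, four_way), 3))
```

Rescaling keeps the relative importance of the remaining components unchanged, which is what "proportionally" implies: each weight w becomes w / (1 - 0.10).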
- identify_windows() WindowSelector#
Find high-confidence windows per chamber with rolling flexibility.
Algorithm#
For each (chamber, date):
- daily_coverage = n_cycles / 95th-pct(cycles/day), capped at 1.0
- daily_good_frac = fraction of cycles with cycle_confidence >= config["confidence_good_threshold"]
Mark a day as qualifying if:
- daily_coverage >= min_daily_coverage_frac
- daily_good_frac >= min_confidence_frac
- No is_instrumental_regime_change == True on that day (when exclude_instrumental_regimes is True)
Note: grade_ab_frac (co2_qc ≤ 1) is computed for transparency but is NOT a qualifying gate. It double-counts sensor_qc, which is already in cycle_confidence, and 021 ROC flags can erroneously reject valid rapid photosynthetic drawdown cycles.
Find windows where ≥ min_window_days qualifying days occur within a min_window_days + window_flexibility_buffer day span. This allows up to window_flexibility_buffer non-qualifying gap days (power outages, maintenance) within an otherwise good period without breaking the window.
Window score = weighted combination:
0.40 × mean_cycle_confidence + 0.25 × mean_daily_coverage + 0.20 × (1 - mean_drift_severity) + 0.15 × diurnal_hour_coverage
where diurnal_hour_coverage = fraction of hours 5–18 represented by ≥1 cycle (14 hours; extended from 7–17 to include dawn/dusk transitions for light-response fitting).
Results stored in self.windows_df with columns:
window_id, Source_Chamber, start_date, end_date, n_days, n_cycles, mean_confidence, mean_coverage, mean_drift_severity, mean_daytime_grade_ab_frac, mean_all_grade_ab_frac, mean_grade_a_frac, diurnal_hour_coverage, window_score, qualifies_for_export
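As a worked illustration of the window score formula above, the following sketch computes diurnal_hour_coverage from a list of cycle hours and then applies the four stated weights. The function names are made up for this example; the weights and the 5–18 hour range are taken from the text.

```python
def diurnal_hour_coverage(cycle_hours, first_hour=5, last_hour=18):
    """Fraction of clock hours first_hour..last_hour (inclusive) with >= 1 cycle."""
    wanted = set(range(first_hour, last_hour + 1))   # 14 hours for 5..18
    return len(wanted & set(cycle_hours)) / len(wanted)


def window_score(mean_confidence, mean_coverage, mean_drift_severity, hour_coverage):
    """Weighted combination using the weights quoted in the docstring."""
    return (
        0.40 * mean_confidence
        + 0.25 * mean_coverage
        + 0.20 * (1 - mean_drift_severity)
        + 0.15 * hour_coverage
    )


# Cycles observed at hours 5..11 plus two night-time hours outside the range.
hours = [5, 6, 7, 8, 9, 10, 11, 2, 23]
coverage = diurnal_hour_coverage(hours)
print(coverage, window_score(0.8, 0.9, 0.5, coverage))
```

Here 7 of the 14 target hours are covered, so the coverage term is 0.5 and the score works out to 0.32 + 0.225 + 0.10 + 0.075 = 0.72.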
Returns#
- selfWindowSelector
Returns
selfto allow method chaining.
Raises#
- RuntimeError
If score_cycles() has not been called yet (cycle_confidence column is missing from cycles_df).
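The daily coverage measure and the flexible window search from the Algorithm section can be sketched as follows. This is one possible reading, not the library's code: find_windows and daily_coverage are hypothetical helpers, the defaults min_days=5 and buffer=2 are illustrative, and the rule for merging overlapping candidate spans is an assumption.

```python
import pandas as pd


def daily_coverage(cycles_per_day):
    """n_cycles divided by the 95th percentile of cycles/day, capped at 1.0."""
    reference = cycles_per_day.quantile(0.95)
    return (cycles_per_day / reference).clip(upper=1.0)


def find_windows(qualifying, min_days=5, buffer=2):
    """Return (start, end) pairs where >= min_days qualifying days fall
    inside a span of min_days + buffer days.

    `qualifying` is a boolean Series on a gap-free daily DatetimeIndex.
    """
    qualifying = qualifying.astype(bool)
    span = min_days + buffer
    one_day = pd.Timedelta(days=1)
    # Number of qualifying days in each trailing span-day block.
    counts = qualifying.astype(int).rolling(span, min_periods=1).sum()
    windows = []
    for end in counts.index[(counts >= min_days).to_numpy()]:
        block = qualifying.loc[end - (span - 1) * one_day:end]
        good_days = block.index[block.to_numpy()]
        start, stop = good_days[0], good_days[-1]   # trim non-qualifying edges
        if windows and start <= windows[-1][1] + one_day:
            # Overlaps or touches the previous window: extend it (assumed rule).
            windows[-1] = (windows[-1][0], max(windows[-1][1], stop))
        else:
            windows.append((start, stop))
    return windows


days = pd.date_range("2024-01-01", periods=20, freq="D")
# Five good days, a two-day gap, five more good days, then nothing.
flags = pd.Series([True] * 5 + [False] * 2 + [True] * 5 + [False] * 8, index=days)
windows = find_windows(flags)
print(windows)
```

In this example the two-day gap fits inside the buffer, so both good stretches end up in a single window. A five-day gap would split them into two.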
- export(approved_only: bool = True, exclude_list: list[int] | None = None) tuple[pandas.DataFrame, dict]#
Filter cycles to approved windows and write outputs.
Parameters#
- approved_only : bool
If True (default) and approved_windows is non-empty, only export cycles belonging to approved windows. Falls back to the qualifies_for_export flag when no manual approvals exist.
- exclude_list : list of int, optional
Window IDs to explicitly exclude from export (after visual inspection in 034 or other audit notebooks).
Returns#
- (filtered_df, manifest) : tuple
filtered_df: cycle-level DataFrame ready for XPalm calibration.
manifest: dict written to calibration_window_manifest.json.
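The selection rules described for approved_only and exclude_list can be pictured with a small helper. This is a sketch of the documented behaviour only: windows_to_export is a hypothetical function, and its treatment of approved_only=False (falling back to the qualifies_for_export flag) is an assumption, since the text does not spell that case out.

```python
import pandas as pd


def windows_to_export(windows_df, approved_windows, approved_only=True, exclude_list=None):
    """Return the sorted window IDs whose cycles would be kept."""
    if approved_only and approved_windows:
        # Manual approvals exist: keep only windows marked approved.
        keep = {int(wid) for wid, info in approved_windows.items() if info.get("approved")}
    else:
        # No manual approvals (or approved_only=False, an assumed fallback).
        mask = windows_df["qualifies_for_export"].astype(bool)
        keep = {int(wid) for wid in windows_df.loc[mask, "window_id"]}
    return sorted(keep - set(exclude_list or []))


windows_df = pd.DataFrame({
    "window_id": [1, 2, 3],
    "qualifies_for_export": [True, False, True],
})
approvals = {
    1: {"approved": True, "notes": "clean"},
    3: {"approved": False, "notes": "sensor swap"},
}

print(windows_to_export(windows_df, approvals))                   # manual approvals win
print(windows_to_export(windows_df, {}))                          # fallback to the flag
print(windows_to_export(windows_df, {}, exclude_list=[3]))        # explicit exclusion
```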
- palmwtc.windows.merge_sensor_qc_onto_cycles(cycles_df: pandas.DataFrame, qc_df: pandas.DataFrame, co2_col: str = 'CO2_qc_flag', h2o_col: str = 'H2O_qc_flag', chamber_map: dict | None = None) pandas.DataFrame#
Aggregate per-cycle mean sensor QC flags from the high-frequency QC parquet.
Uses a vectorized interval approach via pd.merge_asof to avoid per-row iteration over 58 k cycles. The result adds two columns to cycles_df:
- sensor_co2_qc_mean: mean CO₂ qc_flag across the cycle window (0=clean, 2=bad)
- sensor_h2o_qc_mean: mean H₂O qc_flag across the cycle window
Parameters#
- cycles_df : pd.DataFrame
Cycle-level data from notebook 030 (must have flux_datetime, cycle_end, and Source_Chamber).
- qc_df : pd.DataFrame
High-frequency sensor QC parquet (from notebooks 021/022). Must have a TIMESTAMP column plus chamber-specific flag columns. Column naming expected: CO2_C1_qc_flag, CO2_C2_qc_flag, etc. Pass a pre-loaded DataFrame; this function does not do I/O.
- co2_col, h2o_col : str
Base column name stubs (without chamber suffix).
- chamber_map : dict or None
Maps Source_Chamber values to the suffix used in qc_df (e.g., {"Chamber 1": "C1", "Chamber 2": "C2"}). If None, inferred automatically from the first unique chamber names.
Returns#
- pd.DataFrame
Copy of cycles_df with sensor_co2_qc_mean and sensor_h2o_qc_mean appended.
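One way to compute a per-interval mean without looping over cycles is to combine running totals with pd.merge_asof. The sketch below shows that technique for a single flag column; it is an assumption about how such a join could work, not the body of merge_sensor_qc_onto_cycles(), and interval_mean is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def interval_mean(cycles, qc, flag_col):
    """Mean of qc[flag_col] over [flux_datetime, cycle_end] for each cycle."""
    qc = qc.sort_values("TIMESTAMP")
    # Running sum and running count of valid flags up to each timestamp.
    running = pd.DataFrame({
        "TIMESTAMP": qc["TIMESTAMP"].to_numpy(),
        "cum_sum": qc[flag_col].fillna(0).cumsum().to_numpy(),
        "cum_n": qc[flag_col].notna().cumsum().to_numpy(),
    })

    def totals_at(times, inclusive):
        # merge_asof needs sorted keys; remember the original row order.
        left = pd.DataFrame({"t": times.to_numpy(), "row": np.arange(len(times))})
        hit = pd.merge_asof(
            left.sort_values("t"), running,
            left_on="t", right_on="TIMESTAMP",
            direction="backward", allow_exact_matches=inclusive,
        ).sort_values("row")
        # No earlier sample means nothing has been accumulated yet.
        return hit["cum_sum"].fillna(0).to_numpy(), hit["cum_n"].fillna(0).to_numpy()

    end_sum, end_n = totals_at(cycles["cycle_end"], inclusive=True)
    start_sum, start_n = totals_at(cycles["flux_datetime"], inclusive=False)
    n = end_n - start_n
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(n > 0, (end_sum - start_sum) / n, np.nan)


qc = pd.DataFrame({
    "TIMESTAMP": pd.date_range("2024-01-01 00:00", periods=10, freq="min"),
    "CO2_C1_qc_flag": [0, 0, 2, 2, 0, 0, 0, 0, 1, 1],
})
cycles = pd.DataFrame({
    "flux_datetime": pd.to_datetime(["2024-01-01 00:02", "2024-01-01 00:04", "2024-01-01 01:00"]),
    "cycle_end": pd.to_datetime(["2024-01-01 00:03", "2024-01-01 00:07", "2024-01-01 01:05"]),
})
cycles["sensor_co2_qc_mean"] = interval_mean(cycles, qc, "CO2_C1_qc_flag")
print(cycles)
```

The difference of running totals at the interval end and just before the interval start gives the sum and count inside the interval, so the cost is two sorted joins instead of one filter per cycle. Cycles with no QC samples in their window get NaN in this sketch.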