Quickstart — your first flux number in 3 minutes

Quickstart — your first flux number in 3 minutes#

palmwtc is a Python library for automated whole-tree chamber measurements on individual oil palm trees. Each chamber encloses one tree and measures CO₂ and H₂O concentrations every 30 seconds using a LI-COR LI-850 analyser. This guide takes you from installation to a validated CO₂ flux number using the bundled synthetic sample — no field data required.

1. Install#

pip install 'palmwtc[ml,interactive]'

The ml extra adds scikit-learn (for ML-assisted QC outlier detection). The interactive extra adds Plotly (for the interactive dashboard). If you have an NVIDIA or Apple Silicon GPU, add [gpu] as well — it accelerates the Isolation Forest step in large datasets.

Requires Python 3.11–3.13.

2. Load the bundled synthetic sample#

palmwtc ships with a one-week synthetic dataset (30-second cadence, 2 chambers) so every step below works immediately after install — no config file needed.

from palmwtc.config import DataPaths

paths = DataPaths.resolve()
print(paths.describe())

Expected output:

DataPaths (source=sample (bundled synthetic), site=libz):
  raw_dir       = .../palmwtc/data/sample/synthetic
  processed_dir = .../palmwtc/data/sample/Data/Integrated_QC_Data
  exports_dir   = .../palmwtc/data/sample/exports
  config_dir    = .../palmwtc/data/sample/config
  extras        = <none>

When source=sample (bundled synthetic) appears, you are working with the bundled data. To point palmwtc at your own chamber data, set the PALMWTC_DATA_DIR environment variable or create a palmwtc.yaml config file — see the tutorials for details.

Load the raw sensor parquet:

import pandas as pd

df = pd.read_parquet(paths.raw_dir / "QC_Flagged_Data_synthetic.parquet")
print(df.shape)          # (20160, 19)  — 7 days × 2880 rows/day × 30 s
print(df.columns.tolist())

The dataset has one row per 30-second interval. Columns follow the pattern CO2_C1, H2O_C1 (chamber 1) and CO2_C2, H2O_C2 (chamber 2), plus temperature, humidity, atmospheric pressure, battery voltage, and existing QC flag columns for each chamber.

3. Quality-control the CO₂ signal#

QCProcessor checks each sensor variable against physical limits, rate-of-change thresholds, and a stuck-sensor (persistence) test. It adds a flag column for every variable you process:

0 = good
1 = suspect (minor deviation)
2 = bad (reject)

from palmwtc.qc import QCProcessor

# Define limits for CO2 from chamber 1.
# hard = absolute physical bounds (any value outside → flag 2).
# soft = expected operating range (outlier scoring uses this range).
co2_config = {
    "co2": {
        "columns": ["CO2_C1"],
        "hard": [300, 600],          # ppm — hard physical limits
        "soft": [350, 550],          # ppm — expected operating range
        "rate_of_change": {"limit": 50},   # max ppm per 30-s step
        "persistence": {"window": 5},      # flag if stuck for 5+ steps
    }
}

qc = QCProcessor(df=df, config_dict=co2_config)
result = qc.process_variable("CO2_C1", random_seed=42)

flagged_df = qc.get_processed_dataframe()
print(flagged_df[["TIMESTAMP", "CO2_C1", "CO2_C1_rule_flag", "CO2_C1_qc_flag"]].head(5))

Expected output:

            TIMESTAMP      CO2_C1  CO2_C1_rule_flag  CO2_C1_qc_flag
2026-03-01 00:00:00  401.461633                 0               0
2026-03-01 00:00:30  401.923752                 0               0
2026-03-01 00:01:00  403.618192                 0               0
2026-03-01 00:01:30  404.694238                 0               0
2026-03-01 00:02:00  404.516029                 0               0

Check how many points were flagged:

summary = result["summary"]
print(f"Good (flag 0): {summary['flag_0_count']} ({summary['flag_0_percent']:.1f} %)")
print(f"Suspect (flag 1): {summary['flag_1_count']} ({summary['flag_1_percent']:.2f} %)")
print(f"Bad (flag 2): {summary['flag_2_count']} ({summary['flag_2_percent']:.2f} %)")

Expected output (bundled synthetic sample):

Good (flag 0): 20155 (100.0 %)
Suspect (flag 1): 1 (0.00 %)
Bad (flag 2): 4 (0.02 %)

4. Compute CO₂ fluxes#

Each measurement cycle is one closed-chamber period (typically ~5 minutes of continuous 30-second readings). palmwtc fits a linear regression to the rising or falling CO₂ curve inside each cycle, converts the slope (ppm s⁻¹) to an absolute flux (µmol m⁻² s⁻¹), and returns one row per cycle.

First, prepare the single-chamber data stream:

from palmwtc.flux import prepare_chamber_data, calculate_flux_cycles

# Select chamber 1 columns, apply QC flag filtering, and run WPL correction.
chamber_df = prepare_chamber_data(flagged_df, "C1", require_h2o_for_wpl=False)

Then compute fluxes for every cycle:

cycles = calculate_flux_cycles(chamber_df, "Chamber 1", use_multiprocessing=False)

# "flux_date" is the output column name from calculate_flux_cycles.
# Rename it to "flux_datetime" — the name that WindowSelector and
# run_science_validation expect — so the same DataFrame works for all steps.
cycles = cycles.rename(columns={"flux_date": "flux_datetime"})

print(cycles[["cycle_id", "flux_datetime", "flux_absolute", "flux_slope", "r2", "qc_flag"]].head(5))

Expected output (values vary with the synthetic sample):

   cycle_id       flux_datetime  flux_absolute  flux_slope        r2  qc_flag
0         1 2026-03-01 00:00:00      -2.358445   -0.009627  0.512461        0
1         2 2026-03-03 12:45:00       0.036619    0.000150  0.109499        1

Key columns:

Column	Unit	Meaning
`flux_absolute`	µmol m⁻² s⁻¹	CO₂ flux (negative = uptake by tree)
`flux_slope`	ppm s⁻¹	raw CO₂ slope inside the cycle
`r2`	—	R² of the linear fit (higher = more linear cycle)
`qc_flag`	0/1/2	0 = A-grade, 1 = B-grade, 2 = rejected

5. Select calibration windows#

A calibration window is a consecutive span of high-quality days whose cycle data can be used to train the XPalm digital-twin model. WindowSelector scores every cycle across five components (regression quality, robustness, sensor QC, drift, and cross-chamber agreement), then identifies qualifying date ranges.

from palmwtc.windows import WindowSelector

ws = WindowSelector(cycles)
ws.score_cycles()
ws.identify_windows()
ws.summary()

After score_cycles(), your cycles DataFrame gains a cycle_confidence column (0–1 scale). After identify_windows(), ws.windows_df lists every qualifying window with its start_date, end_date, n_cycles, and window_score.

print(ws.cycles_df[["cycle_id", "flux_datetime", "cycle_confidence"]].head(5))

Note: the bundled one-week synthetic sample is too short for the minimum 5-day window requirement, so ws.windows_df will be empty on this dataset. Run WindowSelector on a real multi-week dataset to see qualifying windows. See tutorials/032_Window_Selection_Production.ipynb for a worked example.

6. Validate against ecophysiology literature#

run_science_validation runs four tests that check whether your flux data is consistent with published values for oil palm:

Light-response curve — does Amax fall within the expected range?
Temperature response (Q10) — is the nighttime respiration temperature sensitivity between 1.5 and 3.5?
Water-use efficiency (WUE) — does WUE decrease as VPD increases?
Inter-chamber agreement — are the two chambers tracking each other?

The validator needs several columns beyond the basic flux output. In a full pipeline run these come from the QC and H₂O steps; here we add them as placeholders so the validator can run and demonstrate the scorecard structure:

from palmwtc.validation import run_science_validation

# Columns required by run_science_validation but not produced by
# calculate_flux_cycles alone — in a real run, these come from the full
# pipeline (radiation logger, H2O flux, VPD from weather station).
cycles["Global_Radiation"] = float("nan")  # W m⁻²
cycles["h2o_slope"] = float("nan")         # mmol m⁻² s⁻¹
cycles["co2_slope"] = cycles["flux_slope"] # µmol m⁻² s⁻¹ (alias)
cycles["vpd_kPa"] = float("nan")           # kPa

report = run_science_validation(cycles)

scorecard = report["scorecard"]
print(f"Tests passed: {scorecard['n_pass']}")
print(f"Borderline:   {scorecard['n_borderline']}")
print(f"Failed:       {scorecard['n_fail']}")
print(f"Insufficient data (N/A): {scorecard['n_na']}")

Expected output (bundled sample — all tests return N/A because the one-week dataset has too few cycles to fit any curve):

Tests passed: 0
Borderline:   0
Failed:       0
Insufficient data (N/A): 7

That is the correct scientific response — palmwtc never fabricates a pass. Run the validator on at least two weeks of continuous data with radiation, temperature, and H₂O columns populated to get meaningful PASS/FAIL results. See tutorials/033_Science_Validation.ipynb for a full worked example.

7. Plot the flux heatmap#

The flux heatmap shows mean CO₂ flux by hour-of-day (y-axis) and month-year (x-axis). Blue cells indicate uptake (photosynthesis during the day), red cells indicate efflux (respiration at night). The diurnal pattern is the first visual check that the system is capturing a real biological signal.

from palmwtc.viz import set_style, plot_flux_heatmap

set_style()   # apply the palmwtc matplotlib theme

# plot_flux_heatmap reads the "flux_date" column.
# Add it back as an alias of "flux_datetime" so the plot function works.
cycles["flux_date"] = cycles["flux_datetime"]

fig = plot_flux_heatmap(cycles)
fig.savefig("flux_heatmap.png", dpi=150, bbox_inches="tight")

You should see three vertically stacked subplots — one for both chambers combined, one for chamber 1, and one for chamber 2. With only one week of synthetic data the x-axis will show a single month column, but the diurnal pattern (negative flux during the day, near-zero at night) should be visible even in the short sample.

Next steps#

The executable version of this walkthrough — with all outputs already run — is in tutorials/000_Integrated_End_to_End.ipynb (coming soon).

For deeper dives into each step, the individual tutorials are good starting points:

tutorials/020_QC_Rule_Based.ipynb — full QC pipeline with multi-variable config
tutorials/022_QC_ML_Enhanced.ipynb — add ML-assisted outlier detection
tutorials/030_Flux_Cycle_Calculation.ipynb — flux calculation in detail
tutorials/032_Window_Selection_Production.ipynb — calibration window selection
tutorials/033_Science_Validation.ipynb — ecophysiology validation scorecard

To use your own chamber data, set PALMWTC_DATA_DIR to your data directory or write a palmwtc.yaml config file in your working directory, then re-run the steps above. The Science Reference explains the methods and the thresholds behind each QC rule.