palmwtc.viz.qc_plots#

QC flag visualisation for chamber sensor streams.

Static matplotlib plots that turn the QC output of palmwtc.qc.rules and palmwtc.qc.processor into human-readable diagnostics:

visualize_qc_flags() – multi-panel scatter coloured by flag value, with per-method breakdown panels and a summary row.
plot_qc_comparison() – 2x2 horizontal-bar + stacked-bar comparison of flag percentages across multiple variables.
plot_qc_summary_heatmap() – heatmap of flag percentages, one row per variable, three columns (Flag 0 / 1 / 2).
filter_plot() – two-chamber overlay for one variable with hard/soft threshold lines drawn on the axes.
plot_soil_var() – multi-depth soil sensor profile drawn on a single axis (depths 15, 48, 80, 200, 350 cm).
plot_high_quality_timeseries() – line plot of Flag-0 data only.
plot_drift_and_hq_timeseries() – two-panel figure: drift score (top) and Flag-0 timeseries (bottom), sharing the x-axis.
plot_baseline_drift() – daily minimum/mean overlay with an expected baseline value and tolerance band.
plot_breakpoints_analysis() – two-figure set: (1) timeseries with breakpoint lines and segment means; (2) breakpoint table.
visualize_breakpoints() – three-panel overview: timeseries with kept/ignored breakpoint lines, segment means, and confidence-score bars.
visualize_missing_data() – two-panel figure showing the raw timeseries (top) and gap-duration bars over time (bottom).
visualize_drift() – deprecated stub; prints a notice and returns.

Flag value semantics used throughout this module:

0 = Good (green).
1 = Suspect / soft warning (orange).
2 = Bad / hard fail (red).
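
As an illustration of these semantics, the mapping can be written as a small lookup table. This is a minimal sketch: the names FLAG_STYLES and describe_flag are hypothetical and not part of palmwtc; only the flag values, labels, and colours come from the list above.

```python
# Hypothetical lookup table; the names are illustrative, not palmwtc API.
FLAG_STYLES = {
    0: ("Good", "green"),
    1: ("Suspect", "orange"),
    2: ("Bad", "red"),
}


def describe_flag(flag: int) -> str:
    """Return a 'label (colour)' string for a QC flag value."""
    label, colour = FLAG_STYLES[flag]
    return f"{label} ({colour})"


print([describe_flag(flag) for flag in (0, 1, 2)])
```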

All plots use matplotlib.pyplot defaults. No custom style is applied inside this module; call palmwtc.viz.set_style() before plotting if you want the package-wide appearance.

Notes#

Visualization functions are deterministic and do not require random seeds.

Functions#

`visualize_qc_flags`(→ matplotlib.figure.Figure | None)	Multi-panel scatter plot of QC flags with per-method breakdown.
`plot_qc_comparison`(→ matplotlib.figure.Figure)	2x2 panel comparison of QC flag percentages across multiple variables.
`plot_qc_summary_heatmap`(→ matplotlib.figure.Figure)	Heatmap of QC pass/suspect/fail percentages for all variables.
`filter_plot`(→ None)	Draw a two-chamber overlay for one variable onto an existing axis.
`plot_soil_var`(→ bool)	Draw multi-depth soil sensor profiles onto an existing axis.
`plot_high_quality_timeseries`(→ None)	Line plot of Flag-0 (high-quality) data for one variable.
`plot_drift_and_hq_timeseries`(→ None)	Two-panel figure: drift score (top) and Flag-0 timeseries (bottom).
`visualize_drift`(→ None)	Deprecated -- drift visualisation stub.
`plot_baseline_drift`(→ None)	Line plot of daily minimum and mean values to check sensor baseline drift.
`plot_breakpoints_analysis`(..., max_table_rows, ...)	Two-figure breakpoint analysis: annotated timeseries + breakpoint table.
`visualize_missing_data`(→ matplotlib.figure.Figure | None)	Two-panel figure showing data availability and gap distribution.
`visualize_breakpoints`(→ None)	Three-panel overview of breakpoints with kept/ignored distinction.

Module Contents#

palmwtc.viz.qc_plots.visualize_qc_flags(df: pandas.DataFrame, var_name: str, qc_results: dict, config: dict | None = None, figsize: tuple[float, float] = (15, 20)) → matplotlib.figure.Figure | None#

Multi-panel scatter plot of QC flags with per-method breakdown.

Produces a figure whose row count adapts to how many QC methods fired:

Row 1 (combined): scatter plot of var_name coloured by the combined {var_name}_qc_flag column. Green = Flag 0, orange = Flag 1, red x = Flag 2. Hard/soft bound lines added when config is supplied.
Rows 2..N (one per active method): the raw timeseries in grey with flagged points coloured orange (Suspect) or red-x (Bad). Active methods checked: Physical Bounds, Persistence, Rate of Change, IQR Outliers.
Last row (summary): left panel = bar chart of flag counts (Good / Suspect / Bad); right panel = horizontal bar chart of flag counts by method (Bounds / IQR / RoC / Persist).

Parameters#

df : pd.DataFrame

QC-flagged DataFrame. Must contain:

{var_name} (numeric): Raw sensor values.
{var_name}_qc_flag (int in {0, 1, 2}): Combined QC flag column. May be absent if qc_results["final_flags"] is provided instead.

var_name : str

Name of the sensor variable column to plot.

qc_results : dict

Per-variable QC result dict from palmwtc.qc.process_variable_qc(). Expected keys:

"final_flags" (pd.Series): Combined flag series (used as fallback if the flag column is absent from df).
"bounds_flags" (pd.Series or None): Flags from the Physical Bounds rule.
"persistence_flags" (pd.Series or None): Flags from the Persistence rule.
"roc_flags" (pd.Series or None): Flags from the Rate-of-Change rule.
"iqr_flags" (pd.Series or None): Flags from the IQR Outlier rule.
"summary" (dict): Keys "flag_0_count", "flag_1_count", "flag_2_count".

config : dict, optional

Variable configuration used for axis labels and threshold lines. Expected keys (all optional):

"label" (str): Y-axis label.
"soft" (tuple of (float, float)): (min, max) soft bounds drawn as orange dashed lines.
"hard" (tuple of (float, float)): (min, max) hard bounds drawn as red dashed lines.

figsize : tuple of (float, float), optional

Figure width and height in inches. Height is overridden upward when many method panels are present (4 rows per extra method).

Returns#

matplotlib.figure.Figure or None: The figure, or None if var_name is not found in df.

Notes#

Flag values:

0 = Good (green scatter, small points).
1 = Suspect (orange scatter, medium points).
2 = Bad (red x scatter, slightly larger points).

The summary row reads counts from qc_results["summary"], which reflects the full (unfiltered) dataset.

Examples#

>>> from palmwtc.viz.qc_plots import visualize_qc_flags
>>> fig = visualize_qc_flags(df, "CO2_Avg", qc_results["CO2_Avg"])  

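palmwtc.viz.qc_plots.plot_qc_comparison(...) → matplotlib.figure.Figure#

2x2 panel comparison of QC flag percentages across multiple variables.

Parameters#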

df : pd.DataFrame

QC-flagged DataFrame. Not directly read inside this function; passed for API consistency with other QC plot functions.

var_names : list of str

Variable names to include. Only variables that exist in qc_results are shown.

qc_results : dict

Mapping {var_name: result_dict}. Each result_dict must contain either:

A nested "summary" dict with keys "flag_0_percent", "flag_1_percent", "flag_2_percent" (V1 structure), or
Those same keys at the top level (V2 flat structure).
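
The two accepted layouts can be handled with a single lookup. The helper below is a minimal sketch of that idea; get_flag_percentages is a hypothetical name, and this is not necessarily how palmwtc resolves the structure internally.

```python
PERCENT_KEYS = ("flag_0_percent", "flag_1_percent", "flag_2_percent")


def get_flag_percentages(result: dict) -> dict:
    """Return the three flag percentages from a V1 (nested) or V2 (flat) result."""
    # V1 keeps the percentages under "summary"; V2 stores them at the top level.
    source = result.get("summary", result)
    return {key: source[key] for key in PERCENT_KEYS}


v1 = {"summary": {"flag_0_percent": 90.0, "flag_1_percent": 7.5, "flag_2_percent": 2.5}}
v2 = {"flag_0_percent": 90.0, "flag_1_percent": 7.5, "flag_2_percent": 2.5}

# Both layouts resolve to the same three values.
print(get_flag_percentages(v1) == get_flag_percentages(v2))
```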

figsize : tuple of (float, float), optional

Figure width and height in inches. Default: (18, height) where height scales with the number of variables (minimum 12 inches).

Returns#

matplotlib.figure.Figure: The 2x2 figure.

Notes#

All four panels share the same y-axis variable list. The x-axis is fixed to 0-105 % in all panels so the eye can compare across panels without re-scaling.

Examples#

>>> from palmwtc.viz.qc_plots import plot_qc_comparison
>>> fig = plot_qc_comparison(df, ["CO2_Avg", "H2O_Avg"], qc_results)  

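palmwtc.viz.qc_plots.plot_qc_summary_heatmap(...) → matplotlib.figure.Figure#

Heatmap of QC pass/suspect/fail percentages for all variables.

Parameters#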

qc_results : dict

Mapping {var_name: result_dict}. Each result_dict must contain either a nested "summary" sub-dict or the summary keys at the top level. Required keys inside the summary:

"flag_0_percent" (float): Percentage of timestamps with Flag 0 (Good).
"flag_1_percent" (float): Percentage of timestamps with Flag 1 (Suspect).
"flag_2_percent" (float): Percentage of timestamps with Flag 2 (Bad).

figsize : tuple of (float, float), optional

Figure width and height in inches.

Returns#

matplotlib.figure.Figure: The heatmap figure.

Notes#

The three percentage columns for a given variable should sum to approximately 100 %. Small deviations are possible due to rounding.
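
To see where such a deviation comes from, the sketch below derives the three percentages from flag counts. It assumes rounding to one decimal place; how palmwtc actually rounds is not stated here, and flag_percentages is a hypothetical helper.

```python
def flag_percentages(counts: dict, ndigits: int = 1) -> dict:
    """Convert {flag: count} into rounded {"flag_N_percent": value} entries."""
    total = sum(counts.values())
    return {
        f"flag_{flag}_percent": round(100 * n / total, ndigits)
        for flag, n in counts.items()
    }


# Three equal counts give three rounded thirds, which do not sum to exactly 100.
pct = flag_percentages({0: 1, 1: 1, 2: 1})
total_pct = sum(pct.values())
print(pct, total_pct)
```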

Examples#

>>> from palmwtc.viz.qc_plots import plot_qc_summary_heatmap
>>> fig = plot_qc_summary_heatmap(qc_results)  

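palmwtc.viz.qc_plots.filter_plot(...) → None#

Draw a two-chamber overlay for one variable onto an existing axis.

Parameters#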

ax : matplotlib.axes.Axes

Axes to draw on. The function modifies ax in place and returns None.

df_ : pd.DataFrame

DataFrame with a TIMESTAMP column and the sensor columns. Must contain:

TIMESTAMP (pd.Timestamp): Datetime column used for the x-axis.
{col_c1} (numeric, optional): Chamber 1 values.
{col_c2} (numeric, optional): Chamber 2 values.

col_c1 : str

Column name for Chamber 1 data in df_.

col_c2 : str

Column name for Chamber 2 data in df_.

var_key : str

Key used to look up the variable’s configuration inside var_config.

var_config : dict

Mapping {var_key: config_dict}. The config_dict may contain:

"hard" (tuple of (float, float)): (min, max) hard bounds. Points outside are removed when use_physical_limits=True. Drawn as red dashed lines.
"soft" (tuple of (float, float)): (min, max) soft bounds. Drawn as orange dotted lines.
"label" (str): Y-axis label.
"title" (str): Axis title.

use_physical_limits : bool, optional

If True, remove values outside hard bounds before plotting and draw threshold lines. Default: True.

ylim_padding_frac : float, optional

Fraction of data range added as padding above and below the y-axis limits. Default: 0.06.

Returns#

None: The axis is modified in place.

Notes#

If var_key is not found in var_config, the function sets a plain title and grid, then returns without plotting.

If neither chamber column contains any valid (non-NaN, in-bounds) data, the function likewise returns after setting the title.
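
The interaction between the hard bounds and ylim_padding_frac can be sketched as follows. This is an illustration only: padded_ylim is a hypothetical helper, and treating the bounds as inclusive is an assumption rather than documented behaviour.

```python
import math


def padded_ylim(values, hard=None, padding_frac=0.06):
    """Return (ymin, ymax) with fractional padding, or None if no valid data."""
    # Drop NaN values first.
    valid = [v for v in values if not math.isnan(v)]
    # Optionally drop values outside the hard bounds (inclusive here, by assumption).
    if hard is not None:
        lo, hi = hard
        valid = [v for v in valid if lo <= v <= hi]
    if not valid:
        return None  # mirrors the "no valid data" early return described above
    vmin, vmax = min(valid), max(valid)
    pad = (vmax - vmin) * padding_frac
    return vmin - pad, vmax + pad


# 9999.0 lies outside the hard bounds and NaN is skipped, so only 400..500 remain.
limits = padded_ylim([400.0, 500.0, float("nan"), 9999.0], hard=(0.0, 2000.0))
print(limits)
```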

Examples#

>>> import matplotlib.pyplot as plt
>>> from palmwtc.viz.qc_plots import filter_plot
>>> fig, ax = plt.subplots()
>>> filter_plot(ax, df, "C1_CO2", "C2_CO2", "CO2", var_config)  

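palmwtc.viz.qc_plots.plot_soil_var(...) → bool#

Draw multi-depth soil sensor profiles onto an existing axis.

Parameters#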

ax : matplotlib.axes.Axes

Axes to draw on. Modified in place; returns a status boolean.

var_key : str

Key used to look up the variable’s configuration in var_config.

title_prefix : str

String prepended to the axis title (e.g. the site name or measurement period).

plot_df : pd.DataFrame

Data to plot. Must contain:

TIMESTAMP (pd.Timestamp): Datetime column used for the x-axis.
{pattern}_{depth}_Avg_Soil (numeric): One column per depth that should be shown.

var_config : dict

Mapping {var_key: config_dict}. The config_dict may contain:

"pattern" (str): Column-name prefix (e.g. "SWC").
"hard" (tuple of (float, float)): Hard bounds; points outside are removed when use_physical_limits=True. Drawn as red dashed lines.
"soft" (tuple of (float, float)): Soft bounds drawn as orange dotted lines.
"label" (str): Y-axis label.

use_physical_limits : bool, optional

If True, filter values outside hard bounds and draw threshold lines. Default: True.

ylim_padding_frac : float, optional

Fraction of data range added as padding. Default: 0.06.

Returns#

bool: True if at least one depth was plotted; False if no data were available or var_key was not found in var_config.

Notes#

The legend uses up to 5 entries (one per depth) plus threshold lines, arranged in a single row with ncol=5 to keep the axis compact.
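
The {pattern}_{depth}_Avg_Soil naming convention from the Parameters section can be expanded into concrete column names as sketched below. The depth list comes from the module overview; writing the depth as a plain integer in the column name is an assumption, and soil_columns is a hypothetical helper.

```python
# Depths in cm, as listed in the module overview.
DEPTHS_CM = (15, 48, 80, 200, 350)


def soil_columns(pattern: str, depths=DEPTHS_CM) -> list:
    """Build the expected column name for each depth of one soil variable."""
    return [f"{pattern}_{depth}_Avg_Soil" for depth in depths]


columns = soil_columns("SWC")
print(columns)
```

Checking these names against plot_df.columns before calling plot_soil_var() is one way to anticipate a False return value.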

Examples#

>>> import matplotlib.pyplot as plt
>>> from palmwtc.viz.qc_plots import plot_soil_var
>>> fig, ax = plt.subplots()
>>> has_data = plot_soil_var(ax, "SWC", "Soil Water Content", df, var_config)  

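palmwtc.viz.qc_plots.plot_high_quality_timeseries(...) → None#

Line plot of Flag-0 (high-quality) data for one variable.

Parameters#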

df : pd.DataFrame

QC-flagged DataFrame indexed by datetime. Must contain:

{var_name} (numeric): Sensor values.
{qc_flag_col} (int in {0, 1, 2}, optional): QC flag column. Falls back to all data if absent.

var_name : str

Name of the sensor variable column to plot.

qc_flag_col : str, optional

Name of the QC flag column. Default: f"{var_name}_qc_flag".

title : str, optional

Axis title. Default: f"High Quality Time Series: {var_name}".

Returns#

None: The figure is displayed via matplotlib.pyplot.show().

Notes#

Flag value 0 = Good (the only data shown in this plot).

This function creates a new matplotlib.pyplot.figure() on each call; it does not return the figure object. Use plot_drift_and_hq_timeseries() if you need a figure you can embed in a larger layout.

Examples#

>>> from palmwtc.viz.qc_plots import plot_high_quality_timeseries
>>> plot_high_quality_timeseries(df, "CO2_Avg")  

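palmwtc.viz.qc_plots.plot_drift_and_hq_timeseries(...) → None#

Two-panel figure: drift score (top) and Flag-0 timeseries (bottom).

Parameters#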

df : pd.DataFrame

QC-flagged DataFrame indexed by datetime. Must contain:

{var_name} (numeric): Sensor values.
{qc_flag_col} (int in {0, 1, 2}, optional): QC flag column. If absent, all non-NaN data are used for the bottom panel.

var_name : str

Name of the sensor variable column.

drift_result : dict or pd.DataFrame or None

Drift detection output from palmwtc.qc.drift.detect_drift_windstats(). Accepted forms:

dict with a "scores" key whose value is a DataFrame, a Series with a .to_pd() method, or a plain DataFrame.
A pd.DataFrame used directly as the drift score table.

The expected column in the resolved DataFrame is f"{var_name}_drift_score". Pass None to skip the drift panel.

qc_flag_col : str, optional

QC flag column name. Default: f"{var_name}_qc_flag".

Returns#

None: The figure is displayed via matplotlib.pyplot.show().

Notes#

Flag value 0 = Good (the only data shown in the bottom panel).

Drift scores are produced by the windowed-statistics drift detector in palmwtc.qc.drift. A score near 0 indicates stable sensor baseline; large deviations suggest drift.

Examples#

>>> from palmwtc.viz.qc_plots import plot_drift_and_hq_timeseries
>>> plot_drift_and_hq_timeseries(df, "CO2_Avg", drift_res)  

palmwtc.viz.qc_plots.visualize_drift(...) → None#

Deprecated -- drift visualisation stub.

Parameters#

df (pd.DataFrame): Ignored.
drift_result (object): Ignored.
var_name (str): Ignored.

Returns#

None

Deprecated: Use plot_drift_and_hq_timeseries() with the windowed-statistics drift result from palmwtc.qc.drift instead.

palmwtc.viz.qc_plots.plot_baseline_drift(baseline_df: pandas.DataFrame, column: str, expected: float) → None#

Line plot of daily minimum and mean values to check sensor baseline drift.

Draws three elements on a single axis:

Daily minimum (blue solid line): f"{column}_daily_min" column.
Daily mean (green dashed line): f"{column}_daily_mean" column.
Expected baseline (red dotted horizontal line) at expected with a shaded tolerance band of +/-50 ppm around it.

Calls matplotlib.pyplot.show() before returning.

Parameters#

baseline_df : pd.DataFrame

DataFrame indexed by date. Must contain:

{column}_daily_min (float): Daily minimum values of the sensor.
{column}_daily_mean (float): Daily mean values of the sensor.

column : str

Sensor column base name (e.g. "CO2_Avg"). Used for axis labels and the title.

expected : float

Expected baseline concentration in ppm (e.g. 400 for ambient CO2). The tolerance band spans expected - 50 to expected + 50.

Returns#

None: The figure is displayed via matplotlib.pyplot.show().

Notes#

The +/-50 ppm tolerance band is hard-coded and is intended for CO2 sensors (LI-COR LI-850). For other variables the band width may not be meaningful.

If {column}_daily_min is not present in baseline_df, the plot is created but remains empty (no error is raised).
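
The plot only draws the tolerance band; it does not report which days fall outside it. A minimal sketch of that check, using the same expected +/- 50 ppm limits, is shown below. The helper outside_tolerance and the sample values are made up for illustration.

```python
def outside_tolerance(daily_min: dict, expected: float, tol: float = 50.0) -> dict:
    """Return the days whose daily minimum lies outside expected +/- tol."""
    lower, upper = expected - tol, expected + tol
    return {day: value for day, value in daily_min.items() if not lower <= value <= upper}


# Illustrative daily minima in ppm (not real measurements).
daily_min = {"2024-01-01": 405.0, "2024-01-02": 462.5, "2024-01-03": 349.0}
drifting = outside_tolerance(daily_min, expected=400.0)
print(drifting)
```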

Examples#

>>> from palmwtc.viz.qc_plots import plot_baseline_drift
>>> plot_baseline_drift(baseline_df, "CO2_Avg", expected=400.0)  

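palmwtc.viz.qc_plots.plot_breakpoints_analysis(...)#

Two-figure breakpoint analysis: annotated timeseries + breakpoint table.

Parameters#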

df : pd.DataFrame

QC-flagged DataFrame indexed by datetime. Must contain:

{var_name} (numeric): Sensor values.
{qc_flag_col} (int in {0, 1, 2}, optional): QC flag column. Only Flag-0 rows are plotted. If absent, all non-NaN values are used.

var_name : str

Name of the sensor variable column.

breakpoint_result : dict or None

Breakpoint detection output from palmwtc.qc.breakpoints.detect_breakpoints_ruptures(). Expected keys:

"breakpoints" (list of pd.Timestamp): Detected breakpoint timestamps.
"confidence_scores" (list of float): Confidence score (0-1) per breakpoint.
"segment_info" (list of dict): Each dict has keys "mean", "std", "start", "end".
"n_breakpoints" (int): Total breakpoint count (used when show_all_breakpoints is True).

Pass None to get (None, None) back.

qc_flag_col : str, optional

QC flag column name. Default: f"{var_name}_qc_flag".

figsize : tuple of (float, float), optional

Size of Figure 1 (the analysis plot). Default: (16, 10).

max_table_rows : int, optional

Maximum rows in the table figure. Default: 15.

min_confidence : float, optional

Minimum confidence score for breakpoints shown in Figure 1. Ignored when show_all_breakpoints=True. Default: 0.0 (show all).

show_all_breakpoints : bool, optional

If True, ignore min_confidence and plot every breakpoint in Figure 1. Default: False.
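
The interplay of min_confidence and show_all_breakpoints can be sketched as a simple filter. select_breakpoints is a hypothetical helper, and the inclusive comparison (score >= min_confidence) is an assumption; the function's exact comparison is not documented here.

```python
def select_breakpoints(breakpoints, scores, min_confidence=0.0, show_all=False):
    """Return the breakpoints that would be drawn in Figure 1."""
    if show_all:
        # show_all overrides the confidence threshold entirely.
        return list(breakpoints)
    return [bp for bp, score in zip(breakpoints, scores) if score >= min_confidence]


# Placeholder labels stand in for pd.Timestamp values.
bps = ["bp_a", "bp_b", "bp_c"]
scores = [0.85, 0.30, 0.55]

print(select_breakpoints(bps, scores, min_confidence=0.5))
print(select_breakpoints(bps, scores, min_confidence=0.5, show_all=True))
```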

Returns#

tuple of (matplotlib.figure.Figure, matplotlib.figure.Figure): (fig_analysis, fig_table) – the two figures.
tuple of (None, None): Returned when breakpoint_result is None or no data are available for the variable.

Notes#

Confidence score thresholds used in the table label:

>= 0.7 = “Robust”
>= 0.4 = “Moderate”
< 0.4 = “Weak”
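
These thresholds translate directly into a small classifier. The function name confidence_label is hypothetical; only the cut-offs and labels come from the list above.

```python
def confidence_label(score: float) -> str:
    """Map a breakpoint confidence score to its table label."""
    if score >= 0.7:
        return "Robust"
    if score >= 0.4:
        return "Moderate"
    return "Weak"


print([confidence_label(s) for s in (0.9, 0.7, 0.5, 0.39)])
```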

Breakpoint detection uses the ruptures library (L2 cost, PELT algorithm). See palmwtc.qc.breakpoints.detect_breakpoints_ruptures() for details.

Examples#

>>> from palmwtc.viz.qc_plots import plot_breakpoints_analysis
>>> fig1, fig2 = plot_breakpoints_analysis(df, "CO2_Avg", bp_result)  

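palmwtc.viz.qc_plots.visualize_missing_data(...) → matplotlib.figure.Figure | None#

Two-panel figure showing data availability and gap distribution.

Parameters#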

df : pd.DataFrame

DataFrame with a datetime index. Must contain:

{var_name} (numeric): Sensor values. NaN rows are treated as missing.

var_name : str

Name of the sensor variable column.

frequency_seconds : float, optional

Expected sampling interval in seconds (e.g. 4.0 for a 4 s LI-850 stream). If None, the function looks up config[key]["measurement_frequency"]; if still not found, defaults to 4.0 seconds with a warning printed to stdout.

config : dict, optional

Variable configuration dict (same structure as var_config used elsewhere). Searched for var_name by direct key match, "columns" list membership, or "pattern" prefix.
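
The lookup order described for frequency_seconds and config can be sketched as below. Both helpers are hypothetical, and the precedence between the "columns" and "pattern" checks when several entries could match is an assumption.

```python
def find_var_config(var_name: str, config: dict):
    """Find the config entry for var_name, or None if nothing matches."""
    # 1. Direct key match.
    if var_name in config:
        return config[var_name]
    for entry in config.values():
        # 2. Membership in the entry's "columns" list.
        if var_name in entry.get("columns", []):
            return entry
        # 3. Column name starts with the entry's "pattern" prefix.
        pattern = entry.get("pattern")
        if pattern and var_name.startswith(pattern):
            return entry
    return None


def resolve_frequency(var_name, config=None, frequency_seconds=None, default=4.0):
    """Resolve the sampling interval: explicit value, then config, then default."""
    if frequency_seconds is not None:
        return frequency_seconds
    entry = find_var_config(var_name, config or {})
    if entry and "measurement_frequency" in entry:
        return entry["measurement_frequency"]
    print(f"Warning: no measurement_frequency for {var_name}; using {default} s")
    return default


# Illustrative config; keys and values are made up for this sketch.
config = {
    "CO2": {"columns": ["CO2_Avg"], "measurement_frequency": 4.0},
    "soil_water": {"pattern": "SWC", "measurement_frequency": 900.0},
}
print(resolve_frequency("SWC_15_Avg_Soil", config))
```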

figsize : tuple of (float, float), optional

Figure width and height in inches.

Returns#

matplotlib.figure.Figure or None: The figure, or None if var_name is not in df or no valid data points exist.

Notes#

Gap detection threshold: a gap is counted when the time between two consecutive non-NaN records exceeds 2.5 * frequency_seconds. This allows for small timing jitter without false positives.

The missing-data percentage shown in the title is estimated as:

missing_pct = (expected_points - actual_points) / expected_points * 100

where expected_points = total_duration_sec / frequency_seconds.
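
The gap rule and the missing-data estimate above can be reproduced with the standard library alone. This is a sketch under stated assumptions: find_gaps and missing_percent are hypothetical helpers, the input is a sorted list of timestamps of non-NaN records, and total_duration_sec is taken as the span from the first to the last record.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, frequency_seconds=4.0, factor=2.5):
    """Return (gap_start, gap_seconds) pairs where spacing exceeds the threshold."""
    threshold = factor * frequency_seconds
    gaps = []
    for prev, curr in zip(timestamps, timestamps[1:]):
        delta = (curr - prev).total_seconds()
        if delta > threshold:
            gaps.append((prev, delta))
    return gaps


def missing_percent(timestamps, frequency_seconds=4.0):
    """Estimate the share of missing records from the expected record count."""
    total_duration_sec = (timestamps[-1] - timestamps[0]).total_seconds()
    expected_points = total_duration_sec / frequency_seconds
    return (expected_points - len(timestamps)) / expected_points * 100


start = datetime(2024, 1, 1)
# Ten records at 4 s spacing, a 60 s outage, then ten more records.
stamps = [start + timedelta(seconds=4 * i) for i in range(10)]
stamps += [stamps[-1] + timedelta(seconds=60 + 4 * i) for i in range(10)]

gaps = find_gaps(stamps)
print(len(gaps), gaps[0][1], round(missing_percent(stamps), 1))
```

With a 4 s stream the threshold is 10 s, so ordinary 4 s spacing with small jitter never registers as a gap, while the 60 s outage does.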

Examples#

>>> from palmwtc.viz.qc_plots import visualize_missing_data
>>> fig = visualize_missing_data(df, "CO2_Avg", frequency_seconds=4.0)  

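palmwtc.viz.qc_plots.visualize_breakpoints(...) → None#

Three-panel overview of breakpoints with kept/ignored distinction.

Parameters#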

df : pd.DataFrame

DataFrame indexed by datetime. Must contain:

{var_name} (numeric): Raw sensor values. Resampled to 1-hour mean for plotting.

var_name : str

Name of the sensor variable column.

bp_result : dict

Breakpoint detection output from palmwtc.qc.breakpoints.detect_breakpoints_ruptures(). Expected keys:

"n_breakpoints" (int): Total number of detected breakpoints.
"breakpoints" (list of pd.Timestamp): Breakpoint timestamps.
"segment_info" (list of dict): Each dict has keys "mean", "std", "start", "end".
"confidence_scores" (list of float): Confidence score per breakpoint.

filtered_bps : list of pd.Timestamp, optional

Subset of breakpoints to mark as “kept” (green solid lines). Breakpoints not in this list are drawn red dashed. If None, all breakpoints are drawn red dashed.

title_suffix : str, optional

Extra text appended to Panel 1’s title (e.g. a date range or site label).

Returns#

None: The figure is displayed via matplotlib.pyplot.show() and a summary table is printed to stdout.

Notes#

The data is resampled to 1-hour means before plotting so that very high-frequency streams (e.g. 4-second LI-850 data) render in a reasonable time without explicit downsampling.

The {var_name} (ppm) label is used literally on the y-axis of Panel 1, so it is most appropriate for CO2 and H2O concentration variables.

Examples#

>>> from palmwtc.viz.qc_plots import visualize_breakpoints
>>> visualize_breakpoints(df, "CO2_Avg", bp_result, filtered_bps=kept)  
