Xinter Documentation
Welcome to xinter, a comprehensive linting and data quality checking tool for xarray datasets.
Overview
xinter provides automated data quality checks for xarray datasets, helping you identify issues like missing values, outliers, incorrect units, and other data anomalies. It features an extensible architecture that allows you to easily add custom checkers for your specific data validation needs.
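This page does not document xinter's checker-registration API. The following is therefore only a standalone sketch of what a custom checker does: it takes a variable's values and returns a result carrying the `value`, `message`, and `success` fields shown in the Output Format section. The `LinterResult` dataclass, the `CHECKERS` registry, the `register` decorator, `negative_fraction`, and `run_checkers` are all hypothetical stand-ins, not xinter's API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Mapping

import numpy as np


# Hypothetical stand-in for xinter's result type; only the field names
# (value, message, success) come from this page.
@dataclass
class LinterResult:
    value: float
    message: str
    success: bool


# Hypothetical registry mapping checker names to checker functions.
CHECKERS: Dict[str, Callable[[Any], LinterResult]] = {}


def register(name: str) -> Callable:
    """Decorator that adds a checker function to the hypothetical registry."""
    def decorator(func: Callable[[Any], LinterResult]) -> Callable[[Any], LinterResult]:
        CHECKERS[name] = func
        return func
    return decorator


@register("negative_fraction")
def negative_fraction(values: Any) -> LinterResult:
    """Fail if more than 1% of the finite values are negative."""
    arr = np.asarray(values, dtype=float)
    finite = arr[np.isfinite(arr)]
    frac = float((finite < 0).mean()) if finite.size else 0.0
    return LinterResult(
        value=frac,
        message=f"{frac:.2%} negative values found.",
        success=frac <= 0.01,
    )


def run_checkers(variables: Mapping[str, Any]) -> Dict[str, Dict[str, LinterResult]]:
    """Apply every registered checker to every variable.

    ``variables`` can be any name -> array mapping, e.g. the
    ``data_vars`` of an xarray Dataset.
    """
    return {
        str(name): {check: func(arr) for check, func in CHECKERS.items()}
        for name, arr in variables.items()
    }
```

Because the sketch accepts any name-to-array mapping, you can call it as `run_checkers(ds.data_vars)` on an opened xarray Dataset or on a plain dictionary of arrays.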
Features
- 25+ Built-in Checkers: comprehensive checks for data quality
- Extensible Architecture: easily add custom checkers
- Rich CLI Output: formatted terminal output with tables
- Interactive Dashboard: web-based GUI for exploring results
- DataFrame Export: convert results to pandas DataFrames
- Coordinate Checking: check coordinate arrays in addition to data variables
- Group Support: handle datasets with hierarchical groups
Installation
Install xinter using pip:
Or install from source:
Quick Start
Command Line Interface
Lint a single file:
Lint multiple files:
Check coordinates in addition to data variables:
Specify a group within the dataset:
Python API
Use xinter programmatically in your Python code:
```python
from xinter.core import lint_dataset
from xinter.cli import reports_to_dataframe

# Lint a dataset
reports = lint_dataset("mydata.zarr", check_coords=True)

# Convert to DataFrame for analysis
df = reports_to_dataframe(reports)

# Filter for failed checks
failures = df[~df["success"]]
print(failures)

# Export to CSV
df.to_csv("lint_report.csv", index=False)
```
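For a quick overview across many variables, the long-format results can be pivoted into a grid with one row per variable and one column per checker. This is a minimal pandas sketch. It assumes the DataFrame returned by `reports_to_dataframe` has one row per variable and checker, with the `variable_name`, `checker`, and `success` columns used in the examples on this page. `success_grid` is a hypothetical helper, and the rows are made-up sample data.

```python
import pandas as pd


def success_grid(df: pd.DataFrame) -> pd.DataFrame:
    """Pivot long-format lint results into a variable x checker grid of booleans.

    Assumes one row per (variable_name, checker) pair; DataFrame.pivot raises
    a ValueError if a pair appears more than once.
    """
    return df.pivot(index="variable_name", columns="checker", values="success")


# Made-up rows standing in for reports_to_dataframe(reports)
df = pd.DataFrame(
    {
        "variable_name": ["temperature", "temperature", "pressure", "pressure"],
        "checker": ["nan_percent", "mean", "nan_percent", "mean"],
        "success": [True, True, False, True],
    }
)
print(success_grid(df))
```

If results from several files are concatenated and pairs repeat, `pivot_table` with an aggregation is the usual alternative to `pivot`.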
Basic Usage Example
Here's a complete example showing how to lint a dataset and analyze the results:
```python
import xarray as xr
from xinter.core import lint_dataset
from xinter.cli import reports_to_dataframe

# Load your dataset (optional: useful for inspecting it before linting)
ds = xr.open_dataset("mydata.nc")
print(ds)

# Run linting (lint_dataset takes the file path)
reports = lint_dataset("mydata.nc", check_coords=False)

# Convert to DataFrame
df = reports_to_dataframe(reports)

# Show statistics
print(f"Total checks: {len(df)}")
print(f"Passed: {df['success'].sum()}")
print(f"Failed: {(~df['success']).sum()}")

# View failed checks
print("\nFailed checks:")
print(df[~df["success"]][["variable_name", "checker", "message"]])
```
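The totals can also be broken down per checker. This helps you see whether failures are concentrated in one kind of check. The following minimal pandas sketch assumes the `checker` and `success` columns used above. `failures_by_checker` is a hypothetical helper, and the rows are made-up sample data.

```python
import pandas as pd


def failures_by_checker(df: pd.DataFrame) -> pd.DataFrame:
    """Count checks and failures per checker, most failures first."""
    summary = (
        df.assign(failed=~df["success"].astype(bool))
        .groupby("checker")
        .agg(total=("success", "count"), failed=("failed", "sum"))
    )
    summary["fail_rate"] = summary["failed"] / summary["total"]
    return summary.sort_values("failed", ascending=False)


# Made-up rows standing in for reports_to_dataframe(reports)
df = pd.DataFrame(
    {
        "variable_name": ["temperature", "pressure"] * 3,
        "checker": ["nan_percent", "nan_percent", "mean", "mean", "units", "units"],
        "success": [True, False, True, True, False, False],
    }
)
print(failures_by_checker(df))
```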
Output Format
The linting results are returned as a nested dictionary. Each key is a variable name, and each value is a dictionary that maps checker names to `LinterResult` objects. Use `reports_to_dataframe` to convert this into a pandas DataFrame for further analysis:
```python
# Example output structure
{
    'temperature': {
        'nan_percent': LinterResult(value=0.05, message="5.00% NaNs found.", success=True),
        'mean': LinterResult(value=273.15, message="Mean value: 273.15", success=True),
        # ... more checks
    },
    'pressure': {
        # ... checks for pressure variable
    }
}
```
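The actual implementation of `reports_to_dataframe` is not shown here. The following minimal sketch shows how a nested structure like the one above can be flattened into one row per variable and checker. `LinterResult` is redefined here as a hypothetical `NamedTuple` with the `value`, `message`, and `success` fields from the example. `flatten_reports` is also hypothetical. The column names match those used in the examples on this page.

```python
from typing import Dict, NamedTuple

import pandas as pd


class LinterResult(NamedTuple):
    """Hypothetical stand-in with the fields shown in the structure above."""
    value: object
    message: str
    success: bool


def flatten_reports(reports: Dict[str, Dict[str, LinterResult]]) -> pd.DataFrame:
    """Turn {variable: {checker: result}} into one row per (variable, checker)."""
    rows = [
        {
            "variable_name": var,
            "checker": checker,
            "value": result.value,
            "message": result.message,
            "success": result.success,
        }
        for var, checks in reports.items()
        for checker, result in checks.items()
    ]
    return pd.DataFrame(
        rows, columns=["variable_name", "checker", "value", "message", "success"]
    )
```

In this sketch, a variable with no checker results contributes no rows. The `value` column may hold mixed types, so pandas stores it with `object` dtype.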
CLI Options
The `xl` command supports the following options:
| Option | Description |
|---|---|
| `--coords` | Check coordinate arrays in addition to data variables |
| `--group <path>` | Specify a group within the dataset (e.g., `/equilibrium`) |
| `--output <file>` | Save results to a Parquet file for use with the GUI |
| `-h`, `--help` | Show the help message |
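When scripting batch runs, you can assemble the options in the table above from Python. This is a minimal sketch. It assumes that input files are passed to `xl` as positional arguments after the options, which this page does not confirm. `build_xl_command` and `run_xl` are hypothetical helpers, and the meaning of `xl`'s exit code is not documented here.

```python
import shutil
import subprocess
from typing import List, Optional, Sequence


def build_xl_command(
    paths: Sequence[str],
    coords: bool = False,
    group: Optional[str] = None,
    output: Optional[str] = None,
) -> List[str]:
    """Assemble an ``xl`` argument list from the documented options.

    Assumption: input files go last, as positional arguments.
    """
    cmd = ["xl"]
    if coords:
        cmd.append("--coords")
    if group is not None:
        cmd += ["--group", group]
    if output is not None:
        cmd += ["--output", output]
    cmd.extend(paths)
    return cmd


def run_xl(cmd: List[str]) -> int:
    """Run the command if ``xl`` is on PATH and return its exit code."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("xl executable not found on PATH")
    return subprocess.run(cmd, check=False).returncode
```

Passing a list (rather than a single shell string) to `subprocess.run` avoids quoting problems with paths that contain spaces.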
What Gets Checked?
By default, xinter runs all registered checkers on every data variable in your dataset:
- Statistical properties: mean, standard deviation, min, max, range
- Data quality: NaN values, infinite values, duplicates, constants
- Distribution metrics: skewness, kurtosis, entropy
- Outlier detection: IQR-based outlier proportion
- Type validation: data types, shape, size
- Metadata checks: units, dimension names
- Coordinate checks: uniformity, constant spacing (when `--coords` is used)
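xinter's own implementations are not shown on this page. The following NumPy sketch illustrates three of the listed checks: NaN fraction, IQR-based outlier proportion, and constant coordinate spacing. The 1.5 × IQR fence is the conventional Tukey rule and is an assumption here, as are the function names and tolerances.

```python
import numpy as np


def nan_fraction(values) -> float:
    """Fraction of entries that are NaN (0.0 for an empty array)."""
    arr = np.asarray(values, dtype=float)
    return float(np.isnan(arr).mean()) if arr.size else 0.0


def iqr_outlier_fraction(values, k: float = 1.5) -> float:
    """Fraction of finite values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    arr = np.asarray(values, dtype=float)
    arr = arr[np.isfinite(arr)]
    if arr.size == 0:
        return 0.0
    q1, q3 = np.percentile(arr, [25, 75])
    iqr = q3 - q1
    outside = (arr < q1 - k * iqr) | (arr > q3 + k * iqr)
    return float(outside.mean())


def has_constant_spacing(coord, rtol: float = 1e-9) -> bool:
    """True if all consecutive coordinate differences are (nearly) equal."""
    arr = np.asarray(coord, dtype=float)
    if arr.size < 3:
        return True  # zero or one step is trivially uniform
    steps = np.diff(arr)
    return bool(np.allclose(steps, steps[0], rtol=rtol, atol=0.0))
```

These functions accept anything `np.asarray` understands, including an xarray `DataArray` such as `ds["temperature"]` or a coordinate such as `ds["time"]` (converted to numbers first if it holds datetimes).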
See the Available Linters page for a complete list and detailed descriptions.
Next Steps
- Explore all Available Linters and what they check for
- Learn about the interactive GUI Dashboard for visualizing results
- Create custom checkers for your specific validation needs
Support
For issues, questions, or contributions, visit the GitHub repository.