trailpack.validation
====================

.. py:module:: trailpack.validation

.. autoapi-nested-parse::

   Validation module for Trailpack.

   Provides standards-based validation for data packages to ensure quality
   and compliance before submission to repositories.

   The validation system checks:

   - Metadata completeness: All required fields present (name, title, resources, etc.)
   - Data quality metrics: Missing values and duplicates (logged as info)
   - Type consistency: Mixed types and schema matching (raises errors)
   - Field definitions: Proper types, units for numeric fields

   Key Components:
       - StandardValidator: Main validation class for all checks
       - ValidationResult: Result object with errors, warnings, info, and compliance level
       - Standards YAML: Versioned validation rules in standards/v*.yaml

   Data Quality vs Type Consistency:
       - Data quality metrics (nulls, duplicates) are logged as INFO messages
       - Type consistency issues (mixed types, schema mismatches) raise ERRORS
       - Only errors cause validation to fail

   Unit Requirements:
       All numeric fields (type "number" or "integer") must have units specified,
       even for dimensionless quantities like IDs and counts. Use the QUDT
       vocabulary for unit definitions (http://qudt.org/vocab/unit/).

   Inconsistency Tracking and Export:
       When type inconsistencies are detected (e.g., mixed types in a column),
       each inconsistent value is automatically tracked and exported to
       'data_inconsistencies.csv' when the ValidationResult is printed.
       This CSV file contains:

       - row: Row index of the inconsistent value
       - column: Column name
       - value: The actual value
       - actual_type: Python type of the value
       - expected_type: Most common type in the column

       This export happens automatically for easy data cleaning workflows.

   .. rubric:: Example

   >>> from trailpack.validation import StandardValidator
   >>> validator = StandardValidator("1.0.0")
   >>> result = validator.validate_data_quality(df, schema)
   >>> print(result)  # Automatically exports data_inconsistencies.csv if issues found
   >>> if result.is_valid:
   ...     print(f"{result.level}")
   ... else:
   ...     for error in result.errors:
   ...         print(f"Error: {error}")


Submodules
----------

.. toctree::
   :maxdepth: 1

   /content/api/trailpack/validation/standard_validator/index


Attributes
----------

.. autoapisummary::

   trailpack.validation.STANDARDS_DIR


Functions
---------

.. autoapisummary::

   trailpack.validation.get_standard_path
   trailpack.validation.list_available_standards


Package Contents
----------------

.. py:function:: get_standard_path(version: str = '1.0.0') -> pathlib.Path

   Get the path to a specific standard version.

   :param version: Standard version (default: "1.0.0")

   :returns: Path to the standard YAML file

   :raises FileNotFoundError: If the standard version doesn't exist


.. py:function:: list_available_standards() -> list[str]

   List all available standard versions.

   :returns: List of version strings (e.g., ["1.0.0"])


.. py:data:: STANDARDS_DIR
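.. rubric:: Sketch: inconsistency records

The inconsistency CSV layout described in the module docstring (row, column, value, actual_type, expected_type) can be illustrated with a standard-library sketch. This is not Trailpack's implementation: ``find_type_inconsistencies`` and ``records_to_csv`` are hypothetical helpers, rows are plain dicts rather than a DataFrame, and ``None`` is assumed to count as a missing value rather than a type mismatch.

```python
import csv
import io
from collections import Counter

# Column order taken from the CSV description in the module docstring.
FIELDNAMES = ["row", "column", "value", "actual_type", "expected_type"]


def find_type_inconsistencies(rows, column):
    """Return one record per value whose type differs from the column's most common type.

    Hypothetical helper. `rows` is a list of dicts, one per data row.
    Assumption: None marks a missing value and is skipped here.
    """
    values = [
        (index, row[column])
        for index, row in enumerate(rows)
        if row.get(column) is not None
    ]
    if not values:
        return []

    # The expected type is the most frequent Python type name in the column.
    type_counts = Counter(type(value).__name__ for _, value in values)
    expected_type = type_counts.most_common(1)[0][0]

    return [
        {
            "row": index,
            "column": column,
            "value": value,
            "actual_type": type(value).__name__,
            "expected_type": expected_type,
        }
        for index, value in values
        if type(value).__name__ != expected_type
    ]


def records_to_csv(records):
    """Serialize inconsistency records to CSV text (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Made-up sample data: one string in an otherwise float column.
rows = [
    {"id": 1, "mass": 2.5},
    {"id": 2, "mass": "n/a"},
    {"id": 3, "mass": 4.0},
    {"id": 4, "mass": None},
]
records = find_type_inconsistencies(rows, "mass")
csv_text = records_to_csv(records)
```

Here only the string ``"n/a"`` in row 1 is reported. The ``None`` in row 3 is left out, matching the distinction between data quality metrics and type consistency.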