trailpack.excel.reader
Excel reader module for loading and inspecting Excel files.
This module provides an ExcelReader class that:
- Loads only the structure (sheets and columns) into memory
- Provides access to sheet names
- Provides access to column names for mapping
Classes
ExcelReader | Excel file reader that loads only sheet structure (sheets and columns) into memory.
Module Contents
- class trailpack.excel.reader.ExcelReader(file_path: str | pathlib.Path, header_row: int = 1)
Excel file reader that loads only sheet structure (sheets and columns) into memory.
This class provides methods to inspect Excel file structure without loading all the data, making it memory-efficient for large files:
- List available sheets
- Get column names from specific sheets
Example
>>> reader = ExcelReader("data.xlsx")
>>> sheet_names = reader.sheets()
>>> columns = reader.columns("Sheet1")
Initialize ExcelReader and load sheet structure (sheets and columns) into memory.
Only the sheet names and column headers are loaded, not the actual data. This makes it memory-efficient for large Excel files.
- Parameters:
file_path – Path to the Excel file (.xlsx, .xlsm, .xltx, .xltm)
header_row – Row number containing column headers (1-indexed). Defaults to 1.
- Raises:
FileNotFoundError – If the file does not exist
ValueError – If the file is not a valid Excel file
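The two documented exceptions can also be anticipated before constructing a reader. The sketch below is a hypothetical helper, `check_excel_path`, which is not part of trailpack. It assumes that checking the file extension against the four listed suffixes is a reasonable stand-in for "valid Excel file"; the real class may inspect the file contents instead.

```python
from pathlib import Path

# Extensions listed in the file_path parameter description.
SUPPORTED_SUFFIXES = {".xlsx", ".xlsm", ".xltx", ".xltm"}


def check_excel_path(file_path):
    """Hypothetical helper (not part of trailpack): validate a path up front."""
    path = Path(file_path)
    if not path.exists():
        raise FileNotFoundError(f"No such file: {path}")
    # Assumption: an unsupported suffix is treated as "not a valid Excel file".
    if path.suffix.lower() not in SUPPORTED_SUFFIXES:
        raise ValueError(f"Unsupported extension {path.suffix!r}: {path}")
    return path
```

A caller could run `check_excel_path(p)` first and pass the returned path to `ExcelReader`, still wrapping the constructor in `try`/`except (FileNotFoundError, ValueError)` since the class performs its own checks.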
- _load_structure()
Load sheet structure (sheet names and column headers) from Excel file.
Opens the workbook in read-only mode, extracts structure, then closes it. Only loads metadata, not actual data, for memory efficiency.
- columns(sheet_name: str | None = None) -> List[str]
Get list of column names from a specific sheet.
- Parameters:
sheet_name – Name of the sheet to read columns from. If None, uses the first sheet.
- Returns:
List of column names as strings. Empty cells are returned as empty strings.
- Raises:
ValueError – If sheet_name doesn’t exist in the workbook
Example
>>> reader = ExcelReader("data.xlsx")
>>> columns = reader.columns("Sheet1")
>>> print(columns)
['ID', 'Name', 'Value', 'Date']
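Because empty header cells come back as empty strings, code that maps columns may want to skip them while keeping track of where each named column sits. A minimal sketch, using a hypothetical helper `named_columns` and a hand-written sample list in the shape `columns()` is documented to return:

```python
def named_columns(columns):
    """Hypothetical helper: (position, name) pairs for non-empty headers.

    Positions are 1-indexed, counting from the first column of the sheet.
    """
    return [(i, name) for i, name in enumerate(columns, start=1) if name.strip()]


# Sample value, not read from a real file: the second header cell is empty.
columns = ["ID", "", "Value", "Date"]
print(named_columns(columns))  # [(1, 'ID'), (3, 'Value'), (4, 'Date')]
```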
- get_structure() -> Dict[str, List[str]]
Get the complete sheet structure as a dictionary.
- Returns:
Dictionary mapping sheet names to their column lists
Example
>>> reader = ExcelReader("data.xlsx")
>>> structure = reader.get_structure()
>>> print(structure)
{'Sheet1': ['ID', 'Name', 'Value'], 'Sheet2': ['Date', 'Amount']}
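Since the result is a plain dictionary, it can be queried with ordinary Python. The sketch below uses a hypothetical helper, `sheets_with_column`, and a literal dictionary copied from the example output rather than a real file:

```python
from typing import Dict, List


def sheets_with_column(structure: Dict[str, List[str]], column: str) -> List[str]:
    """Hypothetical helper: names of sheets whose headers include `column`."""
    return [sheet for sheet, cols in structure.items() if column in cols]


# Literal stand-in for the value returned by get_structure().
structure = {"Sheet1": ["ID", "Name", "Value"], "Sheet2": ["Date", "Amount"]}
print(sheets_with_column(structure, "Date"))  # ['Sheet2']
```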
- reload()
Reload the sheet structure from the Excel file.
Useful if the file has been modified and you want to refresh the structure.
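One way to decide when to call `reload()` is to compare the file's modification time against the last one seen. The wrapper below is a hypothetical sketch, not part of trailpack; it assumes only that the wrapped object exposes a `reload()` method and that a changed mtime is a good enough signal that the file was modified.

```python
import os


class ReloadOnChange:
    """Hypothetical wrapper (not part of trailpack) around a reader object."""

    def __init__(self, reader, file_path):
        self.reader = reader  # any object exposing reload()
        self.file_path = file_path
        self._mtime = os.path.getmtime(file_path)

    def refresh(self):
        """Call reader.reload() if the file's mtime changed; return True if it did."""
        mtime = os.path.getmtime(self.file_path)
        if mtime == self._mtime:
            return False
        self.reader.reload()
        self._mtime = mtime
        return True
```

With a real reader this might be used as `watcher = ReloadOnChange(ExcelReader("data.xlsx"), "data.xlsx")`, calling `watcher.refresh()` before each `get_structure()`.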