trailpack.excel.reader

Excel reader module for loading and inspecting Excel files.

This module provides an ExcelReader class that:
  • Loads only the structure (sheets and columns) into memory

  • Provides access to sheet names

  • Provides access to column names for mapping

Classes

ExcelReader

Excel file reader that loads only sheet structure (sheets and columns) into memory.

Module Contents

class trailpack.excel.reader.ExcelReader(file_path: str | pathlib.Path, header_row: int = 1)

Excel file reader that loads only sheet structure (sheets and columns) into memory.

This class provides methods to inspect Excel file structure without loading all the data, making it memory-efficient for large files:
  • List available sheets

  • Get column names from specific sheets

Example

>>> reader = ExcelReader("data.xlsx")
>>> sheet_names = reader.sheets()
>>> columns = reader.columns("Sheet1")

Initialize ExcelReader and load sheet structure (sheets and columns) into memory.

Only the sheet names and column headers are loaded, not the actual data. This makes it memory-efficient for large Excel files.

Parameters:
  • file_path – Path to the Excel file (.xlsx, .xlsm, .xltx, .xltm)

  • header_row – Row number containing column headers (1-indexed). Defaults to 1.

Raises:
  • FileNotFoundError – If the file does not exist

  • ValueError – If the file is not a valid Excel file

_load_structure()

Load sheet structure (sheet names and column headers) from Excel file.

Opens the workbook in read-only mode, extracts structure, then closes it. Only loads metadata, not actual data, for memory efficiency.
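The internals of this method are not shown on this page, so the following is only a rough standard-library sketch of the general idea of reading structure without cell data. It assumes the file is a regular .xlsx package, that is, a ZIP archive whose `xl/workbook.xml` part lists the sheets. `read_sheet_names` is a hypothetical helper and not part of `trailpack`; the module itself may well rely on a spreadsheet library instead.

```python
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used by xl/workbook.xml in .xlsx packages.
_MAIN_NS = {"main": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def read_sheet_names(source):
    """Return sheet names from an .xlsx package without reading cell data.

    Hypothetical helper for illustration only; `source` is a path or a
    binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        # Only the small workbook manifest is read; worksheet parts are
        # never decompressed.
        with archive.open("xl/workbook.xml") as manifest:
            root = ET.parse(manifest).getroot()
    sheets = root.findall("main:sheets/main:sheet", _MAIN_NS)
    return [sheet.get("name") for sheet in sheets]
```

Reading column headers the same way would also require parsing the worksheet and shared-strings parts, which is left out here to keep the sketch short.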

columns(sheet_name: str | None = None) → List[str]

Get list of column names from a specific sheet.

Parameters:

sheet_name – Name of the sheet to read columns from. If None, uses the first sheet.

Returns:

List of column names as strings. Empty cells are returned as empty strings.

Raises:

ValueError – If sheet_name doesn’t exist in the workbook

Example

>>> reader = ExcelReader("data.xlsx")
>>> columns = reader.columns("Sheet1")
>>> print(columns)
['ID', 'Name', 'Value', 'Date']
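
To make the 1-indexed `header_row` and the empty-cell rule concrete, here is a small sketch that works on plain Python rows rather than a real workbook. `header_from_rows` is a hypothetical helper, and it assumes empty cells arrive as `None`; it is not the library's implementation.

```python
from typing import Any, List, Sequence


def header_from_rows(rows: Sequence[Sequence[Any]], header_row: int = 1) -> List[str]:
    """Pick the 1-indexed header row and return its cells as strings.

    Hypothetical helper: empty cells (assumed to be None) become "".
    """
    if header_row < 1 or header_row > len(rows):
        raise ValueError(f"header_row {header_row} is outside 1..{len(rows)}")
    # Convert the 1-indexed row number to a 0-indexed list position.
    cells = rows[header_row - 1]
    return ["" if cell is None else str(cell) for cell in cells]


rows = [
    ["Report title", None, None],  # a title line above the real header
    ["ID", None, "Value"],         # header row with one empty cell
    [1, "a", 3.5],
]
print(header_from_rows(rows, header_row=2))  # ['ID', '', 'Value']
```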

get_structure() → Dict[str, List[str]]

Get the complete sheet structure as a dictionary.

Returns:

Dictionary mapping sheet names to their column lists

Example

>>> reader = ExcelReader("data.xlsx")
>>> structure = reader.get_structure()
>>> print(structure)
{'Sheet1': ['ID', 'Name', 'Value'], 'Sheet2': ['Date', 'Amount']}
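
Because the result is a plain dictionary, it lends itself to simple lookups when preparing a column mapping. The sketch below writes the dictionary from the example out as literal data; `sheets_containing` and `missing_columns` are hypothetical helpers and not part of the module.

```python
from typing import Dict, List

# Same shape as the get_structure() example output, written as a literal.
structure: Dict[str, List[str]] = {
    "Sheet1": ["ID", "Name", "Value"],
    "Sheet2": ["Date", "Amount"],
}


def sheets_containing(structure: Dict[str, List[str]], column: str) -> List[str]:
    """Return the names of sheets whose header includes `column`."""
    return [sheet for sheet, columns in structure.items() if column in columns]


def missing_columns(
    structure: Dict[str, List[str]], sheet: str, required: List[str]
) -> List[str]:
    """Return the required columns that `sheet` lacks, in the order given."""
    present = set(structure[sheet])
    return [name for name in required if name not in present]


print(sheets_containing(structure, "Amount"))                # ['Sheet2']
print(missing_columns(structure, "Sheet1", ["ID", "Date"]))  # ['Date']
```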

reload()

Reload the sheet structure from the Excel file.

Useful if the file has been modified and you want to refresh the structure.
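One way a caller might decide when to call `reload()` is to compare the file's modification time between checks. The sketch below is an assumption about usage, not something the module provides: `reload_if_modified` is a hypothetical helper that only needs an object with a `file_path` attribute and a `reload()` method.

```python
import os
from typing import Any, Optional


def reload_if_modified(reader: Any, last_mtime: Optional[float]) -> float:
    """Call reader.reload() when the file's mtime differs from last_mtime.

    Hypothetical helper; returns the current mtime so the caller can keep
    it for the next check. Passing None records the mtime without reloading.
    """
    current = os.stat(reader.file_path).st_mtime
    if last_mtime is not None and current != last_mtime:
        reader.reload()
    return current
```

A caller would store the returned value and pass it back in on the next check, so the structure is refreshed only when the file on disk has actually changed.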

sheets() → List[str]

Get list of all sheet names in the workbook.

Returns:

List of sheet names as strings

Example

>>> reader = ExcelReader("data.xlsx")
>>> sheet_names = reader.sheets()
>>> print(sheet_names)
['Sheet1', 'Sheet2', 'Data']

_sheet_columns: Dict[str, List[str]]

file_path

header_row = 1