trailpack.excel.reader
======================

.. py:module:: trailpack.excel.reader

.. autoapi-nested-parse::

   Excel reader module for loading and inspecting Excel files.

   This module provides an ExcelReader class that:

   - Loads only the workbook structure (sheet names and column headers) into memory
   - Provides access to sheet names
   - Provides access to column names for mapping


Classes
-------

.. autoapisummary::

   trailpack.excel.reader.ExcelReader


Module Contents
---------------

.. py:class:: ExcelReader(file_path: Union[str, pathlib.Path], header_row: int = 1)

   Excel file reader that loads only the sheet structure (sheet names and column headers) into memory.

   This class provides methods to inspect the structure of an Excel file without
   loading its data rows, which keeps memory use low for large files:

   - List available sheets
   - Get column names from specific sheets

   .. rubric:: Example

   >>> reader = ExcelReader("data.xlsx")
   >>> sheet_names = reader.sheets()
   >>> columns = reader.columns("Sheet1")

   Initialize the reader and load the sheet structure (sheet names and column headers) into memory.

   Only the sheet names and the header row of each sheet are read. Data rows are
   not loaded, which keeps memory use low for large Excel files.

   :param file_path: Path to the Excel file (.xlsx, .xlsm, .xltx, .xltm)
   :param header_row: Row number containing column headers (1-indexed). Defaults to 1.

   :raises FileNotFoundError: If the file does not exist
   :raises ValueError: If the file is not a valid Excel file
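
   The two documented errors can be approximated by a cheap pre-check before the
   workbook is opened. This is a sketch under stated assumptions, not trailpack's
   implementation: ``validate_excel_path`` is a hypothetical helper, it only checks
   the file extension (a real "valid Excel file" check also needs the file to parse),
   and rejecting ``header_row < 1`` is an added guard the documentation does not state.

   .. code-block:: python

      from pathlib import Path
      from typing import Union

      # Extensions listed in the ``file_path`` parameter description.
      EXCEL_SUFFIXES = {".xlsx", ".xlsm", ".xltx", ".xltm"}


      def validate_excel_path(file_path: Union[str, Path], header_row: int = 1) -> Path:
          """Hypothetical pre-check mirroring the documented constructor errors."""
          path = Path(file_path)
          if not path.exists():
              raise FileNotFoundError(f"File not found: {path}")
          # Assumption: a suffix check stands in for full Excel validation.
          if path.suffix.lower() not in EXCEL_SUFFIXES:
              raise ValueError(f"Not a supported Excel file: {path.name}")
          # header_row is 1-indexed, matching Excel's own row numbering.
          if header_row < 1:
              raise ValueError("header_row must be >= 1")
          return path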


   .. py:method:: _load_structure()

      Load the sheet structure (sheet names and column headers) from the Excel file.

      Opens the workbook in read-only mode, extracts the structure, then closes it.
      Only the header row of each sheet is read, not the data rows, to save memory.
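
      The loading step can be sketched with openpyxl's read-only mode. This is an
      assumed approach, not trailpack's actual code: ``load_structure`` and
      ``header_to_columns`` are hypothetical names, ``data_only=True`` (use cached
      formula results instead of formula strings) is an assumption, and chartsheets
      are skipped by iterating ``wb.worksheets`` rather than ``wb.sheetnames``.

      .. code-block:: python

         from typing import Dict, Iterable, List, Optional


         def header_to_columns(values: Iterable[Optional[object]]) -> List[str]:
             """Convert one header row of cell values to column-name strings."""
             # Empty cells (None) become "", as described for ``columns()``.
             return ["" if v is None else str(v) for v in values]


         def load_structure(file_path: str, header_row: int = 1) -> Dict[str, List[str]]:
             """Read only sheet names and the header row of each worksheet."""
             # Imported lazily so header_to_columns is usable without openpyxl.
             from openpyxl import load_workbook

             wb = load_workbook(file_path, read_only=True, data_only=True)
             try:
                 structure: Dict[str, List[str]] = {}
                 # wb.worksheets lists worksheets only (no chartsheets), in workbook order.
                 for ws in wb.worksheets:
                     rows = ws.iter_rows(
                         min_row=header_row, max_row=header_row, values_only=True
                     )
                     # A sheet with no rows yields nothing; treat that as "no columns".
                     first = next(iter(rows), ())
                     structure[ws.title] = header_to_columns(first)
                 return structure
             finally:
                 # Read-only workbooks keep the file handle open until closed.
                 wb.close()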


   .. py:method:: columns(sheet_name: Optional[str] = None) -> List[str]

      Get the list of column names from a specific sheet.

      :param sheet_name: Name of the sheet to read columns from.
                         If None, uses the first sheet.

      :returns: List of column names as strings. Empty cells are returned as empty strings.

      :raises ValueError: If sheet_name doesn't exist in the workbook

      .. rubric:: Example

      >>> reader = ExcelReader("data.xlsx")
      >>> columns = reader.columns("Sheet1")
      >>> print(columns)
      ['ID', 'Name', 'Value', 'Date']
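
      The lookup that ``columns()`` performs can be sketched against a cached
      ``{sheet_name: columns}`` mapping like the one ``get_structure()`` returns.
      ``lookup_columns`` is a hypothetical helper; the first-sheet fallback and the
      ``ValueError`` follow the parameter and exception descriptions above, while the
      error message wording and returning a copy are assumptions, not documented behavior.

      .. code-block:: python

         from typing import Dict, List, Optional


         def lookup_columns(
             structure: Dict[str, List[str]], sheet_name: Optional[str] = None
         ) -> List[str]:
             """Resolve ``sheet_name`` against a cached {sheet: columns} mapping."""
             if not structure:
                 raise ValueError("Workbook contains no sheets")
             if sheet_name is None:
                 # dicts keep insertion order, so the first key is the first sheet loaded.
                 sheet_name = next(iter(structure))
             if sheet_name not in structure:
                 raise ValueError(
                     f"Sheet {sheet_name!r} not found. Available sheets: {list(structure)}"
                 )
             # Return a copy so callers cannot mutate the cached header list.
             return list(structure[sheet_name])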


   .. py:method:: get_structure() -> Dict[str, List[str]]

      Get the complete sheet structure as a dictionary.

      :returns: Dictionary mapping sheet names to their column lists

      .. rubric:: Example

      >>> reader = ExcelReader("data.xlsx")
      >>> structure = reader.get_structure()
      >>> print(structure)
      {'Sheet1': ['ID', 'Name', 'Value'], 'Sheet2': ['Date', 'Amount']}
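
      Because ``get_structure()`` returns a plain dictionary, it can be scanned before
      any data is read, for example to find the sheet that holds the columns a mapping
      needs. ``find_sheets_with_columns`` is a hypothetical helper, not part of trailpack:

      .. code-block:: python

         from typing import Dict, Iterable, List


         def find_sheets_with_columns(
             structure: Dict[str, List[str]], required: Iterable[str]
         ) -> List[str]:
             """Return the sheets whose header row contains every required column."""
             needed = set(required)
             # Preserves the dict's sheet order in the result.
             return [sheet for sheet, cols in structure.items() if needed.issubset(cols)]

      With the structure from the example above,
      ``find_sheets_with_columns(structure, ["ID", "Value"])`` returns ``['Sheet1']``.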


   .. py:method:: reload()

      Reload the sheet structure from the Excel file.

      Call this after the file has been modified on disk to refresh the cached
      sheet names and column headers.
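
      If a caller wants to refresh only when the file has actually changed, one option
      is to compare modification times before reloading. ``MtimeCache`` is a
      hypothetical wrapper around any loader callable (for example one that builds an
      ``ExcelReader`` and calls ``get_structure()``); it is not part of trailpack.
      Coarse filesystem timestamps can miss edits made within the same tick.

      .. code-block:: python

         import os
         from typing import Callable, Dict, List, Optional

         Structure = Dict[str, List[str]]


         class MtimeCache:
             """Hypothetical wrapper: re-run ``loader`` only when the file's mtime changes."""

             def __init__(self, path: str, loader: Callable[[str], Structure]) -> None:
                 self.path = path
                 self.loader = loader
                 self._mtime: Optional[float] = None
                 self._structure: Structure = {}

             def get(self) -> Structure:
                 mtime = os.path.getmtime(self.path)
                 if mtime != self._mtime:
                     # First call, or the file was modified since the last load: refresh.
                     self._structure = self.loader(self.path)
                     self._mtime = mtime
                 return self._structure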


   .. py:method:: sheets() -> List[str]

      Get the list of all sheet names in the workbook.

      :returns: List of sheet names as strings

      .. rubric:: Example

      >>> reader = ExcelReader("data.xlsx")
      >>> sheet_names = reader.sheets()
      >>> print(sheet_names)
      ['Sheet1', 'Sheet2', 'Data']


   .. py:attribute:: _sheet_columns
      :type:  Dict[str, List[str]]


   .. py:attribute:: file_path


   .. py:attribute:: header_row
      :value: 1