smart_reader
Classes
SmartDataReader – Adaptive data reader that chooses the optimal reading engine based on file size.
Module Contents
- class smart_reader.SmartDataReader(file_path: str | pathlib.Path)
Adaptive data reader that chooses the optimal reading engine based on file size.
Strategy:
  - <10MB: pandas (simplicity, compatibility)
  - 10-500MB: polars (speed, memory efficiency)
  - >500MB: polars lazy (streaming, minimal memory)
  - CSV files of any size: pyarrow or polars (typically much faster than pandas' default CSV parser)
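The size strategy above can be sketched as a pure dispatch function. This is a minimal sketch, not the class's actual code: the name `choose_engine`, the engine labels, and the reading of "MB" as binary megabytes (1 MB = 1024 × 1024 bytes) are all assumptions. CSV files are routed to polars here, although the docstring also allows pyarrow.

```python
# Assumption: "MB" in the docstring means binary megabytes (MiB).
MB = 1024 * 1024
SMALL_LIMIT = 10 * MB    # below this: pandas
MEDIUM_LIMIT = 500 * MB  # up to this: eager polars; above it: lazy polars


def choose_engine(size_bytes: int, suffix: str) -> str:
    """Map a file's size and extension to a hypothetical engine label."""
    if suffix.lower() == ".csv":
        # CSV never goes through pandas' parser; pick eager or lazy polars by size.
        return "polars" if size_bytes <= MEDIUM_LIMIT else "polars_lazy"
    if size_bytes < SMALL_LIMIT:
        return "pandas"
    if size_bytes <= MEDIUM_LIMIT:
        return "polars"
    return "polars_lazy"
```

A `read()` implementation along these lines could call `choose_engine(path.stat().st_size, path.suffix)` for a `pathlib.Path` and branch to the matching private reader.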
- _read_pandas_chunked(sheet_name: str | None = None, chunk_size: int = 10000) → pandas.DataFrame
Read a large Excel file in chunks and return the first chunk as a preview.
- _read_polars(sheet_name: str | None = None) → pandas.DataFrame
Medium files: read with polars, then convert the result to pandas.
- _read_polars_lazy(sheet_name: str | None = None) → pandas.DataFrame
Large files: use polars lazy evaluation and process the data in chunks.
- estimate_memory() → str
Estimate memory usage.
- Returns:
Human-readable memory estimate string
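The estimation formula is not documented here. As a hypothetical sketch only, one simple heuristic multiplies the on-disk size by an expansion factor and formats the result with binary units; the `expansion_factor` default of 3.0 is an arbitrary placeholder, not a measured ratio.

```python
def format_bytes(num_bytes: float) -> str:
    """Format a byte count with binary units (1 KB = 1024 B)."""
    for unit in ("B", "KB", "MB", "GB"):
        if num_bytes < 1024:
            return f"{num_bytes:.1f} {unit}"
        num_bytes /= 1024
    return f"{num_bytes:.1f} TB"


def estimate_memory(file_size_bytes: int, expansion_factor: float = 3.0) -> str:
    """Hypothetical estimate: on-disk size times an assumed expansion factor."""
    # expansion_factor is a placeholder assumption, not derived from real data.
    return f"~{format_bytes(file_size_bytes * expansion_factor)}"
```

Once a file has actually been loaded, `df.memory_usage(deep=True).sum()` gives the real in-memory size of a pandas DataFrame and can be used to calibrate such a factor.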
- read(sheet_name: str | None = None) → pandas.DataFrame
Read the file with the engine best suited to its size; always return a pandas DataFrame.
- Parameters:
sheet_name – Sheet name for Excel files (optional)
- Returns:
pandas DataFrame with file contents
Why return pandas?
  - The rest of the codebase expects pandas DataFrames.
  - Polars results can be converted to pandas at the end.
  - Only the final result is held in memory.