Config models

class schemashift.FormatConfig(**data)[source]

Bases: BaseModel

Top-level configuration for a single source format.

model_dump(**kwargs)[source]

Override to serialise ColumnMapping fields respecting the _UNSET sentinel.

Return type:

dict[str, Any]

model_dump_json(**kwargs)[source]

Override to serialise ColumnMapping fields respecting the _UNSET sentinel.

Return type:

str

source_columns()[source]

Return all source column names referenced in this config.

Includes direct 'source' fields and all col("…") references inside 'expr' fields, extracted by walking the DSL AST rather than regex-matching strings.

Return type:

set[str]
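The difference between walking an AST and regex-matching can be sketched as follows. This is not schemashift's implementation: the DSL parser is not shown in this section, so the sketch uses Python's standard ast module as a stand-in and assumes expressions parse as Python syntax. The helper names referenced_columns and collect_source_columns are hypothetical.

```python
import ast


def referenced_columns(expr: str) -> set[str]:
    """Collect the string argument of every col("...") call in expr."""
    tree = ast.parse(expr, mode="eval")
    names: set[str] = set()
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "col"
            and len(node.args) == 1
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, str)
        ):
            names.add(node.args[0].value)
    return names


def collect_source_columns(mappings: list[dict]) -> set[str]:
    """Union of direct 'source' names and col() references in 'expr'."""
    found: set[str] = set()
    for mapping in mappings:
        if mapping.get("source") is not None:
            found.add(mapping["source"])
        if mapping.get("expr") is not None:
            found |= referenced_columns(mapping["expr"])
    return found


# Plain dicts stand in for ColumnMapping objects (an assumption for the sketch).
mappings = [
    {"target": "id", "source": "ID"},
    {"target": "total", "expr": 'col("net") + col("tax")'},
    # A string literal that merely contains the text col("fake").
    {"target": "note", "expr": '"see col(\\"fake\\")"'},
]
print(sorted(collect_source_columns(mappings)))  # ['ID', 'net', 'tax']
```

The third mapping is the case a regex would get wrong: the text col("fake") sits inside a string literal, so the tree contains no call node for it and it is not reported.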

columns: list[ColumnMapping]
description: str
drop_unmapped: bool
model_config: ClassVar[ConfigDict] = {'from_attributes': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

name: str
reader: ReaderConfig
target_schema: str | None
version: int
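As a rough illustration of how these fields fit together, the following plain dictionary mirrors the attribute listings for FormatConfig, ReaderConfig, and ColumnMapping in this section. Every value is a hypothetical placeholder, and the sketch assumes nothing about which fields are required, what their defaults are, or which field combinations the models accept.

```python
# Field names follow the attribute listings in this section; every value is a
# hypothetical placeholder chosen for illustration.
config = {
    "name": "vendor_a_csv",
    "description": "Monthly export from vendor A",
    "version": 1,
    "target_schema": None,
    "drop_unmapped": True,
    "reader": {
        "encoding": "utf-8",
        "separator": ";",
        "sheet_name": None,
        "skip_rows": 0,
    },
    "columns": [
        # Direct rename of a source column.
        {"target": "customer_id", "source": "Cust ID", "dtype": "int64"},
        # Column produced from a DSL expression.
        {"target": "amount", "expr": 'col("Amount")', "dtype": "float64"},
        # Column filled with a fixed value.
        {"target": "currency", "constant": "EUR"},
    ],
}

print(sorted(config))
```

Given the (**data) constructor signature, a dictionary of this shape could presumably be passed as FormatConfig(**config), although the validation rules are not documented here.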

class schemashift.ColumnMapping(**data)[source]

Bases: BaseModel

Describes how to produce one output column.

has_constant()[source]

Return type:

bool

has_fillna()[source]

Return type:

bool

model_dump(**kwargs)[source]

Override to omit sentinel-valued fields from the output dict.

Return type:

dict[str, Any]

model_dump_json(**kwargs)[source]

Override to omit sentinel-valued fields from the JSON output.

Return type:

str
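The sentinel pattern behind these overrides can be sketched with the standard library alone. This is an assumption-laden illustration, not the library's code: it uses a dataclass instead of a pydantic model, and the names _Unset, Mapping, and dump are hypothetical. The point is that None can be a legitimate constant or fill value, so a separate marker is needed to mean "not provided".

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


class _Unset:
    """Marker type meaning 'this field was never provided'."""

    def __repr__(self) -> str:
        return "_UNSET"


_UNSET = _Unset()


@dataclass
class Mapping:
    """Hypothetical stand-in for ColumnMapping, reduced to four fields."""

    target: str
    source: Optional[str] = None
    constant: Any = _UNSET
    fillna: Any = _UNSET

    def has_constant(self) -> bool:
        return self.constant is not _UNSET

    def has_fillna(self) -> bool:
        return self.fillna is not _UNSET

    def dump(self) -> dict[str, Any]:
        """Return field values, leaving out any field still set to _UNSET."""
        result = {}
        for f in fields(self):
            value = getattr(self, f.name)
            if value is not _UNSET:
                result[f.name] = value
        return result


# None is an explicit constant here, distinct from "no constant given".
m = Mapping(target="flag", constant=None)
print(m.has_constant())  # True
print(m.has_fillna())    # False
print(m.dump())          # {'target': 'flag', 'source': None, 'constant': None}
```

In the dump, constant survives with its explicit None while fillna is dropped, which is the distinction a plain None default could not express.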

constant: Any
dtype: Literal['str', 'string', 'utf8', 'int8', 'int16', 'int32', 'int64', 'integer', 'uint8', 'uint16', 'uint32', 'uint64', 'float32', 'float64', 'number', 'bool', 'boolean', 'date', 'datetime', 'time', 'duration', 'binary', 'categorical', 'null'] | None
expr: str | None
fillna: Any
model_config: ClassVar[ConfigDict] = {'from_attributes': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

source: str | None
target: str

class schemashift.ReaderConfig(**data)[source]

Bases: BaseModel

Low-level options passed to the file reader.

encoding: str
model_config: ClassVar[ConfigDict] = {'from_attributes': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

separator: str | None
sheet_name: str | int | None
skip_rows: int

class schemashift.TargetSchema(**data)[source]

Bases: BaseModel

Defines the expected shape and types of an output DataFrame.

classmethod from_yaml(path)[source]

Load a TargetSchema from a YAML file.

Return type:

TargetSchema

required_columns()[source]

Return names of columns marked as required.

Return type:

list[str]

validate_eager(df)[source]

Full validation: columns, dtypes, and null checks on required columns.

Raises:

SchemaValidationError – with details of all issues found.

Return type:

None

validate_lazy(lf)[source]

Validate column names and dtypes against the LazyFrame schema.

Does not collect data; checks structural metadata only.

Raises:

SchemaValidationError – listing all missing columns and type mismatches.

Return type:

None
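A structural check that reports every problem at once, rather than stopping at the first, might look like the sketch below. It is an illustration under stated assumptions: plain name-to-dtype dictionaries stand in for the target schema and the LazyFrame schema, and SchemaCheckError and check_structure are hypothetical names, not the library's SchemaValidationError or validate_lazy.

```python
class SchemaCheckError(Exception):
    """Hypothetical error type carrying every issue found in one pass."""

    def __init__(self, issues: list[str]) -> None:
        super().__init__("; ".join(issues))
        self.issues = issues


def check_structure(expected: dict[str, str], actual: dict[str, str]) -> None:
    """Compare column names and dtypes only; no row data is inspected."""
    issues: list[str] = []
    for name, dtype in expected.items():
        if name not in actual:
            issues.append(f"missing column: {name}")
        elif actual[name] != dtype:
            issues.append(
                f"type mismatch for {name}: expected {dtype}, got {actual[name]}"
            )
    # Raise once, after all columns have been checked.
    if issues:
        raise SchemaCheckError(issues)


expected = {"id": "int64", "name": "str", "amount": "float64"}
actual = {"id": "str", "amount": "float64"}

try:
    check_structure(expected, actual)
except SchemaCheckError as err:
    for issue in err.issues:
        print(issue)
```

Collecting issues into a list before raising is what lets a single exception list all missing columns and type mismatches together. A null check on required columns, as described for validate_eager, would need the actual data and is left out of this sketch.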

columns: list[TargetColumn]
description: str
model_config: ClassVar[ConfigDict] = {'from_attributes': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

name: str