Config models

class schemashift.FormatConfig(**data)

Bases: BaseModel

Top-level configuration for a single source format.

model_dump(**kwargs)

Overrides the Pydantic default so that ColumnMapping fields are serialised respecting the _UNSET sentinel.

Return type: dict[str, Any]

model_dump_json(**kwargs)

JSON counterpart of model_dump: serialises ColumnMapping fields respecting the _UNSET sentinel.

Return type: str

source_columns()

Return all source column names referenced in this config.

Includes every direct 'source' field and every col("…") reference inside an 'expr' field. References are extracted by walking the DSL AST rather than by regex-matching the expression string.

Return type: set[str]
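The difference between walking an AST and regex-matching can be sketched as follows. This is not the library's implementation: the schemashift DSL parser is not shown here, so the sketch assumes an expression that happens to be valid Python and uses the standard-library ast module as a stand-in. The helper name referenced_columns and the sample expression are hypothetical.

```python
import ast


def referenced_columns(expr: str) -> set:
    """Collect the string argument of every col("...") call in expr.

    Stand-in only: parses expr as a Python expression, which is an
    assumption about the DSL, not a documented property of it.
    """
    tree = ast.parse(expr, mode="eval")
    names = set()
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "col"
            and len(node.args) == 1
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, str)
        ):
            names.add(node.args[0].value)
    return names


# The text 'col(fake)' sits inside a string literal, so it is data, not a call.
# A naive regex over the raw string would be fooled by it; the AST walk is not.
expr = "col('price') * col('qty') + len('col(fake)')"
print(sorted(referenced_columns(expr)))
```

Because the walk only reacts to real call nodes, text that merely looks like a column reference inside a string literal is ignored.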

columns: list[ColumnMapping]

description: str

drop_unmapped: bool

model_config: ClassVar[ConfigDict] = {'from_attributes': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

name: str

reader: ReaderConfig

target_schema: str | None

version: int
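To show how these fields might fit together, the sketch below builds a plain dictionary that uses the documented field names of FormatConfig, ReaderConfig and ColumnMapping. Every value is hypothetical (the format name, the column names, the separator, and the use of `*` inside the expression), and this reference does not state which fields are required, what their defaults are, or which combinations of source, expr and constant are allowed.

```python
# Illustrative data only; the field names come from this reference,
# the values are made up for the example.
format_config = {
    "name": "vendor_csv",
    "description": "Illustrative CSV export",
    "version": 1,
    "target_schema": None,
    "drop_unmapped": True,
    "reader": {
        "encoding": "utf-8",
        "separator": ";",
        "sheet_name": None,
        "skip_rows": 1,
    },
    "columns": [
        # Copy a source column directly.
        {"target": "customer_id", "source": "CustID", "dtype": "int64"},
        # Derive a column from an expression (operator syntax is assumed).
        {"target": "total", "expr": 'col("price") * col("qty")', "dtype": "float64"},
        # Emit the same value on every row.
        {"target": "currency", "constant": "EUR", "dtype": "str"},
    ],
}

print([column["target"] for column in format_config["columns"]])
```

Since the constructor is documented as FormatConfig(**data), a dictionary of this shape would presumably be passed as FormatConfig(**format_config), subject to whatever validation the models apply.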

class schemashift.ColumnMapping(**data)

Bases: BaseModel

Describes how to produce one output column.

has_constant()

Return type: bool

has_fillna()

Return type: bool

model_dump(**kwargs)

Override to omit sentinel-valued fields from the output dict.

Return type: dict[str, Any]

model_dump_json(**kwargs)

Override to omit sentinel-valued fields from the JSON output.

Return type: str
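The sentinel pattern behind these methods can be sketched with a standard-library dataclass. This is an assumption about the general technique, not the library's code: MappingSketch, its dump method and the _Unset class are hypothetical names, and the sketch presumes the sentinel exists to tell "never provided" apart from an explicit None.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


class _Unset:
    """Marker type meaning 'this field was never provided'."""

    def __repr__(self) -> str:
        return "_UNSET"


_UNSET = _Unset()


@dataclass
class MappingSketch:
    target: str
    source: Optional[str] = None
    fillna: Any = _UNSET
    constant: Any = _UNSET

    def has_fillna(self) -> bool:
        # Identity check: an explicit None still counts as "set".
        return self.fillna is not _UNSET

    def has_constant(self) -> bool:
        return self.constant is not _UNSET

    def dump(self) -> Dict[str, Any]:
        # Drop fields that still hold the sentinel; keep explicit None.
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name) is not _UNSET
        }


explicit_null = MappingSketch(target="amount", source="amt", fillna=None)
not_given = MappingSketch(target="amount", source="amt")

print(explicit_null.dump())
print(not_given.dump())
```

In this sketch an explicit fillna=None survives serialisation while an untouched fillna disappears, which is the distinction a plain None default cannot express.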

constant: Any

dtype: Literal['str', 'string', 'utf8', 'int8', 'int16', 'int32', 'int64', 'integer', 'uint8', 'uint16', 'uint32', 'uint64', 'float32', 'float64', 'number', 'bool', 'boolean', 'date', 'datetime', 'time', 'duration', 'binary', 'categorical', 'null'] | None

expr: str | None

fillna: Any

model_config: ClassVar[ConfigDict] = {'from_attributes': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

source: str | None

target: str

class schemashift.ReaderConfig(**data)

Bases: BaseModel

Low-level options passed to the file reader.

encoding: str

model_config: ClassVar[ConfigDict] = {'from_attributes': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

separator: str | None

sheet_name: str | int | None

skip_rows: int

class schemashift.TargetSchema(**data)

Bases: BaseModel

Defines the expected shape and types of an output DataFrame.

classmethod from_yaml(path)

Load a TargetSchema from a YAML file.

Return type: TargetSchema

required_columns()

Return names of columns marked as required.

Return type: list[str]

validate_eager(df)

Full validation: columns, dtypes, and null checks on required columns.

Raises: SchemaValidationError – with details of all issues found.
Return type: None
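The null-check part of a full validation needs the actual data, which is why it belongs to the eager path. A minimal sketch of that step is shown below, assuming columns are available as plain Python lists. The function name null_issues and the message format are hypothetical, and the real method works on a DataFrame rather than a dict.

```python
from typing import Any, Dict, List


def null_issues(columns: Dict[str, List[Any]], required: List[str]) -> List[str]:
    """Report every required column that is absent or contains nulls."""
    issues = []
    for name in required:
        if name not in columns:
            issues.append(f"{name}: missing")
            continue
        # Count nulls instead of stopping at the first one found.
        nulls = sum(value is None for value in columns[name])
        if nulls:
            issues.append(f"{name}: {nulls} null value(s)")
    return issues


data = {"id": [1, 2, None], "note": [None, "x", None]}

# "note" is not required here, so its nulls are not reported.
print(null_issues(data, required=["id", "created_at"]))
```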

validate_lazy(lf)

Validate column names and dtypes against the LazyFrame schema.

Does not collect data; checks structural metadata only.

Raises: SchemaValidationError – listing all missing columns and type mismatches.
Return type: None
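A structural check of this kind compares two name-to-dtype mappings and never touches row data. The sketch below illustrates that idea with plain dictionaries. It is not the library's implementation: SchemaValidationErrorSketch and check_structure are hypothetical stand-ins, and dtypes are represented as simple strings.

```python
from typing import Dict, List


class SchemaValidationErrorSketch(Exception):
    """Stand-in error that carries every issue found, not just the first."""

    def __init__(self, issues: List[str]) -> None:
        super().__init__("; ".join(issues))
        self.issues = issues


def check_structure(expected: Dict[str, str], actual: Dict[str, str]) -> None:
    """Compare column names and dtypes only; no rows are read."""
    issues = []
    for name, dtype in expected.items():
        if name not in actual:
            issues.append(f"missing column: {name}")
        elif actual[name] != dtype:
            issues.append(
                f"type mismatch for {name}: expected {dtype}, got {actual[name]}"
            )
    # Raise once at the end so the caller sees the full list of problems.
    if issues:
        raise SchemaValidationErrorSketch(issues)


expected = {"id": "int64", "amount": "float64", "currency": "str"}
actual = {"id": "int64", "amount": "str"}

try:
    check_structure(expected, actual)
except SchemaValidationErrorSketch as err:
    found = err.issues
    print(found)
```

Collecting the issues before raising means a single run reports every missing column and mismatch at once.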

columns: list[TargetColumn]

description: str

model_config: ClassVar[ConfigDict] = {'from_attributes': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

name: str