Config format reference¶
A `FormatConfig` is a JSON (or YAML) file that describes how to transform one specific source format into your target schema.
Full structure¶
```json
{
  "name": "camstar_mes",
  "description": "Camstar MES lot movement export",
  "version": 1,
  "target_schema": "lot_movement",
  "reader": {
    "skip_rows": 0,
    "separator": ",",
    "encoding": "utf-8"
  },
  "columns": [
    { "target": "lot_id", "source": "LOT_ID" },
    { "target": "wafer_count", "source": "QTY", "dtype": "int32" },
    { "target": "track_in_time", "expr": "col(\"TRACKIN_DT\").str.to_datetime(\"%Y-%m-%d %H:%M:%S\")" },
    { "target": "hold_flag", "expr": "col(\"HOLD_STATUS\") != \"NONE\"" },
    { "target": "data_source", "constant": "camstar_mes" }
  ],
  "drop_unmapped": true
}
```
Top-level fields¶
- **`name`** (required): Unique identifier for this config within the registry. Used by `registry.get()` and for auto-detection.
- **`description`**: Optional human-readable description. Included in LLM-generated configs.
- **`version`**: Integer version number. Defaults to `1`.
- **`target_schema`**: Name of the `TargetSchema` this config produces. Used for validation.
- **`reader`**: Optional `ReaderConfig` controlling how the file is read. See Reader options.
- **`columns`** (required): List of `ColumnMapping` objects. See Column mappings.
- **`drop_unmapped`**: If `true`, columns not listed in `columns` are dropped from the output. Defaults to `true`.
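The required/optional split above can be sketched as a plain-Python check. This is a hypothetical helper for illustration, not part of `schemashift`; it only mirrors the rules listed here:

```python
def check_top_level(config: dict) -> list[str]:
    """Return a list of problems with a config dict's top-level fields.

    Mirrors the rules above: `name` and `columns` are required,
    `version` defaults to 1, `drop_unmapped` defaults to true.
    """
    errors = []
    for field in ("name", "columns"):
        if field not in config:
            errors.append(f"missing required field: {field}")
    if not isinstance(config.get("version", 1), int):
        errors.append("version must be an integer")
    if not isinstance(config.get("drop_unmapped", True), bool):
        errors.append("drop_unmapped must be a boolean")
    return errors
```

For example, `check_top_level({"columns": []})` reports the missing `name`, while a config with both required fields and defaults left unset passes cleanly.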
Column mappings¶
Each entry in `columns` must have a `target` field and exactly one of `source`, `expr`, or `constant`.
Rename a column¶
```json
{ "target": "lot_id", "source": "LOT_ID" }
```
Copies the source column unchanged under the target name. No type conversion is applied.
Apply a DSL expression¶
```json
{ "target": "track_in_time", "expr": "col(\"TRACKIN_DT\").str.to_datetime(\"%Y-%m-%d %H:%M:%S\")" }
```
Evaluates the expression and assigns the result to `target`. See Expression DSL for the full expression reference.
Set a constant¶
```json
{ "target": "data_source", "constant": "camstar_mes" }
```
Broadcasts a literal value to every row.
Common optional fields¶
- **`dtype`**: Cast the result to this Polars dtype after the mapping. Accepted values: `str`, `int32`, `int64`, `float32`, `float64`, `bool`, `date`, `datetime`, `duration`.
- **`fillna`**: Fill nulls in the output column with this value after the mapping is applied.
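Both options can be combined on a single mapping. As an illustration (the `PRIORITY` column name and values are hypothetical, not from the example config), this casts to `int32` and replaces nulls with `0`:

```json
{ "target": "priority", "source": "PRIORITY", "dtype": "int32", "fillna": 0 }
```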
Reader options¶
The `reader` block controls low-level file reading.
- **`skip_rows`**: Number of rows to skip before the header. Default: `0`.
- **`sheet_name`**: For Excel files: sheet name (string) or 1-based sheet index (integer). Default: first sheet.
- **`separator`**: For CSV/TSV: field delimiter character. Default: `","`.
- **`encoding`**: File encoding. Default: `"utf-8"`.
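As an illustration, a reader block for a tab-separated export with two preamble rows and Latin-1 encoding might look like this (the values are hypothetical):

```json
"reader": {
  "skip_rows": 2,
  "separator": "\t",
  "encoding": "latin-1"
}
```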
Supported file formats¶
| Format | Notes |
|---|---|
|  | Lazy scan via Polars |
|  | CSV with |
|  | Lazy scan via Polars |
|  | Via |
|  | Read eagerly then lazy |
Validation¶
Use `validate_config()` to check that all DSL expressions parse correctly without running against real data:
```python
import schemashift as ss

config = ss.FormatConfig.model_validate_json(open("my_config.json").read())
errors = ss.validate_config(config)
if errors:
    for err in errors:
        print(err)
```
Or from the CLI:
```
schemashift validate my_config.json
```