Expression DSL¶
Column mappings support a safe, closed expression language that compiles directly to native Polars expressions.
The DSL is not Python. There is no eval(), no imports, no arbitrary method calls — only the operations explicitly listed here. Anything outside this allowlist raises DSLSyntaxError at config-load time, before any data is touched.
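To make the allowlist idea concrete, here is a minimal sketch of how a closed expression language can be checked before evaluation. It is not schemashift's parser: it borrows Python's `ast` module for parsing, and `check_expression`, `SketchSyntaxError`, and the three allowlists are hypothetical names that cover only operations shown on this page.

```python
import ast

# Hypothetical, deliberately partial allowlists built from names used on this
# page; the real DSL's parser and full allowlist may differ.
ALLOWED_FUNCTIONS = {"col", "when", "coalesce"}
ALLOWED_ATTRIBUTES = {"str", "dt", "cast", "strip", "lower", "when", "otherwise"}
ALLOWED_NODES = (
    ast.Expression, ast.Call, ast.Attribute, ast.Name, ast.Load,
    ast.Constant, ast.BinOp, ast.Compare,
    ast.Add, ast.Sub, ast.Mult, ast.Div,
    ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
)


class SketchSyntaxError(ValueError):
    """Stand-in for DSLSyntaxError: carries the expression and a position."""

    def __init__(self, expression, position, message):
        super().__init__(f"{message} at position {position}: {expression!r}")
        self.expression = expression
        self.position = position


def check_expression(expression):
    """Raise SketchSyntaxError unless the expression stays inside the allowlists."""
    try:
        tree = ast.parse(expression, mode="eval")
    except SyntaxError as exc:
        raise SketchSyntaxError(expression, (exc.offset or 1) - 1, "cannot parse") from exc
    for node in ast.walk(tree):
        position = getattr(node, "col_offset", 0)
        # Reject any syntax construct that is not explicitly allowed.
        if not isinstance(node, ALLOWED_NODES):
            raise SketchSyntaxError(expression, position, f"{type(node).__name__} not allowed")
        # A bare function call must name an allowlisted function.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id not in ALLOWED_FUNCTIONS:
                raise SketchSyntaxError(expression, position, f"unknown function {node.func.id!r}")
        # Namespaces and methods must be allowlisted too.
        if isinstance(node, ast.Attribute) and node.attr not in ALLOWED_ATTRIBUTES:
            raise SketchSyntaxError(expression, position, f"unknown method {node.attr!r}")


check_expression('col("Name").str.strip().str.lower()')  # passes silently
try:
    check_expression('__import__("os").system("ls")')
except SketchSyntaxError as err:
    print(err)  # rejected before anything is evaluated
```

The point of the design is that validation only inspects the parsed structure. Nothing is executed, so a rejected expression cannot have side effects.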
Syntax overview¶
col("ColumnName") # reference a column
col("X") / 1000 # arithmetic: + - * /
col("X").method() # method chain
when(condition, value).otherwise(value) # conditional
coalesce(col("A"), col("B"), "fallback") # first non-null
col("x").cast("float64") # type cast
Column references¶
col("Name") # exact column name (case-sensitive)
col("Sales (USD)") # spaces and special chars are fine inside quotes
Arithmetic¶
col("t_in").cast("int64") * 1000000
col("yield_pct") * col("wafer_count")
col("step_sequence") + col("offset") - 10
Supports +, -, *, /.
String operations¶
All string methods are accessed via .str.:
| Expression | Description |
|---|---|
| col("x").str.strip() | Strip leading/trailing whitespace |
| | Strip leading whitespace |
| | Strip trailing whitespace |
| col("x").str.lower() | Lowercase |
| | Uppercase |
| | Parse string to datetime |
| | Literal string replace |
| | Regex replace |
| | Substring by position |
| | Split into list |
Methods can be chained:
col("Name").str.strip().str.lower()
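As a rough per-value picture of what the chained expression does, the sketch below applies the same two steps to plain Python values. It assumes that nulls pass through string methods unchanged, as they do in Polars; `strip_lower` is a hypothetical helper, not part of the DSL.

```python
def strip_lower(value):
    """Per-value equivalent of .str.strip().str.lower(); None stays None."""
    if value is None:
        return None
    return value.strip().lower()


names = ["  Alice ", "BOB", None]
print([strip_lower(name) for name in names])  # ['alice', 'bob', None]
```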
Date/time operations¶
Date methods are accessed via .dt.:
| Expression | Description |
|---|---|
| | Extract year |
| | Extract month |
| | Extract day |
| | Extract hour |
| | Extract minute |
| | Extract second |
| | Extract date part |
| | Format as string |
Arithmetic on expressions¶
col("t_in").cast("int64") * 1_000_000 # Unix epoch seconds → Polars Datetime (microseconds)
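The factor of 1,000,000 comes from the unit change: a microsecond-resolution datetime counts microseconds since 1970-01-01 UTC, while the input counts seconds. The arithmetic can be checked in plain Python; the helper names below are hypothetical and only illustrate the conversion.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_seconds_to_micros(t_in):
    """Mirror of cast("int64") * 1_000_000 for a single value."""
    return int(t_in) * 1_000_000


def micros_to_datetime(micros):
    """Interpret a microsecond count as a UTC timestamp."""
    return EPOCH + timedelta(microseconds=micros)


micros = epoch_seconds_to_micros(1_700_000_000)
print(micros)                                  # 1700000000000000
print(micros_to_datetime(micros).isoformat())  # 2023-11-14T22:13:20+00:00
```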
Direct column methods¶
These are called directly on col(...) without a namespace:
| Expression | Description |
|---|---|
| col("x").cast("float64") | Cast to a Polars dtype |
| | Boolean null mask |
| | Boolean not-null mask |
| | Replace nulls with a literal |
| | Absolute value |
Accepted cast targets¶
str, int32, int64, float32, float64, bool, date, datetime, duration
Conditionals¶
when(col("Type") == "solar", "Solar").otherwise("Other")
Multi-branch:
when(col("T") == "A", "Alpha")
.when(col("T") == "B", "Beta")
.otherwise("Unknown")
when(condition, value) takes a boolean expression and the value to assign when true. .otherwise(value) is required to close the chain.
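Row by row, a multi-branch chain can be read as "the first matching branch wins, otherwise fall back". The sketch below models that reading in plain Python. It assumes the DSL follows the first-match semantics of Polars' when/then/otherwise, and `when_chain` is a hypothetical helper rather than a library function.

```python
def when_chain(branches, otherwise):
    """Build a row function that returns the value of the first true branch."""

    def evaluate(row):
        for condition, value in branches:
            if condition(row):
                return value
        return otherwise  # no branch matched

    return evaluate


label = when_chain(
    [
        (lambda row: row["T"] == "A", "Alpha"),
        (lambda row: row["T"] == "B", "Beta"),
    ],
    otherwise="Unknown",
)

rows = [{"T": "A"}, {"T": "B"}, {"T": "C"}]
print([label(row) for row in rows])  # ['Alpha', 'Beta', 'Unknown']
```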
Comparison operators¶
==, !=, <, <=, >, >=
Coalesce¶
Returns the first non-null value across multiple expressions or literals:
coalesce(col("A"), col("B"), "fallback")
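For a single row, the behaviour described above amounts to scanning the arguments left to right and stopping at the first one that is not null. A plain-Python sketch, with `None` standing in for null:

```python
def coalesce(*values):
    """Return the first value that is not None, or None if all are."""
    for value in values:
        if value is not None:
            return value
    return None


rows = [
    {"A": None, "B": "b1"},
    {"A": "a2", "B": "b2"},
    {"A": None, "B": None},
]
print([coalesce(row["A"], row["B"], "fallback") for row in rows])
# ['b1', 'a2', 'fallback']
```

In this sketch only `None` counts as missing: an empty string or a zero is a value and is returned as-is.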
Combining expressions¶
col("t_in").cast("int64") * 1_000_000 # epoch seconds → microseconds (the expression itself yields int64)
col("HOLD_STATUS") != "NONE" # boolean from string sentinel
when(col("t_out") == 0, null).otherwise(col("t_out").cast("int64") * 1_000_000)
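The last expression matters because a zero sentinel would otherwise turn into a real timestamp (microsecond 0, which is 1970-01-01). A per-value sketch of the same logic in plain Python, with `None` standing in for null and `t_out_to_micros` as a hypothetical helper name:

```python
def t_out_to_micros(t_out):
    """Map the 0 sentinel to None; convert real epoch seconds to microseconds."""
    if t_out == 0:
        return None
    return int(t_out) * 1_000_000


values = [0, 1_700_000_000]
print([t_out_to_micros(v) for v in values])  # [None, 1700000000000000]
```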
Error handling¶
A bad expression raises DSLSyntaxError at parse time (before any data is read):
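import schemashift as ss  # assumed import alias for the ss used below
from schemashift.errors import DSLSyntaxError

try:
    ss.validate_config(config)  # config: a mapping config loaded earlier
except DSLSyntaxError as e:
    print(e.expression)  # the offending expression string
    print(e.position)  # character position of the error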
A valid expression that fails at evaluation raises DSLRuntimeError.