LLM-assisted config generation

When a file arrives from a source you have no config for, smart_transform() can generate one automatically using a language model.

How it works

1. schemashift reads the file headers and a small sample (default: 5 rows).
2. It sends them to your LLM along with the target schema and the DSL reference.
3. The LLM returns a FormatConfig as JSON.
4. schemashift validates the config by parsing every DSL expression and transforming the sample rows.
5. On failure, it retries up to N times (default: 2), appending the error to the prompt.
6. On success, it optionally saves the config to the registry.
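
The generate, validate, and retry cycle described above can be sketched in plain Python. This is a minimal illustration, not schemashift's implementation: `generate_with_retries`, `GenerationFailed`, `ask_llm`, and `validate` are hypothetical names, and it assumes that "retries up to N times" means one initial attempt plus N retries.

```python
import json
from typing import Callable


class GenerationFailed(Exception):
    """Hypothetical stand-in error that keeps one message per failed attempt."""

    def __init__(self, attempts: list[str]) -> None:
        super().__init__(f"config generation failed after {len(attempts)} attempts")
        self.attempts = attempts


def generate_with_retries(
    ask_llm: Callable[[str], str],      # hypothetical: prompt in, raw reply out
    validate: Callable[[dict], None],   # hypothetical: raises ValueError if invalid
    prompt: str,
    max_retries: int = 2,
) -> dict:
    """Ask for a JSON config, validate it, and retry with the error appended."""
    errors: list[str] = []
    current_prompt = prompt
    # Assumption: one initial attempt plus up to max_retries retries.
    for _ in range(1 + max_retries):
        reply = ask_llm(current_prompt)
        try:
            config = json.loads(reply)  # JSONDecodeError is a ValueError
            validate(config)
            return config
        except ValueError as exc:
            errors.append(str(exc))
            # Feed the failure back so the next attempt can correct it.
            current_prompt = f"{prompt}\n\nPrevious attempt failed with: {exc}"
    raise GenerationFailed(errors)
```

Passing the model and the validator in as callables keeps the loop testable without network access: a stub that returns canned replies is enough to exercise both the success and the failure path.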

Installation

pip install "schemashift[llm]"

Supported LLM providers

schemashift accepts any LangChain BaseChatModel. Two providers are tested and supported:

Anthropic (direct)

from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(model="claude-haiku-4-5-20251001", temperature=0)

Set ANTHROPIC_API_KEY in your environment or .env file.

Azure AI Foundry

from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(
    model="claude-haiku-4-5",
    api_key="<FOUNDRY_API_KEY>",
    base_url="https://<resource>.services.ai.azure.com/anthropic",
)

Alternatively, supply the credentials through environment variables (FOUNDRY_API_KEY + FOUNDRY_RESOURCE).

Basic usage

import schemashift as ss
from langchain_anthropic import ChatAnthropic

schema = ss.TargetSchema.from_yaml("schemas/lot_movement.yaml")
registry = ss.FileSystemRegistry("./configs/")
llm = ChatAnthropic(model="claude-haiku-4-5-20251001", temperature=0)

result = ss.smart_transform(
    "sap_erp.csv",
    registry=registry,
    target_schema=schema,
    llm=llm,
    auto_register=True,  # saves the config so next run hits the registry
)

If the file matches an existing config in the registry, smart_transform() uses it directly; the LLM is called only on a miss.
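
The hit-or-miss behaviour can be pictured as a cache lookup in front of the LLM call. The sketch below is illustrative only: `headers_key` and `resolve_config` are hypothetical helpers, the registry is modelled as a plain dict, and matching on normalized header names is an assumption rather than schemashift's actual matching rule.

```python
from typing import Callable


def headers_key(headers: list[str]) -> tuple[str, ...]:
    """Normalize a header row into a lookup key (assumed matching rule)."""
    return tuple(sorted(h.strip().lower() for h in headers))


def resolve_config(
    headers: list[str],
    known: dict[tuple[str, ...], dict],      # stand-in for the registry
    generate: Callable[[list[str]], dict],   # stand-in for the LLM call
    auto_register: bool = False,
) -> dict:
    """Return a known config on a hit; call the generator only on a miss."""
    key = headers_key(headers)
    if key in known:
        return known[key]            # hit: no LLM call
    config = generate(headers)       # miss: ask the LLM
    if auto_register:
        known[key] = config          # the next file with these headers is a hit
    return config
```

With `auto_register=True`, the second file that carries the same headers never reaches the generator, which is the effect the comment in the example above describes.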

Human review step

Pass a review_fn to inspect and optionally edit the generated config before it’s applied:

def review(config: ss.FormatConfig, sample_df) -> ss.FormatConfig | None:
    print(config.model_dump_json(indent=2))
    print(sample_df)
    # return config to accept, return None to reject
    return config

result = ss.smart_transform(
    "sap_erp.csv",
    registry=registry,
    target_schema=schema,
    llm=llm,
    review_fn=review,
    auto_register=True,
)
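
A review function can also ask for explicit confirmation instead of always accepting. The following is a small sketch under stated assumptions: `make_confirming_review` and its `ask` parameter are hypothetical, and the only things it relies on from the text above are `model_dump_json()` on the config and the accept-or-`None` return contract.

```python
from typing import Any, Callable, Optional


def make_confirming_review(
    ask: Callable[[str], str] = input,  # injectable so it can be scripted in tests
) -> Callable[[Any, Any], Optional[Any]]:
    """Build a review function that applies the config only on an explicit yes."""

    def review(config: Any, sample_df: Any) -> Optional[Any]:
        # Show the reviewer the generated config and the sample rows.
        print(config.model_dump_json(indent=2))
        print(sample_df)
        answer = ask("Apply this config? [y/N] ").strip().lower()
        # Return the config to accept it, None to reject it.
        return config if answer in {"y", "yes"} else None

    return review
```

Calling `make_confirming_review()` with no arguments yields a function that prompts on the terminal and could be passed as `review_fn`; defaulting to rejection means an empty answer never applies an unreviewed config.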

Generating a config without transforming

If you want only the config (e.g. to store it or inspect it before use):

from schemashift.llm import generate_config

config = generate_config(
    path="sap_erp.csv",
    target_schema=schema,
    llm=llm,
    max_retries=3,
)
registry.register(config)

Error handling

import schemashift as ss
from schemashift.errors import LLMGenerationError

try:
    result = ss.smart_transform(...)  # arguments as in "Basic usage"
except LLMGenerationError as e:
    # e.attempts holds one error message per failed attempt
    print(f"Failed after {len(e.attempts)} attempts")
    for i, attempt in enumerate(e.attempts, start=1):
        print(f"Attempt {i}: {attempt}")

LLMGenerationError.attempts contains the error message from each failed attempt, which helps when debugging prompt issues.

CLI

# Generate a config and print it
schemashift generate data.csv --target-schema schemas/lot_movement.yaml

# Generate and save to the registry
schemashift generate data.csv \
    --registry ./configs/ \
    --target-schema schemas/lot_movement.yaml

# Generate with interactive review before saving
schemashift generate data.csv \
    --registry ./configs/ \
    --target-schema schemas/lot_movement.yaml \
    --interactive

Credential auto-detection

The CLI loads a .env file from the working directory and picks the provider automatically:

Priority	Variables	Provider
1	`FOUNDRY_API_KEY` + `FOUNDRY_ENDPOINT`	Azure AI Foundry
2	`ANTHROPIC_API_KEY`	Anthropic

If FOUNDRY_RESOURCE is set without FOUNDRY_ENDPOINT, the endpoint is inferred as https://<FOUNDRY_RESOURCE>.services.ai.azure.com/api/projects/<FOUNDRY_RESOURCE>.
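
The priority order and the endpoint inference can be expressed as one small function. This is a sketch of the rules as stated here, not the CLI's source: `detect_provider` and the shape of its return value are hypothetical, and returning `None` when no credentials are found is an assumption.

```python
from typing import Mapping, Optional


def detect_provider(env: Mapping[str, str]) -> Optional[dict]:
    """Pick a provider from environment variables using the documented priority."""
    api_key = env.get("FOUNDRY_API_KEY")
    endpoint = env.get("FOUNDRY_ENDPOINT")
    resource = env.get("FOUNDRY_RESOURCE")

    # Infer the endpoint from the resource name when only the resource is set.
    if not endpoint and resource:
        endpoint = f"https://{resource}.services.ai.azure.com/api/projects/{resource}"

    # Priority 1: Azure AI Foundry needs both a key and an endpoint.
    if api_key and endpoint:
        return {"provider": "azure_ai_foundry", "api_key": api_key, "endpoint": endpoint}

    # Priority 2: Anthropic.
    anthropic_key = env.get("ANTHROPIC_API_KEY")
    if anthropic_key:
        return {"provider": "anthropic", "api_key": anthropic_key}

    # Assumption: no credentials found means no provider is selected.
    return None
```

Passing `os.environ` as `env` would apply the same rules to the current process; taking a mapping as a parameter keeps the function easy to check against each row of the table.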