Generator internals
The generator turns the parser's APIModel tree into a runnable Python
project: a pyproject.toml, a vendored runtime, base-layer files for
every node, user-layer stubs, generated tests, and a manifest that
makes drift detection possible across runs.
The whole thing is structured as a virtual filesystem —
dict[str, GeneratedFile] keyed by POSIX-style relative path. The CLI
flushes that dict to disk; tests inspect it directly. No filesystem
side effects in the generator itself.
APIModel
   │
   ▼
generate(api, raw_spec, ...)     ← orchestration, in api.py
   │
   ├─► emit_project_skeleton()   pyproject, README, LICENSE, .gitignore (one-shot)
   ├─► emit_runtime()            vendored runtime + base/__init__.py (regenerated)
   ├─► emit_models()             base/models.py via datamodel-code-gen (regenerated)
   ├─► emit_client()             sync + async ClientBase (regenerated)
   ├─► emit_tree()               one base/<node>.py per parser-tree node (regenerated)
   ├─► emit_stubs()              user-layer subclass stubs (one-shot)
   ├─► emit_tests()              test scaffolding (one-shot)
   └─► compute_manifest()        base/_manifest.json (regenerated)
   │
   ▼
dict[str, GeneratedFile]         ← virtual FS
   │
   ▼
write_to_disk(vfs, output_dir, dry_run=...)   ← in vfs.py; respects one_shot
Lifecycle: one-shot vs. regenerated
Every GeneratedFile carries a one_shot flag.
- one_shot=False: files under src/{package}/base/. Rewritten on every run. Includes: vendored runtime, models.py from datamodel-code-generator, sync + async client base classes, one file per parser-tree node, _manifest.json.
- one_shot=True: files under src/{package}/ (subclass stubs the customer customizes), the project skeleton (pyproject.toml, README.md, LICENSE, .gitignore, .python-version), and the generated test scaffolding.
write_to_disk honors the flag: existing one-shot files are never
overwritten; their absence is treated as "first generation, write the
stub". Base files are written unconditionally.
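The flag's effect on a flush can be sketched as follows. This is a minimal illustration, not the real write_to_disk: the flush helper, the src/shop paths, and the file contents are assumptions made up for the example, and pruning and reporting are left out.

```python
from __future__ import annotations

import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class GeneratedFile:
    content: str
    one_shot: bool = False


def flush(vfs: dict[str, GeneratedFile], output_dir: Path) -> list[str]:
    """Hypothetical helper: write the VFS, return the paths actually written."""
    written = []
    for rel_path, gen in sorted(vfs.items()):
        target = output_dir / rel_path
        if gen.one_shot and target.exists():
            continue  # customer-owned file: never overwritten
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(gen.content, encoding="utf-8")
        written.append(rel_path)
    return written


vfs = {
    "src/shop/base/orders.py": GeneratedFile("# base layer\n"),
    "src/shop/orders.py": GeneratedFile("# stub\n", one_shot=True),
}

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    first = flush(vfs, out)  # first generation: both files are written
    # Simulate a customer edit to the stub, then regenerate.
    (out / "src/shop/orders.py").write_text("# customised\n", encoding="utf-8")
    second = flush(vfs, out)  # only the base file is rewritten
    stub_after = (out / "src/shop/orders.py").read_text(encoding="utf-8")
```

On the second run only the base file is rewritten, and the customer's edit to the stub is preserved.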
The order inside generate() matters:
- Project skeleton & runtime first: subsequent emitters can then rely on the package layout.
- Models next: the walker needs to know which model names are actually emitted (some specs reference schemas datamodel-code-generator can't represent), so it can drop dangling imports.
- Client + walker: the walker is the bulk of the work, one base file per parser-tree node.
- User-layer stubs: written one-shot, auto-wiring every __<child>_factory__ for that subtree.
- Tests: also one-shot, so customer edits to the suite survive.
- Manifest last: captures the full set of base files. Used by drift detection and --check.
The walker (emit/walk.py)
emit_tree(env, api, project_context, package_path, available_models)
walks the parser tree and emits one base file per node. The visitor
maintains a small context as it descends:
- The full breadcrumb of singular collection names (used for naming).
- The current namespace path (so files land in the right base/<ns>/ subdirectory).
- The set of model names that actually exist in models.py. References to missing schemas degrade gracefully: the type becomes Any and the import is dropped.
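The graceful degradation for missing schemas could look roughly like the sketch below. The helper name resolve_model and its return shape are hypothetical; only the behaviour (fall back to Any, drop the model import) comes from the description above.

```python
from __future__ import annotations


def resolve_model(name: str, available_models: set[str]) -> tuple[str, str | None]:
    """Hypothetical helper: return (type annotation, model import line or None)."""
    if name in available_models:
        return name, f"from ..models import {name}"
    # The spec references a schema that never made it into models.py:
    # annotate with Any and emit no model import for it.
    return "Any", None


available = {"Order", "Customer"}
known = resolve_model("Order", available)
dangling = resolve_model("LegacyBlob", available)
```

Here known keeps its model name and import line, while dangling degrades to ("Any", None).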
For each node, the walker picks a Jinja template under
generator/templates/package/... and renders it with a context dict
that includes:
- The node itself.
- The factory hooks for its children (PascalCase class names).
- The list of operations and their method/path/request/response metadata.
- The set of imports needed (computed up front, deduped).
Templates have side-effect-free filters in runtime/filters.py and a
small templating layer (templating.py) that uses Jinja's
ChoiceLoader to look up user templates first (when --templates-dir
is passed) and packaged defaults second.
Models (models.py + datamodel-code-generator)
emit_models(raw_spec, model_templates_dir, python_version) invokes
datamodel-code-generator with:
- The raw spec (path, URL, or already-loaded dict).
- pydantic-v2 output mode.
- Pinned Python version (--target-python-version) so f-string and match syntax matches what the rest of the project emits.
- Optional model_templates_dir forwarded as --custom-template-dir.
The result is a single Python source string written to
src/{package}/base/models.py.
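For orientation, the options above map onto datamodel-code-generator's command line roughly as sketched here. Whether emit_models shells out or calls the library's Python API is not stated in this document; the dmcg_argv helper, the file names, and the choice of --input-file-type openapi are assumptions for illustration.

```python
from __future__ import annotations

from pathlib import Path


def dmcg_argv(
    spec_path: Path,
    output: Path,
    python_version: str,
    model_templates_dir: Path | None = None,
) -> list[str]:
    """Hypothetical helper: build a datamodel-codegen command line."""
    argv = [
        "datamodel-codegen",
        "--input", str(spec_path),
        "--input-file-type", "openapi",  # assumption: the spec is OpenAPI
        "--output", str(output),
        "--output-model-type", "pydantic_v2.BaseModel",
        "--target-python-version", python_version,
    ]
    if model_templates_dir is not None:
        # Forward user-supplied model templates only when given.
        argv += ["--custom-template-dir", str(model_templates_dir)]
    return argv


default_argv = dmcg_argv(Path("spec.yaml"), Path("models.py"), "3.12")
custom_argv = dmcg_argv(Path("spec.yaml"), Path("models.py"), "3.12", Path("tpl"))
```

The resulting list could be handed to subprocess.run; the sketch stops at building the arguments so that it has no side effects.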
public_names(source) parses the emitted source and returns the set of
top-level class names. The walker uses that set to validate model
references — anything not in the set becomes Any in the generated
client.
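One plausible way to implement public_names is with the standard-library ast module, as sketched below. The real implementation may differ (for example, it might filter out underscore-prefixed names); the sample source is made up for the example.

```python
import ast


def public_names(source: str) -> set[str]:
    """Return the names of classes defined at module top level."""
    tree = ast.parse(source)
    # tree.body holds only top-level statements, so nested classes are ignored.
    return {node.name for node in tree.body if isinstance(node, ast.ClassDef)}


sample = '''
from pydantic import BaseModel


class Order(BaseModel):
    id: int

    class Config:
        frozen = True


class Customer(BaseModel):
    name: str


def helper(): ...
'''

names = public_names(sample)
```

Because the source is only parsed, never imported, this works even when the emitted models depend on packages that are not installed in the generator's environment.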
--shape dicts skips this step entirely. The walker then drops every
from ..models import ... line and types every body / return as
dict[str, Any]. The client base also drops the shape= constructor
option and with_shape(...) — there is nothing to switch to. This is an
escape hatch for two situations:
- datamodel-code-generator can't process the spec's schemas (rare, but it happens with very baroque oneOf graphs).
- The consumer wants to bring their own model layer (e.g. they already have hand-written Pydantic types they prefer).
--shape models keeps models.py but locks the runtime to validation:
the shape= constructor option and with_shape(...) are dropped, and
bodies / returns are typed strictly as the recovered model
(Foo / Foo | None) rather than admitting a dict[str, Any] arm.
The runtime (runtime/ + emit_runtime)
A small set of files vendored into every generated client:
- runtime/transport.py: the Transport wrapper around httpx.Client / httpx.AsyncClient.
- runtime/strategies.py: pagination / filter / sort Protocols and the built-ins. (See Strategies.)
- runtime/filters.py + runtime/sort.py: the small DSL used to compose filter trees and sort term lists at call time.
- runtime/types.py: small shared types (PageOf[...], etc.).
- runtime/exceptions.py: ConfigurationError, UnsupportedFilterError, UnsupportedSortError, plus HTTP error mapping.
Vendoring (rather than depending on a separate okapipy-runtime PyPI
package) is intentional: it keeps the generated client self-contained
and lets us tighten the runtime API without breaking older clients.
The manifest (manifest.py + edges.py)
base/_manifest.json records:
{
  "okapipy_version": "0.1.0",
  "spec_hash": "<sha256>",
  "rules_hash": "<sha256>",
  "generated_at": "<ISO-8601>",
  "base_files": [...],
  "edges": [
    {"parent": "ClientBase", "child": "OrdersCollectionBase",
     "factory": "__orders_factory__", "user_module": "orders"},
    ...
  ]
}
- base_files drives pruning: any base/*.py file present on disk but absent from the new manifest is a stale leftover from a removed namespace/collection, and gets deleted on the next run.
- edges drives drift detection: each edge says "the parent's __<factory>__ should point at the user-layer subclass in <user_module>." vfs.py checks the actual content of one-shot user files against the expected edges and emits a warning per missing factory binding.
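Both uses of the manifest can be sketched in a few lines. The helper names, the regex-based binding check, and the sample data are assumptions; the document does not say how vfs.py actually inspects user files.

```python
from __future__ import annotations

import re


def stale_base_files(previous: list[str], current: list[str]) -> list[str]:
    """Hypothetical helper: base files recorded earlier but no longer generated."""
    return sorted(set(previous) - set(current))


def missing_bindings(edges: list[dict[str, str]], user_sources: dict[str, str]) -> list[str]:
    """Hypothetical helper: one warning per edge whose factory is never assigned."""
    warnings = []
    for edge in edges:
        # Assumption: a binding looks like "__orders_factory__ = SomeClass".
        pattern = re.compile(rf"^\s*{re.escape(edge['factory'])}\s*=", re.MULTILINE)
        if not any(pattern.search(src) for src in user_sources.values()):
            warnings.append(
                f"{edge['parent']}: {edge['factory']} is not bound "
                f"(expected a subclass from '{edge['user_module']}')"
            )
    return warnings


edges = [
    {"parent": "ClientBase", "child": "OrdersCollectionBase",
     "factory": "__orders_factory__", "user_module": "orders"},
    {"parent": "ClientBase", "child": "UsersCollectionBase",
     "factory": "__users_factory__", "user_module": "users"},
]
user_sources = {
    "client.py": "class Client(ClientBase):\n    __orders_factory__ = Orders\n",
}

stale = stale_base_files(["base/orders.py", "base/refunds.py"], ["base/orders.py"])
drift = missing_bindings(edges, user_sources)
```

In this example base/refunds.py is reported as stale, and a single drift warning is raised for the unbound __users_factory__.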
--check is a dry-run mode that runs the full pipeline up to disk
write, then refuses to write anything and exits non-zero if any base
file would change, any drift warning fires, or any stale base file
would be pruned. It is meant to be used as a CI gate.
The VFS (vfs.py)
Two responsibilities:
@dataclass
class GeneratedFile:
    content: str
    one_shot: bool = False

def write_to_disk(
    vfs: dict[str, GeneratedFile],
    output_dir: Path,
    *,
    dry_run: bool = False,
) -> WriteReport: ...
write_to_disk:
- Reads the previous manifest (if any) under output_dir.
- For each entry in the new VFS:
  - If one_shot=True and the target file already exists → skip (and don't even read it; we don't merge).
  - Otherwise, write unconditionally.
- Prune: any file under <package>/base/ that's listed in the previous manifest but not the new one → delete.
- Compare the new VFS against the disk for drift detection on one-shot files (stubs missing __<child>_factory__ bindings).
- Return a WriteReport summarising written, skipped, pruned, would_change, and warnings.
When dry_run=True, no disk side effects happen — but the report is
populated as if it had. That's what powers --check.
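A reduced sketch of the dry-run behaviour follows. It omits manifest reading, pruning, and drift warnings, and the exact meaning of each WriteReport field is an assumption: here written lists what a real run would write, and would_change lists files whose on-disk content differs or is missing. The function name plan_or_write is hypothetical.

```python
from __future__ import annotations

import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class GeneratedFile:
    content: str
    one_shot: bool = False


@dataclass
class WriteReport:  # reduced: the real report also has pruned and warnings
    written: list[str] = field(default_factory=list)
    skipped: list[str] = field(default_factory=list)
    would_change: list[str] = field(default_factory=list)


def plan_or_write(
    vfs: dict[str, GeneratedFile], output_dir: Path, *, dry_run: bool = False
) -> WriteReport:
    report = WriteReport()
    for rel_path, gen in sorted(vfs.items()):
        target = output_dir / rel_path
        if gen.one_shot and target.exists():
            report.skipped.append(rel_path)
            continue
        if not target.exists() or target.read_text(encoding="utf-8") != gen.content:
            report.would_change.append(rel_path)
        report.written.append(rel_path)  # recorded even when nothing is written
        if not dry_run:
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(gen.content, encoding="utf-8")
    return report


vfs = {"src/shop/base/orders.py": GeneratedFile("# base v2\n")}

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    report = plan_or_write(vfs, out, dry_run=True)
    exists_after = (out / "src/shop/base/orders.py").exists()

# A --check style caller could derive its exit status from the report.
exit_code = 1 if report.would_change else 0
```

The dry run leaves the output directory untouched while still reporting the file as written and changed, which is enough for a caller to decide on a non-zero exit.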
Templating (templating.py)
Templates live under generator/templates/. The packaging layout is:
templates/
├── package/ # one template per generated source file
│ ├── client.py.j2
│ ├── namespace.py.j2
│ ├── collection.py.j2
│ ├── resource.py.j2
│ ├── singleton.py.j2
│ ├── action.py.j2
│ ├── stub_*.py.j2 # user-layer stubs
│ └── ...
├── project/ # pyproject, README, LICENSE, .gitignore
├── tests/ # generated test scaffolding
└── model/ # datamodel-code-generator overrides
make_environment(user_templates_dir) builds a Jinja Environment
with a ChoiceLoader: user templates first (when --templates-dir is
passed), packaged defaults second. StrictUndefined makes missing
context variables fail loudly at render time. After rendering, every
Python file passes through ruff check --fix --select I (isort) and
then ruff format for canonical output.
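The lookup order and the strict-undefined behaviour can be demonstrated with Jinja's in-memory DictLoader. The real make_environment presumably loads from the filesystem and the installed package; the loaders and template bodies below are stand-ins invented for the example.

```python
from jinja2 import (
    ChoiceLoader,
    DictLoader,
    Environment,
    StrictUndefined,
    UndefinedError,
)

# Stand-in for the packaged defaults under generator/templates/.
packaged = DictLoader({
    "package/collection.py.j2": "class {{ class_name }}Base: ...",
    "package/resource.py.j2": "class {{ class_name }}ResourceBase: ...",
})
# Stand-in for a user directory passed via --templates-dir.
user = DictLoader({
    "package/collection.py.j2": "# custom header\nclass {{ class_name }}Base: ...",
})

# Loaders are tried in order, so user templates shadow packaged ones.
env = Environment(loader=ChoiceLoader([user, packaged]), undefined=StrictUndefined)

overridden = env.get_template("package/collection.py.j2").render(class_name="Orders")
fallback = env.get_template("package/resource.py.j2").render(class_name="Orders")

try:
    # class_name is missing from the context, so StrictUndefined raises.
    env.get_template("package/resource.py.j2").render()
    strict_failed = False
except UndefinedError:
    strict_failed = True
```

The collection template resolves to the user's version, the resource template falls back to the packaged default, and rendering without class_name fails instead of silently emitting an empty string.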
User-facing template overrides are documented in Template customization. When changing the packaged templates, keep in mind that user overrides may be tracking specific variables in the context dict — backwards-compatible renames should keep the old variable around for a release or two before removing it.
Errors (errors.py)
- GenerationError: base class. The CLI catches it and prints a friendly error.
- Sub-types for: missing required model, ambiguous schema name, datamodel-code-generator (dmcg) failure, drift detection refusing to write under --check, and a few internal-consistency checks the walker performs as it goes.
-vv (DEBUG logging on the CLI root) prints full tracebacks; otherwise
only the message reaches stderr.
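The catch-and-report pattern might look like the sketch below. The subclass name ModelGenerationError, the run_cli helper, and the verbosity parameter are hypothetical, and the helper returns the text instead of printing to stderr so the behaviour is easy to inspect.

```python
from __future__ import annotations

import traceback
from typing import Callable


class GenerationError(Exception):
    """Base class for generator failures."""


class ModelGenerationError(GenerationError):
    """Hypothetical sub-type for a datamodel-code-generator failure."""


def run_cli(action: Callable[[], None], *, verbosity: int = 0) -> tuple[int, str]:
    """Hypothetical CLI wrapper: return (exit code, text destined for stderr)."""
    try:
        action()
    except GenerationError as exc:
        if verbosity >= 2:  # -vv: include the full traceback
            return 1, traceback.format_exc()
        return 1, f"error: {exc}"
    return 0, ""


def failing_action() -> None:
    raise ModelGenerationError("datamodel-code-generator exited with status 2")


quiet = run_cli(failing_action)
verbose = run_cli(failing_action, verbosity=2)
```

Catching only GenerationError means unexpected exceptions still surface as ordinary tracebacks, which keeps genuine bugs visible.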
Adding a new emitter
Most generator changes are template changes — new fields, renames, better naming. Reach for a new emitter only when you're adding a new file to the generated project (e.g. a new vendored runtime module, or a new project-skeleton file).
Steps:
- Add the template under generator/templates/.
- Add the emitter under generator/emit/. It should accept the same shape as existing emitters: (env, project_context, package_path, ...) -> dict[str, str].
- Wire it into generate() in api.py, with the right one_shot flag.
- Add a test under tests/generator/ that:
  - Exercises a small parser tree.
  - Asserts the new file lands in the VFS at the expected path.
  - Asserts the lifecycle flag is correct.
  - (For one-shot files) asserts a second generate(...) with the file already in the VFS doesn't overwrite it.
Then run uv run pytest tests/generator/ and uv run mypy src and
you're done.
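As a rough illustration of the steps above, here is a toy emitter with a matching test. Everything in it is hypothetical: emit_py_typed, the wire helper, and the decision to mark the file one-shot are invented for the example, and the env and project_context arguments that real emitters take are dropped to keep it self-contained.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class GeneratedFile:
    content: str
    one_shot: bool = False


def emit_py_typed(package_path: str) -> dict[str, str]:
    """Hypothetical emitter: add a PEP 561 marker file to the package."""
    return {f"{package_path}/py.typed": ""}


def wire(vfs: dict[str, GeneratedFile], emitted: dict[str, str], *, one_shot: bool) -> None:
    """Stand-in for the wiring in generate(): wrap content with its lifecycle flag."""
    for path, content in emitted.items():
        vfs[path] = GeneratedFile(content, one_shot=one_shot)


def test_py_typed_lands_in_vfs() -> None:
    vfs: dict[str, GeneratedFile] = {}
    wire(vfs, emit_py_typed("src/shop"), one_shot=True)
    assert "src/shop/py.typed" in vfs  # lands at the expected path
    assert vfs["src/shop/py.typed"].one_shot is True  # lifecycle flag is correct


test_py_typed_lands_in_vfs()
```

Keeping emitters as functions that return plain path-to-content mappings is what lets a test like this run without touching the disk.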