Parser API reference
The parser package exposes one supported entry point — parse(...) —
plus the Pydantic models that describe the parsed tree. Anything else
listed here is a documented internal you may import if you need to
build custom tooling on top.
Public entry point
parse
parse(
    source: str | Path,
    rules: str | Path | None = None,
    lang: str = "en",
    *,
    strip_prefix: str | None = None,
    nlp_cache_dir: Path = DEFAULT_NLP_CACHE_DIR,
    unmatched_namespace: str | None = None
) -> APIModel
Parse an OpenAPI 3.x document into an APIModel tree.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| source | str \| Path | A local filesystem path or http(s) URL pointing to a JSON or YAML OpenAPI document. Format is auto-detected by content. | required |
| rules | str \| Path \| None | Optional local path to a JSON/YAML rules file that mirrors the OpenAPI extension shape (root | None |
| lang | str | ISO language code controlling which spaCy model is loaded. | 'en' |
| strip_prefix | str \| None | Optional path prefix to strip from every path before classification, e.g. | None |
| nlp_cache_dir | Path | Directory under which spaCy models are stored and looked up. On a cache miss the model is downloaded into this directory. | DEFAULT_NLP_CACHE_DIR |
| unmatched_namespace | str \| None | When set, operations that would otherwise be dropped by the routing table are retained as synthetic actions under a top-level namespace of this name. Raises | None |
Returns:
| Type | Description |
|---|---|
| APIModel | The fully-built APIModel rooted at the namespaces it discovered. |
The data model
The tree returned by parse(...) is a graph of these Pydantic v2
models. They're immutable in spirit (downstream code generators treat
them as read-only) and round-trip cleanly through JSON / YAML.
APIModel
Bases: BaseModel
The root of the parsed structural tree.
The root holds top-level namespaces, collections, singletons, and actions.
Real-world OpenAPI documents commonly expose all four directly under / —
e.g. /orders (collection), /me (singleton), /login (action) — with no
namespace prefix.
Namespace
Bases: BaseModel
A folder-like grouping of sub-namespaces, collections, singletons, and actions.
Namespace-level actions (e.g. /auth/login) and namespace-level singletons
(e.g. /admin/health) are valid: real APIs commonly host verb endpoints and
singleton resources directly under a folder prefix.
Collection
Bases: BaseModel
A plural endpoint that fetches a list, creates items, and contains a Resource.
Collections may also host sub-singletons that represent collection-level
aggregate views — /orders/stats, /datasets/summary — alongside the
per-item Resource reached via {id}. Sub-collections are not allowed
(collection-under-collection has no canonical meaning).
Resource
Bases: BaseModel
The single-item endpoint of a collection (the segment after {id}).
Singleton
Bases: BaseModel
A resourceful endpoint with no enclosing collection.
Examples: /me, /health, /version, or sub-singletons like
/users/{id}/avatar. Carries the same CRUD slots as Resource and may host
sub-collections, sub-singletons, and actions. Has no resource slot — a
Singleton is the resource.
Action
Bases: BaseModel
A non-CRUD endpoint identified by a verb-phrase path segment.
Actions may attach at the root of the API, under a Namespace, under a Collection, under a Resource, or under a Singleton.
attr_override decouples the surface attribute name from the path's
last segment. The generator uses it when set (otherwise it falls back
to the path-derived snake_case). It exists to support --unmatched,
where the attribute should reflect the operation's operationId
rather than wherever in the URL it happens to land.
Operation
Bases: BaseModel
A single HTTP operation declared on a path.
response_model names the literal 2xx response body schema (the envelope when
the response wraps a list, or the resource itself for single-item responses).
item_model names the inner schema of a list envelope when one is detected
(plain type: array, or an object with an items/data/results/records/
entries array property). The generator uses it so paginated iteration can
yield typed model instances instead of raw dicts; left as None when the
response isn't list-shaped or the item schema is anonymous.
request_model_members is non-empty when the request body is an inline
anyOf / oneOf union of $ref members (e.g. Login | RefreshAccessToken).
The generator renders the body parameter as a Member1 | Member2 Python
union type. When this list is empty and request_model is set, the body is
typed as that single class.
pagination_supported defaults to True and is only meaningful on
collection-fetch operations; the generator decides what to do with it on other
operations. filter_supported and sort_supported default to False and will
be flipped on by future x-okapipy-filter / x-okapipy-sort extensions; they
drive whether the generator emits filter() / order_by() on the collection.
response_headers lists the names of headers declared on the chosen 2xx
response, useful to the generator for detecting Link, X-Total-Count, etc.
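The list-envelope detection described for item_model can be sketched roughly as follows. This is an illustration only, not the parser's code: find_item_model, ref_name, and ENVELOPE_KEYS are hypothetical names, and the sketch assumes the response schema is already available as an inline dict.

```python
from typing import Optional

# Envelope property names taken from the Operation description above.
ENVELOPE_KEYS = ("items", "data", "results", "records", "entries")


def ref_name(schema: dict) -> Optional[str]:
    """Return the trailing segment of a $ref, or None for an anonymous schema."""
    ref = schema.get("$ref")
    return ref.rsplit("/", 1)[-1] if isinstance(ref, str) else None


def find_item_model(schema: dict) -> Optional[str]:
    """Hypothetical helper: name the item schema of a list-shaped response."""
    # Case 1: the response body is a plain array.
    if schema.get("type") == "array":
        return ref_name(schema.get("items", {}))
    # Case 2: an object wrapping the list in a well-known array property.
    if schema.get("type") == "object":
        properties = schema.get("properties", {})
        for key in ENVELOPE_KEYS:
            prop = properties.get(key)
            if isinstance(prop, dict) and prop.get("type") == "array":
                return ref_name(prop.get("items", {}))
    # Not list-shaped: item_model stays None.
    return None
```

An anonymous item schema (no $ref) also yields None here, matching the documented behaviour of leaving item_model unset in that case.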
Loading specs and rules
loader
Load an OpenAPI 3.x document from a local path or an http(s) URL.
load_spec is the public entry point. It auto-detects JSON vs YAML from the file
content, fetches the document (off disk for paths, over HTTP for URLs), and returns
the parsed mapping. $ref pointers are deliberately left intact: downstream code
recovers schema names from the original $ref strings, and full reference
resolution would be both unnecessary and prohibitively expensive on real-world
specs (deeply self-referential schemas, unreachable external files).
detect_base_path reads the path component of the first servers[].url, and
strip_base_path removes that prefix from each path key so subsequent path-walking
sees segments relative to the API's logical root.
load_spec
Load an OpenAPI 3.x document, preserving $ref pointers as-is.
The source may be a local filesystem path or an http(s) URL. Format (JSON or YAML) is auto-detected from the file content.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| source | str \| Path | Path or URL pointing to the spec. | required |
Returns:
| Type | Description |
|---|---|
| dict[str, Any] | The parsed spec as a plain dict, with |
Raises:
| Type | Description |
|---|---|
| SpecLoadError | When the document cannot be located, read, or parsed. |
detect_base_path
Return the path component of the spec's first servers[].url, or an empty string.
OpenAPI 3.x uses servers to advertise base URLs; the path portion of the first
server URL is treated as the API's base path. If no servers are declared (or the
URL has no path), the empty string is returned and no stripping is performed.
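A minimal sketch of the base-path logic, using only the standard library. The function names carry a _sketch suffix to mark them as illustrations rather than the package's own implementation; the exact signatures, and the choice to treat a bare "/" as no path, are assumptions.

```python
from urllib.parse import urlsplit


def detect_base_path_sketch(spec: dict) -> str:
    """Return the path component of the first servers[].url, or ""."""
    servers = spec.get("servers") or []
    if not servers:
        return ""
    url = servers[0].get("url", "")
    # Assumption: a trailing slash (including a bare "/") counts as no path.
    return urlsplit(url).path.rstrip("/")


def strip_base_path_sketch(paths: dict, base_path: str) -> dict:
    """Remove base_path from each path key, leaving other keys untouched."""
    if not base_path:
        return dict(paths)
    stripped = {}
    for key, item in paths.items():
        # Match whole segments only, so "/v1" does not strip "/v10/...".
        if key == base_path or key.startswith(base_path + "/"):
            key = key[len(base_path):] or "/"
        stripped[key] = item
    return stripped
```

With a first server URL of https://api.example.com/v1, the detected base path is /v1 and a key such as /v1/users becomes /users.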
rules
External rules file: a project-local override layer for OpenAPI parsing.
A rules file lets a user supply (or override) x-okapipy-ns at the document
root and x-okapipy-kind / x-okapipy-paginated / x-okapipy-exclude on path-items
or operations without editing the OpenAPI document itself. Rules-file values
take precedence over values declared inline in the spec.
The file must be local. URLs are not supported.
Rules
Bases: BaseModel
The full rules document.
PathRules
Bases: BaseModel
Rules entry for a single OpenAPI path.
OperationRules
Bases: BaseModel
Per-method override entry inside a path's rules block.
load_rules
Load a rules file from a local path, returning empty rules when source is None.
The file may be JSON or YAML; the format is auto-detected by attempting JSON first and falling back to YAML. URLs are rejected because the rules file is project-local.
Raises:
| Type | Description |
|---|---|
| RulesFormatError | When the file cannot be read or parsed, or when an |
NLP
nlp
spaCy-backed POS and morphology lookup for path segments.
The classifier needs to know, for a single path segment, whether it looks like a
plural noun (a collection: users, account-tokens), a verb or verb-phrase (an
action: login, force-reimport), or neither. This module produces that summary
by tagging the segment with a small spaCy model.
Three responsibilities live here:
- Map an ISO language code to its spaCy model name (en -> en_core_web_sm).
- Load the spaCy pipeline from a user-controlled cache directory, downloading the model on a cache miss via python -m spacy download --target <cache_dir>.
- Split a segment on -/_, tag each token, and reduce the result to three mutually exclusive flags: is it a verb-phrase, is it plural, or is it singular/unknown. The compound-word logic uses the head-noun rule (last token determines role) with a postmodifier-word exception for constructions like units-of-measure or terms-and-conditions.
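The splitting step and the head-noun rule can be illustrated without spaCy. The sketch below only picks which token would decide the segment's role; it does no tagging. POSTMODIFIER_WORDS and both function names are hypothetical, since the actual word list is not shown here.

```python
import re

# Assumption: the real postmodifier word list is not documented; these
# entries are chosen to cover the two examples given in the text.
POSTMODIFIER_WORDS = {"of", "and"}


def split_segment(segment: str) -> list:
    """Split a path segment on '-' and '_' and drop empty pieces."""
    return [token for token in re.split(r"[-_]", segment.lower()) if token]


def head_token(segment: str) -> str:
    """Pick the token that determines the segment's role."""
    tokens = split_segment(segment)
    if not tokens:
        return ""
    # Postmodifier exception: in "units-of-measure" the head is the token
    # just before the first postmodifier word, not the last token.
    for index, token in enumerate(tokens):
        if index > 0 and token in POSTMODIFIER_WORDS:
            return tokens[index - 1]
    # Default head-noun rule: the last token determines the role.
    return tokens[-1]
```

Under these assumptions account-tokens resolves to tokens, while units-of-measure resolves to units, so both would be judged by a plural noun.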
Two non-obvious workarounds preserve correctness against the small spaCy models:
- Bare path tokens (tokens, users) get mistagged as singular PROPN. To detect plurality reliably, the segment is re-analyzed inside a definite-article wrapper from PLURAL_CONTEXT (e.g. "the tokens"); the head noun then carries the right Number morphology.
- Verbs (reset, submit) keep their VERB tag in isolation but lose it inside the article wrapper. Each token is analyzed both ways and the signals combined.
- A small per-language VERB_ACTION_REGISTRY covers high-traffic API verb endpoints (login, refresh, ping, ...) that spaCy mistags even with the workarounds above.
load_pipeline
Load the spaCy pipeline for lang, downloading it on a cache miss.
The pipeline is cached per-process keyed by (lang, cache_dir), so repeated calls
are cheap. On a cache miss the model is downloaded into cache_dir using
python -m spacy download <model> --target <cache_dir>. Subsequent calls reuse
the on-disk copy without touching the network.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| lang | str | ISO language code; must exist in the language-to-model table. | required |
| cache_dir | Path | Directory under which model packages live. | DEFAULT_CACHE_DIR |
Returns:
| Type | Description |
|---|---|
| Language | A loaded spaCy |
Raises:
| Type | Description |
|---|---|
| NlpModelMissingError | When the language is unknown or the download fails. |
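The per-process cache keyed by (lang, cache_dir) could be approximated with functools.lru_cache. This is a sketch of the caching idea only: _expensive_load is a hypothetical stand-in for the real spaCy load, and the actual module may implement its cache differently.

```python
from functools import lru_cache
from pathlib import Path

# Records every real load so the caching effect is observable.
LOAD_CALLS = []


def _expensive_load(lang: str, cache_dir: Path) -> object:
    """Hypothetical stand-in for loading a spaCy pipeline from disk."""
    LOAD_CALLS.append((lang, cache_dir))
    return object()


@lru_cache(maxsize=None)
def load_pipeline_sketch(lang: str, cache_dir: Path) -> object:
    """Return one pipeline object per distinct (lang, cache_dir) pair."""
    return _expensive_load(lang, cache_dir)
```

Calling the sketch twice with the same arguments returns the same object and triggers a single load; a different cache_dir produces a separate entry.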
fetch_model
Download the spaCy model for lang into cache_dir and return its path.
Uses spaCy's own download command, passing --target so the package is laid
out under cache_dir/<model_name>/... instead of being installed globally.
Raises:
| Type | Description |
|---|---|
| NlpModelMissingError | When the download fails for any reason (network down, unknown model name, pip failure). |
model_path
Return the on-disk directory that holds the spaCy model for lang.
python -m spacy download --target installs the model as a Python package laid
out as <cache_dir>/<package>/<package>-<version>/.... This helper resolves the
versioned subdirectory when one exists, and otherwise returns the package root
(which is what tests using a stub directory will see).
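The directory resolution described above might look like the following sketch. model_path_sketch is a hypothetical name, it takes the package name directly rather than a language code, and picking the lexicographically last match when several versions are present is an assumption.

```python
from pathlib import Path


def model_path_sketch(cache_dir: Path, package: str) -> Path:
    """Resolve <cache_dir>/<package>/<package>-<version>, else the package root."""
    root = cache_dir / package
    if not root.is_dir():
        return root
    # Look for versioned subdirectories such as <package>-<version>.
    versioned = sorted(p for p in root.glob(f"{package}-*") if p.is_dir())
    # Assumption: with several versions on disk, take the last one by name.
    return versioned[-1] if versioned else root
```

A stub directory containing no versioned subdirectory simply resolves to the package root, which is the case the description mentions for tests.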
Classifier and builder
The classifier and builder are the heart of the pipeline. You generally
won't call them directly — use parse(...) — but their docstrings are
the most precise statement of what each phase does.
classifier
Classify a single OpenAPI path segment into a SegmentKind.
The structural builder calls classify_segment once per segment as it walks each
path. The result decides whether the segment becomes a Namespace, Collection,
Resource (path parameter), Singleton, or Action node in the tree.
The classifier applies the following precedence chain, stopping at the first match:
- Path-parameter shape: a segment containing {...} is always a RESOURCE_ID.
- Explicit hint: an x-okapipy-kind value passed in via extension_hint. The caller is responsible for merging spec values with rules-file values (rules win) before passing the hint here.
- Namespace registry: if the cumulative path is declared as a namespace (via spec x-okapipy-ns or rules), the segment is a NAMESPACE.
- NLP signal: analyze_segment reports verb-phrase / plural / singular, producing ACTION, COLLECTION, or (depending on parent) NAMESPACE/COLLECTION.
- Fallback: emit a warning and treat the segment as a COLLECTION.
SINGLETON never falls out of NLP heuristics: real singletons (/me, /health)
look identical to singular-noun namespaces, so the kind is reachable only through
an explicit hint.
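The precedence chain can be sketched as a plain function. Several simplifications are assumptions here: Kind is a stand-in for SegmentKind, the NLP result is passed in as a string instead of coming from analyze_segment, hint values are assumed to be lowercase kind names, and the parent-dependent singular case is reduced to NAMESPACE.

```python
import warnings
from enum import Enum
from typing import Optional, Set


class Kind(Enum):
    """Hypothetical stand-in for SegmentKind."""
    NAMESPACE = "namespace"
    COLLECTION = "collection"
    RESOURCE_ID = "resource_id"
    SINGLETON = "singleton"
    ACTION = "action"


def classify_sketch(segment: str, cumulative_path: str, ns_registry: Set[str],
                    extension_hint: Optional[str],
                    nlp_signal: Optional[str]) -> Kind:
    # 1. Path-parameter shape always wins.
    if "{" in segment and "}" in segment:
        return Kind.RESOURCE_ID
    # 2. Explicit hint (already merged by the caller, rules over spec).
    if extension_hint is not None:
        return Kind(extension_hint)
    # 3. Namespace registry lookup on the cumulative path.
    if cumulative_path in ns_registry:
        return Kind.NAMESPACE
    # 4. NLP signal; "singular" ignores the parent here (simplification).
    if nlp_signal == "verb":
        return Kind.ACTION
    if nlp_signal == "plural":
        return Kind.COLLECTION
    if nlp_signal == "singular":
        return Kind.NAMESPACE
    # 5. Fallback: warn and assume a collection.
    warnings.warn(f"could not classify segment {segment!r}; assuming collection")
    return Kind.COLLECTION
```

Note that SINGLETON is only reachable through step 2 in this sketch, mirroring the statement that it never falls out of the NLP heuristics.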
classify_segment
classify_segment(
    *,
    segment: str,
    cumulative_path: str,
    parent_kind: SegmentKind | None,
    nlp: Language,
    ns_registry: set[str],
    extension_hint: str | None
) -> SegmentKind
Classify a single path segment into a SegmentKind.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| segment | str | The raw segment as it appears between | required |
| cumulative_path | str | The path so far, joined from previous segments without a leading or trailing slash; used for the namespace-registry lookup. | required |
| parent_kind | SegmentKind \| None | The kind of the previous segment, or None when at the root. | required |
| nlp | Language | A loaded spaCy pipeline used for POS and morphology. | required |
| ns_registry | set[str] | The union of namespace paths declared by the spec and rules. | required |
| extension_hint | str \| None | A pre-merged | required |
Returns:
| Type | Description |
|---|---|
| SegmentKind | The classified |
builder
Walk an OpenAPI document and produce a populated APIModel tree.
build is the single public entry. It iterates paths, classifies each segment
via classify_segment, attaches the corresponding node (Namespace,
Collection, Resource, Singleton, or Action) under its parent, and routes
the path-item's HTTP methods to operation slots on that node. The function
mutates the APIModel and its children in place — there are no draft or
wrapper types.
Three concerns live in this module:
- Naming. contextual_name joins the full breadcrumb of singular collection names accumulated so far, so /organizations/{id}/datasources/{id}/force-reimport yields OrganizationDatasourceForceReimport. Resource names use the breadcrumb for the same reason. singularize reduces a plural collection segment via the spaCy-backed lemmatizer.
- Node attachment. Each segment is mapped to a node kind by the classifier; _attach then either creates a new child or reuses an existing one with the same name. Namespace-level actions are valid (e.g. /auth/login); a path that attempts to place an action directly under a Namespace raises InvalidStructureError only when structurally impossible.
- Operation routing. GET/POST on a Collection map to fetch/create; GET/PUT/PATCH/DELETE on a Resource or Singleton map to retrieve/update/partial_update/delete. Operations that don't fit (e.g. POST /users/{id} with no x-okapipy-kind: action hint, PUT on a bare collection) are dropped with a warning rather than coerced into a synthetic action; synthetic actions exist only for explicit x-okapipy-kind: action opt-ins.
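The operation-routing rule reads naturally as a lookup table. The sketch below is an illustration of that table only; ROUTES, route, and the lowercase node-kind strings are hypothetical names, and the real builder works on model objects rather than strings.

```python
from typing import Optional

# (node kind, HTTP method) -> operation slot, as described in the list above.
ROUTES = {
    ("collection", "GET"): "fetch",
    ("collection", "POST"): "create",
}
for _kind in ("resource", "singleton"):
    ROUTES.update({
        (_kind, "GET"): "retrieve",
        (_kind, "PUT"): "update",
        (_kind, "PATCH"): "partial_update",
        (_kind, "DELETE"): "delete",
    })


def route(node_kind: str, method: str) -> Optional[str]:
    """Return the operation slot, or None when the operation does not fit."""
    return ROUTES.get((node_kind, method.upper()))
```

A None result corresponds to the "dropped with a warning" case, for example POST on a resource or PUT on a bare collection.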
Schema names for request_model / response_model are recovered from the
unresolved raw_spec by reading the trailing segment of the original $ref,
falling back to the resolved schema's title when no ref is present.
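As a rough illustration of the name-recovery rule, the following sketch reads the trailing segment of a $ref and falls back to title. schema_name_sketch is a hypothetical helper working on a single schema dict; how the builder pairs raw and resolved schemas is not shown here.

```python
from typing import Optional


def schema_name_sketch(schema: dict) -> Optional[str]:
    """Return the schema name from a $ref, else the inline title, else None."""
    ref = schema.get("$ref")
    if isinstance(ref, str):
        # "#/components/schemas/Order" -> "Order"
        return ref.rsplit("/", 1)[-1]
    return schema.get("title")
```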
x-okapipy-exclude skips whole paths ("*") or specific methods
(["DELETE", ...], case-insensitive); rules-file values override spec values
on every conflict.
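The exclusion check could be sketched like this. Both helpers are hypothetical, and treating a missing rules value (None) as "no override" is an assumption about how the two sources are merged.

```python
from typing import List, Optional, Union

Exclude = Optional[Union[str, List[str]]]


def effective_exclude(spec_value: Exclude, rules_value: Exclude) -> Exclude:
    """Rules-file values win over spec values whenever both are present."""
    return rules_value if rules_value is not None else spec_value


def is_excluded(exclude: Exclude, method: str) -> bool:
    """True when the whole path ("*") or this specific method is excluded."""
    if exclude is None:
        return False
    if exclude == "*":
        return True
    # Method names are compared case-insensitively.
    return method.upper() in {name.upper() for name in exclude}
```

For instance, a spec that excludes only DELETE combined with a rules file that sets "*" results in the whole path being skipped.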
build
build(
    spec: dict[str, Any],
    rules: Rules,
    nlp: Language,
    *,
    strip_prefix: str | None = None,
    unmatched_namespace: str | None = None
) -> APIModel
Construct an APIModel from an OpenAPI document.
$ref pointers in the spec are left intact: schema names for request_model and
response_model are recovered from the trailing segment of each $ref, falling
back to inline schema title when no ref is present.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| spec | dict[str, Any] | The OpenAPI document, with | required |
| rules | Rules | A loaded | required |
| nlp | Language | A loaded spaCy pipeline used by the classifier and naming engine. | required |
| strip_prefix | str \| None | Optional path prefix to strip from every path before classification, e.g. | None |
| unmatched_namespace | str \| None | When set, operations that would otherwise be dropped by the routing table are retained as synthetic actions under a top-level namespace of this name. Raises | None |
Returns:
| Type | Description |
|---|---|
| APIModel | A populated APIModel. |
contextual_name
Return a contextual PascalCase name built from the full breadcrumb chain.
Every singular collection name and singleton segment accumulated in breadcrumb
is concatenated, then the PascalCase form of current is appended. With an
empty breadcrumb, only PascalCase(current) is returned.
Namespaces never enter the breadcrumb — they're pure folders and carry no
semantic ownership. Singletons do, because the elements they host belong
to them (the orders under /me are Me's orders, not generic orders),
which also prevents file-name collisions when a top-level collection and
a singleton sub-collection share a segment (/orders vs /me/orders).
Examples:
    contextual_name([], "orders") == "Orders"
    contextual_name(["Order"], "lines") == "OrderLines"
    contextual_name(["Me"], "orders") == "MeOrders"
    contextual_name(["Organization", "Datasource"], "force-reimport") == "OrganizationDatasourceForceReimport"
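A minimal re-implementation that reproduces the documented examples is sketched below. It is not the package's code: pascal_case and contextual_name_sketch are hypothetical names, and the breadcrumb entries are assumed to be already singular and PascalCased, as in the examples.

```python
import re
from typing import List


def pascal_case(segment: str) -> str:
    """Turn a path segment such as 'force-reimport' into 'ForceReimport'."""
    parts = [part for part in re.split(r"[-_]", segment) if part]
    return "".join(part[:1].upper() + part[1:] for part in parts)


def contextual_name_sketch(breadcrumb: List[str], current: str) -> str:
    """Concatenate the breadcrumb, then append PascalCase(current)."""
    return "".join(breadcrumb) + pascal_case(current)
```

With an empty breadcrumb the result is just the PascalCase form of the current segment, as the description states.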
Dumping the tree
dump
Serialize an APIModel to JSON or YAML, inferring the format from a path.
write
Write the APIModel to path, choosing JSON or YAML by file extension.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| api | APIModel | The model to serialize. | required |
| path | Path | Destination file. The extension must be one of | required |
Raises:
| Type | Description |
|---|---|
| ValueError | When the file extension is not recognized. |
to_json
Return a pretty-printed JSON representation of the APIModel.
Errors
Error hierarchy raised by the okapipy structural parser.
ParserError
Bases: Exception
Base class for all errors raised by the structural parser.
SpecLoadError
Bases: ParserError
Raised when the OpenAPI document cannot be loaded, parsed, or validated.
RulesFormatError
Bases: ParserError
Raised when the rules file cannot be parsed.
NlpModelMissingError
Bases: ParserError
Raised when the requested spaCy model is unavailable and cannot be downloaded.
Attributes:
| Name | Type | Description |
|---|---|---|
| lang | | The ISO language code that was requested. |
| cache_dir | | The directory the loader looked in (and would have downloaded into). |
InvalidStructureError
Bases: ParserError
Raised when the parsed structure violates the okapipy hierarchy rules.
Currently this signals an attempt to attach an Action directly under a Namespace, which is not permitted: every Action must live under a Collection or a Resource.
UnmatchedNamespaceCollisionError
Bases: ParserError
Raised when --unmatched <name> collides with an existing top-level node.
The synthesized container for unmatched operations must not share a snake_case identifier with any top-level Namespace, Collection, Singleton, or Action: that would produce two attributes with the same name on the generated client class. The caller must pick a different name.
Attributes:
| Name | Type | Description |
|---|---|---|
| requested | | The name passed via |
| conflict_kind | | The kind of the conflicting node ( |
| conflict_name | | The original (pre-snake_case) name of the conflicting top-level node. |
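Because every error derives from ParserError, callers can catch the base class and branch on the subclass when they need the extra attributes. The sketch below defines stand-in classes so that it runs on its own; the class names and attributes follow this section, but the constructor signature and the explain helper are assumptions.

```python
from pathlib import Path


class ParserError(Exception):
    """Stand-in for the documented base class."""


class SpecLoadError(ParserError):
    """Stand-in: the OpenAPI document could not be loaded."""


class NlpModelMissingError(ParserError):
    """Stand-in carrying the two documented attributes."""

    def __init__(self, lang: str, cache_dir: Path) -> None:
        # Assumption: the real constructor signature is not documented here.
        super().__init__(f"no spaCy model for {lang!r} under {cache_dir}")
        self.lang = lang
        self.cache_dir = cache_dir


def explain(error: ParserError) -> str:
    """Hypothetical helper: build a user-facing message from a parser error."""
    if isinstance(error, NlpModelMissingError):
        return f"model for {error.lang} missing in {error.cache_dir}"
    return f"parser failure: {error}"
```

In real code the stand-in classes would be replaced by the package's own exceptions, with a single except ParserError clause wrapping the call to parse(...).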