Parser API reference¶

The parser package exposes one supported entry point — parse(...) — plus the Pydantic models that describe the parsed tree. Anything else listed here is a documented internal you may import if you need to build custom tooling on top.

Public entry point¶

parse ¶

parse(
    source: str | Path,
    rules: str | Path | None = None,
    lang: str = "en",
    *,
    strip_prefix: str | None = None,
    nlp_cache_dir: Path = DEFAULT_NLP_CACHE_DIR,
    unmatched_namespace: str | None = None
) -> APIModel

Parse an OpenAPI 3.x document into an APIModel tree.

Parameters:

Name	Type	Description	Default
`source`	`str \| Path`	A local filesystem path or http(s) URL pointing to a JSON or YAML OpenAPI document. Format is auto-detected by content.	required
`rules`	`str \| Path \| None`	Optional local path to a JSON/YAML rules file that mirrors the OpenAPI extension shape (root `x-okapipy-ns` and per-operation `x-okapipy-kind`). URLs are not accepted.	`None`
`lang`	`str`	ISO language code controlling which spaCy model is loaded.	`'en'`
`strip_prefix`	`str \| None`	Optional path prefix to strip from every path before classification, e.g. `/public/v1`. When set, overrides the prefix inferred from `servers[].url`.	`None`
`nlp_cache_dir`	`Path`	Directory under which spaCy models are stored and looked up. On a cache miss the model is downloaded into this directory.	`DEFAULT_NLP_CACHE_DIR`
`unmatched_namespace`	`str \| None`	When set, operations that would otherwise be dropped by the routing table are retained as synthetic actions under a top-level namespace of this name. Raises `UnmatchedNamespaceCollisionError` if the name collides with an existing top-level node.	`None`

Returns:

Type	Description
`APIModel`	The fully-built APIModel rooted at the namespaces it discovered.

The data model¶

The tree returned by parse(...) is a graph of these Pydantic v2 models. They're immutable in spirit (downstream code generators treat them as read-only) and round-trip cleanly through JSON / YAML.

APIModel ¶

Bases: BaseModel

The root of the parsed structural tree.

The root holds top-level namespaces, collections, singletons, and actions. Real-world OpenAPI documents commonly expose all four directly under / — e.g. /orders (collection), /me (singleton), /login (action) — with no namespace prefix.

Namespace ¶

Bases: BaseModel

A folder-like grouping of sub-namespaces, collections, singletons, and actions.

Namespace-level actions (e.g. /auth/login) and namespace-level singletons (e.g. /admin/health) are valid: real APIs commonly host verb endpoints and singleton resources directly under a folder prefix.

Collection ¶

Bases: BaseModel

A plural endpoint that fetches a list of items, creates new items, and contains a Resource.

Collections may also host sub-singletons that represent collection-level aggregate views — /orders/stats, /datasets/summary — alongside the per-item Resource reached via {id}. Sub-collections are not allowed (collection-under-collection has no canonical meaning).

Resource ¶

Bases: BaseModel

The single-item endpoint of a collection (the segment after {id}).

Singleton ¶

Bases: BaseModel

A resourceful endpoint with no enclosing collection.

Examples: /me, /health, /version, or sub-singletons like /users/{id}/avatar. Carries the same CRUD slots as Resource and may host sub-collections, sub-singletons, and actions. Has no resource slot — a Singleton is the resource.

Action ¶

Bases: BaseModel

A non-CRUD endpoint identified by a verb-phrase path segment.

Actions may attach at the root of the API, under a Namespace, under a Collection, under a Resource, or under a Singleton.

attr_override decouples the surface attribute name from the path's last segment. The generator uses it when set (otherwise it falls back to the path-derived snake_case). It exists to support --unmatched, where the attribute should reflect the operation's operationId rather than wherever in the URL it happens to land.

Operation ¶

Bases: BaseModel

A single HTTP operation declared on a path.

response_model names the literal 2xx response body schema (the envelope when the response wraps a list, or the resource itself for single-item responses). item_model names the inner schema of a list envelope when one is detected (plain type: array, or an object with an items/data/results/records/entries array property). The generator uses it so paginated iteration can yield typed model instances instead of raw dicts; left as None when the response isn't list-shaped or the item schema is anonymous.

request_model_members is non-empty when the request body is an inline anyOf / oneOf union of $ref members (e.g. Login | RefreshAccessToken). The generator renders the body parameter as a Member1 | Member2 Python union type. When this list is empty and request_model is set, the body is typed as that single class.

pagination_supported defaults to True and is only meaningful on collection-fetch operations; the generator decides what to do with it on other operations. filter_supported and sort_supported default to False and will be flipped on by future x-okapipy-filter / x-okapipy-sort extensions; they drive whether the generator emits filter() / order_by() on the collection. response_headers lists the names of headers declared on the chosen 2xx response, useful to the generator for detecting Link, X-Total-Count, etc.
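
The list-envelope detection described for item_model can be illustrated with a small sketch. This is not the parser's implementation: the helper names find_item_schema and item_model_name are hypothetical, and the sketch only covers the two shapes named above (a plain array, or an object with one of the listed array properties).

```python
from __future__ import annotations

from typing import Any

# Envelope property names, taken from the description of item_model above.
ENVELOPE_KEYS = ("items", "data", "results", "records", "entries")


def find_item_schema(schema: dict[str, Any]) -> dict[str, Any] | None:
    """Return the inner item schema of a list-shaped response, else None."""
    if schema.get("type") == "array":
        return schema.get("items")
    if schema.get("type") == "object":
        properties = schema.get("properties", {})
        for key in ENVELOPE_KEYS:
            prop = properties.get(key)
            if isinstance(prop, dict) and prop.get("type") == "array":
                return prop.get("items")
    return None


def item_model_name(schema: dict[str, Any]) -> str | None:
    """Name the item schema from its $ref; anonymous item schemas yield None."""
    item = find_item_schema(schema)
    if not item or "$ref" not in item:
        return None
    return item["$ref"].rsplit("/", 1)[-1]


# A hypothetical paginated envelope wrapping a list of User schemas.
envelope = {
    "type": "object",
    "properties": {
        "data": {"type": "array", "items": {"$ref": "#/components/schemas/User"}},
        "total": {"type": "integer"},
    },
}
envelope_item = item_model_name(envelope)
```

In this sketch an inline (anonymous) item schema produces None, mirroring the documented behaviour of leaving item_model unset in that case.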

Loading specs and rules¶

loader ¶

Load an OpenAPI 3.x document from a local path or an http(s) URL.

load_spec is the public entry point. It auto-detects JSON vs YAML from the file content, fetches the document (off disk for paths, over HTTP for URLs), and returns the parsed mapping. $ref pointers are deliberately left intact: downstream code recovers schema names from the original $ref strings, and full reference resolution would be both unnecessary and prohibitively expensive on real-world specs (deeply self-referential schemas, unreachable external files).

detect_base_path reads the path component of the first servers[].url, and strip_base_path removes that prefix from each path key so subsequent path-walking sees segments relative to the API's logical root.

load_spec ¶

load_spec(source: str | Path) -> dict[str, Any]

Load an OpenAPI 3.x document, preserving $ref pointers as-is.

The source may be a local filesystem path or an http(s) URL. Format (JSON or YAML) is auto-detected from the file content.

Parameters:

Name	Type	Description	Default
`source`	`str \| Path`	Path or URL pointing to the spec.	required

Returns:

Type	Description
`dict[str, Any]`	The parsed spec as a plain dict, with `$ref`s left intact.

Raises:

Type	Description
`SpecLoadError`	When the document cannot be located, read, or parsed.

detect_base_path ¶

detect_base_path(spec: dict[str, Any]) -> str

Return the path component of the spec's first servers[].url, or an empty string.

OpenAPI 3.x uses servers to advertise base URLs; the path portion of the first server URL is treated as the API's base path. If no servers are declared (or the URL has no path), the empty string is returned and no stripping is performed.
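
A minimal sketch of the base-path logic described above, written against the standard library only. The function names carry a _sketch suffix to make clear they are stand-ins; details such as trailing-slash handling and leaving non-matching paths untouched are assumptions, not documented behaviour.

```python
from __future__ import annotations

from typing import Any
from urllib.parse import urlsplit


def detect_base_path_sketch(spec: dict[str, Any]) -> str:
    """Return the path component of the first servers[].url, or ''."""
    servers = spec.get("servers") or []
    if not servers:
        return ""
    path = urlsplit(servers[0].get("url", "")).path
    # Assumption: a bare "/" counts as "no path".
    return path.rstrip("/")


def strip_base_path_sketch(paths: dict[str, Any], prefix: str) -> dict[str, Any]:
    """Remove prefix from each path key that starts with it."""
    if not prefix:
        return dict(paths)
    stripped: dict[str, Any] = {}
    for key, item in paths.items():
        if key == prefix or key.startswith(prefix + "/"):
            key = key[len(prefix):] or "/"
        stripped[key] = item
    return stripped


# Hypothetical spec fragment used only for illustration.
spec = {
    "servers": [{"url": "https://api.example.com/public/v1"}],
    "paths": {"/public/v1/orders": {}, "/health": {}},
}
base = detect_base_path_sketch(spec)
relative_paths = strip_base_path_sketch(spec["paths"], base)
```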

rules ¶

External rules file: a project-local override layer for OpenAPI parsing.

A rules file lets a user supply (or override) x-okapipy-ns at the document root and x-okapipy-kind / x-okapipy-paginated / x-okapipy-exclude on path-items or operations without editing the OpenAPI document itself. Rules-file values take precedence over values declared inline in the spec.

The file must be local. URLs are not supported.
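
The precedence rule (rules-file values win over inline spec values) can be sketched as follows. The rules layout shown here, with a paths mapping keyed by path and lowercase method, is an assumption for illustration; the helper effective_extension is hypothetical and not part of the package.

```python
from __future__ import annotations

from typing import Any

# Hypothetical rules content; the exact on-disk layout is assumed.
RULES_PATHS: dict[str, dict[str, dict[str, Any]]] = {
    "/users/{id}/reset": {"post": {"x-okapipy-kind": "action"}},
}


def effective_extension(
    spec_op: dict[str, Any], rules_op: dict[str, Any], key: str
) -> Any:
    """Return the rules-file value for key when present, else the spec value."""
    if key in rules_op:
        return rules_op[key]
    return spec_op.get(key)


# Inline values as they might appear on the operation in the spec.
spec_operation = {"x-okapipy-kind": "collection", "x-okapipy-paginated": False}
rules_operation = RULES_PATHS["/users/{id}/reset"]["post"]

kind = effective_extension(spec_operation, rules_operation, "x-okapipy-kind")
paginated = effective_extension(spec_operation, rules_operation, "x-okapipy-paginated")
```

Here kind comes from the rules entry while paginated falls through to the spec, because the rules entry does not mention it.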

Rules ¶

Bases: BaseModel

The full rules document.

PathRules ¶

Bases: BaseModel

Rules entry for a single OpenAPI path.

OperationRules ¶

Bases: BaseModel

Per-method override entry inside a path's rules block.

load_rules ¶

load_rules(source: str | Path | None) -> Rules

Load a rules file from a local path, returning empty rules when source is None.

The file may be JSON or YAML; the format is auto-detected by attempting JSON first and falling back to YAML. URLs are rejected because the rules file is project-local.

Raises:

Type	Description
`RulesFormatError`	When the file cannot be read or parsed, or when an `x-okapipy-kind` value is not one of the four legal kinds.

NLP¶

nlp ¶

spaCy-backed POS and morphology lookup for path segments.

The classifier needs to know, for a single path segment, whether it looks like a plural noun (a collection: users, account-tokens), a verb or verb-phrase (an action: login, force-reimport), or neither. This module produces that summary by tagging the segment with a small spaCy model.

Three responsibilities live here:

- Map an ISO language code to its spaCy model name (en -> en_core_web_sm).
- Load the spaCy pipeline from a user-controlled cache directory, downloading the model on a cache miss via python -m spacy download --target <cache_dir>.
- Split a segment on -/_, tag each token, and reduce the result to three mutually exclusive flags: is it a verb-phrase, is it plural, or is it singular/unknown. The compound-word logic uses the head-noun rule (last token determines role) with a postmodifier-word exception for constructions like units-of-measure or terms-and-conditions.
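
The splitting step and the head-noun rule can be sketched without spaCy. This only shows which token would be treated as the head; the tagging itself is out of scope here. The POSTMODIFIER_WORDS set and both function names are assumptions made for the example, since this reference does not list the actual postmodifier words.

```python
import re

# Hypothetical postmodifier words; the real list is not given in this reference.
POSTMODIFIER_WORDS = {"of", "and", "for"}


def split_segment(segment: str) -> list[str]:
    """Split a path segment on '-' and '_', dropping empty pieces."""
    return [token for token in re.split(r"[-_]", segment) if token]


def head_token(tokens: list[str]) -> str:
    """Pick the token that determines the segment's role.

    Head-noun rule: the last token decides. Exception: when a postmodifier
    word appears, the token just before it is the head
    (units-of-measure -> units).
    """
    if not tokens:
        raise ValueError("segment has no tokens")
    for index, token in enumerate(tokens):
        if index > 0 and token in POSTMODIFIER_WORDS:
            return tokens[index - 1]
    return tokens[-1]


heads = {
    segment: head_token(split_segment(segment))
    for segment in ("account-tokens", "units-of-measure", "terms-and-conditions")
}
```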

Two non-obvious workarounds preserve correctness against the small spaCy models:

- Bare path tokens (tokens, users) get mistagged as singular PROPN. To detect plurality reliably, the segment is re-analyzed inside a definite-article wrapper from PLURAL_CONTEXT (e.g. "the tokens"); the head noun then carries the right Number morphology.
- Verbs (reset, submit) keep their VERB tag in isolation but lose it inside the article wrapper. Each token is analyzed both ways and the signals combined.

A small per-language VERB_ACTION_REGISTRY covers high-traffic API verb endpoints (login, refresh, ping, ...) that spaCy mistags even with the workarounds above.

DEFAULT_CACHE_DIR `module-attribute` ¶

DEFAULT_CACHE_DIR = cwd() / '.spacy'

load_pipeline ¶

load_pipeline(lang: str, cache_dir: Path = DEFAULT_CACHE_DIR) -> Language

Load the spaCy pipeline for lang, downloading it on a cache miss.

The pipeline is cached per-process keyed by (lang, cache_dir), so repeated calls are cheap. On a cache miss the model is downloaded into cache_dir using python -m spacy download <model> --target <cache_dir>. Subsequent calls reuse the on-disk copy without touching the network.

Parameters:

Name	Type	Description	Default
`lang`	`str`	ISO language code; must exist in the language-to-model table.	required
`cache_dir`	`Path`	Directory under which model packages live.	`DEFAULT_CACHE_DIR`

Returns:

Type	Description
`Language`	A loaded spaCy `Language` pipeline ready for tagging.

Raises:

Type	Description
`NlpModelMissingError`	When the language is unknown or the download fails.
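
The per-process cache keyed by (lang, cache_dir) might look roughly like the sketch below. It is an illustration only: cached_pipeline and the injected loader callable are hypothetical, and a fake loader stands in for the real spaCy load so the example runs without any model installed.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable

# Process-wide cache: one loaded pipeline per (lang, cache_dir) pair.
_PIPELINES: dict[tuple[str, Path], object] = {}


def cached_pipeline(
    lang: str, cache_dir: Path, loader: Callable[[str, Path], object]
) -> object:
    """Return the cached pipeline, invoking loader only on the first request."""
    key = (lang, cache_dir.resolve())
    if key not in _PIPELINES:
        _PIPELINES[key] = loader(lang, cache_dir)
    return _PIPELINES[key]


load_calls: list[tuple[str, Path]] = []


def fake_loader(lang: str, cache_dir: Path) -> object:
    """Stand-in for the real model load; records each invocation."""
    load_calls.append((lang, cache_dir))
    return object()


first = cached_pipeline("en", Path(".spacy"), fake_loader)
second = cached_pipeline("en", Path(".spacy"), fake_loader)
other_dir = cached_pipeline("en", Path("other-cache"), fake_loader)
```

The second call returns the same object without invoking the loader again; a different cache directory is a different key and triggers a fresh load.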

fetch_model ¶

fetch_model(lang: str, cache_dir: Path = DEFAULT_CACHE_DIR) -> Path

Download the spaCy model for lang into cache_dir and return its path.

Uses spaCy's own download command, passing --target so the package is laid out under cache_dir/<model_name>/... instead of being installed globally.

Raises:

Type	Description
`NlpModelMissingError`	When the download fails for any reason (network down, unknown model name, pip failure).

model_path ¶

model_path(lang: str, cache_dir: Path) -> Path

Return the on-disk directory that holds the spaCy model for lang.

python -m spacy download --target installs the model as a Python package laid out as <cache_dir>/<package>/<package>-<version>/.... This helper resolves the versioned subdirectory when one exists, and otherwise returns the package root (which is what tests using a stub directory will see).
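
The directory resolution described above can be sketched with pathlib. The function name, the choice of the lexicographically last match when several versioned directories exist, and the version number used in the demo are all assumptions for illustration.

```python
import tempfile
from pathlib import Path

# Language-to-model mapping as stated in the nlp module description.
LANG_TO_MODEL = {"en": "en_core_web_sm"}


def model_path_sketch(lang: str, cache_dir: Path) -> Path:
    """Resolve <cache_dir>/<package>/<package>-<version> when it exists."""
    package = LANG_TO_MODEL[lang]
    root = cache_dir / package
    if root.is_dir():
        versioned = sorted(p for p in root.glob(f"{package}-*") if p.is_dir())
        if versioned:
            # Assumption: pick the last match if several versions are present.
            return versioned[-1]
    return root


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    # No versioned subdirectory yet: the package root is returned.
    stub_name = model_path_sketch("en", cache).name
    # Simulate an installed model; the version number is made up.
    (cache / "en_core_web_sm" / "en_core_web_sm-3.7.1").mkdir(parents=True)
    versioned_name = model_path_sketch("en", cache).name
```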

Classifier and builder¶

The classifier and builder are the heart of the pipeline. You generally won't call them directly — use parse(...) — but their docstrings are the most precise statement of what each phase does.

classifier ¶

Classify a single OpenAPI path segment into a SegmentKind.

The structural builder calls classify_segment once per segment as it walks each path. The result decides whether the segment becomes a Namespace, Collection, Resource (path parameter), Singleton, or Action node in the tree.

The classifier applies the following precedence chain, stopping at the first match:

1. Path-parameter shape: a segment containing {...} is always a RESOURCE_ID.
2. Explicit hint: an x-okapipy-kind value passed in via extension_hint. The caller is responsible for merging spec values with rules-file values (rules win) before passing the hint here.
3. Namespace registry: if the cumulative path is declared as a namespace (via spec x-okapipy-ns or rules), the segment is a NAMESPACE.
4. NLP signal: analyze_segment reports verb-phrase / plural / singular, producing ACTION, COLLECTION, or (depending on parent) NAMESPACE / COLLECTION.
5. Fallback: emit a warning and treat the segment as a COLLECTION.

SINGLETON never falls out of NLP heuristics: real singletons (/me, /health) look identical to singular-noun namespaces, so the kind is reachable only through an explicit hint.
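
The precedence chain can be sketched as a plain function. This is a simplified stand-in, not the package's classifier: the NLP result is passed in as a string instead of being computed by spaCy, the enum values (lowercase kind names) are assumed, and a singular signal is mapped straight to NAMESPACE even though the real rule depends on the parent kind.

```python
from __future__ import annotations

import warnings
from enum import Enum


class SegmentKind(Enum):
    """Stand-in enum; member values are assumed to match the hint strings."""
    NAMESPACE = "namespace"
    COLLECTION = "collection"
    RESOURCE_ID = "resource_id"
    SINGLETON = "singleton"
    ACTION = "action"


def classify_sketch(
    segment: str,
    cumulative_path: str,
    ns_registry: set[str],
    extension_hint: str | None,
    nlp_signal: str | None,  # "verb", "plural", "singular", or None (assumed)
) -> SegmentKind:
    # 1. Path-parameter shape always wins.
    if "{" in segment and "}" in segment:
        return SegmentKind.RESOURCE_ID
    # 2. Explicit, pre-merged x-okapipy-kind hint.
    if extension_hint is not None:
        return SegmentKind(extension_hint)
    # 3. Namespace registry lookup on the cumulative path.
    if cumulative_path in ns_registry:
        return SegmentKind.NAMESPACE
    # 4. NLP signal (simplified: the parent kind is ignored here).
    if nlp_signal == "verb":
        return SegmentKind.ACTION
    if nlp_signal == "plural":
        return SegmentKind.COLLECTION
    if nlp_signal == "singular":
        return SegmentKind.NAMESPACE
    # 5. Fallback: warn and treat the segment as a collection.
    warnings.warn(f"could not classify {segment!r}; treating it as a collection")
    return SegmentKind.COLLECTION


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    fallback_kind = classify_sketch("v1x", "v1x", set(), None, None)
fallback_warnings = len(caught)
```

Note that SINGLETON is only reachable through step 2 in this sketch, matching the statement above that it never falls out of the NLP heuristics.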

classify_segment ¶

classify_segment(
    *,
    segment: str,
    cumulative_path: str,
    parent_kind: SegmentKind | None,
    nlp: Language,
    ns_registry: set[str],
    extension_hint: str | None
) -> SegmentKind

Classify a single path segment into one of five kinds.

Parameters:

Name	Type	Description	Default
`segment`	`str`	The raw segment as it appears between `/` characters.	required
`cumulative_path`	`str`	The path so far, joined from previous segments without a leading or trailing slash; used for the namespace-registry lookup.	required
`parent_kind`	`SegmentKind \| None`	The kind of the previous segment, or None when at the root.	required
`nlp`	`Language`	A loaded spaCy pipeline used for POS and morphology.	required
`ns_registry`	`set[str]`	The union of namespace paths declared by the spec and rules.	required
`extension_hint`	`str \| None`	A pre-merged `x-okapipy-kind` hint with rules precedence; one of the five kind names, or None.	required

Returns:

Type	Description
`SegmentKind`	The classified `SegmentKind`.

builder ¶

Walk an OpenAPI document and produce a populated APIModel tree.

build is the single public entry. It iterates paths, classifies each segment via classify_segment, attaches the corresponding node (Namespace, Collection, Resource, Singleton, or Action) under its parent, and routes the path-item's HTTP methods to operation slots on that node. The function mutates the APIModel and its children in place — there are no draft or wrapper types.

Three concerns live in this module:

- Naming. contextual_name joins the full breadcrumb of singular collection names accumulated so far, so /organizations/{id}/datasources/{id}/force-reimport yields OrganizationDatasourceForceReimport. Resource names use the breadcrumb for the same reason. singularize reduces a plural collection segment via the spaCy-backed lemmatizer.
- Node attachment. Each segment is mapped to a node kind by the classifier; _attach then either creates a new child or reuses an existing one with the same name. Namespace-level actions are valid (e.g. /auth/login); InvalidStructureError is raised only when a placement is structurally impossible.
- Operation routing. GET/POST on a Collection map to fetch/create; GET/PUT/PATCH/DELETE on a Resource or Singleton map to retrieve/update/partial_update/delete. Operations that don't fit (e.g. POST /users/{id} with no x-okapipy-kind: action hint, PUT on a bare collection) are dropped with a warning rather than coerced into a synthetic action; synthetic actions exist only for explicit x-okapipy-kind: action opt-ins.
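
The operation routing rules above amount to a lookup table. The sketch below encodes exactly the mappings listed; the lowercase node-kind strings and the route helper are hypothetical names chosen for the example.

```python
from __future__ import annotations

# (node kind, HTTP method) -> operation slot, as listed in the routing rules.
ROUTING: dict[tuple[str, str], str] = {
    ("collection", "GET"): "fetch",
    ("collection", "POST"): "create",
}
for _kind in ("resource", "singleton"):
    ROUTING.update(
        {
            (_kind, "GET"): "retrieve",
            (_kind, "PUT"): "update",
            (_kind, "PATCH"): "partial_update",
            (_kind, "DELETE"): "delete",
        }
    )


def route(node_kind: str, method: str) -> str | None:
    """Return the slot name, or None when the operation does not fit."""
    return ROUTING.get((node_kind, method.upper()))


# POST on a resource and PUT on a collection have no slot; per the text above,
# the builder would drop them with a warning.
dropped = [route("resource", "POST"), route("collection", "PUT")]
```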

Schema names for request_model / response_model are recovered from the unresolved raw_spec by reading the trailing segment of the original $ref, falling back to the resolved schema's title when no ref is present. x-okapipy-exclude skips whole paths ("*") or specific methods (["DELETE", ...], case-insensitive); rules-file values override spec values on every conflict.
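
The two x-okapipy-exclude shapes mentioned above ("*" for the whole path, or a case-insensitive list of methods) can be checked with a few lines. The helper name is_excluded is hypothetical, and treating any other value as "not excluded" is an assumption.

```python
from __future__ import annotations

from typing import Any


def is_excluded(exclude: Any, method: str) -> bool:
    """Decide whether method is skipped by an x-okapipy-exclude value."""
    if exclude == "*":
        # Whole path excluded, regardless of method.
        return True
    if isinstance(exclude, list):
        # Method names are compared case-insensitively.
        return method.upper() in {str(item).upper() for item in exclude}
    # Assumption: missing or unrecognized values exclude nothing.
    return False


checks = {
    "star": is_excluded("*", "get"),
    "listed": is_excluded(["delete", "PUT"], "DELETE"),
    "unlisted": is_excluded(["delete"], "GET"),
    "absent": is_excluded(None, "GET"),
}
```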

build ¶

build(
    spec: dict[str, Any],
    rules: Rules,
    nlp: Language,
    *,
    strip_prefix: str | None = None,
    unmatched_namespace: str | None = None
) -> APIModel

Construct an APIModel from an OpenAPI document.

$ref pointers in the spec are left intact: schema names for request_model and response_model are recovered from the trailing segment of each $ref, falling back to inline schema title when no ref is present.
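
Recovering a schema name from an unresolved $ref is a one-liner in spirit. The sketch below shows the idea with a hypothetical helper, schema_name; how the real builder treats schemas with neither a $ref nor a title is not stated here, so returning None is an assumption.

```python
from __future__ import annotations

from typing import Any


def schema_name(schema: dict[str, Any]) -> str | None:
    """Use the trailing $ref segment, falling back to the inline title."""
    ref = schema.get("$ref")
    if isinstance(ref, str):
        return ref.rsplit("/", 1)[-1]
    return schema.get("title")


from_ref = schema_name({"$ref": "#/components/schemas/Order"})
from_title = schema_name({"type": "object", "title": "LoginRequest"})
anonymous = schema_name({"type": "object"})
```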

Parameters:

Name	Type	Description	Default
`spec`	`dict[str, Any]`	The OpenAPI document, with `$ref`s preserved as in the source.	required
`rules`	`Rules`	A loaded `Rules` document (possibly empty).	required
`nlp`	`Language`	A loaded spaCy pipeline used by the classifier and naming engine.	required
`strip_prefix`	`str \| None`	Optional path prefix to strip from every path before classification, e.g. `/public/v1`. When set, this overrides the prefix inferred from `servers[].url`.	`None`
`unmatched_namespace`	`str \| None`	When set, operations that would otherwise be dropped by the routing table are retained as synthetic actions under a top-level namespace of this name. Raises `UnmatchedNamespaceCollisionError` when the name collides with an existing top-level node identifier.	`None`

Returns:

Type	Description
`APIModel`	A populated APIModel.

contextual_name ¶

contextual_name(breadcrumb: list[str], current: str) -> str

Return a contextual PascalCase name built from the full breadcrumb chain.

Every singular collection name and singleton segment accumulated in breadcrumb is concatenated, then the PascalCase form of current is appended. With an empty breadcrumb, only PascalCase(current) is returned.

Namespaces never enter the breadcrumb — they're pure folders and carry no semantic ownership. Singletons do, because the elements they host belong to them (the orders under /me are Me's orders, not generic orders), which also prevents file-name collisions when a top-level collection and a singleton sub-collection share a segment (/orders vs /me/orders).

Examples:

contextual_name([], "orders") == "Orders"
contextual_name(["Order"], "lines") == "OrderLines"
contextual_name(["Me"], "orders") == "MeOrders"
contextual_name(["Organization", "Datasource"], "force-reimport") == "OrganizationDatasourceForceReimport"
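
A minimal reimplementation that reproduces the documented examples is shown below. It is a sketch under the assumption that PascalCase conversion simply splits on - and _ and capitalizes each part; the real function may handle more cases.

```python
import re


def pascal_case(segment: str) -> str:
    """Convert a kebab- or snake-case segment to PascalCase."""
    parts = [part for part in re.split(r"[-_]", segment) if part]
    return "".join(part[:1].upper() + part[1:] for part in parts)


def contextual_name_sketch(breadcrumb: list[str], current: str) -> str:
    """Concatenate the breadcrumb, then append PascalCase(current)."""
    return "".join(breadcrumb) + pascal_case(current)


names = [
    contextual_name_sketch([], "orders"),
    contextual_name_sketch(["Order"], "lines"),
    contextual_name_sketch(["Me"], "orders"),
    contextual_name_sketch(["Organization", "Datasource"], "force-reimport"),
]
```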

Dumping the tree¶

dump ¶

Serialize an APIModel to JSON or YAML, inferring the format from a path.

write ¶

write(api: APIModel, path: Path) -> None

Write the APIModel to path, choosing JSON or YAML by file extension.

Parameters:

Name	Type	Description	Default
`api`	`APIModel`	The model to serialize.	required
`path`	`Path`	Destination file. The extension must be one of `.json`, `.yaml`, `.yml`.	required

Raises:

Type	Description
`ValueError`	When the file extension is not recognized.
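
The extension-based dispatch in write can be sketched as a lookup that raises ValueError for anything else. Only the format choice is shown; serialization is omitted. The helper format_for is hypothetical, and lowercasing the suffix before the lookup is an assumption.

```python
from pathlib import Path

# Recognized extensions, as listed for the path parameter above.
FORMATS = {".json": "json", ".yaml": "yaml", ".yml": "yaml"}


def format_for(path: Path) -> str:
    """Return 'json' or 'yaml' based on the file extension."""
    suffix = path.suffix.lower()  # assumption: extension match ignores case
    if suffix not in FORMATS:
        raise ValueError(f"unrecognized file extension: {path.suffix!r}")
    return FORMATS[suffix]


chosen = [format_for(Path("api.json")), format_for(Path("out/api.yml"))]

try:
    format_for(Path("api.txt"))
    rejected = False
except ValueError:
    rejected = True
```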

to_json ¶

to_json(api: APIModel) -> str

Return a pretty-printed JSON representation of the APIModel.

Errors¶

Error hierarchy raised by the okapipy structural parser.

ParserError ¶

Bases: Exception

Base class for all errors raised by the structural parser.

SpecLoadError ¶

Bases: ParserError

Raised when the OpenAPI document cannot be loaded, parsed, or validated.

RulesFormatError ¶

Bases: ParserError

Raised when the rules file cannot be read or parsed, or when it contains an x-okapipy-kind value that is not a legal kind.

NlpModelMissingError ¶

NlpModelMissingError(lang: str, cache_dir: str)

Bases: ParserError

Raised when the requested spaCy model is unavailable and cannot be downloaded.

Attributes:

Name	Type	Description
`lang`	`str`	The ISO language code that was requested.
`cache_dir`	`str`	The directory the loader looked in (and would have downloaded into).

lang `instance-attribute` ¶

lang = lang

cache_dir `instance-attribute` ¶

cache_dir = cache_dir

InvalidStructureError ¶

Bases: ParserError

Raised when the parsed structure violates the okapipy hierarchy rules.

Namespace-level actions are valid (e.g. /auth/login), as described under Namespace, Action, and the builder. This error is raised only when a path would attach a node in a position that is structurally impossible.

UnmatchedNamespaceCollisionError ¶

UnmatchedNamespaceCollisionError(
    requested: str, conflict_kind: str, conflict_name: str
)

Bases: ParserError

Raised when --unmatched <name> collides with an existing top-level node.

The synthesized container for unmatched operations must not share a snake_case identifier with any top-level Namespace, Collection, Singleton, or Action: that would produce two attributes with the same name on the generated client class. The caller picks a different name.

Attributes:

Name	Type	Description
`requested`	`str`	The name passed via `unmatched_namespace`.
`conflict_kind`	`str`	The kind of the conflicting node (`"namespace"`, `"collection"`, `"singleton"`, or `"action"`).
`conflict_name`	`str`	The original (pre-snake_case) name of the conflicting top-level node.

requested `instance-attribute` ¶

requested = requested

conflict_kind `instance-attribute` ¶

conflict_kind = conflict_kind

conflict_name `instance-attribute` ¶

conflict_name = conflict_name
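
The collision check can be illustrated with stand-in classes that mirror the documented constructor signature. Everything below is a sketch: the classes are local stand-ins, not imports from the package, and the snake_case helper and check_unmatched_name function are hypothetical.

```python
from __future__ import annotations

import re


class ParserError(Exception):
    """Stand-in for the documented base class."""


class UnmatchedNamespaceCollisionError(ParserError):
    """Stand-in mirroring the documented constructor and attributes."""

    def __init__(self, requested: str, conflict_kind: str, conflict_name: str):
        super().__init__(
            f"{requested!r} collides with {conflict_kind} {conflict_name!r}"
        )
        self.requested = requested
        self.conflict_kind = conflict_kind
        self.conflict_name = conflict_name


def snake_case(name: str) -> str:
    """Assumed conversion: split camelCase humps, replace separators, lowercase."""
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    return re.sub(r"[^0-9a-zA-Z]+", "_", spaced).strip("_").lower()


def check_unmatched_name(requested: str, top_level: dict[str, str]) -> None:
    """Raise when requested shares a snake_case identifier with a top-level node.

    top_level maps each original node name to its kind.
    """
    wanted = snake_case(requested)
    for name, kind in top_level.items():
        if snake_case(name) == wanted:
            raise UnmatchedNamespaceCollisionError(requested, kind, name)


try:
    check_unmatched_name("Unmatched", {"orders": "collection", "unmatched": "namespace"})
    collision = None
except ParserError as exc:  # catching the base class also catches the collision
    collision = exc
```

Catching ParserError is enough to handle every error in the hierarchy; the attributes on the collision error let the caller report which node to rename around.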
