Parsing and crawling fan out per package; the resolver and dedupe stages run as each package's graph completes; storage commits in batches.
Pipeline internals
This page describes the engine's internal layout. Only the CLI surface (nci index, nci query, nci sql) is part of the public contract — everything below is pub(crate) and may move between releases. It is documented so contributors and curious users can map “what the CLI does” to a real module.
Crate layout
| Module | Surface | What it does |
|---|---|---|
| cli.rs | public | The clap parser. Every flag the user can type is here. |
| config.rs | public | NciConfigFile (the on-disk schema), PackageScope, merge order helpers. |
| scanner.rs | internal | Walks install roots and returns one row per discovered package install. |
| filter.rs | internal | Applies package_scope, packages.include, packages.exclude. |
| parser.rs | internal | Reads a .d.ts, emits parsed declarations and imports. |
| crawler.rs | internal | Walks the per-package module graph from each entry. |
| resolver.rs | internal | Resolves re-exports and dependency edges (nci-dep-v1) to terminal declarations. |
| graph.rs | internal | Owns the in-memory symbol graph between resolve and store. |
| dedupe.rs | internal | Identical-fold and overload-key merging; produces merge_provenance_json. |
| storage.rs | internal | Bulk SQLite writes, FTS sync, schema migrations, per-package cache keys. |
| pipeline.rs | internal | Orchestrates the modules above. nci index calls in here. |
| cache.rs | internal | Per-package cache so re-indexing only touches changed packages. |
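To make the parser and crawler rows concrete, here is a minimal sketch of a hop-limited walk over a module graph. It is not the engine's code: `ParsedFile`, `crawl`, and the `parse` callback are hypothetical names, and `max_hops` only mirrors the idea behind the `--max-hops` flag. The walk is breadth-first, calls the parser at most once per file, and stops following imports once the hop limit is reached.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical parser output: what a file declares and which files it imports.
#[derive(Debug, Clone, Default)]
struct ParsedFile {
    declarations: Vec<String>,
    imports: Vec<String>,
}

/// Breadth-first walk from `entry`, following imports at most `max_hops` edges deep.
/// `parse` stands in for the parser stage and is called once per reachable file.
fn crawl<F>(entry: &str, max_hops: usize, mut parse: F) -> HashMap<String, ParsedFile>
where
    F: FnMut(&str) -> ParsedFile,
{
    let mut seen: HashSet<String> = HashSet::new();
    let mut out: HashMap<String, ParsedFile> = HashMap::new();
    let mut queue: VecDeque<(String, usize)> = VecDeque::new();

    seen.insert(entry.to_string());
    queue.push_back((entry.to_string(), 0));

    while let Some((path, hops)) = queue.pop_front() {
        let parsed = parse(&path);
        // Only the crawler looks at the hop budget; the parser stays pure.
        if hops < max_hops {
            for import in &parsed.imports {
                // `insert` returns false for files already queued, so cycles terminate.
                if seen.insert(import.clone()) {
                    queue.push_back((import.clone(), hops + 1));
                }
            }
        }
        out.insert(path, parsed);
    }
    out
}
```

Calling `crawl("index.d.ts", 1, parse)` in this sketch would return the entry file plus its direct imports and nothing deeper.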
High-level data flow
- scanner
- filter
- parser
- crawler
- resolver
- dedupe
- graph
- storage
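The stage order above can be sketched as a chain of functions with a per-package fan-out. Everything in this block is an assumption made for illustration: the types `PackageInstall` and `PackageSymbols`, the stand-in stage bodies, the `reexport:` prefix, and the use of `std::thread::scope` for the fan-out are not taken from the engine. The graph and storage stages are represented only by the returned `Vec`.

```rust
use std::thread;

/// Hypothetical scanner row: one discovered package install.
#[derive(Debug, Clone)]
struct PackageInstall {
    name: String,
    /// Stand-in for what parsing and crawling the package would discover.
    raw_symbols: Vec<String>,
}

/// Hypothetical per-package result that would be handed to storage.
#[derive(Debug, PartialEq)]
struct PackageSymbols {
    package: String,
    symbols: Vec<String>,
}

/// filter stand-in: drop installs whose name is excluded.
fn filter(installs: Vec<PackageInstall>, exclude: &[&str]) -> Vec<PackageInstall> {
    installs
        .into_iter()
        .filter(|p| !exclude.contains(&p.name.as_str()))
        .collect()
}

/// parser + crawler stand-in: return the package's raw symbols.
fn parse_and_crawl(install: &PackageInstall) -> Vec<String> {
    install.raw_symbols.clone()
}

/// resolver stand-in: collapse a made-up "reexport:" alias to its terminal name.
fn resolve(symbols: Vec<String>) -> Vec<String> {
    symbols
        .into_iter()
        .map(|s| s.trim_start_matches("reexport:").to_string())
        .collect()
}

/// dedupe stand-in: sort, then drop identical names.
fn dedupe(mut symbols: Vec<String>) -> Vec<String> {
    symbols.sort();
    symbols.dedup();
    symbols
}

/// Runs the stages in order, with one scoped worker thread per kept package.
fn run_pipeline(installs: Vec<PackageInstall>, exclude: &[&str]) -> Vec<PackageSymbols> {
    let kept = filter(installs, exclude);
    thread::scope(|scope| {
        // Spawn every worker first, then join in spawn order to keep output stable.
        let handles: Vec<_> = kept
            .iter()
            .map(|install| {
                scope.spawn(move || PackageSymbols {
                    package: install.name.clone(),
                    symbols: dedupe(resolve(parse_and_crawl(install))),
                })
            })
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("package worker panicked"))
            .collect()
    })
}
```

A real implementation would likely bound the number of workers instead of spawning one thread per package; the sketch only shows where the fan-out sits relative to the stages.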
Why split it like this
- Splitting parser from crawler keeps parsing pure: the crawler decides how far to walk based on --max-hops without re-parsing files.
- Splitting resolver from dedupe lets merge_provenance_json carry exactly which mechanism merged a row, instead of mixing the two.
- Splitting storage from everything else means the writer can run on its own thread; the rest of the pipeline does not block on disk.
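The last point, a writer on its own thread that commits in batches, can be sketched with a standard-library channel. This is an illustration under stated assumptions, not the storage module's code: `Row`, `spawn_writer`, and the in-memory `committed` list are hypothetical, and a real writer would presumably wrap each batch in a SQLite transaction rather than push it onto a `Vec`.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical row type; the real rows would be symbol records.
type Row = String;

/// Spawns a writer thread that drains the channel and "commits" rows in batches.
/// Returns the sending half plus a handle that yields every committed batch.
fn spawn_writer(batch_size: usize) -> (mpsc::Sender<Row>, thread::JoinHandle<Vec<Vec<Row>>>) {
    // Treat a batch size of 0 as 1 so the loop below always makes progress.
    let batch_size = batch_size.max(1);
    let (tx, rx) = mpsc::channel::<Row>();

    let handle = thread::spawn(move || {
        let mut committed: Vec<Vec<Row>> = Vec::new();
        let mut batch: Vec<Row> = Vec::with_capacity(batch_size);

        // Iteration ends once every Sender has been dropped.
        for row in rx {
            batch.push(row);
            if batch.len() >= batch_size {
                // Stand-in for one transaction commit.
                committed.push(std::mem::take(&mut batch));
            }
        }
        // Flush the final partial batch.
        if !batch.is_empty() {
            committed.push(batch);
        }
        committed
    });

    (tx, handle)
}
```

Producers call `tx.send(row)` and continue immediately; dropping the last sender signals the writer to flush and exit, after which `handle.join()` returns the batches.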
For the on-disk shape the storage stage produces, see SQLite schema. For how the resolver decides what is “the same” symbol, see Re-export resolution.