Pipeline internals

This page describes the engine's internal layout. Only the CLI surface (`nci index`, `nci query`, `nci sql`) is part of the public contract; everything below is `pub(crate)` and may move between releases. It is documented so contributors and curious users can map “what the CLI does” to a real module.

Crate layout

Module	Surface	What it does
`cli.rs`	public	The clap parser. Every flag the user can type is here.
`config.rs`	public	`NciConfigFile` (the on-disk schema), `PackageScope`, merge order helpers.
`scanner.rs`	internal	Walks install roots and returns one row per discovered package install.
`filter.rs`	internal	Applies `package_scope`, `packages.include`, `packages.exclude`.
`parser.rs`	internal	Reads a `.d.ts`, emits parsed declarations and imports.
`crawler.rs`	internal	Walks the per-package module graph from each entry.
`resolver.rs`	internal	Resolves re-exports and dependency edges (`nci-dep-v1`) to terminal declarations.
`graph.rs`	internal	Owns the in-memory symbol graph between resolve and store.
`dedupe.rs`	internal	Identical-fold and overload-key merging — produces `merge_provenance_json`.
`storage.rs`	internal	Bulk SQLite writes, FTS sync, schema migrations, per-package cache keys.
`pipeline.rs`	internal	Orchestrates the modules above. `nci index` calls in here.
`cache.rs`	internal	Per-package cache so re-indexing only touches changed packages.

High-level data flow

scanner → filter → parser → crawler → resolver → dedupe → graph → storage

Parsing and crawling fan out per package; the resolver and dedupe stages run as the package's graph completes; storage commits in batches.
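As a rough illustration of the stage order above, the following sketch chains simplified stand-ins for the stages. Every type and function name here (`PackageInstall`, `PackageGraph`, `scan_and_filter`, `build_graph`, `store_in_batches`, `run_pipeline`) is hypothetical and chosen for illustration; the real modules are `pub(crate)` and their signatures are not documented on this page.

```rust
// Hypothetical stand-ins for the engine's internal types; not the real API.
#[derive(Debug, Clone, PartialEq)]
struct PackageInstall {
    name: String,
}

#[derive(Debug, PartialEq)]
struct PackageGraph {
    package: String,
    symbols: Vec<String>,
}

/// scanner + filter: take the discovered installs and keep only the wanted ones.
fn scan_and_filter(
    found: Vec<PackageInstall>,
    keep: impl Fn(&PackageInstall) -> bool,
) -> Vec<PackageInstall> {
    found.into_iter().filter(|p| keep(p)).collect()
}

/// parser + crawler + resolver + dedupe, collapsed into one placeholder step
/// that runs once per package and yields that package's finished graph.
fn build_graph(pkg: &PackageInstall) -> PackageGraph {
    // Pretend two declarations resolved to the same symbol, then fold them.
    let mut symbols = vec![
        format!("{}::default", pkg.name),
        format!("{}::default", pkg.name),
    ];
    symbols.dedup();
    PackageGraph {
        package: pkg.name.clone(),
        symbols,
    }
}

/// storage: commit finished graphs in batches; returns how many batches were written.
fn store_in_batches(graphs: &[PackageGraph], batch_size: usize) -> usize {
    // `chunks` panics on a size of zero, so clamp to at least one.
    graphs.chunks(batch_size.max(1)).count()
}

/// The whole flow: discover, filter, build one graph per package, store.
fn run_pipeline(
    found: Vec<PackageInstall>,
    keep: impl Fn(&PackageInstall) -> bool,
    batch_size: usize,
) -> usize {
    let packages = scan_and_filter(found, keep);
    let graphs: Vec<PackageGraph> = packages.iter().map(build_graph).collect();
    store_in_batches(&graphs, batch_size)
}
```

The sketch runs sequentially; in the engine, the per-package step is where the fan-out described above would happen.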

Why split it like this

Splitting parser from crawler keeps parsing pure: the crawler decides how far to walk based on `--max-hops` without re-parsing files.
Splitting resolver from dedupe lets `merge_provenance_json` carry exactly which mechanism merged a row, instead of mixing the two.
Splitting storage from everything else means the writer can run on its own thread; the rest of the pipeline does not block on disk.
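To make the parser/crawler split concrete, here is a minimal sketch of a hop-limited walk over modules that have already been parsed. `ParsedModule`, `crawl`, and the exact meaning given to `max_hops` (number of import edges followed from the entry) are assumptions for illustration, not the engine's actual types or semantics.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical parser output: the import specifiers found in one `.d.ts` file.
struct ParsedModule {
    imports: Vec<String>,
}

/// Breadth-first walk from `entry`, following imports at most `max_hops` edges deep.
/// The crawler only looks up modules in `parsed`; it never parses anything itself.
fn crawl(parsed: &HashMap<String, ParsedModule>, entry: &str, max_hops: usize) -> Vec<String> {
    let mut seen: HashSet<String> = HashSet::new();
    let mut order: Vec<String> = Vec::new();
    let mut queue: VecDeque<(String, usize)> = VecDeque::new();

    seen.insert(entry.to_string());
    queue.push_back((entry.to_string(), 0));

    while let Some((path, hops)) = queue.pop_front() {
        order.push(path.clone());
        if hops == max_hops {
            // Hop budget spent: record this module but do not follow its imports.
            continue;
        }
        if let Some(module) = parsed.get(&path) {
            for import in &module.imports {
                // `insert` returns false for modules already visited or queued.
                if seen.insert(import.clone()) {
                    queue.push_back((import.clone(), hops + 1));
                }
            }
        }
    }
    order
}
```

Because the walk takes the parsed modules by shared reference, changing the hop limit changes only which modules are visited, not what the parser produced for them.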

Index concurrency (`concurrency.rs`)

Before the per-package loop, the pipeline picks a concurrency plan:

Default: one package at a time; multi-core work happens inside that package (parallel file reads and symbol linking when the package is large enough).
`--package-parallel`: several packages at once when more than one package is indexed; per-package multi-core work is turned off; finished packages wait in a bounded queue for the single SQLite writer.
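The two plans above can be sketched as an enum plus a bounded hand-off to one writer thread. This is an illustration under stated assumptions, not the contents of `concurrency.rs`: `ConcurrencyPlan`, `pick_plan`, `index_in_parallel`, the worker count, and the queue capacity of twice the worker count are all hypothetical, as is the fallback to the default plan when only one package is indexed.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Hypothetical model of the two plans described above.
#[derive(Debug, PartialEq)]
enum ConcurrencyPlan {
    /// Default: packages run one by one; parallelism lives inside each package.
    OnePackageAtATime,
    /// `--package-parallel`: several packages at once, no per-package parallelism.
    PackageParallel { workers: usize, queue_capacity: usize },
}

/// Pick a plan from the flag, the number of packages, and the available cores.
fn pick_plan(package_parallel: bool, package_count: usize, cores: usize) -> ConcurrencyPlan {
    if package_parallel && package_count > 1 {
        // Assumed sizing: never more workers than packages or cores.
        let workers = cores.max(1).min(package_count);
        ConcurrencyPlan::PackageParallel {
            workers,
            queue_capacity: workers * 2, // assumed bound, not the real value
        }
    } else {
        ConcurrencyPlan::OnePackageAtATime
    }
}

/// Finished packages wait in a bounded queue for a single writer thread.
/// For brevity this spawns one thread per package instead of a fixed worker pool.
fn index_in_parallel(packages: Vec<String>, queue_capacity: usize) -> Vec<String> {
    // `sync_channel` blocks senders once `queue_capacity` items are waiting.
    let (tx, rx) = sync_channel::<String>(queue_capacity);

    // The single writer drains the queue until every sender has been dropped.
    let writer = thread::spawn(move || {
        let mut committed = Vec::new();
        for finished in rx {
            committed.push(finished);
        }
        committed
    });

    let workers: Vec<_> = packages
        .into_iter()
        .map(|pkg| {
            let tx = tx.clone();
            thread::spawn(move || {
                // Placeholder for the real per-package indexing work.
                tx.send(format!("{}:indexed", pkg)).expect("writer hung up");
            })
        })
        .collect();

    drop(tx); // drop the original sender so the writer loop can end
    for worker in workers {
        worker.join().expect("worker panicked");
    }
    writer.join().expect("writer panicked")
}
```

The bounded channel gives back-pressure: if the writer falls behind, workers block on `send` rather than piling finished packages up in memory. The order in which packages reach the writer is not deterministic.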

`nci thread-budget --package-count N` prints that plan without indexing. See Indexing · Concurrency for a plain-language summary and how to read timing lines.