Step 04 · Data layer · Ingestion and ETL
Alembic Complete Visual Course

Ingestion and ETL

Pull raw sources into wiki packages and distill them through T0–T3.

1. From sources to signals

@alembic/ingestion turns files, browser snapshots, or API exports into append-only wiki packages: a standard folder with source.md, understanding.md, research-index.md, qa.md, metadata.json, chunks.jsonl, media_manifest.json, and raw.pointer.json.
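As a rough illustration of the package layout, the sketch below checks a list of file names against the eight files named above. The helper `missingPackageFiles` is hypothetical and is not part of `@alembic/ingestion`; only the file names come from this section.

```typescript
// The eight files that make up a wiki package, as listed in the text above.
const PACKAGE_FILES: readonly string[] = [
  "source.md",
  "understanding.md",
  "research-index.md",
  "qa.md",
  "metadata.json",
  "chunks.jsonl",
  "media_manifest.json",
  "raw.pointer.json",
];

// Hypothetical helper (an assumption for illustration): given the file names
// found in a folder, return the expected package files that are absent.
function missingPackageFiles(present: string[]): string[] {
  const found = new Set(present);
  return PACKAGE_FILES.filter((name) => !found.has(name));
}

// Usage: a folder holding only two of the files is missing the other six.
console.log(missingPackageFiles(["source.md", "qa.md"]));
```

A check like this is one way to confirm, after running an ingest, that the folder you are inspecting is a complete package.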

@alembic/etl then runs a tiered pipeline over those packages: T0 scores and deduplicates them deterministically; T1–T3 extract, shortlist, and verify business signals with increasingly powerful models.
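To make the deterministic deduplication step concrete, here is a minimal sketch of content-hash dedupe using SHA-256 from Node's built-in `crypto` module. The `RawPackage` shape and the `dedupe` function are assumptions for illustration; the actual T0 implementation may hash different fields or normalize content first.

```typescript
import { createHash } from "node:crypto";

// Hypothetical minimal shape; a real package carries more fields.
interface RawPackage {
  id: string;
  source: string; // stands in for the contents of source.md
}

// Hex-encoded SHA-256 of the text; identical text always yields the same digest.
function contentHash(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Keep the first package seen for each digest and drop later duplicates.
function dedupe(packages: RawPackage[]): RawPackage[] {
  const seen = new Set<string>();
  const unique: RawPackage[] = [];
  for (const pkg of packages) {
    const digest = contentHash(pkg.source);
    if (!seen.has(digest)) {
      seen.add(digest);
      unique.push(pkg);
    }
  }
  return unique;
}

// Usage: "a" and "b" have identical content, so only "a" and "c" survive.
const kept = dedupe([
  { id: "a", source: "Sell shovels to miners." },
  { id: "b", source: "Sell shovels to miners." },
  { id: "c", source: "Rent shovels to miners." },
]);
console.log(kept.map((p) => p.id));
```

Because the digest depends only on the content, running this twice over the same input gives the same result, which is what makes the step deterministic.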

Think of it like… a gold refinery: ore (raw files) is crushed, assayed, and smelted into bars (verified signals).

Under the hood

The collector contract has eight phases: preflight → read cursor → select work → capture → materialize → reindex → validate → audit. Cursors guarantee idempotent re-runs. ETL uses SHA-256 dedupe, a six-axis package score (completeness, accuracy, clarity, actionability, novelty, provenance), a fail-closed budget guard, and PII redaction before emitting private-channel signals.
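Two of the ideas above, cursor-based idempotency and a fail-closed budget guard, can be sketched in a few lines. Everything here is an assumption made for illustration: the cursor is modeled as a set of already-processed ids, the run collapses the eight phases into select and capture, and `budgetAllows` is a hypothetical name rather than an Alembic API.

```typescript
// The eight collector phases, in the order given in the text above.
const PHASES: readonly string[] = [
  "preflight", "read cursor", "select work", "capture",
  "materialize", "reindex", "validate", "audit",
];

// Hypothetical cursor: the set of item ids that earlier runs already handled.
type Cursor = Set<string>;

// "select work": only items the cursor has not seen are eligible.
function selectWork(items: string[], cursor: Cursor): string[] {
  return items.filter((id) => !cursor.has(id));
}

// One simplified run: process the unseen items, then advance the cursor.
function runCollector(items: string[], cursor: Cursor): string[] {
  const work = selectWork(items, cursor);
  for (const id of work) {
    cursor.add(id);
  }
  return work;
}

// Fail-closed guard: a missing or non-finite value blocks the call
// instead of letting it through.
function budgetAllows(remaining: number | undefined, cost: number): boolean {
  if (remaining === undefined) return false;
  if (!Number.isFinite(remaining) || !Number.isFinite(cost)) return false;
  return cost >= 0 && cost <= remaining;
}

// Usage: the second run over the same items finds no new work.
const cursor: Cursor = new Set<string>();
const firstRun = runCollector(["note-1", "note-2"], cursor);
const secondRun = runCollector(["note-1", "note-2"], cursor);
console.log(PHASES.length, firstRun.length, secondRun.length);
console.log(budgetAllows(undefined, 1), budgetAllows(5, 1));
```

The design point in both cases is the default: with a cursor, repeating a run does nothing new; with a fail-closed guard, uncertainty about the budget means the spend is refused.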

2. In one picture

[Figure: T0 ETL (score / dedupe) → T1 Extract (local model) → T2 Shortlist (frontier model) → T3 Verify (council + verifier) → GO]
Raw wiki packages move through T0 scoring, T1 extraction, T2 shortlisting, and T3 verification.

3. In the code

# ingest a folder of notes
alembic ingest ./notes

# run the full distill funnel
alembic distill ./corpus --from discover --to review

Implementation notes

runT0Pipeline is the deterministic substrate: it emits scored packages plus an _alembic-residue.jsonl file listing the items that need higher tiers. The funnel in @alembic/harness orchestrates T1–T3 and applies council and verifier gates before appending verified-GO signals to the opportunity graph.
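The following sketch shows one way a T0-style pass could score packages on the six axes and write residue lines in JSON Lines form. It does not reproduce `runT0Pipeline`: the 0 to 1 score range, the unweighted mean, the threshold rule for what counts as residue, and every function name are assumptions made for illustration.

```typescript
// The six axes named in this lesson; the 0..1 range is an assumption.
interface AxisScores {
  completeness: number;
  accuracy: number;
  clarity: number;
  actionability: number;
  novelty: number;
  provenance: number;
}

interface ScoredPackage {
  id: string;
  axes: AxisScores;
}

// Assumed aggregate: the unweighted mean of the six axes.
function overallScore(axes: AxisScores): number {
  const values = [
    axes.completeness, axes.accuracy, axes.clarity,
    axes.actionability, axes.novelty, axes.provenance,
  ];
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Hypothetical routing rule: packages scoring at or above the threshold are
// written as residue, one JSON object per line, for the higher tiers.
function toResidueJsonl(packages: ScoredPackage[], threshold: number): string {
  return packages
    .filter((p) => overallScore(p.axes) >= threshold)
    .map((p) => JSON.stringify({ id: p.id, score: overallScore(p.axes) }))
    .join("\n");
}

// Helper for the demo: the same value on every axis.
function uniformAxes(v: number): AxisScores {
  return {
    completeness: v, accuracy: v, clarity: v,
    actionability: v, novelty: v, provenance: v,
  };
}

// Usage: with a threshold of 0.5, package "c" (0.25) is left out.
const residue = toResidueJsonl(
  [
    { id: "a", axes: uniformAxes(1) },
    { id: "b", axes: uniformAxes(0.5) },
    { id: "c", axes: uniformAxes(0.25) },
  ],
  0.5,
);
console.log(residue);
```

JSON Lines suits an append-only residue file because each record is a self-contained line that later tiers can stream without parsing the whole file.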

4. Try it

Create a Markdown file containing a business idea and run alembic ingest on it. Then inspect the structure of the generated wiki package.

Observe: What happens in the output when you ingest the same file twice?

5. Quick check

What does T0 in the ETL pipeline do?