Pull raw sources into wiki packages and distill them through T0–T3.
@alembic/ingestion turns files, browser snapshots, or API exports into append-only wiki packages: a standard folder with source.md, understanding.md, research-index.md, qa.md, metadata.json, chunks.jsonl, media_manifest.json, and raw.pointer.json.
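As a rough illustration of that folder layout, the check below reports which of the eight files a package is missing. The helper name missingPackageFiles is hypothetical and not part of @alembic/ingestion; only the file names come from the description above.

```typescript
// The eight files named in the package description above.
const REQUIRED_FILES = [
  "source.md",
  "understanding.md",
  "research-index.md",
  "qa.md",
  "metadata.json",
  "chunks.jsonl",
  "media_manifest.json",
  "raw.pointer.json",
] as const;

// Hypothetical helper (an assumption, not an @alembic/ingestion API):
// given the file names found in a package folder, return the required
// names that are absent, in the order listed above.
function missingPackageFiles(present: readonly string[]): string[] {
  const seen = new Set<string>(present);
  return REQUIRED_FILES.filter((name) => !seen.has(name));
}
```

Feeding it the output of a directory listing would be one way to sanity-check a package after ingestion; an empty result means all eight files are present.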
@alembic/etl then runs a tiered pipeline over those packages: T0 scores and deduplicates them deterministically; T1–T3 extract, shortlist, and verify business signals with increasingly powerful models.
Think of it like a gold refinery: ore (raw files) is crushed, assayed, and smelted into bars (verified signals).
The collector contract has eight phases: preflight → read cursor → select work → capture → materialize → reindex → validate → audit. Cursors guarantee idempotent re-runs. ETL uses SHA-256 dedupe, a six-axis package score (completeness, accuracy, clarity, actionability, novelty, provenance), a fail-closed budget guard, and PII redaction before emitting private-channel signals.
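The SHA-256 dedupe step can be pictured with a minimal sketch like the one below. It assumes dedupe is keyed on the exact text of each package and that the first occurrence wins; the PackageText shape and both function names are hypothetical, and the real ETL may hash a different field or normalize content first.

```typescript
import { createHash } from "node:crypto";

// Hypothetical record shape; the actual package record is not specified here.
interface PackageText {
  id: string;
  text: string;
}

// Hex-encoded SHA-256 digest of a UTF-8 string.
function sha256Hex(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Keep the first package seen for each digest; later exact duplicates are dropped.
// The result is deterministic because it depends only on input order and content.
function dedupeByContent(packages: readonly PackageText[]): PackageText[] {
  const seen = new Set<string>();
  const kept: PackageText[] = [];
  for (const pkg of packages) {
    const digest = sha256Hex(pkg.text);
    if (seen.has(digest)) continue;
    seen.add(digest);
    kept.push(pkg);
  }
  return kept;
}
```

Because the digest depends only on content, re-running the step over the same inputs keeps the same packages, which fits the deterministic role described for T0.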
    # ingest a folder of notes
    alembic ingest ./notes

    # run the full distill funnel
    alembic distill ./corpus --from discover --to review
runT0Pipeline is the deterministic substrate. It emits scored packages and a _alembic-residue.jsonl for items that need higher tiers. The funnel in @alembic/harness orchestrates T1–T3, using council and verifier gates before appending verified-GO signals to the opportunity graph.
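To make the residue idea concrete, here is a small sketch of splitting scored items into those settled at T0 and those written out as JSON Lines for higher tiers. The ScoredItem shape, the function names, and the caller-supplied needsHigherTier predicate are all assumptions for illustration; the text does not say how runT0Pipeline decides what counts as residue.

```typescript
// Hypothetical item shape; the real T0 output schema is not given in the text.
interface ScoredItem {
  id: string;
  score: number;
}

interface T0Split {
  settled: ScoredItem[];
  residue: ScoredItem[];
}

// Partition items using a caller-supplied rule. The rule is deliberately a
// parameter because the actual residue criterion is unknown here.
function splitResidue(
  items: readonly ScoredItem[],
  needsHigherTier: (item: ScoredItem) => boolean,
): T0Split {
  const settled: ScoredItem[] = [];
  const residue: ScoredItem[] = [];
  for (const item of items) {
    (needsHigherTier(item) ? residue : settled).push(item);
  }
  return { settled, residue };
}

// Serialize items as JSON Lines: one JSON object per line, newline-terminated.
function toJsonl(items: readonly ScoredItem[]): string {
  return items.map((item) => JSON.stringify(item) + "\n").join("");
}
```

The string returned by toJsonl is what one might write to a file such as _alembic-residue.jsonl, leaving the settled items for direct output.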
Create a Markdown file with a business idea and run alembic ingest. Inspect the generated wiki package structure.