/ changelog

What we shipped.

Public build log. Truthful, technical, no marketing. If you'd rather read the why behind it, see the thesis.

  1. 2026-05-05

    Entity resolution v1: deterministic floor + probabilistic match

    Layered architecture for resolving the same contractor / project across messy public bid sources. Deterministic exact-match catches the obvious cases; Splink-backed probabilistic matching handles the long tail of name/address variations. Eval set + Wilson confidence intervals make the lift measurable.

    intelligence, data
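A minimal sketch of the layered idea in the entry above, not the shipped pipeline. It assumes a hypothetical `known` mapping from normalized contractor names to entity ids, and it uses the standard library's `difflib` as a stand-in for the Splink-backed probabilistic layer. The suffix list, the `0.85` threshold, and every function name are illustrative assumptions.

```python
from __future__ import annotations

import re
from difflib import SequenceMatcher

# Hypothetical list of legal suffixes to ignore when comparing names.
SUFFIXES = {"inc", "llc", "co", "corp", "company", "incorporated"}


def normalize(name: str) -> str:
    """Lowercase, strip punctuation, and drop common legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)


def resolve(name: str, known: dict[str, str], threshold: float = 0.85) -> tuple[str | None, str]:
    """Return (entity_id, method); `known` maps normalized names to entity ids."""
    key = normalize(name)

    # Layer 1: deterministic floor, an exact match on the normalized key.
    if key in known:
        return known[key], "deterministic"

    # Layer 2: fuzzy fallback (a simple stand-in for a probabilistic model).
    best_id, best_score = None, 0.0
    for candidate, entity_id in known.items():
        score = SequenceMatcher(None, key, candidate).ratio()
        if score > best_score:
            best_id, best_score = entity_id, score

    if best_score >= threshold:
        return best_id, "fuzzy"
    return None, "unmatched"
```

Returning the method alongside the id keeps the two layers separately measurable, which is what lets an eval set attribute lift to the second layer.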
  2. 2026-05-05

    NCDOT 2024 ingestion + cross-source resolution

    North Carolina DOT bid-tab XLS ingestion, end-to-end. The interesting part isn't the scrape — it's resolving entities across NCDOT, FDOT, and the project ontology so the same contractor in two states is one record.

    scrapersdata
  3. 2026-05-05

    Next.js 16 site + Supabase waitlist wired

    Migrated cassandri.com from a static prototype to a Next.js 16 app on Vercel. Waitlist signups now persist to Supabase. Foundation for the rest of the surface area.

    web
  4. 2026-05-05

    Scrapers SOP: onboarding skill + manifest + health checks

    Standard operating procedure for adding a new public source: onboarding skill, manifest schema, health check, fixture invariants. Phase 1 + Phase 6 gates hard-enforced. Adding the next state is now a checklist, not a snowflake.

    scrapers, process
  5. 2026-05-04

    ER eval set + Wilson CI baseline

    Cross-source entity resolution now has a versioned eval set, with Wilson confidence intervals on precision/recall. Future ER changes are measured against this baseline, not eyeballed.

    intelligence, evals
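For readers unfamiliar with the interval named in the entry above, here is a minimal sketch of a Wilson score interval for a proportion such as precision (TP / (TP + FP)) or recall (TP / (TP + FN)). The function name and the default `z = 1.96` (roughly a 95% interval) are assumptions for illustration, not necessarily what the eval harness uses.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion successes / trials."""
    if trials == 0:
        # No observations: the proportion is unconstrained.
        return (0.0, 1.0)

    p = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    center = (p + z2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials)) / denom

    # Clamp to [0, 1] to absorb floating-point rounding at the edges.
    return (max(0.0, center - half_width), min(1.0, center + half_width))
```

For precision, pass true positives as `successes` and all predicted matches as `trials`; for recall, pass true positives and all labeled matches. Unlike the plain normal approximation, the interval stays inside [0, 1] and remains usable when the eval set is small or the rate is near 0 or 1.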
  6. 2026-05-04

    Splink probabilistic ER pipeline + Parquet bridge

    Splink (probabilistic record linkage) integrated into the ER pipeline with a Parquet bridge between the warehouse and the matcher. Handles the cases deterministic match can't: name variants, address normalization, partial matches.

    intelligence, data
  7. 2026-05-04

    FDOT 2025 backfill + multi-slug URL resolution

    Florida DOT publishes the same bid letting under multiple URL slugs depending on the year and letting type. Backfilled 2025 + made the resolver tolerant of the slug variations so we don't miss future lettings.

    scrapersdata
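One way slug-tolerant resolution like the entry above can work, as a sketch under assumptions: build every candidate URL from a list of slug templates and take the first one that exists. The templates, the base URL, and the `exists` callback are hypothetical placeholders, not FDOT's actual URL scheme or the real resolver.

```python
from typing import Callable, Iterable, Optional


def candidate_urls(base: str, year: int, letting_id: str, slug_templates: Iterable[str]) -> list[str]:
    """Expand each slug template into a full candidate URL."""
    root = base.rstrip("/")
    return [f"{root}/{t.format(year=year, letting_id=letting_id)}" for t in slug_templates]


def resolve_letting_url(candidates: Iterable[str], exists: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate that `exists` confirms, or None if none do."""
    for url in candidates:
        if exists(url):
            return url
    return None
```

Passing `exists` in as a callback keeps the network call out of the resolver, so the slug logic can be tested against fixtures without hitting the live site.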
  8. 2026-05-04

    FDOT bid-tab scraper end-to-end

    First public source landed: Florida DOT bid tabs, scraped, parsed, and persisted. Sets the pattern every subsequent state follows.

    scrapers
  9. 2026-05-04

    dbt scaffolding + synthetic bid_tabs source

    dbt models for bid_tabs and a synthetic source so transforms can be tested before real data lands. Lets us evolve the schema with confidence.

    data
  10. 2026-05-04

    Engine + ontology layer scaffolded

    Foundation for the intelligence layer: an engine that runs over a typed ontology of construction-domain entities (contractors, projects, lettings, line items) so downstream questions are queries against meaning, not text.

    intelligencedata
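To make "queries against meaning, not text" concrete, here is a small sketch of typed entities for the four types named in the entry above, plus one query over them. All class, field, and function names are hypothetical; the real ontology may look quite different.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Contractor:
    contractor_id: str
    name: str


@dataclass(frozen=True)
class Project:
    project_id: str
    description: str


@dataclass(frozen=True)
class Letting:
    letting_id: str
    project_id: str
    date: str  # ISO 8601 date string, e.g. "2025-03-14"


@dataclass(frozen=True)
class LineItem:
    letting_id: str
    contractor_id: str
    item_code: str
    quantity: float
    unit_price: float


def bid_totals(line_items: list[LineItem], letting_id: str) -> dict[str, float]:
    """Total bid per contractor for one letting, keyed by contractor id."""
    totals: dict[str, float] = defaultdict(float)
    for item in line_items:
        if item.letting_id == letting_id:
            totals[item.contractor_id] += item.quantity * item.unit_price
    return dict(totals)
```

Because line items reference contractors by resolved id rather than by name string, a question like "who bid lowest on this letting" becomes a lookup over typed records instead of a text search.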
  11. 2026-05-04

    Supabase pooler + R2 archival

    Connected to Supabase via Session Pooler for warehouse workloads; raw scraped artifacts archived to Cloudflare R2 so we always have provenance.

    infra