Single-file browser prototype. It extracts text with PDF.js, classifies pages into reusable document families, applies deterministic extraction rules, normalizes the data into a single JSON schema, and shows the source page and confidence for extracted values.
No LLM, no backend.
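Deterministic page classification can be as simple as keyword scoring. The sketch below assumes plain page text is already available (for example, joined from PDF.js text items). The family names and keyword lists are hypothetical illustrations, not the prototype's actual rules. Each family scores one point per keyword found on the page, and the highest-scoring family wins.

```javascript
// Hypothetical keyword lists per page family (illustrative only; the
// prototype's real families and rules may differ).
const FAMILY_KEYWORDS = {
  asset_allocation: ["asset allocation", "market value", "target %"],
  watch_list: ["watch list", "monitoring"],
  agenda: ["agenda", "item no"],
  minutes: ["motion", "seconded", "present", "absent"],
  policy: ["policy", "shall", "guideline"],
};

// Score every family by how many of its keywords appear in the page text.
// Return the best family, its raw score, and a simple confidence value
// (fraction of that family's keywords that matched). Ties keep the first
// family listed; a page with no matches is classified as "unknown".
function classifyPage(text) {
  const lower = text.toLowerCase();
  let best = { family: "unknown", score: 0, confidence: 0 };
  for (const [family, keywords] of Object.entries(FAMILY_KEYWORDS)) {
    const score = keywords.filter((k) => lower.includes(k)).length;
    if (score > best.score) {
      best = { family, score, confidence: score / keywords.length };
    }
  }
  return best;
}
```

For example, `classifyPage("Agenda Item No. 4: Approve minutes")` matches both agenda keywords and returns the `agenda` family with a score of 2. Because the rules are plain string checks, the same page always gets the same family, consistent with the no-LLM design.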
Extracted knowledge
Upload a PDF to begin.
Key facts
Validation checks
Detected page families
Asset allocation and financial rows
Watch-list / monitoring items
Agenda items
Motions
Attendees
Policies
Raw extracted text by page
Canonical JSON output
Provider alias config
Use this config to scale beyond a few documents: add provider aliases and asset-class names here, then click “Re-run extraction.”
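One way an alias config like this could drive normalization is to build an exact-match index from normalized spellings to a canonical name. This is a minimal sketch under assumptions: the config shape (canonical name mapped to a list of aliases), the provider names, and the helper functions are all hypothetical, not the prototype's actual format. The same index could also hold asset-class names.

```javascript
// Hypothetical alias config: canonical provider name -> known spellings.
// Names are invented for illustration; the real config format may differ.
const PROVIDER_ALIASES = {
  "Acme Capital": ["acme cap mgmt", "acme capital management, llc"],
  "Beacon Advisors": ["beacon adv", "beacon advisors inc."],
};

// Normalize a raw string for matching: lowercase, turn punctuation into
// spaces, collapse runs of whitespace, and trim the ends.
function normalizeKey(s) {
  return s
    .toLowerCase()
    .replace(/[^a-z0-9 ]+/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Build a Map from every normalized alias (and the canonical name itself)
// to the canonical name.
function buildAliasIndex(config) {
  const index = new Map();
  for (const [canonical, aliases] of Object.entries(config)) {
    for (const name of [canonical, ...aliases]) {
      index.set(normalizeKey(name), canonical);
    }
  }
  return index;
}

// Return the canonical provider name, or null if the text is not a known alias.
function resolveProvider(raw, index) {
  return index.get(normalizeKey(raw)) ?? null;
}
```

With this index, `resolveProvider("ACME Cap. Mgmt", buildAliasIndex(PROVIDER_ALIASES))` returns `"Acme Capital"`. Exact matching after normalization keeps results deterministic. An unrecognized spelling returns `null` instead of a guess, so it can be flagged and added to the config before re-running extraction.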