PDF Knowledge Extractor

Single-file browser prototype. It extracts text with PDF.js, classifies pages into reusable document families, applies deterministic extraction rules, normalizes data into one JSON schema, and shows source pages and confidence. No LLM, no backend.
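A minimal sketch of the classify-then-extract flow described above, assuming keyword cues per page family and a single line-based rule. The family names, cue patterns, `classifyPage`, `extractMotions`, and the `Fact` shape are illustrative assumptions, not the prototype's actual rules or schema. Page text is assumed to have been extracted already (for example with PDF.js).

```typescript
// Hypothetical sketch: all names, families, and patterns here are assumptions.
type Family = "minutes" | "asset_allocation" | "unknown";

interface Fact {
  field: string;
  value: string;
  page: number;        // 1-based source page
  confidence: number;  // 0..1, inherited from the page classification
}

// Keyword cues per family (assumed; the real rule set is not shown in the source).
const FAMILY_CUES: { family: Family; cues: RegExp[] }[] = [
  { family: "minutes", cues: [/\bmotion\b/i, /\battendees?\b/i, /\bagenda\b/i] },
  { family: "asset_allocation", cues: [/\basset allocation\b/i, /\bmarket value\b/i, /%/] },
];

// Score a page by the fraction of cues that match; the best-scoring family wins.
function classifyPage(text: string): { family: Family; confidence: number } {
  let best: { family: Family; confidence: number } = { family: "unknown", confidence: 0 };
  for (const entry of FAMILY_CUES) {
    const hits = entry.cues.filter((re) => re.test(text)).length;
    const confidence = hits / entry.cues.length;
    if (confidence > best.confidence) best = { family: entry.family, confidence };
  }
  return best;
}

// Deterministic rule: a line starting with "Motion to ..." yields one motion fact.
function extractMotions(text: string, page: number, confidence: number): Fact[] {
  const facts: Fact[] = [];
  for (const line of text.split("\n")) {
    const m = /^\s*motion to (.+?)\.?\s*$/i.exec(line);
    if (m) facts.push({ field: "motion", value: m[1], page, confidence });
  }
  return facts;
}

// Run classification, then apply only the rules that belong to the detected family.
function extractAll(pages: string[]): Fact[] {
  const out: Fact[] = [];
  pages.forEach((text, i) => {
    const { family, confidence } = classifyPage(text);
    if (family === "minutes") {
      for (const f of extractMotions(text, i + 1, confidence)) out.push(f);
    }
  });
  return out;
}
```

Because every step is a pure function over page text, the same input always produces the same output, which is what makes the "No LLM, no backend" approach auditable page by page.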

Extracted knowledge

Upload a PDF to begin.

Key facts

Validation checks

Detected page families

Asset allocation and financial rows

Watch-list / monitoring items

Agenda items

Motions

Attendees

Policies

Raw extracted text by page

Canonical JSON output

Provider alias config

Edit this config to scale beyond a few documents: add provider aliases and asset-class names here, then click “Re-run extraction.”
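One way such an alias config could be applied is sketched below. The config shape (`providers`, `assetClasses`), the placeholder names, and the helpers `normalizeKey`, `buildLookup`, and `resolveProvider` are assumptions for illustration; the prototype's real config format is not shown here.

```typescript
// Hypothetical config shape; keys and example names are placeholders.
interface AliasConfig {
  providers: { [canonical: string]: string[] }; // canonical name -> known aliases
  assetClasses: string[];
}

const exampleConfig: AliasConfig = {
  providers: { "Example Capital": ["Example Cap", "ExCap"] },
  assetClasses: ["Fixed Income", "Real Estate"],
};

// Lowercase and collapse punctuation/whitespace so variants compare equal.
function normalizeKey(s: string): string {
  return s.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

// Build a flat alias -> canonical lookup once per extraction run.
function buildLookup(cfg: AliasConfig): { [alias: string]: string } {
  const lookup: { [alias: string]: string } = {};
  for (const canonical of Object.keys(cfg.providers)) {
    lookup[normalizeKey(canonical)] = canonical;
    for (const alias of cfg.providers[canonical]) {
      lookup[normalizeKey(alias)] = canonical;
    }
  }
  return lookup;
}

// Resolve a raw name found in the PDF text; null means "not configured yet".
function resolveProvider(raw: string, lookup: { [alias: string]: string }): string | null {
  const key = normalizeKey(raw);
  return Object.prototype.hasOwnProperty.call(lookup, key) ? lookup[key] : null;
}
```

Under this sketch, supporting a new document means adding an entry to the config and rebuilding the lookup; the extraction rules themselves stay unchanged.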