NovaConvert is a production SaaS I designed, architected, and shipped end-to-end. An accounting invariant — not the language model — has the final vote over every extracted transaction. Patent pending (Canada).
A closed-form check that every bank statement already satisfies — used as the oracle, not the LLM.
The accounting identity is a free, deterministic check: opening balance plus credits minus debits equals closing balance.
Every bank statement in existence satisfies this equation. If the extracted transactions don't, the extraction is wrong — full stop.
Making it the final authority in the pipeline — rather than the LLM — inverts the usual problem. Instead of trying to make the model accurate, you make the architecture tolerant of the model being wrong.
The deterministic check can always veto. That single inversion is what the rest of the system is built around.
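As a minimal sketch of that check, assuming the identity is the usual statement equation (opening balance plus signed transaction amounts equals the stated closing balance), the veto could look like the following. The function name `balances` and the sample figures are illustrative, not taken from NovaConvert.

```python
from decimal import Decimal

def balances(opening, amounts, closing):
    """True when opening + sum of signed amounts equals the stated closing balance.

    Amounts are signed: credits positive, debits negative (an assumed convention).
    """
    return opening + sum(amounts, Decimal("0")) == closing

# Illustrative figures only.
opening = Decimal("1200.00")
closing = Decimal("1065.50")
good = [Decimal("-84.50"), Decimal("250.00"), Decimal("-300.00")]
bad = [Decimal("-84.50"), Decimal("-250.00"), Decimal("-300.00")]  # one sign misread

ok_good = balances(opening, good, closing)
ok_bad = balances(opening, bad, closing)
```

`Decimal` is used rather than `float` so that the equality test is exact to the cent.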
FIRE flags the mismatch; the UI is where it gets resolved. A cell-addressable grid, live balance recomputation on every edit, and the bank's stated closing shown beside the computed one. When they match, the invariant turns green.
A scanned PDF. Multiple monthly statements, out of order, some spanning page breaks. Most tools can't handle it. NovaConvert sorts it automatically — before extraction even starts.
Lightweight OCR reads only the header region of each page — bank name, account number, date range. That thin text stream is sent to Claude Sonnet, which classifies every page and groups them by the statement they belong to. The expensive full-extraction model never sees pages from the wrong statement, and never wastes cycles deciding where the boundaries are.
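The grouping step that follows classification can be sketched roughly as below. It assumes the classifier has already returned one record per page; the `PageHeader` shape, the `period` format, and `group_pages` are hypothetical names for illustration, and the OCR and model calls are deliberately left out.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class PageHeader:
    page: int      # index of the page in the uploaded PDF
    bank: str
    account: str
    period: str    # assumed normalized form, e.g. "2024-01"

def group_pages(headers):
    """Group page indices by (bank, account, period), with statements in key order."""
    groups = defaultdict(list)
    for h in headers:
        groups[(h.bank, h.account, h.period)].append(h.page)
    # Sort statements by key and pages within each statement by page index.
    return {key: sorted(pages) for key, pages in sorted(groups.items())}

# Two monthly statements scanned out of order (made-up data).
headers = [
    PageHeader(0, "Acme Bank", "1234", "2024-02"),
    PageHeader(1, "Acme Bank", "1234", "2024-01"),
    PageHeader(2, "Acme Bank", "1234", "2024-02"),
    PageHeader(3, "Acme Bank", "1234", "2024-01"),
]
statements = group_pages(headers)
```

Each value in `statements` is the page list that would be handed to the extraction stage for a single statement.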
Most extraction tools assume one statement per file. For accounting firms handling shoebox clients, scanned archives, or bulk batch uploads, manually splitting PDFs is often the longest step of the job — and the most thankless.
With statement boundaries already resolved, each statement flows through three stages. Narrow contracts, testable in isolation. Swappable AI, deterministic core, human finish.
For each statement identified by boundary detection, a language model produces a candidate set of transactions — dates, amounts, descriptions, running balances. Gemini, Claude, or Grok behind a factory.
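One way a provider factory like this could be shaped is sketched below. The `Extractor` protocol, `StubExtractor`, and `make_extractor` are hypothetical names, and the stub returns no transactions; real provider clients would replace it.

```python
from typing import Dict, List, Protocol

class Extractor(Protocol):
    """The narrow contract every provider is assumed to satisfy."""
    def extract(self, statement_text: str) -> List[dict]: ...

class StubExtractor:
    """Placeholder standing in for a real provider client."""
    def __init__(self, name: str) -> None:
        self.name = name

    def extract(self, statement_text: str) -> List[dict]:
        return []  # a real implementation would call the provider's API here

# Provider names from the text; the registry itself is an assumption.
_PROVIDERS: Dict[str, type] = {
    "gemini": StubExtractor,
    "claude": StubExtractor,
    "grok": StubExtractor,
}

def make_extractor(provider: str) -> Extractor:
    """Return an extractor for the named provider, or raise ValueError."""
    try:
        cls = _PROVIDERS[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider}") from None
    return cls(provider)

extractor = make_extractor("claude")
```

Because downstream stages depend only on `extract`, swapping the model is a one-line configuration change under this design.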
FIRE enforces the accounting identity. When values fail the check, it tests alternate interpretations and commits only the one that balances. Every correction is audit-logged.
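A much-simplified illustration of "test alternate interpretations, commit only the one that balances" is a single-sign-flip search. This is a sketch under stated assumptions, not FIRE's actual algorithm: it only tries flipping one transaction at a time, and it declines to commit when more than one flip would balance (an assumed policy for ambiguous cases). The name `repair_single_sign` and the audit record shape are hypothetical.

```python
from decimal import Decimal

def total(opening, amounts):
    return opening + sum(amounts, Decimal("0"))

def repair_single_sign(opening, amounts, closing):
    """Return (amounts, audit_log); amounts is None when no unique fix exists."""
    if total(opening, amounts) == closing:
        return list(amounts), []          # already balanced, nothing to correct
    candidates = []
    for i, amt in enumerate(amounts):
        trial = list(amounts)
        trial[i] = -amt                   # alternate interpretation: opposite sign
        if total(opening, trial) == closing:
            candidates.append((i, trial))
    if len(candidates) != 1:
        return None, []                   # none or ambiguous: leave for human review
    i, fixed = candidates[0]
    audit = [{"row": i, "action": "sign_flip",
              "before": amounts[i], "after": fixed[i]}]
    return fixed, audit

# Illustrative figures: the 250.00 deposit was extracted as a debit.
opening, closing = Decimal("1200.00"), Decimal("1065.50")
extracted = [Decimal("-84.50"), Decimal("-250.00"), Decimal("-300.00")]
fixed, audit = repair_single_sign(opening, extracted, closing)
```

Here only the flip on row 1 satisfies the identity, so that is the one correction committed and logged.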
Anything the pipeline can't resolve surfaces in a reconciliation UI — the PDF alongside the transactions, live balance recomputation, one-click sign flips.
Default is shared SaaS behind Cloudflare Access. For firms with data-residency requirements, the same code ships to a private VPS you own — your tenant, your database, zero shared surface.
Whether the app runs on my infrastructure or yours, the same four properties ship on day one.
The interesting thing about a system isn't what's in it. It's what was considered and left out.
This is financial data. 'Mostly right' isn't acceptable — a wrong sign produces a wrong ledger. But blocking on human review for every disagreement would make the product unusable. The LLM gets no vote on the final answer: a deterministic validator arbitrates.
Every auto-correction is staged — never written to the ledger until the invariant re-validates. If a fix makes the balance worse, the whole correction set is rolled back to the primary extraction. No change lands that the check didn't confirm improved the state.
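The stage-then-validate behaviour can be sketched as follows. This is one reading of the paragraph above, assuming "improved" means the absolute gap to the stated closing balance got strictly smaller; `apply_staged` and the `(row, new_amount)` correction format are hypothetical.

```python
from decimal import Decimal

def discrepancy(opening, amounts, closing):
    """Absolute gap between the computed and the stated closing balance."""
    return abs(opening + sum(amounts, Decimal("0")) - closing)

def apply_staged(opening, primary, corrections, closing):
    """Apply corrections to a copy; keep them only if the gap shrinks.

    Returns (amounts, committed). The primary extraction is never mutated.
    """
    staged = list(primary)
    for row, new_amount in corrections:
        staged[row] = new_amount
    if discrepancy(opening, staged, closing) < discrepancy(opening, primary, closing):
        return staged, True
    return list(primary), False           # roll back the whole correction set

opening, closing = Decimal("1200.00"), Decimal("1065.50")
primary = [Decimal("-84.50"), Decimal("-250.00"), Decimal("-300.00")]

kept, committed = apply_staged(opening, primary, [(1, Decimal("250.00"))], closing)
reverted, rejected = apply_staged(opening, primary, [(2, Decimal("-3000.00"))], closing)
```

The first correction set closes the gap and is kept; the second widens it, so the result falls back to the primary extraction unchanged.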
No extraction pipeline is 100% accurate. Unresolved cases shouldn't dead-end into an error state. The reconciliation page is built as a live spreadsheet: PDF on one side, transaction grid on the other. Cell-addressable edits, sign flips, and missing-row inserts all recompute the running balance immediately, and the bank's stated closing is rendered alongside the computed one at every row.
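The recomputation behind such a grid is small. A minimal sketch, assuming signed amounts and a hypothetical `running_balances` helper that the UI would call after every edit:

```python
from decimal import Decimal
from itertools import accumulate

def running_balances(opening, amounts):
    """Balance after each row, given the opening balance and signed amounts."""
    # accumulate(..., initial=opening) yields the opening balance first; drop it.
    return list(accumulate(amounts, initial=opening))[1:]

opening = Decimal("1200.00")
stated_closing = Decimal("1065.50")
rows = [Decimal("-84.50"), Decimal("-250.00"), Decimal("-300.00")]

before = running_balances(opening, rows)
rows[1] = -rows[1]                      # the reviewer flips the sign on row 1
after = running_balances(opening, rows)
matches = after[-1] == stated_closing   # what would turn the invariant green
```

Recomputing the whole column on each edit is linear in the number of rows, which is cheap at statement scale.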
If your output has a deterministic invariant, make the invariant the authority and let the models propose. Don't try to make models more accurate — make your architecture tolerant of them being wrong, using math they can't argue with.
I'm taking on new projects. Full-stack SaaS, document-processing pipelines, AI-augmented tools, internal platforms. Free 20-minute discovery call, projects from $2,500 CAD.
steven.sutankayo@novaconvert.ca