
Case · AI · Civic tech

Ballot OCR Counter

Turns photos of handwritten ballots into an auditable tally, with the counting rules enforced outside the OCR model.

Role: Sole engineer
Year: 2026
Status: In progress
Stack: .NET 10 · Python · Next.js

Problem

Handwritten ballots need OCR help, but the model cannot be the final authority.

My role

Sole engineer

Result

MVP built: roster checks, four-tier name matching, and tests against a real PostgreSQL instance.

01 The problem

Manual counts are slow, and children's handwriting on Malay-language ballots is a hard case for OCR. This project turns ballot photos into a tally without treating the model as the final source of truth.

02 What I built

  • Upload a ballot image, run vision extraction, apply eligibility rules, then store the counted result.
  • Run the web app, API, OCR worker, and PostgreSQL with Docker so the whole stack is repeatable.
  • Check group eligibility against a seeded roster before a vote is counted.
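
The eligibility step can be pictured with a small in-memory sketch. This is illustrative only and not the project's code: the real checks run against seeded PostgreSQL rosters, and the names `RosterEntry`, `Tally`, and `Outcome`, as well as the idea that each ballot carries an id and a group, are assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Outcome(Enum):
    COUNTED = "counted"
    INELIGIBLE = "ineligible"
    WRONG_GROUP = "wrong_group"
    DUPLICATE = "duplicate"


@dataclass(frozen=True)
class RosterEntry:
    # Hypothetical roster row: a name, the group it belongs to, and a flag.
    name: str
    group: str
    eligible: bool = True


@dataclass
class Tally:
    roster: dict[str, RosterEntry]
    seen_ballots: set[str] = field(default_factory=set)
    counts: dict[str, int] = field(default_factory=dict)

    def record(self, ballot_id: str, ballot_group: str, name: str) -> Outcome:
        """Apply the rules in order and count the vote only if all pass."""
        if ballot_id in self.seen_ballots:
            return Outcome.DUPLICATE
        entry = self.roster.get(name)
        if entry is None or not entry.eligible:
            return Outcome.INELIGIBLE
        if entry.group != ballot_group:
            return Outcome.WRONG_GROUP
        # Only a counted ballot is remembered, so a rejected one can be retried.
        self.seen_ballots.add(ballot_id)
        self.counts[name] = self.counts.get(name, 0) + 1
        return Outcome.COUNTED
```

Every rejection returns a named outcome instead of silently dropping the vote, which is what keeps the tally auditable.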

03 Key decisions

Model as extractor

Gemini reads the handwriting and suggests a name; a four-tier matcher (exact, accent-folded, fuzzy, phonetic) plus the eligibility and counting rules all run outside the model.
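
A minimal sketch of how four tiers like these could be chained, using only the Python standard library. It is not the project's matcher: the fuzzy cutoff of 0.85, the use of `difflib` for the fuzzy tier, and the simplified Soundex used for the phonetic tier are all assumptions, and the function names are hypothetical.

```python
from __future__ import annotations

import difflib
import unicodedata


def fold(text: str) -> str:
    """Lowercase, strip combining accents, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.lower().split())


_SOUNDEX_CODES = {
    ch: digit
    for digit, letters in {
        "1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r",
    }.items()
    for ch in letters
}


def soundex(text: str) -> str:
    """Simplified Soundex over the folded ASCII letters (assumed phonetic key)."""
    letters = [ch for ch in fold(text) if ch.isalpha() and ch.isascii()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _SOUNDEX_CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "hw":  # h and w do not separate repeated codes
            prev = digit
    return (code + "000")[:4]


def match_name(
    candidate: str, roster: list[str], fuzzy_cutoff: float = 0.85
) -> tuple[str | None, str]:
    """Return (roster_name, tier), trying the cheapest and strictest tier first."""
    if candidate in roster:  # Tier 1: exact
        return candidate, "exact"
    folded = fold(candidate)  # Tier 2: accent-folded
    by_folded = {fold(name): name for name in roster}
    if folded in by_folded:
        return by_folded[folded], "accent_folded"
    close = difflib.get_close_matches(  # Tier 3: fuzzy
        folded, list(by_folded), n=1, cutoff=fuzzy_cutoff
    )
    if close:
        return by_folded[close[0]], "fuzzy"
    code = soundex(candidate)  # Tier 4: phonetic, only if unambiguous
    hits = [name for name in roster if soundex(name) == code]
    if code and len(hits) == 1:
        return hits[0], "phonetic"
    return None, "none"
```

Returning the tier alongside the match lets a downstream step record how a name was matched, and treat a phonetic hit with more suspicion than an exact one.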

Database rules matter

The tally depends on seeded rosters, eligibility checks, and stored state, so the tests run against PostgreSQL rather than mocks.

Real fixtures stay out of git

Test ballot photos and roster images are real, but git-ignored and never committed, so no real voter data lands in the repo.

04 Checks and tests

  • 110 pgTAP assertions plus a Testcontainers suite exercise a real PostgreSQL instance, with state reset between tests and no ORM mocks.
  • Seeded rosters cover eligible, ineligible, duplicate, and wrong-group vote cases.
  • The upload path checks JPEG magic bytes, caps body size, and rejects open redirects; every endpoint sits behind Supabase JWT auth.
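
The upload-path checks in the last bullet can be sketched in a few lines of Python. This shows the general technique rather than the project's implementation: the 10 MiB cap, the function names, and the policy of allowing only same-site relative redirect paths are assumptions.

```python
from urllib.parse import urlsplit

JPEG_MAGIC = b"\xff\xd8\xff"          # every JPEG stream starts with these bytes
MAX_UPLOAD_BYTES = 10 * 1024 * 1024   # assumed cap, not the project's real limit


def validate_upload(body: bytes) -> None:
    """Raise ValueError unless the body is small enough and looks like a JPEG."""
    if len(body) > MAX_UPLOAD_BYTES:
        raise ValueError("upload exceeds size cap")
    if not body.startswith(JPEG_MAGIC):
        raise ValueError("not a JPEG: magic bytes do not match")


def safe_redirect_target(target: str, default: str = "/") -> str:
    """Return target only if it is a plain same-site path, else the default."""
    # Control characters and backslashes are normalised differently by
    # browsers, so reject them outright instead of trying to interpret them.
    if "\\" in target or any(ch < " " for ch in target):
        return default
    try:
        parts = urlsplit(target)
    except ValueError:
        return default
    if parts.scheme or parts.netloc:
        return default  # absolute or protocol-relative URL
    if not target.startswith("/") or target.startswith("//"):
        return default
    return target
```

Checking the magic bytes rather than the file extension or the client-supplied content type means a renamed file is rejected before it reaches the OCR worker.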

05 Trade-offs

  • I chose an auditable pipeline over a one-shot OCR answer, even though it adds more moving parts.
  • Low-confidence extraction should go to review rather than being silently counted.
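
One possible shape for that routing rule is sketched below. It assumes the extraction step yields a confidence score between 0 and 1, which the source does not state. The 0.80 threshold and the names `Extraction` and `route` are likewise hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

REVIEW_THRESHOLD = 0.80  # assumed value; would need tuning on real ballots


@dataclass(frozen=True)
class Extraction:
    ballot_id: str
    name: str | None     # None when no name could be read
    confidence: float    # assumed to be in the range 0..1


def route(extraction: Extraction, threshold: float = REVIEW_THRESHOLD) -> str:
    """Return 'count' only for a confident read; everything else is reviewed."""
    if extraction.name is None or extraction.confidence < threshold:
        return "review"
    return "count"
```

The default branch is review, so an unexpected or missing value ends up in front of a person rather than in the tally.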

06 Result

An OCR counting pipeline where AI helps with extraction, while eligibility rules, database constraints, and checks decide what gets counted.

07 What I would improve next

  • Add a human review queue for low-confidence OCR results.
  • Add batch import/export so a full voting session can be checked outside the app.