How Actalux works
What we publish
Actalux indexes public records of the Plan Commission & Architectural Review Board in Clayton, Missouri: videos and searchable transcripts of public meetings, and the records produced in session. We do not claim completeness of the record. We publish what we have gathered and can verify against a source.
Document processing
Documents are converted from their original format (PDF, HTML, or transcript) to plain text. Text is split into verbatim passages of approximately 200 words, preserving section boundaries and sentence integrity. No text is paraphrased or rewritten.
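As an illustration of the splitting step, here is a minimal Python sketch. It assumes that section boundaries are blank lines and that sentences end in ".", "!", or "?" followed by whitespace; the function name `split_passages` and these rules are hypothetical, not necessarily what Actalux runs. Each passage is a slice of the original text, so nothing is rewritten.

```python
import re

# Assumption: a sentence ends at ., ! or ? followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

def split_passages(text, target_words=200):
    """Group whole sentences into verbatim passages of roughly target_words words."""
    passages = []
    # Assumption: blank lines mark section boundaries; passages never span them.
    for section in re.split(r"\n\s*\n", text):
        section = section.strip()
        if not section:
            continue
        # Offsets where each sentence ends, plus the end of the section.
        boundaries = [m.end() for m in SENTENCE_END.finditer(section)]
        boundaries.append(len(section))
        start = 0   # offset where the current passage begins
        prev = 0    # offset where the current sentence begins
        words = 0   # words accumulated in the current passage
        for end in boundaries:
            words += len(section[prev:end].split())
            prev = end
            if words >= target_words:
                passages.append(section[start:end].strip())
                start, words = end, 0
        if start < len(section):
            passages.append(section[start:].strip())
    return passages
```

Because a passage only closes at a sentence boundary, its length can run somewhat over the target, which is consistent with "approximately 200 words."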
Meeting transcripts
Transcripts of public meetings are produced by automatic speech recognition — the open-source Whisper model — not by a human transcriber. They are accurate enough to search and read, but they can contain errors: misheard words, missing punctuation, and unlabeled or misattributed speakers. We publish the machine output as produced; we do not hand-correct it. Every transcript is shown next to the meeting's video and can be cued to the moment a passage was spoken, so any text can be checked against the recording. Longer meetings are divided into topic sections to aid navigation; those section labels are generated automatically and are a finding aid, not part of the record.
How search works
Search uses two methods in parallel:
- Keyword matching finds passages containing your exact search terms, using PostgreSQL full-text search.
- Semantic search finds passages with similar meaning, even when the exact words differ, using a 384-dimensional embedding (bge-small-en-v1.5) and pgvector cosine similarity.
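For readers unfamiliar with cosine similarity, the following Python sketch shows the calculation on small vectors. In the archive the comparison runs inside PostgreSQL through pgvector (whose `<=>` operator returns cosine distance, that is, 1 minus this similarity) over 384-dimensional vectors; the function names and the two-dimensional example vectors here are purely illustrative.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for this sketch: a zero vector matches nothing
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, passage_vecs):
    """Return passage IDs ordered from most to least similar to the query."""
    return sorted(
        passage_vecs,
        key=lambda pid: cosine_similarity(query_vec, passage_vecs[pid]),
        reverse=True,
    )

# Toy example with made-up 2-dimensional vectors.
passages = {"p1": [0.0, 1.0], "p2": [1.0, 0.0], "p3": [1.0, 1.0]}
print(rank_by_similarity([1.0, 0.0], passages))  # ['p2', 'p3', 'p1']
```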
Results from both methods are combined using reciprocal rank fusion (k=60): each passage earns 1/(k + rank) from every result list it appears in, so passages that appear in both result sets generally rank higher than those appearing in only one. The combined results are then re-ordered by a cross-encoder model that scores how directly each passage answers the query. To widen recall, your query may also be expanded into a few alternate phrasings (for example, "bond measure" alongside "bond referendum"), each searched in parallel; the re-ordering always scores against your original wording.
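Reciprocal rank fusion can be sketched in a few lines of Python. The formula, a sum of 1/(k + rank) over every list a passage appears in, is the standard one; the function name, the passage IDs, and the tie-breaking by ID are illustrative assumptions rather than the archive's actual code.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of passage IDs into one list, best first."""
    scores = {}
    for ranking in rankings:
        for rank, pid in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) to the passage's score.
            scores[pid] = scores.get(pid, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda pid: (-scores[pid], pid))

keyword_hits = ["a", "b", "c"]    # hypothetical keyword results, best first
semantic_hits = ["b", "d", "a"]   # hypothetical semantic results, best first
print(reciprocal_rank_fusion([keyword_hits, semantic_hits]))
# ['b', 'a', 'd', 'c']
```

Passages "a" and "b" appear in both lists and end up ahead of "c" and "d", which each appear in only one.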
Citations and generated text
At ingestion time, every passage is verified to be an exact substring of its source document. Passages that fail this check are rejected. Each displayed passage has a stable hash ID (for example, #q3f9a1c20) that you can cite and share; the ID stays the same when a document is re-processed.
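The ingestion check and ID assignment might look like the Python sketch below. The substring test follows directly from the description above. The ID scheme is an assumption made for illustration: it derives the ID from a SHA-256 hash of the passage text, truncated to eight hex digits with a "q" prefix to resemble the example ID. Actalux's real scheme may differ.

```python
import hashlib

def verify_passage(passage, source_text):
    """True only if the passage is a non-empty exact substring of the source."""
    return bool(passage) and passage in source_text

def passage_id(passage):
    """Hypothetical ID scheme: 'q' plus the first 8 hex digits of SHA-256."""
    digest = hashlib.sha256(passage.encode("utf-8")).hexdigest()
    return "q" + digest[:8]

def ingest(passages, source_text):
    """Keep only verified passages, keyed by their content-derived ID."""
    accepted = {}
    for passage in passages:
        if verify_passage(passage, source_text):
            accepted[passage_id(passage)] = passage
        # Passages that fail the check are rejected (silently, in this sketch).
    return accepted
```

Deriving the ID from the passage text alone is one way to keep it unchanged across re-processing: the same verbatim passage always hashes to the same ID.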
Actalux generates two kinds of plain-language text on top of the records:
- Cited summaries and answers. The summary shown above search results and the answers from "Ask the archive" are written so that every factual sentence cites a passage by its hash ID. After generation, each citation is checked against the passages actually retrieved, and any sentence whose citation does not check out is removed before display.
- Document descriptions and section labels. Each document carries a short description of what it contains, and longer transcripts carry automatic topic-section labels. These are generated from the document's own text to help you find a record and orient yourself within it. They are held to the same neutrality rules listed below under "What we do not do," but they describe the document rather than citing individual passages, so treat them as a finding aid. The record itself is the source documents and their verbatim passages.
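The post-generation citation check for summaries and answers can be pictured with the Python sketch below. It assumes citations appear inline as "#" followed by the hash ID and that a sentence carrying no citation at all is also dropped; both are assumptions for illustration, as is the function name.

```python
import re

# Assumption: citations look like "#q" followed by 8 lowercase hex digits.
CITATION = re.compile(r"#q[0-9a-f]{8}")

def filter_cited_sentences(sentences, retrieved_ids):
    """Keep sentences whose every citation points at a retrieved passage."""
    kept = []
    for sentence in sentences:
        cited = [c[1:] for c in CITATION.findall(sentence)]  # strip the "#"
        # Assumption: a sentence with no citation is treated as unverified.
        if cited and all(cid in retrieved_ids for cid in cited):
            kept.append(sentence)
    return kept

retrieved = {"q3f9a1c20"}  # IDs of the passages actually retrieved
draft = [
    "The board approved the plan [#q3f9a1c20].",
    "The vote was unanimous [#q00000000].",   # cites an unretrieved passage
    "This sentence cites nothing.",
]
print(filter_cited_sentences(draft, retrieved))
# ['The board approved the plan [#q3f9a1c20].']
```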
What we do not do
- We do not editorialize or express opinions.
- We do not publish closed session content.
- We never publish sensitive personal data such as Social Security numbers or dates of birth; an automated check screens for these and blocks the record before it enters the archive.
- We do not infer intent or make causal claims unless explicitly stated in a source document.
- We do not advocate for or against any candidate, ballot measure, or policy position.
- We do not characterize a tax, levy, or rate change as an increase, a decrease, or "no increase." Whether a change counts as an increase depends on a chosen baseline, which is a political judgment; we report the actual levy or rate figures from the source instead.
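The automated screen for sensitive personal data mentioned in the list above could work along the lines of this Python sketch. The patterns are deliberately simple and hypothetical: one matches the common 3-2-4 digit layout of a Social Security number, the other a "date of birth" or "DOB" label followed by a numeric date. A production screen would need broader patterns and review of false positives.

```python
import re

# Hypothetical patterns, for illustration only.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
DOB = re.compile(
    r"\b(?:date of birth|DOB)\b\W{0,5}\d{1,2}[/-]\d{1,2}[/-]\d{2,4}",
    re.IGNORECASE,
)

def contains_sensitive_data(text):
    """True if the text matches either sensitive-data pattern."""
    return bool(SSN.search(text) or DOB.search(text))

def screen_record(text):
    """Return the text if it passes the screen; block it (None) otherwise."""
    return None if contains_sensitive_data(text) else text
```

Blocking the whole record, rather than redacting the match, mirrors the behavior described above: the record does not enter the archive.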
Corrections
If you find an error, use the "Report an error" link on any page. Include the passage's quote ID (the hash ID shown with each passage, for example #q3f9a1c20) so we can locate it. Corrections are tracked publicly as GitHub issues; anyone can see what has been reported, investigated, and resolved.
Open source
The full source code is available at github.com/Actalux/actalux. Actalux is an independent public-records project. We take no paid advertising, track no users, and sell no data.