Methodology

How Actalux works

What we publish

Actalux indexes public records of the Board of Adjustment in Clayton, Missouri: videos and searchable transcripts of public meetings, and the records produced in session. We do not claim the record is complete; we publish what we have gathered and can verify against a source.

Document processing

Documents are converted from their original format (PDF, HTML, or transcript) to plain text. Text is split into verbatim passages of approximately 200 words, preserving section boundaries and sentence integrity. No text is paraphrased or rewritten.
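
The passage-splitting rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: it assumes a blank line marks a section boundary and that a sentence ends at ".", "!" or "?", and the names `chunk` and `TARGET_WORDS` are hypothetical. Each passage is sliced directly out of the original text, so it stays verbatim.

```python
import re
from typing import List

TARGET_WORDS = 200  # approximate passage size from the methodology

# Assumed conventions for this sketch only: a blank line separates sections,
# and a sentence is a run of text ending in ".", "!" or "?" (or the section end).
SECTION_BREAK = re.compile(r"\n\s*\n")
SENTENCE = re.compile(r"[^.!?]+(?:[.!?]+|$)")


def chunk(text: str, target_words: int = TARGET_WORDS) -> List[str]:
    """Split text into verbatim passages of whole sentences, never crossing a section break."""
    passages: List[str] = []
    for section in SECTION_BREAK.split(text):
        start = end = None  # character offsets of the passage being built
        count = 0
        for m in SENTENCE.finditer(section):
            raw = m.group()
            sentence = raw.strip()
            if not sentence:
                continue
            s_start = m.start() + (len(raw) - len(raw.lstrip()))
            s_end = s_start + len(sentence)
            words = len(sentence.split())
            # Close the current passage before it would exceed the target size.
            if start is not None and count + words > target_words:
                passages.append(section[start:end])
                start, count = None, 0
            if start is None:
                start = s_start
            end = s_end
            count += words
        if start is not None:
            # Slicing the original section keeps the passage verbatim.
            passages.append(section[start:end])
    return passages
```

A single sentence longer than the target becomes its own passage, which is one reason the size is only approximate.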

Meeting transcripts

Transcripts of public meetings are produced by automatic speech recognition (the open-source WhisperX model), not by a human transcriber. The recognizer is run without name hints, so the text reflects only what the model heard in the audio. The transcript can still contain errors such as misheard words or missing punctuation. Every transcript is shown next to the meeting's video and can be cued to the moment a passage was spoken, so any text can be checked against the recording. Longer meetings are divided into topic sections to aid navigation; those section labels are generated automatically and are a finding aid, not part of the record.
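
Cueing a passage to the video could work roughly like this sketch. It assumes, hypothetically, that recognized segments are stored as `(start_seconds, text)` pairs joined by single spaces, and that the player honors the W3C Media Fragments `#t=` syntax; `cue_time` and `cue_url` are illustrative names, not Actalux functions.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Assumed shape for this sketch: one (start_seconds, text) pair per
# recognized segment, in playback order.
Segment = Tuple[float, str]


def cue_time(segments: List[Segment], passage: str) -> Optional[float]:
    """Return the start time of the segment where `passage` begins, if found."""
    if not segments or not passage:
        return None
    offsets: List[int] = []
    pos = 0
    for _, text in segments:
        offsets.append(pos)
        pos += len(text) + 1  # +1 for the space used to join segments
    transcript = " ".join(text for _, text in segments)
    at = transcript.find(passage)
    if at == -1:
        return None
    # The segment containing the passage's first character.
    return segments[bisect_right(offsets, at) - 1][0]


def cue_url(video_url: str, seconds: float) -> str:
    """Append a Media Fragments temporal fragment such as '#t=4.50'."""
    return f"{video_url}#t={seconds:.2f}"
```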

Speaker labels and name corrections

Three automatic steps add structure to a transcript; the verbatim words are always preserved:

  • Speaker turns. A diarization model (pyannote) groups the audio by distinct voice, so the transcript reads as a back-and-forth of speaker turns rather than one undivided block. This groups the words; it never changes them.
  • Identifying a speaker. A voice is given a person's name only when the recording itself provides strong, checkable evidence — a roll call or a self-introduction that matches a member of the body's published roster. When that evidence is unambiguous, the name is shown; otherwise the voice stays anonymous, labeled "Speaker 2," "Speaker 3," and so on. We never guess a name; proposals that do not clear that bar are held for human review rather than published.
  • Name corrections. Speech recognition often misspells proper nouns that the public record spells consistently, such as a council member's surname, a street, or a contractor. The transcript is shown with the corrected spelling, drawn only from names already in this jurisdiction's public record. A toggle restores the verbatim, as-transcribed text in one click, and the original words are never discarded.
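
A name correction that never overwrites the verbatim text might look like the sketch below. It is an assumption-laden illustration: it treats capitalized words as candidate proper nouns, uses Python's `difflib.get_close_matches` as a stand-in fuzzy matcher, and picks a 0.85 similarity cutoff arbitrarily. The function name and the example names are hypothetical.

```python
import difflib
import re
from typing import Iterable, Tuple

# Hypothetical rule: capitalized words are treated as candidate proper nouns.
CANDIDATE = re.compile(r"\b[A-Z][a-z]+\b")


def correct_names(verbatim: str, known_names: Iterable[str],
                  cutoff: float = 0.85) -> Tuple[str, str]:
    """Return (display_text, verbatim_text); the verbatim text is never modified."""
    by_lower = {name.lower(): name for name in known_names}

    def fix(m) -> str:
        word = m.group()
        # Only names already in the public record are eligible replacements.
        best = difflib.get_close_matches(word.lower(), list(by_lower), n=1, cutoff=cutoff)
        return by_lower[best[0]] if best else word

    return CANDIDATE.sub(fix, verbatim), verbatim
```

For example, with the illustrative names `["Brennan", "Hanley"]`, the transcribed line "Ms. Brenan asked about Hanly Road." would display as "Ms. Brennan asked about Hanley Road." while the original string is returned unchanged for the toggle.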

Because identification depends on what the recording makes explicit, many speakers — members of the public, and officials who never state their name aloud — remain anonymous. That is deliberate: we would rather leave a voice unlabeled than attach the wrong name.
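
The "name only on unambiguous evidence" rule can be sketched as a decision over evidence that has already been extracted. Extracting names from roll calls or self-introductions is not shown; the input shapes, the function name, and numbering anonymous voices by order of first appearance are all assumptions for illustration.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def label_speakers(
    speakers: List[str],             # diarization IDs in order of first appearance
    evidence: Dict[str, List[str]],  # names heard in roll call / self-introduction, per voice
    roster: Set[str],                # the body's published roster
) -> Tuple[Dict[str, str], List[str]]:
    """Return (display labels, voices held for human review)."""
    proposals: Dict[str, str] = {}
    held: List[str] = []
    for spk in speakers:
        names = set(evidence.get(spk, []))
        if len(names) == 1 and names <= roster:
            proposals[spk] = next(iter(names))
        elif names:
            held.append(spk)  # conflicting or off-roster evidence
    claims = Counter(proposals.values())
    labels: Dict[str, str] = {}
    for i, spk in enumerate(speakers, start=1):
        name = proposals.get(spk)
        if name is not None and claims[name] == 1:
            labels[spk] = name
        else:
            if name is not None:
                held.append(spk)  # two voices claim the same name
            labels[spk] = f"Speaker {i}"
    return labels, held
```

Any doubt, whether conflicting names, a name not on the roster, or two voices claiming one name, falls through to an anonymous label plus a review entry rather than a guess.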

How search works

Search uses two methods in parallel:

  • Keyword matching finds passages containing your exact search terms, using PostgreSQL full-text search.
  • Semantic search finds passages with similar meaning, even when the exact words differ, using a 384-dimensional embedding (bge-small-en-v1.5) and pgvector cosine similarity.
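
The semantic half ranks passages by cosine similarity between the query's embedding and each passage's embedding. In the database this happens inside PostgreSQL: pgvector's `<=>` operator returns cosine distance (one minus cosine similarity), so ordering by `embedding <=> query` ascending is the same as ordering by similarity descending. The pure-Python sketch below shows the same ranking; the embedding step (bge-small-en-v1.5) is not shown, and toy three-dimensional vectors stand in for the 384-dimensional ones. Function names are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def semantic_top_k(query_vec: Sequence[float],
                   passage_vecs: Dict[str, Sequence[float]],
                   k: int = 5) -> List[Tuple[str, float]]:
    """Rank passages by similarity to the query, most similar first."""
    scored = [(pid, cosine_similarity(query_vec, vec)) for pid, vec in passage_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```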

Results from both methods are combined using reciprocal rank fusion (k=60): a passage earns 1/(60 + rank) from each result list it appears in, and those scores are summed, so passages found by both methods generally rank above those found by only one. The combined results are then re-ordered by a cross-encoder model that scores how directly each passage answers the query. To widen recall, your query may also be expanded into a few alternate phrasings (for example, "bond measure" alongside "bond referendum"), each searched in parallel; the re-ordering always scores against your original wording.
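
The fusion and re-ordering steps can be sketched as below. The search functions and the cross-encoder are passed in as plain callables because their real interfaces are not described here; that interface, fusing every phrasing's lists together, and the `top_n` cutoff are assumptions of the sketch. The methodology runs the phrasings in parallel; they run sequentially here for clarity.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

RRF_K = 60  # the constant stated in the methodology


def reciprocal_rank_fusion(rankings: Iterable[List[str]], k: int = RRF_K) -> List[str]:
    """Sum 1 / (k + rank) for each passage across every ranked list (ranks start at 1)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, pid in enumerate(ranking, start=1):
            scores[pid] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda pid: (-scores[pid], pid))


def hybrid_search(
    query: str,
    expansions: List[str],
    keyword_search: Callable[[str], List[str]],
    semantic_search: Callable[[str], List[str]],
    rerank: Callable[[str, str], float],
    top_n: int = 20,
) -> List[str]:
    rankings: List[List[str]] = []
    for phrasing in [query, *expansions]:
        rankings.append(keyword_search(phrasing))
        rankings.append(semantic_search(phrasing))
    candidates = reciprocal_rank_fusion(rankings)[:top_n]
    # The cross-encoder always scores against the user's original wording.
    return sorted(candidates, key=lambda pid: rerank(query, pid), reverse=True)
```

The fusion arithmetic also shows why "found by both" is a tendency rather than a guarantee: a passage ranked 1st in one list scores 1/61 ≈ 0.0164, one ranked 30th in both scores 2/90 ≈ 0.0222, but one ranked 100th in both scores only 2/160 = 0.0125.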

Citations and generated text

At ingestion time, every passage is verified to be an exact substring of its source document. Passages that fail this check are rejected. Each displayed passage has a stable hash ID (for example, #q3f9a1c20) that you can cite and share; the ID stays the same when a document is re-processed.
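
The ingestion check and a content-derived ID could be combined as in this sketch. The ID scheme ("q" plus the first 8 hex digits of a SHA-256 over the document ID and passage text) is a hypothetical choice that matches the shape of the example ID above; it is not necessarily how Actalux computes its IDs.

```python
import hashlib


class PassageRejected(ValueError):
    """Raised when a passage is not an exact substring of its source."""


def passage_id(doc_id: str, passage: str) -> str:
    # Hypothetical scheme: the ID depends only on content, so re-processing
    # the same document yields the same IDs.
    digest = hashlib.sha256(f"{doc_id}\x00{passage}".encode("utf-8")).hexdigest()
    return "q" + digest[:8]


def verify_and_id(doc_id: str, source_text: str, passage: str) -> str:
    """Reject any passage that is not verbatim in its source; otherwise return its ID."""
    if not passage or passage not in source_text:
        raise PassageRejected(f"passage not found verbatim in {doc_id}")
    return passage_id(doc_id, passage)
```

Eight hex digits give about four billion possible IDs; a real system would still need to detect collisions and repeated identical passages within one document.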

Actalux generates two kinds of plain-language text on top of the records:

  • Cited summaries and answers. The summary shown above search results and the answers from "Ask the archive" are written so that every factual sentence cites a passage by its hash ID. After generation, each citation is checked against the passages actually retrieved, and any sentence whose citation does not check out is removed before display.
  • Document descriptions and section labels. Each document carries a short description of what it contains, and longer transcripts carry automatic topic-section labels. These are generated from the document's own text to help you find and orient to a record. They are held to the same neutrality rules below, but they describe the document rather than citing individual passages — treat them as a finding aid. The record itself is the source documents and their verbatim passages.
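
The post-generation citation check for summaries and answers might be sketched like this. It assumes a hypothetical citation format (`#q` followed by 8 lowercase hex digits), a naive sentence splitter, and a strict policy in which a sentence survives only if it cites at least one passage and every passage it cites was actually retrieved.

```python
import re
from typing import Iterable

# Hypothetical citation format: '#q' followed by 8 lowercase hex digits.
CITATION = re.compile(r"#q[0-9a-f]{8}\b")
# Naive sentence splitter for the sketch: text up to and including . ! or ?
SENTENCE = re.compile(r"[^.!?]+[.!?]*")


def keep_verified_sentences(answer: str, retrieved_ids: Iterable[str]) -> str:
    """Drop every sentence that cites nothing or cites a passage that was not retrieved."""
    retrieved = set(retrieved_ids)
    kept = []
    for m in SENTENCE.finditer(answer):
        sentence = m.group().strip()
        cited = CITATION.findall(sentence)
        if cited and all(c in retrieved for c in cited):
            kept.append(sentence)
    return " ".join(kept)
```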

What we do not do

  • We do not editorialize or express opinions.
  • We do not publish closed session content.
  • We never publish sensitive personal data such as Social Security numbers or dates of birth; an automated check screens for these and blocks the record before it enters the archive.
  • We do not infer intent or make causal claims unless the intent or cause is explicitly stated in a source document.
  • We do not advocate for or against any candidate, ballot measure, or policy position.
  • We do not characterize a tax, levy, or rate change as an increase, a decrease, or "no increase." Whether a change counts as an increase depends on a chosen baseline, which is a political judgment; we report the actual levy or rate figures from the source instead.
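
An automated screen for sensitive personal data can be as simple as a set of patterns checked before a record is admitted. The two patterns below are illustrative assumptions only (a dashed nine-digit SSN, and a date following "DOB" or "date of birth"); they are not the project's actual rules, and a real screen would need broader coverage.

```python
import re
from typing import List

# Illustrative patterns only (assumptions, not the project's actual rules).
SENSITIVE_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "date_of_birth": re.compile(
        r"\b(?:DOB|date of birth)\b\W{0,3}\d{1,2}/\d{1,2}/\d{2,4}\b", re.IGNORECASE
    ),
}


def screen(text: str) -> List[str]:
    """Return the kinds of sensitive data found in the text."""
    return [kind for kind, pattern in SENSITIVE_PATTERNS.items() if pattern.search(text)]


def admit_to_archive(text: str) -> bool:
    """Block the whole record if any sensitive pattern matches."""
    return not screen(text)
```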

Corrections

If you find an error, use the "Report an error" link on any page. Include the passage's hash ID (for example, #q3f9a1c20) so we can locate it. Corrections are tracked publicly as GitHub issues; anyone can see what has been reported, investigated, and resolved.

Source available

The full source code is available to read at github.com/Actalux/actalux, published so the methods behind every page can be inspected. Actalux is an independent public-records project. We take no paid advertising, track no users, and sell no data.