Site directory
Every page at a glance
A single index of all public pages on budov.org — grouped by what you are here to do.
Press Ctrl+K anywhere on the site to fuzzy-find routes.
Core
The main entry points most visitors use.
Explore
Interactive and visual tools to understand the corpus.
Playground (/playground, Preview): Live NER demo on construction text.
Tokenizer (/tokenizer, Preview): Compare GPT / mT5 / Liberta / BUDOVA tokenization.
Glossary (/glossary): Browse construction terminology.
Coverage (/coverage): 25-oblast regional coverage map.
Compare (/compare): BUDOVA vs UberText / CC-100 / UA-GEC.
Benchmarks (/benchmarks, Preview): Model leaderboard on the BUDOVA test set.
Embeddings (/embeddings, Preview): 2D semantic-space scatter.
Voices (/voices, Awaiting data): Dialect audio gallery.
Methodology
How the corpus is built and what quality it guarantees.
Guidelines (/guidelines): Public annotation rulebook, citable by anchor.
IAA (/iaa): Inter-annotator agreement metrics.
Provenance (/sources): Source catalogue by category.
Bias audit (/bias): Over- and under-represented domains.
Annotation tool (/annotation-tool): Product preview of the contributor workspace.
Diff (/diff, Preview): Git-style change log between releases.
Resources
Docs-adjacent utilities you will reach for after the first visit.
External
Where the data and code actually live.
Hugging Face (huggingface.co/budova, Awaiting data): Dataset repositories.
GitHub (github.com/F1-bot/budova): Source code for platform and demos.
Zenodo (zenodo.org/communities/budova, Awaiting data): Long-term archival DOIs.
KNUCA (www.knuba.edu.ua/): Host institution.
Legal
Policies that govern use of the site and the corpus.