Version control

What changed

Every corpus release ships with a git-style diff of the underlying data, so downstream researchers can reproduce their benchmarks or roll them back to an earlier version.

Preview · live diff generated at release
diff --corpus v0.3 v0.4 2026-04
+812 Added fragments
−63 Removed fragments
~148 Re-annotated
+12 New sources
+ lexicon    "захисний шар арматури", new canonical term, 42 occurrences
+ source     ДБН В.1.2-7:2008, seismic classification, 47 fragments
+ speech     +3.2 hours from Zakarpattia region
~ ner        Reclassified 38 "material" spans → "structure" (finishing)
− source     Removed 12 fragments from a retracted paper
+ lexicon    "двокамерний склопакет", brand-qualified variant added
~ ner        Tightened span boundaries on 61 "measurement" entries
+ estimate   +14 field-estimate samples from Dnipro projects
− corporate  Removed 51 proprietary lines by partner request
~ speech     Re-transcribed 7 audio samples after annotator feedback

Releases beyond v1.0 will produce this diff automatically from the dataset-versioning snapshot table. Until then, the diff is curated by the BUDOVA release manager.
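
As a rough illustration of how such a summary could be derived from two snapshots, here is a minimal Python sketch. It is not the BUDOVA implementation: the `Fragment` record, its fields (`fragment_id`, `source`, `annotation_hash`), and the function names are all hypothetical, and the real snapshot table may be structured differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """Hypothetical snapshot row; the field names are assumptions."""
    fragment_id: str
    source: str
    annotation_hash: str  # assumed fingerprint of the annotation layer


def diff_snapshots(old, new):
    """Count added, removed, and re-annotated fragments plus new sources."""
    old_by_id = {f.fragment_id: f for f in old}
    new_by_id = {f.fragment_id: f for f in new}

    added = new_by_id.keys() - old_by_id.keys()
    removed = old_by_id.keys() - new_by_id.keys()
    # A fragment present in both snapshots counts as re-annotated
    # when its annotation fingerprint differs.
    reannotated = {
        fid
        for fid in old_by_id.keys() & new_by_id.keys()
        if old_by_id[fid].annotation_hash != new_by_id[fid].annotation_hash
    }
    new_sources = {f.source for f in new} - {f.source for f in old}

    return {
        "added": len(added),
        "removed": len(removed),
        "reannotated": len(reannotated),
        "new_sources": len(new_sources),
    }


def format_summary(summary):
    """Render the counts in the style of the preview header."""
    return (
        "+{added} added, -{removed} removed, "
        "~{reannotated} re-annotated, +{new_sources} new sources"
    ).format(**summary)


# Toy data, invented purely for illustration.
old_snapshot = [
    Fragment("f1", "src-a", "h1"),
    Fragment("f2", "src-a", "h2"),
    Fragment("f3", "src-b", "h3"),
]
new_snapshot = [
    Fragment("f1", "src-a", "h1"),
    Fragment("f2", "src-a", "h2-revised"),
    Fragment("f4", "src-c", "h4"),
]

summary = diff_snapshots(old_snapshot, new_snapshot)
print(format_summary(summary))
```

Keying both snapshots by a stable fragment identifier keeps the comparison to simple set operations. A production version would presumably also emit the per-category entries (lexicon, source, speech, ner) shown in the preview.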

Collaboration

Join BUDOVA

We are looking for researchers, construction professionals, and language specialists to participate in the project.

Supported by
Microsoft AI for Good Lab