Evaluation

Benchmark leaderboard

Model performance on the BUDOVA held-out test set. The full fine-tune runs will ship with v1.0; the numbers below are indicative targets, not measured results.

Preview · v1.0 will have real runs
| # | Model | Description | NER F1 | Perplexity | Term acc. |
|---|-------|-------------|--------|------------|-----------|
| 1 | BUDOVA-XLM-R-base (ours) | Domain-adapted XLM-R, LoRA fine-tune on BUDOVA v1.0 | 0.891 (+0.27) | 11.2 (-18.10) | 0.880 (+0.22) |
| 2 | Liberta-UK-large | Ukrainian LM with supervised NER head | 0.782 (+0.16) | 15.4 (-13.90) | 0.740 (+0.08) |
| 3 | XLM-R-large (zero-shot) | No BUDOVA fine-tune | 0.651 (+0.03) | 22.4 (-6.90) | 0.520 (-0.14) |
| 4 | mBERT-base | Multilingual baseline | 0.612 | 26.8 (-2.50) | 0.480 (-0.18) |
| 5 | GPT-4o (few-shot) | 5-shot prompt, no fine-tune | 0.598 | | 0.610 (-0.05) |
| 6 | Random baseline | Uniform label assignment | 0.072 | | 0.160 |
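
The three metric columns can be read against their usual definitions. The following is a minimal sketch, not BUDOVA's evaluation code: the page does not state the actual protocol, so it assumes that NER F1 is micro-averaged over exact-match entity spans, that perplexity is the exponential of the mean per-token negative log-likelihood, and that term accuracy is the fraction of exact-match term predictions. All function names are hypothetical.

```python
import math


def span_f1(gold_spans, pred_spans):
    """Micro-averaged precision, recall and F1 over exact-match entity spans.

    Assumption: each span is a hashable tuple such as
    (sentence_id, start, end, label); partial overlaps earn no credit.
    """
    gold, pred = set(gold_spans), set(pred_spans)
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def perplexity(token_log_probs):
    """exp(mean negative log-likelihood); inputs are natural-log probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def term_accuracy(gold_terms, pred_terms):
    """Fraction of positions where the predicted term equals the gold term.

    Assumption: one prediction per gold item, compared by exact string match.
    """
    if len(gold_terms) != len(pred_terms):
        raise ValueError("gold and predicted lists must be the same length")
    if not gold_terms:
        raise ValueError("need at least one term")
    correct = sum(g == p for g, p in zip(gold_terms, pred_terms))
    return correct / len(gold_terms)


# Toy inputs, invented purely for illustration (not BUDOVA data).
gold = [(0, 0, 2, "MATERIAL"), (0, 5, 6, "TOOL")]
pred = [(0, 0, 2, "MATERIAL"), (0, 3, 4, "TOOL")]
precision, recall, f1 = span_f1(gold, pred)
ppl = perplexity([math.log(0.5)] * 4)
acc = term_accuracy(["beam", "rebar"], ["beam", "girder"])
```

Under these assumptions, higher is better for NER F1 and term accuracy, and lower is better for perplexity. For masked models such as XLM-R, the per-token log-probabilities would typically have to come from a pseudo-log-likelihood procedure rather than a left-to-right pass, so perplexities may not be directly comparable across model families.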
Collaboration

Join BUDOVA

We are looking for researchers, construction professionals, and language specialists to participate in the project.

Supported by
Microsoft AI for Good Lab