Version control
What changed
Every corpus release ships with a git-style diff of the underlying data, so downstream researchers can reproduce or roll back their benchmarks.
Preview · live diff generated at release
diff --corpus v0.3…v0.4 · 2026-04
+812 Added fragments
−63 Removed fragments
~148 Re-annotated
+12 New sources
+lexicon · +source · +speech · ~ner · −source · +lexicon · ~ner · +estimate · −corporate · ~speech
Releases beyond v1.0 will produce this diff automatically from the dataset-versioning snapshot table. Until then, the diff is curated by the BUDOVA release manager.
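As a rough illustration of how the added, removed, and re-annotated counts could be derived from two snapshots, here is a minimal Python sketch. It assumes each snapshot can be read as a mapping from fragment ID to annotation label; that layout, the names `CorpusDiff` and `diff_snapshots`, and the toy data are all hypothetical and do not describe the actual snapshot table.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CorpusDiff:
    """Fragment IDs grouped by how they changed between two snapshots."""
    added: frozenset[str]
    removed: frozenset[str]
    reannotated: frozenset[str]

    def summary(self) -> str:
        # Same +/−/~ notation as the preview above.
        return f"+{len(self.added)} −{len(self.removed)} ~{len(self.reannotated)}"


def diff_snapshots(old: dict[str, str], new: dict[str, str]) -> CorpusDiff:
    """Compare two snapshots (assumed shape: fragment ID -> annotation label)."""
    old_ids, new_ids = set(old), set(new)
    shared = old_ids & new_ids
    return CorpusDiff(
        added=frozenset(new_ids - old_ids),
        removed=frozenset(old_ids - new_ids),
        # A fragment counts as re-annotated when it exists in both
        # snapshots but its label differs.
        reannotated=frozenset(i for i in shared if old[i] != new[i]),
    )


# Toy data, invented for illustration only.
v0_3 = {"f1": "ner", "f2": "speech", "f3": "source"}
v0_4 = {"f1": "ner", "f2": "lexicon", "f4": "estimate"}

d = diff_snapshots(v0_3, v0_4)
print(d.summary())  # +1 −1 ~1
```

Keeping the three sets separate, instead of only the counts, means the same result can drive both the summary line and a per-fragment listing.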