Semantic space
Corpus embeddings
A 2D projection of BUDOVA sentence embeddings. Registers cluster cleanly because the domain vocabulary is highly distinctive.
Preview · synthetic layout until v1.0
Current layout is a seeded synthetic scatter with cluster centroids matching observed register separability. Real UMAP projection replaces this after v1.0 when the trained encoder runs over the full corpus.
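As a rough illustration of what a seeded synthetic scatter can look like, here is a minimal Python sketch using only the standard library. The register names, centroid coordinates, spread, and seed are all hypothetical placeholders rather than the values behind this preview, and the function name `synthetic_scatter` is invented for the example.

```python
import random

# Hypothetical register names and 2D centroids (assumptions for illustration;
# the actual registers and coordinates are not stated in this panel).
CENTROIDS = {
    "register_a": (-2.0, 1.5),
    "register_b": (2.5, 0.5),
    "register_c": (0.0, -2.5),
}


def synthetic_scatter(centroids, points_per_cluster=50, spread=0.4, seed=42):
    """Return (x, y, label) tuples drawn from a Gaussian around each centroid."""
    # A local generator keeps the layout reproducible without touching
    # the global random state.
    rng = random.Random(seed)
    points = []
    for label, (cx, cy) in centroids.items():
        for _ in range(points_per_cluster):
            x = rng.gauss(cx, spread)
            y = rng.gauss(cy, spread)
            points.append((x, y, label))
    return points


layout = synthetic_scatter(CENTROIDS)
```

Because the generator is seeded, repeated calls with the same arguments should produce the same layout, so a preview built this way stays stable between page loads. A smaller `spread` relative to the distance between centroids gives more cleanly separated clusters.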