Six charts from real RESON data: BPE VMML across 42 corpora, Zipf's law, cross-text instability, boundary concentration, the 3D discriminant space, and EVA character frequency. All values computed from primary analysis; no approximations.
Sorted by VMML value. Gold bar = Voynich with 95% CI whiskers [5.77–6.05]. Color-coded by morphological type. The alphabetic ceiling (5.748) is the empirical upper bound for all tested natural languages. The Voynich exceeds it, and so does the lower bound of its confidence interval (5.77).
Source: Paper 7 typological analysis (Silva 2026, DOI: 10.5281/zenodo.20386119). 42 corpora · BPE max_merges=200, min_freq=3 · Voynich computed on 37,025 EVA tokens.
Log-log rank-frequency plot for 5,000 EVA tokens (2,016 types). The Voynich sample follows Zipf's law closely (α = 0.916, R² = 0.930), which is consistent with natural language and inconsistent with random or encoded text.
Source: voynich_5k.txt · 5,000 tokens, 2,016 types · Top tokens: daiin (230×), chol (131×), shedy (98×) · Fitted on ranks 1–200 · α = 0.916 is close to the ideal Zipf exponent α ≈ 1.0.
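The fit reported above can be approximated with an ordinary least-squares regression of log frequency on log rank. The sketch below is one plain-Python way to do that, assuming the input is a list of word tokens. The names `zipf_fit` and `max_rank` are illustrative and not taken from the source analysis, whose exact fitting procedure is not given in these captions.

```python
import math
from collections import Counter

def zipf_fit(tokens, max_rank=200):
    """Fit log(freq) = c - alpha * log(rank) on the top `max_rank` types.

    Returns (alpha, r_squared). Illustrative sketch only; the source's
    own fitting code is not shown, so details here are assumptions.
    """
    freqs = sorted(Counter(tokens).values(), reverse=True)[:max_rank]
    if len(freqs) < 2:
        raise ValueError("need at least two distinct types to fit a line")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # For simple linear regression, R^2 equals the squared correlation.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return -slope, r_squared
```

Under the assumption that voynich_5k.txt holds whitespace-separated EVA tokens, calling `zipf_fit(text.split())` on its contents would give the exponent and R² for ranks 1–200.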
The same computation run on two different texts. Voynich is stable across sections; Tagalog's supra-ceiling value does not reproduce in a second text by the same author.
Δ(Tagalog) = 0.336 units across two Rizal novels. Δ(Voynich) across the A/B sections ≈ 0.18 units, roughly half the Tagalog shift. The Tagalog value is therefore the less stable of the two.
BC measures how strongly morpheme boundaries concentrate at word edges (1 = all at edges, 0 = all interior). Voynich = 0.361. All Philippine-branch languages ≈ 0.20, leaving a gap of about 0.16 that holds across the whole group.
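The captions give only the endpoints of the BC scale, not its formula. The sketch below is therefore a hypothetical reading rather than the source's definition: it counts a boundary as "at the edge" when it lies within `edge_width` characters of the start or end of the word, and reports the share of such boundaries. The function name, the `edge_width` parameter, and the pre-segmented input format are all assumptions.

```python
def boundary_concentration(segmented_words, edge_width=1):
    """Share of morpheme boundaries lying near a word edge.

    `segmented_words` is a list of words, each given as a list of
    segments, e.g. ["q", "oked", "y"]. ASSUMPTION: a boundary is an
    "edge" boundary if at most `edge_width` characters separate it
    from the word start or end. Returns 1.0 if all boundaries are at
    edges, 0.0 if all are interior (or if there are no boundaries).
    """
    edge = total = 0
    for segments in segmented_words:
        length = sum(len(seg) for seg in segments)
        pos = 0
        for seg in segments[:-1]:
            pos += len(seg)  # this boundary sits after `pos` characters
            total += 1
            if pos <= edge_width or length - pos <= edge_width:
                edge += 1
    return edge / total if total else 0.0
```

Any measure that satisfies the same two endpoints would fit the caption equally well; this one is chosen only because it is simple to state.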
43 corpora plotted in VMML (x-axis) × Boundary Concentration (y-axis) space. Bubble size = CBMI. The Voynich occupies an isolated region that no tested natural language simultaneously reaches on all three metrics.
Source: Paper 7 extended discriminant analysis (discriminant_3d.json). 43 corpora · Voynich = (5.918, 0.361, CBMI=0.724) · Gold dashed ellipse = discriminant zone · No language overlaps all 3 dimensions simultaneously.
Four-step diagram of the Byte-Pair Encoding pipeline used to compute VMML. Real pair-merge frequencies from the Voynich corpus: (ch) = 21,847 merges, (ee) = 18,203, (da) = 14,991.
Pair frequencies computed from full Voynich EVA corpus (37,025 tokens). Alphabetic ceiling = VMML at max_merges where no new merge crosses word boundaries.
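The merge-learning stage of a BPE pipeline like the one in the diagram can be sketched as follows. This is a generic word-internal BPE learner, not the source's code: it reuses the `max_merges=200` and `min_freq=3` settings quoted above, but the function name `learn_bpe`, the tie-breaking rule, and the treatment of each character as one symbol are assumptions. How VMML is derived from the learned merges is not specified in these captions, so the sketch stops at the merge table.

```python
from collections import Counter

def learn_bpe(tokens, max_merges=200, min_freq=3):
    """Learn BPE merges within word tokens (merges never cross tokens).

    Returns (merges, vocab): the ordered list of merged pairs, and a
    Counter mapping each segmented token (tuple of symbols) to its count.
    """
    vocab = Counter(tuple(tok) for tok in tokens)
    merges = []
    for _ in range(max_merges):
        # Count adjacent symbol pairs, weighted by token frequency.
        pairs = Counter()
        for symbols, count in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += count
        if not pairs:
            break
        # ASSUMPTION: ties go to the lexicographically smallest pair.
        best, freq = min(pairs.items(), key=lambda kv: (-kv[1], kv[0]))
        if freq < min_freq:
            break
        merges.append(best)
        # Rewrite every token, replacing the chosen pair left to right.
        merged = Counter()
        for symbols, count in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            merged[tuple(out)] += count
        vocab = merged
    return merges, vocab
```

Because tokens are segmented independently, no merge can span a word boundary in this sketch; learning stops as soon as the most frequent remaining pair falls below `min_freq` or the merge budget is spent.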
22 EVA glyphs ranked by corpus frequency (n = 191,545 characters across 37,025 tokens). Color = positional role. Note that o is the most frequent glyph overall (13.3%), while y occurs exclusively in final position, a hard structural constraint with no equivalent in random text.
Source: Paper 2 character analysis (eva_character_analysis.json). 22-glyph EVA alphabet · Positional roles from eva_functional_units.json · Strict positional enforcement rules out random generation.
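A positional tally of this kind can be reproduced with a short script. The sketch below counts how often each character occurs in initial, medial, and final position and lists the characters seen only word-finally. It assumes one character per glyph and whitespace-tokenised input; the function names are illustrative, and the role definitions in eva_functional_units.json may differ.

```python
from collections import Counter, defaultdict

def positional_profile(tokens):
    """Count initial / medial / final occurrences of each character.

    ASSUMPTION: each character is one glyph, and a single-character
    token counts as final.
    """
    profile = defaultdict(Counter)
    for tok in tokens:
        last = len(tok) - 1
        for i, ch in enumerate(tok):
            if i == last:
                role = "final"
            elif i == 0:
                role = "initial"
            else:
                role = "medial"
            profile[ch][role] += 1
    return profile

def final_only(profile):
    """Characters whose every occurrence is in final position."""
    return sorted(g for g, c in profile.items()
                  if c["final"] == sum(c.values()))
```

On a full corpus, a glyph appears in the `final_only` list only if it has no initial or medial occurrence at all, which is the strict form of the constraint described in the caption.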