Chart 1

BPE VMML — 42 Corpora

Corpora are sorted by VMML value and color-coded by morphological type. The gold bar is the Voynich, with 95% CI whiskers [5.77–6.05]. The alphabetic ceiling (5.748) is the empirical upper bound for all tested natural languages; the Voynich exceeds it, and even the lower end of its interval (5.77) sits above the ceiling.

BPE VMML bar chart across 42 corpora. Voynich (gold) sits above the alphabetic ceiling of 5.748, isolated from all natural languages tested.
Legend: Voynich MS · Agglutinative · Fusional · Isolating · Alphabetic ceiling (5.748)

Source: Paper 7 typological analysis (Silva 2026, DOI: 10.5281/zenodo.20386119). 42 corpora · BPE max_merges=200, min_freq=3 · Voynich computed on 37,025 EVA tokens.


Chart 2

Zipf's Law — Voynich Token Frequency

Log-log rank-frequency plot for 5,000 EVA tokens (2,016 types). The Voynich follows Zipf's law closely (α = 0.916, R² = 0.930, fitted on ranks 1–200), which is consistent with natural language and inconsistent with random or encoded text.

Zipf law log-log plot for Voynich EVA corpus. Alpha = 0.916, R-squared = 0.930. Top tokens: daiin (230x), chol (131x), shedy (98x).

Source: voynich_5k.txt · 5,000 tokens, 2,016 types · Top tokens: daiin (230×), chol (131×), shedy (98×) · Fitted on ranks 1–200 · α=0.916 close to Zipf's ideal α≈1.0.
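
The fit described above can be sketched as an ordinary least-squares line through log(frequency) against log(rank), restricted to the top ranks. This is a minimal illustration, not the source's actual script: the function name `zipf_fit`, whitespace tokenization, and the use of plain OLS are assumptions.

```python
import math
from collections import Counter

def zipf_fit(tokens, max_rank=200):
    """Fit log(freq) = c - alpha * log(rank) by least squares on the top ranks.

    Returns (alpha, r_squared). Assumption: plain OLS in log-log space;
    the source does not state which fitting procedure it used.
    """
    freqs = sorted(Counter(tokens).values(), reverse=True)[:max_rank]
    if len(freqs) < 2:
        raise ValueError("need at least two distinct token types")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    # For a one-variable linear fit, R^2 is the squared correlation.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return -slope, r_squared
```

With the tokens of voynich_5k.txt loaded into a list, the call would be `zipf_fit(tokens, max_rank=200)`. Whether it reproduces α = 0.916 exactly depends on tokenization details the page does not give.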


Chart 3

Cross-Text Instability — Tagalog vs Voynich

The same computation run on two different texts. Voynich is stable across sections; Tagalog's supra-ceiling value does not reproduce in a second text by the same author.

Δ(Tagalog) = 0.336 units across two Rizal novels. Δ(Voynich) across the A/B sections ≈ 0.18 units, roughly half the Tagalog shift (and much smaller relative to scale). This difference in stability is what separates the two cases.


Chart 4

Boundary Concentration — Voynich vs Tested Languages

Boundary Concentration (BC) measures how strongly morpheme boundaries concentrate at word edges (1 = all at edges, 0 = all interior). Voynich = 0.361; all Philippine-branch languages ≈ 0.20. The gap of about 0.16 is consistent across that group.


Chart 5

The Discriminant Space — VMML × BC × CBMI

43 corpora plotted in VMML (x-axis) × Boundary Concentration (y-axis) space. Bubble size = CBMI. The Voynich occupies an isolated region that no tested natural language simultaneously reaches on all three metrics.

Scatter plot of 43 corpora in VMML x BC discriminant space. Bubble size = CBMI. Voynich isolated at (5.918, 0.361).

Source: Paper 7 extended discriminant analysis (discriminant_3d.json). 43 corpora · Voynich = (5.918, 0.361, CBMI=0.724) · Gold dashed ellipse = discriminant zone · No language overlaps all 3 dimensions simultaneously.


Chart 6

BPE Process — How VMML Is Computed

Four-step diagram of the Byte-Pair Encoding pipeline used to compute VMML. Real pair-merge frequencies from the Voynich corpus: (ch) = 21,847 merges, (ee) = 18,203, (da) = 14,991.

BPE process diagram: Raw EVA tokens → Count character pair frequencies → Merge most frequent pairs → Compute VMML from final vocabulary.

Pair frequencies computed from full Voynich EVA corpus (37,025 tokens). Alphabetic ceiling = VMML at max_merges where no new merge crosses word boundaries.
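
The four steps above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the source's pipeline: merges are learned inside tokens only (so no pair spans a word boundary), ties are broken alphabetically, and `max_merges` and `min_freq` mirror the parameters quoted for Chart 1. The page does not give the VMML formula, so `mean_merged_length` is a hypothetical stand-in for the final step (mean character length of the merged units), labeled as such in the code.

```python
from collections import Counter

def learn_bpe(tokens, max_merges=200, min_freq=3):
    """Learn BPE merges inside tokens; pairs never span a token boundary."""
    # Each distinct token is a tuple of symbols weighted by its corpus count.
    words = Counter(tuple(tok) for tok in tokens)
    merges = []
    for _ in range(max_merges):
        # Step 2: count adjacent symbol pairs.
        pairs = Counter()
        for symbols, count in words.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += count
        if not pairs:
            break
        # Step 3: pick the most frequent pair (ties broken by the pair itself).
        (a, b), freq = max(pairs.items(), key=lambda kv: (kv[1], kv[0]))
        if freq < min_freq:
            break
        merges.append((a, b))
        merged = Counter()
        for symbols, count in words.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
                    out.append(a + b)
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            merged[tuple(out)] += count
        words = merged
    return merges, words

def mean_merged_length(merges):
    """HYPOTHETICAL stand-in for step 4; not the source's VMML definition."""
    if not merges:
        return 0.0
    return sum(len(a + b) for a, b in merges) / len(merges)
```

Recounting pairs from scratch on every iteration keeps the sketch short; at 37,025 tokens and 200 merges that cost is small, though a production tokenizer would update the counts incrementally.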


Chart 7

EVA Character Frequency Distribution

22 EVA glyphs ranked by corpus frequency (n = 191,545 characters across 37,025 tokens). Color = positional role. Note that o leads overall (13.3%) but y is exclusively final-position — a hard structural constraint with no equivalent in random text.

EVA character frequency distribution. o = 13.3%, e = 10.48%, h = 9.32%, y = 9.22% (exclusively final). 22 glyphs total, n=191,545 characters.
Legend (positional role): Initial position · Final position · Medial position · Initial/Medial

Source: Paper 2 character analysis (eva_character_analysis.json). 22-glyph EVA alphabet · Positional roles from eva_functional_units.json · Strict positional enforcement rules out random generation.
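
A positional profile of this kind can be tallied directly from the token list. The sketch below is illustrative and assumes one ASCII character per EVA glyph and whitespace-separated tokens; the names `positional_profile` and `final_share` are hypothetical and do not come from eva_character_analysis.json or eva_functional_units.json.

```python
from collections import Counter, defaultdict

def positional_profile(tokens):
    """Count each glyph by its position in the token: initial, medial, final."""
    profile = defaultdict(Counter)
    for tok in tokens:
        last = len(tok) - 1
        for i, glyph in enumerate(tok):
            if i == last:
                slot = "final"    # single-glyph tokens count as final (a choice)
            elif i == 0:
                slot = "initial"
            else:
                slot = "medial"
            profile[glyph][slot] += 1
    return profile

def final_share(profile, glyph):
    """Fraction of a glyph's occurrences that fall in final position."""
    counts = profile[glyph]
    total = sum(counts.values())
    return counts["final"] / total if total else 0.0
```

A glyph that is exclusively final, as the chart reports for y, would give `final_share(profile, "y") == 1.0` on the full corpus.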
