● Confirmed Eliminations & Open Questions · 18 entries · Last updated June 2026
Random or Gibberish Text
BPE structure survives character perturbation testing. When characters are randomly shuffled (n=50 trials, corpus-size matched), Voynich VMML drops 22% — a structured decay signature. Truly random text collapses by more than 60% under identical perturbation. The internal structure is real.
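A perturbation test of this kind can be sketched as follows. The entry does not specify the shuffle procedure or define VMML, so this sketch assumes a corpus-wide character shuffle that preserves word lengths, and it uses a hypothetical stand-in metric (`top_bigram_share`) purely to exercise the harness. The function names are illustrative and are not taken from the papers.

```python
import random
from collections import Counter
from statistics import mean
from typing import Callable, Sequence


def shuffle_characters(words: Sequence[str], rng: random.Random) -> list[str]:
    """Pool every character, shuffle the pool, and re-cut it at the original word lengths."""
    pool = [ch for w in words for ch in w]
    rng.shuffle(pool)
    out, i = [], 0
    for w in words:
        out.append("".join(pool[i:i + len(w)]))
        i += len(w)
    return out


def top_bigram_share(words: Sequence[str]) -> float:
    """Stand-in metric (NOT VMML): share of within-word bigrams taken by the commonest bigram."""
    counts = Counter(w[i:i + 2] for w in words for i in range(len(w) - 1))
    total = sum(counts.values())
    return max(counts.values()) / total if total else 0.0


def mean_relative_drop(
    metric: Callable[[Sequence[str]], float],
    words: Sequence[str],
    trials: int = 50,
    seed: int = 0,
) -> float:
    """Average fractional drop of `metric` over `trials` shuffled copies of the corpus."""
    rng = random.Random(seed)
    baseline = metric(words)
    if baseline == 0:
        raise ValueError("baseline metric is zero; relative drop is undefined")
    drops = [
        (baseline - metric(shuffle_characters(words, rng))) / baseline
        for _ in range(trials)
    ]
    return mean(drops)


if __name__ == "__main__":
    toy_corpus = ["abab"] * 20  # highly repetitive toy input, not Voynich data
    print(mean_relative_drop(top_bigram_share, toy_corpus, trials=50, seed=0))
```

Swapping in the real metric for `top_bigram_share` and running the same harness on a length-matched random text would give the two decay figures being compared.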
Paper 8 · permutation test · n=50
Tagalog as base language
Critical three-metric mismatch. While Tagalog (Noli Me Tangere, full corpus) enters the Voynich 95% CI for VMML, Boundary Concentration decisively diverges: Tagalog BC ≈ 0.20 (infix-dominated) vs Voynich 0.361 (edge-concentrated). The structural mechanisms are incompatible.
Papers 7 & 8 · BC divergence = 0.161
Tagalog VMML as a stable language property
El filibusterismo (full corpus, same author as Noli) yields VMML = 5.578 — below the Alphabetic Ceiling. Cross-text variation Δ = 0.336 units. Supra-ceiling VMML in Tagalog is text-specific, not a property of the language. The Noli result does not generalise.
Paper 8 · Rizal cross-text instability
Philippine languages broadly
Cebuano (NLLB corpus): VMML = 5.609, below the Alphabetic Ceiling. No Philippine language tested simultaneously matches Voynich on VMML + BC + CBMI. VMML elevation, where it appears, is language- and text-specific, not a family-level pattern.
Paper 8 · multi-language Austronesian analysis
Indonesian and Malay
VMML values are well below the Alphabetic Ceiling: Indonesian 4.838, Malay 5.293. Both languages sit comfortably in the mid-range of natural language distribution. Neither approaches the Voynich zone on any of the three metrics.
Paper 7 · 55-corpus baseline
Basque as high-VMML reference
Basque (Euskara) is a canonical agglutinative language used as a control. VMML = 4.475 on a natural text corpus, significantly below the Alphabetic Ceiling. This confirms that agglutination alone does not produce high VMML; morphological architecture matters more than typological label.
Paper 7 · agglutinative control
Schinner's stochastic hoax model (2007)
Schinner proposed the Voynich Manuscript was generated by a stochastic process simulating language without encoding real content. Random EVA generation under Schinner's model fails to replicate the full 3-metric profile — particularly the BC + CBMI combination that accompanies Voynich's VMML.
Paper 7 · structural profile falsification
Timm's self-citation mechanism (2020)
Timm and Schindler proposed a word-stress positional model involving self-citation of previous tokens. The EBNF/MI trade-off is irreconcilable with the observed BPE structure: self-citation would produce distinct boundary concentration patterns inconsistent with Voynich's measured BC = 0.361.
Paper 7 · mechanism incompatibility
Any of 63 tested natural language profiles
The Voynich Discriminant Zone — defined by VMML + BC + CBMI simultaneously — is occupied by no natural language corpus from any of the 35+ families tested. Not because we stopped looking: we tested 63 corpora and none qualifies. The zone is empirically empty.
Papers 7 & 8 · full 63-corpus baseline
Ilocano as candidate language
Ilocano VMML = 5.785 provisionally exceeds the Alphabetic Ceiling, making it the only language besides Tagalog (Noli) to do so. However, BC = 0.248 — infix-dominated, incompatible with Voynich's 0.361. Additionally, n = 12,807 tokens sits near the stability threshold; result requires larger corpus confirmation.
Paper 8 · provisional · n near stability limit
Cebuano (formal literary register)
The NLLB Cebuano corpus contains substantial code-switching with English and Filipino, artificially suppressing VMML. Formal literary Cebuano — comparable to Rizal's novels for Tagalog — has not yet been tested. Result could move in either direction.
Corpus gap · register bias identified
Kapampangan, Hiligaynon, Malagasy
Untested Philippine-branch languages with complex focus-morphology (Kapampangan, Hiligaynon) and a remotely-derived Austronesian outlier (Malagasy). These would either confirm or refute the emerging hypothesis that VMML elevation tracks focus-morphology density rather than phylogenetic membership.
Untested · next phase
Kapampangan
Untested. Philippine-branch language with exceptionally complex focus-morphology. VMML unknown. High priority given the Tagalog and Ilocano results — Kapampangan's morphological architecture may test whether supra-ceiling VMML is tied to a specific focus-system subtype rather than the broader Philippine voice system.
Corpus acquisition pending · high priority
Hiligaynon (Ilonggo)
Untested. Approximately 9.3 million speakers. Philippine voice system with morphological architecture distinct from Tagalog and Cebuano. A result either above or below the Alphabetic Ceiling would directly test the focus-morphology density gradient hypothesis — whether VMML elevation is a structural gradient or a discrete threshold.
Corpus acquisition pending
Malagasy
Untested. Austronesian but geographically isolated — the language reached Madagascar via maritime migration, diverging from other Austronesian branches for over a millennium. VOS word order and distinct morphological system. A result here would test whether the pattern emerging from Philippine-branch languages is Philippine-specific or a broader Austronesian property.
Corpus acquisition pending · geographical outlier test
Formal literary Cebuano
The NLLB Cebuano corpus (social media, code-switching with English and Filipino) yields VMML = 5.609 — below the Alphabetic Ceiling. This is plausibly a register artifact. Formal literary Cebuano equivalent to the Rizal corpus used for Tagalog has not yet been tested. The result could move in either direction and is needed before Cebuano can be definitively characterized.
Register gap identified · literary corpus needed
Voynichese as classical cipher of Latin or Greek
If the Voynich text were a substitution cipher of Latin or Greek, BPE segmentation would recover the morphological density of the underlying language, placing VMML within the alphabetic zone (≤ 5.748, the Alphabetic Ceiling derived from 55+ corpora). Voynich VMML = 5.918 is inconsistent with any tested alphabetic natural language. Because a monoalphabetic substitution only relabels symbols, it leaves the pair frequencies that drive BPE merges unchanged and therefore cannot produce this structural elevation.
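The relabeling argument can be checked on a toy scale. The sketch below is a minimal BPE written for illustration only, not the pipeline used in the papers. It assumes ties between equally frequent pairs are broken by first occurrence in the corpus, which does not depend on symbol labels. VMML is not computed here; the sketch only shows that the merge frequencies and token-length profile any BPE-derived metric would be built from are identical for a plaintext and its monoalphabetic encipherment. The sample Latin words and the helper names are assumptions.

```python
import random
import string
from collections import Counter


def merge_pair(symbols, pair):
    """Merge non-overlapping occurrences of `pair` in one word, scanning left to right."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def bpe_profile(words, num_merges):
    """Return (frequency of the pair merged at each step, final token lengths per word)."""
    corpus = [list(w) for w in words]
    freqs = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols in corpus:
            pairs.update(zip(symbols, symbols[1:]))
        if not pairs:
            break
        # max() returns the first maximal key; Counter keeps first-seen order,
        # so ties are resolved by position in the corpus, not by symbol label.
        best = max(pairs, key=pairs.get)
        freqs.append(pairs[best])
        corpus = [merge_pair(symbols, best) for symbols in corpus]
    lengths = [[len(tok) for tok in symbols] for symbols in corpus]
    return freqs, lengths


def random_substitution(seed):
    """Build a translation table for a random monoalphabetic cipher over a-z."""
    letters = list(string.ascii_lowercase)
    shuffled = letters[:]
    random.Random(seed).shuffle(shuffled)
    return str.maketrans("".join(letters), "".join(shuffled))


if __name__ == "__main__":
    plain = "arma virumque cano troiae qui primus ab oris".split()
    table = random_substitution(seed=7)
    cipher = [w.translate(table) for w in plain]
    print(bpe_profile(plain, 10) == bpe_profile(cipher, 10))
```

A BPE implementation that breaks ties alphabetically could pick different merges on the ciphertext, so the exact equality shown here depends on the label-independent tie-breaking assumed above.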
Paper 7 · 55+ corpora · alphabetic ceiling falsification
Finnish or Uralic languages as structural analogue
Finnish (Wikipedia corpus) VMML ≈ 4.6–4.9 in preliminary tests — well below the Alphabetic Ceiling despite canonical agglutinative morphology. The high-VMML pattern is not explained by agglutination generally. Finnish's morphological complexity produces a distinct structural signature that does not approach the Voynich zone on any of the three discriminant metrics.
Preliminary · agglutination ≠ VMML elevation confirmed
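The point values quoted across the entries above can be collected and checked against the Alphabetic Ceiling and the Voynich BC in a few lines. This is a bookkeeping sketch only: the numbers are copied from the entries, nothing is recomputed from corpora, the Finnish range is omitted because it is reported as a preliminary interval, and the dictionary layout and function names are illustrative.

```python
ALPHABETIC_CEILING = 5.748  # as stated in the cipher entry
VOYNICH_VMML = 5.918
VOYNICH_BC = 0.361

# Only values that the entries report explicitly; missing metrics are left out.
REPORTED = {
    "Tagalog (Noli Me Tangere)": {"bc": 0.20},
    "Tagalog (El filibusterismo)": {"vmml": 5.578},
    "Cebuano (NLLB)": {"vmml": 5.609},
    "Indonesian": {"vmml": 4.838},
    "Malay": {"vmml": 5.293},
    "Basque": {"vmml": 4.475},
    "Ilocano": {"vmml": 5.785, "bc": 0.248},
}


def exceeds_ceiling(vmml):
    """True when a VMML value lies above the Alphabetic Ceiling."""
    return vmml > ALPHABETIC_CEILING


def bc_gap(bc):
    """Absolute distance between a corpus BC and the Voynich BC, rounded to 3 places."""
    return round(abs(VOYNICH_BC - bc), 3)


def summarize(reported):
    """Yield (name, above_ceiling or None, bc_gap or None) for each corpus."""
    for name, metrics in reported.items():
        vmml = metrics.get("vmml")
        bc = metrics.get("bc")
        yield (
            name,
            None if vmml is None else exceeds_ceiling(vmml),
            None if bc is None else bc_gap(bc),
        )


if __name__ == "__main__":
    for row in summarize(REPORTED):
        print(row)
```

Among the corpora with a stated point value, only Ilocano lies above the ceiling, and both reported BC values sit more than 0.1 below the Voynich figure.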