How the Voynich Manuscript compares structurally to other undeciphered and poorly understood writing systems.
The Voynich Manuscript is not the only undeciphered script. Linear A, the Indus Valley script, Rongorongo, and Proto-Elamite all remain unread. Each presents different structural challenges. Placing the Voynich within this comparative framework clarifies what kind of problem it represents — and what strategies are most likely to succeed.
| Script | Est. Date | Script Type | Bilingual Text | Corpus Size | Current Status | BPE VMML |
|---|---|---|---|---|---|---|
| Voynich | 1404–1438 CE (carbon-dated, UA 2009) | Unknown (disputed) | None | ~37,000 tokens (Beinecke MS 408) | Undeciphered. Active research. Multiple competing hypotheses. | 5.918 (above alphabetic ceiling) |
| Linear A | c. 1800–1450 BCE (Minoan Crete) | Syllabary (probable) | None known (Linear B related but distinct) | ~1,500 inscriptions (fragmentary) | Undeciphered. Syllabic values partially inferred from Linear B. Language unknown. | ≈ 4.85 (preliminary, small corpus) |
| Indus Script | c. 2600–1900 BCE (Harappan Civilization) | Logographic/syllabic (debated) | None | ~4,000 inscriptions (mostly seals, very short) | Undeciphered. Contested whether it constitutes a full writing system. | Not computed (corpus too short for reliable BPE) |
| Rongorongo | Pre-1722 CE (Easter Island) | Possibly logographic (uncertain) | None | ~14,000 glyphs (25 surviving tablets) | Undeciphered. Relationship to oral tradition debated. Corpus severely limited. | ≈ 5.09 (highly uncertain, small corpus) |
| Proto-Elamite | c. 3200–2900 BCE (SW Iran) | Logographic (partial) | None (contemporary with proto-cuneiform) | ~5,000 tablets (large but mostly numeric) | Undeciphered. Accounting function partially understood; semantic content inaccessible. | Not computed (predominantly numeric, unsuitable) |
Three quantitative properties distinguish the Voynich from every other script in the comparative set.
BPE Mean Morpheme Length of 5.918 places the Voynich above the alphabetic ceiling established across 63 language corpora spanning 35 families. Among the undeciphered scripts where VMML is computable, Linear A reaches approximately 4.85 (preliminary, small corpus) and Rongorongo approximately 5.09 (highly uncertain). The Voynich value is an outlier in both the deciphered and undeciphered comparative sets.
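To make the VMML figure concrete, here is a minimal sketch of one way a BPE mean morpheme length could be computed. The section does not state its BPE settings (merge budget, corpus preparation, token versus type weighting), so this assumes plain character-level BPE learned within whitespace-delimited tokens and reports the token-weighted mean segment length. The function names and the EVA-style sample words are hypothetical illustrations, not the project's actual pipeline.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace each left-to-right occurrence of `pair` in `symbols` with one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def train_bpe(words, num_merges):
    """Learn up to `num_merges` BPE merges, starting from single characters within each word."""
    vocab = Counter(tuple(w) for w in words)  # word as symbol tuple -> frequency
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every word is already a single symbol
        best = max(pair_counts, key=pair_counts.get)  # ties go to the first pair seen
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Segment one word by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols


def mean_morpheme_length(words, merges):
    """Token-weighted mean length, in characters, of the BPE segments."""
    lengths = [len(seg) for w in words for seg in segment(w, merges)]
    return sum(lengths) / len(lengths)


# Hypothetical EVA-style tokens, for illustration only.
sample = ["qokeedy", "qokedy", "chedy", "shedy", "daiin", "qokeedy"]
merges = train_bpe(sample, num_merges=10)
print(mean_morpheme_length(sample, merges))
```

Each additional merge can only join existing segments, so the mean never decreases as the merge budget grows. Cross-corpus comparisons of this kind are therefore only meaningful when every corpus is processed with the same settings.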
A 47-morpheme EBNF grammar covers 92% of the 37,025 tokens in the corpus. No other undeciphered script has been shown to exhibit this level of internal grammatical regularity at this coverage rate. The grammar is compact, yet it accounts for more than nine tokens in ten. This reflects either a genuine generative grammar or an exceptionally consistent encoding procedure, and either interpretation is remarkable.
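Coverage here means the share of corpus tokens the grammar can derive. A non-recursive slot grammar of the form prefix, root, suffix is regular, so one hedged way to measure coverage is to compile it to a regular expression and count full matches. The morpheme inventories below are invented placeholders in EVA-style transliteration, not the 47 morphemes of the actual grammar, and the optional-slot structure is an assumption.

```python
import re

# Hypothetical slot grammar in EBNF, NOT the 47-morpheme grammar described above:
#   word   = [ prefix ] , root , [ suffix ] ;
#   prefix = "qo" | "o" | "ch" | "sh" ;
#   root   = "k" | "t" | "d" | "e" | "ke" | "te" ;
#   suffix = "edy" | "eedy" | "dy" | "aiin" | "ain" | "ol" ;
PREFIXES = ["qo", "o", "ch", "sh"]
ROOTS = ["k", "t", "d", "e", "ke", "te"]
SUFFIXES = ["edy", "eedy", "dy", "aiin", "ain", "ol"]


def alternation(options):
    """Build a non-capturing regex group matching any one of the literal options."""
    return "(?:" + "|".join(re.escape(o) for o in options) + ")"


# Optional EBNF slots become `?`. The grammar has no recursion, so a regex suffices.
WORD_RE = re.compile(
    alternation(PREFIXES) + "?" + alternation(ROOTS) + alternation(SUFFIXES) + "?"
)


def coverage(tokens):
    """Fraction of tokens that the grammar derives in full."""
    if not tokens:
        raise ValueError("empty token list")
    matched = sum(1 for t in tokens if WORD_RE.fullmatch(t))
    return matched / len(tokens)


tokens = ["qokeedy", "qokedy", "chedy", "shedy", "daiin", "okol", "xyz"]
print(f"{coverage(tokens):.1%}")  # prints 85.7%
```

With a real morpheme inventory and a full transliteration, the same count yields the corpus coverage rate. For scale, 92% of 37,025 tokens is about 34,060 tokens.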
Boundary Concentration (BC) of 0.361 exceeds every tested natural language in the Philippine and broader Malayo-Polynesian groups and in Basque, a language isolate. High BC indicates that morpheme boundaries cluster near token edges, the signature of strong prefix and suffix morphology. For undeciphered scripts, whose morphological structure is unknown by definition, this measurement is particularly significant: it suggests that morphological architecture can be detected without reading the script.
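The section does not give a formula for BC, so the following uses an assumed, illustrative definition: the fraction of token-internal morpheme boundaries that lie within `edge` characters of either end of the token. It takes pre-segmented tokens as input, the segmentations are hypothetical, and it will not reproduce the 0.361 figure. It only shows how boundary position can be measured without knowing what any morpheme means.

```python
def boundary_concentration(segmented_tokens, edge=2):
    """Assumed BC definition (illustrative only): share of token-internal
    morpheme boundaries lying within `edge` characters of either token end."""
    near_edge = total = 0
    for segments in segmented_tokens:
        length = sum(len(s) for s in segments)
        offset = 0
        for seg in segments[:-1]:  # the boundary after the last segment is the token end itself
            offset += len(seg)
            total += 1
            if offset <= edge or length - offset <= edge:
                near_edge += 1
    return near_edge / total if total else 0.0


# Hypothetical segmentations, for illustration only.
segmented = [("qo", "k", "eedy"), ("ch", "e", "dy"), ("dai", "in")]
print(boundary_concentration(segmented))          # prints 0.8
print(boundary_concentration(segmented, edge=1))  # prints 0.0
```

A fair comparison would also need a baseline, for example the same statistic computed on randomly placed boundaries, because in short tokens every boundary sits close to an edge.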
The absence of corrections is significant. Working manuscripts, such as medical herbals, astronomical texts, and recipe compilations, typically show evidence of revision, annotation, and correction. The Voynich shows none. This is consistent either with a clean copy (suggesting that an exemplar existed) or with a text whose content was not semantically meaningful to the scribe, so that there were no recognizable errors to correct.
Historical cipher manuscripts form a documentary category of their own. Two are particularly relevant to Voynich comparison: the Rohonc Codex (Hungarian, probably 16th–19th century, long considered undeciphered) and the Copiale Cipher (German, 18th century, deciphered by Knight, Megyesi, and Schaefer in 2011 using statistical methods).
The Rohonc Codex has been shown to exhibit a high degree of structural regularity that is measurably different from natural language; its statistics differ both from known languages and from the Voynich. The Copiale, once deciphered, turned out to record the initiation rituals of an 18th-century secret society, the Oculists. Statistical methods succeeded there because the encoding was a homophonic substitution over ordinary German plaintext.
The Voynich fits neither profile. On multiple metrics its structure is more language-like than that of known cipher manuscripts, yet no currently available method that works on known languages has decoded it. It occupies a structurally distinct position in the comparative space: neither clearly a cipher nor clearly a natural language, and exhibiting properties that would be unusual in either category.
This is, precisely, why it remains unsolved after more than a century of serious cryptographic and linguistic attention.
"Researchers specializing in undeciphered scripts, codicology, or medieval manuscript analysis are particularly welcome to reach out. Comparative data from other scripts — especially Linear A and Rongorongo — would significantly extend this analysis."
contact@voynichlucidity.com