State of Research · Voynich Lucidity

What Is Established

Confirmed

1

The vellum dates to 1404–1438.

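Carbon-14 dating of four vellum samples (University of Arizona, 2009) places the vellum between 1404 and 1438 with 95% confidence, confirming a pre-Renaissance origin. The result has not been seriously contested and is consistent with an early fifteenth-century European origin. It dates the animal skin rather than the ink, so it fixes the earliest plausible date of writing, not the date of writing itself.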

2

The text is not random.

Montemurro & Zanette (2013, PLOS ONE) showed that word distribution follows patterns consistent with meaningful semantic structure — words cluster in ways that are statistically inconsistent with random generation. The text behaves as if it carries content.

3

The Currier A/B distinction is real.

Parisel (2026, arXiv:2604.25979) confirmed with 89.2% predictive accuracy that two statistically distinct "languages" exist in the manuscript. This is not a codicological artifact — the distinction persists within individual quires. An independent structural signal corroborates this: Cross-Boundary Mutual Information (CBMI) shows a large effect (Cohen's d = −1.01, permutation p < 0.001) between A and B folios, surviving a within-quire control (p = 0.0008) and yielding a Fisher combined full-corpus χ²(6) = 40.66, p < 0.000002 (Silva 2026, Zenodo).
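The statistics quoted above (Cohen's d, a permutation p-value, and a Fisher combined χ²) can be illustrated with a minimal standard-library sketch. This is not the cited study's code: the function names, the pooled-variance form of Cohen's d, and the per-folio values are assumptions made here for illustration. Fisher's statistic has 2k degrees of freedom for k combined p-values, so χ²(6) implies three p-values were combined. Which three is not stated here. The method also assumes the combined tests are independent.

```python
import math
import random
from statistics import mean, stdev


def cohens_d(a, b):
    """Cohen's d for mean(a) - mean(b), using a pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def permutation_p(a, b, n_perm=10_000, seed=0):
    """Two-sided Monte Carlo permutation p-value for the difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids reporting p = 0


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable; closed form for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


def fisher_combined(p_values):
    """Fisher's method: statistic -2 * sum(ln p), with df = 2 * len(p_values)."""
    stat = -2 * sum(math.log(p) for p in p_values)
    df = 2 * len(p_values)
    return stat, df, chi2_sf_even_df(stat, df)


# Hypothetical per-folio CBMI values (bits); NOT data from the manuscript.
cbmi_a = [0.90, 1.00, 1.10, 0.80, 1.05]
cbmi_b = [1.40, 1.60, 1.50, 1.45, 1.70]

d = cohens_d(cbmi_a, cbmi_b)                    # negative: A lower than B
p = permutation_p(cbmi_a, cbmi_b, n_perm=2000)

# Tail probability for the statistic quoted in the text, chi2(6) = 40.66.
p_quoted = chi2_sf_even_df(40.66, 6)            # roughly 3.4e-7
```

With the quoted statistic, the tail probability comes out near 3.4e-7, which agrees with the stated bound p < 0.000002.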

4

The script has consistent grammar-like structure.

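Zipf distribution, type-token ratio, and entropy measures all fall within the range of natural language. The text behaves statistically like language across multiple independent metrics, not like random noise. These measures cannot by themselves exclude an enciphered natural language, because a simple substitution cipher leaves word-frequency statistics unchanged.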
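The three measures named above can be computed from any tokenized transcription. The sketch below is a minimal standard-library illustration, not a published pipeline. The function names are chosen here, the entropy is the simple unigram (word-level) Shannon entropy, and the Zipf exponent is estimated by an ordinary least-squares fit in log-log space, which is a common but rough estimator. The sample tokens are placeholder strings in the style of a transliteration, not a real transcription.

```python
import math
from collections import Counter


def type_token_ratio(tokens):
    """Distinct word types divided by total tokens."""
    return len(set(tokens)) / len(tokens)


def unigram_entropy_bits(tokens):
    """Shannon entropy of the word-frequency distribution, in bits per token."""
    n = len(tokens)
    return -sum((c / n) * math.log2(c / n) for c in Counter(tokens).values())


def zipf_slope(tokens):
    """Least-squares slope of log(frequency) against log(rank).

    A slope near -1 is the classic Zipfian signature. Needs at least
    two distinct word types.
    """
    freqs = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


# Placeholder tokens for illustration only (hypothetical sample).
sample = "daiin chedy daiin qokeedy shedy daiin chedy ol qokeedy daiin".split()

ttr = type_token_ratio(sample)
entropy = unigram_entropy_bits(sample)
slope = zipf_slope(sample)
```

Type-token ratio depends strongly on text length, so comparisons between the manuscript and reference corpora are only meaningful on samples of equal size.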

5

No tested natural language simultaneously matches the full structural profile.

63+ corpora from 35+ language families have been tested against the manuscript's typological fingerprint. None replicates the combination of VMML, Boundary Concentration, and CBMI simultaneously (Silva 2026, Zenodo). The manuscript occupies a structurally anomalous position relative to all tested alphabetic natural languages.

What Is Contested

Debated

1

Whether Tagalog (or any Philippine language) is structurally relevant.

One text (Noli Me Tangere by Rizal) enters the discriminant zone on VMML; a second Rizal novel does not. The pattern may be text-specific rather than a property of the language as a whole, which severely limits any claim about Tagalog's relationship to the manuscript (Silva 2026).

2

Whether the A/B distinction is binary or the surface of a more complex system.

Parisel (2026) identifies three structural layers within the manuscript; the traditional A/B label may be a projection of a higher-dimensional generative system. The binary framing may be too coarse to capture what is actually present.

3

Whether Bax's (2014) identification of 14 proposed words is valid.

Bax applied comparative illustration analysis — matching visual context (plants, stars) to proposed words — and identified 14 candidate readings. The methodology is sound in principle. However, no independent replication has been published and the results remain disputed within the research community.

4

Whether the text encodes a natural language, an artificial language, or a complex cipher.

Current statistical evidence is consistent with all three hypotheses. The language-like statistics do not rule out a constructed language or a sophisticated cipher designed to mimic natural language patterns. No discriminating test has yet been definitive.

What Remains Unknown

Open

1

Who wrote it, and why.

No author has been identified with any confidence. The manuscript's purpose — whether medical, alchemical, devotional, fictional, or something else entirely — remains unknown. Proposed attributions (Roger Bacon, Voynich himself, others) have all been discredited or remain unsubstantiated.

2

Whether the illustrations correspond to real or imagined subjects.

Most of the plants depicted cannot be securely matched to known species, and proposed identifications remain disputed. The astronomical diagrams do not clearly correspond to known constellation systems. Whether the illustrations encode real-world knowledge, are deliberate obfuscations, or depict an imagined world is unresolved.

3

Whether Currier A and B represent two scribes, two linguistic registers, or something else entirely.

The A/B distinction is statistically robust. Its cause is not. The difference could reflect two authors, two languages, two modes of encoding within a single language, or two phases of composition. No hypothesis has been confirmed.

4

What language family, if any, the script represents.

No confirmed linguistic affiliation has been established. The script's structural properties are consistent with certain typological families but rule out others. Definitive identification remains one of the central open problems in the field.

Recent Publications

2024–2026

Parisel (2026)

A Quantitative Confirmation of the Currier Language Distinction

arXiv:2604.25979 — Beta-Binomial mixture model. 89.2% predictive accuracy confirming two statistically distinct "languages" within the manuscript. The most rigorous quantitative treatment of the A/B distinction to date.

Silva (2026)

BPE Morpheme-Length Clustering Across 55+ Writing System Corpora

DOI: 10.5281/zenodo.20668229 — Systematic typological fingerprinting using BPE VMML, Boundary Concentration, and CBMI across 63 corpora from 35+ language families. Establishes the Voynich Manuscript's anomalous structural position relative to all tested alphabetic natural languages. v2.6 (2026-06-12) adds §5.10.1 (prose-only robustness of the section null model). A/B separation on CBMI (….97 bits vs CBMI_B = 1.51 bits; Cohen's d = −1.01; within-quire p = 0.0008).

Silva (2026)

BPE VMML Cross-Text Instability in Tagalog

DOI: 10.5281/zenodo.20668970 — Demonstrates that Tagalog VMML varies by 0.336 units across two Rizal novels, undermining claims of language-level correspondence. Introduces cross-text stability as a methodological requirement for any structural comparison.

Zandbergen (ongoing)

Voynich.nu — Reference Site

voynich.nu — The most comprehensive maintained reference on the manuscript. Covers codicology, transcription systems, statistical analyses, and the history of research. Essential starting point for anyone entering the field.