The Work

The answer required building a comparison framework first. That framework — which segments corpora into byte-pair encoded tokens and computes morphological density, hapax ratio, token-type distribution, and affix boundary behavior across 29 language families — became the research itself.
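
As a rough illustration of the segmentation and counting steps described above, the following Python sketch learns a few byte-pair merges from a word list, segments a word with them, and computes two of the simpler statistics. It is not the framework's actual code. The function names (`learn_bpe`, `merge_pair`, `segment`, `hapax_ratio`, `type_token_ratio`), the tie-breaking rule, and the choice to define hapax ratio as hapaxes divided by distinct types are all assumptions made for this example; the published papers should be consulted for the operational definitions.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` BPE merges from a list of words (illustrative only)."""
    vocab = Counter(tuple(w) for w in words)  # each word starts as a tuple of characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # nothing left to merge
        # Assumed tie-break: highest count first, then lexicographic order.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Apply learned merges, in order, to a single word."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


def hapax_ratio(tokens):
    """Assumed definition: share of distinct token types that occur exactly once."""
    counts = Counter(tokens)
    if not counts:
        return 0.0
    return sum(1 for c in counts.values() if c == 1) / len(counts)


def type_token_ratio(tokens):
    """Distinct types divided by total tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0
```

In use, a corpus would be split into words, `learn_bpe` run once per corpus with a fixed merge budget, and the statistics computed over the concatenated output of `segment`. Holding the merge budget constant across corpora is one plausible way to keep the resulting numbers comparable, though whether the framework does this is not stated here.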

The Voynich Manuscript is the test case that motivated the tool. The tool has outlasted any particular hypothesis about the manuscript, and can now be applied to other undeciphered scripts. The current focus is on validating the typological fingerprint methodology against known scripts before drawing firm conclusions about unknown ones.

Every dataset used in this research is available on Zenodo. Every claim on this site has gone through adversarial internal review before publication. The research is explicitly designed for replication — and L. would rather be refuted by good methodology than affirmed by credulous acceptance.

The Approach

"Every finding goes through what we call 'hostile peer review' before publication. If a finding has a fatal flaw, it stays in the draft folder. The bar for going public is not confidence — it is surviving the strongest objection we can construct."

Adversarial self-criticism is not a quality-control step appended at the end of the process. It is the process. For each major finding, L. formulates the most rigorous, methodologically competent rebuttal possible before the finding is considered publishable. Not the most convenient rebuttal — the strongest one. The one that, if valid, would invalidate the result entirely.

This means findings take longer to appear here than they would in a less disciplined workflow. It also means that what does appear has been tested against its own failure modes. Draft folders accumulate. Publication lists move slowly. That asymmetry is intentional.

The practice emerged directly from observing how Voynich research fails. Proposals accumulate, peer circles form around them, and the proposals eventually collapse — not because the underlying ideas were necessarily wrong, but because they were not stress-tested before being invested in. L. runs the stress test first.

Research Philosophy

The typological approach used here is deliberately modest in its scope. Identifying a family-level structural signature is not the same as identifying a language. Identifying a language is not the same as reading the text. Identifying a statistical similarity is not the same as proving a causal relationship. These distinctions are maintained explicitly throughout the research and are not treated as rhetorical caveats — they are load-bearing methodological constraints.

The methodology is designed for falsifiability. Every metric has a defined threshold. Every threshold has an operational definition. Every claim maps to a specific dataset that can be downloaded, re-run, and challenged. If the methodology is wrong, it should be possible for another researcher to demonstrate that with the same tools. That kind of exposure is not a risk to be minimised — it is the condition of doing science rather than speculation.

On Anonymity

The research should stand on its methodology and data, not on the credentials of the researcher. In a field where authority is frequently invoked as a substitute for evidence, anonymity is a small corrective — it forces engagement with the work itself.

L. is not anonymous for dramatic effect. The pseudonym reflects a genuine methodological commitment: conclusions should be evaluated on their internal logic, their data quality, and their testability — not on whether the person producing them has a professorship, a PhD, or a name that appears in other publications.

L. accepts correspondence from researchers who engage with the work on its merits. Requests to reveal identity are not entertained. Requests to discuss methodology are welcomed.

Published Work

All papers are deposited on Zenodo with full datasets. arXiv preprints are available where indicated.

Paper 7 · Typological Analysis

Structural Fingerprinting of the Voynich Manuscript: A BPE-Based Typological Comparison Across 29 Language Families

The core methodological paper. Applies byte-pair encoding segmentation to 55 corpora across 29 language families and computes VMML, Boundary Concentration, and CBMI scores against the Voynich text. Identifies a Philippine-branch morphological profile — the only zone in typological space partially overlapping with the Voynich Discriminant Zone. Explicit uncertainty quantification and adversarial counterarguments are embedded in the paper body.

Paper 8 · Austronesian Validation

Austronesian Focus-Morphology and the Voynich Morpheme Boundary: A Corpus-Based Assessment

Extension paper testing the Paper 7 hypothesis against a broader Austronesian corpus, with particular attention to the morphological density gradient distinguishing Philippine-branch languages from other Austronesian subfamilies. Introduces the cross-text VMML instability finding (Tagalog: Δ = 0.336 across Rizal novels). Includes an independent replication protocol for external researchers and a permutation test establishing that Voynich structural signatures are not an artifact of corpus size.
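
For readers unfamiliar with permutation tests, a generic two-sample version can be sketched as follows. This is not the protocol from Paper 8, whose exact design is not described on this page. It assumes a hypothetical setup in which some metric has already been computed on size-matched chunks of two corpora, and asks how often a random relabelling of the chunks produces a mean difference at least as large as the observed one. The function name `permutation_p_value` and its parameters are inventions for this example.

```python
import random
from statistics import mean


def permutation_p_value(sample_a, sample_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference of means.

    sample_a, sample_b: per-chunk metric values for two corpora (hypothetical inputs).
    Returns an estimated p-value with the usual +1 correction.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(sample_a) - mean(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random relabelling of chunks
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)
```

Because the chunks are size-matched before the metric is computed, a small p-value under this scheme would point to a difference that corpus size alone does not explain. That is the general logic behind tests of this kind, offered here as an assumption about intent rather than a description of the paper's procedure.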

Currently Testing

Corpus acquisition and validation are in progress. Results will be added to the "What It Isn't" tracker as they meet the evidence threshold.

Formal literary Cebuano (in progress)
Acquiring a corpus equivalent to the Rizal novels used for Tagalog. A register bias has been identified in the NLLB corpus; the literary-register test is pending.

Kapampangan (in progress)
High priority. Philippine-branch language with a complex focus-morphology architecture. Corpus sourcing is underway.

Hiligaynon (in progress)
Approximately 9.3 million speakers. Philippine voice system, distinct morphology. Would test the VMML gradient hypothesis against a third Philippine-branch data point.

Malagasy (in progress)
Geographically isolated Austronesian language (Madagascar). VOS word order; morphologically distinct from the Philippine branch. Serves as a geographic-outlier test of whether high VMML is specific to the Philippine branch or a broader Austronesian property.

Technical Correspondence

L. responds to technical correspondence within one week. Methodological questions, replication inquiries, and corpus contribution offers are welcome. Decipherment proposals and requests for identity disclosure are not.

Write to L.