Collaborate

01 · Corpus Contributors

Do you have access to natural-language corpora we haven't tested? We are particularly seeking formal literary text in Philippine-branch and other Austronesian languages, with enough tokens to support typological analysis.

Languages currently prioritized: formal literary Cebuano, Kapampangan, Hiligaynon, Malagasy, Ilocano, Waray. Minimum corpus size: n ≥ 30,000 tokens. Where possible, the corpus should be pre-modern or in a formal register.
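For contributors who want to check a candidate corpus against the size threshold before getting in touch, a minimal sketch follows. It assumes a plain-text corpus and a simple regex word tokenizer; the project's own tokenization may count differently, and the names `count_tokens` and `meets_threshold` are hypothetical.

```python
import re

# Minimum corpus size stated above (n >= 30,000 tokens).
MIN_TOKENS = 30_000

# Assumption: a token is a run of word characters, optionally joined by
# hyphens or apostrophes (so "nag-aral" counts as one token).
TOKEN_PATTERN = re.compile(r"\w+(?:[-'’]\w+)*")


def count_tokens(text: str) -> int:
    """Count word tokens in a plain-text corpus using TOKEN_PATTERN."""
    return len(TOKEN_PATTERN.findall(text))


def meets_threshold(text: str, minimum: int = MIN_TOKENS) -> bool:
    """Return True if the corpus has at least `minimum` tokens."""
    return count_tokens(text) >= minimum
```

A count close to the threshold is worth mentioning when you write in, since a different tokenizer could move it to either side of 30,000.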

02 · Linguistic Experts

We need expert assessment of our morphological density gradient hypothesis, specifically the claim that the Voynich text's morpheme-boundary behavior is more consistent with Philippine focus-morphology than with agglutinative or isolating typologies.

Specialists sought: Philippine focus-morphology, Austronesian typology, historical linguistics of insular Southeast Asia, morphological complexity metrics. We can provide the full dataset and pipeline for independent review.

03 · Computational Researchers

We welcome researchers with experience in BPE tokenization, unsupervised morpheme segmentation, or corpus linguistics who want to replicate or extend the analysis. Replication is explicitly encouraged: we would rather be refuted by good methodology than confirmed by bad-faith agreement.

Full data access provided: raw corpus, tokenization pipeline, segmentation outputs, comparison matrices. Code is available on request, pending a formal collaboration agreement. No institutional affiliation required.

04 · Institutional Partners

We invite academic institutions and libraries interested in formal collaboration. The typological fingerprinting methodology developed for the Voynich Manuscript is generalizable: it can be applied to any sufficiently large undeciphered script corpus.

We can provide the complete analysis pipeline for manuscript collections, run typological analysis on institutional corpora, and co-author publications under appropriate academic frameworks. We welcome inquiries from linguistics departments, digital humanities centers, and rare manuscript libraries.

What We Offer Collaborators

Authorship

Co-authorship

Substantive contributions to methodology or corpus are credited with co-authorship on relevant publications.

Data

Full data access

All raw corpora, segmentation outputs, comparison matrices, and intermediate analysis files are available to formal collaborators.

Documentation

Methodology docs

Complete technical documentation of the BPE pipeline, structural fingerprinting algorithm, and typological comparison framework.

Attribution

All contributions are acknowledged publicly in paper acknowledgments, Zenodo dataset records, and on this site.

On decipherment proposals: We prioritize methodological collaboration, not ready-made solutions. That said, we recognize that an unexpected observation from any direction — including from people outside the academic field — can open a genuinely productive line of investigation. If you have an observation about the manuscript's structure, statistics, or patterns that you think deserves systematic attention, we're interested. What we're less well-positioned to evaluate are proposed readings of the text (e.g., "I believe word X means Y") without an accompanying falsifiable methodology. The appropriate place for decipherment proposals is voynich.ninja — the most active community for such discussion.