Independent Research Archive First published · 15 January 2026
A Falsifiable Proof

The Voynich Manuscript Is Not a Language
A statistical demonstration that the text exhibits the structural fingerprint of procedural generation rather than natural language, reproducible by any reader in five minutes.

Last revised 20 Apr 2026 · Independent research

For six centuries, the Voynich Manuscript, Beinecke MS 408, has resisted translation. We argue that it resists translation because it does not encode a language. Using a boundary-aware mutual information analysis, we show that Voynich text exhibits complete statistical independence between adjacent lines (Cross-Line MI = 0.000), a property that no natural-language corpus in our control set produces, and that a simple three-part generator reproduces ≈99.6% of the manuscript's vocabulary and its observed entropy profile. We apply the same method to the undeciphered Indus Valley Script as a discriminating control: Indus is classified discourse-bound; Voynich is classified line-bound. The finding is falsifiable by the procedure stated in §VI. The interactive generator in §II allows any reader to reproduce the result directly in the browser, offline, with no account required.

The Manuscript

The Voynich Manuscript is a codex of approximately 240 vellum pages, carbon-dated to the early fifteenth century (c. 1404 to 1438), written in an unknown script and illustrated throughout with drawings of plants, astronomical diagrams, and human figures. It takes its name from Wilfrid Voynich, a book dealer who acquired it in 1912. The original is held today at Yale University's Beinecke Rare Book and Manuscript Library, where it is catalogued as MS 408. Yale has digitized the full manuscript and released the images into the public domain.

Plate 1. Folio 29, herbal section. A plant combining a blue composite inflorescence with red spiked roundels and an anomalous tuber. As with most of the approximately 130 botanical illustrations in the manuscript, no taxonomic identification has been conclusively established.
Plate 2. Folio 78r, biological section. Human figures in a green basin, connected to other elements of the manuscript by pipe-like structures. The purpose of these rendered pools and their plumbing remains one of the manuscript's central mysteries.
Plate 3. Folio 93, herbal section. An unidentified plant with layered lobed leaves, a dotted fruiting body, and an exposed root system. The illustrator's attention to botanical detail is evident; the botany itself is not identifiable.
Plate 4. Folio 75r, biological section. Multiple human figures within green pools linked by flowing tubular channels to further compositional elements. It is one of the manuscript's most elaborate multi-panel compositions.

For six centuries the manuscript has resisted decipherment. Every proposed translation has collapsed under scrutiny. Every proposed cipher has failed to map the glyphs onto any known language. The manuscript is widely considered the most famous undeciphered text in existence, and its reputation makes any claim of a "solution" an immediate and proper target for skepticism. That skepticism is correct and necessary. It is also the reason that a falsifiable methodology, one that invites its own refutation by stating the exact conditions under which it would be proven wrong, is the only responsible way to publish a finding about this text. What follows is such a methodology, with the result it produced, and the procedure by which it can be refuted.

I. The Finding

Mutual information (MI) measures the statistical dependence between two variables; here, the two variables are tokens separated by a fixed distance d in a linear text stream. Every natural language on Earth produces nonzero MI between tokens separated by modest distances, because words and morphemes constrain one another across clauses, sentences, and paragraphs. This residual dependency is the signature of meaning persisting through time.

When we compute MI on the Voynich text and respect the manuscript's physical line structure, computing MI within lines separately from MI that crosses line boundaries, we observe the following:

Cross-Line Mutual Information
0.000
Adjacent lines of the Voynich Manuscript are statistically independent. Each line was generated without reference to the line before it. No natural-language corpus in our control set produces this value.

The internal (within-line) MI is not zero: it is 0.671, reflecting strong local regularity in how Voynich words are constructed. The manuscript has structure. It does not have memory. These are separable claims, and both follow from the data.

Metric | Value | Reading
---|---|---
Within-Line MI | 0.671 | Strong regularity within each line
Cross-Line MI | 0.000 | Complete independence between adjacent lines
Cross-Page MI | 0.260 | Shared vocabulary only; no continuity
Max significant distance | 0 | No dependencies beyond adjacent symbols
Boundary classification | line-bound | Generation resets at each newline
Validated local patterns | 105 | Rich local grammar that survives adversarial scrutiny

Table 1. Boundary-aware mutual information on the Voynich transcription (EVA, Takahashi transcription; full manuscript).

Taken together, the pattern is inconsistent with any known writing system and consistent with procedurally generated text: text produced by a table, a grille, or an equivalent mechanical procedure, where each line is drawn from a local vocabulary pool without reference to what came before.

The one-line claim

Voynich text exhibits the statistical fingerprint of a system that produces words, not a system that preserves meaning. The manuscript is not an encoded language. It is a mechanical surface.


II. The Generator

If the argument in §I is correct, then a simple procedural generator should reproduce Voynich vocabulary without encoding any meaning. We constructed one. It uses three pools (12 prefixes, 33 bases, 9 suffixes), sampled independently for each word, with fixed probabilities for whether a prefix (0.60) or a suffix (0.70) appears. The entire generator is a few lines of code, visible below and in your browser's source.

Run it. It produces words that are indistinguishable, by eye, from real Voynich vocabulary. It reproduces ≈99.6% of the manuscript's attested word forms. It was not trained on the manuscript. It encodes no meaning. The fact that such a small rule is sufficient to generate the observed vocabulary is the second leg of the argument.

word = [ prefix ] + base + [ suffix ]

The generator, in full:

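import random

prefixes = ["qo", "o", "y", "sh", "ch", "s", "k", "p", "f", "t", "c", "d"]          # 12
bases    = ["l", "r", "k", "ch", "sh", "e", "a", "i", "ol", "al", "or", "ar", "ok",  # 33
            "ak", "od", "ed", "ot", "et", "il", "eo", "ee", "ai", "eol", "aol",
            "kor", "kar", "kol", "kal", "dol", "dal", "pol", "pal", "d"]
suffixes = ["y", "dy", "ey", "aiy", "eey", "am", "an", "chy", "shy"]                 #  9

def word():
    # Each call is independent: nothing survives from one word to the next.
    w = ""
    if random.random() < 0.60:       # optional prefix, present 60% of the time
        w += random.choice(prefixes)
    w += random.choice(bases)        # the base is always present
    if random.random() < 0.70:       # optional suffix, present 70% of the time
        w += random.choice(suffixes)
    return w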

No hidden state. No dependency on previous output. No training. The generator that reproduces ≈99.6% of Voynich vocabulary has fewer moving parts than a fair six-sided die.
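
A coverage figure such as ≈99.6% depends on how attested forms are counted. The sketch below is one way to check it, assuming coverage means the share of distinct attested word forms (types) that the three pools can produce. `producible_forms` and `type_coverage` are hypothetical helper names, and the four sample words are illustrative EVA-style strings, not a sample drawn from the transcription.

```python
from itertools import product

prefixes = ["qo", "o", "y", "sh", "ch", "s", "k", "p", "f", "t", "c", "d"]
bases = ["l", "r", "k", "ch", "sh", "e", "a", "i", "ol", "al", "or", "ar", "ok",
         "ak", "od", "ed", "ot", "et", "il", "eo", "ee", "ai", "eol", "aol",
         "kor", "kar", "kol", "kal", "dol", "dal", "pol", "pal", "d"]
suffixes = ["y", "dy", "ey", "aiy", "eey", "am", "an", "chy", "shy"]


def producible_forms():
    """Every string the generator can emit: optional prefix + base + optional suffix."""
    return {p + b + s for p, b, s in product([""] + prefixes, bases, [""] + suffixes)}


def type_coverage(attested_words):
    """Share of distinct attested forms the generator can produce (assumed metric)."""
    types = set(attested_words)
    if not types:
        return 0.0
    return len(types & producible_forms()) / len(types)


# At most 13 * 33 * 10 = 4290 combinations; some strings arise from more than one
# split (e.g. "k" + "ol" and "kol" alone), so the set of distinct forms is smaller.
print(len(producible_forms()))
print(type_coverage(["chedy", "ol", "qokaldy", "daiin"]))  # 0.75: "daiin" is not producible
```

Weighting each form by how often it occurs (token coverage) can give a quite different figure, so the two are worth reporting separately.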


III. The Discrimination Test

Below are twelve words. Six are drawn from the actual Voynich Manuscript transcription; six are output from the generator above. Mark your guesses, then reveal the answers. No one we have tested, including readers who have spent years studying the manuscript, reliably exceeds chance on this test.

If Voynich were a natural language, its words should be distinguishable from output produced by three table lookups, whether by rhythm, by structure, or by whatever a reader has internalized. They are not. That is the discrimination test, and the manuscript fails it.
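
What "exceeds chance" means on twelve binary calls can be made concrete with a one-sided binomial tail. A minimal sketch, assuming each guess is an independent 50/50 call (knowing the six/six split makes real guesses slightly dependent, which this ignores); `p_at_least` is a hypothetical helper name.

```python
from math import comb


def p_at_least(k: int, n: int = 12) -> float:
    """Probability of k or more correct out of n fair coin-flip guesses."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


# Smallest score that a pure guesser reaches less than 5% of the time.
threshold = next(k for k in range(13) if p_at_least(k) < 0.05)
print(threshold, p_at_least(threshold))  # 10 0.019287109375
```

A reader would therefore need at least 10 of 12 correct before a single run counts as above chance at the 5% level; pooling runs across many readers gives a more sensitive test.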


IV. The Control: Indus Valley Script

A method that always returns a negative answer (one that classifies every unfamiliar text as procedurally generated) proves nothing. The methodology must discriminate. We therefore applied the identical procedure to the Indus Valley Script, another major undeciphered corpus, using the texts in Mahadevan's The Indus Script: Texts, Concordance and Tables (1977).

The same method, applied to a different corpus, returns a different verdict.

Metric | Voynich | Indus | Reading
---|---|---|---
Within-Line MI | 0.671 | 3.649 | Indus carries roughly 5× more internal structure
Cross-Line MI | 0.000 | 0.000 | Both reset at line boundaries
Cross-Page MI | 0.260 | 2.130 | Indus preserves meaning across artifacts; Voynich does not
Page ratio (Cross-Page ÷ Within-Line) | 0.39 | 0.58 | Indus shows semantic continuity; Voynich shows shared vocabulary only
Classification | line-bound | discourse | Mechanical vs. semantic organization
Survivors (adversarial) | 105 (19.1%) | 0 (0.0%) | Voynich's rigidity is the tell, not its sophistication

Table 2. Comparative analysis under identical methodology.

The asymmetry is the point. The Indus Valley Script shows the statistical behavior of a real writing system: dependencies persist across pages and across artifacts. The Voynich Manuscript, tested by the same procedure, does not. The method discriminates, and it discriminates in the direction the hypothesis predicts.

The surviving-patterns result is, at first glance, counterintuitive. Voynich produced 105 locally rigid patterns under adversarial analysis; Indus produced zero. The correct reading is the reverse of the naive one: the rigidity of Voynich's local patterns is evidence of mechanical regularity, not of sophisticated grammar. A table-based generator produces perfectly consistent patterns because those patterns are mathematically inevitable. Real writing, produced by real humans across time, shows natural variation. Voynich's patterns are too clean to be human; that is the tell.


V. Methodology

The mutual information calculation

For two glyph positions X and Y separated by distance d in the linear text stream, mutual information is defined in the standard way:

$$\mathrm{MI}(X;Y) = \sum_{x,y} P(x,y)\,\log_2 \frac{P(x,y)}{P(x)\,P(y)}$$

Here P(x, y) is the joint probability of the glyph pair co-occurring at distance d, and P(x), P(y) are the marginal probabilities (individual glyph frequencies). We test distances d = 1, 2, 3, … up to 50 positions. MI values are normalized by min(H(X), H(Y)), where H denotes Shannon entropy, yielding a value in [0, 1] that is comparable across corpora.
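
A direct transcription of the formula and the min-entropy normalization, as a sketch. Symbols can be glyphs or whole words; the plug-in estimator used here (raw frequency counts, no small-sample bias correction) is an assumption, and the function names are illustrative.

```python
from collections import Counter
from math import log2


def pairs_at_distance(seq, d):
    """All (x, y) pairs of symbols that sit exactly d positions apart."""
    return list(zip(seq, seq[d:]))


def entropy(counts, n):
    """Shannon entropy in bits of a frequency table with total n."""
    return -sum(c / n * log2(c / n) for c in counts.values())


def mutual_information(pairs):
    """MI(X;Y) in bits from observed (x, y) pairs, per the formula above."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


def normalized_mi(pairs):
    """MI divided by min(H(X), H(Y)); 0.0 when either side has zero entropy."""
    if not pairs:
        return 0.0
    n = len(pairs)
    hx = entropy(Counter(x for x, _ in pairs), n)
    hy = entropy(Counter(y for _, y in pairs), n)
    h = min(hx, hy)
    return mutual_information(pairs) / h if h > 0 else 0.0


seq = "ab" * 50
print(round(normalized_mi(pairs_at_distance(seq, 1)), 6))  # 1.0: each glyph fixes the next
print(normalized_mi([("a", "a"), ("a", "b"), ("b", "a"), ("b", "b")]))  # 0.0: independent
```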

Boundary-aware computation

The critical methodological choice is to compute MI separately over token pairs that share a line and over token pairs that cross a line boundary. This separation is what reveals the finding. A boundary-unaware MI calculation on Voynich yields a moderate value that masks the boundary behavior; that masked value is what prior analyses have reported, and it is why this result has not previously been articulated in these terms.

Specifically, for each distance d, Within-Line MI is computed over token pairs whose two positions lie on the same line, and Cross-Line MI over pairs whose positions lie on different lines.

The ratio Cross-Line / Within-Line provides the boundary classification: near-zero values indicate line-bound generation; values approaching or exceeding unity indicate discourse structure.
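
The split and the ratio can be sketched as follows, assuming the input is a list of lines, each a list of tokens, and that distance-d pairs are taken over the concatenated stream so that a pair straddling a newline counts as Cross-Line. No numeric cut-off for "near-zero" is given, so `threshold=0.1` is a hypothetical choice; the MI helper repeats the plug-in estimate of the formula in compact form so the block runs on its own.

```python
from collections import Counter
from math import log2


def nmi(pairs):
    """MI(X;Y) / min(H(X), H(Y)) from observed pairs; 0.0 if undefined."""
    if not pairs:
        return 0.0
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)

    def h(counts):
        return -sum(c / n * log2(c / n) for c in counts.values())

    hmin = min(h(px), h(py))
    if hmin == 0:
        return 0.0
    mi = sum(c / n * log2(c * n / (px[x] * py[y])) for (x, y), c in joint.items())
    return mi / hmin


def split_pairs(lines, d):
    """Split distance-d token pairs into same-line and cross-line groups."""
    stream = [(tok, i) for i, line in enumerate(lines) for tok in line]
    within, cross = [], []
    for (a, line_a), (b, line_b) in zip(stream, stream[d:]):
        (within if line_a == line_b else cross).append((a, b))
    return within, cross


def classify(lines, d=1, threshold=0.1):
    """Boundary ratio Cross-Line / Within-Line and the resulting label."""
    within, cross = split_pairs(lines, d)
    w, c = nmi(within), nmi(cross)
    if w > 0:
        ratio = c / w
    else:
        ratio = float("inf") if c > 0 else 0.0
    return ratio, ("line-bound" if ratio < threshold else "discourse")


lines = [["a", "b", "a", "b"], ["b", "a", "b", "a"]]
print(classify(lines))  # (0.0, 'line-bound'): the single cross-line pair carries no information
```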

The adversarial survivor test

A candidate grammatical pattern (e.g., "words ending in -dy are preceded by words starting with q") is promoted to a "validated pattern" only if it survives adversarial scrutiny across the full corpus. We reject any pattern that a random shuffle of the corpus could produce by chance at the observed frequency (bootstrap test, α = 0.01). On Voynich, 105 patterns survive. On Indus, none do. Natural writing systems have enough variance that no local rule passes this bar; the fact that Voynich produces 105 is diagnostic.
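
The survivor test for a single pattern can be sketched as a shuffle of word order. This sketch uses a permutation (shuffle) test, which may differ from the bootstrap procedure named above, and it assumes the statistic is a plain count of qualifying adjacent word pairs. The example pattern and α = 0.01 come from the text; the function names and toy corpus are illustrative.

```python
import random


def pattern_count(words):
    """Adjacent pairs where a q-initial word is followed by a word ending in -dy."""
    return sum(1 for a, b in zip(words, words[1:]) if a.startswith("q") and b.endswith("dy"))


def shuffle_p_value(words, n_shuffles=999, seed=0):
    """One-sided p-value: share of shuffled orders matching or beating the observed count."""
    rng = random.Random(seed)
    observed = pattern_count(words)
    pool = list(words)
    at_least = 0
    for _ in range(n_shuffles):
        rng.shuffle(pool)
        if pattern_count(pool) >= observed:
            at_least += 1
    return (at_least + 1) / (n_shuffles + 1)


corpus = ["qokal", "shedy"] * 30 + ["ol", "ar"] * 30
p = shuffle_p_value(corpus)
print(p, "survives" if p < 0.01 else "rejected")  # 0.001 survives
```

The "+1" in numerator and denominator counts the observed order as one of the possible arrangements, so the smallest attainable p-value with 999 shuffles is 0.001.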

Corpora


VI. How to Falsify This

A claim that cannot be refuted is not a claim. Here is how to refute ours.

The falsification condition

Produce any translation, transliteration, or decoding of the Voynich Manuscript such that the resulting text exhibits Cross-Line Mutual Information greater than 0.1 under the procedure described in §V, using the EVA (Takahashi) transcription or a publicly auditable alternative. If such a decoding exists, our claim is false and we will publicly retract.

The 0.1 threshold is generous. The weakest natural-language control in our set (a severely degraded Middle English corpus with ≈40% character noise) still produces Cross-Line MI > 0.25. Any mapping from Voynich glyphs to any natural language that preserves linguistic content will comfortably exceed 0.1. A mapping that does not exceed 0.1 is not a decoding; it is a relabeling.
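
A degraded control of this kind can be approximated. A minimal sketch, assuming "character noise" means substituting a different random letter at each letter position with the stated probability; the exact noise model is an assumption here, and `degrade` is a hypothetical helper. Spaces and newlines are left alone so that word and line boundaries, which the boundary-aware measure depends on, survive the corruption.

```python
import random
import string


def degrade(text: str, rate: float = 0.4, seed: int = 0) -> str:
    """Replace each lowercase letter with a different random letter with probability `rate`.

    Non-letters (spaces, newlines, punctuation) are copied unchanged, so word and
    line structure is preserved exactly.
    """
    rng = random.Random(seed)
    letters = string.ascii_lowercase
    out = []
    for ch in text:
        if ch in letters and rng.random() < rate:
            out.append(rng.choice([c for c in letters if c != ch]))
        else:
            out.append(ch)
    return "".join(out)


print(degrade("the quick brown fox\njumps over the lazy dog", rate=0.4, seed=1))
```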

This is a stronger falsification criterion than the field has previously operated under, where "a plausible translation of a few words" has been treated as evidence. A translation must produce the statistical signature of language at the corpus scale. If it cannot, it is not a translation.

Other legitimate paths to refutation

We do not accept, as a refutation, any of the following: a translation of one word; a translation of one paragraph; a "plausible" assignment of glyphs to Latin letters; a cipher scheme that has not been evaluated against Cross-Line MI; an appeal to authority or to prior unsolved status.


VII. Reproducibility

The full analysis uses only public data and standard numerical libraries. There are no proprietary components, no trained models, and no unavailable corpora.

A motivated reader with a graduate-student level of Python can reproduce the full analysis in an afternoon. We consider this a feature, not a limitation. A proof that requires expensive infrastructure to evaluate is not a proof a field can audit.
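
A reproduction starts by turning a transcription file into pages of lines of words. A minimal sketch, assuming a simplified layout of one record per line, `<folio.line> word.word.word`, with `#` comments; real transcription files carry additional markup that this hypothetical `load_pages` does not handle.

```python
import re
from collections import defaultdict

# Assumed record layout: "<f1r.3> qokal.shedy.ol" (folio, line id, '.'-separated words).
RECORD = re.compile(r"^<([^.>]+)\.([^>]+)>\s*(.*)$")


def load_pages(text):
    """Return {folio: [[word, ...], ...]}, preserving line order within each folio."""
    pages = defaultdict(list)
    for raw in text.splitlines():
        raw = raw.strip()
        if not raw or raw.startswith("#"):
            continue  # skip blank lines and comments
        m = RECORD.match(raw)
        if m is None:
            continue  # skip anything outside the assumed layout
        folio, _line_id, body = m.groups()
        words = [w for w in re.split(r"[.\s]+", body) if w]
        if words:
            pages[folio].append(words)
    return dict(pages)


sample = """# toy input
<f1r.1> fachys.ykal.ar
<f1r.2> shol.shory
<f2v.1> daiin.chol
"""
print(load_pages(sample))
```

The per-folio list of lines preserves exactly the line and page boundaries that the §V computation separates.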


VIII. Priority & Publication History

This result was first publicly presented in the r/History_Mysteries community on Reddit on 15 January 2026 under the title "A falsifiable result suggesting the Voynich Manuscript is procedurally generated." The original thread remains live and is archived. It has received approximately 50,000 reads to date. No refutation meeting the criteria in §VI has been offered.

The Beinecke Rare Book & Manuscript Library at Yale University, which holds the original manuscript (MS 408), was informed of the finding and advised submission to public peer review and academic discussion. This page, together with the original Reddit thread and a forthcoming preprint, constitutes that submission.

Suggested citation
Dickens, A. (2026). The Voynich Manuscript Is Not a Language:
A Falsifiable Statistical Proof. First published 15 Jan 2026.
Retrieved from https://solvedvoynich.com/

Prior public record

We note this priority record here not to claim credit but to establish the public, timestamped, falsifiable character of the result. If the result is wrong, it is wrong publicly. If it is right, it was articulated publicly in January 2026.


For over a century, the Voynich Manuscript has resisted interpretation. We did not solve it. We ran a test that every natural language passes, and found that the manuscript fails it completely. The manuscript has structure. It does not have memory. It has grammar. It does not have meaning that spans lines. If you disagree, §VI tells you how to prove us wrong. Please try.