Machine Learning · Intermediate

Markov Chain Text Generator in Python

Build a Markov chain text generator in Python from scratch. Train on any text, generate new sentences in the same style — no ML libraries needed.

Try it yourself

Run this code directly in your browser. Click "Open in full editor" to experiment further.


How it works

Markov chains are one of those ideas that look like magic the first time you see them. You feed in a few paragraphs of Dickens, run a tiny algorithm, and out comes new text that sounds like Dickens — even though the program has no concept of grammar, meaning, or what a sentence even is. They were the backbone of generative text for decades before neural networks showed up.

What A Markov Chain Actually Is

A Markov chain is a system that hops from state to state, where the next state depends only on the current state — not on the full history of how it got there. For text generation, the "state" is the last few words, and the "hop" is choosing the next word.

The core trick is just bookkeeping:

1. Slide a window of size N through your training text.

2. For every N-word window, record which word came right after it.

3. To generate, pick a starting window, randomly choose one of the words that has historically followed it, slide the window forward by one, and repeat.

That's the whole algorithm. The intelligence is entirely in the statistics of the training text.
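The three steps above can be sketched in a few lines of Python. The function name `build_chain` is my own, not necessarily the snippet's:

```python
from collections import defaultdict

def build_chain(corpus, n=2):
    """Map each n-word window to the list of words that followed it."""
    words = corpus.split()
    chain = defaultdict(list)
    for i in range(len(words) - n):
        window = tuple(words[i:i + n])      # the current state
        chain[window].append(words[i + n])  # the word that came right after
    return chain

chain = build_chain("it was the best of times it was the worst of times", n=2)
print(chain[("it", "was")])    # ['the', 'the'] — 'the' followed twice
print(chain[("the", "best")])  # ['of']
```

Note that repeated followers are stored repeatedly on purpose — that repetition is what encodes the transition frequencies.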

The N Parameter — Chaos vs. Coherence vs. Plagiarism

N (sometimes called the "order" of the chain) is the single most important knob:

  • N=1 — only the current word predicts the next. Output is wildly random and grammatically broken. Words sort of fit together pairwise but nothing makes sense beyond two-word spans.
  • N=2 — a two-word context. The sweet spot. Output reads like a real sentence, often surreal, occasionally poetic. This is what most fun Markov demos use.
  • N=3 or higher — three+ words of context. Output starts to sound coherent, but the bigger N gets, the more the generator just regurgitates chunks of the training text verbatim. At N=5, you're basically copying.
The trade-off is fundamental: more context means more realistic-sounding output but less originality. If your training text only contains a few hundred unique three-word sequences, the chain has very little choice at each step and ends up just retracing the original.
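You can watch this sparsity happen on a toy corpus (my own illustration, not the snippet): as N grows, the number of distinct states multiplies while the training data stays fixed, so most states end up offering only a single continuation.

```python
from collections import defaultdict

corpus = ("it was the best of times it was the worst of times "
          "it was the age of wisdom it was the age of foolishness").split()

state_counts = {}
for n in (1, 2, 3):
    chain = defaultdict(set)
    for i in range(len(corpus) - n):
        chain[tuple(corpus[i:i + n])].add(corpus[i + n])
    state_counts[n] = len(chain)
    # average number of distinct next-word choices per state
    avg = sum(len(v) for v in chain.values()) / len(chain)
    print(f"N={n}: {len(chain)} states, {avg:.2f} distinct choices on average")
```

On this 24-word corpus the chain goes from 9 states at N=1 to 14 at N=3, with almost every high-N state having exactly one possible follower — which is why generation at high N retraces the original.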

Why It Works At All

The model has no understanding — but the training text does. Real text has strong local patterns:

  • After "the", a noun is much more likely than another "the".
  • After "It was the", the word "best" or "worst" or "age" is way more probable than "banana".
  • After "I am", verbs ending in "-ing" are common.
A Markov chain captures all of these patterns implicitly by just counting what follows what. Grammar emerges from the statistics for free.
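"Counting what follows what" really is all there is to it. This sketch (mine, not the snippet's) tallies what follows the context "it was the" in the Dickens opening:

```python
from collections import Counter

opening = ("it was the best of times, it was the worst of times, "
           "it was the age of wisdom, it was the age of foolishness").split()

# Count every word that appears right after the window "it was the"
followers = Counter(
    opening[i + 3]
    for i in range(len(opening) - 3)
    if opening[i:i + 3] == ["it", "was", "the"]
)
print(followers)  # Counter({'age': 2, 'best': 1, 'worst': 1})
```

Sampling from those counts already gives you Dickens-flavored continuations — "banana" simply never appears, so it can never be chosen.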

This is, in a real sense, the same insight that powers modern language models. GPT and Claude are also predicting the next token based on context — they just use a much richer notion of "context" (thousands of tokens, weighted attention) and a much smarter notion of "prediction" (a deep neural network instead of a lookup table).

The Code In Plain English

The defaultdict(list) is the whole engine. Its keys are tuples like ('it', 'was') and its values are lists of every word that ever followed that pair in the training text:

    chain[('it', 'was')] = ['the', 'the', 'the', 'so', 'clearer']

When generating, random.choice(chain[state]) picks one of those words at random — so the more often a word followed the state, the more likely it is to be picked. This naturally weights common transitions heavily without us having to compute any probabilities explicitly.

The state = tuple(output[-n:]) line is what makes it a chain — after each step, the window slides forward and we look up the new state.
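Putting the pieces together, here is a minimal end-to-end sketch. The helper names `build_chain` and `generate` are mine, not necessarily the snippet's:

```python
import random
from collections import defaultdict

def build_chain(words, n=2):
    """Map each n-word window to the list of words that followed it."""
    chain = defaultdict(list)
    for i in range(len(words) - n):
        chain[tuple(words[i:i + n])].append(words[i + n])
    return chain

def generate(chain, seed, n=2, length=20):
    """Walk the chain from a seed window, stopping at dead ends."""
    output = list(seed)
    state = tuple(output[-n:])
    for _ in range(length):
        if state not in chain:               # dead end: no known follower
            break
        output.append(random.choice(chain[state]))
        state = tuple(output[-n:])           # slide the window forward
    return " ".join(output)

words = "it was the best of times it was the worst of times".split()
chain = build_chain(words, n=2)
print(generate(chain, ("it", "was")))
```

Because random.choice draws from a list with duplicates, frequent transitions are picked proportionally more often — no explicit probability table needed.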

    Things That Will Surprise You

  • The output sometimes makes complete sense for a few sentences before falling apart. This is the signature look of Markov text.
  • Capitalization and punctuation matter. If you don't lowercase, 'The' and 'the' become different states with different statistics — splitting your data.
  • Dead ends happen. If the chain reaches a state that never appeared as a window in the training text (which can happen with seed phrases), there's nothing to choose from. The snippet handles this by stopping generation.
  • More text helps a lot. A few paragraphs give you something fun. A whole novel gives you something genuinely interesting. Try pasting in song lyrics, a Wikipedia article, or your favorite blog and see what comes out.

Practical Things You Can Build

  • "In the style of" generators — train on Shakespeare, get pseudo-Shakespeare. Train on Trump tweets, get pseudo-tweets. This was a viral genre for years.
  • Procedural fantasy names — train a character-level (not word-level) Markov chain on a list of names from a culture. Generate convincing new ones.
  • Test data — generate plausible-looking but meaningless text for filling out UI mockups or stress-testing text-handling code.
  • Music and chord progressions — same algorithm, the "words" are notes or chords.
  • Random level layouts in games — the "words" are tiles or rooms.
How To Switch From Word-Level To Character-Level

The code is identical — just split the corpus into characters instead of words:

    words = list(corpus)            # instead of corpus.split()

Use N around 5-8 for character-level chains. The output looks like fake words from a fictional language and works beautifully for name generation.
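A character-level name generator might look like the sketch below. Everything here — the training names, the helper names, and the newline-as-terminator trick — is my own illustration; the tiny corpus also forces a smaller N=3 than the 5-8 suggested above:

```python
import random
from collections import defaultdict

def build_chain(chars, n=3):
    """Map each n-character window to the list of characters that followed it."""
    chain = defaultdict(list)
    for i in range(len(chars) - n):
        chain[tuple(chars[i:i + n])].append(chars[i + n])
    return chain

# Names joined with newlines so the chain also learns where names end
names = "aldric\nbrandor\nceldric\ndorian\neldrin\nfaldor\n"
chain = build_chain(list(names), n=3)

def make_name(chain, n=3, max_len=12):
    # Start from a window that contains no name boundary
    state = tuple(random.choice([k for k in chain if "\n" not in k]))
    out = list(state)
    while len(out) < max_len and state in chain:
        ch = random.choice(chain[state])
        if ch == "\n":          # end-of-name marker: stop cleanly
            break
        out.append(ch)
        state = tuple(out[-n:])
    return "".join(out)

print(make_name(chain))
```

With a real list of a few hundred names and N=5 or so, the output starts looking like plausible members of the same culture.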

Where Markov Chains Hit Their Ceiling

  • No long-range coherence. A chain literally cannot remember anything more than N words back. It will never produce text that has a topic or argument that spans more than a few sentences.
  • No semantics. It can't avoid contradicting itself, because it doesn't know what "contradicting" means.
  • Sparsity at high N. As N grows, most possible windows never appear in training, so the model has nothing to say.
This is exactly the wall that neural language models broke through — they share the basic premise (predict the next token from context) but use learned representations instead of literal lookup, so they can generalize to contexts they've never seen.

Run the snippet above and you'll watch the algorithm produce something that genuinely sounds Dickens-adjacent — words flow, phrases connect, and you can clearly see where N=1 falls into chaos and N=3 starts plagiarizing the original.
