Markov Chain Text Generator in Python
Build a Markov chain text generator in Python from scratch. Train on any text, generate new sentences in the same style — no ML libraries needed.
Try it yourself
Run this code directly in your browser. Click "Open in full editor" to experiment further.
How it works
Markov chains are one of those ideas that look like magic the first time you see them. You feed in a few paragraphs of Dickens, run a tiny algorithm, and out comes new text that sounds like Dickens — even though the program has no concept of grammar, meaning, or what a sentence even is. They were the backbone of generative text for decades before neural networks showed up.
What A Markov Chain Actually Is
A Markov chain is a system that hops from state to state, where the next state depends only on the current state — not on the full history of how it got there. For text generation, the "state" is the last few words, and the "hop" is choosing the next word.
The core trick is just bookkeeping:
1. Slide a window of size N through your training text.
2. For every N-word window, record which word came right after it.
3. To generate, pick a starting window, randomly choose one of the words that has historically followed it, slide the window forward by one, and repeat.
That's the whole algorithm. The intelligence is entirely in the statistics of the training text.
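The bookkeeping in steps 1 and 2 fits in a few lines. A minimal sketch (the function name build_chain is mine, not necessarily what the embedded snippet uses):

```python
from collections import defaultdict

def build_chain(corpus, n=2):
    """Slide an n-word window through the text and record what follows it."""
    words = corpus.split()
    chain = defaultdict(list)
    for i in range(len(words) - n):
        state = tuple(words[i:i + n])      # the current n-word window
        chain[state].append(words[i + n])  # the word that came right after it
    return chain

chain = build_chain("it was the best of times it was the worst of times", n=2)
print(chain[("it", "was")])  # ['the', 'the'] — 'the' followed this pair twice
```

Duplicates in the value lists are the point: they record how often each transition occurred.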
The N Parameter — Chaos vs. Coherence vs. Plagiarism
N (sometimes called the "order" of the chain) is the single most important knob:
1. N=1: each word depends only on the one before it. The output is grammatical word salad that drifts from topic to topic.
2. N=2: a good default for most corpora. Phrases hang together locally while the output still surprises you.
3. N=3 and up: increasingly fluent, but on a small corpus the chain has so few options per state that it reproduces the source nearly verbatim.
The trade-off is fundamental: more context means more realistic-sounding output but less originality. If your training text only contains a few hundred unique three-word sequences, the chain has very little choice at each step and ends up just retracing the original.
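You can watch this trade-off numerically by counting how many distinct choices the chain has per state as N grows. A quick illustration on a toy corpus (the exact numbers depend entirely on your text):

```python
from collections import defaultdict

corpus = ("it was the best of times it was the worst of times "
          "it was the age of wisdom it was the age of foolishness").split()

stats = {}
for n in (1, 2, 3):
    chain = defaultdict(set)
    for i in range(len(corpus) - n):
        chain[tuple(corpus[i:i + n])].add(corpus[i + n])
    # Average number of distinct follow-up words per state: the chain's "freedom".
    stats[n] = sum(len(v) for v in chain.values()) / len(chain)
    print(f"N={n}: {len(chain)} states, {stats[n]:.2f} choices per state on average")
```

As N rises, the average number of choices per state falls toward 1, which is exactly when generation collapses into retracing the source.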
Why It Works At All
The model has no understanding, but the training text does. Real text has strong local patterns: articles like "the" are followed by nouns or adjectives, verbs follow their subjects, and common phrases repeat word for word.
A Markov chain captures all of these patterns implicitly by just counting what follows what. Grammar emerges from the statistics for free.
This is, in a real sense, the same insight that powers modern language models. GPT and Claude are also predicting the next token based on context — they just use a much richer notion of "context" (thousands of tokens, weighted attention) and a much smarter notion of "prediction" (a deep neural network instead of a lookup table).
The Code In Plain English
The defaultdict(list) is the whole engine. Its keys are tuples like ('it', 'was') and its values are lists of every word that ever followed that pair in the training text:
```python
chain[('it', 'was')] = ['the', 'the', 'the', 'so', 'clearer']
```
When generating, random.choice(chain[state]) picks one of those words at random — so the more often a word followed the state, the more likely it is to be picked. This naturally weights common transitions heavily without us having to compute any probabilities explicitly.
The state = tuple(output[-n:]) line is what makes it a chain — after each step, the window slides forward and we look up the new state.
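Putting the pieces together, here is one way the whole generator can look. This is a sketch consistent with the description above, not necessarily the exact code in the embedded snippet:

```python
import random
from collections import defaultdict

def generate(corpus, n=2, length=30):
    """Build the transition table, then walk it one word at a time."""
    words = corpus.split()
    chain = defaultdict(list)
    for i in range(len(words) - n):
        chain[tuple(words[i:i + n])].append(words[i + n])

    state = random.choice(list(chain))   # pick a random starting window
    output = list(state)
    for _ in range(length):
        followers = chain.get(state)
        if not followers:                # dead end: this window only ended the text
            break
        output.append(random.choice(followers))
        state = tuple(output[-n:])       # slide the window forward by one word
    return " ".join(output)

random.seed(0)
print(generate("it was the best of times it was the worst of times", n=2))
```

Note the chain.get(state) guard: the last window in the corpus may have no recorded follower, so generation stops there rather than crashing.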
Things That Will Surprise You
'The' and 'the' become different states with different statistics, splitting your data.

Practical Things You Can Build
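Whatever you build, one cheap pre-processing step helps with the case-splitting issue: lowercase the corpus before splitting, so capitalization variants feed the same state. A minimal sketch:

```python
corpus = "The cat sat. the cat ran."

# Case-sensitive split: 'The' and 'the' become separate chain states.
print(sorted(set(corpus.split())))

# Lowercasing first merges them into a single 'the' state.
print(sorted(set(corpus.lower().split())))
```

The cost is that generated output loses sentence-initial capitalization, which you can restore as a post-processing step if it matters.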
How To Switch From Word-Level To Character-Level
The code is identical — just split the corpus into characters instead of words:
```python
words = list(corpus)  # instead of corpus.split()
```
Use N around 5-8 for character-level chains. The output looks like fake words from a fictional language and works beautifully for name generation.
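Here is a self-contained character-level sketch for name generation. The start/end markers are my addition to handle name boundaries, and N=3 is used only because the toy corpus is tiny; the 5-8 range above suits larger corpora:

```python
import random
from collections import defaultdict

def char_chain(names, n=3):
    """Build a character-level chain over a list of names."""
    chain = defaultdict(list)
    for name in names:
        padded = "^" * n + name + "$"          # '^' pads the start, '$' marks the end
        for i in range(len(padded) - n):
            chain[padded[i:i + n]].append(padded[i + n])
    return chain

def make_name(chain, n=3):
    state, out = "^" * n, ""
    while True:
        ch = random.choice(chain[state])
        if ch == "$":                          # hit an end-of-name marker
            return out
        out += ch
        state = (state + ch)[-n:]              # slide the character window

names = ["alara", "belen", "corin", "doran", "elira", "feren"]
chain = char_chain(names)
random.seed(1)
print(make_name(chain))
```

With a corpus this small the chain mostly reproduces the training names, which is the plagiarism trade-off from earlier in miniature; feed it a few hundred names and the trigrams start recombining into novel ones.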
Where Markov Chains Hit Their Ceiling
A fixed window means a fixed memory: the chain cannot track a topic, a character, or even a sentence that began more than N words ago, and any state it never saw in training is a dead end. Memory use also grows quickly with N, since every distinct window becomes its own key. This is exactly the wall that neural language models broke through: they share the basic premise (predict the next token from context) but use learned representations instead of literal lookup, so they can generalize to contexts they've never seen.
Run the snippet above and you'll watch the algorithm produce something that genuinely sounds Dickens-adjacent — words flow, phrases connect, and you can clearly see where N=1 falls into chaos and N=3 starts plagiarizing the original.
Related examples
Build a TF-IDF Chatbot in Python
Build a real chatbot in Python with TF-IDF and cosine similarity. No deep learning required — just scikit-learn, runnable instantly in your browser.
Logistic Regression in Python
Learn logistic regression in Python with scikit-learn. Binary classification, decision boundary, probabilities, and ROC curve — all explained and runnable in your browser.