PCA (Principal Component Analysis) in Python
Learn PCA in Python with scikit-learn. Reduce high-dimensional data to 2D, visualize hidden structure, and understand explained variance — runnable in your browser.
Try it yourself
Run this code directly in your browser. Click "Open in full editor" to experiment further.
How it works
PCA is the Swiss Army knife of dimensionality reduction. When you have 50 columns, or 500, or 50,000, PCA takes that mountain of features and compresses it into a handful of new ones that capture most of what matters. The result: data you can actually visualize, models that train faster, and noise that quietly disappears.
What PCA Actually Does
Imagine your data as a cloud of points floating in N-dimensional space. PCA finds the directions along which the cloud is most stretched out — those are the principal components.
Each component captures less variance than the one before it. By the time you get to PC50 in a 50-dimensional dataset, that component is usually capturing nothing but noise. The whole game is keeping the first few components and throwing the rest away.
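To make "directions of maximum variance" concrete, here is a minimal NumPy sketch of the idea: center the data and take its SVD, and the squared singular values give the variance captured by each principal direction (scikit-learn's `PCA` does essentially this under the hood). The synthetic 2-D cloud here is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 2-D cloud stretched along one direction:
# both columns are driven by the same underlying factor, plus a little noise
base = rng.normal(size=(200, 1))
X = np.hstack([base * 3, base + rng.normal(scale=0.3, size=(200, 1))])

Xc = X - X.mean(axis=0)                     # center the cloud at the origin
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

var = S**2 / (len(X) - 1)                   # variance along each principal direction
ratio = var / var.sum()                     # same quantity as explained_variance_ratio_
print(ratio)                                # first component dominates
```

The rows of `Vt` are the principal components themselves: unit vectors pointing along the stretched axes of the cloud, sorted from most to least variance.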
Why "Standardize First" Is Non-Negotiable
PCA finds directions of maximum variance, and variance is sensitive to scale. If one feature is in millimeters (range 0–1000) and another is in meters (range 0–1), the millimeter feature will dominate every principal component just because its numbers are bigger.
Always run `StandardScaler` on your features before PCA. It centers them at zero and scales them to unit variance, putting every feature on equal footing.
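A minimal sketch of that order of operations, using a `Pipeline` so the scaling can never accidentally be skipped (the iris dataset here is just a convenient example):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)           # 150 samples, 4 features

# The pipeline guarantees StandardScaler always runs before PCA,
# on training data and on any new data alike
pipe = make_pipeline(StandardScaler(), PCA(n_components=2))
X2 = pipe.fit_transform(X)
print(X2.shape)                             # (150, 2)
```

The pipeline also matters at prediction time: calling `pipe.transform` on new samples applies the scaler fitted on the training data, which is exactly what you want.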
Reading The Output
After `pca.fit(X)`, two quantities carry the whole story:

- `explained_variance_ratio_` — array showing what fraction of the total variance each component captures. `[0.73, 0.23, 0.04, ...]` means PC1 alone explains 73% of the variation in your data.
- `cumsum(explained_variance_ratio_)` — the running total. Tells you "if I keep the first K components, I keep this much of the original information."

A common rule of thumb: keep enough components to retain 95% of the variance. The snippet above shows exactly how to find that number from the cumulative variance curve.
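As a sketch of the cumulative-curve lookup (the digits dataset and the 0.95 threshold here are illustrative choices):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_digits(return_X_y=True)         # 1797 samples, 64 features
X = StandardScaler().fit_transform(X)

pca = PCA().fit(X)                          # no n_components: keep everything
cum = np.cumsum(pca.explained_variance_ratio_)

# First index where the running total crosses 95%, converted to a count
k = int(np.argmax(cum >= 0.95)) + 1
print(f"{k} components retain {cum[k - 1]:.1%} of the variance")
```

Once you know `k`, refit with `PCA(n_components=k)`; alternatively, scikit-learn accepts a float like `PCA(n_components=0.95)` and picks the count for you.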
The Killer Use Case: Visualization
Humans can see 2D scatter plots. Sometimes 3D. Never 64D. PCA is how you visualize high-dimensional data:
This is one of the fastest sanity checks for any dataset: if PCA-to-2D shows zero structure, no model is going to find structure either.
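A sketch of that sanity check on the 64-dimensional digits data (the output filename is an arbitrary choice):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)         # each row is an 8x8 image, flattened to 64 numbers
X2 = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))

# One color per digit class: visible clusters mean the 64-D data has structure
plt.scatter(X2[:, 0], X2[:, 1], c=y, cmap="tab10", s=8)
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.savefig("digits_pca.png")
```

If the resulting scatter is a single undifferentiated blob, that is useful information too: it tells you the strongest linear directions in the data carry no class signal.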
PCA As Compression
The inverse_transform method is one of PCA's most underrated tricks. It takes a compressed vector and rebuilds the original-shape data — imperfectly, but recognizably. The image reconstruction demo at the bottom of the snippet shows this in action: at 1 component the digit is a blurry blob, at 4 you can almost guess it, at 8 it's clearly a digit, by 16 it's nearly identical to the original.
This is exactly the principle JPEG compression uses (with discrete cosine transform instead of PCA, but the same idea).
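The compression trade-off is easy to measure numerically: reconstruct the digits at the same component counts the demo uses and watch the mean squared error fall. A hedged sketch:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)

# More components kept -> lower reconstruction error
errs = []
for k in (1, 4, 8, 16):
    pca = PCA(n_components=k).fit(X)
    X_hat = pca.inverse_transform(pca.transform(X))   # compress, then rebuild
    err = float(np.mean((X - X_hat) ** 2))
    errs.append(err)
    print(f"{k:2d} components -> MSE {err:.2f}")
```

Note that `inverse_transform` can never beat the variance you threw away: the MSE at `k` components equals the total variance of the discarded components, which is why the error curve mirrors the explained-variance curve.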
Other Things PCA Quietly Solves
Where PCA Falls Down
When To Reach For PCA
When To Skip It
Run the snippet above and you'll see iris flowers cleanly cluster in 2D after losing two of their four dimensions, watch all ten digit classes find their own corners of a 2D map after being squashed from 64 dimensions, and see a digit get progressively rebuilt from a handful of components.
Related examples
K-Means Clustering in Python
Learn K-Means clustering in Python with scikit-learn. Visualize clusters forming, pick the right K with the elbow method, and run it all in your browser.
Logistic Regression in Python
Learn logistic regression in Python with scikit-learn. Binary classification, decision boundary, probabilities, and ROC curve — all explained and runnable in your browser.
NumPy Array Operations in Python
Learn NumPy basics in Python! A fun and easy guide to super-fast arrays, matrices, and data science math without using slow for-loops.