How AI Learns: Supervised, Unsupervised and Reinforcement Learning
AI doesn't learn the way you learned to ride a bike. It learns by finding patterns in data, and there are three distinct ways to set up that learning. Getting these straight makes it much easier to reason about what AI can and can't do.
Supervised Learning: Learn From Examples You Label
Supervised learning is learning from a teacher who tells you the right answer. The system gets examples with labels, finds patterns, and applies those patterns to new unlabelled examples.
This is the most common type of machine learning in real businesses because it's conceptually straightforward. You have data. You label what you want the system to learn. You train it. Then you test it on new data to see how well it does.
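The label, train, test loop described above can be sketched in a few lines. This is a minimal illustration, not how a production spam filter works: the two features (number of links, number of exclamation marks), the data, and the nearest-centroid method are all assumptions chosen to keep the example small.

```python
from statistics import mean

# Hypothetical labelled examples: (links, exclamation_marks) -> label
train = [
    ((8, 6), "spam"), ((7, 9), "spam"), ((9, 7), "spam"),
    ((1, 0), "ham"), ((0, 1), "ham"), ((2, 1), "ham"),
]
# Held-out examples the model never sees during training
test = [((6, 8), "spam"), ((1, 1), "ham"), ((8, 5), "spam"), ((0, 0), "ham")]

def fit(examples):
    """Learn one average point (centroid) per label."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }

def predict(centroids, features):
    """Predict the label whose centroid is closest to the features."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def accuracy(centroids, examples):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(predict(centroids, f) == y for f, y in examples)
    return correct / len(examples)

model = fit(train)                     # train on labelled data
test_accuracy = accuracy(model, test)  # evaluate on new data
print(test_accuracy)
```

The important design point is the separate `test` list: accuracy measured on the training examples themselves would say little about how the system handles new data.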
A hospital trains a system on thousands of X-rays where radiologists have confirmed whether cancer is present. The system learns to recognise pixel patterns associated with cancer. When a new X-ray arrives, it makes a prediction. The same approach underlies email spam filtering, house price prediction, and many other everyday AI applications.
The catch: this only works if your labels are correct and if the patterns in your training data match the real world the system will encounter. If your spam filter was trained on 2015 emails, much of the spam that has evolved since is likely to get through. If your medical model was trained mostly on one demographic, it is likely to perform worse on others. Bad labels poison the whole thing: if 30% of your training labels are wrong, the system learns corrupted patterns.
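The effect of bad labels can be shown with a deliberately contrived sketch. Everything here is an assumption made for illustration: a single numeric feature, a rule that values of 5 or more are spam, and a 1-nearest-neighbour learner that simply copies the label of the closest training example. Methods that average over many examples are typically more tolerant of noise, so the size of the drop should not be read as typical.

```python
def true_label(x):
    """Ground truth for this toy problem: values of 5 or more are spam."""
    return "spam" if x >= 5 else "ham"

def flip(label):
    """Return the opposite label, simulating a labelling mistake."""
    return "ham" if label == "spam" else "spam"

def predict(train, x):
    """1-nearest-neighbour: copy the label of the closest training example."""
    _, nearest_label = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest_label

def accuracy(train, test):
    """Fraction of test points predicted correctly."""
    return sum(predict(train, x) == y for x, y in test) / len(test)

xs = list(range(10))
clean_train = [(x, true_label(x)) for x in xs]

# Corrupt 3 of the 10 labels (30%); the indices are chosen arbitrarily.
wrong = {1, 4, 7}
noisy_train = [(x, flip(y)) if x in wrong else (x, y) for x, y in clean_train]

# Test points sit just beside each training point and carry correct labels.
test = [(x + 0.1, true_label(x)) for x in xs]

clean_accuracy = accuracy(clean_train, test)
noisy_accuracy = accuracy(noisy_train, test)
print(clean_accuracy, noisy_accuracy)  # prints: 1.0 0.7
```

The learner did nothing wrong in either run. It reproduced its labels faithfully, including the incorrect ones.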
Unsupervised Learning: Find Patterns Without Being Told What To Look For
Unsupervised learning is harder conceptually. You give the system data with no labels and it figures out what patterns exist on its own.
This sounds powerful but it's actually limited. The system can find clusters and groupings, and it can compress data. What it can't do is tell you whether what it found is meaningful — that's still your job.
A retail store uses it for customer segmentation: thousands of transactions, no pre-defined categories. The algorithm discovers that some customers buy cheap bulk items, others buy expensive specialities, and others show no consistent pattern. The system found those groups without being told to look for them. Some recommendation systems work similarly, finding which items have similar viewing or purchase patterns without explicit labelling.
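A minimal sketch of this kind of segmentation, using k-means clustering. The customer data, the two features, the choice of two groups, and the starting centroids are all assumptions for illustration. A real segmentation would have far more customers and features, and would usually rescale features so that one does not dominate the distance.

```python
from statistics import mean

# Hypothetical customers: (average item price, items per order). No labels.
customers = [
    (2.0, 30), (3.0, 28), (2.5, 35),
    (80.0, 2), (95.0, 1), (88.0, 3),
]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Give each point the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
        for p in points
    ]

def k_means(points, centroids, max_iters=10):
    """Alternate between assigning points and re-averaging centroids."""
    for _ in range(max_iters):
        groups = assign(points, centroids)
        new_centroids = []
        for i, old in enumerate(centroids):
            members = [p for p, g in zip(points, groups) if g == i]
            # Keep the old centroid if no point was assigned to it.
            new_centroids.append(
                tuple(mean(col) for col in zip(*members)) if members else old
            )
        if new_centroids == centroids:
            break  # nothing moved, so the grouping is stable
        centroids = new_centroids
    return assign(points, centroids), centroids

# Start from two arbitrary customers as initial centroids.
groups, centres = k_means(customers, [customers[0], customers[-1]])
print(groups)  # prints: [0, 0, 0, 1, 1, 1]
```

The output is only group numbers. Deciding that group 0 means "bulk shoppers" and group 1 means "speciality shoppers" is an interpretation a person adds afterwards.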
The downside is evaluation. With supervised learning you count how many predictions were right. With unsupervised learning there is no answer key, so judging whether the system found something useful is largely a matter of your own judgement. If you have labelled data available, supervised learning usually wins.
Reinforcement Learning: Learn Through Trial, Error, and Reward
Reinforcement learning is learning by interacting with an environment and getting feedback on whether your actions were good or bad. It is central to how AlphaGo learned to beat humans at Go (alongside supervised learning from human games), how robots can learn to walk, and how some recommendation systems optimise for what actually gets watched.
You set up a system with actions it can take and a reward signal: a number that says how good or bad the outcome was. The system tries a very large number of actions, learns which ones lead to rewards, and adjusts its behaviour to maximise future reward.
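The loop of actions, rewards, and gradual improvement can be sketched with tabular Q-learning, one common reinforcement learning method. The environment is a hypothetical five-cell corridor invented for this example, and the values of `ALPHA`, `GAMMA`, and `EPSILON` are illustrative rather than tuned.

```python
import random

GOAL = 4                     # states are 0..4; reaching state 4 ends an episode
ACTIONS = ("left", "right")
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2  # learning rate, discount, exploration

def step(state, action):
    """Environment: move one cell; reward 1 only on reaching the goal."""
    next_state = max(0, state - 1) if action == "left" else state + 1
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

def train(episodes=200, seed=0):
    """Learn a table of action values by trial and error."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            if rng.random() < EPSILON:
                action = rng.choice(ACTIONS)  # explore: try something random
            else:
                # exploit: pick the best-known action, ties broken at random
                action = max(ACTIONS, key=lambda a: (q[(state, a)], rng.random()))
            next_state, reward = step(state, action)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            # Nudge the estimate towards reward plus discounted future value.
            target = reward + GAMMA * best_next
            q[(state, action)] += ALPHA * (target - q[(state, action)])
            state = next_state
    return q

q_table = train()
policy = {s: max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(GOAL)}
print(policy)
```

Nobody tells the agent that "right" is correct. It only ever sees a reward of 1 at the goal, and the value of moving right spreads backwards through the table from there until the learned policy moves right in every state.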
YouTube's recommendation system is a commonly cited example. It doesn't just show videos similar to what you watched; its goal is closer to "show videos that you'll watch for a long time and come back for more." That's a reward signal. In this framing, the system keeps testing small changes and keeps the ones that increase watch time.
The catch is that the reward signal has to be designed carefully. If it incentivises the wrong thing, the system optimises for the wrong thing brilliantly. A content system that optimises purely for engagement time might recommend increasingly extreme content, because extreme content is engaging. A robot optimising for forward progress might ignore obstacles. Get the reward wrong and you get exactly what you asked for — which isn't what you wanted.
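A toy sketch of how the choice of reward changes what gets picked. The catalogue, the watch-time numbers, the "extremeness" scores, and the penalty weight are all made up for illustration. Real systems have no such tidy score, and choosing a penalty is itself a design decision that can go wrong.

```python
# Hypothetical catalogue: (title, expected watch minutes, extremeness 0 to 1)
videos = [
    ("calm explainer", 6.0, 0.1),
    ("heated debate", 9.0, 0.5),
    ("outrage compilation", 12.0, 0.9),
]

def engagement_only(video):
    """Reward A: watch time and nothing else."""
    _, minutes, _ = video
    return minutes

def engagement_with_penalty(video, penalty=10.0):
    """Reward B: watch time minus a cost for extreme content (assumed weight)."""
    _, minutes, extremeness = video
    return minutes - penalty * extremeness

naive_pick = max(videos, key=engagement_only)[0]
adjusted_pick = max(videos, key=engagement_with_penalty)[0]
print(naive_pick)     # prints: outrage compilation
print(adjusted_pick)  # prints: calm explainer
```

The optimiser is identical in both cases (`max`). Only the definition of "good" changed, and the recommendation changed with it.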
Reinforcement learning also typically needs a very large number of trials, often millions. That's why game-playing AI works well: you can simulate millions of rounds quickly. Real-world applications like robot learning are harder because physical experiments are slow and expensive. See the next unit for how these learning approaches connect to neural networks.
Which Is Actually Hardest To Understand?
Reinforcement learning. It requires keeping three things in mind at once: the current state, the action, and the delayed consequence. Most people find this kind of multi-step reasoning unintuitive. Supervised learning is intuitive: show examples, and the system learns. Unsupervised learning is odd but at least simple to state: find patterns.
Reinforcement learning requires grasping that small decisions now compound into outcomes later, and the system has to learn this relationship across millions of trials. That's why these systems take so long to train and why the reward signal is so critical to get right.
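One common way to make "decisions now, outcomes later" concrete is the discounted return: the sum of future rewards, with each step further into the future counting a little less. The sketch below assumes a discount factor `gamma` of 0.9, which is an illustrative choice, and a hypothetical sequence in which the only reward arrives on the fourth step.

```python
import math

def discounted_return(rewards, gamma=0.9):
    """Sum of rewards where a reward k steps ahead is scaled by gamma**k."""
    total = 0.0
    # Work backwards so each step adds its reward to the discounted future.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total

# No reward for three steps, then a reward of 1 on the fourth.
value_at_start = discounted_return([0.0, 0.0, 0.0, 1.0])
print(round(value_at_start, 3))  # prints: 0.729
```

The first action earns nothing immediately, yet it is still worth about 0.73 because of where it leads. Learning that link between an early action and a late reward is what takes so many trials.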
Once you understand reinforcement learning, though, you understand something important about many real AI systems. They're optimising for a metric someone defined, which is not necessarily the same thing as truth or accuracy. That changes how you read news about AI behaving in strange ways.
Check your understanding
Question 1 of 2
A hospital trains an AI system on thousands of X-rays, each labelled by a radiologist as showing cancer or not. Which type of learning is this?
Question 2 of 2
What is the main risk when a reinforcement learning reward signal is poorly designed?
