How AI Learns: Supervised, Unsupervised and Reinforcement Learning

AI doesn't learn the way you learned to ride a bike. It learns by finding patterns in data, and there are three distinct ways to set up that learning. Getting these straight makes it much easier to judge what AI can and can't do.

John Bowman, Owner / AI Developer
Unit 1 · 5 April 2026 · 9 min read
In this lesson
  1. Supervised learning
  2. Unsupervised learning
  3. Reinforcement learning
  4. Which is hardest to understand?

Supervised Learning: Learn From Examples You Label

Supervised learning is learning from a teacher who tells you the right answer. The system gets examples with labels, finds patterns, and applies those patterns to new unlabelled examples.

This is the most common type of machine learning in real businesses because it's conceptually straightforward. You have data. You label what you want the system to learn. You train it. Then you test it on new data to see how well it does.
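The label, train, test loop can be sketched in a few lines of Python. This is a minimal sketch, assuming a hypothetical spam dataset with two made-up features (number of links, number of exclamation marks) and a 1-nearest-neighbour model, chosen because it is about the simplest model that learns purely from labelled examples. Real spam filters use far richer features and models.

```python
import math

# Hypothetical labelled data: (number of links, number of "!" marks) -> label.
train = [
    ((0, 0), "not spam"),
    ((1, 0), "not spam"),
    ((0, 1), "not spam"),
    ((6, 4), "spam"),
    ((5, 7), "spam"),
    ((8, 5), "spam"),
]

# Held-out examples the model never sees during training.
test = [
    ((1, 1), "not spam"),
    ((7, 6), "spam"),
]


def predict(features):
    """1-nearest neighbour: return the label of the closest training example."""
    nearest = min(train, key=lambda example: math.dist(example[0], features))
    return nearest[1]


# Test on new data: the fraction of held-out predictions that match their label.
accuracy = sum(predict(x) == y for x, y in test) / len(test)
print(f"held-out accuracy: {accuracy:.0%}")
```

The important habit is in the last lines: accuracy is measured only on examples held back from training, because a model can look perfect on data it has already seen.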

A hospital trains a system on thousands of X-rays where radiologists have confirmed whether cancer is present. The system learns to recognise pixel patterns associated with cancer. When a new X-ray arrives, it makes a prediction. This also happens in email spam filtering, house price prediction, and most other everyday AI applications.

The catch: this only works if your labels are correct and if the patterns in your training data match the real world the system will encounter. If your spam filter was trained on 2015 emails, spam that has evolved since will get through. If your medical model was trained mostly on one demographic, it will fail on others. Bad labels poison the whole thing — if 30% of your training labels were wrong, the system learns corrupted patterns.
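The effect of bad labels is easy to see in a toy setting. This sketch assumes a hypothetical one-dimensional task whose true rule is "label 1 if x is at least 5", flips 3 of the 10 training labels (30%), and uses 1-nearest-neighbour, which copies training labels directly and so exposes the damage clearly. Models that average over many examples degrade less abruptly, but they still learn from whatever labels they are given.

```python
def true_label(x):
    """The hidden rule the data follows (hypothetical)."""
    return 1 if x >= 5 else 0


train_x = list(range(10))                      # training inputs 0..9
clean_labels = [true_label(x) for x in train_x]

# Simulate careless annotation: flip 3 of the 10 labels (30%).
noisy_labels = clean_labels.copy()
for i in (2, 5, 8):
    noisy_labels[i] = 1 - noisy_labels[i]


def predict(x, labels):
    """1-nearest neighbour: copy the label of the closest training input."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return labels[nearest]


def accuracy(labels):
    """Score predictions against the true rule on fresh inputs 0.2, 1.2, ..., 9.2."""
    test_x = [x + 0.2 for x in range(10)]
    return sum(predict(x, labels) == true_label(x) for x in test_x) / len(test_x)


print("trained on clean labels:", accuracy(clean_labels))  # 1.0
print("trained on noisy labels:", accuracy(noisy_labels))  # 0.7
```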

Unsupervised Learning: Find Patterns Without Being Told What To Look For

Unsupervised learning is harder conceptually. You give the system data with no labels and it figures out what patterns exist on its own.

This sounds powerful but it's actually limited. The system can find clusters and groupings, and it can compress data. What it can't do is tell you whether what it found is meaningful — that's still your job.

A retail store uses it for customer segmentation: thousands of transactions, no pre-defined categories. The algorithm discovers that some customers buy cheap bulk items, others buy expensive specialties, others are inconsistent. The system found those groups without being told to look for them. Recommendation systems work similarly — by finding which items have similar viewing or purchase patterns without explicit labelling.
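Item-to-item similarity is one common way to find "similar purchase patterns" with no labels at all. The sketch below assumes a hypothetical purchase matrix (customers as rows, items as columns) and scores each pair of items by the cosine similarity of their purchase columns. Production recommenders use much larger data and more elaborate models.

```python
import math

# Hypothetical purchase history: rows are customers, columns are items (1 = bought).
items = ["tent", "sleeping bag", "camp stove", "novel", "bookmark"]
purchases = [
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0],
]


def column(j):
    """Purchase pattern of item j across all customers."""
    return [row[j] for row in purchases]


def cosine(a, b):
    """Cosine similarity: near 1.0 for matching patterns, 0.0 for no overlap."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(item):
    """Return the other item whose purchase pattern is closest. No labels are used."""
    j = items.index(item)
    scores = {other: cosine(column(j), column(k))
              for k, other in enumerate(items) if k != j}
    return max(scores, key=scores.get)


print(most_similar("tent"))   # sleeping bag
print(most_similar("novel"))  # bookmark
```

Nothing told the system that camping gear and books form separate groups; the grouping falls out of who bought what. Whether that grouping is useful is still a human judgement.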

The downside is evaluation. With supervised learning you count how many predictions were right. With unsupervised learning, judging whether the system found something useful is subjective. You have to assess that yourself. If you have labelled data available, supervised learning usually wins.

Reinforcement Learning: Learn Through Trial, Error, and Reward

Reinforcement learning is learning by interacting with an environment and getting feedback on whether your actions were good or bad. This is how AlphaGo, after first learning from records of human games, improved through self-play until it could beat top human players at Go. It is also how robots learn to walk and how recommendation systems optimise for what actually gets watched.

You set up a system with actions it can take and a reward signal: a number that says how good or bad an outcome was. The system tries millions of actions, learns which ones lead to rewards, and adjusts its behaviour to maximise the total reward it expects to collect over time.
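The simplest version of this setup is a multi-armed bandit: there is no changing state and each reward arrives immediately, so it shows the act, observe, update loop without the complication of delayed consequences. This sketch assumes three hypothetical actions with hidden payoff probabilities and an epsilon-greedy agent (explore at random 10% of the time, otherwise pick the action with the best current estimate). The parameter values are illustrative.

```python
import random

# Hypothetical environment: three actions whose payoff probabilities
# are hidden from the agent.
TRUE_REWARD_PROB = [0.2, 0.5, 0.8]


def pull(action, rng):
    """Environment step: reward 1 with the action's hidden probability, else 0."""
    return 1 if rng.random() < TRUE_REWARD_PROB[action] else 0


def run_bandit(steps=5000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    estimates = [0.0] * len(TRUE_REWARD_PROB)  # agent's value estimate per action
    counts = [0] * len(TRUE_REWARD_PROB)       # how often each action was tried
    total = 0
    for _ in range(steps):
        # Explore a random action occasionally; otherwise exploit the best estimate.
        if rng.random() < epsilon:
            action = rng.randrange(len(estimates))
        else:
            action = max(range(len(estimates)), key=lambda a: estimates[a])
        reward = pull(action, rng)
        counts[action] += 1
        # Incremental average: nudge the estimate toward the observed reward.
        estimates[action] += (reward - estimates[action]) / counts[action]
        total += reward
    return estimates, total


estimates, total = run_bandit()
best = max(range(len(estimates)), key=lambda a: estimates[a])
print("estimates:", [round(e, 2) for e in estimates], "best action:", best)
```

The agent is never told which action pays best; it discovers that from reward alone, while still spending a small share of its trials checking whether the alternatives have improved.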

YouTube's recommendation system doesn't just show videos similar to what you watched. Its goal is closer to "show videos you'll watch for a long time and come back for more." That's a reward signal. The system keeps testing small changes and keeps the ones that increase watch time.

The catch is that the reward signal has to be designed carefully. If it incentivises the wrong thing, the system optimises for the wrong thing brilliantly. A content system that optimises purely for engagement time might recommend increasingly extreme content, because extreme content is engaging. A robot optimising for forward progress might ignore obstacles. Get the reward wrong and you get exactly what you asked for — which isn't what you wanted.
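Why a misdesigned reward backfires fits in a few lines: an optimiser picks whatever scores highest on the number it is given, whether or not that number captures what you meant. The catalogue and scores below are entirely hypothetical, and a plain argmax stands in for a learning system.

```python
# Hypothetical catalogue. "minutes" is easy to measure;
# "satisfaction" is what we actually want but did not put in the reward.
catalogue = {
    "balanced explainer": {"minutes": 8, "satisfaction": 0.9},
    "calm documentary": {"minutes": 12, "satisfaction": 0.8},
    "outrage compilation": {"minutes": 25, "satisfaction": 0.2},
}


def engagement_reward(item):
    return item["minutes"]


def intended_reward(item):
    return item["satisfaction"]


def optimise(reward):
    """Stand-in for any optimiser: pick whatever scores highest on the given reward."""
    return max(catalogue, key=lambda name: reward(catalogue[name]))


print(optimise(engagement_reward))  # outrage compilation
print(optimise(intended_reward))    # balanced explainer
```

The optimiser did nothing wrong in the first case. It maximised exactly the number it was handed.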

Reinforcement learning also needs millions of trials. That's why game-playing AI works well — you can simulate millions of rounds quickly. Real-world applications like robot learning are harder because physical experiments are slow and expensive. See the next unit for how these learning approaches connect to neural networks.

Which Is Actually Hardest To Understand?

Reinforcement learning. It requires thinking in three dimensions: the current state, the action, and the delayed consequence. Most people's brains don't naturally do this kind of multi-step reasoning. Supervised learning is intuitive — show examples, system learns. Unsupervised is odd but at least simple — find patterns.

Reinforcement learning requires grasping that small decisions now compound into outcomes later, and the system has to learn this relationship across millions of trials. That's why these systems take so long to train and why the reward signal is so critical to get right.
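The standard way reinforcement learning scores "small decisions now, outcomes later" is the discounted return, which adds up future rewards while shrinking each one by a factor gamma per step of delay. This sketch assumes two hypothetical five-step reward sequences. The discount values 0.9 and 0.5 are illustrative choices.

```python
def discounted_return(rewards, gamma):
    """G = r0 + gamma*r1 + gamma**2*r2 + ...: later rewards count for less."""
    g = 0.0
    for r in reversed(rewards):  # work backwards so each step adds one more discount
        g = r + gamma * g
    return g


grab_now = [1, 0, 0, 0, 0]  # hypothetical: small reward immediately
invest = [0, 0, 0, 0, 3]    # hypothetical: nothing until a bigger payoff at the end

for gamma in (0.9, 0.5):
    print(gamma, discounted_return(grab_now, gamma), discounted_return(invest, gamma))
```

With gamma = 0.9 the delayed payoff is worth about 1.97 today and beats the immediate reward of 1. With gamma = 0.5 it shrinks to 0.1875 and the impatient strategy wins, so choosing gamma is part of designing what the system optimises for.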

Once you understand reinforcement learning, though, you understand something that holds for most real AI systems. They're not optimising for truth or accuracy. They're optimising for a metric someone defined, whether that's a reward signal or a training loss. That changes how you read news about AI behaving in strange ways.

Check your understanding

  1. A hospital trains an AI system on thousands of X-rays, each labelled by a radiologist as showing cancer or not. Which type of learning is this?
  2. What is the main risk when a reinforcement learning reward signal is poorly designed?

Frequently Asked Questions

What are the three main types of machine learning?

The three main paradigms are supervised learning (learning from labelled examples), unsupervised learning (finding patterns in unlabelled data), and reinforcement learning (learning through trial, error and reward signals). Most practical business AI uses supervised learning because it's the most predictable and easiest to evaluate.

What does supervised learning require?

Supervised learning requires labelled training data — examples where the correct answer is already known. A spam filter needs emails labelled as spam or not spam. A medical diagnostic system needs X-rays labelled with confirmed diagnoses. The quality of labels directly affects the quality of what the system learns.

Why is reinforcement learning harder to train than supervised learning?

Reinforcement learning requires millions of trials and a carefully designed reward signal. Running that many trials through slow, expensive real-world interactions is rarely practical, which is why it tends to work best in simulated environments like games. It also requires reasoning about the relationship between immediate actions and delayed consequences, which is more complex than matching inputs to labels.

What happens when training data doesn't match the real world?

The model will perform poorly or fail in predictable ways. If patterns in training data don't reflect deployment conditions — due to demographic differences, changes over time, or data quality problems — predictions degrade. This is called distribution shift, and it's one of the most common reasons AI systems fail after deployment.
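A minimal sketch of distribution shift, assuming a hypothetical one-dimensional measurement and a simple threshold classifier placed midway between the two class means. The same rule that is perfect on data like its training set drops to coin-flip accuracy when every input reads 2.5 units higher, even though what the labels mean has not changed.

```python
# Hypothetical training population: negatives cluster near 2, positives near 6.
train = [(1.5, 0), (2.0, 0), (2.5, 0), (5.5, 1), (6.0, 1), (6.5, 1)]


def fit_threshold(data):
    """Place the decision threshold midway between the two class means."""
    neg = [x for x, y in data if y == 0]
    pos = [x for x, y in data if y == 1]
    return (sum(neg) / len(neg) + sum(pos) / len(pos)) / 2


def accuracy(threshold, data):
    """Fraction of examples where 'x >= threshold' agrees with the label."""
    return sum((x >= threshold) == bool(y) for x, y in data) / len(data)


threshold = fit_threshold(train)

# Deployment data drawn like the training data: the rule holds up.
same = [(1.8, 0), (2.2, 0), (5.8, 1), (6.2, 1)]
# Shifted deployment data: every reading is 2.5 higher (say, a new device),
# but the labels mean the same thing.
shifted = [(x + 2.5, y) for x, y in same]

print(threshold, accuracy(threshold, same), accuracy(threshold, shifted))
```

Nothing in the model changed between the two evaluations; only the world did.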

How It Works

In supervised learning, a model receives paired inputs and outputs — for example, images and their correct labels. During training, it makes predictions and compares them to the correct answers. The difference (the error) is used to update the model's internal parameters so the next prediction is slightly better. This process repeats across thousands or millions of examples until the error rate is low enough to be useful.
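The predict, compare, update loop described above is most often implemented with gradient descent. The sketch below assumes a hypothetical linear model (prediction = w * x + b), mean squared error as the measure of error, and data generated from the rule y = 2x + 1. The learning rate and epoch count are illustrative.

```python
# Hypothetical labelled data generated from the rule y = 2x + 1.
data = [(x, 2 * x + 1) for x in range(-5, 6)]


def train(data, lr=0.01, epochs=1000):
    w, b = 0.0, 0.0  # the model's internal parameters, starting from zero
    n = len(data)
    for _ in range(epochs):
        # Predict, compare with the correct answer, and accumulate the
        # gradient of the mean squared error with respect to w and b.
        grad_w = grad_b = 0.0
        for x, y in data:
            error = (w * x + b) - y
            grad_w += 2 * error * x / n
            grad_b += 2 * error / n
        # Nudge each parameter against its gradient so the next prediction improves.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


w, b = train(data)
print(f"learned w = {w:.3f}, b = {b:.3f}")
```

After training, w and b end up very close to 2 and 1, the rule hidden in the data. Neural networks follow the same loop with far more parameters, with the gradients computed automatically rather than by hand.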

In unsupervised learning, the model has only inputs and no labels. Clustering algorithms like k-means group similar examples together based on distance in feature space. Dimensionality reduction methods like PCA find the directions in which data varies most, allowing it to be compressed without losing the most important patterns.
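The clustering half of this paragraph can be sketched with a plain-Python k-means. It assumes hypothetical customer data with two features (average price per item, items per basket) and, to keep the run deterministic, uses the first k points as starting centroids. Library implementations typically use smarter initialisation such as k-means++ and usually scale features first.

```python
import math


def kmeans(points, k, max_iterations=20):
    """Plain k-means. Assumption: the first k points serve as starting centroids."""
    centroids = list(points[:k])
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its old centroid).
        updated = [
            tuple(sum(coords) / len(members) for coords in zip(*members))
            if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
        if updated == centroids:  # centroids stopped moving, so assignments are stable
            break
        centroids = updated
    return centroids, clusters


# Hypothetical customers: (average price per item, items per basket).
customers = [(2, 40), (60, 2), (3, 35), (75, 1), (2.5, 50), (55, 3)]
centroids, clusters = kmeans(customers, k=2)
for centre, members in zip(centroids, clusters):
    print([round(v, 1) for v in centre], members)
```

The algorithm separates cheap bulk buyers from expensive low-volume buyers without being told either category exists; naming and interpreting the groups is still up to you.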

Reinforcement learning works differently again. An agent interacts with an environment, takes actions, receives rewards or penalties, and updates its strategy to maximise cumulative reward over time. The key challenge is credit assignment: figuring out which past actions caused the reward received now. Q-learning tackles this with action-value estimates, updating each one using the estimated value of the state that follows, so reward gradually propagates back to earlier actions across many trials. Policy-gradient methods instead adjust the probability of each action in proportion to the reward that followed it.
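Credit assignment is easiest to see in a tiny environment. This sketch assumes a hypothetical five-cell corridor where the only reward comes from reaching the far end, and uses tabular Q-learning with illustrative settings (discount 0.9, learning rate 0.5, 20% exploration). Each update borrows the estimated value of the next state, so over many episodes the final reward propagates back to the first move.

```python
import random

N_STATES = 5                  # corridor cells 0..4; cell 4 is the goal (hypothetical)
ACTIONS = (-1, +1)            # index 0 = move left, index 1 = move right
GAMMA, ALPHA, EPSILON = 0.9, 0.5, 0.2


def step(state, move):
    """Move along the corridor; reward 1 only on reaching the goal cell."""
    nxt = min(max(state + move, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < EPSILON:
                a = rng.randrange(len(ACTIONS))  # explore
            else:
                best = max(q[state])             # exploit, breaking ties at random
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            nxt, reward, done = step(state, ACTIONS[a])
            # Target = immediate reward + discounted value of the best next action.
            # Borrowing the next state's estimate is what carries credit backwards.
            target = reward if done else reward + GAMMA * max(q[nxt])
            q[state][a] += ALPHA * (target - q[state][a])
            state = nxt
    return q


q = q_learning()
policy = ["right" if q[s][1] > q[s][0] else "left" for s in range(N_STATES - 1)]
print(policy)
print([round(q[s][1], 3) for s in range(N_STATES - 1)])
```

After training, the learned values for "move right" settle near 0.729, 0.81, 0.9 and 1 from the start cell to the cell next to the goal: each earlier step receives discounted credit for a reward it only made possible.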

Key Points
  • Supervised learning trains on labelled examples and is the most common approach in business AI — it's predictable and easy to evaluate
  • Label quality is critical in supervised learning — incorrect or biased labels directly corrupt what the model learns
  • Unsupervised learning finds patterns without labels but requires humans to judge whether what it found is meaningful
  • Reinforcement learning optimises for a reward signal through trial and error — a poorly designed signal leads to a system that optimises for the wrong thing
  • All three approaches degrade when the world they're deployed in doesn't match the world they were trained on
  • Reinforcement learning requires millions of trials — which is why it works well in simulatable environments like games but is harder in the physical world
  • Many AI failures in production trace back to one of three root causes: bad labels, a wrong reward signal, or distribution shift