Decision Trees, SVMs and More: Key Machine Learning Algorithms for Beginners
There are many machine learning algorithms. This lesson covers the ones you'll encounter most often: decision trees, random forests, SVMs, KNN, and Naive Bayes. Each has a different strength, a different weakness, and a different situation where it's the right tool.
Decision Trees: How They Work and Why They're Intuitive
Decision trees are probably the most intuitive machine learning model. A human can look at a trained decision tree and understand exactly what it's doing.
The idea: you ask a series of yes/no questions about the data, and each answer narrows down the prediction. Is the email long? If yes, branch A. If no, branch B. In branch A, does it contain certain words? If yes, probably spam. This is how you'd manually build a flowchart. A decision tree is that flowchart, built automatically from data.
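That flowchart can be written directly as code. Here is a hypothetical hand-built "tree" for the spam example above - the threshold and the keyword are made up for illustration, not learned from data:

```python
def classify_email(length, contains_prize):
    """A hand-built decision 'tree': each if/else is one yes/no question.
    The 100-word threshold and the 'prize' keyword are illustrative only."""
    if length > 100:                  # Is the email long?
        if contains_prize:            # Branch A: does it contain certain words?
            return "spam"
        return "not spam"
    return "not spam"                 # Branch B: short emails pass through

print(classify_email(250, True))      # prints "spam"
print(classify_email(30, True))       # prints "not spam"
```

A trained decision tree is exactly this structure, except the algorithm picks the questions and thresholds for you.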
Mathematically, the tree is built by recursively splitting the data. At each node, the algorithm finds the single question that splits the data into two groups that are as homogeneous as possible - ideally, each group contains items of only one class. The algorithm measures this with Gini impurity or entropy, and keeps splitting until groups are pure or a stopping condition is met.
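The "how mixed is this group" measurement is short enough to sketch directly. This computes Gini impurity for a list of class labels (the standard formula, not tied to any library):

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions.
    0.0 means the group is pure; 0.5 is the worst case for two classes."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

print(gini_impurity(["spam", "spam", "spam", "spam"]))  # 0.0: pure group
print(gini_impurity(["spam", "spam", "ham", "ham"]))    # 0.5: maximally mixed
```

At each node, the algorithm tries candidate questions and keeps the one whose resulting groups have the lowest combined impurity.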
Why are trees intuitive? You can trace any prediction back to the rule that produced it: "This email is spam because it's long and contains the word 'prize'." You can argue with the tree. You can understand why it learned that rule. This is in stark contrast to neural networks, which produce a prediction with no auditable explanation.
The downside: single trees tend to overfit. They'll learn very specific rules that fit the training data but don't generalise. That's where random forests come in.
Random Forests: An Extension of Decision Trees
A random forest is hundreds of decision trees, each trained slightly differently. You train each tree on a random subset of the data and a random subset of the features. Then you predict by having all the trees vote. If 300 trees say "spam" and 200 say "not spam," predict spam.
Why does voting help? Because the trees are diverse. Each overfits in a different way. The errors average out. The ensemble is more robust than any individual tree - and usually more accurate than a single tree by a significant margin.
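A little probability shows why the errors average out. If each tree is right 60% of the time and the trees err independently (a strong assumption - real trees are correlated, so this is an idealised upper bound), the chance that a majority of the votes is correct follows a binomial distribution:

```python
from math import comb

def majority_correct(n_trees, p_correct):
    """P(more than half of n independent trees vote correctly)."""
    return sum(comb(n_trees, k) * p_correct**k * (1 - p_correct)**(n_trees - k)
               for k in range(n_trees // 2 + 1, n_trees + 1))

print(majority_correct(1, 0.6))    # a single tree: 0.6
print(majority_correct(500, 0.6))  # an idealised 500-tree vote: close to 1.0
```

Even with correlated trees, the same effect operates in weakened form - which is why the random subsets of data and features matter: they are what keeps the trees diverse.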
Random forests are widely used and a strong default for most structured data problems. They're less interpretable than a single tree (you can't point to "the rule"), but far more accurate. The trade-off: single trees for interpretability, forests for accuracy.
Support Vector Machines: The Core Idea
SVMs try to find the best decision boundary between two classes. Imagine two groups of points in 2D space. SVM finds the line that separates them, but not just any line - the one with the largest margin, the biggest gap between the boundary and the nearest points of each class.
A larger margin means better generalisation. A line that barely separates training data is fragile - new points will likely land on the wrong side. A line with a wide gap is more robust.
When data isn't linearly separable, SVMs use the kernel trick: they transform data into a higher-dimensional space where a linear boundary does work, without explicitly doing the transformation. Mathematically elegant, a bit abstract in practice.
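A concrete way to see the trick: for 2D points, the polynomial kernel K(x, y) = (x . y)^2 gives exactly the dot product you would get after mapping each point through phi(x) = (x1^2, sqrt(2) * x1 * x2, x2^2) into 3D - so the SVM gets the higher-dimensional geometry while only ever computing the cheap 2D kernel:

```python
from math import sqrt

def phi(x):
    """The explicit map to 3D feature space (never needed in practice)."""
    x1, x2 = x
    return (x1 * x1, sqrt(2) * x1 * x2, x2 * x2)

def poly_kernel(x, y):
    """K(x, y) = (x . y)^2, computed entirely in the original 2D space."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

x, y = (1.0, 2.0), (3.0, 0.5)
print(poly_kernel(x, y))        # 16.0: kernel evaluated in 2D
print(dot(phi(x), phi(y)))      # 16.0: same value via the explicit 3D map
```

The SVM's math only ever needs dot products between points, so swapping the dot product for a kernel silently moves the whole problem into the richer space.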
SVMs work well on small to medium structured datasets. Training gets slow for large datasets. They also require feature scaling - if features aren't normalised to similar ranges, the features with the largest numeric ranges dominate the margin calculation and everything else is effectively ignored.
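A quick illustration of why scaling matters. With a hypothetical income feature in the tens of thousands and a score between 0 and 1, any distance or margin computation is essentially income alone; min-max scaling (one common choice) puts both features on equal footing:

```python
def minmax_scale(column):
    """Rescale a list of values to the [0, 1] range."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]

# Hypothetical features on wildly different scales.
income = [30_000, 60_000, 90_000]
score = [0.2, 0.5, 0.9]

# Unscaled: the distance between samples 0 and 1 is essentially income alone.
d_unscaled = ((income[0] - income[1]) ** 2 + (score[0] - score[1]) ** 2) ** 0.5
print(round(d_unscaled, 2))   # ~30000: the score's contribution is invisible

inc_s, sco_s = minmax_scale(income), minmax_scale(score)
d_scaled = ((inc_s[0] - inc_s[1]) ** 2 + (sco_s[0] - sco_s[1]) ** 2) ** 0.5
print(round(d_scaled, 4))     # both features now contribute comparably
```

The same issue bites KNN, discussed next, for exactly the same reason: both algorithms reason about geometric distance between points.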
KNN and Naive Bayes
K-nearest neighbours (KNN) is simple: to predict a new point, find the k nearest points in training data and let them vote. To predict whether an email is spam, find the 5 most similar emails in training data and check how many are spam.
Pros: easy to understand, often works. Cons: slow at prediction time (compare to every training example), struggles with high-dimensional data (in high dimensions, all points are roughly equidistant from each other, so "nearest" loses meaning), and doesn't learn anything - it's just storing training data.
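KNN really is this short. A minimal from-scratch sketch using Euclidean distance, a majority vote, and made-up toy data:

```python
from collections import Counter

def knn_predict(train, new_point, k=3):
    """train: list of (features, label) pairs. Vote among the k nearest."""
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5
    # 'Training' is just storing data; all the work happens at prediction time.
    nearest = sorted(train, key=lambda item: dist(item[0], new_point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy features: (word count, exclamation marks) -> label
train = [((200, 9), "spam"), ((220, 7), "spam"), ((180, 8), "spam"),
         ((30, 0), "ham"), ((45, 1), "ham"), ((25, 0), "ham")]
print(knn_predict(train, (190, 6)))   # prints "spam": its neighbours are spammy
```

Note the `sorted` over the entire training set on every call - that is the "slow at prediction time" problem made visible.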
Naive Bayes uses probability. Given an email's features, what's the probability it's spam? It uses Bayes' theorem with one simplifying assumption: features are independent of each other. That's almost always false (spam words tend to co-occur), hence "naive." Despite the bad assumption, it works surprisingly well and is very fast.
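A minimal Naive Bayes spam scorer makes the idea concrete. The word counts and priors below are made up for illustration; add-one (Laplace) smoothing is a standard touch that stops unseen words from zeroing out a class:

```python
from math import log

# Hypothetical word counts from a labelled training set.
word_counts = {
    "spam": {"prize": 40, "winner": 30, "meeting": 2},
    "ham":  {"prize": 1, "winner": 2, "meeting": 50},
}
class_priors = {"spam": 0.4, "ham": 0.6}
vocab = {"prize", "winner", "meeting"}

def classify(words):
    """Pick the class maximising log P(class) + sum of log P(word | class).
    Summing per-word terms independently is exactly the 'naive' part."""
    best, best_score = None, float("-inf")
    for cls, counts in word_counts.items():
        total = sum(counts.values())
        score = log(class_priors[cls])
        for w in words:
            # Laplace smoothing: add 1 to every count.
            score += log((counts.get(w, 0) + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = cls, score
    return best

print(classify(["prize", "winner"]))   # prints "spam"
print(classify(["meeting"]))           # prints "ham"
```

Working in log-probabilities is the usual trick to avoid multiplying many tiny numbers into floating-point zero.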
Naive Bayes is a strong baseline for text classification. If your fancy model barely beats Naive Bayes, the extra complexity probably isn't worth it.
When to Use Which Algorithm
Decision trees: when interpretability matters - you need to explain decisions to humans or regulators. Also useful when exploring data to understand which features matter.
Random forests: the default for most structured data problems. Fast, robust, accurate, and needs little tuning.
SVMs: clean, structured data where performance matters more than interpretability. Less common in industry now because random forests are easier and deep learning gets the headlines, but still mathematically sound and worth knowing.
KNN: as a quick baseline or when the problem is simple enough that memorisation works. Rarely your main model.
Naive Bayes: text and document classification as a fast, strong baseline.
Logistic regression: simple, interpretable, fast, and surprisingly often good enough.
Neural networks: when you have lots of data and compute, and simpler models aren't working - or when dealing with images, audio, and sequences where deep architectures genuinely shine.
Start with logistic regression or decision trees. They teach the fundamentals. Random forests are an immediate practical extension. The temptation is to jump to neural networks because they're famous. Resist it. Once you understand how simpler models work and what can go wrong, neural networks make sense. Before that, they're black boxes and you'll waste time debugging things you don't understand.
Lesson Quiz
Two questions to check your understanding before moving on.
Question 1: Why does a random forest typically outperform a single decision tree?
Question 2: What is the "kernel trick" in SVMs?
Podcast Version
Prefer to listen? The full lesson is available as a podcast episode.