Python Essentials for AI: Variables, Data Structures and Control Flow
Everyone says Python is beginner-friendly. That's true but incomplete. Here's what you actually need to know about Python before you touch any machine learning library - and what you can safely skip.
Python didn't become the dominant AI language just because it's easy to learn. There are real structural reasons. Understanding them helps you understand why the Python you learn here transfers directly to ML work - and what you can skip.
Why Python Became the Language of AI
The math libraries came early. NumPy and SciPy were built in the 2000s when AI research was getting serious. They made numerical computing efficient in Python - which is surprising, because loops written in pure Python are slow. But these libraries hand the heavy number-crunching to compiled C code under the hood, so they're actually fast. Once those existed, everything else could build on top.
Then academic researchers adopted Python. Machine learning papers came with code in Python. Everyone who wanted to replicate results learned Python. Momentum is everything, and Python had it.
Honest take: you don't strictly need Python to do AI. You can do it in R, Julia, or JavaScript. The tools are better in Python because everyone uses Python. If everyone switched to R tomorrow, R would become just as good. But if you're starting from nothing, learn Python. The ecosystem is there, the examples are there, and you'll find answers when you're stuck.
Variables and Data Types
A variable is a container for information. You put something in, give it a name, and later ask for it by name.
price = 19.99
name = "Alice"
is_available = True
price holds a number. name holds text. is_available holds a true/false value. Python figures out the type from what you put in - you don't have to declare it.
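If you want to check which type Python inferred, the built-in type() function will tell you. A quick sketch using the same three variables:

```python
price = 19.99
name = "Alice"
is_available = True

# type() reports the type Python inferred from each value
print(type(price))         # <class 'float'>
print(type(name))          # <class 'str'>
print(type(is_available))  # <class 'bool'>

# A name can later be pointed at a value of a different type
price = 20
print(type(price))         # <class 'int'>
```

The last two lines show the flip side of not declaring types: nothing stops a variable from changing type, so it's worth checking when a result looks odd.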
Why does this matter for AI? Because data comes in types. A feature in your model might be a number (age, income) or text (country, category). A model expects numbers, so if you have text categories, you have to convert them first. Knowing your data types is where that process starts.
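To make that conversion concrete, here is a minimal sketch that swaps each text category for an integer code. The countries values and the country_codes name are made up for illustration, and real projects usually lean on a library encoder rather than hand-written code like this. The list, dictionary and loop syntax it uses are all covered later in this lesson.

```python
countries = ["UK", "France", "UK", "Japan", "France"]

# Give each distinct category the next unused integer, in order of first appearance
country_codes = {}
for country in countries:
    if country not in country_codes:
        country_codes[country] = len(country_codes)

# Replace every text value with its integer code
encoded = []
for country in countries:
    encoded.append(country_codes[country])

print(country_codes)  # {'UK': 0, 'France': 1, 'Japan': 2}
print(encoded)        # [0, 1, 0, 2, 1]
```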
Python's core types:
Integers and floats are numbers. Integers are whole (5). Floats have decimals (5.2). In AI work, you'll mostly use floats because real-world measurements are rarely whole numbers.
Strings are text. When you're working with categorical data ("red", "blue", "green"), you're working with strings, and you'll eventually convert them to numbers so a model can use them.
Booleans are true/false. They're useful for filtering: "show me all rows where age > 18" produces a list of true/false values you then use to filter your data.
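The "age > 18" filter can be sketched in plain Python as two steps: build the true/false values, then use them to pick rows. The sample ages and the is_adult name are assumptions for this example; libraries such as pandas do the same thing in less code, but the idea is the same.

```python
ages = [16, 25, 17, 40]

# Step 1: one True/False value per item
is_adult = []
for age in ages:
    is_adult.append(age > 18)
print(is_adult)  # [False, True, False, True]

# Step 2: keep only the items whose flag is True
adults = []
for age, keep in zip(ages, is_adult):
    if keep:
        adults.append(age)
print(adults)    # [25, 40]
```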
Lists and Dictionaries: How Data Gets Organised
A list is an ordered collection. You create it with square brackets.
ages = [25, 30, 22, 45]
temperatures = [72.5, 68.3, 75.1]
You access items by their position (starting from 0):
ages[0] # gives you 25
ages[2] # gives you 22
In AI, datasets are often lists. A column in a spreadsheet becomes a list. You loop through it, do something with each item, and build results. Lists are everywhere.
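Here is a short sketch of treating a list as a column: a few ways to look inside it, then a loop that builds a new list of results. Subtracting the mean is used only as an example of "doing something with each item"; the mean_age and centered names are illustrative.

```python
ages = [25, 30, 22, 45]

print(len(ages))   # 4, the number of items
print(ages[-1])    # 45, negative positions count from the end
print(ages[1:3])   # [30, 22], a slice from position 1 up to (not including) 3

# Loop through the column and build a new list of results
mean_age = sum(ages) / len(ages)   # 30.5
centered = []
for age in ages:
    centered.append(age - mean_age)

print(centered)    # [-5.5, -0.5, -8.5, 14.5]
```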
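A dictionary stores values under named keys instead of numbered positions. (Since Python 3.7 dictionaries also remember the order in which items were added, but you look things up by key, not by position.) You create it with curly braces.
person = {
    "name": "Alice",
    "age": 28,
    "city": "Portland"
}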
You access items by their key:
person["name"] # gives you "Alice"
person["age"] # gives you 28
Real data has structure. A row in a dataset isn't just a list of numbers - it's "this person has age 28, income 50000, and education level 3." Dictionaries let you organise that logically. pandas can build a DataFrame directly from dictionaries, with the keys becoming column names, so understanding them now makes DataFrames make sense later.
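To show the two shapes tabular data usually takes, the sketch below stores rows as a list of dictionaries and then regroups them into one list per column. The second row and the columns name are invented for the example. pandas accepts either shape when building a DataFrame, but nothing here needs pandas installed.

```python
rows = [
    {"age": 28, "income": 50000, "education": 3},
    {"age": 41, "income": 72000, "education": 4},
]

# Regroup the rows into one list per column name
columns = {}
for row in rows:
    for key, value in row.items():
        if key not in columns:
            columns[key] = []
        columns[key].append(value)

print(columns)
# {'age': [28, 41], 'income': [50000, 72000], 'education': [3, 4]}

# .get() returns a default instead of raising KeyError when a key is missing
print(rows[0].get("city", "unknown"))  # unknown
```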
Loops and Conditionals: The Logic of Programming
A loop repeats code. The simplest is a for loop.
ages = [25, 30, 22, 45]
for age in ages:
    print(age)
This prints each age, one per line. You've written code once but applied it to every item. In AI, this is how you process data - loop through a list of records, and for each one you clean it, validate it, or extract features from it.
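As a small example of cleaning inside a loop, suppose the ages arrived as text with stray spaces, which is common when reading from a file. The raw_ages values are made up for this sketch.

```python
raw_ages = [" 25", "30 ", "22", " 45 "]

clean_ages = []
for raw in raw_ages:
    # strip() removes surrounding whitespace; int() turns the text into a whole number
    clean_ages.append(int(raw.strip()))

print(clean_ages)  # [25, 30, 22, 45]

# enumerate() hands you each item's position alongside the item
for position, age in enumerate(clean_ages):
    print(position, age)
```

The second loop prints "0 25", "1 30" and so on, which is handy when you need to report which record caused a problem.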
A conditional makes decisions:
if age > 30:
    print("Over 30")
else:
    print("30 or under")
Combine loops and conditionals and you can do real work:
ages = [25, 30, 22, 45, 28]
older_people = []
for age in ages:
    if age > 30:
        older_people.append(age)
Now older_people contains only [45]. (30 itself is left out, because 30 > 30 is false.) You've filtered a list based on a condition. In machine learning, you'll write loops that load training examples, check if they're valid, extract features, and feed them to the model. Training follows the same pattern: the model works through the data repeatedly in a loop, usually a batch of examples at a time.
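Once the loop version makes sense, you'll often see the same filter written as a list comprehension, which is just a more compact spelling. The sketch below also shows how the choice between > and >= changes what happens at the boundary; thirty_plus is an illustrative name.

```python
ages = [25, 30, 22, 45, 28]

# Same filter as the loop version, written as a list comprehension
older_people = [age for age in ages if age > 30]
print(older_people)  # [45]

# >= includes the boundary value, so 30 is kept this time
thirty_plus = [age for age in ages if age >= 30]
print(thirty_plus)   # [30, 45]
```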
A Worked Example: Building a Feature from Raw Data
Say you have customer data: names, ages, and annual spending. You want to create a feature called high_value that's true if someone spent more than £1000.
customers = [
    {"name": "Alice", "age": 28, "spending": 1500},
    {"name": "Bob", "age": 35, "spending": 800},
    {"name": "Charlie", "age": 22, "spending": 2000}
]
for customer in customers:
    if customer["spending"] > 1000:
        customer["high_value"] = True
    else:
        customer["high_value"] = False
Now each customer has a high_value field. This is the kind of work you do constantly in ML. Raw data comes in, you create features, and you feed those features to a model.
This example uses everything above: a list of dictionaries, a loop, a conditional, and a comparison. That's the core of data processing.
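One possible next step, sketched under the assumption that the model wants purely numeric rows: compute the flag directly from the comparison and store it as 1 or 0. The HIGH_VALUE_THRESHOLD and feature_rows names are invented here, and the choice of columns is only an example.

```python
customers = [
    {"name": "Alice", "age": 28, "spending": 1500},
    {"name": "Bob", "age": 35, "spending": 800},
    {"name": "Charlie", "age": 22, "spending": 2000},
]

HIGH_VALUE_THRESHOLD = 1000  # the same 1000 cut-off used in the example

feature_rows = []
for customer in customers:
    # A comparison already evaluates to True or False, so no if/else is needed
    high_value = customer["spending"] > HIGH_VALUE_THRESHOLD
    # int(True) is 1 and int(False) is 0, which gives an all-numeric row
    feature_rows.append([customer["age"], customer["spending"], int(high_value)])

print(feature_rows)
# [[28, 1500, 1], [35, 800, 0], [22, 2000, 1]]
```

The name column is dropped on purpose: it identifies a customer but isn't a number the model can learn from.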
How Deep Do You Actually Need to Go?
You need to be comfortable with the basics - variables, lists, loops, conditionals. You need to understand what code is doing. But you don't need to be a Python expert.
I've seen people do serious ML work without understanding object-oriented programming, decorators, or metaclasses. Those are advanced Python features you probably won't need. What you will need is the ability to read code, debug simple problems, and write basic scripts that manipulate data.
The mistake beginners make is thinking they need to learn Python perfectly before touching ML. You don't. Learn the basics, then jump into NumPy and pandas. You'll learn the rest as you need it.
Check your understanding
Question 1 of 2
In a Python dictionary, how do you access a specific piece of data?
Question 2 of 2
Why do AI models need text categories (like "red", "blue", "green") to be converted to numbers before training?
