19.1. Types of Machine Learning

In machine learning, machines use data to identify patterns and rules that represent that data. Data simply means any information you have. When data is organized in a table, each row represents a data point, and each column provides some information about that point. These columns are called the “features” of the data.
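As a small illustration, a table like the one described above can be sketched in Python as a list of rows. The column names and values are made up for this example and are not real housing data.

```python
# A tiny, made-up table: each dict is one row (one data point),
# and each key is one column (one feature of that data point).
houses = [
    {"age_years": 12, "bedrooms": 3, "area_sqft": 1500},
    {"age_years": 30, "bedrooms": 2, "area_sqft": 900},
    {"age_years": 5, "bedrooms": 4, "area_sqft": 2200},
]

# The features are the column names shared by every row.
feature_names = list(houses[0].keys())

# Reading one column gives that feature's value for every data point.
areas = [row["area_sqft"] for row in houses]

print(len(houses), "data points with features:", feature_names)
print("area column:", areas)
```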

Consider the following examples of tasks that machine learning can perform:

  • Predicting house prices based on features like the age of the house, number of bedrooms, and area.

  • Classifying emails as spam or not spam.

What do we notice from these examples? To predict house prices, we first need to show the machine data where the prices are already known for some houses. Similarly, to detect whether an email is spam or not, we first show the machine examples labeled as spam or not spam. In other words, the data we provide to the machine must be labeled with prices in the housing example and with spam/not-spam tags in the email example.

Supervised Machine Learning

When a problem requires labeled data and the goal is to predict labels for new, unseen data points, we call it supervised machine learning. The goal of a supervised learning model is to accurately predict the labels for data it has never seen before.
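The idea of learning from labeled examples and then predicting labels for unseen data can be sketched in a few lines of Python. This is only one possible approach: it uses a 1-nearest-neighbor rule chosen for brevity, and the email features (number of links, number of exclamation marks) and all values are hypothetical.

```python
import math

# Hypothetical labeled emails: (number of links, number of exclamation marks) -> label.
train = [
    ((8, 6), "spam"),
    ((7, 9), "spam"),
    ((9, 4), "spam"),
    ((0, 1), "not spam"),
    ((1, 0), "not spam"),
    ((2, 1), "not spam"),
]

# Emails held back from training; their labels are used only to check predictions.
unseen = [
    ((6, 7), "spam"),
    ((1, 2), "not spam"),
]

def predict(features, labeled_examples):
    """Return the label of the closest labeled example (1-nearest neighbor)."""
    nearest = min(labeled_examples, key=lambda ex: math.dist(ex[0], features))
    return nearest[1]

# Predict a label for each unseen email, then measure how often we were right.
predictions = [predict(features, train) for features, _ in unseen]
correct = sum(p == label for p, (_, label) in zip(predictions, unseen))
accuracy = correct / len(unseen)

print(predictions, accuracy)
```

Checking accuracy on emails that were not used for training reflects the stated goal: what matters is how well the model labels data it has never seen.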

Other examples of supervised learning tasks include:

  • Image recognition: a medical app detecting skin cancer from labeled images of moles

  • Recommendation systems: an entertainment platform predicting and suggesting movies you might enjoy

  • Stock price prediction based on financial indicators such as historical price data, trading volume, seasonality patterns, etc.

  • Weather forecasting using environmental data such as historical weather data, satellite data, geographical location, etc.

  • News categorization: classifying articles as sports, politics, technology, entertainment, and more

In the examples above, you might have noticed that labels are sometimes numbers, such as housing prices, stock prices, or temperatures. Other times they are categories, such as cancerous vs. non-cancerous, movies you might like or dislike, or news categories. Labels of the first type are called numerical labels, and labels of the second type are called categorical labels.

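Numerical data includes any data that measures a quantity with numbers, such as prices, sizes, ages, or weights. Categorical data refers to data with categories or states, like male/female, animal types (cat, dog, or bird), or ZIP codes. ZIP codes are written with digits, but they name places rather than measure quantities, so they are categorical.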

These differences in label types lead to two types of supervised machine learning models:

  • Regression models, which predict numerical labels such as housing prices or the weight of an animal

  • Classification models, which predict categorical labels such as spam/not-spam, or categories like sports, politics, technology, and entertainment
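The difference between the two model types can be sketched with two very simple methods: a least-squares line for regression and a nearest-centroid rule for classification. Both are stand-ins chosen for brevity, and the data, feature choices, and function names are made up for this example.

```python
# Regression: predict a number (price) from a feature (area).
# Made-up data in which price happens to equal 100 * area + 50_000.
areas = [1000, 1500, 2000, 2500]
prices = [150_000, 200_000, 250_000, 300_000]

def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(areas, prices)
predicted_price = slope * 1800 + intercept  # a numerical label

# Classification: predict a category (topic) from features.
# Hypothetical features: (count of "goal", count of "election") in an article.
articles = [
    ((5, 0), "sports"),
    ((7, 1), "sports"),
    ((0, 6), "politics"),
    ((1, 8), "politics"),
]

def fit_centroids(examples):
    """Average the feature vectors of each class."""
    sums, counts = {}, {}
    for features, label in examples:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in total] for label, total in sums.items()}

def classify(features, centroids):
    """Return the class whose centroid is nearest (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(features, centroids[label])),
    )

centroids = fit_centroids(articles)
predicted_topic = classify((4, 1), centroids)  # a categorical label

print(predicted_price, predicted_topic)
```

The regression output can be any number on a continuous scale, while the classifier can only return one of the categories it saw during training.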

Unsupervised Machine Learning

Sometimes our data points are not labeled, or our goal is not to predict labels for unseen data. Studying unlabeled data can still reveal patterns and meaningful insights. This is called unsupervised machine learning. For example, consider a dataset of unlabeled images of different fruits. An unsupervised learning algorithm could group similar fruits together without knowing what each group represents. This is called a clustering task, where we form clusters of data points that are similar to each other.
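Clustering can be sketched with a bare-bones version of k-means, one common clustering method among several. The sketch assumes each fruit image has already been reduced to two hypothetical measurements (length and width in centimeters), and it starts from the first k points purely to keep the example deterministic.

```python
import math

# Made-up, unlabeled measurements taken from fruit images: (length_cm, width_cm).
points = [(7, 7), (18, 4), (8, 7), (19, 3), (7, 8), (17, 4)]

def k_means(data, k, iterations=10):
    """A bare-bones k-means: returns one cluster index per point."""
    # Assumption for this sketch: use the first k points as starting centroids.
    centroids = [tuple(map(float, p)) for p in data[:k]]
    assignments = []
    for _ in range(iterations):
        # Step 1: assign every point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(point, centroids[c]))
            for point in data
        ]
        # Step 2: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(data, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assignments

clusters = k_means(points, k=2)
print(clusters)
```

The algorithm returns only cluster indices such as 0 and 1. It never learns that one group might be apples and the other bananas; attaching names to the groups is left to a person.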

Other examples of unsupervised learning tasks include:

  • Anomaly detection: banks detecting unusual credit card transactions that don’t match your normal spending patterns

  • Association analysis: supermarkets finding patterns in items often bought together (e.g., bread + butter)

  • Image grouping: your phone’s gallery automatically clustering similar faces together before you name them

  • Customer segmentation: e-commerce sites grouping shoppers by browsing or buying behavior without predefined labels
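Two of the tasks in this list, anomaly detection and association analysis, can be sketched without any labels. The sketch uses a simple standard-deviation rule and pair counting as stand-ins for the more robust methods real systems would use. The transaction amounts, baskets, and the threshold of 2.0 are all made up for illustration.

```python
from collections import Counter
from itertools import combinations
from statistics import mean, stdev

# Anomaly detection: flag amounts far from the typical spending level.
amounts = [20, 25, 22, 19, 24, 21, 23, 18, 26, 22, 500]

def find_anomalies(values, threshold=2.0):
    """Return values more than `threshold` standard deviations from the mean."""
    center, spread = mean(values), stdev(values)
    return [v for v in values if abs(v - center) > threshold * spread]

anomalies = find_anomalies(amounts)

# Association analysis: count how often two items appear in the same basket.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

def pair_counts(transactions):
    """Count how many baskets contain each pair of items."""
    counts = Counter()
    for basket in transactions:
        # Sorting makes ("bread", "butter") and ("butter", "bread") the same key.
        counts.update(combinations(sorted(basket), 2))
    return counts

top_pair, top_count = pair_counts(baskets).most_common(1)[0]

print(anomalies, top_pair, top_count)
```

In neither case is the algorithm told in advance which transaction is fraudulent or which items belong together; both results come from structure in the data itself.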

In practice, supervised and unsupervised learning are often used together. For example, an e-commerce platform might first use unsupervised learning to group customers into segments based on their browsing and purchasing behavior (e.g., budget shoppers, frequent buyers, tech enthusiasts). Then it can apply supervised learning within each segment to predict which specific products each customer is most likely to buy. This combination allows companies to better understand their users (through unsupervised clustering) and make accurate, personalized predictions (through supervised models).
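The two-step idea can be sketched as a small pipeline. This is a deliberately simplified illustration, not how a production system would work: the customers, their features (`monthly_spend`, `tech_page_views`), and the helper functions are hypothetical. Segmentation uses a two-cluster k-means on spending alone, and prediction uses a nearest-neighbor rule within each segment.

```python
# Made-up customers: two behavioral features plus a known purchase (the label).
customers = [
    {"monthly_spend": 20, "tech_page_views": 1, "bought": "socks"},
    {"monthly_spend": 25, "tech_page_views": 9, "bought": "phone case"},
    {"monthly_spend": 30, "tech_page_views": 2, "bought": "socks"},
    {"monthly_spend": 200, "tech_page_views": 1, "bought": "sofa"},
    {"monthly_spend": 220, "tech_page_views": 10, "bought": "laptop"},
    {"monthly_spend": 210, "tech_page_views": 8, "bought": "laptop"},
]

def nearest_center(spend, centers):
    """Index of the segment center closest to this spending level."""
    return min(range(len(centers)), key=lambda i: abs(spend - centers[i]))

def segment_centers(spends, iterations=10):
    """1-D k-means with two clusters, started from the smallest and largest value."""
    centers = [float(min(spends)), float(max(spends))]
    for _ in range(iterations):
        groups = [[], []]
        for s in spends:
            groups[nearest_center(s, centers)].append(s)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers

# Step 1 (unsupervised): discover segments from spending alone; labels are not used.
centers = segment_centers([c["monthly_spend"] for c in customers])

# Step 2 (supervised): within the matching segment, predict the purchase of the
# labeled customer whose browsing behavior is most similar.
def predict_purchase(new_customer):
    segment = nearest_center(new_customer["monthly_spend"], centers)
    peers = [c for c in customers if nearest_center(c["monthly_spend"], centers) == segment]
    closest = min(
        peers,
        key=lambda c: abs(c["tech_page_views"] - new_customer["tech_page_views"]),
    )
    return segment, closest["bought"]

print(predict_purchase({"monthly_spend": 28, "tech_page_views": 8}))
print(predict_purchase({"monthly_spend": 215, "tech_page_views": 9}))
```

Because the prediction step only looks at customers in the same segment, a low-spending tech enthusiast is compared with other budget shoppers rather than with high spenders who browse the same pages.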