Unsupervised Learning

TL;DR

Works with unlabeled data to automatically discover patterns and relationships.
Two primary approaches: clustering (grouping similar data points) and dimensionality reduction (reducing feature count while retaining information).
Commonly used for customer segmentation and image recognition; it requires less human supervision but can be harder to interpret and is often combined with supervised methods.

Definition

Unsupervised learning is a type of machine learning in which the algorithm is given no labeled data and no specific instructions on what to learn. Instead, it receives a large dataset and must discover patterns and relationships within the data on its own.

Explanation

Unsupervised learning methods operate without explicit labels or target outputs. They fall into two main categories:

Clustering: algorithms group similar data points together based on shared characteristics, without predefined labels.
Dimensionality reduction: algorithms reduce the number of features in a dataset while attempting to preserve as much of the original information as possible.
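To make "preserve as much of the original information as possible" concrete, here is a minimal sketch of principal component analysis (PCA), one widely used dimensionality-reduction method. The text does not name a specific algorithm, so PCA is an illustrative choice. The sketch finds the single direction of greatest variance by power iteration, reports what share of the total variance that direction keeps, and replaces each row with one coordinate along it. The function name and the toy data are hypothetical.

```python
import math

def pca_first_component(rows, iterations=200):
    """Return (direction, scores, explained_ratio) for the first principal component."""
    n, d = len(rows), len(rows[0])
    # Center each feature on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeated multiplication by cov converges to the eigenvector
    # with the largest eigenvalue (the direction of greatest variance),
    # provided the start vector is not orthogonal to it.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Variance captured along v, as a share of the total variance (trace of cov).
    eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    explained = eigenvalue / sum(cov[i][i] for i in range(d))
    # Each d-dimensional row becomes a single coordinate along v.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores, explained

# Hypothetical toy data: two features that move together almost perfectly.
rows = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8]]
direction, scores, explained = pca_first_component(rows)
print(direction, scores, explained)
```

Because the two features are almost perfectly correlated, `explained` comes out above 0.99: one number per row keeps nearly all of the variance that two numbers held.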

The approach is useful for discovering structure in large, complex datasets. It typically requires less human intervention than supervised learning but can be more challenging to interpret and apply in practice. For this reason, unsupervised learning is often used alongside supervised techniques.

Examples

Clustering — customer data

Given a dataset of customer attributes such as age, income, and location, a clustering algorithm can group customers into clusters based on shared characteristics. For example, one cluster could consist of young, low-income customers living in urban areas, while another could consist of older, high-income customers living in rural areas. The algorithm determines these clusters from similarities and differences among the data points without explicit labels.
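The grouping described above can be sketched with k-means, one common clustering algorithm (an assumption; the example does not name one). The sketch encodes location as a hypothetical urban=1 / rural=0 flag, standardizes every feature so that income's larger numeric range does not dominate the distance, and seeds the centroids deterministically with a farthest-first rule so the result is reproducible. All customer values are made up.

```python
import math

def standardize(rows):
    """Z-score each column so every feature contributes on a comparable scale."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]

def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with deterministic farthest-first seeding."""
    # Seeding: start from the first point, then repeatedly add the point
    # farthest from every centroid chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

# Hypothetical toy data: (age, income in $1000s, urban=1 / rural=0).
customers = [(23, 28, 1), (25, 32, 1), (22, 30, 1),
             (61, 120, 0), (58, 110, 0), (65, 130, 0)]
centroids, labels = kmeans(standardize(customers), k=2)
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

The three young, urban, lower-income customers share one label and the three older, rural, higher-income customers share the other; no labels were supplied, only the number of clusters `k`.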

Clustering — image recognition

With a large dataset of animal images, a clustering algorithm could group images by visual characteristics (color, shape, size). The algorithm might form one cluster for cats, another for dogs, and so on, based solely on the images’ similarities.
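A minimal sketch of the same idea for images, under two simplifying assumptions: each image is summarized by its average color (real systems typically use much richer learned features), and the grouping uses single-linkage agglomerative clustering, which repeatedly merges the two closest groups until `k` remain. The tiny "images" are hypothetical.

```python
import math

def mean_color(image):
    """Summarize an image, given as a list of (r, g, b) pixels, by its average color."""
    return tuple(sum(px[ch] for px in image) / len(image) for ch in range(3))

def agglomerative(features, k):
    """Single-linkage agglomerative clustering: merge closest groups until k remain."""
    clusters = [[i] for i in range(len(features))]  # one cluster per image to start

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(features[i], features[j]) for i in a for j in b)

    while len(clusters) > k:
        pairs = [(x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters))]
        x, y = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[x].extend(clusters.pop(y))  # y > x, so index x is unaffected by pop
    return clusters

# Hypothetical 2x2 "images": two orange-toned, two gray-toned.
images = [
    [(250, 150, 40)] * 4,
    [(240, 140, 50), (250, 160, 30), (245, 150, 45), (235, 145, 40)],
    [(120, 120, 125)] * 4,
    [(110, 115, 120), (130, 125, 130), (125, 120, 118), (115, 118, 122)],
]
groups = agglomerative([mean_color(img) for img in images], k=2)
print(groups)  # → [[0, 1], [2, 3]]
```

Nothing tells the algorithm what "orange" or "gray" means; the groups emerge only from how close the color summaries are to one another.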

Dimensionality reduction — customer data

Given customer features such as age, income, location, education level, and occupation, a dimensionality reduction algorithm can identify the features that carry the most information and drop the less informative or redundant ones, simplifying analysis and reducing complexity.
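One simple way to decide which customer features to eliminate is a correlation filter: if two features are almost perfectly correlated, the second adds little new information, so it can be dropped. This is an illustrative heuristic, not the only approach. The sketch handles numeric features only (categorical ones such as location or occupation would need encoding first), and the 0.95 threshold and the toy columns are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def drop_redundant(columns, names, threshold=0.95):
    """Keep a feature only if it is not highly correlated with any feature already kept."""
    kept = []
    for idx, col in enumerate(columns):
        if all(abs(pearson(col, columns[j])) < threshold for j in kept):
            kept.append(idx)
    return [names[i] for i in kept]

# Hypothetical toy columns (one list per feature, one entry per customer).
age = [23, 35, 47, 52, 61]
income_k = [40, 85, 52, 70, 48]
education_years = [16, 12, 18, 14, 12]
income_monthly_k = [x / 12 for x in income_k]  # same information as income_k

names = ["age", "income_k", "education_years", "income_monthly_k"]
print(drop_redundant([age, income_k, education_years, income_monthly_k], names))
# → ['age', 'income_k', 'education_years']
```

The monthly-income column is a rescaled copy of annual income, so its correlation with `income_k` is 1 and it is removed; the other three features are only weakly related and all survive.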

Dimensionality reduction — image recognition

For images with a large number of pixels, a dimensionality reduction algorithm can identify the most important pixels and remove less important ones without significantly altering the images’ overall appearance, which can reduce dataset size or simplify processing.
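The simplest reading of "remove less important pixels" is a variance filter: a pixel position that has the same value in every image (for example, a fixed background border) cannot help tell images apart, so it can be dropped without changing how any image differs from the others. The sketch assumes images are flattened grayscale lists of equal length; `min_variance` is a hypothetical parameter. Methods such as PCA go further by combining correlated pixels rather than only discarding constant ones.

```python
def drop_constant_pixels(images, min_variance=1e-9):
    """Keep only pixel positions whose intensity varies across the dataset."""
    n = len(images)
    keep = []
    for p in range(len(images[0])):
        values = [img[p] for img in images]
        mean = sum(values) / n
        variance = sum((v - mean) ** 2 for v in values) / n
        if variance > min_variance:  # constant pixels carry no information
            keep.append(p)
    reduced = [[img[p] for p in keep] for img in images]
    return keep, reduced

# Hypothetical 3x3 grayscale images, flattened row by row; the top and
# bottom rows are always black background.
images = [
    [0, 0, 0, 10, 200, 30, 0, 0, 0],
    [0, 0, 0, 40, 180, 20, 0, 0, 0],
    [0, 0, 0, 25, 220, 35, 0, 0, 0],
]
keep, reduced = drop_constant_pixels(images)
print(keep)     # → [3, 4, 5]
print(reduced)  # → [[10, 200, 30], [40, 180, 20], [25, 220, 35]]
```

Each image shrinks from 9 values to 3, and every difference between the images is still present in the reduced data.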

Use cases

Customer segmentation
Image recognition
Dimensionality reduction

Notes or pitfalls

Unsupervised learning typically requires less human intervention than supervised learning.
Results can be more challenging to interpret and to apply in practical settings.
It is often used in conjunction with supervised learning techniques to achieve better results.

Related

Clustering
Dimensionality reduction
Supervised learning