Convolutional Neural Network

TL;DR

Neural network architecture tailored for grid-like inputs (e.g., images or videos).
Uses convolutional layers with local receptive fields to learn spatial hierarchies and detect patterns.
Commonly applied to tasks such as image recognition and object detection.

Definition

A convolutional neural network (CNN) is a type of artificial neural network that is commonly used in computer vision tasks, such as image and video recognition. CNNs are designed to process data with a grid-like structure (for example, an image) and use convolutional layers to automatically learn the spatial hierarchies present in that data.

Explanation

Convolutional layers are the defining feature of CNNs. In a convolutional layer, each neuron connects to a small region of the input—called a receptive field—so neurons learn to detect local patterns such as edges, corners, or textures. By stacking convolutional layers, CNNs build higher-level representations from lower-level features, enabling recognition of complex structures in visual data.

Examples

Image recognition

CNNs are frequently used for image recognition tasks, for example classifying an image as containing a dog, cat, or car. The input is an image and the output is a prediction of what is contained in the image. The convolutional layers learn to detect various object features, such as the shape of a dog’s ears or the texture of a car’s tire.

Object detection

CNNs are also used for object detection, which involves recognizing objects in an image and localizing them with bounding boxes. In this case the output includes both predictions of the objects present and bounding boxes indicating each object’s location. The convolutional layers learn to detect objects and their locations within the image.

Use cases

Computer vision tasks, including image and video recognition.

Convolutional layer
Receptive field
Spatial hierarchies
Image recognition
Object detection
Computer vision