Mojena’s Test

TL;DR

Evaluates how well a clustering algorithm reproduces a known set of clusters.
Comparison uses a similarity measure between predicted and true clusters, most commonly the Rand index.
A higher Rand index (closer to 1) indicates better agreement between predicted and true clusters.

Definition

Mojena’s test is a statistical method used to evaluate the performance of a clustering algorithm by comparing the clusters produced by the algorithm with a known, “true” set of clusters. The test is named after Antonio Mojena, who proposed it in a paper published in 1952.

Explanation

The procedure requires a set of true clusters for the dataset, typically obtained from external criteria such as human-assigned labels or categories from another system. Once the true clusters are available, a similarity measure is computed between them and the clusters produced by the algorithm.

The most commonly used similarity measure in this context is the Rand index, which measures the proportion of pairs of data points that are either placed in the same cluster in both the true and predicted partitions or placed in different clusters in both. The Rand index is computed as:

RI = \frac{a + d}{a + b + c + d}

where:

a = the number of pairs of points that are in the same cluster in both the true and the predicted clusters
b = the number of pairs of points that are in different clusters in the true clusters but in the same cluster in the predicted clusters
c = the number of pairs of points that are in the same cluster in the true clusters but in different clusters in the predicted clusters
d = the number of pairs of points that are in different clusters in both the true and the predicted clusters

The Rand index ranges from 0 to 1. A value of 1 means the true and predicted partitions are identical (up to the naming of the clusters); a value of 0 means they disagree on every pair of points.
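The pair counts above translate directly into code. The following is a minimal sketch, not a reference implementation: the function name rand_index and the label-list inputs are illustrative assumptions, and the loop visits every pair of points, so it is only practical for small datasets.

```python
from itertools import combinations


def rand_index(true_labels, pred_labels):
    """Rand index between two labelings of the same points (hypothetical helper)."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("labelings must have the same length")

    a = b = c = d = 0
    # Visit every unordered pair of points exactly once.
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            a += 1  # together in both partitions
        elif not same_true and same_pred:
            b += 1  # together only in the predicted partition
        elif same_true and not same_pred:
            c += 1  # together only in the true partition
        else:
            d += 1  # separated in both partitions

    total = a + b + c + d
    if total == 0:
        raise ValueError("need at least two points")
    return (a + d) / total
```

For instance, rand_index([0, 0, 1, 1], [1, 1, 0, 0]) evaluates to 1.0, because only the grouping of points matters, not the names given to the clusters.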

Examples

Example 1

A dataset of images of animals is clustered automatically by an algorithm into groups based on the type of animal (e.g., dogs, cats, birds). Mojena’s test compares these algorithm-produced clusters with the true clusters determined by labels assigned by a human annotator.
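As a worked illustration with made-up data (the six images and their cluster assignments are hypothetical, not taken from any real dataset): suppose the annotator labels six images as dog, dog, cat, cat, bird, bird, and the algorithm assigns them to clusters 1, 1, 1, 2, 3, 3. There are 15 pairs of images. Two pairs are together in both partitions (the two dogs, the two birds), two pairs are together only in the predicted clusters (each dog with the first cat), one pair is together only in the true clusters (the two cats), and the remaining ten pairs are separated in both. Using the definitions of a, b, c, and d from the Explanation:

a = 2, \quad b = 2, \quad c = 1, \quad d = 10

RI = \frac{2 + 10}{2 + 2 + 1 + 10} = \frac{12}{15} = 0.8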

Example 2

A dataset of customer transactions from a retail store is clustered automatically by an algorithm into groups based on the type of product purchased (e.g., clothing, electronics, home goods). Mojena’s test compares the algorithm-produced clusters with the true clusters determined by product categories assigned by the store’s inventory system.

Notes or pitfalls

The Rand index is only one possible measure of cluster similarity; other measures, such as the adjusted Rand index or normalized mutual information, may be more appropriate for certain types of data and clustering algorithms.
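Interpretation: a Rand index near 1 means the predicted clusters closely match the true clusters, while a value well below 1 means the two partitions disagree on many pairs. Because the index is not corrected for chance, it can be fairly high even for unrelated partitions, especially when there are many clusters.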

Rand index
Antonio Mojena (proposed the test in 1952)