Linear-by-Linear Association Test

TL;DR

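A statistical test for a linear trend in the association between two ordered (ordinal) categorical variables.
Assigns numeric scores to the categories and tests whether the correlation between row scores and column scores differs from zero, the value expected under independence.
A significant result suggests a linear association; a non-significant result means the data show no evidence of a linear trend, not that the variables are necessarily independent.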

Definition

The linear-by-linear association test is a statistical method used to assess whether two ordinal (ordered) categorical variables are linearly associated. It is a chi-squared test with one degree of freedom, also known as the Mantel-Haenszel test for trend, that examines whether the observed cell frequencies show a linear trend across the ordered categories rather than the pattern expected under independence.

Explanation

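The test applies when both variables have ordered categories (a variable with only two categories can always be treated as ordered). Each row category and each column category is assigned a numeric score, typically 1, 2, 3, ... in category order. The null hypothesis states that the variables are independent, so the correlation between row scores and column scores is zero. The alternative hypothesis states that there is a linear trend: as the level of one variable increases, responses on the other tend to shift toward higher (or lower) levels. The test statistic is M² = (n − 1)r², where n is the total sample size and r is the Pearson correlation between the scores, weighted by the cell counts. Under the null hypothesis, M² approximately follows a chi-squared distribution with 1 degree of freedom, so a sufficiently large value indicates a statistically significant linear association. Because it concentrates on a single trend component, the test is usually more powerful than the general chi-squared test of independence when a trend is present, but it can miss associations that are not monotone.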
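The computation can be sketched in a few lines of Python. This is a minimal sketch, assuming the commonly used form of the statistic, M² = (n − 1)r², where r is the count-weighted Pearson correlation between row and column scores, referred to a chi-squared distribution with 1 degree of freedom. The function name `linear_by_linear`, the default scores 1, 2, 3, ..., and the example counts are illustrative assumptions.

```python
import math


def linear_by_linear(table, row_scores=None, col_scores=None):
    """Return (M2, p_value) for a table of counts given as a list of rows."""
    n_rows, n_cols = len(table), len(table[0])
    # Assumption: equally spaced scores 1, 2, 3, ... unless the caller supplies others.
    u = row_scores or list(range(1, n_rows + 1))
    v = col_scores or list(range(1, n_cols + 1))
    n = sum(sum(row) for row in table)

    # Means of the row and column scores, weighted by the cell counts.
    mean_u = sum(u[i] * sum(table[i]) for i in range(n_rows)) / n
    mean_v = sum(
        v[j] * table[i][j] for i in range(n_rows) for j in range(n_cols)
    ) / n

    # Count-weighted sums of squares and cross-products of the centered scores.
    s_uv = s_uu = s_vv = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            du, dv = u[i] - mean_u, v[j] - mean_v
            s_uv += table[i][j] * du * dv
            s_uu += table[i][j] * du * du
            s_vv += table[i][j] * dv * dv

    r = s_uv / math.sqrt(s_uu * s_vv)
    m2 = (n - 1) * r * r
    # Upper tail of a chi-squared variable with 1 df: P(X > m2) = erfc(sqrt(m2 / 2)).
    p_value = math.erfc(math.sqrt(m2 / 2))
    return m2, p_value


# Hypothetical counts whose rows are exactly proportional, so independence holds.
independent = [[10, 20, 30], [20, 40, 60]]
m2, p = linear_by_linear(independent)
print(m2, p)
```

For a table like this one, where the rows are exactly proportional, the correlation between scores is zero, so M² is numerically zero and the p-value is essentially 1. Unequally spaced scores can be passed through `row_scores` and `col_scores` when the categories are not evenly spaced.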

Examples

Gender and voting behavior

Two variables: gender (male or female) and voting behavior (voted for candidate A or candidate B).
Null hypothesis: no association between gender and voting behavior, that is, the probability of voting for candidate A is the same for males and females.
Example observed frequencies: 60% of males voted for candidate A and 40% for candidate B, while 70% of females voted for candidate A and 30% for candidate B. Neither variable is ordinal, but because each has only two categories, any scoring gives the same statistic, so the test can still be applied; in a 2x2 table it is nearly identical to the ordinary chi-squared test of independence. The test operates on counts rather than percentages, so the number of males and females sampled is needed to decide whether the 10-point difference is statistically significant.
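As a rough illustration of this example, the sketch below assumes 100 males and 100 females; the sample sizes are hypothetical, since only percentages are given. For a 2x2 table the correlation between the 0/1-coded variables is the phi coefficient, so the statistic M² = (n − 1)φ² can be computed directly and compared with the ordinary Pearson chi-squared statistic nφ².

```python
import math

# Hypothetical counts: 100 males and 100 females (sample sizes are assumed).
a, b = 60, 40  # males:   candidate A, candidate B
c, d = 70, 30  # females: candidate A, candidate B
n = a + b + c + d

# Phi coefficient: the Pearson correlation between two 0/1-coded variables.
phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

pearson_x2 = n * phi ** 2  # ordinary chi-squared statistic (no continuity correction)
m2 = (n - 1) * phi ** 2    # linear-by-linear association statistic
p_value = math.erfc(math.sqrt(m2 / 2))  # chi-squared upper tail, 1 df

print(f"X2 = {pearson_x2:.3f}, M2 = {m2:.3f}, p = {p_value:.3f}")
```

With these assumed counts, M² is about 2.19 and the p-value is roughly 0.14, so the difference would not be significant at the 0.05 level. The same percentages observed in a larger sample could lead to a different conclusion.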

Income level and exercise behavior

Two variables: income level (low, medium, or high) and exercise behavior (regularly exercises or does not regularly exercise).
Null hypothesis: no association between income level and exercise behavior, that is, the probability of regularly exercising is the same across income levels.
Example observed frequencies: 60% of individuals with low income regularly exercise, 70% of those with medium income, and 80% of those with high income. Income level is ordinal and the exercise rate rises steadily with it, which is the kind of trend the linear-by-linear association test is designed to detect. Given the counts in each income group, the test determines whether this upward trend is larger than would be expected by chance under independence.
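The income example can be worked through in the same spirit. The sketch below assumes 100 people in each income group (hypothetical group sizes, since only percentages are given) and scores of 1, 2, 3 for low, medium, and high income. It expands the table into one score pair per person to make explicit that the test is a correlation test on the scored data, with M² = (n − 1)r² compared to a chi-squared distribution with 1 degree of freedom.

```python
import math

# Hypothetical counts: 100 people per income group (group sizes are assumed).
# income score -> (regular exercisers, non-exercisers); 1 = low, 2 = medium, 3 = high
counts = {1: (60, 40), 2: (70, 30), 3: (80, 20)}

# Expand the table into one (income score, exercise indicator) pair per person.
income, exercise = [], []
for score, (yes, no) in counts.items():
    income += [score] * (yes + no)
    exercise += [1] * yes + [0] * no

n = len(income)
mean_x = sum(income) / n
mean_y = sum(exercise) / n

# Pearson correlation between income score and the 0/1 exercise indicator.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(income, exercise))
s_xx = sum((x - mean_x) ** 2 for x in income)
s_yy = sum((y - mean_y) ** 2 for y in exercise)
r = s_xy / math.sqrt(s_xx * s_yy)

m2 = (n - 1) * r ** 2
p_value = math.erfc(math.sqrt(m2 / 2))  # chi-squared upper tail, 1 df

print(f"r = {r:.3f}, M2 = {m2:.3f}, p = {p_value:.4f}")
```

With these assumed group sizes, r is about 0.18 and M² is about 9.49, giving a p-value of roughly 0.002, so the upward trend in exercise rates would be judged statistically significant. Smaller groups with the same percentages would give a smaller M².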

Use cases

Testing for a trend between two ordered categorical variables, such as income level and exercise behavior.
Providing statistical evidence to support further research and analysis of associations between ordered categories.

Notes or pitfalls

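The null hypothesis is that there is no association between the two variables; the alternative hypothesis is that there is a linear association across their ordered categories.
A statistically significant result suggests a linear trend. A non-significant result means only that the data provide insufficient evidence of a linear trend; it does not establish that the variables are independent, and a non-linear association can go undetected.
The result depends on the scores assigned to the categories. Equally spaced scores (1, 2, 3, ...) are the usual default, and the test is not appropriate for nominal variables with more than two categories, because their categories have no meaningful order.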

Chi-squared test
Null hypothesis
Alternative hypothesis
Independence
Observed frequency distribution
Expected frequency distribution