
Item Response Theory

  • IRT models the relationship between an individual’s latent trait (e.g., ability or attitude) and their item responses to produce more precise measurement scales.
  • It addresses limitations of classical test theory by estimating ability from item responses rather than relying on total scores or fixed cut-offs.
  • Common IRT models include the one-parameter logistic model (1PL) and the two-parameter logistic model (2PL), which account for item difficulty and, in the 2PL, item discrimination.

Item response theory (IRT) is a family of statistical models used to measure individuals' abilities, attitudes, or other psychological characteristics. It does this by modeling the probability of each response to a test item as a function of the individual's level on the underlying (latent) trait and the characteristics of the item.

IRT is commonly applied in educational and psychological assessments, such as standardized tests, to evaluate how well an individual has mastered a particular concept or skill. Classical test theory (CTT) often relies on total scores and arbitrary cut-off points. IRT models, by contrast, account for both the characteristics of each item and the proficiency of each individual. This supports more precise measurement scales and allows an individual's ability level to be estimated from their pattern of item responses.
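The contrast between a total score and estimation from a response pattern can be written out as follows. CTT summarizes respondent j by the number of items answered correctly. IRT, using maximum-likelihood estimation (one common approach among several), chooses the proficiency value that makes the observed pattern of 0/1 responses most likely. This sketch rests on three assumptions that are not stated in the article: items are dichotomously scored (x_ij = 1 if correct, 0 otherwise), responses are independent given θ_j (local independence), and P_i(θ_j) denotes the model-implied probability of answering item i correctly. The notation is also assumed here.

```latex
% CTT: total (number-correct) score for respondent j over n items
X_j = \sum_{i=1}^{n} x_{ij}

% IRT: log-likelihood of respondent j's full response pattern
\ell(\theta_j) = \sum_{i=1}^{n} \Big[ x_{ij} \ln P_i(\theta_j) + (1 - x_{ij}) \ln\big(1 - P_i(\theta_j)\big) \Big]

% Maximum-likelihood ability estimate
\hat{\theta}_j = \arg\max_{\theta_j} \, \ell(\theta_j)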

Different IRT models vary in the item parameters they include. The one-parameter logistic model (1PL) models a respondent's probability of a correct response as a function of their proficiency and the item's difficulty. The two-parameter logistic model (2PL) extends this with an item discrimination parameter. This parameter describes how steeply the probability of a correct response rises with proficiency around the item's difficulty. It therefore captures how strongly the item differentiates between individuals of different proficiency.
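The two models can be written in a common parameterization. Here θ_j is respondent j's proficiency, b_i is item i's difficulty, and a_i is item i's discrimination. These symbols are conventional choices assumed here; the article does not fix notation. Some texts also multiply the exponent by a scaling constant D ≈ 1.7, which is omitted below.

```latex
% 1PL: probability that respondent j answers item i correctly
P(X_{ij} = 1 \mid \theta_j) = \frac{1}{1 + \exp\left[-(\theta_j - b_i)\right]}

% 2PL: adds an item discrimination parameter a_i > 0
P(X_{ij} = 1 \mid \theta_j) = \frac{1}{1 + \exp\left[-a_i(\theta_j - b_i)\right]}
```

In both models the probability equals 0.5 when θ_j = b_i. In the 2PL, the slope of the curve at that point is a_i/4. A larger a_i therefore separates respondents just below and just above the item's difficulty more sharply. The 1PL corresponds to giving every item the same discrimination.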

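The 1PL model is used to measure an individual's proficiency from dichotomously scored items, that is, items whose responses are scored in one of two categories, such as correct/incorrect or "yes"/"no". Consider a multiple-choice test that includes a question with four possible answer choices, where only the keyed choice is scored as correct. For this question, the 1PL model would estimate the probability that an individual with a given proficiency level chooses the correct answer, given the difficulty of the question. The individual answer choices do not enter the model. Because the 1PL has no guessing parameter, it also does not account for the chance of answering correctly by guessing.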
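The four-option example can be sketched in Python. The item's options, its key ("C"), and its difficulty (b = 0.5) are hypothetical values chosen for illustration, as are the helper names `score_item` and `icc_1pl`. The sketch first reduces each chosen option to a 0/1 score. It then evaluates the 1PL probability of a correct response at several proficiency levels.

```python
import math


def score_item(chosen: str, key: str) -> int:
    """Dichotomous scoring: 1 if the chosen option is the keyed answer, else 0."""
    return 1 if chosen == key else 0


def icc_1pl(theta: float, b: float) -> float:
    """1PL probability of a correct response: 1 / (1 + exp(-(theta - b)))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# Hypothetical four-option item (choices A-D, keyed answer "C") with difficulty 0.5.
key, b = "C", 0.5

# Only the 0/1 score enters the model; which wrong option was picked is discarded.
print([score_item(choice, key) for choice in "ABCD"])  # [0, 0, 1, 0]

# Probability of a correct response at several proficiency levels.
for theta in (-2.0, -1.0, 0.0, 0.5, 1.0, 2.0):
    print(f"theta={theta:+.1f}  P(correct)={icc_1pl(theta, b):.3f}")
```

At θ = b the printed probability is 0.500. As θ decreases, the probability approaches 0, even though a respondent guessing at random among four options would be right about a quarter of the time. This reflects the absence of a guessing parameter in the 1PL.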

The 2PL model also applies to dichotomously scored items such as multiple-choice questions. It models the probability of a correct response as a function of proficiency level, item difficulty, and item discrimination. For the same four-option question, the 2PL model would estimate the probability that an individual with a given proficiency level chooses the correct answer. That estimate depends on the difficulty of the question and on how sharply the question separates lower-proficiency from higher-proficiency respondents.
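Discrimination also changes how responses are turned into an ability estimate, which this sketch illustrates. The item parameters are hypothetical and treated as known. In practice they would come from calibrating the items on earlier data. Newton-Raphson is simply one convenient way to maximize the 2PL log-likelihood; it is not a method prescribed by the article. Two respondents each answer 2 of 3 items correctly but receive different estimates, depending on which items they got right.

```python
import math


def p_2pl(theta: float, a: float, b: float) -> float:
    """2PL probability of a correct response: 1 / (1 + exp(-a * (theta - b)))."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def mle_theta_2pl(responses: list[int], a_params: list[float], b_params: list[float],
                  theta: float = 0.0, iters: int = 50, tol: float = 1e-8) -> float:
    """Newton-Raphson maximum-likelihood estimate of theta under the 2PL.

    Item parameters are assumed known. The MLE does not exist for
    all-correct or all-incorrect patterns, so those are rejected.
    """
    if len(set(responses)) < 2:
        raise ValueError("MLE is undefined for perfect or zero scores")
    for _ in range(iters):
        probs = [p_2pl(theta, a, b) for a, b in zip(a_params, b_params)]
        # First derivative of the log-likelihood: discrimination-weighted residuals.
        grad = sum(a * (x - p) for a, x, p in zip(a_params, responses, probs))
        # Test information (negative second derivative): sum of a^2 * p * (1 - p).
        info = sum(a * a * p * (1 - p) for a, p in zip(a_params, probs))
        step = grad / info
        theta += step
        if abs(step) < tol:
            break
    return theta


# Hypothetical items: increasing discrimination, moderate difficulties.
a_params = [0.5, 1.0, 2.0]
b_params = [-0.5, 0.0, 0.5]

# Same number correct (2 of 3), different response patterns.
theta_low = mle_theta_2pl([1, 1, 0], a_params, b_params)
theta_high = mle_theta_2pl([0, 1, 1], a_params, b_params)
print(f"pattern 110: {theta_low:.3f}   pattern 011: {theta_high:.3f}")
```

The second respondent answered the most discriminating item correctly, so the 2PL places them higher even though both total scores are 2. With known item parameters, the 2PL likelihood depends on responses only through the discrimination-weighted score Σ a_i x_i (1.5 versus 3.0 here), not the raw total. Under the 1PL, where all items share one discrimination, the two patterns would receive the same estimate.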

Common applications of IRT include:

  • Educational assessments and standardized tests, to evaluate mastery of concepts or skills.
  • Psychological assessments, to measure abilities, attitudes, or other psychological characteristics.

Classical test theory (CTT) has limitations that IRT aims to address. These include its reliance on total scores, which do not account for differences between items, and its reliance on arbitrary cut-off scores to determine proficiency.
Related terms:

  • Classical test theory (CTT)
  • One-parameter logistic model (1PL)
  • Two-parameter logistic model (2PL)