
Linear Regression

  • Models how a dependent variable changes with one or more independent variables using a straight-line relationship.
  • Fits the line that best captures that relationship from data, then uses it to make predictions.
  • Assumes the relationship is linear; alternatives exist when it is not.

Linear regression is a statistical method that models the relationship between a dependent variable and one or more independent variables, and then uses that model to predict the value of the dependent variable from the values of the independent variables.
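Written as an equation, the model takes the following form. The symbols are conventional notation chosen here for illustration: $y$ is the dependent variable, $x_1, \dots, x_p$ are the independent variables, $\beta_0$ is the intercept, $\beta_1, \dots, \beta_p$ are the coefficients to be estimated, and $\varepsilon$ is an error term for the variation the line does not explain.

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon
```

With a single independent variable this reduces to the straight line $y = \beta_0 + \beta_1 x + \varepsilon$, where $\beta_1$ is the change in $y$ for a one-unit change in $x$.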

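Linear regression uses observed data on the variables of interest to fit the line that best describes the relationship between the dependent and independent variables (with several independent variables, the fitted object is a plane or hyperplane rather than a line). "Best" is most commonly defined by ordinary least squares, which chooses the line that minimizes the sum of squared differences between the observed and predicted values of the dependent variable. Once the model is fitted, it produces a prediction when new values of the independent variables are plugged into it. The method assumes the relationship is linear: each one-unit change in an independent variable changes the dependent variable by a constant amount, so changes in the dependent variable are proportional to changes in the independent variable(s).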
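For a single independent variable, the least-squares line has a closed form: the slope is the sum of cross-deviations of x and y divided by the sum of squared deviations of x, and the intercept makes the line pass through the point of means. Below is a minimal sketch in plain Python; the function names `fit_simple_ols` and `predict` are hypothetical, and the toy data lie exactly on y = 1 + 2x so the result is easy to check by hand.

```python
from statistics import fmean


def fit_simple_ols(xs, ys):
    """Return (intercept, slope) of the line minimizing the sum of squared residuals."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = fmean(xs), fmean(ys)
    # Sum of squared deviations of x; zero means all x are equal and no slope exists.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    # Sum of cross-deviations of x and y.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # the fitted line passes through (x_bar, y_bar)
    return intercept, slope


def predict(intercept, slope, x):
    """Plug a new independent-variable value into the fitted line."""
    return intercept + slope * x


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # toy data lying exactly on y = 1 + 2x
b0, b1 = fit_simple_ols(xs, ys)
print(b0, b1)                # 1.0 2.0
print(predict(b0, b1, 5.0))  # 11.0
```

Real data rarely lie exactly on a line; the same formulas then return the line with the smallest total squared error rather than one that passes through every point.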

Suppose we want to predict the price of a house from its size: the dependent variable is the price and the independent variable is the size. Linear regression can model the relationship between these variables and predict a house's price given its size. For example, after fitting the model, one could predict the price of a 2,000-square-foot house by plugging 2,000 into the fitted model as the size.
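The house-price example can be sketched with NumPy, where `np.polyfit` with `deg=1` performs an ordinary least-squares straight-line fit. The sizes and prices below are hypothetical numbers made up for illustration, not real market data.

```python
import numpy as np

# Hypothetical training data: house sizes (square feet) and sale prices (USD).
sizes = np.array([1000.0, 1500.0, 1800.0, 2400.0, 3000.0])
prices = np.array([200_000.0, 270_000.0, 310_000.0, 400_000.0, 480_000.0])

# A degree-1 polynomial fit is a least-squares straight line; np.polyfit returns
# coefficients from the highest power down, i.e. [slope, intercept].
slope, intercept = np.polyfit(sizes, prices, deg=1)

# Predict the price of a 2,000 sq ft house by plugging its size into the line.
predicted = float(slope * 2000.0 + intercept)
print(f"slope: {slope:.2f} USD per sq ft, predicted price: {predicted:,.0f} USD")
```

The slope reads directly as the estimated price increase per additional square foot, which is the "constant change per unit" interpretation of a linear model. The GPA example works the same way, with study hours as the input and GPA as the output.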

To predict a student’s grade point average (GPA) from the number of hours they spend studying, the dependent variable is the GPA and the independent variable is the number of study hours. Linear regression can model this relationship and predict GPA from study time.

  • Linear regression is suited to modeling and predicting the value of a dependent variable from one or more independent variables when the relationship can reasonably be assumed to be linear.
  • Linear regression assumes a linear relationship between dependent and independent variables; this assumption may not hold in all situations.
  • When the relationship is non-linear, other methods may be more appropriate, such as:
    • Polynomial regression
    • Non-parametric regression
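Polynomial regression, one of the alternatives named above, still uses least squares but adds powers of the input as extra terms, so the model stays linear in its coefficients while the fitted curve can bend. The sketch below compares a straight-line fit with a quadratic fit on hypothetical data generated from y = x², using the sum of squared errors (SSE) as the comparison; the helper `sse` is a hypothetical name introduced here. Non-parametric regression is not shown.

```python
import numpy as np

x = np.arange(0.0, 6.0)  # 0, 1, ..., 5
y = x ** 2               # hypothetical, clearly non-linear data


def sse(coeffs):
    """Sum of squared residuals of a polynomial (highest power first) on (x, y)."""
    residuals = y - np.polyval(coeffs, x)
    return float(np.sum(residuals ** 2))


linear = np.polyfit(x, y, deg=1)     # straight line: violates the true shape
quadratic = np.polyfit(x, y, deg=2)  # polynomial regression of degree 2
print(f"linear SSE: {sse(linear):.3f}, quadratic SSE: {sse(quadratic):.3e}")
```

On this data the straight line leaves large, systematic residuals (positive at both ends, negative in the middle), while the quadratic fit's SSE is essentially zero. A curved residual pattern like this is one practical sign that the linearity assumption does not hold.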