Local Dependence Function

TL;DR

Use local information from training data to infer the relationship between inputs and outputs.
Implemented by algorithms such as k-nearest neighbors and support vector machines that base decisions on nearby or influential training points.
Can capture nonlinear relationships but may be sensitive to noise, outliers, and overfitting.

Definition

Local dependence functions are used in machine learning to determine the relationship between input variables and their corresponding outputs. They are used to model complex data sets and to make predictions about future outcomes by relying on local information from the training data.

Explanation

Local dependence functions infer output behavior from local regions of the training data rather than from a single global model. They can capture nonlinear relationships between variables by basing predictions on nearby or particularly influential data points. Two examples illustrate different forms of local dependence: one directly uses nearby points for classification, and another identifies a small subset of training points that determine a decision boundary.

Examples

k-nearest neighbors (k-NN)

This algorithm classifies a reference point based on its proximity to other points in the dataset. It selects a reference point, measures the distance to the nearest k points, and determines the classification of the reference point from the majority classification of its nearest neighbors.

Support vector machine (SVM)

An SVM is used for classification and regression tasks by creating a boundary between classes through a hyperplane that maximizes the margin between the classes. That boundary is determined by a small subset of the data points, known as support vectors, which lie closest to the hyperplane.

Use cases

Modeling complex data sets.
Making predictions about future outcomes using local information from training data.

Notes or pitfalls

May perform poorly on datasets with large amounts of noise or outliers.
May be susceptible to overfitting when the model becomes too complex and does not generalize well to new data.

k-nearest neighbors (k-NN)
Support vector machine (SVM)