Nadaraya–Watson Estimator

  • Estimates a regression function by averaging target values of nearby training points with weights from a kernel.
  • Weights depend on distance from the evaluation point; a bandwidth parameter h controls the smoothing.
  • Useful when the functional form of the regression is unknown or the data are noisy.

The Nadaraya–Watson estimator is a nonparametric method for estimating the regression function in a supervised learning problem. It computes the estimate at a point by taking a weighted average of the target variables from training examples, where weights are given by a kernel function that assigns larger weights to examples closer to the point of evaluation.
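Written as a formula, the estimate at a point x is shown below. The notation is introduced here for convenience: n training pairs (x_i, y_i) with i = 1, …, n, and a kernel k such as the Gaussian kernel defined below.

$$\hat{y}(x) = \frac{\sum_{i=1}^{n} k(x_i - x)\, y_i}{\sum_{i=1}^{n} k(x_i - x)}$$

The denominator rescales the weights so they sum to one. This makes the estimate a weighted average of the y_i.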

  • The estimator assigns each training example a weight based on a kernel function. Examples closer to the evaluation point receive higher weights; examples further away receive lower weights.
  • A common kernel choice is the Gaussian kernel, defined as:
$$k(x) = \exp\!\left(-\frac{x^2}{2h^2}\right)$$

where h is a bandwidth parameter controlling kernel width: larger h gives a wider kernel (more smoothing); smaller h gives a narrower kernel (less smoothing).

  • The estimated regression value at a point is the kernel-weighted average of target values for the training examples, normalized by the sum of the weights.
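The steps above can be sketched in plain Python (standard library only). This sketch assumes one-dimensional inputs and the Gaussian kernel given above. The function names `gaussian_kernel` and `nadaraya_watson` are hypothetical and were chosen for illustration; they are not a library API. The code computes the weights k(x_i - x), normalizes them by their sum, and returns the weighted average of the targets.

```python
import math
from typing import Sequence


def gaussian_kernel(u: float, h: float) -> float:
    """Gaussian kernel k(u) = exp(-u^2 / (2 h^2)) with bandwidth h."""
    return math.exp(-(u * u) / (2.0 * h * h))


def nadaraya_watson(x: float, xs: Sequence[float], ys: Sequence[float], h: float) -> float:
    """Nadaraya-Watson estimate at x: the kernel-weighted average of ys.

    Hypothetical helper for illustration; assumes 1-D inputs and the Gaussian kernel.
    """
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    if not xs or len(xs) != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    # Weight each training point by the kernel value of its offset x_i - x.
    weights = [gaussian_kernel(xi - x, h) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        # All weights underflowed to 0.0: x is too far from every x_i for this h.
        raise ValueError("all kernel weights are zero; try a larger bandwidth")
    # Divide by the sum of the weights so the result is a weighted average.
    return sum(w * yi for w, yi in zip(weights, ys)) / total


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [2.0, 3.0, 5.0, 7.0]
    # Compare a narrow, a moderate, and a wide bandwidth at the same point.
    for h in (0.25, 1.0, 10.0):
        print(f"h = {h:>5}: y_hat(2.5) = {nadaraya_watson(2.5, xs, ys, h):.4f}")
```

The loop runs on the training set from the worked example and shows the bandwidth trade-off. As h shrinks, the estimate at 2.5 approaches 4.0, which is the average target of the two nearest points (x = 2 and x = 3). As h grows, all weights approach 1, and the estimate approaches 4.25, which is the mean of all four targets.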

Training set:

| x | y |
|---|---|
| 1 | 2 |
| 2 | 3 |
| 3 | 5 |
| 4 | 7 |

Estimate the regression function at x = 2.5 using the Gaussian kernel with h = 1.

Weights computed with the kernel:

  • k(x1 - 2.5) = k(-1.5) = exp(-2.25 / (2 * 1^2)) ≈ 0.3247
  • k(x2 - 2.5) = k(-0.5) = exp(-0.25 / (2 * 1^2)) ≈ 0.8825
  • k(x3 - 2.5) = k(0.5) = exp(-0.25 / (2 * 1^2)) ≈ 0.8825
  • k(x4 - 2.5) = k(1.5) = exp(-2.25 / (2 * 1^2)) ≈ 0.3247

Weighted average estimate:

$$\hat{y}(2.5) = \frac{0.3247 \cdot 2 + 0.8825 \cdot 3 + 0.8825 \cdot 5 + 0.3247 \cdot 7}{0.3247 + 0.8825 + 0.8825 + 0.3247} = \frac{9.9823}{2.4144} \approx 4.1345$$
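The arithmetic in the worked example can be checked with a few lines of plain Python (standard library only). The variable names `xs`, `ys`, `x0`, and `h` are chosen here for illustration.

```python
import math

# Training set and query point from the worked example.
xs = [1, 2, 3, 4]
ys = [2, 3, 5, 7]
x0 = 2.5  # evaluation point
h = 1.0   # bandwidth

# Gaussian kernel weights k(x_i - x0) = exp(-(x_i - x0)^2 / (2 h^2)).
weights = [math.exp(-((xi - x0) ** 2) / (2 * h ** 2)) for xi in xs]

# Nadaraya-Watson estimate: weighted sum of targets divided by the sum of weights.
y_hat = sum(w * y for w, y in zip(weights, ys)) / sum(weights)

for xi, w in zip(xs, weights):
    print(f"k({xi} - {x0}) = {w:.4f}")
print(f"y_hat({x0}) = {y_hat:.4f}")
```

Run as written, the code prints the weights 0.3247, 0.8825, 0.8825, and 0.3247, followed by the estimate 4.1345.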

Use cases:

  • Particularly useful when the data are noisy or when the functional form of the regression function is unknown.

Related terms:

  • Kernel function
  • Gaussian kernel
  • Bandwidth (h)
  • Regression function
  • Supervised learning