Nearest neighbor

From Canonica AI

Definition

The nearest neighbor is a term used in data mining and statistics, particularly in the field of machine learning. It refers to a type of instance-based learning or non-generalizing learning where the function is approximated locally and all computation is deferred until function evaluation.

Nearest Neighbor Rule

The nearest neighbor rule is a simple strategy for classifying an object based on its distance to the nearest neighbor. The object is assigned to the class of its nearest neighbor in the feature space. This rule is based on the assumption that the class of an object can be accurately predicted by the class of its neighbors. The nearest neighbor rule is often used in pattern recognition and machine learning algorithms.

K-Nearest Neighbors Algorithm

The k-nearest neighbors algorithm (k-NN) is a type of instance-based learning where the function is only approximated locally and all computation is deferred until classification. The k-NN algorithm is amongst the simplest of all machine learning algorithms.

An illustration showing the concept of K-Nearest Neighbors algorithm. The new point (green circle) is classified based on the majority of its five nearest neighbors (blue squares).
An illustration showing the concept of K-Nearest Neighbors algorithm. The new point (green circle) is classified based on the majority of its five nearest neighbors (blue squares).

Algorithm

The k-NN algorithm is a type of lazy learning where the function is only approximated locally and all computation is deferred until classification. The k-NN algorithm does not make any assumptions on the underlying data distribution, but it relies on feature similarity. The key elements of the k-NN algorithm include:

1. Set of labeled objects, e.g., a set of stored records. 2. Distance or similarity measure to compute distance between objects. 3. The value of k, the number of nearest neighbors to retrieve. 4. A classification decision rule to determine the output value (class) based on the values of the nearest neighbors.

Applications

The k-NN algorithm has been widely used in statistical estimation and pattern recognition in the beginning of the 1970s. It is also used in computer vision, information retrieval, and genomics.

Nearest Neighbor Search

The nearest neighbor search (NNS), also known as proximity search, similarity search or closest point search, is an optimization problem for finding closest points in metric spaces. The problem is: given a set of points P in a metric space M, and a query point q ∈ M, find the closest point in P to q.

Algorithms

There are various algorithms for performing a nearest neighbor search and they can be broadly classified into two categories: exact and approximate algorithms. Exact algorithms are guaranteed to find the exact nearest neighbor, while approximate algorithms may find a neighbor that is close to the nearest one but not necessarily the exact one.

See Also