Histogram of Oriented Gradients

Introduction

The Histogram of Oriented Gradients (HOG) is a feature descriptor used in computer vision and image processing for the purpose of object detection. The technique counts occurrences of gradient orientation in localized portions of an image. It is similar to edge orientation histograms, scale-invariant feature transform (SIFT) descriptors, and shape contexts, but differs in that it is computed on a dense grid of uniformly spaced cells and uses overlapping local contrast normalization for improved accuracy.

Overview

HOG is a type of "feature descriptor": a representation that keeps the information in an image that is useful for a task such as object detection and discards the rest. The HOG descriptor focuses on the shape of objects rather than on their color. It does this by measuring the magnitude and direction of intensity gradients (edges) at each pixel and accumulating them into orientation histograms over small regions of the image. The peaks of each histogram indicate the predominant edge directions in that region, so the collection of histograms describes the local shape of the object.

[Figure: An image of an object with the Histogram of Oriented Gradients applied, showing the gradients and their orientations.]

Methodology

The process of computing a HOG descriptor involves several steps, each of which contributes to the final descriptor's ability to effectively capture shape information within an image.

Normalization

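The first step in the HOG computation process is an optional global normalization of the image, intended to reduce the effects of lighting variations. A common choice is gamma (power-law) compression, for example replacing each pixel intensity with its square root. In the original pedestrian-detection work this step was found to have only a modest effect on performance, probably because the block normalization applied later achieves similar results. Separately, the descriptor is computed over a detection window of fixed size (64x128 pixels in the original pedestrian detector), so image regions are resized or scanned at that scale to keep the descriptor length consistent.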
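
A minimal sketch of square-root gamma compression, assuming a grayscale image stored as a list of rows with intensities already scaled to [0, 1]. The function name `sqrt_gamma` is illustrative and not taken from any library:

```python
import math

def sqrt_gamma(image):
    """Square-root gamma compression of a grayscale image.

    Assumes `image` is a list of rows of intensities in [0, 1]. Taking the
    square root stretches dark values and compresses bright ones.
    """
    return [[math.sqrt(p) for p in row] for row in image]

compressed = sqrt_gamma([[0.0, 0.25], [0.5625, 1.0]])
```

Here `compressed` is `[[0.0, 0.5], [0.75, 1.0]]`: the dark value 0.25 is lifted to 0.5, while the bright end of the range barely moves.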

Gradient Computation

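The next step in the HOG computation process is gradient computation. At each pixel, the image gradient is a two-dimensional vector (G_x, G_y) that points in the direction of most rapid change in intensity. HOG typically estimates it with the centered one-dimensional derivative mask [-1, 0, 1] in the horizontal direction and its transpose in the vertical direction; in the original work, larger masks or Gaussian smoothing before differentiation were found to reduce performance. From the two components, each pixel gets a gradient magnitude sqrt(G_x^2 + G_y^2) and an orientation atan2(G_y, G_x). For pedestrian detection the orientation is usually treated as unsigned, folded into the range 0 to 180 degrees, so that dark-to-light and light-to-dark edges along the same direction are treated alike. The result is a pair of magnitude and orientation images that capture the edge information in the original image.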
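
The per-pixel gradient step can be sketched as follows, assuming a grayscale image given as a list of rows and unsigned orientations in degrees. Border pixels are handled by clamping indices (edge replication), which is one choice among several; the function name `gradients` is hypothetical:

```python
import math

def gradients(image):
    """Per-pixel gradient magnitude and unsigned orientation in degrees [0, 180).

    Uses the centered 1-D mask [-1, 0, 1] horizontally and its transpose
    vertically. Neighbor indices are clamped at the border (edge replication).
    """
    h, w = len(image), len(image[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = image[y][min(x + 1, w - 1)] - image[y][max(x - 1, 0)]
            gy = image[min(y + 1, h - 1)][x] - image[max(y - 1, 0)][x]
            mag[y][x] = math.hypot(gx, gy)
            # Fold the signed angle from atan2 into the unsigned range [0, 180).
            ang[y][x] = math.degrees(math.atan2(gy, gx)) % 180.0
    return mag, ang

# A vertical edge: intensity jumps from 0 to 1 between columns 1 and 2.
edge = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
mag, ang = gradients(edge)
```

At pixel (row 1, column 1) the gradient is purely horizontal, so its magnitude is 1.0 and its orientation is 0 degrees. Because y increases downward in image coordinates, a horizontal edge gives an orientation of 90 degrees.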

Orientation Binning

After the gradients have been computed, the next step is orientation binning. In this step, the image is divided into small spatial regions called "cells", typically 8x8 pixels. For each cell, a histogram of gradient orientations is computed; a common choice is 9 bins spanning 0 to 180 degrees, so that each bin covers 20 degrees. Each pixel in the cell votes for the bin that corresponds to its gradient orientation, and the vote is weighted by the gradient magnitude, so that strong edges contribute more to the histogram than weak edges. To reduce aliasing, votes are usually split between the two nearest bins, and often also between neighboring cells, rather than assigned to a single bin.
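
A minimal sketch of one cell's histogram, assuming 9 unsigned bins with centers at 10, 30, ..., 170 degrees and linear interpolation of each vote between the two nearest bin centers. Bins wrap around because 0 and 180 degrees describe the same unsigned orientation. This sketch interpolates only in orientation, not spatially between cells, and the name `cell_histogram` is hypothetical:

```python
import math

def cell_histogram(mag, ang, n_bins=9):
    """Magnitude-weighted orientation histogram for one cell.

    `mag` and `ang` are same-shaped 2-D lists for the cell's pixels; angles are
    unsigned, in degrees [0, 180]. Each vote is split linearly between the two
    nearest bin centers, with wrap-around from the last bin to the first.
    """
    bin_width = 180.0 / n_bins
    hist = [0.0] * n_bins
    for mrow, arow in zip(mag, ang):
        for m, a in zip(mrow, arow):
            # Bin i is centered at (i + 0.5) * bin_width degrees.
            pos = a / bin_width - 0.5
            lo = math.floor(pos)
            frac = pos - lo
            hist[lo % n_bins] += m * (1.0 - frac)
            hist[(lo + 1) % n_bins] += m * frac
    return hist
```

A gradient at exactly 10 degrees falls on the center of bin 0 and gives it its whole magnitude, while a gradient at 0 degrees sits halfway between the centers at 170 and 10 degrees, so its vote is split equally between the last and first bins.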

Descriptor Blocks

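The final step in the HOG computation process is grouping cells into larger spatial regions called descriptor blocks and normalizing them. A typical block contains 2x2 cells (16x16 pixels), and blocks overlap: they are usually placed with a stride of one cell, so each interior cell contributes to four different blocks. For each block, the histograms of its cells are concatenated into a single vector, which is then normalized, for example with the L2 norm or with L2-Hys (L2 normalization, clipping each component to a maximum value such as 0.2, then renormalizing). This local contrast normalization is a major source of the descriptor's robustness to changes in illumination and contrast. The final HOG descriptor is the concatenation of all the normalized block vectors in the detection window.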
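
A minimal sketch of block normalization and assembly, assuming the cell histograms are already available as a 2-D grid (rows of cells, each cell a list of bin values), 2x2-cell blocks with a one-cell stride, and L2-Hys with a clipping threshold of 0.2. The names `l2_hys` and `hog_descriptor`, and the small `eps` guard against division by zero, are illustrative choices:

```python
import math

def l2_hys(block, clip=0.2, eps=1e-6):
    """L2-Hys: L2-normalize, clip each component at `clip`, then L2-normalize again."""
    def l2(v):
        norm = math.sqrt(sum(x * x for x in v) + eps * eps)
        return [x / norm for x in v]
    # Histogram votes are non-negative, so an upper clip is sufficient.
    return l2([min(x, clip) for x in l2(block)])

def hog_descriptor(cell_hists, block_cells=2):
    """Concatenate normalized, overlapping blocks of cells (stride of one cell)."""
    rows, cols = len(cell_hists), len(cell_hists[0])
    desc = []
    for by in range(rows - block_cells + 1):
        for bx in range(cols - block_cells + 1):
            block = []
            for cy in range(by, by + block_cells):
                for cx in range(bx, bx + block_cells):
                    block.extend(cell_hists[cy][cx])
            desc.extend(l2_hys(block))
    return desc
```

The clipping step limits the influence of a few very strong gradients: after `l2_hys([3.0, 4.0])`, both components end up equal, because both exceed the 0.2 threshold after the first normalization.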

Applications

The Histogram of Oriented Gradients descriptor has been used successfully in a number of computer vision applications. One of the most notable is pedestrian detection, the task for which Navneet Dalal and Bill Triggs popularized HOG in 2005. In this application, HOG descriptors are computed for a detection window that slides across the image, typically repeated over several image scales so that pedestrians of different sizes can be found. The resulting descriptors are then fed into a classifier, such as a linear Support Vector Machine (SVM), which determines whether or not the window contains a pedestrian.
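
A single-scale sliding-window scan with a linear classifier can be sketched as below. The descriptor function, weights, and bias are placeholders supplied by the caller (in a real detector they would be the HOG computation and the parameters of a linear SVM trained offline), and the name `sliding_window_detect` is hypothetical:

```python
def sliding_window_detect(image, descriptor_fn, weights, bias,
                          win_w=64, win_h=128, stride=8, threshold=0.0):
    """Return (x, y, score) for each window whose linear score exceeds `threshold`.

    `descriptor_fn` maps a window (list of rows) to a feature vector; the score
    is the linear decision function w . x + b.
    """
    h, w = len(image), len(image[0])
    detections = []
    for y in range(0, h - win_h + 1, stride):
        for x in range(0, w - win_w + 1, stride):
            window = [row[x:x + win_w] for row in image[y:y + win_h]]
            feat = descriptor_fn(window)
            score = sum(wi * fi for wi, fi in zip(weights, feat)) + bias
            if score > threshold:
                detections.append((x, y, score))
    return detections
```

A full detector would also repeat this scan over a rescaled image pyramid and merge overlapping detections (non-maximum suppression); both are omitted here to keep the sketch short.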

HOG has also been used in other object detection tasks, such as car detection, face detection, and human detection in video surveillance. In addition, HOG descriptors can be used in image recognition tasks, where the goal is to identify the object in an image rather than just detect its presence.

Advantages and Limitations

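The Histogram of Oriented Gradients has several advantages. One of the main ones is its robustness to changes in illumination and shadowing. This comes from two properties of the computation: gradients are unaffected by adding a constant to all intensities in a region, and block normalization cancels a change in local contrast that scales the intensities by a constant factor. Because the descriptor encodes local edge structure rather than raw intensity or color, it is also less affected by appearance variations such as differently colored clothing, and the coarse spatial binning into cells gives some tolerance to small shifts and deformations.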
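
A toy 1-D illustration of the two invariances, assuming idealized global changes (a constant brightness offset and a constant contrast factor); real lighting changes vary across the image, which is why HOG normalizes each block locally. The helper names are hypothetical:

```python
import math

def centered_diff(signal):
    """Interior gradients of a 1-D signal using the [-1, 0, 1] mask."""
    return [signal[i + 1] - signal[i - 1] for i in range(1, len(signal) - 1)]

def l2_normalize(v, eps=1e-12):
    """Scale v to unit L2 norm (eps avoids division by zero)."""
    norm = math.sqrt(sum(x * x for x in v)) + eps
    return [x / norm for x in v]

row = [0.1, 0.2, 0.6, 0.7, 0.3]
brighter = [p + 0.2 for p in row]          # additive brightness change
higher_contrast = [p * 1.5 for p in row]   # multiplicative contrast change

# The offset cancels in the differences; the factor cancels in the normalization.
same_gradients = all(abs(a - b) < 1e-9 for a, b in
                     zip(centered_diff(brighter), centered_diff(row)))
same_normalized = all(abs(a - b) < 1e-9 for a, b in
                      zip(l2_normalize(centered_diff(higher_contrast)),
                          l2_normalize(centered_diff(row))))
```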

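However, HOG also has some limitations. One of the main limitations is its sensitivity to changes in object pose and orientation. Because gradient orientations are measured relative to the image axes, a rotated object, or one whose silhouette changes significantly with viewpoint or articulation, produces a different descriptor, and a detector trained on one pose may fail to recognize it. Another limitation is the high dimensionality of the descriptor (3,780 values for a 64x128 window with 8x8-pixel cells, 2x2-cell blocks, a one-cell block stride, and 9 orientation bins), which, combined with dense scanning over many window positions and scales, can make HOG-based detection computationally expensive.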
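
The descriptor length follows directly from the window, cell, and block geometry. A small sketch of that arithmetic, assuming square cells, square blocks, a block stride measured in cells, and window sizes divisible by the cell size (the function `hog_length` is hypothetical):

```python
def hog_length(win_w=64, win_h=128, cell=8, block_cells=2, stride_cells=1, n_bins=9):
    """Number of values in a HOG descriptor for one detection window."""
    cells_x, cells_y = win_w // cell, win_h // cell
    blocks_x = (cells_x - block_cells) // stride_cells + 1
    blocks_y = (cells_y - block_cells) // stride_cells + 1
    # Each block contributes block_cells * block_cells histograms of n_bins values.
    return blocks_x * blocks_y * block_cells * block_cells * n_bins

standard = hog_length()
```

With the defaults, the window has 8x16 cells and therefore 7x15 = 105 blocks of 36 values each, giving 3,780. Doubling the number of orientation bins doubles the length, which illustrates how quickly finer parameter choices increase cost.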