ResNet

Introduction

ResNet, or Residual Network, is a type of Convolutional Neural Network first introduced by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun in their 2015 paper, "Deep Residual Learning for Image Recognition". The network is designed to address the vanishing gradient problem that often occurs in deep neural networks and makes it difficult to learn and adjust the parameters of the earlier layers. ResNet mitigates this issue with a novel architecture built around "shortcut connections" (also called "residual connections") that allow the gradient to be backpropagated directly to earlier layers.

[Figure: A visual representation of a Residual Network (ResNet) architecture, showing the input and output layers, as well as the hidden layers with shortcut connections.]

Architecture

The defining feature of the ResNet architecture, and what sets it apart from earlier convolutional networks, is the introduction of "shortcut connections" that bypass one or more layers. These connections are also known as "skip connections" because they allow the gradient to "skip" layers during backpropagation, thus alleviating the vanishing gradient problem.

Shortcut Connections

The shortcut connections in ResNet are implemented by taking the input of a block of layers and adding it to the output of that block, before the final activation function is applied. Formally, a residual block computes y = F(x) + x, where x is the block's input and F(x) is the residual function learned by the stacked layers. Because the block only needs to learn the residual F(x) = y - x, it can represent the identity mapping simply by driving F(x) to zero, which helps prevent performance from degrading as depth increases.
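
A minimal sketch of such a block, written here in PyTorch (the framework and the channel count of 64 are illustrative choices, not mandated by the paper):

    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        """Basic residual block: y = ReLU(F(x) + x)."""

        def __init__(self, channels):
            super().__init__()
            # F(x): two 3x3 convolutions with batch normalization
            self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
            self.bn1 = nn.BatchNorm2d(channels)
            self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
            self.bn2 = nn.BatchNorm2d(channels)
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            identity = x                          # the shortcut carries x unchanged
            out = self.relu(self.bn1(self.conv1(x)))
            out = self.bn2(self.conv2(out))
            out = out + identity                  # addition before the final activation
            return self.relu(out)

    x = torch.randn(1, 64, 56, 56)
    y = ResidualBlock(64)(x)                      # output shape: (1, 64, 56, 56)

When the shapes of F(x) and x differ, for instance when a block changes the number of channels or downsamples spatially, the paper applies a projection (typically a 1x1 convolution) on the shortcut so that the addition remains well defined.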

Bottleneck Design

To reduce computational complexity, the deeper versions of ResNet (e.g., ResNet-50, ResNet-101, and ResNet-152) use a "bottleneck" design. In this design, each residual function F is a stack of three layers: a 1x1, a 3x3, and a 1x1 convolution. The 1x1 layers reduce and then restore the channel dimensions, leaving the 3x3 layer as a bottleneck that operates on smaller input/output dimensions.
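
A hedged PyTorch sketch of a bottleneck block follows; the module names are illustrative, and the choice of 256 channels with a fourfold reduction mirrors the paper's example:

    import torch.nn as nn

    class Bottleneck(nn.Module):
        """Bottleneck residual block: 1x1 reduce, 3x3, 1x1 restore."""

        def __init__(self, channels=256, reduction=4):
            super().__init__()
            mid = channels // reduction  # e.g. 256 -> 64
            self.f = nn.Sequential(
                nn.Conv2d(channels, mid, 1, bias=False),        # 1x1: reduce dimensions
                nn.BatchNorm2d(mid),
                nn.ReLU(inplace=True),
                nn.Conv2d(mid, mid, 3, padding=1, bias=False),  # 3x3 at the reduced width
                nn.BatchNorm2d(mid),
                nn.ReLU(inplace=True),
                nn.Conv2d(mid, channels, 1, bias=False),        # 1x1: restore dimensions
                nn.BatchNorm2d(channels),
            )
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            return self.relu(self.f(x) + x)  # shortcut addition, then activation

The 3x3 convolution, usually the most expensive operation, thus runs on 64 channels instead of 256, which is what makes the very deep variants computationally tractable.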

Variants of ResNet

There are several variants of ResNet, each differing in depth. The most common are ResNet-18, ResNet-34, ResNet-50, ResNet-101, and ResNet-152. The number after "ResNet-" indicates the number of weight layers (convolutional and fully connected) in the network; ResNet-50, for example, has 50 such layers.
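
The layer count can be derived from the number of residual blocks stacked in each of the network's four stages. ResNet-50, for instance, stacks 3 + 4 + 6 + 3 = 16 bottleneck blocks of 3 convolutions each (48 layers), plus the initial 7x7 convolution and the final fully connected layer: 48 + 2 = 50. The standard stage configurations from the paper, written here as a small Python mapping for reference:

    # Residual blocks per stage for the standard variants.
    RESNET_STAGES = {
        "resnet18":  [2, 2, 2, 2],   # basic blocks (2 convs each):   8 * 2 + 2 = 18
        "resnet34":  [3, 4, 6, 3],   # basic blocks:                 16 * 2 + 2 = 34
        "resnet50":  [3, 4, 6, 3],   # bottleneck blocks (3 convs):  16 * 3 + 2 = 50
        "resnet101": [3, 4, 23, 3],  # bottleneck blocks:            33 * 3 + 2 = 101
        "resnet152": [3, 8, 36, 3],  # bottleneck blocks:            50 * 3 + 2 = 152
    }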

ResNet-18 and ResNet-34

ResNet-18 and ResNet-34 are the smaller versions of ResNet, suitable for tasks and datasets that do not require extreme complexity. They follow the original design, with each residual block consisting of two 3x3 convolutions.
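
Both are available pre-built in common libraries; as an illustration, the snippet below instantiates torchvision's reference implementation (the weights argument assumes a recent torchvision release; older releases use a pretrained flag instead):

    import torch
    from torchvision import models

    model = models.resnet18(weights=None)  # randomly initialized ResNet-18
    model.eval()
    with torch.no_grad():
        logits = model(torch.randn(1, 3, 224, 224))
    print(logits.shape)  # torch.Size([1, 1000]), one score per ImageNet class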

ResNet-50, ResNet-101, and ResNet-152

ResNet-50, ResNet-101, and ResNet-152 are larger versions of ResNet, designed for more complex tasks and larger datasets. They use the bottleneck design, with each residual block consisting of three layers: a 1x1 convolution, a 3x3 convolution, and a 1x1 convolution.

Applications

ResNet has been successfully applied in various fields, including image classification, object detection, and semantic segmentation. Its deep architecture and the ability to learn complex patterns make it suitable for these tasks.

Image Classification

ResNet was first introduced for the task of image classification. In the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2015, ResNet won first place in the classification task: an ensemble of ResNets achieved a top-5 error rate of 3.57%, surpassing the commonly cited estimate of human-level performance (around 5% top-5 error).

Object Detection

In object detection tasks, ResNet is often used as a backbone network due to its powerful feature extraction capabilities. It serves as the backbone in popular object detection models such as Faster R-CNN and Mask R-CNN.
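
A common pattern, sketched below under the assumption of torchvision's module ordering, is to drop the classification head and reuse the convolutional trunk as a feature extractor:

    import torch
    import torch.nn as nn
    from torchvision import models

    resnet = models.resnet50(weights=None)
    # Drop the final average pool and fully connected layer, keeping the feature maps.
    backbone = nn.Sequential(*list(resnet.children())[:-2])

    features = backbone(torch.randn(1, 3, 224, 224))
    print(features.shape)  # torch.Size([1, 2048, 7, 7]), fed to a detection head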

Semantic Segmentation

In semantic segmentation, where the goal is to assign a class to each pixel in the image, ResNet has also shown excellent performance. It is often used as the encoder in architectures such as Fully Convolutional Networks (FCNs), DeepLab, and U-Net-style models.
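
As an illustration, torchvision ships segmentation models built on ResNet backbones; this sketch assumes a recent torchvision release and uses 21 classes, as in the PASCAL VOC dataset:

    import torch
    from torchvision.models.segmentation import fcn_resnet50

    model = fcn_resnet50(weights=None, num_classes=21)
    model.eval()
    with torch.no_grad():
        out = model(torch.randn(1, 3, 224, 224))["out"]  # per-pixel class scores
    print(out.shape)  # torch.Size([1, 21, 224, 224])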
