1. Convolutional Neural Networks (CNNs) have become the state-of-the-art approach for many image recognition, computer vision, and speech recognition tasks.
2. New CNN architectures have been proposed and have evolved over time to tackle image recognition tasks.
3. LeNet (LeCun et al., 1998) was one of the first CNNs applied successfully to image recognition tasks, where it competed favorably with other supervised learning methods.
The architecture was designed to process low-resolution images of handwritten digits.
The model combines two layers of 5x5 convolutions with pooling and three fully connected layers.
Originally, it used sigmoid-type (scaled tanh) activations and average pooling.
▶ LeNet was proposed even before ReLU and max-pooling were “discovered”!
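The layer structure described above can be traced with a small shape calculation. The sketch below is illustrative only and makes several assumptions that are not stated in these notes: a 1x32x32 input, no padding, stride-1 convolutions with 6 and 16 output channels, 2x2 pooling with stride 2, and fully connected widths of 120, 84, and 10. It also simplifies the original design by treating pooling as weight-free and by connecting every input channel to every output channel. The names `conv_out` and `lenet_shapes` are hypothetical helpers.

```python
# Shape walk-through of a LeNet-style network (illustrative sketch, not the original code).
# Assumed: 1x32x32 input, no padding, stride-1 5x5 convolutions, 2x2 pooling with stride 2,
# fully connected widths 120-84-10.

def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def lenet_shapes(height=32, width=32, in_channels=1):
    """Return a list of (layer name, output shape, parameter count)."""
    layers = []
    c, h, w = in_channels, height, width
    for i, out_channels in enumerate((6, 16), start=1):
        # 5x5 convolution: one 5x5 kernel per (input, output) channel pair, plus one bias
        # per output channel.
        params = out_channels * (5 * 5 * c + 1)
        c, h, w = out_channels, conv_out(h, 5), conv_out(w, 5)
        layers.append((f"conv{i}", (c, h, w), params))
        # 2x2 average pooling halves each spatial dimension; treated as weight-free here.
        h, w = conv_out(h, 2, stride=2), conv_out(w, 2, stride=2)
        layers.append((f"pool{i}", (c, h, w), 0))
    features = c * h * w  # flattened feature vector fed to the first dense layer
    for i, units in enumerate((120, 84, 10), start=1):
        layers.append((f"fc{i}", (units,), features * units + units))
        features = units
    return layers


if __name__ == "__main__":
    for name, shape, params in lenet_shapes():
        print(f"{name:6s} {str(shape):14s} {params:6d} params")
```

Under these assumptions, almost all of the parameters sit in the fully connected layers, while the two convolutional layers stay small.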
- AlexNet: Automatic feature extraction with deep CNNs.
- VGG Networks: Stacking several 3x3 convolutions into deep, narrow blocks improves on the results of fewer, wider layers.
- NiN: Adding local non-linearities at each pixel location through 1x1 convolutions.
- GoogLeNet, ResNet, ResNeXt: Multi-branch networks that extract different components to approximate general functions.
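One common way to motivate the VGG entry in the list above is to compare a stack of 3x3 convolutions with a single larger kernel that covers the same receptive field. The sketch below works through that arithmetic. It assumes stride-1 convolutions, the same number of input and output channels in every layer, and no bias terms; the channel count of 64 is purely illustrative, and the helper names `stacked_3x3` and `single_conv` are hypothetical.

```python
# Receptive field and weight count: stacked 3x3 convolutions vs. one larger kernel.
# Assumed: stride 1, equal input/output channel counts, biases ignored.

def stacked_3x3(num_layers, channels):
    """Receptive field and weight count of num_layers stacked 3x3 convolutions."""
    # Each additional 3x3 layer widens the receptive field by 2 pixels.
    receptive_field = 2 * num_layers + 1
    weights = num_layers * 3 * 3 * channels * channels
    return receptive_field, weights


def single_conv(kernel, channels):
    """Receptive field and weight count of one kernel x kernel convolution."""
    return kernel, kernel * kernel * channels * channels


if __name__ == "__main__":
    channels = 64  # illustrative value, not taken from the notes
    for n in (2, 3):
        rf, w_stack = stacked_3x3(n, channels)
        _, w_single = single_conv(rf, channels)
        print(f"{n} x (3x3): field {rf}x{rf}, {w_stack} weights; "
              f"one {rf}x{rf}: {w_single} weights")
```

With these assumptions, the stack reaches the same receptive field with fewer weights and applies a non-linearity after every layer, which is one plausible reading of why stacking helps.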