M3-L1: Deep Convolutional Neural Networks: Architectures

1.  Convolutional Neural Networks have become the state-of-the-art approach in various image recognition, computer vision, and speech recognition tasks.

2.  Summary of new CNN architectures that have been proposed and have evolved over time to tackle image recognition tasks.

3.  LeNet (LeCun et al., 1998) was one of the first CNNs to be applied successfully to image recognition tasks, where it competed with the other supervised learning tools of the time.

The architecture was designed to process low-resolution images of handwritten digits.

The model combines two layers of 5x5 convolutions with pooling and three fully connected layers.

Originally, it used sigmoid-type activations (sigmoid or tanh) and average pooling.

LeNet was proposed even before ReLU and max-pooling were “discovered”!
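
To make the layer stack concrete, the sketch below traces feature-map shapes through a LeNet-style network in plain Python. The 32x32 input, the channel counts (6 and 16) and the fully connected widths (120, 84, 10) follow the commonly cited LeNet-5 configuration and are assumptions here, since the notes above do not list them. The helper names `conv_out` and `lenet_shapes` are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def lenet_shapes(input_size=32):
    """Trace (channels, height, width) through a LeNet-style stack.

    Assumed configuration (not given in the notes): 1 input channel,
    6 and 16 convolution channels, 2x2 average pooling with stride 2.
    """
    shapes = [("input", (1, input_size, input_size))]
    size = input_size
    for name, channels in (("conv1", 6), ("conv2", 16)):
        size = conv_out(size, kernel=5)            # 5x5 convolution, no padding
        shapes.append((name, (channels, size, size)))
        size = conv_out(size, kernel=2, stride=2)  # 2x2 pooling halves the size
        shapes.append((name.replace("conv", "pool"), (channels, size, size)))
    flat = 16 * size * size                        # flatten before the dense layers
    shapes.append(("flatten", (flat,)))
    for name, width in (("fc1", 120), ("fc2", 84), ("fc3", 10)):
        shapes.append((name, (width,)))
    return shapes


if __name__ == "__main__":
    for name, shape in lenet_shapes():
        print(f"{name:8s} {shape}")
```

Under these assumptions, the two convolution and pooling stages reduce a 32x32 image to 16 feature maps of size 5x5, so 400 features feed the three fully connected layers.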


From feature engineering to network engineering

Summary of the recent developments in the design of CNNs:

  • AlexNet: Automatic feature extraction with deep CNNs.
  • VGG Networks: Stacks of narrow 3x3 convolutions give better results than fewer, wider convolutions.
  • NiN: Adding local non-linearities through 1x1 convolutions.
  • GoogLeNet, ResNet, ResNeXt: Multi-branch networks that extract different components to approximate general functions.

This has led to new models in which the design of the network itself is optimized, in addition to the usual
model parameters of a neural network: AnyNet.
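
The VGG observation in the list above can be checked with simple arithmetic: a stack of 3x3 convolutions covers the same receptive field as one larger kernel while using fewer weights. The sketch below is a plain-Python illustration, assuming stride 1, equal input and output channel counts, and no bias terms. The helper names `receptive_field` and `conv_params` are hypothetical, and the channel count of 64 is only an illustrative value.

```python
def receptive_field(kernel, layers):
    """Receptive field of `layers` stacked stride-1 convolutions."""
    return 1 + layers * (kernel - 1)


def conv_params(kernel, channels, layers=1):
    """Weight count for stacked convolutions with `channels` in and out (no bias)."""
    return layers * kernel * kernel * channels * channels


if __name__ == "__main__":
    channels = 64  # illustrative value, not taken from the notes
    for layers in (2, 3):
        field = receptive_field(3, layers)
        stacked = conv_params(3, channels, layers)
        single = conv_params(field, channels)
        print(f"{layers} x 3x3 -> field {field}x{field}: "
              f"{stacked} weights vs {single} for one {field}x{field} layer")
```

If an activation follows every convolution, the stacked version also applies a non-linearity after each 3x3 layer, which a single large kernel does not.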