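The core idea of MobileNet is the depthwise separable convolution: a per-channel (depthwise) spatial convolution followed by a 1×1 (pointwise) convolution that mixes channels. Refer to the PDF for the exact layer and network setup.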
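
To make the saving concrete, here is a minimal sketch that counts multiply-adds for a standard convolution and for its depthwise separable replacement. The function names and the example layer size (3×3 kernel, 512 input and output channels, 14×14 feature map) are chosen for illustration only, and stride 1 with "same" padding is assumed so the output feature map keeps the input size.

```python
def standard_conv_mult_adds(kernel, in_ch, out_ch, fmap):
    """Mult-adds of a standard kernel x kernel convolution.

    Assumes stride 1 and 'same' padding, so the output is fmap x fmap.
    """
    return kernel * kernel * in_ch * out_ch * fmap * fmap


def separable_conv_mult_adds(kernel, in_ch, out_ch, fmap):
    """Mult-adds of a depthwise conv followed by a 1x1 pointwise conv."""
    # Depthwise: one kernel x kernel filter per input channel.
    depthwise = kernel * kernel * in_ch * fmap * fmap
    # Pointwise: a 1x1 convolution that combines the channels.
    pointwise = in_ch * out_ch * fmap * fmap
    return depthwise + pointwise


# Illustrative layer size (an assumption, not taken from these notes).
std = standard_conv_mult_adds(3, 512, 512, 14)
sep = separable_conv_mult_adds(3, 512, 512, 14)
print(f"standard:  {std:,}")
print(f"separable: {sep:,}")
print(f"ratio:     {sep / std:.3f}")
```

The ratio works out to 1/out_ch + 1/kernel², so with 3×3 kernels and many output channels the separable version needs roughly 8 to 9 times fewer multiply-adds.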

The model can be made smaller in two ways:

  • Making it thinner (fewer channels)
  • Reducing the representation (shrinking the input resolution). Note that the paper simply feeds in different input resolutions rather than adding a downsampling layer.

What they learned:

  • Making the model thinner is better than making it shallower (fewer layers, even when the removed layers all operate on feature maps of the same resolution). Table 5
  • Accuracy degrades smoothly as the width or resolution multiplier shrinks, unless the network is made far too thin. Table 6, Table 7
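
The two shrinking knobs can be sketched as multipliers on the cost of one depthwise separable layer. This is an illustration under assumptions: `alpha` (width) and `rho` (resolution) are names picked here, `rho` is applied directly to the layer's feature-map size rather than to the network input, and the layer dimensions are made up for the example.

```python
def scaled_separable_mult_adds(kernel, in_ch, out_ch, fmap, alpha=1.0, rho=1.0):
    """Mult-adds of one depthwise separable layer after shrinking.

    alpha thins the layer (fewer channels); rho shrinks the feature map.
    Both names are hypothetical and used only for this sketch.
    """
    m = int(alpha * in_ch)    # thinned input channels
    n = int(alpha * out_ch)   # thinned output channels
    f = int(rho * fmap)       # reduced feature-map side length
    depthwise = kernel * kernel * m * f * f
    pointwise = m * n * f * f
    return depthwise + pointwise


# Illustrative layer: 3x3 kernel, 512 -> 512 channels, 14x14 feature map.
base = scaled_separable_mult_adds(3, 512, 512, 14)
thin = scaled_separable_mult_adds(3, 512, 512, 14, alpha=0.5)
small = scaled_separable_mult_adds(3, 512, 512, 14, rho=0.5)

print(f"alpha=0.5 keeps {thin / base:.3f} of the cost")
print(f"rho=0.5   keeps {small / base:.3f} of the cost")
```

Both knobs cut cost roughly quadratically: halving the resolution scales every term by a quarter, while halving the width scales the dominant pointwise term by a quarter and the depthwise term by a half.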