Here we look at the performance of different activation functions on the Fashion-MNIST classification task.
We try different model depths and structures.
fc(N) = nn.Sequential(
flatten(),
linear(28*28, ... ), activation(), batchnorm(), dropout(),
linear( ... , ... ), activation(), batchnorm(), dropout(),
linear( ... , 10 ), activation(), batchnorm(), dropout(),
softmax(),
)
For the following analysis, we only run experiments on fc(4) and fc(16), and keep dropout = 0.
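The fc(N) pseudocode above can be fleshed out into a runnable PyTorch sketch. This is only one reading of the template: I assume fc(N) means N hidden Linear blocks of a fixed width (the hidden width of 256, the ReLU placeholder activation, and routing the last Linear straight into softmax rather than through another activation/batchnorm/dropout block are my assumptions, not taken from the notes).

```python
import torch
from torch import nn

def fc(n_hidden, width=256, dropout=0.0):
    """Sketch of fc(N): N hidden blocks of linear -> activation ->
    batchnorm -> dropout, then a final linear to 10 classes.
    `width` = 256 and ReLU are illustrative assumptions."""
    layers = [nn.Flatten()]
    in_features = 28 * 28
    for _ in range(n_hidden):
        layers += [
            nn.Linear(in_features, width),
            nn.ReLU(),              # stand-in for the activation under test
            nn.BatchNorm1d(width),
            nn.Dropout(dropout),
        ]
        in_features = width
    # classification head: last linear feeds softmax directly (assumption)
    layers += [nn.Linear(in_features, 10), nn.Softmax(dim=1)]
    return nn.Sequential(*layers)
```

In practice the trailing nn.Softmax is usually dropped and nn.CrossEntropyLoss is applied to the raw logits, since it computes log-softmax internally and is numerically more stable.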
cn(N,m_size,M) = nn.Sequential(
# feature extraction part, has M convolutions
conv2d( 1 , ..., kernel=3), activation(), maxpool2d(m_size),
conv2d( ... , ..., kernel=3), activation(), maxpool2d(m_size),
conv2d( ... , 128, kernel=3), activation(), maxpool2d(m_size),
# classification part, has N linear units
flatten(),
linear( ... , ... ), activation(),
linear( ... , ... ), activation(),
linear( ... , 10 ), activation(),
softmax(),
)
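Likewise, here is a hedged PyTorch sketch of cn(N, m_size, M) for the M = 3 convolutions and N = 3 linear layers shown above. The channel widths 32/64/128, the hidden width of 128, padding=1 on the convolutions, and ReLU are my assumptions; nn.LazyLinear stands in for the elided flatten dimension, inferring it on the first forward pass.

```python
import torch
from torch import nn

def cn(m_size=2, channels=(32, 64, 128), fc_width=128):
    """Sketch of cn(N, m_size, M) with M=3 conv blocks and N=3 linear
    layers. channels, fc_width, and padding=1 are assumptions."""
    layers = []
    in_ch = 1
    # feature extraction: conv -> activation -> maxpool, repeated M times
    for out_ch in channels:
        layers += [
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(),              # stand-in for the activation under test
            nn.MaxPool2d(m_size),
        ]
        in_ch = out_ch
    # classification: N linear layers; LazyLinear infers the flatten size
    layers += [
        nn.Flatten(),
        nn.LazyLinear(fc_width), nn.ReLU(),
        nn.Linear(fc_width, fc_width), nn.ReLU(),
        nn.Linear(fc_width, 10),
        nn.Softmax(dim=1),
    ]
    return nn.Sequential(*layers)
```

With m_size = 2 the 28x28 input is pooled to 14, then 7, then 3, so the flattened feature vector LazyLinear receives is 128 * 3 * 3 = 1152.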
The experiments sweep over three factors: model, activation, and dropout.