[Interactive figure controls: hyperparameter sweep over model (fc(2), fc(4), fc(16)), dropout (0, 0.5), batchnorm (0, 1; applied before or after σ), and activation σ — Identity, GELU, ReLU, RReLU, PReLU, SiLU, ELU, SELU, CELU, Mish, Softplus, Hardswish, LeakyReLU (default, 0.1, 0.5, 5.0), Softsign, Tanh, Hardtanh, Sigmoid, Hardsigmoid, Softshrink, Hardshrink, Tanhshrink, abs, atan, tan, sinc, sin, sin (other LR), sin(πx), ExpLog, IdGELU, IdGELUsin, Idsin, Softmax, Softmin, LogSigmoid, LogSoftmax, exp, exp (other LR), log, sqrt, x². Displayed metrics: confusion matrix, performance, loss, raw loss.]