param         value
dim_i, dim_o  dim_i=20 dim_o=30
loss          mse; cos; dot; mse+cos; mse+dot; cos+dot; hetero3; hetero2; all
opti          adam, lr=1e-4; adam, lr=1e-3; sgd, lr=1e-2; sgd, lr=1e-1; all
dim_sample    2^0; 2^-2; 2^-4; 2^-6; 2^-8; all
bwd_σ         loglinexp; logexpgelu; deep-fc; all