how to train your model
This is a recap of techniques mentioned in *Direct Feedback Alignment Provides Learning in Deep Neural Networks*.
Currently, the most popular way to train a neural network is backpropagation, because it is simple yet very efficient.
Biologically speaking, though, it is not very realistic:
- signals in neurons don't travel both ways,
- activations are closer to all-or-nothing spikes, which doesn't lend itself well to taking derivatives,
- the loss gradient has to be transported from the output all the way back to the input (see the sketch after this list).
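For reference, here is a minimal NumPy sketch of backpropagation on a two-layer network (the shapes, names and learning rate are my own illustration, not taken from the paper). Note how the backward pass reuses $W_2^T$ to transport the error from the output back toward the input, which is exactly what the last point objects to.

```python
import numpy as np

# Toy two-layer network trained by backpropagation (illustrative sketch).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))          # 4 samples, 3 features
y = rng.normal(size=(4, 2))          # 4 samples, 2 targets

W1 = 0.1 * rng.normal(size=(3, 5))
W2 = 0.1 * rng.normal(size=(5, 2))
lr = 0.1

for step in range(100):
    # forward pass
    h = np.tanh(x @ W1)                        # hidden activations
    y_hat = h @ W2                             # linear output
    loss = 0.5 * ((y_hat - y) ** 2).sum() / len(x)

    # backward pass: the error signal flows output -> input
    d_y = (y_hat - y) / len(x)                 # dLoss/dy_hat
    d_h = (d_y @ W2.T) * (1 - h ** 2)          # error transported through W2^T

    W2 -= lr * h.T @ d_y
    W1 -= lr * x.T @ d_h
```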
A Boltzmann machine is a stochastic model.
A set of binary units $s∈S=\{0,1\}^N$
is fully connected through weights $W∈ℝ^{N⨯N}$.
$W$'s diagonal is $0$ (no self-interaction), and $W$ can be chosen to be symmetric.
$θ∈ℝ^N$ is a vector of additional biases (which could be folded into $W$ using homogeneous coordinates).
We define $p_{s_i=1}$, the probability for unit $s_i$ to be active, by $$\al{ ∆E_i &= W_i ⋅ s + θ_i \\ p_{s_i=1} &= \left(1+\exp\left(-\frac{∆E_i}{T}\right)\right)^{-1} }$$
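As a concrete illustration, here is a sketch of one stochastic update of a single unit using this probability (the function name, sizes and temperature are assumptions made for the example, not notation from the paper).

```python
import numpy as np

def update_unit(s, W, theta, i, T, rng):
    """Resample the binary unit s_i from p(s_i = 1) = sigmoid(dE_i / T)."""
    dE_i = W[i] @ s + theta[i]                 # energy gap of unit i
    p_on = 1.0 / (1.0 + np.exp(-dE_i / T))     # p_{s_i = 1}
    s[i] = 1.0 if rng.random() < p_on else 0.0
    return s

# usage on a random state
rng = np.random.default_rng(0)
N = 5
W = rng.normal(size=(N, N))
W = (W + W.T) / 2                              # symmetric couplings
np.fill_diagonal(W, 0.0)                       # no self-interaction
theta = rng.normal(size=N)
s = rng.integers(0, 2, size=N).astype(float)
s = update_unit(s, W, theta, i=2, T=1.0, rng=rng)
```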
We define the total energy of the system to be $$ E = - \mat{s^T&1}⋅\mat{W&0\\θ^T&0}⋅\mat{s\\1} $$
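As a quick sanity check, the block-matrix form above is just $E = -(s^T W s + θ^T s)$; a small NumPy sketch (with arbitrary illustrative values) verifying this:

```python
import numpy as np

# Arbitrary small machine for the check.
N = 5
rng = np.random.default_rng(1)
W = rng.normal(size=(N, N))
W = (W + W.T) / 2                  # symmetric couplings
np.fill_diagonal(W, 0.0)           # no self-interaction
theta = rng.normal(size=N)
s = rng.integers(0, 2, size=N).astype(float)

# E = -[s^T 1] [[W, 0], [theta^T, 0]] [s; 1]
block = np.block([[W, np.zeros((N, 1))],
                  [theta[None, :], np.zeros((1, 1))]])
s1 = np.append(s, 1.0)
E = -s1 @ block @ s1

assert np.isclose(E, -(s @ W @ s + theta @ s))
```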
So the energy gap of each unit $s_i$, i.e. the difference between the energy contribution of $s_i$ when it is off and when it is on, is $$ ∆E_i := \comment{E_{s_i=0}}{0} - E_{s_i=1} = W_i ⋅ s + θ_i $$
We suppose the state of $s_i$ follows a Boltzmann distribution, i.e. $$ ∆E_i = [-k_B T \ln(\comment{p_{s_i=0}}{1-p_{s_i=1}})] - [-k_B T \ln(p_{s_i=1})]$$
Hence the Boltzmann in the name. Rearranging gives $\frac{∆E_i}{k_B T} = \ln\left(\frac{p_{s_i=1}}{1-p_{s_i=1}}\right)$, and solving for $p_{s_i=1}$ yields the relation used above $$ p_{s_i=1} = \left(1+\exp\left(-\frac{∆E_i}{k_BT}\right)\right)^{-1} $$
Because this isn't real physics, we can drop $k_B$ and interpret $T$ as an arbitrary temperature-like variable.
Let's split $S=V ⨯ H$, where $V$ designates the visible units and $H$ the hidden units.
A possible choice is to force $W$ to be bipartite with respect to $V$ and $H$, i.e. no visible–visible or hidden–hidden connections (this is the restricted Boltzmann machine).
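As an illustration, such a bipartite $W$ can be assembled as a block matrix whose diagonal blocks are zero (the sizes below are arbitrary assumptions):

```python
import numpy as np

n_visible, n_hidden = 4, 3
rng = np.random.default_rng(2)
C = 0.1 * rng.normal(size=(n_visible, n_hidden))   # visible-hidden couplings

# Full symmetric weight matrix over N = n_visible + n_hidden units:
# zero blocks inside each group, so the only interactions are visible <-> hidden.
W = np.block([
    [np.zeros((n_visible, n_visible)), C],
    [C.T, np.zeros((n_hidden, n_hidden))],
])
```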
Given a training set, we wish to make the visible units $V$ reproduce a target distribution $P^{train}(V)$.
Using gradient descent on $W$, we minimize the Kullback–Leibler divergence $G$ $$ G := ∑_{v∈V} P^{train}(v)\log\left(\frac{P^{train}(v)}{P^{current}(v)}\right)$$ (where $v$ now ranges over all possible configurations of the visible units, and $P^{current}$ is the distribution the model produces over them at equilibrium).
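For a machine small enough to enumerate, $G$ can be evaluated by brute force: compute the Boltzmann distribution over all joint states, marginalise it onto the visible units, and compare with $P^{train}$. The sketch below does exactly that (sizes, temperature and the uniform $P^{train}$ are illustrative assumptions; real training estimates these quantities by sampling rather than enumeration).

```python
import numpy as np
from itertools import product

n_visible, n_hidden = 2, 2
N = n_visible + n_hidden
rng = np.random.default_rng(3)
W = rng.normal(size=(N, N))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)
theta = rng.normal(size=N)
T = 1.0

def energy(s):
    return -(s @ W @ s + theta @ s)

# Exact model distribution over all 2^N joint states, marginalised onto V.
states = [np.array(c, dtype=float) for c in product([0, 1], repeat=N)]
weights = np.array([np.exp(-energy(s) / T) for s in states])
probs = weights / weights.sum()

P_current = {}
for s, p in zip(states, probs):
    v = tuple(int(u) for u in s[:n_visible])
    P_current[v] = P_current.get(v, 0.0) + p

# An arbitrary target distribution over the visible configurations.
P_train = {v: 1.0 / 2 ** n_visible for v in product([0, 1], repeat=n_visible)}

G = sum(P_train[v] * np.log(P_train[v] / P_current[v]) for v in P_train)
print(G)
```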