
an unexpected way to compute things faster

If you have a differential equation that depends on time, such as $\dot{u}(t)=f(t,u(t))$, it is quite intuitive that, unless you have very nice symmetries, you can't skip steps and need to compute the state at every timestep because of the direct dependency on the previous state.

However, in 1964, Jürg Nievergelt came up with a quite smart idea: instead of computing the future step by step precisely, we start with a coarse approximation. Since the true solution should not be too far from it, we compute a ton of mostly useless but precise integration steps starting from points around those poor approximations. If we sample enough candidates, some of them will happen to be pretty good, but we don't know yet which ones. Once the computation is done, we follow the solution from the initial condition across each time slice, selecting the candidates that happened to be the best ones (we can also interpolate between several good ones).

The nice idea is to compute a ton of things (in parallel) and make the "selection" of the result part of the computation, which can come very cheap if done well. In 2022, with GPUs and clusters, maybe you can feel the amazing potential of such techniques: one can bring an immense speedup to a process that intuitively should not be possible to accelerate.

Let's visualize with this example: $\dot{u}(t) = \cos(\cos(t) + u(t))$ with forward Euler as a solver (of course one can consider recursion where the inner solver is itself Nievergelt). Every time we rerun, we use a different initial condition $u(0)=u_0$.

step 1, coarse: $\ca{U_0&=u_0 \\ U_{n+1} &= solver(T_{n},T_{n+1},U_{n})}$

step 2, fine: $\ca{v^k_{n,0}&=U_n + ε_k \\ v^k_{n,m+1} &= solver(T_{n,m},T_{n,m+1},v^k_{n,m})}$ such that $\ca{T_{n,0}&=T_{n} \\ T_{n,M}&=T_{n+1}}$

step 3, choose: $w_{n,m} = v^κ_{n,m}$ where $\comment{κ = \argmin_k |w_{n-1,M} - v^k_{n,0}|}{\text{select $κ$ that maximizes continuity}}$
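To make the three steps concrete, here is a minimal Python sketch of this process applied to the example above, with forward Euler used for both the coarse and fine solves. All names (`euler`, `nievergelt`, `coarse_steps`, ...) are mine, not from the course, and the selection step just picks the nearest candidate instead of interpolating.

```python
import numpy as np

def f(t, u):
    # example ODE from above: du/dt = cos(cos(t) + u)
    return np.cos(np.cos(t) + u)

def euler(t0, t1, u0, steps):
    # forward Euler from t0 to t1 starting at u0, returns the value at t1
    dt = (t1 - t0) / steps
    t, u = t0, u0
    for _ in range(steps):
        u = u + dt * f(t, u)
        t += dt
    return u

def nievergelt(u0, T, N=8, K=16, eps=0.1, coarse_steps=2, fine_steps=200):
    Ts = np.linspace(0.0, T, N + 1)

    # step 1: cheap sequential coarse sweep
    U = np.empty(N + 1)
    U[0] = u0
    for n in range(N):
        U[n + 1] = euler(Ts[n], Ts[n + 1], U[n], coarse_steps)

    # step 2: fine solves from K perturbed starting values on each slice;
    # every (n, k) pair is independent, so this is the parallel part
    starts = np.array([[U[n] + eps * (2 * k / (K - 1) - 1) for k in range(K)]
                       for n in range(N)])
    ends = np.array([[euler(Ts[n], Ts[n + 1], starts[n, k], fine_steps)
                      for k in range(K)] for n in range(N)])

    # step 3: sequential selection; keep the candidate whose starting value
    # is closest to where the previously selected trajectory ended
    w = np.empty(N + 1)
    w[0] = u0
    prev_end = euler(Ts[0], Ts[1], u0, fine_steps)  # first slice solved exactly from u0
    w[1] = prev_end
    for n in range(1, N):
        kappa = np.argmin(np.abs(starts[n] - prev_end))
        prev_end = ends[n, kappa]
        w[n + 1] = prev_end
    return Ts, w

Ts, w = nievergelt(u0=0.0, T=10.0)
print(np.c_[Ts, w])
```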

Of course, this can also be seen as an iterative process in which a coarse approximation gets refined.

For a realtime/continuous usage, one could imagine generalizing this technique: instead of computing the coarse approximation first and then the fine ones, we could progressively introduce new points (change how we synchronize things), break dependencies so the coarse approximation can make use of the current fine knowledge, make the number of fine solves adaptive, etc. We could also precompute parts of the simulation before they are perturbed by the user's input, so that only the necessary sections get updated... but that requires more thinking.

This is completely off topic, but it also makes me think that if we could describe the "roughness" of the ODE/PDE (in a way intuitively very similar to how we use roughness maps in CG image rendering), we could adapt timesteps to skip the "smooth/obvious" parts, or maybe introduce analytical solutions to a locally approximated ODE/PDE...

Another off-topic consideration: could a random-timestep integrator be better than a constant-step integrator (with the constant step equal to the average of the random one)? It feels like we could capture higher-frequency events that way...

However there is a big problem: this method computes a lot of waste, and the larger the working domain is, the more samples we need. Even though we have massively parallel computing capabilities, they are still very finite. Can we do better? Yes we can. The basic idea is to view the "selection" step as a constraint on the initial condition of each section: its ending position must be compatible with the starting condition of the next section. Obviously we can't know the ending position in advance, but let's just write the idea down anyway.

Let's write down the problem we are solving again: $\ca{ \dot{u}(t) = f(t,u(t)), ∀t∈[0,T]\\ u(0)=u_0}$

Now we split the time domain $[0,T]$ into smaller chunks $0=T_0 < T_1 < ... < T_N=T$

Now let's split $u$ accordingly, one piece per subdomain: $\ca{ \dot{u}_n(t) = f(t,u_n(t)), ∀t∈[T_n,T_{n+1}]\\ u_n(T_n)=U_n }$

Now we want compatibility: $u_n(T_{n+1},U_n)=U_{n+1}$, which holds trivially for the analytical solution, but not at all for a numerical approximation. This relation can be rewritten as $F(U)=0$ with $$F(U)=\mat{ U_{0}-u_0 \\ U_{1}-u_0(T_{1},U_0) \\ U_{2}-u_1(T_{2},U_1) \\ ... \\ U_{N}-u_{N-1}(T_{N},U_{N-1}) \\ }$$ on which we run Newton's method to solve $F(U)=0$: $$U^{k+1} = U^k - F'(U^k)^{-1}⋅F(U^k)$$ $$\text{where }F'(U^k)_{i,j}=\frac{∂F_i}{∂U_j}(U^k) = \mat{ 1 & 0 & 0 & 0 & 0 \\ -\frac{∂u_0}{∂U_0}(T_1,U_0^k) & 1 & 0 & 0 & 0 \\ 0 & -\frac{∂u_1}{∂U_1}(T_2,U_1^k) & 1 & 0 & 0 \\ 0 & 0 & \ddots & \ddots & 0 \\ 0 & 0 & 0 & -\frac{∂u_{N-1}}{∂U_{N-1}}(T_N,U_{N-1}^k) & 1 \\ }$$ Since this Jacobian is lower bidiagonal, the Newton update can be unrolled row by row, from which we can compute $$U_{n+1}^{k+1} = \blue{u_n(T_{n+1},U_n^k)} + \red{\frac{∂u_n}{∂U_n}(T_{n+1},U_n^k)}⋅(U_n^{k+1}-U_n^k)$$ which happens to be a Taylor expansion of $u_n$ around its given initial condition $U_n^k$.
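As a sanity check of that derivation, here is a small Python sketch (reusing `f` and `euler` from the first sketch; the name `newton_shooting` and the finite-difference estimate of $∂u_n/∂U_n$ are my own choices) that runs this Newton sweep with the fine solver everywhere. It is not yet a parallel method, just the formula above made executable.

```python
import numpy as np

# reusing f and euler from the Nievergelt sketch above

def newton_shooting(u0, T, N=8, iters=5, fine_steps=200, h=1e-6):
    # Sequential Newton sweep on F(U)=0: propagate each slice with the fine
    # solver and estimate each Jacobian block du_n/dU_n by finite differences.
    Ts = np.linspace(0.0, T, N + 1)
    U = np.full(N + 1, u0)              # crude initial guess U^0

    for _ in range(iters):
        U_new = np.empty(N + 1)
        U_new[0] = u0
        for n in range(N):
            prop = euler(Ts[n], Ts[n + 1], U[n], fine_steps)        # blue term
            dprop = (euler(Ts[n], Ts[n + 1], U[n] + h, fine_steps)  # red term,
                     - prop) / h                                    # du_n/dU_n
            U_new[n + 1] = prop + dprop * (U_new[n] - U[n])
        U = U_new
    return Ts, U
```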

If we were to discretize that Taylor expansion, we would get $$\al{ U_{n+1}^{k+1} &= \blue{u_n(T_{n+1},U_n^k)} + \frac{ \red{u_n(T_{n+1},U_n^{k+1})-u_n(T_{n+1},U_n^k)} }{ \cancel{\red{(U_n^{k+1}-U_n^k)}} }⋅\cancel{(U_n^{k+1}-U_n^k)} \\ }$$ However, in the discrete world we don't have the analytical expression of $u_n(T_{n+1},U_n^k)$, so it must be replaced with an estimate from a solver. And here is the idea: we could use two solvers, a coarse one to compute the $\red{∂/∂U_n}$ part, and a finer one to compute the first $\blue{u_n}$ term.

This choice is indeed a nice one because the $\blue{\text{blue term}}$ can be computed in parallel for all $n$ at each $k$, and the $\red{\text{red term}}$ is computed sequentially. The sequential nature of that second part makes us want to use a cheap coarse integrator.

And tada! This is the parareal algorithm.

$$\ca{ U_{0}^{k} &= u_0 \\ U_{n+1}^{0} &= \red{u_n(T_{n+1},U_n^0)} \\ U_{n+1}^{k+1} &= \blue{u_n(T_{n+1},U_n^k)} + \red{u_n(T_{n+1},U_n^{k+1})-u_n(T_{n+1},U_n^k)} }$$ where $\red{\text{red}}$ is computed with a coarse solver and $\blue{\text{blue}}$ with a fine solver.
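Here is the same update rule as a minimal Python sketch (again reusing `f` and `euler` from the first sketch; function and parameter names are mine). The red terms are few-step Euler sweeps, the blue terms are many-step Euler solves, and only the blue solves would actually be distributed across workers in a real implementation.

```python
import numpy as np

# reusing f and euler from the Nievergelt sketch above

def parareal(u0, T, N=8, iters=5, coarse_steps=2, fine_steps=200):
    Ts = np.linspace(0.0, T, N + 1)

    # k = 0: sequential coarse sweep
    U = np.empty(N + 1)
    U[0] = u0
    for n in range(N):
        U[n + 1] = euler(Ts[n], Ts[n + 1], U[n], coarse_steps)

    for _ in range(iters):
        # parallelizable: one fine (blue) solve per slice from the old iterate,
        # plus the coarse (red) solve from the same old iterate
        fine = [euler(Ts[n], Ts[n + 1], U[n], fine_steps) for n in range(N)]
        coarse_old = [euler(Ts[n], Ts[n + 1], U[n], coarse_steps) for n in range(N)]

        # sequential correction sweep with the coarse (red) solver
        U_new = np.empty(N + 1)
        U_new[0] = u0
        for n in range(N):
            coarse_new = euler(Ts[n], Ts[n + 1], U_new[n], coarse_steps)
            U_new[n + 1] = fine[n] + coarse_new - coarse_old[n]
        U = U_new
    return Ts, U

Ts, U = parareal(u0=0.0, T=10.0)
print(np.c_[Ts, U])
```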

I wonder if there is a "best timestep distribution"

We could try to mix parareal with Nievergelt, maybe with some stochastic integrator, or with a multitude of estimates of the $\red{\text{red}}$ discrete derivative.

source: Martin Gander's Time Parallel Time Integration course

June 23, 2021