Troubleshooting

This section gathers information on common failure modes of PeleLMeX. Additional information can be found in the GitHub issues of PeleLM and PeleLMeX.

Linear solver failure

The PeleLMeX algorithm involves multiple linear solves to handle projections and implicit diffusion. If a solver is unable to solve the problem, the code will abort with the following message:

` amrex::Abort::0::MLMG failed !!! `

or

` amrex::Abort::0::MLMG failing so lets stop here !!! `

appearing multiple times when using more than one MPI rank. The first thing to do is to identify which linear solve is failing and how. To do so, one needs to increase the verbosity of PeleLMeX as well as that of the projection solves (see the Control section for more details on LMeX controls):

peleLM.verbose = 3
nodal_proj.verbose = 2
mac_proj.verbose = 2

Note that we focus on the projection solves here because they are generally more prone to failure than the diffusion ones. You can then restart the simulation and identify whether the code is failing in the nodal projection, either during the initial projection (following Initial velocity projection) or during the time step (following - oneSDC()::ScalarReaction() –>), or in the MAC projection (right after SDC iter [1]). The verbose output of the linear solver then helps to understand how the solver fails. If the residual stalls around a small value after an initial reduction:

MLMG: # of AMR levels: 1
  # of MG levels on the coarsest AMR level: 9
MLMG: Initial rhs               = 2666.243975
MLMG: Initial residual (resid0) = 2666.243975
MLMG: Iteration   1 Fine resid/bnorm = 0.03858916872
MLMG: Iteration   2 Fine resid/bnorm = 0.001142880258
MLMG: Iteration   3 Fine resid/bnorm = 3.300053779e-04
MLMG: Iteration   4 Fine resid/bnorm = 9.433906375e-06
MLMG: Iteration   5 Fine resid/bnorm = 2.665697369e-07
MLMG: Iteration   6 Fine resid/bnorm = 7.40910596e-09
MLMG: Iteration   7 Fine resid/bnorm = 2.071981144e-10
MLMG: Iteration   8 Fine resid/bnorm = 2.66772528e-11
MLMG: Iteration   9 Fine resid/bnorm = 2.568558082e-11
MLMG: Iteration  10 Fine resid/bnorm = 2.713587827e-11
MLMG: Iteration  11 Fine resid/bnorm = 2.490776046e-11
MLMG: Iteration  12 Fine resid/bnorm = 2.41198728e-11
MLMG: Iteration  13 Fine resid/bnorm = 2.527429436e-11
MLMG: Iteration  14 Fine resid/bnorm = 2.431036667e-11
MLMG: Iteration  15 Fine resid/bnorm = 2.479456555e-11
MLMG: Iteration  16 Fine resid/bnorm = 2.28960372e-11
MLMG: Iteration  17 Fine resid/bnorm = 2.541484652e-11
MLMG: Iteration  18 Fine resid/bnorm = 2.522691579e-11
MLMG: Iteration  19 Fine resid/bnorm = 2.508988366e-11
...

it generally means that the required solver tolerance is too tight for the problem. The default relative tolerance of all solvers in PeleLMeX is 1e-11, but increasing the resolution, using a small amr.blocking_factor (<16), or having a large flow divergence across coarse-fine interfaces can lead to the behavior shown above. In this case, one can increase the tolerance of the faulty solver using one of:

nodal_proj.rtol = 5e-11
mac_proj.rtol   = 5e-11
diffusion.rtol  = 5e-11

It is sometimes necessary to increase the tolerance up to 5e-10. If you need to go higher than this ballpark value, something is probably wrong in the problem setup, and one should take a closer look at the solution to understand the problem. Alternatively, the solver can fail as follows:

MLMG: # of AMR levels: 2
  # of MG levels on the coarsest AMR level: 6
MLMG: Initial rhs               = 395786.0963
MLMG: Initial residual (resid0) = 395786.0963
MLMG: Iteration   1 Fine resid/bnorm = 0.009458721163
MLMG: Iteration   2 Fine resid/bnorm = 1046166408
MLMG: Iteration   3 Fine resid/bnorm = 5.420966957e+23

In this case, the solver diverges, which is generally a clear indication that the problem is not properly set up.
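The two failure patterns above can be told apart automatically from the verbose log. The following Python sketch is only an illustration and is not part of PeleLMeX: the function name `classify_mlmg`, the stagnation window, and the divergence factor are assumptions chosen for this example, and the regular expression relies only on the `Fine resid/bnorm` line format shown above.

```python
import re

# Matches lines such as "MLMG: Iteration   3 Fine resid/bnorm = 3.300053779e-04"
RESID_RE = re.compile(r"MLMG: Iteration\s+(\d+)\s+Fine resid/bnorm\s*=\s*(\S+)")


def classify_mlmg(log_text, rtol=1e-11, window=5, growth=1e3):
    """Classify one MLMG residual history.

    rtol   : relative tolerance requested from the solver
    window : number of trailing iterations inspected for stagnation (assumed value)
    growth : growth over the smallest residual flagged as divergence (assumed value)
    """
    resids = [float(m.group(2)) for m in RESID_RE.finditer(log_text)]
    if not resids:
        return "no MLMG iterations found"
    if resids[-1] <= rtol:
        return "converged"
    if resids[-1] > growth * min(resids):
        return "diverging"
    if len(resids) > window:
        tail = resids[-window:]
        # Stagnation: the last few residuals stay within a factor of two of each other
        if max(tail) < 2.0 * min(tail):
            return "stagnating"
    return "incomplete"
```

Applied to the first log above with the default `rtol=1e-11`, this sketch reports a stagnating solve, while passing `rtol=5e-11` reports convergence, consistent with the relaxed tolerance suggested earlier. The second log is reported as diverging.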

Chemistry integration failure

PeleLMeX relies on SUNDIALS CVODE to integrate the stiff ODEs resulting from the chemical system (along with advection/diffusion forcing). CVODE has multiple failure modes, but the most common ones appearing in PeleLMeX will prompt a message similar to one of the following:

From CVODE: At t = 0 and h = 6.01889e-195, the corrector convergence test failed repeatedly or with |h| = hmin.
From CVODE: At t = 2.459e-6 and h = 6.01889e-16, the corrector convergence test failed repeatedly or with |h| = hmin.
[CVODE ERROR]  CVode
    At t = 5.09606e-09, mxstep steps taken before reaching tout.

All of these indicate that the internal sub-stepping algorithm of CVODE did not manage to integrate the system of ODEs up to the CFL-constrained time step requested by PeleLMeX because the CVODE logic required an extremely small substep size.
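When many ranks report such messages, it can help to extract the time reached and the last step size from the log. The Python sketch below is a hypothetical helper, not a PeleLMeX or SUNDIALS utility: the name `parse_cvode_failures` and the returned dictionary layout are assumptions, and the patterns only cover the message formats quoted above.

```python
import re

# "At t = <time> and h = <step>, the corrector convergence test failed ..."
CORRECTOR_RE = re.compile(
    r"At t = ([^\s,]+) and h = ([^\s,]+), the corrector convergence test failed"
)
# "At t = <time>, mxstep steps taken before reaching tout."
MXSTEP_RE = re.compile(r"At t = ([^\s,]+), mxstep steps taken before reaching tout")


def parse_cvode_failures(log_text):
    """Return one dict per CVODE failure message found in log_text."""
    failures = []
    for line in log_text.splitlines():
        m = CORRECTOR_RE.search(line)
        if m:
            failures.append(
                {"kind": "corrector", "t": float(m.group(1)), "h": float(m.group(2))}
            )
            continue
        m = MXSTEP_RE.search(line)
        if m:
            # The mxstep message does not report a step size
            failures.append({"kind": "mxstep", "t": float(m.group(1)), "h": None})
    return failures
```

Sorting the result by `t` makes it easy to see whether the failures occur at `t = 0` or after part of the step has been integrated.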

In the case of the first message, one can see that CVODE failed right away (At t = 0) which suggests that the state given to CVODE was wrong. If this happens right at the start of the simulation, your initial solution is most likely erroneous.

In the case of the second message, the system was integrated up to 2.459e-6 s, but CVODE was not able to proceed any further as its internal step size dropped to a small value. This could indicate that your CFL condition is too loose and that CVODE cannot properly handle the chemical stiffness. You can consider reducing your CFL number:

peleLM.cfl = 0.1

if your CFL step size is too large (generally >1e-5 s), e.g. for a slow, laminar case. This message can also appear if your state contains species mass fraction undershoots due to poor spatial resolution. In this case, one can use the following option:

ode.clean_init_massfrac = 1

where the ODE integration is then computed as an increment from an initial state in which the [0, 1] bounds on the species mass fractions are enforced.
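To judge whether the CFL step size mentioned above is in the problematic range, a rough estimate can be computed by hand. The Python sketch below assumes a purely advective estimate, dt = cfl * dx / u_max; the step actually selected by PeleLMeX may be subject to additional constraints. The function names and the cell size and velocity values are hypothetical, chosen only for illustration.

```python
def advective_dt(cfl, dx, u_max):
    """Advective time step estimate: dt = cfl * dx / u_max (assumed form)."""
    return cfl * dx / u_max


def cfl_for_target_dt(dt_target, dx, u_max):
    """CFL number that yields dt_target under the same estimate."""
    return dt_target * u_max / dx


# Hypothetical slow laminar case: 0.1 mm cells, 0.5 m/s peak velocity
dx, u_max = 1.0e-4, 0.5
dt = advective_dt(0.7, dx, u_max)            # about 1.4e-4 s, above the 1e-5 s ballpark
cfl = cfl_for_target_dt(1.0e-5, dx, u_max)   # about 0.05
```

In this made-up case, a CFL number of roughly 0.05 would bring the estimated step down to 1e-5 s.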