An extensible benchmark framework for differentiable physics solvers that treats gradient quality as a first-class criterion alongside forward accuracy and throughput.
What this measures, and why
If you train a model or run an optimizer through a physics simulation, the simulation has to return two things: a correct prediction (the forward pass) and a correct gradient of that prediction with respect to your inputs (the backward pass, or vector–Jacobian product, VJP). Most solver benchmarks only check the forward pass. Mosaic checks both, because a solver that runs fast and predicts accurately is still useless for gradient-based learning if its VJP is wrong, noisy, or numerically ill-conditioned.
Concretely, every solver is asked to compute an output \(y = f(x)\) and the VJP \(\bar{x} = (\partial f / \partial x)^\top \bar{y}\), and we score it on three axes: whether the forward solution matches a trusted reference, whether the gradient matches a finite-difference ground truth, and how much wall-clock time and memory each pass costs.
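To make the VJP definition concrete, here is a minimal sketch in plain Python. The toy function `f`, its hand-written `f_vjp`, and the test direction `v` are illustrative assumptions, not part of Mosaic. The sketch checks the identity \(\langle \bar{x}, v \rangle = \langle \bar{y}, J v \rangle\), with \(J v\) approximated by a central finite difference.

```python
import math

def f(x):
    """Toy stand-in for a solver's forward pass: 2 inputs -> 2 outputs."""
    x0, x1 = x
    return [x0 * x1, math.sin(x0)]

def f_vjp(x, y_bar):
    """Hand-written VJP: J(x)^T @ y_bar for J = [[x1, x0], [cos(x0), 0]]."""
    x0, x1 = x
    return [x1 * y_bar[0] + math.cos(x0) * y_bar[1], x0 * y_bar[0]]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

x = [0.5, 2.0]
y_bar = [1.0, -3.0]          # cotangent supplied by the caller
x_bar = f_vjp(x, y_bar)      # gradient pulled back to the inputs

# Check <x_bar, v> against <y_bar, J v>, with J v from central differences.
v = [0.3, -0.7]
eps = 1e-5
x_plus = [xi + eps * vi for xi, vi in zip(x, v)]
x_minus = [xi - eps * vi for xi, vi in zip(x, v)]
jv_fd = [(p - m) / (2 * eps) for p, m in zip(f(x_plus), f(x_minus))]

ad = dot(x_bar, v)
fd = dot(y_bar, jv_fd)
print(ad, fd)  # the two values should agree to many digits
```

The check only calls `f` and `f_vjp`, which is the same black-box view the benchmark takes of each solver.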
How solvers are compared
Mosaic is built on tesseract-core, which wraps each solver in a Docker container exposing a uniform apply / vjp interface, and tesseract-jax, which calls those containers as native JAX functions (with full JIT and grad support) from the benchmark harness. Because the harness talks only to the common apply / vjp interface, it can compare solvers built on incompatible AD backends (JAX, PyTorch, Julia Zygote, hand-written C++ adjoints) without any shared dependency.
Benchmark domains
| ID | Domain | Optimization task | Control dim. | Backends |
|---|---|---|---|---|
| H | Heat transfer | Conductivity inversion | 128 | deal.II, FEniCS, Firedrake, JAX-FEM, torch-fem |
| S | Structural mechanics | Compliance minimization (SIMP) | 2048 | deal.II, FEniCS, Firedrake, JAX-FEM, TopOpt.jl |
| F2 | Incompressible fluids (2D) | Inflow optimization for drag min. | 32 | JAX-CFD, PhiFlow, INS.jl, XLB, PICT, Warp-NS, OpenFOAM |
| F3 | 3D Navier–Stokes | Initial condition recovery | 12288 | PhiFlow, XLB, PICT, Warp-NS, Exponax, INS.jl, OpenFOAM |
The full catalog — per-solver numerical scheme, AD strategy, image name, schema fields, and known limitations — is on the Solver Reference page.
Each domain has a results page with every plot, a best-solver ranking per category, and the task setup: Navier–Stokes 2D · Navier–Stokes 3D · Structural mechanics · Heat transfer
Evaluation protocol
All metrics are computed uniformly across solvers through the Tesseract interface. The protocol is solver-agnostic: it operates on apply and vjp and never inspects solver internals.
Setup compatibility. For each solver–task pair, we record whether the solver produces a usable gradient, fails numerically, or cannot run because of structural constraints (e.g. periodic-only BCs on a channel domain). Structural incompatibilities are declared via problem.exclude(key, {solver_name: Exclusion(...)}); keys are matched by longest prefix (e.g. key="gradient" blocks every gradient/* experiment).
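The longest-prefix rule can be illustrated with a small standalone sketch. This is not Mosaic's implementation: the helper `match_exclusion`, the plain-string reasons, and the assumption that prefixes are matched on `/`-separated segments are all hypothetical, chosen only to show how a more specific key takes precedence over a broader one.

```python
def match_exclusion(experiment_id, rules):
    """Return the rule whose key is the longest prefix of experiment_id, or None.

    Assumption: a key matches if it equals the id or is followed by "/" in it.
    """
    best_key = None
    for key in rules:
        if experiment_id == key or experiment_id.startswith(key + "/"):
            if best_key is None or len(key) > len(best_key):
                best_key = key
    return None if best_key is None else rules[best_key]

# Hypothetical rules for one solver; the reasons are placeholder strings.
rules = {
    "gradient": "no VJP available",
    "gradient/jacobian_svd": "full Jacobian too large",
}

print(match_exclusion("gradient/fd_check", rules))     # no VJP available
print(match_exclusion("gradient/jacobian_svd", rules)) # full Jacobian too large
print(match_exclusion("forward/baseline", rules))      # None
```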
Gradient accuracy. Central finite differences (FD) through the Tesseract interface serve as the solver-agnostic ground truth. We never inspect a solver’s AD machinery, only the gradient it returns. Directional derivatives are evaluated along \(K\) random perturbation vectors (\(K = 6\text{--}20\)) and reported as:
- Cosine similarity between the AD gradient \(g_\text{ad}\) and the FD gradient \(g_\text{fd}\) (direction agreement, \(1\) = identical direction)
- Relative \(L^2\) error \(\dfrac{\lVert g_\text{ad} - g_\text{fd} \rVert}{\lVert g_\text{fd} \rVert}\) (magnitude agreement, \(0\) = exact)
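Both metrics are a few lines of arithmetic. The sketch below uses plain Python lists and made-up gradient values (an assumption for illustration) to show why the two are reported together: a gradient that is off by a constant factor has perfect cosine similarity but a large relative error.

```python
import math

def norm(g):
    return math.sqrt(sum(gi * gi for gi in g))

def cosine_similarity(g_ad, g_fd):
    """Direction agreement: 1 means the two gradients point the same way."""
    dot = sum(a * b for a, b in zip(g_ad, g_fd))
    return dot / (norm(g_ad) * norm(g_fd))

def relative_l2_error(g_ad, g_fd):
    """||g_ad - g_fd|| / ||g_fd||: 0 means the gradients are identical."""
    diff = [a - b for a, b in zip(g_ad, g_fd)]
    return norm(diff) / norm(g_fd)

# Illustrative values: the AD gradient is exactly twice the FD gradient.
g_fd = [1.0, 2.0, 2.0]
g_ad = [2.0, 4.0, 4.0]

print(cosine_similarity(g_ad, g_fd))  # 1.0: direction is right
print(relative_l2_error(g_ad, g_fd))  # 1.0: magnitude is off by 100%
```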
The FD step size is swept over \(\varepsilon \in \{10^{-6}, \dots, 10^{-1}\}\) and the minimum error is reported, avoiding both the truncation regime (large \(\varepsilon\)) and the roundoff regime (small \(\varepsilon\)). For Navier–Stokes problems (F2, F3), the full Jacobian is additionally computed and its singular-value spectrum inspected. A wide spread of singular values signals an ill-conditioned gradient that will stall an optimizer.
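The step-size sweep can be sketched as follows. The scalar `loss`, its analytic gradient `grad_loss` (standing in for a solver's AD gradient), and the point and direction used are assumptions for illustration; only the central-difference formula and the \(\varepsilon\) range come from the protocol above.

```python
import math

def loss(x):
    """Toy scalar objective standing in for a solver plus loss."""
    return sum(math.sin(xi) * math.exp(0.5 * xi) for xi in x)

def grad_loss(x):
    """Analytic gradient, playing the role of the AD gradient."""
    return [math.exp(0.5 * xi) * (math.cos(xi) + 0.5 * math.sin(xi)) for xi in x]

def fd_directional(f, x, v, eps):
    """Central finite difference of f along direction v."""
    x_plus = [xi + eps * vi for xi, vi in zip(x, v)]
    x_minus = [xi - eps * vi for xi, vi in zip(x, v)]
    return (f(x_plus) - f(x_minus)) / (2 * eps)

def sweep_steps(f, g_ad, x, v, steps):
    """Relative error of <g_ad, v> vs. the FD estimate for each step size."""
    ad = sum(g * vi for g, vi in zip(g_ad, v))
    return {eps: abs(fd_directional(f, x, v, eps) - ad) / abs(ad) for eps in steps}

x = [0.3, -1.2, 2.0]
v = [0.5, -0.25, 1.0]
steps = [10.0 ** k for k in range(-6, 0)]  # 1e-6 ... 1e-1

errors = sweep_steps(loss, grad_loss(x), x, v, steps)
best_eps = min(errors, key=errors.get)
for eps in steps:
    print(f"eps={eps:.0e}  rel_err={errors[eps]:.2e}")
print("reported:", best_eps, errors[best_eps])
```

For a smooth function like this one, the error at \(\varepsilon = 10^{-1}\) is dominated by truncation, and the minimum sits at an intermediate step size.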
Performance. Forward and VJP wall-clock time, their ratio, and peak memory, measured on each solver’s intended hardware target across 3–4 problem sizes. All timings are averaged over 3 runs after 1 warmup iteration (to pre-populate JIT caches and stabilize memory allocation).
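A timing loop following this recipe might look like the sketch below. The helper `time_call` and the dummy workload are hypothetical, not the harness's own code. For asynchronous backends the timed callable would also need to block until the result is ready, which this sketch leaves to the caller.

```python
import time

def time_call(fn, n_warmup=1, n_runs=3):
    """Mean wall-clock time of fn() over n_runs, after n_warmup untimed calls."""
    for _ in range(n_warmup):
        fn()  # warmup: fills JIT caches, settles allocations
    samples = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return sum(samples) / len(samples)

# Dummy workload that counts how often it is invoked.
calls = {"n": 0}

def fake_forward():
    calls["n"] += 1
    return sum(i * i for i in range(10_000))

t_fwd = time_call(fake_forward)
print(f"forward: {t_fwd:.6f} s over 3 timed runs ({calls['n']} calls in total)")
```

The VJP-to-forward ratio is then the quotient of two such measurements taken at the same problem size.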
Forward accuracy. Resolution sweep against a reference solver (OpenFOAM for fluids, deal.II for structures/thermal). Where an analytical solution exists (the Taylor–Green vortex), precision is also measured against it. Physical-law adherence (e.g. a divergence-free velocity field, \(\nabla \cdot \mathbf{u} = 0\)) is checked where applicable.
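As an illustration of a physical-law check, the sketch below samples a 2D Taylor–Green velocity field at \(t = 0\) on a periodic grid and measures its discrete divergence with central differences. The grid size, the unit amplitude, the sign convention, and the choice of stencil are assumptions; the benchmark's own check may differ.

```python
import math

def taylor_green(n):
    """2D Taylor-Green velocity at t=0 on an n x n periodic grid over [0, 2*pi)."""
    h = 2 * math.pi / n
    u = [[math.sin(i * h) * math.cos(j * h) for j in range(n)] for i in range(n)]
    v = [[-math.cos(i * h) * math.sin(j * h) for j in range(n)] for i in range(n)]
    return u, v, h

def max_abs_divergence(u, v, h):
    """Largest |du/dx + dv/dy| using periodic central differences."""
    n = len(u)
    worst = 0.0
    for i in range(n):
        for j in range(n):
            dudx = (u[(i + 1) % n][j] - u[(i - 1) % n][j]) / (2 * h)
            dvdy = (v[i][(j + 1) % n] - v[i][(j - 1) % n]) / (2 * h)
            worst = max(worst, abs(dudx + dvdy))
    return worst

u, v, h = taylor_green(32)
div_max = max_abs_divergence(u, v, h)
print(f"max |div u| = {div_max:.2e}")  # roundoff-level for this field and stencil
```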
Optimization convergence. The ultimate test: can you actually optimize through the solver? We run Adam with each solver’s own gradients on each benchmark task for a fixed iteration budget (500 iterations for H, F2, F3; 2500 for S). A solver succeeds if it reaches a final objective within 1% of the best solution any solver achieved within the budget. This catches solvers whose gradients pass the FD check pointwise but still fail to drive a full optimization loop.
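The success criterion reduces to a threshold test on final objectives. The sketch below assumes a minimization task and interprets "within 1%" as relative to the magnitude of the best value; the solver names and numbers are placeholders, not benchmark results.

```python
def successful_solvers(final_objectives, rel_tol=0.01):
    """Mark each solver as successful if its final objective is within
    rel_tol (relative) of the best final objective across solvers.

    Assumption: lower is better.
    """
    best = min(final_objectives.values())
    threshold = best + rel_tol * abs(best)
    return {name: obj <= threshold for name, obj in final_objectives.items()}

# Placeholder values for three hypothetical solvers.
final_objectives = {"solver_a": 1.000, "solver_b": 1.005, "solver_c": 1.200}
outcome = successful_solvers(final_objectives)
print(outcome)  # {'solver_a': True, 'solver_b': True, 'solver_c': False}
```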
Benchmark suites
The CLI exposes four suites, each running a set of experiments:
| Suite | CLI command | Experiments | What it measures |
|---|---|---|---|
| forward | mosaic run --suites forward | baseline, agreement, physical_laws | Forward accuracy vs. reference and analytic solutions |
| cost | mosaic run --suites cost | spatial_cost, temporal_cost, vjp_cost | Wall-clock scaling with problem size |
| gradient | mosaic run --suites gradient | fd_check, param_sweep, horizon_sweep, jacobian_svd | Gradient correctness and conditioning |
| optimization | mosaic run --suites optimization | domain-specific (e.g. recovery, drag_opt, topopt) | End-to-end optimization convergence |
Results are saved to mosaic-results/<problem>/<suite>/ as JSON (metrics), NPZ (arrays), and PNG/PDF (plots). Per-domain breakdowns with embedded plots live under the Results menu.
→ Head over to Getting Started for installation, the smoke-test workflow, and the full CLI reference. To call a Mosaic solver from your own code without the harness, see Use Mosaic solvers elsewhere. For questions and support, visit the Tesseract Forum.
