Getting Started

This guide walks through setting up Mosaic, running the benchmarks yourself, and reading the output. Running locally also lets you reproduce the Results on your own target hardware, which gives the most accurate measurements for your setup.

Prerequisites

Python >= 3.10
Docker — must be running. All solvers execute inside Docker containers.
Several dozen GB of disk space for building all solver containers; individual images range from ~1 GB to ~10 GB.
RAM: 16 GB minimum; 32 GB recommended for 3D fluid problems.
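
The checklist above can be partly automated. The following is a minimal sketch, not part of Mosaic: the function name preflight and the 40 GB threshold are assumptions (the threshold is only a stand-in for "several dozen GB"). It checks the Python version, whether a docker binary is on PATH, and free disk space. It does not confirm that the Docker daemon is actually running.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)   # from the prerequisites list
MIN_FREE_GB = 40       # assumption: stand-in for "several dozen GB"


def preflight(path=".", min_free_gb=MIN_FREE_GB):
    """Return {check name: passed?} for the local machine."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return {
        "python>=3.10": sys.version_info >= MIN_PYTHON,
        # Only checks that the binary exists, not that the daemon is up.
        "docker on PATH": shutil.which("docker") is not None,
        f"free disk >= {min_free_gb} GB": free_gb >= min_free_gb,
    }


for name, ok in preflight().items():
    print(f"{'ok  ' if ok else 'FAIL'}  {name}")
```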

GPU support (optional)

GPU-enabled solvers (marked GPU in the Solver Reference) run efficiently on a CUDA-capable NVIDIA GPU. To enable that path you’ll need:

An NVIDIA GPU with CUDA support
NVIDIA Container Toolkit installed and configured
Docker configured to use the nvidia runtime

Autodiff without gradient checkpointing materialises the entire trajectory in memory, so RAM/VRAM can saturate quickly on longer rollouts — more memory is better.
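
To get a feel for how quickly a stored trajectory grows, here is a back-of-the-envelope estimate. The grid size, channel count, and step count are hypothetical, and the figure is a lower bound: it counts one saved state per step, whereas an autodiff framework typically keeps additional intermediates.

```python
def trajectory_bytes(n_steps, grid_shape, n_channels, bytes_per_value=4):
    """Bytes needed to keep one state per step alive for the backward pass."""
    cells = 1
    for n in grid_shape:
        cells *= n
    return n_steps * cells * n_channels * bytes_per_value


# Hypothetical rollout: 128^3 grid, 3 velocity components, float32, 1000 steps.
gib = trajectory_bytes(1000, (128, 128, 128), 3) / 2**30
print(f"{gib:.1f} GiB")  # 23.4 GiB
```

Each state is 24 MiB in this example, so 1000 steps already exceed the 16 GB minimum listed under Prerequisites.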

Installation

git clone https://github.com/pasteurlabs/mosaic
cd mosaic

# Pick one
uv sync --extra dev        # uv (recommended)
pip install -e ".[dev]"    # pip

pre-commit install         # optional: enables lint checks on commit

Verify the installation:

mosaic --help
mosaic status        # shows all problems and solvers (no Docker needed)

Running a single solver (smoke test)

The fastest way to verify everything works is to build one solver and run a quick forward check. The first two commands come straight from tesseract-core (tesseract build, tesseract run) — Mosaic adds nothing on top.

# Build the Exponax solver (small, fast, JAX-only)
tesseract build mosaic/tesseracts/navier-stokes-grid/exponax

# Run a single forward call (uses the built image)
tesseract run exponax_navier_stokes_grid apply '{}'

# Run the forward accuracy suite for just this solver
mosaic run -p ns-3d-grid --suites forward -s Exponax

This builds one container (~2 min), runs a single forward simulation, and produces accuracy plots in mosaic-results/ns-3d-grid/forward/.

Running a benchmark suite

Each suite can target a specific problem and (optionally) a specific solver:

# Forward accuracy for all NS-grid solvers
mosaic run -p ns-grid --suites forward

# Gradient finite-difference check only (one experiment, quick debug mode)
mosaic run -p ns-grid -e gradient/fd_check --debug

# Cost scaling for structural mechanics
mosaic run -p structural-mesh --suites cost

# Full optimization convergence
mosaic run -p thermal-mesh --suites optimization

Useful flags

Flag	Effect
`-s <solver,…>`	Restrict to listed solvers. A flat CSV is a union set: each problem keeps only the listed solvers that exist there, problems with zero matches are skipped. Per-problem overrides via `<problem>=<csv>;…`.
`-e <suite>/<exp>`	Run only one experiment within a suite
`-e <suite>/<exp>/<ic>`	Pin one initial condition (e.g. `forward/agreement/tgv`)
`--debug`	Small problem size for quick iteration
`--no-build`	Skip container builds (use existing images)
`--no-plots`	Skip plot generation
`--plots-only`	Regenerate plots from existing results
`--gpus 0,1,2`	Distribute solvers across multiple GPUs
`--only <state[,…]>`	Re-run only cells matching one of `failed,anom,missing,stale,excluded` (skips fresh-ok). Combinable.

Running everything

mosaic run                                           # all suites, all problems
mosaic run --problems ns-grid,structural-mesh        # filter problems
mosaic run --suites forward,gradient                 # filter suites

Selecting solvers

-s (alias --solvers) accepts two forms:

# Flat CSV — a union set applied to every problem in -p.
# Each problem keeps only the listed solvers that exist there; problems
# with zero matches are skipped (not silently expanded to "run all").
mosaic run -s OpenFOAM,XLB,deal.II,JAX-FEM

# Per-problem map — explicit picks per domain. Problems not listed in
# the map pass through unchanged (all solvers).
mosaic run -s "ns-grid=XLB,jax-cfd;structural-mesh=Firedrake,JAX-FEM"

Names must match the display form exactly (XLB, OpenFOAM, deal.II, JAX-FEM, …). A name not in any registered problem aborts the run with a “Did you mean…?” hint before any image build.
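
The selection rules can be made concrete with a short sketch. This is an illustration of the semantics described above, not Mosaic's implementation: parse_spec, select_solvers, and the REGISTRY contents are all hypothetical, and the "Did you mean" hint is approximated here with difflib.

```python
import difflib


def parse_spec(spec):
    """Return a list (flat CSV) or a dict (per-problem map) from a -s style string."""
    def split_csv(csv):
        return [name.strip() for name in csv.split(",") if name.strip()]

    if "=" not in spec:
        return split_csv(spec)
    mapping = {}
    for entry in filter(None, (e.strip() for e in spec.split(";"))):
        problem, csv = entry.split("=", 1)
        mapping[problem.strip()] = split_csv(csv)
    return mapping


def select_solvers(spec, registry):
    """Apply the rules above to a {problem: [solver, ...]} registry."""
    parsed = parse_spec(spec)
    flat = isinstance(parsed, list)
    requested = parsed if flat else [n for names in parsed.values() for n in names]
    known = sorted({n for names in registry.values() for n in names})
    for name in requested:
        if name not in known:  # exact, case-sensitive match on the display name
            hint = difflib.get_close_matches(name, known, n=1)
            suffix = f" Did you mean {hint[0]!r}?" if hint else ""
            raise ValueError(f"Unknown solver {name!r}.{suffix}")

    selected = {}
    for problem, solvers in registry.items():
        wanted = parsed if flat else parsed.get(problem)
        # Problems missing from a per-problem map keep all their solvers.
        keep = list(solvers) if wanted is None else [s for s in solvers if s in wanted]
        if keep:  # problems with zero matches are skipped
            selected[problem] = keep
    return selected


# Hypothetical registry; the problem/solver pairings are illustrative only.
REGISTRY = {
    "ns-grid": ["XLB", "jax-cfd", "OpenFOAM"],
    "structural-mesh": ["Firedrake", "JAX-FEM", "deal.II"],
    "thermal-mesh": ["Firedrake", "deal.II"],
}

print(select_solvers("OpenFOAM,XLB", REGISTRY))
print(select_solvers("ns-grid=XLB;structural-mesh=JAX-FEM", REGISTRY))
```

With this made-up registry, the flat form keeps only ns-grid (the other problems have no match and are skipped), while the map form leaves thermal-mesh untouched because it is not named in the map.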

Re-running a subset by status

After an initial run, --only re-executes only the cells in a given state, leaving fresh-ok cells untouched (the merge-aware result writer keeps their entries intact). Useful for iterating on a single solver or recovering from a partial failure without redoing everything.

mosaic run --only failed                  # re-run only failed cells
mosaic run --only failed,stale            # plus anything stale
mosaic run --only missing                 # first-time runs only
mosaic run --only failed,missing,stale    # failed, missing, and stale cells together
mosaic run -s PhiFlow --only excluded     # re-check after dropping an exclusion

States: failed, anom, missing, stale, excluded. Compose multiple with commas. Combinable with -p / --suites / -e / -s for finer scoping.
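
As a rough mental model, --only acts as a filter over per-cell states. The sketch below assumes a hypothetical snapshot that maps (problem, solver, experiment) to a state string. The cells dict and cells_to_rerun are illustrative names, not Mosaic's data model.

```python
VALID_STATES = {"failed", "anom", "missing", "stale", "excluded"}


def cells_to_rerun(cells, only):
    """Return the cells whose state is listed in a comma-separated --only value."""
    wanted = {s.strip() for s in only.split(",") if s.strip()}
    unknown = wanted - VALID_STATES
    if unknown:
        raise ValueError(f"unknown state(s): {sorted(unknown)}")
    return [cell for cell, state in cells.items() if state in wanted]


# Hypothetical status snapshot: (problem, solver, experiment) -> state.
cells = {
    ("ns-grid", "XLB", "forward/baseline"): "fresh-ok",
    ("ns-grid", "OpenFOAM", "forward/baseline"): "failed",
    ("ns-grid", "PhiFlow", "forward/baseline"): "excluded",
    ("thermal-mesh", "Firedrake", "cost"): "stale",
}
print(cells_to_rerun(cells, "failed,stale"))
```

Here the fresh-ok and excluded cells are left alone, and only the failed and stale ones are selected.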

Warning

mosaic run builds ~20 Docker containers and runs the complete evaluation. Expect several hours on a machine with a modern GPU, or much longer on CPU only. The container images add up to dozens of GB on disk. Consider starting with a single problem (mosaic run --problems thermal-mesh) to validate your setup before committing the time and storage.

Resuming after a crash

A long run may get killed mid-experiment by an OOM, host reboot, or SIGKILL. --continue resumes from where the prior invocation left off, at two granularities:

Per experiment. Experiments whose result.json already exists under the output directory are skipped entirely. Multi-IC experiments count as done once every IC subdir has a result.json.
Per solver. Within a partially completed experiment, each harness writes result_partial.json after each solver finishes; on resume, solvers already recorded there are filtered out, so only the remaining solvers run.

mosaic run --continue                       # resume all problems / suites
mosaic run -p ns-grid --continue            # scope to one problem
mosaic run --only failed --continue         # only retry failed cells, skip ok

--continue composes with --only — useful when a long sweep mostly succeeded but a few cells timed out: --only failed --continue re-runs just those without restarting fresh-ok cells.
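
The two resume granularities can be sketched as two small checks. This is an illustration only: the function names are hypothetical, and the layout of result_partial.json is not specified in this guide, so the code assumes a JSON object whose top-level keys are solver names.

```python
import json
import tempfile
from pathlib import Path


def experiment_done(exp_dir, ic_names=()):
    """True once result.json exists (for multi-IC runs: in every IC subdir)."""
    exp_dir = Path(exp_dir)
    if ic_names:
        return all((exp_dir / ic / "result.json").is_file() for ic in ic_names)
    return (exp_dir / "result.json").is_file()


def solvers_remaining(exp_dir, solvers):
    """Drop solvers already recorded in result_partial.json."""
    partial = Path(exp_dir) / "result_partial.json"
    if not partial.is_file():
        return list(solvers)
    # Assumption: the file is a JSON object whose top-level keys are solver names.
    done = set(json.loads(partial.read_text()))
    return [s for s in solvers if s not in done]


# Demo in a throwaway directory with a made-up partial result.
with tempfile.TemporaryDirectory() as tmp:
    exp = Path(tmp) / "ns-grid" / "forward" / "baseline"
    exp.mkdir(parents=True)
    (exp / "result_partial.json").write_text(json.dumps({"XLB": {}}))
    done_before = experiment_done(exp)
    remaining = solvers_remaining(exp, ["XLB", "OpenFOAM"])
    (exp / "result.json").write_text("{}")
    done_after = experiment_done(exp)

print(done_before, remaining, done_after)  # False ['OpenFOAM'] True
```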

Understanding output

Results are organized by problem and suite:

mosaic-results/
  ns-grid/
    forward/
      baseline/result.json    # per-solver forward errors
      agreement/result.json   # cross-solver agreement
      *.png                   # generated plots
    gradient/
      fd_check/result.json    # FD verification results
    cost/
      result.json             # wall-clock scaling data
    optimization/
      recovery/result.json    # optimization convergence
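
For scripting against this layout, a small helper can enumerate every result.json without assuming anything about the file contents. The helper name find_results is hypothetical, and the demo rebuilds a slice of the tree above in a temporary directory rather than reading real results.

```python
import tempfile
from pathlib import Path


def find_results(root="mosaic-results"):
    """Yield (problem, suite, experiment, path) for each result.json under root."""
    root = Path(root)
    if not root.is_dir():
        return
    for path in sorted(root.rglob("result.json")):
        parts = path.relative_to(root).parts
        if len(parts) < 3:  # expect at least problem/suite/result.json
            continue
        problem, suite = parts[0], parts[1]
        # "-" when result.json sits directly in the suite directory (e.g. cost/).
        experiment = "/".join(parts[2:-1]) or "-"
        yield problem, suite, experiment, path


# Demo: recreate part of the tree in a temp dir and list it.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "mosaic-results"
    for rel in ("ns-grid/forward/baseline", "ns-grid/gradient/fd_check", "ns-grid/cost"):
        (root / rel).mkdir(parents=True)
        (root / rel / "result.json").write_text("{}")
    found = [(p, s, e) for p, s, e, _ in find_results(root)]

for row in found:
    print(*row)
```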

Use mosaic status to get a summary table of all completed experiments:

mosaic status                             # full per-problem tables
mosaic status -p ns-grid -f               # single problem with failure reasons
mosaic status --format md > report.md     # export as markdown
mosaic status --format json > snap.json   # machine-readable snapshot

Troubleshooting

Container start failures

A solver whose container fails to start — broken image, missing CUDA runtime, or an import error inside the API module — surfaces as a failed cell with the underlying exception message in mosaic status -f. (Previously these dropped silently as NOT_RUN, indistinguishable from “wasn’t selected”; the runner now records them via an on_error callback so the status pipeline can classify the cell as FAILED.) Retry the cell with mosaic run --only failed once the container is fixed.

Container build failures

If tesseract build fails:

Check Docker is running: docker info
Check disk space: docker system df
For GPU solvers, verify the NVIDIA runtime: docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
Try building with verbose output: tesseract build <path> --verbose
Inspect the build log for missing system dependencies (common with FEniCS/Firedrake)
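
The first two checks in the list above can be run in one go. The wrapper below is a sketch and not part of Mosaic or tesseract-core: run_checks and CHECKS are hypothetical names, and the only external commands used are docker info and docker system df from the list.

```python
import subprocess

CHECKS = {
    "docker daemon reachable": ["docker", "info"],
    "docker disk usage": ["docker", "system", "df"],
}


def run_checks(checks=None, timeout=60):
    """Run each command; return {name: (passed, short message)}."""
    results = {}
    for name, cmd in (checks or CHECKS).items():
        try:
            proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
            message = (proc.stderr or proc.stdout).strip()[:200]
            results[name] = (proc.returncode == 0, message)
        except (OSError, subprocess.TimeoutExpired) as exc:
            # OSError covers a missing binary; TimeoutExpired covers a hung daemon.
            results[name] = (False, str(exc))
    return results
```

Calling run_checks() and printing the failing entries gives a quick first diagnosis before digging into the build log.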

Solver timeouts

Solvers that exceed the HTTP watchdog timeout (1200s default) are killed automatically. This typically happens with:

CPU-only solvers at large grid sizes
Long rollouts on slow solvers

Use --debug for a smaller problem size, or run with -s <solver> to isolate.

Memory issues

3D fluid problems at high resolution can exceed available (V)RAM. Symptoms include:

OOM kills (check dmesg or docker logs)
Silent NaN outputs

Reduce the resolution, or use --gpus to limit how many solvers run concurrently.
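
Because NaN outputs are silent, it can help to scan a result file for non-finite numbers. The sketch below assumes that such values appear as NaN or Infinity literals in the JSON, which Python's json module accepts by default; whether Mosaic serialises them this way is an assumption. The sample payload is made up.

```python
import json
import math


def find_non_finite(value, path="$"):
    """Yield JSON paths whose numeric value is NaN or infinite."""
    if isinstance(value, float) and not math.isfinite(value):
        yield path
    elif isinstance(value, dict):
        for key, child in value.items():
            yield from find_non_finite(child, f"{path}.{key}")
    elif isinstance(value, list):
        for i, child in enumerate(value):
            yield from find_non_finite(child, f"{path}[{i}]")


# Made-up payload; real result.json files will have a different structure.
sample = json.loads('{"solver": "A", "errors": [0.01, NaN, 0.03], "cost": {"wall_s": Infinity}}')
print(list(find_non_finite(sample)))  # ['$.errors[1]', '$.cost.wall_s']
```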

Getting help

If you run into problems not covered here, visit the Tesseract Forum for community support, or open an issue on GitHub.