Advanced Usage

File aliasing

The tesseract command can take care of passing data from local disk (or any fsspec-compatible resource, such as HTTP, FTP, S3 buckets, and so on) to a Tesseract via the @ syntax.

You can mount a folder into a Tesseract with --input-path. The input path is mounted with read-only permissions, so a Tesseract will never mutate files located at the input path. Paths in a Tesseract’s payload have to be relative to --input-path:

tesseract run filereference apply \
    --input-path ./testdata \
    --output-path ./output \
    '{"inputs": {"data": ["sample_2.json", "sample_3.json"]}}'

See examples/filereference

If you want to write the output of a Tesseract to a file, you can use the --output-path parameter, which also supports any fsspec-compatible target path:

$ tesseract run vectoradd apply --output-path /tmp/output @inputs.json

Logging metrics and artifacts

Tesseracts may log metrics and artifacts (e.g. iteration numbers, VTK files, …) as demonstrated in the metrics example Tesseract.

# Copyright 2025 Pasteur Labs. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0

from pydantic import BaseModel

from tesseract_core.runtime.experimental import log_artifact, log_metric, log_parameter


class InputSchema(BaseModel):
    pass


class OutputSchema(BaseModel):
    pass


def apply(inputs: InputSchema) -> OutputSchema:
    """This demonstrates logging parameters, metrics and artifacts."""
    print("This is a message from the apply function.")

    log_parameter("example_param", "value")

    for step in range(10):
        metric_value = step**2
        log_metric("squared_step", metric_value, step=step)

    text = "This is an output file we want to log as an artifact."
    with open("/tmp/artifact.txt", "w") as f:
        f.write(text)

    log_artifact("/tmp/artifact.txt")
    return OutputSchema()

By default, Tesseracts log metrics, parameters, and artifacts to a directory named logs inside the Tesseract’s --output-path. (Note that, when running Tesseracts in a container, the log directory is placed inside the container.)

Alternatively, you can log metrics and artifacts to an MLflow server by setting the TESSERACT_MLFLOW_TRACKING_URI environment variable. For local development, you can spin up an MLflow server (ready to use with Tesseract) through the provided docker-compose file:

docker-compose -f extra/mlflow/docker-compose-mlflow.yml up

Launch the metrics example Tesseract with the following volume mount, network, and TESSERACT_MLFLOW_TRACKING_URI to ensure that it connects to that MLflow server.

tesseract serve --network=tesseract-mlflow-server --env=TESSERACT_MLFLOW_TRACKING_URI=http://mlflow-server:5000 --volume mlflow-data:/mlflow-data:rw metrics

The same options apply when executing Tesseracts through tesseract run.

As an alternative to the MLflow setup we provide, you can point your Tesseract to a custom MLflow server:

$ tesseract serve --env=TESSERACT_MLFLOW_TRACKING_URI="..."  metrics

Note that if your MLflow server uses basic auth, you need to set the MLFLOW_TRACKING_USERNAME and MLFLOW_TRACKING_PASSWORD environment variables for the Tesseract to be able to authenticate to it.

$ tesseract serve --env=TESSERACT_MLFLOW_TRACKING_URI="..." \
    --env=MLFLOW_TRACKING_USERNAME="..." --env=MLFLOW_TRACKING_PASSWORD="..." \
    metrics

If you wish to pass additional parameters to the MLflow run (such as tags, run name, or description), you can do so via the TESSERACT_MLFLOW_RUN_EXTRA_ARGS environment variable. This accepts a Python dictionary string that is passed directly to mlflow.start_run(). See supported parameters in the mlflow documentation.

Example: Setting tags only

$ tesseract serve --env=TESSERACT_MLFLOW_TRACKING_URI="..." \
    --env=TESSERACT_MLFLOW_RUN_EXTRA_ARGS='{"tags": {"key1": "value1", "key2": "value2"}}' \
    metrics

Example: Setting run name and tags

$ tesseract serve --env=TESSERACT_MLFLOW_TRACKING_URI="..." \
    --env=TESSERACT_MLFLOW_RUN_EXTRA_ARGS='{"run_name": "my_experiment", "tags": {"env": "production"}}' \
    metrics

Example: Multiple parameters

$ tesseract serve --env=TESSERACT_MLFLOW_TRACKING_URI="..." \
    --env=TESSERACT_MLFLOW_RUN_EXTRA_ARGS='{"run_name": "test_run", "description": "Testing new feature", "tags": {"version": "1.0"}}' \
    metrics
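To illustrate how such a dict string can map onto keyword arguments for mlflow.start_run(), here is a minimal sketch (parse_extra_args is a hypothetical helper, not the actual runtime code) that accepts Python dict literals safely via ast.literal_eval:

```python
import ast


def parse_extra_args(raw: str) -> dict:
    # The variable holds a Python dict literal; parse it with
    # ast.literal_eval rather than eval to avoid executing code.
    value = ast.literal_eval(raw)
    if not isinstance(value, dict):
        raise ValueError("TESSERACT_MLFLOW_RUN_EXTRA_ARGS must be a dict")
    return value
```

The returned dict could then be splatted into the run, e.g. mlflow.start_run(**extra_args).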

Volume mounts and user permissions

When mounting a volume into a Tesseract container, default behavior depends on the Docker engine being used. Specifically, Docker Desktop, Docker Engine, and Podman have different ways of handling user permissions for mounted volumes.

Tesseract tries to ensure that the container user has the same permissions as the host user running the tesseract command. This is done by setting the user ID and group ID of the container user to match those of the host user.

In cases where this fails or is not desired, you can explicitly set the user ID and group ID of the container user using the --user argument. This allows you to specify a different user or group for the container, which can be useful for ensuring proper permissions when accessing mounted volumes.

Warning

In cases where the Tesseract user is neither root nor the local user / file owner, you may encounter permission issues when accessing files in mounted volumes. To resolve this, ensure that the user ID and group ID are set correctly using the --user argument, or modify the permissions of files to be readable by any user.

Passing environment variables to Tesseract containers

Through the optional --env argument, you can pass environment variables to Tesseracts. This works both for serving a Tesseract and running a single execution:

$ tesseract serve --env=MY_ENV_VARIABLE="some value" helloworld
$ tesseract run --env=MY_ENV_VARIABLE="some value" helloworld apply '{"inputs": {"name": "Osborne"}}'
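Inside tesseract_api.py, a variable passed this way is available through the standard process environment. A minimal sketch (the function name and default value are illustrative):

```python
import os


def greeting_prefix() -> str:
    # Read a variable passed via `--env=MY_ENV_VARIABLE=...`,
    # falling back to a default when it is unset.
    return os.environ.get("MY_ENV_VARIABLE", "default value")
```

Such a helper can be called from apply() or any other endpoint function.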

Parallelism and worker processes

By default, Tesseracts run with a single worker process. When handling multiple concurrent requests, you can increase the number of workers using the --num-workers argument to tesseract serve or the num_workers parameter in the Python SDK. (This option is not available for tesseract run, which processes a single request and exits.)

Each worker runs as a separate process (using multiprocessing under the hood), so they are not affected by the GIL but also don’t share in-process state.

When to use multiple workers

Multiple workers are useful when:

  • Handling concurrent requests — If multiple clients will call your Tesseract simultaneously, each worker can handle one request at a time. With a single worker, requests are processed sequentially.

  • CPU-bound computations — If your Tesseract performs CPU-intensive work and you have multiple cores available, multiple workers can process requests in parallel.

  • Batch processing — When processing many independent inputs, you can submit them concurrently and let workers handle them in parallel.

When NOT to use multiple workers

Stick with a single worker when:

  • GPU-bound computations — GPUs typically can’t run multiple processes efficiently. If your Tesseract uses a GPU, multiple workers will compete for GPU resources and may cause out-of-memory errors or slowdowns.

  • High memory usage — Each worker loads its own copy of the model/data into memory. If your Tesseract uses 4GB of RAM, 4 workers will use 16GB total.

  • Stateful operations — Workers don’t share state. If your computation requires shared state between requests, multiple workers won’t work correctly.

CLI usage

# Serve with 4 worker processes
$ tesseract serve --num-workers 4 my-tesseract

Python SDK usage

from concurrent.futures import ThreadPoolExecutor

from tesseract_core import Tesseract

# A batch of independent payloads (shape depends on your Tesseract's InputSchema)
batch = [{"name": f"client-{i}"} for i in range(8)]

# Serve with multiple workers
with Tesseract.from_image("my-tesseract", num_workers=4) as t:
    # Process requests concurrently using threads
    with ThreadPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(t.apply, batch))

Choosing the right number of workers

A reasonable starting point:

  • For CPU-bound Tesseracts: num_workers = number of CPU cores

  • For I/O-bound Tesseracts (e.g., calling external APIs): num_workers = 2 * number of CPU cores

  • For GPU-bound Tesseracts: num_workers = 1 (or match the number of GPUs if using --gpus)

Monitor memory usage and adjust accordingly. More workers isn’t always better: context-switching overhead can reduce throughput if you use too many.
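The starting points above can be written down as a small heuristic. This is a sketch, not part of the Tesseract API; the function name and workload labels are illustrative:

```python
import os


def suggested_num_workers(workload: str, num_gpus: int = 0) -> int:
    """Heuristic starting point for --num-workers, per the guidelines above."""
    cores = os.cpu_count() or 1
    if workload == "gpu":
        # One worker per GPU avoids contention for device memory.
        return max(1, num_gpus)
    if workload == "io":
        # I/O-bound work tolerates oversubscription.
        return 2 * cores
    # CPU-bound: one worker per core.
    return cores
```

Treat the result as a first guess and tune it against observed memory use and throughput.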

Using GPUs

To leverage GPU support in your Tesseract environment, you can specify which NVIDIA GPU(s) to make available using the --gpus argument when running a Tesseract command. This allows you to select specific GPUs or enable all available GPUs for a task.

To run Tesseract on a specific GPU, provide its index:

$ tesseract run --gpus 0 helloworld apply '{"inputs": {"name": "Osborne"}}'

To make all available GPUs accessible, use the --gpus all option:

$ tesseract run --gpus all helloworld apply '{"inputs": {"name": "Osborne"}}'

You can also specify multiple GPUs individually:

$ tesseract run --gpus 0 --gpus 1 helloworld apply '{"inputs": {"name": "Osborne"}}'

GPUs are indexed starting at zero, following the same convention as nvidia-smi.
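From inside the Tesseract, you can check which devices were made available. This sketch assumes the container uses the NVIDIA Container Toolkit, which typically records the selected devices in CUDA_VISIBLE_DEVICES or NVIDIA_VISIBLE_DEVICES (the helper name is hypothetical):

```python
import os


def visible_gpus() -> list:
    # Assumption: with the NVIDIA Container Toolkit, the selected devices
    # show up in CUDA_VISIBLE_DEVICES (or NVIDIA_VISIBLE_DEVICES).
    raw = os.environ.get("CUDA_VISIBLE_DEVICES") or os.environ.get(
        "NVIDIA_VISIBLE_DEVICES", ""
    )
    if raw in ("", "void"):
        return []
    if raw == "all":
        return ["all"]
    return raw.split(",")
```

Note that device indices inside the container may be renumbered relative to the host.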

Deploying and interacting with Tesseracts on HPC clusters

Running Tesseracts on high-performance computing clusters can have many use cases including:

  • Deployment of a single long-running component of a pipeline on a state-of-the-art GPU.

  • Running an entire optimization workflow on a dedicated compute node.

  • Large parameter scans distributed in parallel over a multitude of cores.

All of this is possible even in scenarios where containerization is unavailable or incompatible, by using tesseract-runtime directly (which includes a serve feature). For more details, please see our tutorial, which demonstrates how to launch uncontainerized Tesseracts using SLURM, either as a batch job or for interactive use.

Running Tesseracts without containers

In some environments, containerization may not be available or desirable. You can run Tesseracts directly using the tesseract-runtime CLI, which is the same command that runs inside Tesseract containers.

To set this up:

  1. Install tesseract-core in your Python environment (see Development installation).

  2. Install your Tesseract’s dependencies: pip install -r tesseract_requirements.txt

  3. Set the TESSERACT_API_PATH environment variable to point to your tesseract_api.py

Then use tesseract-runtime instead of tesseract run:

# Instead of:
$ tesseract run helloworld apply '{"inputs": {"name": "Tessie"}}'

# Use:
$ export TESSERACT_API_PATH=/path/to/tesseract_api.py
$ tesseract-runtime apply '{"inputs": {"name": "Tessie"}}'

The tesseract-runtime CLI supports the same endpoints and options as containerized Tesseracts. Run tesseract-runtime --help for details.

Tip

Running without containers is also useful for debugging and development.