Advanced Usage¶
File aliasing¶
The tesseract CLI can load data from local disk or any fsspec-compatible resource (HTTP, FTP, S3, etc.) using the @ syntax.
Use --input-path to mount a folder into the Tesseract (read-only). Paths in the payload must be relative to --input-path:
tesseract run filereference apply \
--input-path ./testdata \
--output-path ./output \
'{"inputs": {"data": ["sample_2.json", "sample_3.json"]}}'
See the filereference example for a complete walkthrough.
To write output to a file, use --output-path (also supports fsspec-compatible targets):
$ tesseract run vectoradd apply --output-path /tmp/output @inputs.json
See also
For handling large datasets that don’t fit in memory, see the out-of-core dataloading tutorial which demonstrates streaming data through Tesseracts using file references and volume mounts.
Logging metrics and artifacts¶
Tesseracts can log metrics and artifacts (e.g., iteration numbers, VTK files) as shown in the metrics example:
# Copyright 2025 Pasteur Labs. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
from pydantic import BaseModel
from tesseract_core.runtime.experimental import log_artifact, log_metric, log_parameter
class InputSchema(BaseModel):
pass
class OutputSchema(BaseModel):
pass
def apply(inputs: InputSchema) -> OutputSchema:
"""This demonstrates logging parameters, metrics and artifacts."""
print("This is a message from the apply function.")
log_parameter("example_param", "value")
for step in range(10):
metric_value = step**2
log_metric("squared_step", metric_value, step=step)
text = "This is an output file we want to log as an artifact."
with open("/tmp/artifact.txt", "w") as f:
f.write(text)
log_artifact("/tmp/artifact.txt")
return OutputSchema()
By default, metrics, parameters, and artifacts are logged to a logs directory in the Tesseract’s --output-path. (When running in a container, this directory lives inside the container.)
To log to an MLflow server instead, set the TESSERACT_MLFLOW_TRACKING_URI environment variable. For local development, spin up an MLflow server using the provided Docker Compose file:
docker-compose -f extra/mlflow/docker-compose-mlflow.yml up
Then launch the metrics Tesseract with the appropriate volume mount, network, and tracking URI:
tesseract serve --network=tesseract-mlflow-server --env=TESSERACT_MLFLOW_TRACKING_URI=http://mlflow-server:5000 --volume mlflow-data:/mlflow-data:rw metrics
The same options work with tesseract run.
To connect to a custom MLflow server instead:
$ tesseract serve --env=TESSERACT_MLFLOW_TRACKING_URI="..." metrics
If your MLflow server uses basic auth, pass the credentials as environment variables:
$ tesseract serve --env=TESSERACT_MLFLOW_TRACKING_URI="..." \
--env=MLFLOW_TRACKING_USERNAME="..." --env=MLFLOW_TRACKING_PASSWORD="..." \
metrics
To pass additional parameters to the MLflow run (tags, run name, description), use TESSERACT_MLFLOW_RUN_EXTRA_ARGS. This accepts a Python dictionary string passed directly to mlflow.start_run().
Example: Setting tags only
$ tesseract serve --env=TESSERACT_MLFLOW_TRACKING_URI="..." \
--env=TESSERACT_MLFLOW_RUN_EXTRA_ARGS='{"tags": {"key1": "value1", "key2": "value2"}}' \
metrics
Example: Setting run name and tags
$ tesseract serve --env=TESSERACT_MLFLOW_TRACKING_URI="..." \
--env=TESSERACT_MLFLOW_RUN_EXTRA_ARGS='{"run_name": "my_experiment", "tags": {"env": "production"}}' \
metrics
Example: Multiple parameters
$ tesseract serve --env=TESSERACT_MLFLOW_TRACKING_URI="..." \
--env=TESSERACT_MLFLOW_RUN_EXTRA_ARGS='{"run_name": "test_run", "description": "Testing new feature", "tags": {"version": "1.0"}}' \
metrics
Volume mounts and user permissions¶
Permission handling for mounted volumes varies between Docker Desktop, Docker Engine, and Podman. By default, Tesseract maps the container user’s UID and GID to match the host user running the tesseract command.
If this doesn’t work for your setup, override it with the --user argument to set a specific UID/GID for the container.
Warning
If the container user is neither root nor the file owner, you may encounter permission errors on mounted volumes. Fix this by setting the correct UID/GID with --user, or by making the files readable by all users.
Passing environment variables¶
Use --env to pass environment variables to Tesseract containers. This works with both serve and run:
$ tesseract serve --env=MY_ENV_VARIABLE="some value" helloworld
$ tesseract run --env=MY_ENV_VARIABLE="some value" helloworld apply '{"inputs": {"name": "Osborne"}}'
Parallelism and worker processes¶
By default, Tesseracts run with a single worker process. To handle concurrent requests, increase the worker count with --num-workers (for tesseract serve) or the num_workers parameter (in the Python SDK). This is not available for tesseract run, which processes a single request and exits.
Each worker runs as a separate process, so they are not affected by the GIL but do not share in-process state.
When to use multiple workers¶
Multiple workers are useful when:
Handling concurrent requests — If multiple clients will call your Tesseract simultaneously, each worker can handle one request at a time. With a single worker, requests are processed sequentially.
CPU-bound computations — If your Tesseract performs CPU-intensive work and you have multiple cores available, multiple workers can process requests in parallel.
Batch processing — When processing many independent inputs, you can submit them concurrently and let workers handle them in parallel.
When NOT to use multiple workers¶
Stick with a single worker when:
GPU-bound computations — GPUs typically can’t run multiple processes efficiently. If your Tesseract uses a GPU, multiple workers will compete for GPU resources and may cause out-of-memory errors or slowdowns.
High memory usage — Each worker loads its own copy of the model/data into memory. If your Tesseract uses 4GB of RAM, 4 workers will use 16GB total.
Stateful operations — Workers don’t share state. If your computation requires shared state between requests, multiple workers won’t work correctly.
CLI usage¶
# Serve with 4 worker processes
$ tesseract serve --num-workers 4 my-tesseract
Python SDK usage¶
from concurrent.futures import ThreadPoolExecutor
from tesseract_core import Tesseract
# Serve with multiple workers
with Tesseract.from_image("my-tesseract", num_workers=4) as t:
# Process requests concurrently using threads
with ThreadPoolExecutor(max_workers=4) as executor:
results = list(executor.map(t.apply, batch))
Choosing the right number of workers¶
As a starting point:
CPU-bound:
num_workers= number of CPU coresI/O-bound (e.g., calling external APIs):
num_workers= 2 x number of CPU coresGPU-bound:
num_workers= 1 (or match the number of GPUs if using--gpus)
Monitor memory usage and adjust. More workers isn’t always better — context switching overhead can reduce throughput.
Using GPUs¶
Use the --gpus argument to make NVIDIA GPUs available to a Tesseract.
To use a specific GPU:
$ tesseract run --gpus 0 helloworld apply '{"inputs": {"name": "Osborne"}}'
To use all available GPUs:
$ tesseract run --gpus all helloworld apply '{"inputs": {"name": "Osborne"}}'
To specify multiple GPUs:
$ tesseract run --gpus 0 --gpus 1 helloworld apply '{"inputs": {"name": "Osborne"}}'
GPUs are indexed starting at zero, matching nvidia-smi conventions.
Tesseracts on HPC clusters¶
Common HPC use cases for Tesseracts include:
Deploying a long-running pipeline component on a GPU node
Running an optimization workflow on a dedicated compute node
Distributing parameter scans across many cores
This works even without containerization, using tesseract-runtime serve directly. See our HPC tutorial for a SLURM-based walkthrough covering both batch and interactive use.
Running Tesseracts without containers¶
When containerization is unavailable or undesirable, run Tesseracts directly using the tesseract-runtime CLI (the same command that runs inside Tesseract containers).
Setup:
Install tesseract-core with the runtime extra.
Install the Tesseract’s dependencies:
pip install -r tesseract_requirements.txtSet
TESSERACT_API_PATHto point to yourtesseract_api.py.
Then use tesseract-runtime instead of tesseract run:
# Instead of:
$ tesseract run helloworld apply '{"inputs": {"name": "Tessie"}}'
# Use:
$ export TESSERACT_API_PATH=/path/to/tesseract_api.py
$ tesseract-runtime apply '{"inputs": {"name": "Tessie"}}'
tesseract-runtime supports the same endpoints and options as containerized Tesseracts. Run tesseract-runtime --help for details.
Tip
Running without containers is also useful for debugging and development.