Tips for Defining Tesseract APIs

Advanced Pydantic features

Warning

Pydantic V2 metadata and transformations such as AfterValidator, Field, model_validator, and field_validator are generally supported for all arguments named inputs (the first argument of several endpoints) and for the outputs of apply. They are silently stripped in all other cases (except in abstract_eval).

Tesseract uses Pydantic to define and validate endpoint signatures. Pydantic is a powerful library that allows for complex type definition and validation, but not all of its features are supported by Tesseract.

One core feature of Tesseract is that only the input and output schemas of apply are user-specified. The schemas of all other endpoints are inferred from them, and this inference cannot preserve every feature of the original schemas.

Tesseract supports almost all Pydantic features for endpoint inputs named inputs (that is, the first argument to apply, jacobian, jacobian_vector_product, vector_jacobian_product):

class InputSchema(BaseModel):
    # ✅ Field metadata + validators
    field: int = Field(..., description="Field description", ge=0, le=10)

    # ✅ Nested models
    nested: NestedModel

    # ✅ Default values
    default: int = 10

    # ✅ Union types
    union: Union[int, str]
    another_union: int | str

    # ✅ Generic containers
    list_of_ints: List[int]
    dict_of_strs: Dict[str, str]

    # ✅ Field validators
    validated_field: Annotated[int, AfterValidator(my_validator)]

    # ✅ Model validators
    @model_validator(mode="after")
    def check_something(self):
        if self.field > 10:
            raise ValueError("Field must be at most 10")
        return self

    # ❌ Recursive models, will raise a build error
    itsame: "InputSchema"

    # ❌ Custom types with __get_pydantic_core_schema__, will raise runtime errors
    custom: CustomType
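
As a minimal, self-contained sketch of a few of the supported features above, the following uses plain Pydantic V2 outside of Tesseract. The field names, the must_be_even validator, and the validation_error_types helper are hypothetical and chosen only for illustration.

```python
from typing import Annotated

from pydantic import AfterValidator, BaseModel, Field, ValidationError, model_validator


def must_be_even(value: int) -> int:
    """Hypothetical field-level validator: reject odd numbers."""
    if value % 2 != 0:
        raise ValueError("value must be even")
    return value


class InputSchema(BaseModel):
    # Field metadata with range constraints and a default value
    count: int = Field(default=2, ge=0, le=10, description="Number of items")

    # Field validator attached through Annotated
    even: Annotated[int, AfterValidator(must_be_even)] = 4

    # Model validator; Pydantic V2 requires the mode argument
    @model_validator(mode="after")
    def check_order(self):
        if self.count > self.even:
            raise ValueError("count must not exceed even")
        return self


def validation_error_types(payload: dict) -> list:
    """Return the Pydantic error types raised for a payload (empty if valid)."""
    try:
        InputSchema.model_validate(payload)
    except ValidationError as exc:
        return [error["type"] for error in exc.errors()]
    return []
```

Calling validation_error_types with an out-of-range count, an odd value for even, or a count larger than even reports which of the three checks rejected the payload.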

Note

In case you run into issues with Pydantic features not listed here, please open an issue.

🔪 Sharp edge: x86 vs ARM architecture on Apple Silicon

If you’re using a Mac with Apple Silicon, your system uses the ARM64 processor architecture, while many Docker images and Python packages are built for x86_64 (also known as AMD64). This can lead to architecture incompatibilities when building or running Tesseracts.

Common symptoms:

  • Build failures with errors mentioning “platform mismatch” or “exec format error”

  • Runtime errors like exec /tesseract/entrypoint.sh: exec format error

  • Slow performance due to Rosetta 2 emulation

  • Package installation failures in tesseract_requirements.txt

  • A Python package fails to install because it doesn’t provide a pre-built Linux ARM64 wheel

Solutions:

  1. Build x86_64 images for sharing or compatibility (recommended): If you intend to share Tesseracts with others, deploy to x86_64 servers, or are running into difficulties with missing ARM64 wheels, build for x86_64. Edit your tesseract_config.yaml:

    # tesseract_config.yaml
    build_config:
      target_platform: linux/amd64 # Build for x86_64
    

    Note that this uses QEMU emulation on Apple Silicon and will be slower to build, but it produces images that run natively on x86_64 machines.

  2. Build for your native architecture (for local development): By default, Tesseract builds for your native platform. If you only need to run locally, this is faster. You can explicitly set it in tesseract_config.yaml:

    # tesseract_config.yaml
    build_config:
      target_platform: linux/arm64 # Explicitly set for Apple Silicon
    
  3. Use ARM-compatible base images: Some base images don’t have ARM64 variants. Check that your base image supports ARM64 (e.g., python:3.11-slim supports both architectures).

  4. Handle packages without Linux ARM64 wheels: Some Python packages don’t provide pre-built wheels for Linux ARM64. Note that a macOS ARM64 wheel is not sufficient here, since Tesseracts run in Linux containers.

    One solution is to include the system packages required to build the wheel from source during the tesseract build step by specifying the extra_packages build option. Common required packages may include build-essential, gcc, or nvidia-cuda-toolkit:

    # tesseract_config.yaml
    build_config:
      extra_packages:
        - build-essential
        - gcc
    

    Other options include using conda (venv_backend: conda) or pinning to a version that has ARM64 support. Alternatively, build for x86_64 as described above.

To verify the architecture of a built Tesseract image: docker inspect --format='{{.Architecture}}' my_tesseract:latest
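
To decide up front whether an image will run natively on the current machine, the architecture reported by docker inspect can be compared with the host's. The sketch below uses only the Python standard library. The helper names docker_arch and needs_emulation are hypothetical, and the alias table is an assumption that covers only the two architectures discussed here.

```python
import platform
from typing import Optional

# Assumed aliases: machine names reported by the OS, mapped to the
# architecture names used in Docker platform strings (linux/amd64, linux/arm64).
_ARCH_ALIASES = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "arm64": "arm64",
    "aarch64": "arm64",
}


def docker_arch(machine: str) -> str:
    """Normalize a machine or image architecture name to Docker's naming."""
    try:
        return _ARCH_ALIASES[machine.lower()]
    except KeyError:
        raise ValueError(f"unrecognized architecture: {machine!r}") from None


def needs_emulation(image_arch: str, host_machine: Optional[str] = None) -> bool:
    """Return True if an image of image_arch cannot run natively on the host."""
    host = docker_arch(host_machine or platform.machine())
    return docker_arch(image_arch) != host
```

Here image_arch would be the value printed by the docker inspect command above. If host_machine is omitted, the host architecture is taken from platform.machine().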

🔪 Sharp edge: abstract_eval and field validators

The inputs and outputs of abstract_eval are a special case: they also keep the full Pydantic schema, albeit with some limitations. In particular, every Array type is replaced by a special object that keeps only the shape and dtype of the array, not the actual data. Validators that depend on array values must therefore check for this special object and pass it through:

class InputSchema(BaseModel):
    myarray: Array[(None,), Float64]

    @field_validator("myarray", mode="after")
    @classmethod
    def check_array(cls, v) -> np.ndarray:
        # Pass through non-arrays
        # ⚠️ Without this, abstract_eval breaks ⚠️
        if not isinstance(v, np.ndarray):
            return v

        # This is the actual validator that's used for other endpoints
        return v + 1
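
The effect of the pass-through guard can be seen without running Tesseract. In this sketch, ShapeOnly is a hypothetical stand-in for the shape-and-dtype object (the real class is not named here), and the validator logic is written as plain functions rather than as a Pydantic field validator.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class ShapeOnly:
    """Hypothetical stand-in: carries shape and dtype, but no array data."""

    shape: tuple
    dtype: str


def check_array(v):
    # Pass through anything that is not a real array (the abstract_eval case)
    if not isinstance(v, np.ndarray):
        return v
    # Same transformation as in the validator above
    return v + 1


def check_array_unguarded(v):
    # No guard: arithmetic is attempted even on the data-free placeholder
    return v + 1


placeholder = ShapeOnly(shape=(3,), dtype="float64")
```

With the guard, check_array(placeholder) returns the placeholder unchanged, while check_array_unguarded(placeholder) raises a TypeError because the stand-in does not support arithmetic.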

Building Tesseracts with private dependencies

If tesseract_requirements.txt lists dependencies that require SSH access to a server (e.g., private repositories that you specify via “git+ssh://…”), you can make your SSH agent available to tesseract build with the --forward-ssh-agent option. Alternatively, you can use pip download to download a dependency to the machine that builds the Tesseract.

Customizing the build process

The build_config section of tesseract_config.yaml controls how the Tesseract image is built. Common reasons to customize it:

  • Your code needs system libraries (e.g., gfortran, libgomp1) — use extra_packages to install them via apt-get.

  • You need a specific Python version or GPU drivers — override base_image (must be Debian-based).

  • You’re deploying to a different architecture (e.g., ARM64 on AWS Graviton) — set target_platform.

  • Your Tesseract needs data files at runtime (model weights, config files) — use package_data to copy them into the image.

  • None of the above cover your case — use custom_build_steps to inject arbitrary Dockerfile commands. See the Dockerfile template for where these are injected.

See also

For the full list of options and their defaults, see the Configuration reference.

For worked examples, see the Package Data, Pyvista on ARM64, and Fortran Integration building blocks.

Creating a Tesseract from a Python package

Sometimes it is useful to create a Tesseract from an existing Python package. To do so, run tesseract init in the root folder of your package (i.e., where setup.py and requirements.txt would be). Import your package as needed in tesseract_api.py, and specify the dependencies you need at runtime in tesseract_requirements.txt.
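
A minimal sketch of such a tesseract_api.py might look as follows. It assumes a layout with an InputSchema, an OutputSchema, and an apply function, and it uses the standard-library math module as a stand-in for your own package so that the snippet runs on its own. The field names x and y are hypothetical, and a real Tesseract would typically use Tesseract's Array types rather than plain floats.

```python
import math as mypackage  # stand-in for "import mypackage" (your own package)

from pydantic import BaseModel, Field


class InputSchema(BaseModel):
    # Hypothetical input field, constrained to non-negative values
    x: float = Field(..., ge=0, description="Non-negative input value")


class OutputSchema(BaseModel):
    # Hypothetical output field
    y: float = Field(..., description="Square root of x")


def apply(inputs: InputSchema) -> OutputSchema:
    # Delegate the actual computation to the wrapped package
    return OutputSchema(y=mypackage.sqrt(inputs.x))
```

Keeping apply as a thin wrapper leaves the numerical logic in the package itself, so the package can still be tested and released independently of the Tesseract.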