File IO with InputPath / OutputPath

Context

Instead passing file constents to the input payload, a Tesseract can declare InputPath / OutputPath fields that refer to files or directories on the --input-path / --output-path mounts. This is useful when inputs or outputs are large on disk, or consist of many files.

Example Tesseract (examples/file_io)

Using InputPath and OutputPath you can include references to files or directories in the InputSchema and OutputSchema of a Tesseract. The schemas make sure that a path exists (either locally or in the Tesseract) and resolve paths correctly in both tesseract-runtime and tesseract run calls.

class InputSchema(BaseModel):
    paths: list[CheckedInputPath]
class OutputSchema(BaseModel):
    paths: list[CheckedOutputPath]
def apply(inputs: InputSchema) -> OutputSchema:
    output_path = Path(get_config().output_path)
    result = []

    for src in inputs.paths:
        if src.is_dir():
            # copy any folder that is given
            dest = output_path / src.name
            shutil.copytree(src, dest)
        else:
            # copy any file that is given, and if it references a .bin file, copy that too
            dest = output_path / src.with_suffix(".copy").name
            shutil.copy(src, dest)
            bin = bin_reference(src)
            if bin is not None:
                shutil.copy(src.parent / bin, dest.parent / bin)
        result.append(dest)
    return OutputSchema(paths=result)

For the tesseract-runtime command, paths are relative to the local input/output paths:

tesseract-runtime apply \
    --input-path ./testdata \
    --output-path ./output \
    '{"inputs": {"paths": ["sample_0.json", "sample_1.json"]}}'

For the tesseract run command, the path reference schemas resolve to the mounted input/output folders inside the Tesseract:

tesseract run file_io apply \
    --input-path ./testdata \
    --output-path ./output \
    '{"inputs": {"paths": ["sample_2.json", "sample_3.json"]}}'

What InputPath / OutputPath do in a schema

InputPath and OutputPath are Pydantic-aware Path subclasses that carry validation logic. Use InputPath in your InputSchema and OutputPath in your OutputSchema.

InputPath fields — caller sends a relative string, apply receives an absolute Path:

caller sends → "sample_8.json"
  → built-in   →  Path("/tesseract/input_data/sample_8.json")   (resolved + existence check)
  → apply sees →  Path("/tesseract/input_data/sample_8.json")
  • Rejects any path that would escape input_path (path traversal protection).

  • Raises ValidationError if the resolved path does not exist.

OutputPath fieldsapply returns an absolute Path, caller receives a relative string:

apply returns  →  Path("/tesseract/output_data/sample_8.copy")
  → built-in         →  Path("sample_8.copy")                          (existence check + prefix stripped)
  → caller receives  →  "sample_8.copy"
  • Raises ValidationError if the path does not exist inside output_path.

Composing user-defined validators

Use Annotated with AfterValidator to attach custom validators to a path reference. Validators always receive absolute paths for both input and output types:

from typing import Annotated
from pydantic import AfterValidator
from tesseract_core.runtime.experimental import InputPath, OutputPath

def has_bin_sidecar(path: Path, info) -> Path:
    """Check that any binref JSON has its .bin sidecar present."""
    if path.is_file():
        name = bin_reference(path)
        if name is not None:
            bin = path.parent / name
            assert bin.exists(), f"Expected .bin file for json {path} not found at {bin}"
    elif path.is_dir():
        return path
    else:
        raise ValueError(f"{path} does not exist.")
    return path

CheckedInputPath = Annotated[InputPath, AfterValidator(has_bin_sidecar)]
CheckedOutputPath = Annotated[OutputPath, AfterValidator(has_bin_sidecar)]

class InputSchema(BaseModel):
    paths: list[CheckedInputPath]

class OutputSchema(BaseModel):
    paths: list[CheckedOutputPath]

Input fields — built-in resolves first, then user validators run (on absolute paths):

caller sends → "sample_8.json"
  → built-in         →  Path("/tesseract/input_data/sample_8.json")   (resolved + existence check)
  → has_bin_sidecar  →  Path("/tesseract/input_data/sample_8.json")   (checks .bin sidecar present)
  → apply receives   →  Path("/tesseract/input_data/sample_8.json")

Output fields — user validators run on absolute paths, built-in strips on serialization:

apply returns  →  Path("/tesseract/output_data/sample_8.copy")
  → built-in         →  Path("/tesseract/output_data/sample_8.copy")   (existence check)
  → has_bin_sidecar  →  Path("/tesseract/output_data/sample_8.copy")   (checks .bin sidecar present)
  → caller receives  →  "sample_8.copy"                                (prefix stripped on serialization)