File IO with InputPath / OutputPath¶
Context¶
Instead passing file constents to the input payload, a Tesseract can
declare InputPath / OutputPath fields that refer to files or directories on
the --input-path / --output-path mounts. This is useful when inputs or
outputs are large on disk, or consist of many files.
Example Tesseract (examples/file_io)¶
Using InputPath and OutputPath you can
include references to files or directories in the InputSchema and OutputSchema of a Tesseract.
The schemas make sure that a path exists (either locally or in the Tesseract)
and resolve paths correctly in both tesseract-runtime and tesseract run calls.
class InputSchema(BaseModel):
paths: list[CheckedInputPath]
class OutputSchema(BaseModel):
paths: list[CheckedOutputPath]
def apply(inputs: InputSchema) -> OutputSchema:
output_path = Path(get_config().output_path)
result = []
for src in inputs.paths:
if src.is_dir():
# copy any folder that is given
dest = output_path / src.name
shutil.copytree(src, dest)
else:
# copy any file that is given, and if it references a .bin file, copy that too
dest = output_path / src.with_suffix(".copy").name
shutil.copy(src, dest)
bin = bin_reference(src)
if bin is not None:
shutil.copy(src.parent / bin, dest.parent / bin)
result.append(dest)
return OutputSchema(paths=result)
For the tesseract-runtime command, paths are relative to the local input/output paths:
tesseract-runtime apply \
--input-path ./testdata \
--output-path ./output \
'{"inputs": {"paths": ["sample_0.json", "sample_1.json"]}}'
For the tesseract run command, the path
reference schemas resolve to the mounted input/output folders inside the
Tesseract:
tesseract run file_io apply \
--input-path ./testdata \
--output-path ./output \
'{"inputs": {"paths": ["sample_2.json", "sample_3.json"]}}'
What InputPath / OutputPath do in a schema¶
InputPath and OutputPath are Pydantic-aware Path subclasses that carry validation logic. Use InputPath in your InputSchema and OutputPath in your OutputSchema.
InputPath fields — caller sends a relative string, apply receives an absolute Path:
caller sends → "sample_8.json"
→ built-in → Path("/tesseract/input_data/sample_8.json") (resolved + existence check)
→ apply sees → Path("/tesseract/input_data/sample_8.json")
Rejects any path that would escape
input_path(path traversal protection).Raises
ValidationErrorif the resolved path does not exist.
OutputPath fields — apply returns an absolute Path, caller receives a relative string:
apply returns → Path("/tesseract/output_data/sample_8.copy")
→ built-in → Path("sample_8.copy") (existence check + prefix stripped)
→ caller receives → "sample_8.copy"
Raises
ValidationErrorif the path does not exist insideoutput_path.
Composing user-defined validators¶
Use Annotated with AfterValidator to attach custom validators to a path
reference. Validators always receive absolute paths for both input and
output types:
from typing import Annotated
from pydantic import AfterValidator
from tesseract_core.runtime.experimental import InputPath, OutputPath
def has_bin_sidecar(path: Path, info) -> Path:
"""Check that any binref JSON has its .bin sidecar present."""
if path.is_file():
name = bin_reference(path)
if name is not None:
bin = path.parent / name
assert bin.exists(), f"Expected .bin file for json {path} not found at {bin}"
elif path.is_dir():
return path
else:
raise ValueError(f"{path} does not exist.")
return path
CheckedInputPath = Annotated[InputPath, AfterValidator(has_bin_sidecar)]
CheckedOutputPath = Annotated[OutputPath, AfterValidator(has_bin_sidecar)]
class InputSchema(BaseModel):
paths: list[CheckedInputPath]
class OutputSchema(BaseModel):
paths: list[CheckedOutputPath]
Input fields — built-in resolves first, then user validators run (on absolute paths):
caller sends → "sample_8.json"
→ built-in → Path("/tesseract/input_data/sample_8.json") (resolved + existence check)
→ has_bin_sidecar → Path("/tesseract/input_data/sample_8.json") (checks .bin sidecar present)
→ apply receives → Path("/tesseract/input_data/sample_8.json")
Output fields — user validators run on absolute paths, built-in strips on serialization:
apply returns → Path("/tesseract/output_data/sample_8.copy")
→ built-in → Path("/tesseract/output_data/sample_8.copy") (existence check)
→ has_bin_sidecar → Path("/tesseract/output_data/sample_8.copy") (checks .bin sidecar present)
→ caller receives → "sample_8.copy" (prefix stripped on serialization)