FileReference


A Tesseract that mounts input and output directories as datasets, intended for Tesseracts with large inputs and/or outputs.

Example Tesseract (examples/filereference)

Using InputFileReference and OutputFileReference, you can reference files in the InputSchema and OutputSchema of a Tesseract. The file reference schemas check that a referenced file exists (either locally or inside the Tesseract) and resolve paths correctly for both tesseract-runtime and tesseract run calls.

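import shutil
from pathlib import Path

from pydantic import BaseModel
# Tesseract runtime helpers (file references live in the experimental API)
from tesseract_core.runtime.config import get_config
from tesseract_core.runtime.experimental import InputFileReference, OutputFileReference


class InputSchema(BaseModel):
    # Each entry is a path relative to the input directory and must exist
    data: list[InputFileReference]


class OutputSchema(BaseModel):
    # Each entry must point to a file inside the output directory
    data: list[OutputFileReference]


def apply(inputs: InputSchema) -> OutputSchema:
    output_path = Path(get_config().output_path)
    files = []
    for source in inputs.data:
        # source is a pathlib.Path starting with /path/to/input_path/...
        # target must be a pathlib.Path inside /path/to/output_path
        target = (output_path / source.name).with_suffix(".copy")
        shutil.copy(source, target)
        files.append(target)
    return OutputSchema(data=files)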
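To make the two checks described above concrete, here is a dependency-free sketch. It assumes an input reference is a path relative to the input directory that must exist, and that an output reference must lie inside the output directory. The helper names resolve_input_reference and check_output_reference are hypothetical; this is an illustration of the idea, not how tesseract-core implements InputFileReference and OutputFileReference.

```python
from pathlib import Path


def resolve_input_reference(name: str, input_path: Path) -> Path:
    """Hypothetical input check: resolve name under input_path and require it to exist."""
    candidate = (input_path / name).resolve()
    if not candidate.is_file():
        raise FileNotFoundError(f"Input file {candidate} does not exist")
    return candidate


def check_output_reference(path: Path, output_path: Path) -> Path:
    """Hypothetical output check: require path to lie inside output_path."""
    base = output_path.resolve()
    resolved = path.resolve()
    if not resolved.is_relative_to(base):  # Path.is_relative_to needs Python 3.9+
        raise ValueError(f"{resolved} is not inside the output directory {base}")
    return resolved
```

Both helpers resolve paths before comparing them, so symlinked temporary directories and "./" prefixes do not cause false mismatches.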

For the tesseract-runtime command, paths are relative to the local input/output paths:

tesseract-runtime apply \
    --input-path ./testdata \
    --output-path ./output \
    '{"inputs": {"data": ["sample_0.json", "sample_1.json"]}}'
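With many input files, writing the payload by hand gets tedious. The following sketch lists the JSON files in the input directory, writes their names relative to that directory (as the command above expects), and assembles the same tesseract-runtime apply argument list. build_payload and build_runtime_apply_args are hypothetical helpers; the command flags are exactly those shown above.

```python
import json
from pathlib import Path


def build_payload(input_path: Path, pattern: str = "*.json") -> str:
    """Return an apply payload listing matching files, relative to input_path."""
    names = sorted(
        p.relative_to(input_path).as_posix()
        for p in input_path.glob(pattern)
        if p.is_file()
    )
    return json.dumps({"inputs": {"data": names}})


def build_runtime_apply_args(input_path: Path, output_path: Path, payload: str) -> list[str]:
    """Assemble argv for `tesseract-runtime apply` with local input/output paths."""
    return [
        "tesseract-runtime", "apply",
        "--input-path", str(input_path),
        "--output-path", str(output_path),
        payload,
    ]
```

Passing the arguments as a list, for example subprocess.run(build_runtime_apply_args(Path("./testdata"), Path("./output"), build_payload(Path("./testdata"))), check=True), avoids shell-quoting the JSON payload.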

For the tesseract run command, the file reference schemas resolve paths to the input and output folders mounted inside the Tesseract:

tesseract run filereference apply \
    --input-path ./testdata \
    --output-path ./output \
    '{"inputs": {"data": ["sample_2.json", "sample_3.json"]}}'
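After a tesseract run call like the one above, it can help to confirm that the copies reached the local output directory. This sketch assumes the mounted output folder is the local ./output, so the example apply function's files appear there as <input stem>.copy. build_run_apply_args, expected_outputs, and run_and_check are hypothetical helpers, and calling run_and_check requires the tesseract CLI and a built filereference image.

```python
import json
import subprocess
from pathlib import Path


def build_run_apply_args(image: str, input_path: Path, output_path: Path, names: list[str]) -> list[str]:
    """Assemble argv for `tesseract run <image> apply` with mounted input/output paths."""
    payload = json.dumps({"inputs": {"data": names}})
    return [
        "tesseract", "run", image, "apply",
        "--input-path", str(input_path),
        "--output-path", str(output_path),
        payload,
    ]


def expected_outputs(names: list[str], output_path: Path) -> list[Path]:
    """Mirror the example apply(): each input becomes <output_path>/<stem>.copy."""
    return [output_path / Path(name).with_suffix(".copy").name for name in names]


def run_and_check(input_path: Path, output_path: Path, names: list[str]) -> list[Path]:
    """Run the example Tesseract and return any expected outputs that are missing."""
    subprocess.run(build_run_apply_args("filereference", input_path, output_path, names), check=True)
    return [p for p in expected_outputs(names, output_path) if not p.exists()]
```

An empty list from run_and_check(Path("./testdata"), Path("./output"), ["sample_2.json", "sample_3.json"]) means both copies were written.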

For Python SDK usage examples, see test_tesseract.py.