Dataloader

Context

This example shows a Tesseract that loads data from a folder, which is made available by mounting that folder through the CLI.

Example Tesseract (examples/dataloader)

In the Tesseract itself, the input schema describes the data samples that the logic expects. It contains no file references:

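from pydantic import BaseModel, Field
from tesseract_core.runtime import Array, Differentiable, Float32
from tesseract_core.runtime.experimental import LazySequence

class InputSchema(BaseModel):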
    # NOTE: no file references here
    data: LazySequence[Differentiable[Array[(None, 3), Float32]]] = Field(
        description="Data to be processed."
    )
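The idea behind a lazy sequence can be pictured with a small stand-alone sketch. This is not the tesseract_core implementation; the class name LazyJsonSequence, its loads counter, and the one-JSON-file-per-item layout are hypothetical, chosen only to show that items are read when they are indexed or iterated, not when the sequence is built.

```python
import json
from collections.abc import Sequence
from pathlib import Path


class LazyJsonSequence(Sequence):
    """Hypothetical stand-in: a sequence that reads each JSON file on access."""

    def __init__(self, paths):
        # Only the paths are stored here; no file is opened yet.
        self._paths = [Path(p) for p in paths]
        self.loads = 0  # counts file reads, for demonstration only

    def __len__(self):
        return len(self._paths)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Slicing stays lazy: it returns another sequence over fewer paths.
            return LazyJsonSequence(self._paths[index])
        path = self._paths[index]  # raises IndexError when out of range
        self.loads += 1
        with path.open() as f:
            return json.load(f)
```

Because iteration goes through `__getitem__`, a `for` loop over such a sequence reads one file per step, so only the sample currently being processed has to be held in memory.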

The apply function can then process the incoming samples as follows:

import numpy as np

def apply(inputs: InputSchema) -> OutputSchema:
    """Process data samples and compute their sum."""
    out_data = []
    data_sum = np.zeros(3)

    # iterating over inputs.data loads its contents one by one
    for data in inputs.data:
        # we only keep processed data here for demonstration
        out_data.append(data * 2)
        data_sum += data.sum(axis=0)

    return OutputSchema(data=out_data, data_sum=data_sum)
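To make the arithmetic concrete, here is a dependency-free sketch of the same loop on nested lists instead of arrays. The function name process_samples is hypothetical, and plain lists stand in for the (N, 3) arrays; it only mirrors the doubling and the column-wise sum of the apply function above.

```python
def process_samples(samples):
    """Double every value and accumulate the column-wise sum over all rows."""
    out_data = []
    data_sum = [0.0, 0.0, 0.0]

    # `samples` can be any iterable, including one that yields
    # a single sample at a time.
    for sample in samples:
        out_data.append([[2 * x for x in row] for row in sample])
        for row in sample:
            for col, x in enumerate(row):
                data_sum[col] += x

    return out_data, data_sum


# Two samples: one with two rows, one with a single row.
samples = iter([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9]]])
out_data, data_sum = process_samples(samples)
print(out_data)  # [[[2, 4, 6], [8, 10, 12]], [[14, 16, 18]]]
print(data_sum)  # [12.0, 15.0, 18.0]
```

The samples may have different numbers of rows, which matches the None in the (None, 3) shape of the schema; only the three columns are fixed.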

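You can then pass data to this Tesseract by mounting the directory that holds the data samples with the --volume flag of tesseract run, and by referencing the mounted files in the payload with an @-prefixed glob pattern. In the command below, $here stands for the host directory that contains testdata, and the :ro suffix mounts it read-only: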

tesseract run dataloader \
    --volume $here/testdata:/mnt/testdata:ro \
    apply '{"inputs": {"data": "@/mnt/testdata/*.json"}}' | jq
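As a mental model for the "@/mnt/testdata/*.json" argument, the sketch below expands an @-prefixed glob into a list of file paths. It is an illustration only: the helper expand_reference is hypothetical, and the sorted order is a choice made here, not documented Tesseract behavior. Note that the pattern in the command refers to the mount target /mnt/testdata and not to the host path, so it is presumably resolved inside the container.

```python
import glob


def expand_reference(ref):
    """Hypothetical helper: turn '@<glob>' into a sorted list of matching paths."""
    if not ref.startswith("@"):
        raise ValueError("expected a reference of the form '@<glob pattern>'")
    pattern = ref[1:]
    # sorted() makes the order deterministic; glob itself does not guarantee one.
    return sorted(glob.glob(pattern))
```

A pattern that matches nothing yields an empty list in this sketch, which is a common cause of an unexpectedly empty input when the mount target and the pattern do not agree.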