Dataloader
Context
This example shows a Tesseract that loads data from a folder. The folder is made available to the Tesseract by mounting it through the CLI.
Example Tesseract (examples/dataloader)
Inside the Tesseract, the processing logic expects a sequence of data samples, as declared in the input schema:
class InputSchema(BaseModel):
    # NOTE: no file references here
    data: LazySequence[Differentiable[Array[(None, 3), Float32]]] = Field(
        description="Data to be processed."
    )
An apply function can then process the input data as follows:
def apply(inputs: InputSchema) -> OutputSchema:
    """Process data samples and compute their sum."""
    out_data = []
    data_sum = np.zeros(3)
    # Iterating over inputs.data loads its contents one by one
    for data in inputs.data:
        # We only keep the processed data here for demonstration
        out_data.append(data * 2)
        data_sum += data.sum(axis=0)
    return OutputSchema(data=out_data, data_sum=data_sum)
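To see what this loop computes without running a Tesseract, here is a minimal sketch using plain NumPy. The generator lazy_samples and the function process are hypothetical names introduced only for this illustration. The generator stands in for inputs.data and is not how LazySequence is implemented.

```python
import numpy as np


def lazy_samples(n_samples, rows=4):
    """Yield one (rows, 3) float32 array at a time.

    Hypothetical stand-in for inputs.data: each sample is only
    created when the consumer asks for it.
    """
    for i in range(n_samples):
        yield np.full((rows, 3), float(i + 1), dtype=np.float32)


def process(samples):
    """Mirror the loop in apply: double each sample, accumulate column sums."""
    out_data = []
    data_sum = np.zeros(3)
    for data in samples:
        out_data.append(data * 2)
        data_sum += data.sum(axis=0)
    return out_data, data_sum


out_data, data_sum = process(lazy_samples(3))
print(len(out_data), data_sum)
```

Only data_sum is a true running reduction. Appending to out_data keeps every processed sample in memory, which is fine for a demonstration but gives up the memory benefit of lazy loading on large datasets.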
You can then pass data to this Tesseract by mounting the directory that contains the data samples with the --volume flag of the tesseract CLI, and referencing the mounted files in the payload:
tesseract run dataloader \
    --volume $here/testdata:/mnt/testdata:ro \
    apply '{"inputs": {"data": "@/mnt/testdata/*.json"}}' | jq
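The payload refers to the mounted files through a glob pattern instead of embedding their contents. As a rough illustration of that idea, the sketch below expands a glob pattern and parses one file at a time. It is not Tesseract's implementation: iter_json_files is a hypothetical helper, and the files contain placeholder JSON, not the array encoding the dataloader example expects.

```python
import glob
import json
import tempfile
from pathlib import Path


def iter_json_files(pattern):
    """Yield the parsed content of each file matching pattern, one at a time."""
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            yield json.load(f)


# Create a throwaway directory with three small JSON files (placeholder payloads).
tmp_dir = Path(tempfile.mkdtemp())
for i in range(3):
    (tmp_dir / f"sample_{i}.json").write_text(json.dumps({"index": i}))

# No file is opened until the generator is consumed.
loaded = list(iter_json_files(str(tmp_dir / "*.json")))
print(loaded)
```

Sorting the matched paths makes the iteration order deterministic, which matters when the order of samples affects the result.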