# Performance Trade-offs & Optimization

## Overview

Using Tesseracts adds overhead to your computations through:

1. **Container startup** (~2s): One-time cost when starting a containerized Tesseract.
2. **HTTP communication** (~2.5ms locally, up to ~50-100ms+ in cloud setups): Request/response handling per call.
3. **Data transfer**: Moving data between client and server (depends on data size and network bandwidth).
4. **Data serialization**: Encoding arrays for transport (depends on data size and encoding format).
5. **Framework overhead** (~0.5ms): Internal machinery, schema processing. Present even in non-containerized mode.

For workloads where computations take seconds or longer, total overhead is typically negligible. See the [rules of thumb](#rules-of-thumb-by-use-case) for guidance on which overhead sources dominate for different workloads.

```{note}
Tesseract is not a high-performance RPC framework. If your workload requires microsecond latency or millions of calls per second, consider a more traditional RPC framework.
```

## Example scenario: Locally hosted Tesseract

### Benchmarking scenario

The numbers and figures on this page are based on benchmarks run under a specific scenario. This scenario represents a best-case baseline for Tesseract overhead: it minimizes network latency and container virtualization costs, so the numbers isolate framework overhead rather than infrastructure overhead.

- **Bare-metal Linux Docker** (no Docker Desktop virtualization)
- **Loopback networking** (Tesseract running on the same machine as the client)
- **Local SSD** for binref disk I/O
- All arrays are **float64**

Your numbers will differ depending on your setup. In particular:

- **Docker Desktop (macOS/Windows)** adds a virtualization layer, increasing container startup time and HTTP latency, and decreasing raw performance.
- **Remote Tesseracts** make network latency and bandwidth the dominant cost for HTTP mode, so compact encodings (base64, binref) matter even more.
- **Network-attached storage** for binref can be significantly slower than local SSD.

Benchmark with representative inputs to understand the trade-offs for your use case.

### The right interaction mode depends on your workload

```{warning}
Advice in this section is specific to the benchmarking scenario described above. Your mileage will vary based on your setup and workload.
```

The following chart shows absolute overhead (in milliseconds) for each interaction mode across a range of array sizes, using a no-op Tesseract that does nothing but decode and encode data:

```{figure} /img/benchmark_overhead.png
:alt: Tesseract overhead by interaction mode
:width: 80%

Overhead comparison across interaction modes for different array sizes. Uses a no-op Tesseract that does nothing but decode and encode data, isolating framework overhead.
```

| Mode (color) | What it measures | Typical use case |
| ------------ | ---------------- | ---------------- |
| **Non-containerized, in-memory** (blue) | Direct Python calls via `Tesseract.from_tesseract_api`. Passes data as in-memory Python objects without Docker or HTTP. | Development, tight loops, performance-critical paths |
| **Containerized, json+base64 via HTTP** (orange) | Full Docker + HTTP stack, served via HTTP (e.g. `tesseract serve`). Includes serialization and network transfer. | Production, CI/CD, multi-language environments |
| **Containerized, json+binref via CLI** (purple) | CLI invocation via `tesseract run`. Includes container startup and disk I/O, but avoids transferring data over the network. | Shell scripts, one-off runs, pipelines with large data |

The guidance chart below puts these numbers in context by showing overhead as a percentage of computation time, for three representative I/O sizes (dotted = 1kB, dashed = 1MB, solid = 1GB). For small data, fixed costs (HTTP roundtrip, container startup) dominate. For large data, transfer and serialization take over.

```{figure} /img/benchmark_guidance.png
:alt: Tesseract overhead guidance chart
:width: 95%

Overhead as percentage of computation time, depending on interaction mode and I/O data size, for the benchmark scenario (local Tesseract with fast network and disk). Some lines overlap where modes have similar performance characteristics: non-containerized usage across all data sizes, and CLI usage for all but the largest data sizes.
```

(rules-of-thumb-by-use-case)=

### Rules of thumb by use case

```{warning}
Advice in this section is specific to the benchmarking scenario described above. Your mileage will vary based on your setup and workload.
```

| Scenario | Recommendation |
| -------- | -------------- |
| **Second-scale workloads on medium-size data** | The sweet spot for containerized HTTP execution: overhead is low and you benefit from most Tesseract features. |
| **Development and debugging** | Use {doc}`non-containerized execution ` or {doc}`tesseract-runtime serve ` for fast iteration, then switch to containerized HTTP for final testing. |
| **Cheap operations on small data via HTTP** | HTTP overhead (~2.5ms) can dominate when computation is fast. Batch multiple inputs into a single request. |
| **Tight loops on in-memory data** | Consider {doc}`non-containerized execution ` to bypass all network/container overhead. At ~0.5ms of framework overhead per call, you can run on the order of 2,000 iterations per second when the computation itself is cheap. Requires all dependencies to be available in the same local Python environment. |
| **Shell scripts and one-off runs** | CLI is convenient but has ~2s overhead per invocation from container startup. For multiple calls, keep a container running. |
| **Long-running operations on large datasets** | Use CLI with `json+binref` encoding. The ~2s container overhead is negligible for multi-minute runs, and binref allows large arrays to be passed between Tesseracts without expensive copies. |
| **Cheap operations on huge datasets** | Serialization and transfer will dominate. Try partitioning your workload so each Tesseract call does more compute per byte of I/O, or use binref to avoid redundant data copies between pipeline stages. |

## Optimizing performance

### 1. Choose the right encoding format

Encoding format affects both serialization time and the volume of data transferred. A 10M-element float64 array is ~76MB as raw binary, ~100MB as base64, and ~230-760MB as JSON. If I/O is slow, data transfer dominates over serialization, and choosing a compact format is the most effective optimization.

In short: use **base64** (default) for HTTP transport, **binref** for large arrays or disk-based pipelines, and **json** only when you need human-readable output. See {doc}`/content/using-tesseracts/array-encodings` for format details and usage examples.

### 2. Batch small operations

If you have many small operations, batch them into a single request:

```python
import numpy as np

# ❌ Avoid: Many small calls
for item in items:
    result = tesseract.apply({"data": item})

# ✅ Prefer: Batch into one call
results = tesseract.apply({"data": np.stack(items)})
```

Note that your Tesseract's `apply` function must be written to accept batched inputs (e.g., arrays with a leading batch axis) for this to work.

### 3. Reuse Tesseract instances

Container startup is expensive. Reuse instances across calls:

```python
from tesseract_core import Tesseract

# Good - reuse the context
with Tesseract.from_image("my-tesseract") as tesseract:
    for batch in batches:
        result = tesseract.apply(batch)

# Bad - new container per call
for batch in batches:
    with Tesseract.from_image("my-tesseract") as tesseract:
        result = tesseract.apply(batch)
```

If you're running a script multiple times against the same Tesseract, consider keeping a container running and connecting via `Tesseract.from_url()`:

```python
from tesseract_core import Tesseract

# Start once: tesseract serve my-tesseract
tesseract = Tesseract.from_url("http://localhost:8100")
result = tesseract.apply(inputs)
```

### 4. Profile to find bottlenecks

Enable profiling to see where time is spent:

```bash
# Via CLI
tesseract run myimage apply '{"inputs": {...}}' --profiling
```

Or via the Python SDK:

```python
from tesseract_core import Tesseract

tess = Tesseract.from_tesseract_api(
    "/path/to/tesseract_api.py",
    runtime_config={"profiling": True}
)
```

See {ref}`profiling` in the debugging guide for more usage examples and how to interpret the output.
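
As a complement to the built-in profiling, client-side wall-clock timing can give a rough picture of total per-call cost with your own inputs. The following is a minimal sketch using only the Python standard library. The helpers `measure_call_ms` and `overhead_percent` are hypothetical names introduced here for illustration and are not part of the Tesseract API. It assumes you pass in any callable, for example the `apply` method of a Tesseract object you have already created.

```python
import statistics
import time
from typing import Any, Callable


def measure_call_ms(
    fn: Callable[[Any], Any], payload: Any, repeats: int = 20, warmup: int = 2
) -> dict:
    """Time repeated calls of fn(payload); return summary statistics in milliseconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    # Warm-up calls are excluded so one-time costs (imports, connection setup)
    # do not skew the steady-state numbers.
    for _ in range(warmup):
        fn(payload)
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(payload)
        samples.append((time.perf_counter() - start) * 1e3)
    return {
        "median_ms": statistics.median(samples),
        "min_ms": min(samples),
        "max_ms": max(samples),
    }


def overhead_percent(overhead_ms: float, compute_ms: float) -> float:
    """Express overhead as a percentage of computation time."""
    if compute_ms <= 0:
        raise ValueError("compute_ms must be positive")
    return 100.0 * overhead_ms / compute_ms


if __name__ == "__main__":
    # Stand-in for a real call such as `tesseract.apply`; swap in your own callable.
    def fake_apply(payload):
        return sum(payload["data"])

    print(measure_call_ms(fake_apply, {"data": list(range(1000))}))
    # The ~2.5 ms local HTTP figure quoted on this page against a 1 s computation:
    print(overhead_percent(2.5, 1000.0))
```

Timing the same inputs through two interaction modes (for example, non-containerized and HTTP) and subtracting the medians gives an estimate of the extra overhead one mode adds in your environment. The median is used because individual samples can be distorted by unrelated system activity.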