Installation

Basic installation

Note

Before proceeding, make sure you have a working installation of Docker (Docker Desktop or Docker Engine) and a modern Python installation (Python 3.10+), ideally in a virtual environment.

The simplest way to install Tesseract Core is via pip:

$ pip install tesseract-core

Then, verify everything is working as intended:

$ tesseract list
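If tesseract list fails, checking each prerequisite on its own can help narrow down the cause. Below is a minimal sketch that uses only the Python standard library. The helper check_prerequisites is hypothetical and is not part of Tesseract Core. It assumes the pip package puts a tesseract command on your PATH.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 10)):
    """Return a dict of simple prerequisite checks (True means OK).

    Hypothetical helper, not part of Tesseract Core.
    """
    return {
        # Tesseract Core needs a modern Python (3.10+ per this guide).
        "python": sys.version_info[:2] >= min_python,
        # The docker CLI must be discoverable on PATH.
        "docker on PATH": shutil.which("docker") is not None,
        # Assumed: pip installs a `tesseract` entry point on PATH.
        "tesseract on PATH": shutil.which("tesseract") is not None,
        # In a venv, sys.prefix differs from the base interpreter's prefix.
        "in virtualenv": sys.prefix != sys.base_prefix,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{'OK  ' if ok else 'FAIL'} {name}")
```

A failing "docker on PATH" check points to the Docker installation described below. A failing "tesseract on PATH" check usually means the virtual environment in which you ran pip install is not active.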

Installing Docker

Docker Desktop ships with everything you need to run Tesseract Core, including the Docker Engine CLI, Docker Compose, and Docker Buildx. It also includes a GUI for managing containers and images. It is available for Windows, macOS, and Debian- and Fedora-based Linux distributions.

If your system is not supported by Docker Desktop, or you prefer a more minimal setup, you will need to install the Docker Engine CLI together with the following required plugins:

  1. docker-buildx

  2. docker-compose
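You can check that both plugins are available by asking each one for its version. This is a minimal sketch. It assumes Compose is invoked in its plugin form, docker compose, rather than as the standalone docker-compose binary. The helper name is hypothetical.

```python
import shutil
import subprocess


def docker_plugin_available(plugin):
    """Return True if `docker <plugin> version` runs successfully.

    Hypothetical helper; assumes plugins are invoked as `docker <plugin>`.
    """
    if shutil.which("docker") is None:
        return False  # the Docker CLI itself is missing
    try:
        result = subprocess.run(
            ["docker", plugin, "version"],
            capture_output=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    for plugin in ("buildx", "compose"):
        print(plugin, "OK" if docker_plugin_available(plugin) else "missing")
```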

To use Tesseract without sudo, you will need to add your user to the docker group. See Linux post-installation steps for Docker Engine > Manage Docker as a non-root user, or run:

$ sudo usermod -aG docker $USER

Then, log out and back in to apply the changes.
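On Linux, you can check whether your current session already has the docker group. Below is a minimal sketch that uses the Unix-only grp module. The helper name is hypothetical. Group changes only take effect in new login sessions, so this check returns False until you have logged out and back in.

```python
import grp
import os


def in_docker_group(group_name="docker"):
    """Return True if the current process already has `group_name`.

    Hypothetical helper. os.getgroups() reflects the current session, so
    right after `usermod -aG docker $USER` this stays False until re-login.
    """
    try:
        gid = grp.getgrnam(group_name).gr_gid
    except KeyError:
        return False  # no such group on this system
    return gid in os.getgroups() or gid == os.getgid()


if __name__ == "__main__":
    print("docker group active:", in_docker_group())
```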

Warning

Running sudo tesseract may bypass the active virtual environment (sudo often resets PATH), so the command can resolve to a different, conflicting tesseract executable. To avoid this, make sure you’re using the correct tesseract executable, or add your user to the docker group and omit sudo.

Runtime installation

Invoking the Tesseract Runtime directly, without Docker, can be useful for debugging while creating a Tesseract and for non-containerized deployment. To install it, run:

$ pip install tesseract-core[runtime]

Warning

Some shells (zsh, for example) treat [ and ] as special characters and may error out on the pip install line above. If that happens, escape these characters, e.g. tesseract-core\[runtime\], or enclose the argument in double quotes, e.g. "tesseract-core[runtime]".
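If you script the installation, for example in CI, quoting the requirement avoids this problem in any POSIX shell. Below is a minimal sketch that uses the standard shlex module. pip_install_command is a hypothetical helper.

```python
import shlex
import sys


def pip_install_command(requirement):
    """Build a shell-safe `pip install` command line (hypothetical helper)."""
    # shlex.quote wraps an argument in single quotes when it contains
    # characters such as [ or ] that a POSIX shell might interpret.
    return f"{shlex.quote(sys.executable)} -m pip install {shlex.quote(requirement)}"


if __name__ == "__main__":
    print(pip_install_command("tesseract-core[runtime]"))
```

You can also pass the arguments as a list to subprocess.run (without shell=True). This bypasses the shell entirely, so no escaping is needed.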

Common issues

Windows support

Tesseract is fully supported on Windows via the Windows Subsystem for Linux (WSL). For guidance, please refer to the official documentation.
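Tesseract should be installed and run inside your WSL distribution. A common heuristic for telling whether a Python session runs under WSL is to look for "microsoft" in the Linux kernel release string. The sketch below relies on that heuristic, which is not an official detection method. The helper name is hypothetical.

```python
import platform


def running_in_wsl():
    """Heuristic (hypothetical helper): WSL kernels mention 'microsoft'."""
    return platform.system() == "Linux" and "microsoft" in platform.release().lower()


if __name__ == "__main__":
    print("Running under WSL:", running_in_wsl())
```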

Conflicting executables

This is not the only software called “Tesseract”. Sometimes, this leads to multiple executables with the same name, for example if you also have Tesseract OCR installed. In that case, you may encounter the following error:

$ tesseract build examples/vectoradd/ vectoradd

read_params_file: Can't open vectoradd
Error in findFileFormatStream: failed to read first 12 bytes of file
Error during processing.

To avoid this, we recommend always using Tesseract in a dedicated Python virtual environment. Even so, the error can still occur for zsh users, because zsh caches the paths to executables. If that’s the case, refresh the shell’s executable cache with:

$ hash -r

You can always confirm which executable the tesseract command resolves to with:

$ which tesseract
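which only reports the first match. To list every tesseract executable on your PATH in the order the shell searches them, similar to which -a, you can use the minimal sketch below. It targets POSIX systems and ignores Windows PATHEXT handling. The helper name is hypothetical.

```python
import os


def find_all_executables(name, path=None):
    """Return every executable called `name` on PATH, in lookup order.

    Hypothetical helper. The first entry is what the shell runs; later
    entries are shadowed.
    """
    path = os.environ.get("PATH", "") if path is None else path
    found = []
    for directory in path.split(os.pathsep):
        # An empty PATH entry means the current directory on POSIX.
        candidate = os.path.join(directory or os.curdir, name)
        if (
            os.path.isfile(candidate)
            and os.access(candidate, os.X_OK)
            and candidate not in found
        ):
            found.append(candidate)
    return found


if __name__ == "__main__":
    for exe in find_all_executables("tesseract"):
        print(exe)
```

If the first entry is not inside your virtual environment, you are likely running a different program, such as Tesseract OCR.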

Missing user privileges

If you lack permission to access the Docker daemon, commands such as tesseract build fail with the following exception:

$ tesseract build examples/helloworld
RuntimeError: Could not reach Docker daemon, check if it is running. See logs for details.

You can resolve this by adding your user to the docker group. See Linux post-installation steps for Docker Engine > Manage Docker as a non-root user, or run:

$ sudo usermod -aG docker $USER

Then, log out and back in to apply the changes.
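After logging back in, you can confirm that the daemon is reachable without sudo. This is a minimal sketch. It assumes docker ps exits with a non-zero status when it cannot reach the daemon, for example because the daemon is not running or the socket is not accessible. The helper name is hypothetical.

```python
import shutil
import subprocess


def docker_daemon_reachable(timeout=30):
    """Return (ok, message) after a cheap request to the Docker daemon.

    Hypothetical helper; assumes `docker ps` fails when the daemon is
    unreachable (stopped, or socket permission denied).
    """
    if shutil.which("docker") is None:
        return False, "docker CLI not found on PATH"
    try:
        result = subprocess.run(
            ["docker", "ps", "--quiet"],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False, "docker ps timed out"
    except OSError as exc:
        return False, str(exc)
    if result.returncode == 0:
        return True, "Docker daemon is reachable"
    # stderr usually explains the failure (e.g. a permission error).
    return False, result.stderr.strip() or "docker ps failed"


if __name__ == "__main__":
    ok, message = docker_daemon_reachable()
    print("OK" if ok else "FAIL", message)
```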

Development installation

If you would like to install everything needed for development work on Tesseract itself (an editable install of the source, plus the runtime and test dependencies), run this instead:

$ git clone git@github.com:pasteurlabs/tesseract-core.git
$ cd tesseract-core
$ pip install -e .[dev]
$ pre-commit install
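To confirm that the editable install is in effect, check where Python would import the package from. This is a minimal sketch that assumes the import name is tesseract_core. The helper name is hypothetical. For an editable install, the printed path should lie inside your tesseract-core checkout rather than in site-packages.

```python
import importlib.util
from pathlib import Path


def module_location(module_name):
    """Return the directory a top-level module would be imported from, or None.

    Hypothetical helper based on importlib.util.find_spec.
    """
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        return None  # not installed (or a namespace package without origin)
    return Path(spec.origin).resolve().parent


if __name__ == "__main__":
    # Assumed import name for the tesseract-core package.
    print(module_location("tesseract_core"))
```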