# Finite Difference Gradients

This example demonstrates how to make any Tesseract differentiable using **finite differences**, without implementing analytical gradient code.

```{warning}
This feature is **experimental** and available in `tesseract_core.runtime.experimental`. The API may change in future releases.
```

## When to use finite differences

Finite differences are useful when:

- **Prototyping**: You want to quickly test gradient-based workflows before investing time in analytical derivatives
- **Complex nested schemas**: Your inputs have deeply nested structures (like the mesh data in this example) that would be tedious to differentiate manually
- **Legacy code**: You're wrapping existing code where deriving gradients would require significant refactoring
- **Verification**: You want to cross-check your analytical gradient implementation

## Trade-offs

| Approach                           | Pros                                 | Cons                                      |
| ---------------------------------- | ------------------------------------ | ----------------------------------------- |
| **Auto-diff (e.g. JAX, PyTorch)**  | Accurate, efficient, automatic       | Requires compatible framework             |
| **Analytical**                     | Accurate, efficient                  | Requires manual implementation            |
| **Finite differences**             | No manual work, works with any code  | Less accurate, more function evaluations  |

## The example: meshstats_finitediff

This example is a variant of the `meshstats` example that computes statistics on volumetric mesh data. Instead of hand-written Jacobian implementations, it uses the finite difference helpers.

### Input schema

The input is a complex nested structure representing a volumetric mesh:

```python
class VolumetricMeshData(BaseModel):
    n_points: int
    n_cells: int
    points: Differentiable[Array[(None, 3), Float64]]
    num_points_per_cell: Array[(None,), Float64]
    cell_connectivity: Array[(None,), Int32]
    cell_data: dict[str, Array[(None, None), Float64]]
    point_data: dict[str, Array[(None, None), Float64]]
```

### Implementing gradients with finite differences

You can implement all three AD endpoints (`jacobian`, `jacobian_vector_product`, `vector_jacobian_product`) with just a few lines of code:

```python
from tesseract_core.runtime.experimental import (
    finite_difference_jacobian,
    finite_difference_jvp,
    finite_difference_vjp,
)


def jacobian(inputs, jac_inputs, jac_outputs):
    return finite_difference_jacobian(
        apply,
        inputs,
        jac_inputs,
        jac_outputs,
        algorithm="central",
        eps=1e-6,
    )


def jacobian_vector_product(inputs, jvp_inputs, jvp_outputs, tangent_vector):
    return finite_difference_jvp(
        apply,
        inputs,
        jvp_inputs,
        jvp_outputs,
        tangent_vector,
        algorithm="central",
        eps=1e-6,
    )


def vector_jacobian_product(inputs, vjp_inputs, vjp_outputs, cotangent_vector):
    return finite_difference_vjp(
        apply,
        inputs,
        vjp_inputs,
        vjp_outputs,
        cotangent_vector,
        algorithm="central",
        eps=1e-6,
    )
```

### Choosing an algorithm

The `algorithm` parameter controls the finite difference method:

| Algorithm      | Function evaluations | Accuracy     | Use case                                     |
| -------------- | -------------------- | ------------ | -------------------------------------------- |
| `"central"`    | 2n                   | Second-order | Default choice                               |
| `"forward"`    | n + 1                | First-order  | When evaluations are expensive               |
| `"stochastic"` | O(√n)                | Approximate  | High-dimensional inputs (1000s+ parameters)  |

where `n` is the number of array elements in the inputs being differentiated with respect to.
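To make the evaluation counts in the table concrete, here is a minimal, hypothetical sketch of a central-difference Jacobian for a plain NumPy function. It is not the implementation behind `finite_difference_jacobian` (which operates on the Tesseract `apply` function and its nested input schema); it only shows why the `"central"` algorithm needs `2n` function evaluations and is second-order accurate:

```python
import numpy as np


def central_difference_jacobian(f, x, eps=1e-6):
    """Sketch: approximate the Jacobian of f at x with central differences.

    Each of the n elements of x is perturbed in both directions, so f is
    evaluated 2 * n times; the truncation error is O(eps**2).
    """
    x = np.asarray(x, dtype=float)
    cols = []
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        # Two evaluations per input element.
        cols.append((np.atleast_1d(f(x + dx)) - np.atleast_1d(f(x - dx))) / (2 * eps))
    return np.stack(cols, axis=-1)  # shape: (output_size, n)


# Quick check against a function with a known Jacobian:
f = lambda x: np.array([np.sum(x**2), np.prod(x)])
x0 = np.array([1.0, 2.0, 3.0])
print(central_difference_jacobian(f, x0))
# Exact Jacobian: [[2, 4, 6], [6, 3, 2]]
```

A `"forward"` scheme would instead compare each perturbed evaluation against a single unperturbed baseline, which is why it needs only `n + 1` evaluations at the cost of first-order accuracy.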
The `"stochastic"` algorithm uses [Simultaneous Perturbation Stochastic Approximation (SPSA)](https://en.wikipedia.org/wiki/Simultaneous_perturbation_stochastic_approximation), which estimates the Jacobian using random perturbation directions. It requires only 2 function evaluations per sample regardless of input dimension, making it useful when the number of input dimensions is very large. The resulting gradient estimates are unbiased, which makes them suitable for use with first-order SGD-family optimizers. They may not be suitable for applications requiring precise Jacobian values (e.g., Newton-type methods). A minimal sketch of the underlying estimator appears at the end of this page.

```{note}
All three algorithms use a single scalar `eps` for the perturbation magnitude. This means accuracy may degrade for inputs where the magnitudes of Jacobian entries vary by many orders of magnitude (e.g., physical applications with unnormalized inputs). If this is a concern, consider normalizing your inputs or implementing analytical gradients.
```

### Making the algorithm configurable

The example also shows how to expose the algorithm choice as an input parameter:

```python
from typing import Literal

FDAlgorithm = Literal["central", "forward", "stochastic"]


class InputSchema(BaseModel):
    mesh: VolumetricMeshData
    fd_algorithm: FDAlgorithm = "central"


def jacobian(inputs, jac_inputs, jac_outputs):
    return finite_difference_jacobian(
        apply,
        inputs,
        jac_inputs,
        jac_outputs,
        algorithm=inputs.fd_algorithm,  # User can choose at runtime
        eps=1e-6,
    )
```

## Full source code

```{literalinclude} ../../../../examples/meshstats_finitediff/tesseract_api.py
:language: python
:caption: tesseract_api.py
```

```{literalinclude} ../../../../examples/meshstats_finitediff/tesseract_config.yaml
:language: yaml
:caption: tesseract_config.yaml
```

## See also

- {ref}`tr-autodiff` for background on differentiable programming in Tesseracts
- The [meshstats example](https://github.com/pasteurlabs/tesseract-core/tree/main/examples/meshstats) for the version with analytical gradients
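## Appendix: how SPSA estimates gradients

To illustrate the idea behind the `"stochastic"` algorithm, here is a minimal, hypothetical sketch of the classic SPSA gradient estimator for a scalar-valued NumPy function. It is not the implementation behind the finite difference helpers; it only shows why each sample costs exactly two function evaluations, independent of the input dimension:

```python
import numpy as np


def spsa_gradient(f, x, eps=1e-6, n_samples=10, seed=None):
    """Sketch: estimate the gradient of a scalar function f at x with SPSA.

    Each sample draws a random +/-1 perturbation direction and uses
    exactly two evaluations of f, regardless of the size of x.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for _ in range(n_samples):
        delta = rng.choice([-1.0, 1.0], size=x.shape)  # Rademacher direction
        df = f(x + eps * delta) - f(x - eps * delta)   # 2 evaluations per sample
        grad += df / (2 * eps * delta)
    return grad / n_samples


# Example: true gradient of sum(x**2) is 2 * x; the estimate is noisy but unbiased.
g = spsa_gradient(lambda x: np.sum(x**2), np.ones(1000), n_samples=50, seed=0)
```

The per-sample estimate is noisy, and averaging over more samples reduces the variance; unbiasedness is what makes such estimates usable with SGD-family optimizers, as noted above.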