By default, Backward() computes gradients but does not build a graph for those gradients, so they cannot themselves be differentiated. Passing variable.Opts{CreateGraph: true} retains the computation graph through the backward pass, enabling a second Backward() call to differentiate the gradient itself.
How it works
When autograd evaluates y = f(x), it records the computation as a graph. Calling y.Backward() traverses this graph in reverse, computing ∂y/∂x and storing it in x.Grad. The gradient is a plain variable with no attached graph.
With CreateGraph: true, the backward computation itself becomes a differentiable graph. The result stored in x.Grad is a variable connected to a computation graph, so you can call x.Grad.Backward() again to obtain the second derivative.
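A minimal sketch of this two-pass pattern, assuming F.Mul from the function package and the Backward/Cleargrad calls described above; here y = x² at x = 3.0, so the first pass yields 2x = 6 and the second yields 2:

```go
package main

import (
	"fmt"

	F "github.com/itsubaki/autograd/function"
	"github.com/itsubaki/autograd/variable"
)

func main() {
	x := variable.New(3.0)
	y := F.Mul(x, x) // y = x^2

	// First backward pass; keep the graph so the gradient is differentiable.
	y.Backward(variable.Opts{CreateGraph: true})
	fmt.Println(x.Grad) // dy/dx = 2x = 6

	// Differentiate the gradient itself to obtain the second derivative.
	gx := x.Grad
	x.Cleargrad()
	gx.Backward()
	fmt.Println(x.Grad) // d2y/dx2 = 2
}
```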
Double backpropagation — Sin example
The derivatives of sin cycle with period 4: sin → cos → -sin → -cos → sin → ...
Starting from x = 1.0, the successive derivatives evaluate to cos(1), -sin(1), -cos(1), sin(1), cos(1), -sin(1).
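The loop below sketches this cycle, assuming F.Sin from the function package; each iteration differentiates the gradient produced by the previous pass:

```go
package main

import (
	"fmt"

	F "github.com/itsubaki/autograd/function"
	"github.com/itsubaki/autograd/variable"
)

func main() {
	x := variable.New(1.0)
	y := F.Sin(x)

	// First derivative: cos(1).
	y.Backward(variable.Opts{CreateGraph: true})
	fmt.Println(x.Grad)

	// Each iteration differentiates the previous gradient,
	// walking through -sin(1), -cos(1), sin(1), cos(1), -sin(1).
	for i := 0; i < 5; i++ {
		gx := x.Grad
		x.Cleargrad()
		gx.Backward(variable.Opts{CreateGraph: true})
		fmt.Println(x.Grad)
	}
}
```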
Call x.Cleargrad() between iterations to discard the previous gradient before computing the next one. Without it, gradients accumulate via addition instead of replacing the previous value.
Newton’s method via double backprop
Newton’s method requires the second derivative. With CreateGraph: true, you can compute it without manually writing the second derivative formula.
In the sketch below, the iteration converges to the minimum at x = 1.
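The target function f(x) = x⁴ − 2x² and the starting point x = 2.0 are illustrative choices, and the Data indexing used for the update step is an assumption about how Variable stores its value; adapt it to the library version in use:

```go
package main

import (
	"fmt"

	F "github.com/itsubaki/autograd/function"
	"github.com/itsubaki/autograd/variable"
)

// f(x) = x^4 - 2x^2, built from elementary operations.
func f(x *variable.Variable) *variable.Variable {
	x2 := F.Mul(x, x)               // x^2
	x4 := F.Mul(x2, x2)             // x^4
	return F.Sub(x4, F.Add(x2, x2)) // x^4 - 2x^2
}

func main() {
	x := variable.New(2.0)

	for i := 0; i < 10; i++ {
		y := f(x)

		// f'(x), with the graph retained for double backprop.
		x.Cleargrad()
		y.Backward(variable.Opts{CreateGraph: true})
		gx := x.Grad

		// f''(x), obtained by differentiating the gradient.
		x.Cleargrad()
		gx.Backward()
		gx2 := x.Grad

		// Newton update: x <- x - f'(x)/f''(x).
		// Assumption: the scalar value lives at Data[0][0]; adjust to the
		// actual layout of the Variable's underlying data.
		next := x.Data[0][0] - gx.Data[0][0]/gx2.Data[0][0]
		x = variable.New(next)
		fmt.Println(x)
	}
}
```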
Combining higher-order gradients
You can mix first and second derivatives in the same expression. For example, computing z = (dy/dx)³ + y:
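A sketch, assuming y = x² evaluated at x = 2.0 (an illustrative choice) and the F.Mul/F.Add helpers from the function package:

```go
package main

import (
	"fmt"

	F "github.com/itsubaki/autograd/function"
	"github.com/itsubaki/autograd/variable"
)

func main() {
	x := variable.New(2.0)
	y := F.Mul(x, x) // y = x^2

	// First derivative, kept as a differentiable graph.
	y.Backward(variable.Opts{CreateGraph: true})
	gx := x.Grad
	x.Cleargrad()

	// z = (dy/dx)^3 + y, mixing the gradient with the original output.
	z := F.Add(F.Mul(F.Mul(gx, gx), gx), y)
	z.Backward()

	// dz/dx = 3(dy/dx)^2 * d2y/dx2 + dy/dx = 24x^2 + 2x = 100 at x = 2.
	fmt.Println(x.Grad)
}
```

The final backward pass flows both through the retained gradient graph behind gx and directly through y, so x.Grad combines the second-order contribution with the plain first derivative.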
When to use higher-order gradients
| Use case | Why |
|---|---|
| Newton’s method | Requires the Hessian (second derivative) for each update step |
| Hessian-vector products | Efficient second-order information for optimization |
| Meta-learning (MAML) | Differentiating through an inner optimization loop |
| Gradient penalty (e.g. WGAN-GP) | Penalizing the norm of gradients as part of the loss |
Next steps
Gradient Descent
First-order optimization using plain Backward() calls.
Visualization
Render computation graphs as images to understand the graph structure.
Autograd Concepts
How the computation graph is built and traversed during backward passes.
Variable API
Full reference for variable.Opts, Cleargrad, and Backward.