Automatic differentiation (AD) computes exact derivatives of code — not numerical approximations. autograd implements reverse-mode AD (also called backpropagation): the forward pass runs your computation and records a graph, then the backward pass walks that graph in reverse, multiplying local Jacobians to accumulate gradients at each input.
Forward pass: building the graph
Every time you call a function on a Variable, the Function.Forward method:
- Runs the concrete computation (e.g., sin, matmul).
- Sets the output’s Creator pointer back to the function.
- Increments the output’s Generation counter.
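A minimal sketch of what this looks like from user code. The import paths, variable.New, and F.Sin are assumptions about the package layout; only Creator and Generation come from this page:

```go
package main

import (
	"fmt"

	F "github.com/itsubaki/autograd/function"
	"github.com/itsubaki/autograd/variable"
)

func main() {
	x := variable.New(1.0) // leaf: no Creator, Generation 0
	y := F.Sin(x)          // Forward computes sin(x), sets y.Creator, bumps y.Generation
	z := F.Sin(y)          // one more function deep

	fmt.Println(y.Creator != nil)                          // true: y remembers the function that produced it
	fmt.Println(x.Generation, y.Generation, z.Generation) // increases with graph depth
}
```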
Backward pass: Backward()
Calling z.Backward() walks the graph from z back to the leaves, propagating gradients via the chain rule.
Seed gradient
Backward() initialises z.Grad to OneLike(z) (a tensor of ones with the same shape) if it is not already set.
Priority queue by generation
The backward engine maintains a list of pending functions sorted by Generation (highest first). It starts with z.Creator.
Pop and call Backward
The function with the highest generation is popped. Its Backward(gy...) method is called with the upstream gradients from its output variables.
Accumulate input gradients
Each returned gradient gx is added to the corresponding input’s Grad field. If the input has a Creator, that function is pushed onto the queue.
Gradient accumulation
Gradients accumulate: calling Backward() twice without clearing adds to existing Grad values. Always call Cleargrad() before reusing a variable.
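Continuing with the imports from the sketch above, a short example of this walk and of the accumulation behaviour just described:

```go
x := variable.New(2.0)
y := F.Sin(x)

y.Backward()        // seeds y.Grad with ones, pops Sin, accumulates into x.Grad
fmt.Println(x.Grad) // dy/dx = cos(2.0)

y.Backward()        // no Cleargrad in between: the new gradient is added on top
fmt.Println(x.Grad) // now 2 * cos(2.0)

x.Cleargrad() // reset before reusing x in another computation
```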
Backward options
Backward accepts an optional Opts struct that controls two behaviours.
RetainGrad
By default, intermediate variable gradients are cleared after use to free memory. Set RetainGrad: true to keep them.
Without RetainGrad, intermediate variable gradients (like t.Grad) are cleared to nil after the backward pass to save memory. Only leaf variable gradients (x0.Grad, x1.Grad) are retained.
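For example (same imports as before; F.Exp and the placement of Opts in the variable package are assumptions):

```go
x := variable.New(3.0)
t := F.Sin(x) // intermediate variable
y := F.Exp(t)

y.Backward(variable.Opts{RetainGrad: true})

fmt.Println(x.Grad) // leaf gradient: always kept
fmt.Println(t.Grad) // kept only because RetainGrad is true; nil otherwise
```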
CreateGraph
By default, the backward pass runs inside a Nograd() scope, so the gradient computation itself does not build a graph. Set CreateGraph: true to record the backward pass in the graph — this enables higher-order gradients.
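A sketch of a second-order derivative (F.Tanh is an assumption; the Opts usage follows the description above):

```go
x := variable.New(1.0)
y := F.Tanh(x)

y.Backward(variable.Opts{CreateGraph: true}) // the backward pass is itself recorded

gx := x.Grad  // dy/dx, a Variable with its own Creator
x.Cleargrad() // avoid mixing first- and second-order gradients

gx.Backward()       // differentiate the gradient
fmt.Println(x.Grad) // second derivative of y with respect to x
```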
Nograd mode
variable.Nograd() disables graph creation for a scope. No Creator links are set, so no memory is allocated for the backward pass. Use this during inference or evaluation.
Nograd() returns a *Span whose End() method restores the previous state. The idiomatic pattern is defer variable.Nograd().End().
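For example, an inference helper (a sketch, assuming the variable type is *variable.Variable):

```go
// predict runs a forward pass without recording the graph.
func predict(x *variable.Variable) *variable.Variable {
	defer variable.Nograd().End() // graph creation is re-enabled when predict returns
	return F.Sin(x)               // no Creator link is set on the result
}
```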
Test mode
variable.TestMode() sets Config.Train = false, which changes the behaviour of operations that differ between training and inference, such as Dropout (which passes values through unchanged in test mode).
Like Nograd, it returns a *Span, and the previous behaviour is restored automatically when End() is called.
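A typical evaluation helper might combine both scopes (a sketch; the forward parameter stands in for any model):

```go
// evaluate runs a forward pass in inference mode: Config.Train is false, so
// Dropout-style ops pass values through, and no graph is recorded.
func evaluate(forward func(*variable.Variable) *variable.Variable, x *variable.Variable) *variable.Variable {
	defer variable.TestMode().End()
	defer variable.Nograd().End()
	return forward(x)
}
```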
Putting it together
The following example computes a loss, backpropagates, and performs a single manual gradient step:
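A minimal sketch, assuming the same imports as earlier plus arithmetic helpers such as F.Mul and F.Sub and a [][]float64-style Data field (all assumptions; substitute the library's actual helpers):

```go
package main

import (
	"fmt"

	F "github.com/itsubaki/autograd/function"
	"github.com/itsubaki/autograd/variable"
)

func main() {
	// Fit w in pred = w*x against a single target value.
	x := variable.New(2.0)
	target := variable.New(6.0)
	w := variable.New(0.5) // parameter to learn

	// Forward: squared-error loss, recorded as a graph.
	pred := F.Mul(w, x)
	diff := F.Sub(pred, target)
	loss := F.Mul(diff, diff)

	// Backward: populates w.Grad (and the other leaf gradients).
	loss.Backward()

	// Single manual gradient step (assuming Data is a [][]float64-like matrix).
	lr := 0.1
	w.Data[0][0] -= lr * w.Grad.Data[0][0]

	fmt.Println(loss)   // loss before the step
	fmt.Println(w.Grad) // d(loss)/dw = 2*(w*x - target)*x
	fmt.Println(w)      // updated parameter

	w.Cleargrad() // clear before the next iteration
}
```

In a full training loop, the forward pass, Backward(), the update, and Cleargrad() repeat on every iteration.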
Next steps
Variables
Review the Variable struct and gradient fields.
Functions
See how functions implement Forward and Backward.
Higher-order gradients
Differentiate through the backward pass with CreateGraph.
Gradient descent
Use gradients to train a model end to end.