
Automatic differentiation (AD) computes exact derivatives of code — not numerical approximations. autograd implements reverse-mode AD (also called backpropagation): the forward pass runs your computation and records a graph, then the backward pass walks that graph in reverse, multiplying local Jacobians to accumulate gradients at each input.

Forward pass: building the graph

Every time you call a function on a Variable, the Function.Forward method:
  1. Runs the concrete computation (e.g., sin, matmul).
  2. Sets the output’s Creator pointer back to the function.
  3. Sets the output’s Generation counter to one more than its inputs’, so the backward pass knows what order to visit functions in.
This forms a directed acyclic graph (DAG) linking outputs back to inputs through function nodes.
x := variable.New(1.0)     // Generation 0, Creator nil
y := variable.Square(x)    // Generation 1, Creator → Square
z := variable.Sin(y)       // Generation 2, Creator → Sin
At this point no gradients exist — only the graph structure.
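
You can check this by printing the fields (assuming Grad and Generation are exported, as the comments above suggest):
fmt.Println(x.Grad, y.Grad, z.Grad)                   // <nil> <nil> <nil>
fmt.Println(x.Generation, y.Generation, z.Generation) // 0 1 2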

Backward pass: Backward()

Calling z.Backward() walks the graph from z back to the leaves, propagating gradients via the chain rule.
z.Backward()

fmt.Println(x.Grad) // dz/dx = cos(x²) * 2x ≈ 1.0806 at x = 1

1. Seed gradient

Backward() initialises z.Grad to OneLike(z) (a tensor of ones with the same shape) if it is not already set.

2. Priority queue by generation

The backward engine maintains a list of pending functions sorted by Generation (highest first). It starts with z.Creator.

3. Pop and call Backward

The function with the highest generation is popped. Its Backward(gy...) method is called with the upstream gradients from its output variables.

4. Accumulate input gradients

Each returned gradient gx is added to the corresponding input’s Grad field. If the input has a Creator, that function is pushed onto the queue.

5. Repeat until the queue is empty

The loop continues until all reachable functions have been visited, leaving gradients on every leaf variable that participated in the computation.
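
Put together, the loop looks roughly like the following self-contained sketch for scalar values. The Var, Fn, and backward names here are hypothetical stand-ins for illustration, not the library’s real types; the real engine works on tensors and differs in detail.
package main

import (
    "fmt"
    "sort"
)

// Var is a scalar variable: a value, an accumulated gradient, and a link
// back to the function that produced it.
type Var struct {
    Data, Grad float64
    Creator    *Fn
    Generation int
}

// Fn is a function node: its inputs, its output, its generation, and a
// Backward closure implementing the local chain rule.
type Fn struct {
    Inputs     []*Var
    Output     *Var
    Generation int
    Backward   func(gy float64) []float64
}

func backward(y *Var) {
    y.Grad = 1 // step 1: seed gradient (OneLike)

    pending := []*Fn{}
    seen := map[*Fn]bool{}
    push := func(f *Fn) {
        if f != nil && !seen[f] {
            seen[f] = true
            pending = append(pending, f)
        }
    }
    push(y.Creator) // step 2: start from the output's creator

    for len(pending) > 0 { // step 5: repeat until the queue is empty
        // step 3: pop the pending function with the highest generation
        sort.Slice(pending, func(i, j int) bool { return pending[i].Generation < pending[j].Generation })
        f := pending[len(pending)-1]
        pending = pending[:len(pending)-1]

        gxs := f.Backward(f.Output.Grad) // local derivative times upstream gradient

        for i, x := range f.Inputs { // step 4: accumulate input gradients
            x.Grad += gxs[i]
            push(x.Creator)
        }
    }
}

func main() {
    // Build y = x0*x1 + x0 by hand, then run the backward loop.
    x0 := &Var{Data: 2}
    x1 := &Var{Data: 3}

    t := &Var{Data: x0.Data * x1.Data, Generation: 1}
    t.Creator = &Fn{Inputs: []*Var{x0, x1}, Output: t, Generation: 0,
        Backward: func(gy float64) []float64 { return []float64{gy * x1.Data, gy * x0.Data} }}

    y := &Var{Data: t.Data + x0.Data, Generation: 2}
    y.Creator = &Fn{Inputs: []*Var{t, x0}, Output: y, Generation: 1,
        Backward: func(gy float64) []float64 { return []float64{gy, gy} }}

    backward(y)
    fmt.Println(x0.Grad, x1.Grad) // dy/dx0 = x1+1 = 4, dy/dx1 = x0 = 2
}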

Gradient accumulation

Gradients accumulate: calling Backward() twice without clearing adds to existing Grad values. Always call Cleargrad() before reusing a variable.
x := variable.New(3.0)

y := variable.Square(x)
y.Backward()
fmt.Println(x.Grad) // variable(6)

// Without Cleargrad, the second backward would give variable(12)
x.Cleargrad()

y2 := variable.Square(x)
y2.Backward()
fmt.Println(x.Grad) // variable(6)

Backward options

Backward accepts an optional Opts struct that controls two behaviours.
type Opts struct {
    RetainGrad  bool
    CreateGraph bool
}

RetainGrad

By default, intermediate variable gradients are cleared after use to free memory. Set RetainGrad: true to keep them.
x0 := variable.New(1.0)
x1 := variable.New(1.0)
t  := variable.Add(x0, x1)
y  := variable.Add(x0, t)

// With RetainGrad: intermediate gradient t.Grad is kept
y.Backward(variable.Opts{RetainGrad: true})
fmt.Println(t.Grad)           // variable(1) — kept
fmt.Println(x0.Grad, x1.Grad) // variable(2) variable(1)
Without RetainGrad, intermediate variable gradients (like t.Grad) are cleared to nil after the backward pass to save memory. Only leaf variable gradients (x0.Grad, x1.Grad) are retained.
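
For comparison, the same graph run without the option leaves only the leaf gradients, as described above:
x0 := variable.New(1.0)
x1 := variable.New(1.0)
t  := variable.Add(x0, x1)
y  := variable.Add(x0, t)

// Default: intermediate gradients are cleared after the pass
y.Backward()
fmt.Println(t.Grad)           // <nil>
fmt.Println(x0.Grad, x1.Grad) // variable(2) variable(1)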

CreateGraph

By default, the backward pass runs inside a Nograd() scope, so the gradient computation itself does not build a graph. Set CreateGraph: true to record the backward pass in the graph — this enables higher-order gradients.
x := variable.New(1.0)
y := variable.Sin(x)
y.Backward(variable.Opts{CreateGraph: true})

// x.Grad is now cos(x), and it has its own Creator chain
fmt.Println(x.Grad) // variable(0.5403...)

// Differentiate again to get -sin(x)
gx := x.Grad
x.Cleargrad()
gx.Backward(variable.Opts{CreateGraph: true})
fmt.Println(x.Grad) // variable(-0.8414...)
Use CreateGraph: true when computing Hessian-vector products, implementing Newton’s method, or training with gradient-based meta-learning. See the higher-order gradients guide for worked examples.
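
For example, one Newton step x ← x − f′(x)/f″(x) for f(x) = x⁴ can be sketched using only the calls shown elsewhere on this page:
x := variable.New(2.0)
y := variable.Square(variable.Square(x)) // f(x) = x⁴

y.Backward(variable.Opts{CreateGraph: true})
gx := x.Grad // f'(x) = 4x³ = 32

x.Cleargrad()
gx.Backward()
gx2 := x.Grad // f''(x) = 12x² = 48

// Newton update on the raw data, as in the training example below
x.Data.Data[0] -= gx.At() / gx2.At()
fmt.Println(x.Data.Data[0]) // 2 - 32/48 ≈ 1.333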

Nograd mode

variable.Nograd() disables graph creation for a scope. No Creator links are set, so no memory is allocated for the backward pass. Use this during inference or evaluation.
x := variable.New(3.0)

func() {
    defer variable.Nograd().End()

    y := variable.Square(x)
    y.Backward()            // y.Creator is nil, so Backward is a no-op
    fmt.Println(x.Grad)     // <nil>
}()

// Backprop is automatically re-enabled after the scope exits
y := variable.Square(x)
y.Backward()
fmt.Println(x.Grad) // variable(6)
Nograd() returns a *Span whose End() method restores the previous state. The idiomatic pattern is defer variable.Nograd().End().
Nograd mutates a package-level Config variable. It is not safe to use concurrently from multiple goroutines without external synchronisation.

Test mode

variable.TestMode() sets Config.Train = false, which changes the behaviour of operations that differ between training and inference, such as Dropout (which passes values through unchanged in test mode).
func evaluate(model *MyModel, x *variable.Variable) *variable.Variable {
    defer variable.TestMode().End()
    return model.Forward(x)
}
Like Nograd, it returns a *Span and is restored automatically when End() is called.
fmt.Println(variable.Config.Train) // true

func() {
    defer variable.TestMode().End()
    fmt.Println(variable.Config.Train) // false
}()

fmt.Println(variable.Config.Train) // true
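
The two scopes compose, so an inference helper can disable both graph recording and training behaviour at once (a sketch reusing the hypothetical MyModel from above):
func predict(model *MyModel, x *variable.Variable) *variable.Variable {
    defer variable.Nograd().End()   // no Creator links, no backward graph
    defer variable.TestMode().End() // Config.Train = false for Dropout etc.
    return model.Forward(x)
}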

Putting it together

The following example computes a loss, backpropagates, and performs a single manual gradient step:
package main

import (
    "fmt"
    "github.com/itsubaki/autograd/variable"
)

func main() {
    // Parameters
    w := variable.New(2.0)
    b := variable.New(1.0)

    // Input and target
    x := variable.New(3.0)
    target := variable.New(10.0)

    // Forward: pred = w*x + b
    pred := variable.Add(variable.Mul(w, x), b)

    // Loss: (pred - target)²
    diff := variable.Sub(pred, target)
    loss := variable.Square(diff)

    // Backward
    loss.Backward()

    fmt.Println("loss  =", loss)    // loss  = variable(9)
    fmt.Println("dL/dw =", w.Grad) // dL/dw = variable(-6)
    fmt.Println("dL/db =", b.Grad) // dL/db = variable(-2)

    // Manual gradient step (lr = 0.01)
    lr := 0.01
    w.Data.Data[0] -= lr * w.Grad.At()
    b.Data.Data[0] -= lr * b.Grad.At()
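
    // With the gradients above, the updated parameters are
    // w = 2 - 0.01*(-18) = 2.18 and b = 1 - 0.01*(-6) = 1.06.
    fmt.Println(w.Data.Data[0], b.Data.Data[0]) // approximately 2.18 1.06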
}

Next steps

Variables

Review the Variable struct and gradient fields.

Functions

See how functions implement Forward and Backward.

Higher-order gradients

Differentiate through the backward pass with CreateGraph.

Gradient descent

Use gradients to train a model end to end.