The model package provides composable neural network models — MLP and LSTM — built from layer primitives. Each model embeds Model, which manages a slice of layers and exposes a unified parameter and gradient interface consumed by optimizers.

Layer interface

Every element in Model.Layers must satisfy the Layer interface. The built-in layer types (LinearT, RNNT, LSTMT) all implement it.
type Layer interface {
    First(x ...*variable.Variable) *variable.Variable
    Forward(x ...*variable.Variable) []*variable.Variable
    Params() layer.Parameters
    Cleargrads()
}
First
func(x ...*variable.Variable) *variable.Variable
Runs the layer forward and returns the first output variable. Convenience wrapper around Forward.
Forward
func(x ...*variable.Variable) []*variable.Variable
Runs the layer forward and returns all output variables.
Params
func() layer.Parameters
Returns the learnable parameters (layer.Parameters) for this layer.
Cleargrads
func()
Sets all parameter gradients to nil, preparing for the next forward/backward pass.
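
For example, you can call a single layer from Model.Layers directly. The sketch below is illustrative only; it assumes an MLP built with NewMLP, whose layers are LinearT values.

mlp := model.NewMLP([]int{10, 1})
x := variable.New(1.0, 2.0, 3.0)

// First runs the layer and returns only the first output variable.
h := mlp.Layers[0].First(x)

// Forward runs the layer and returns every output variable.
ys := mlp.Layers[0].Forward(x)

_, _ = h, ys // discarded; shown only to contrast the two call styles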

Model

Model is the base struct embedded by MLP and LSTM. It aggregates parameters and gradient clearing across all its layers.
type Model struct {
    Layers []Layer
}
Layers
[]Layer
The ordered list of layers that make up the model. You can inspect or replace layers directly.

Methods

Params

func (m Model) Params() layer.Parameters
Returns a merged layer.Parameters map from every layer. Keys are prefixed with the layer index ("0.W", "1.b", etc.) so parameter names remain unique across layers.

Cleargrads

func (m *Model) Cleargrads()
Calls Cleargrads() on every layer, setting every parameter gradient to nil. Call this before each backward pass to prevent gradients from accumulating across iterations.
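
A short sketch of both methods together. It assumes the parameter map is populated once the layers have run; weights may be created lazily on the first forward pass.

mlp := model.NewMLP([]int{10, 1})
mlp.Forward(variable.New(1.0, 2.0, 3.0)) // ensure lazily created weights exist

// Keys are prefixed with the layer index, e.g. "0.W", "1.b".
for name := range mlp.Params() {
    fmt.Println(name)
}

mlp.Cleargrads() // every parameter gradient is now nil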

Activation

Activation is the function type used for inter-layer activations in MLP.
type Activation func(x ...*variable.Variable) *variable.Variable
Any function with this signature can be used as an activation. The standard library activations (F.Sigmoid, F.ReLU, F.Tanh, F.Softmax(axis)) all satisfy this type.
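
Custom activations are just functions with the same signature. A minimal sketch, composing two existing activations purely for illustration:

// sigmoidThenTanh satisfies model.Activation.
sigmoidThenTanh := func(x ...*variable.Variable) *variable.Variable {
    return F.Tanh(F.Sigmoid(x...))
}

mlp := model.NewMLP(
    []int{10, 1},
    model.WithMLPActivation(sigmoidThenTanh),
)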

MLP

MLP is a multi-layer perceptron with a configurable number of hidden layers and a shared activation function applied between each layer. The final layer has no activation.
type MLP struct {
    Activation Activation
    Model
}
Activation
Activation
The activation function applied after every layer except the last. Defaults to F.Sigmoid.

NewMLP

func NewMLP(outSize []int, opts ...MLPOptionFunc) *MLP
Creates an MLP where each element of outSize defines the output size of a Linear layer.
outSize
[]int
required
Output sizes for each layer. len(outSize) determines the depth of the network.
opts
...MLPOptionFunc
Option functions that customize the MLP before the layers are initialized.
Layer weights are initialized with the default random source. Use WithMLPSource to set a deterministic source for reproducible experiments.

Option functions

WithMLPActivation

func WithMLPActivation(activation Activation) MLPOptionFunc
Overrides the default F.Sigmoid activation with any Activation-compatible function.

WithMLPSource

func WithMLPSource(s randv2.Source) MLPOptionFunc
Sets the random source (math/rand/v2.Source) used to initialize layer weights. Pass a seeded source for reproducible results.
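
For example, a seeded PCG from math/rand/v2 (referred to as randv2 here) makes weight initialization reproducible; the seed values are arbitrary:

s := randv2.NewPCG(1, 2) // deterministic, seeded source

mlp := model.NewMLP(
    []int{64, 64, 10},
    model.WithMLPSource(s),
)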

Forward

func (m *MLP) Forward(x *variable.Variable) *variable.Variable
Runs the input x through every hidden layer with the configured activation, then through the final layer with no activation.
x
*variable.Variable
required
The input variable. Shape must be compatible with the first layer’s weight matrix.

LSTM

LSTM wraps a single layer.LSTMT followed by a layer.LinearT projection. It exposes a stateful forward pass and a method to reset the hidden and cell states between sequences.
type LSTM struct {
    Model
}

NewLSTM

func NewLSTM(hiddenSize, outSize int, opts ...LSTMOptionFunc) *LSTM
Creates an LSTM whose hidden state has dimensionality hiddenSize, followed by a Linear projection layer with output size outSize.
hiddenSize
int
required
The dimensionality of the LSTM hidden state.
outSize
int
required
The output size of the projection Linear layer that follows the LSTM.
opts
...LSTMOptionFunc
Option functions applied before the layers are constructed.

Option functions

WithLSTMSource

func WithLSTMSource(s randv2.Source) LSTMOptionFunc
Sets the random source (math/rand/v2.Source) for initializing both the LSTM and projection layer weights.
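
The same pattern as WithMLPSource, sketched with an arbitrary seed:

lstm := model.NewLSTM(
    64, 1,
    model.WithLSTMSource(randv2.NewPCG(1, 2)),
)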

Forward

func (m *LSTM) Forward(x *variable.Variable) *variable.Variable
Runs x through the LSTM layer, then through the projection layer. The LSTM hidden and cell states are carried forward across calls — reset them with ResetState between sequences.

ResetState

func (m *LSTM) ResetState()
Clears the LSTM hidden state (h) and cell state (c). Call this at the start of each new sequence during training or inference.

Examples

Training an MLP

package main

import (
    "fmt"

    F "github.com/itsubaki/autograd/function"
    "github.com/itsubaki/autograd/model"
    "github.com/itsubaki/autograd/optimizer"
    "github.com/itsubaki/autograd/variable"
)

func main() {
    // Two hidden layers of size 10, output size 3 to match the one-hot target
    mlp := model.NewMLP([]int{10, 10, 3})

    opt := &optimizer.SGD{LearningRate: 0.01}

    x := variable.New(1.0, 2.0, 3.0)
    t := variable.New(0.0, 1.0, 0.0)

    for i := range 100 {
        y := mlp.Forward(x)
        loss := F.MeanSquaredError(y, t)

        mlp.Cleargrads()
        loss.Backward()
        opt.Update(mlp)

        if i%10 == 0 {
            fmt.Println("loss:", loss.Data)
        }
    }
}

Using ReLU activation

mlp := model.NewMLP(
    []int{64, 64, 10},
    model.WithMLPActivation(F.ReLU),
)

Training an LSTM on a sequence

lstm := model.NewLSTM(64, 1)
opt := &optimizer.Adam{Alpha: 0.001, Beta1: 0.9, Beta2: 0.999}

for range 10 { // 10 training epochs
    lstm.ResetState()
    lstm.Cleargrads()

    var loss *variable.Variable
    for _, x := range sequence {
        y := lstm.Forward(x)
        loss = F.MeanSquaredError(y, target)
    }

    loss.Backward()
    opt.Update(lstm)
}
Call ResetState() at the start of each new sequence so the LSTM does not carry state from the previous sequence into the next.
