Julian Henry

Back-of-the-Envelope: Recurrent Neural Networks

06 Aug 2024

Purpose

Terse notes on RNNs.

  1. Applications
  2. RNN Model
  3. Illustration: Character-Level Language Model
  4. Long Short-Term Memory
  5. Python Code Vanilla Example
  6. Further Reading

Applications

RNNs are especially adept at sequence data. They carry context forward in a hidden state that is updated at every step, so the output at any position can depend on all earlier elements of the sequence.
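A minimal sketch of that recurrence, assuming a tanh hidden state and an affine readout (the weight names Wxh, Whh, Why are illustrative, not from any particular library):

```python
import numpy as np

def rnn_step(x, h_prev, Wxh, Whh, Why, bh, by):
    """One vanilla RNN step: the new hidden state carries the context forward."""
    h = np.tanh(Wxh @ x + Whh @ h_prev + bh)  # mix the new input with prior context
    y = Why @ h + by                          # read the output off the hidden state
    return h, y
```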

The Mathematics of a Vanilla Recurrent Neural Network

  1. Vanilla Forward Pass [figure: vanilla-forward-pass]
  2. Vanilla Backward Pass [figure: back-propagation]
  3. Vanilla Bidirectional Pass [figure: bidirectional-rnn]
  4. Training of Vanilla RNN
  5. Vanishing and exploding gradient problem

    While training with backpropagation through time (BPTT), the gradients have to travel from the last cell all the way back to the first cell. The product of these per-step gradients can shrink toward zero or grow exponentially. The exploding-gradient problem refers to a large increase in the norm of the gradient during training; the vanishing-gradient problem refers to the opposite behavior, where long-term components go to norm 0 exponentially fast, making it impossible for the model to learn correlations between temporally distant events (see the sketch after this list).
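A rough numerical sketch of the effect, with illustrative weight scales and the tanh derivative dropped (dropping it only makes the true product smaller): BPTT multiplies by the recurrent Jacobian once per step, so the gradient norm behaves roughly like a per-step factor raised to the sequence length.

```python
import numpy as np

rng = np.random.default_rng(0)
T, H = 50, 32                                 # sequence length and hidden size (illustrative)

for scale in (0.5, 1.5):                      # rough spectral scale of the recurrent weights
    Whh = scale * rng.standard_normal((H, H)) / np.sqrt(H)
    grad = np.eye(H)                          # gradient w.r.t. the last hidden state
    for _ in range(T):
        # BPTT multiplies by the recurrent Jacobian at every step.
        grad = Whh.T @ grad
    print(f"scale {scale}: norm of d h_T / d h_0 ≈ {np.linalg.norm(grad):.3e}")
```

With a scale below 1 the product collapses toward zero (vanishing); above 1 it blows up (exploding).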

[Figures: unrolled RNN; computation graph]

Illustration: Character-Level Language Model

[Figures: character-level language model examples 1–3]
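A minimal sketch of the data setup for a character-level model on a toy corpus (all names here are illustrative): each character becomes a one-hot vector, and the target at every step is simply the next character.

```python
import numpy as np

data = "hello world"                      # toy corpus (illustrative)
chars = sorted(set(data))
char_to_ix = {c: i for i, c in enumerate(chars)}
vocab_size = len(chars)

def one_hot(ix):
    v = np.zeros((vocab_size, 1))
    v[ix] = 1.0
    return v

# Inputs are characters 0..n-2; targets are the same sequence shifted by one.
inputs  = [one_hot(char_to_ix[c]) for c in data[:-1]]
targets = [char_to_ix[c] for c in data[1:]]
```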

Long Short-Term Memory (LSTM)

The LSTM architecture consists of a set of recurrently connected subnets, known as memory blocks. These blocks can be thought of as a differentiable version of the memory chips in a digital computer. Each block contains one or more self-connected memory cells and three multiplicative units that provide continuous analogues of write, read and reset operations for the cells: namely, the input, output and forget gates.
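A minimal sketch of one LSTM step, assuming the gates see the concatenation of the previous hidden state and the current input (the weight names are illustrative); the forget, input and output gates play the reset, write and read roles described above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, Wf, Wi, Wo, Wc, bf, bi, bo, bc):
    z = np.concatenate([h_prev, x])            # gates see previous hidden state and current input
    f = sigmoid(Wf @ z + bf)                   # forget gate: continuous "reset" of the cell
    i = sigmoid(Wi @ z + bi)                   # input gate: continuous "write"
    o = sigmoid(Wo @ z + bo)                   # output gate: continuous "read"
    c = f * c_prev + i * np.tanh(Wc @ z + bc)  # self-connected memory cell
    h = o * np.tanh(c)                         # exposed hidden state
    return h, c
```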

[Figure: LSTM formulae]

[Figures: LSTM block; LSTM cell]

Feed-Forward LSTM equations [figure: feed-forward-lstm]
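For reference, the feed-forward LSTM equations in the commonly used formulation (peephole connections omitted; the original figure's notation may differ):

```latex
\begin{aligned}
f_t &= \sigma\!\left(W_f x_t + U_f h_{t-1} + b_f\right) \\
i_t &= \sigma\!\left(W_i x_t + U_i h_{t-1} + b_i\right) \\
o_t &= \sigma\!\left(W_o x_t + U_o h_{t-1} + b_o\right) \\
\tilde{c}_t &= \tanh\!\left(W_c x_t + U_c h_{t-1} + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```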

Feed-Backward LSTM equations [figure: feed-backward-lstm]

Python Code Vanilla Example (Karpathy)

Note: lines 48-58 of the gist are where backpropagation through time occurs and where the vanishing/exploding gradient issue arises in a plain RNN.
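As a paraphrased sketch (not a verbatim excerpt of the gist), the backward-pass region of a vanilla RNN with softmax outputs typically looks like the loop below; the repeated multiplication by Whh at the end of each iteration is exactly where gradients vanish or explode. Input-weight and bias gradients are omitted for brevity, and hs is assumed to hold T+1 states with hs[0] the initial hidden state.

```python
import numpy as np

def bptt_sketch(xs, hs, ps, targets, Whh, Why):
    """Backward pass through time for a vanilla RNN (illustrative names and shapes)."""
    dWhh, dWhy = np.zeros_like(Whh), np.zeros_like(Why)
    dhnext = np.zeros_like(hs[0])
    for t in reversed(range(len(xs))):
        dy = np.copy(ps[t])
        dy[targets[t]] -= 1                    # softmax cross-entropy gradient
        dWhy += dy @ hs[t + 1].T
        dh = Why.T @ dy + dhnext               # gradient flowing into h_t
        dhraw = (1 - hs[t + 1] ** 2) * dh      # backprop through tanh
        dWhh += dhraw @ hs[t].T
        dhnext = Whh.T @ dhraw                 # repeated multiplication by Whh: vanish/explode
    return dWhh, dWhy
```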

Gist

https://gist.github.com/karpathy/d4dee566867f8291f086

Commentary Blog

https://towardsdatascience.com/recurrent-neural-networks-rnns-3f06d7653a85

Further Reading