Sequence Modeling: Recurrent and Recursive Nets

A recurrent neural network is a neural network that is specialized for processing a sequence of values x^{(1)}, \dots, x^{(\tau)}. Just as convolutional networks can readily scale to images with large width and height, and some convolutional networks can process images of variable size, recurrent networks can scale to much longer sequences than would be practical for networks without sequence-based specialization.

To go from multi-layer networks to recurrent networks, we need to take advantage of one of the early ideas found in machine learning and statistical models of the 1980s: sharing parameters across different parts of a model.

  • Parameter sharing makes it possible to extend and apply the model to examples of different forms (different lengths, here) and to generalize across them.
  • If we had separate parameters for each value of the time index, we could not generalize to sequence lengths not seen during training, nor share statistical strength across different sequence lengths and across different positions in time.
  • Such sharing is particularly important when a specific piece of information can occur at multiple positions within the sequence.

Suppose that we trained a feedforward network that processes sentences of fixed length. A traditional fully connected feedforward network would have separate parameters for each input feature, so it would need to learn all of the rules of the language separately at each position in the sentence. By comparison, a recurrent neural network shares the same weights across several time steps, as sketched below.
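To make the contrast concrete, here is a minimal NumPy sketch (our own illustration, not code from the text; names such as `W_per_position` and `W_shared` are hypothetical). The per-position model needs one weight matrix for every time index and has nothing to apply at unseen positions, while the shared matrix applies to a sequence of any length:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, T = 4, 3, 5        # feature size, output size, training sequence length

# Fixed-length feedforward approach: a separate weight matrix for each position.
# The parameter count grows with T, and position T+1 has no weights at all.
W_per_position = [rng.standard_normal((d_out, d_in)) for _ in range(T)]

# Shared-parameter approach: one matrix reused at every position, so the same
# transformation applies to sequences of any length.
W_shared = rng.standard_normal((d_out, d_in))

x = rng.standard_normal((T + 2, d_in))        # a sequence longer than T
y = np.stack([W_shared @ x_t for x_t in x])   # works at every position
print(y.shape)                                # (7, 3)
```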

A related idea is the use of convolution across a 1-D temporal sequence. This convolutional approach is the basis for time-delay neural networks.

  • The convolution operation allows a network to share parameters across time, but it is shallow. The output of convolution is a sequence where each member of the output is a function of a small number of neighboring members of the input. The idea of parameter sharing manifests in the application of the same convolution kernel at each time step, as in the sketch below.
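A minimal NumPy sketch of this idea, assuming a single 3-tap kernel (our own illustration of 1-D convolution across time, not the time-delay network architecture itself):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(10)       # a 1-D temporal signal of length 10
kernel = rng.standard_normal(3)   # the same 3-tap kernel is reused at every time step

# Each output value depends only on a small neighborhood of the input,
# which is the shallow form of parameter sharing described above.
y = np.convolve(x, kernel, mode="valid")
print(y.shape)                    # (8,)
```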

Recurrent networks share parameters in a different way.

  • Each member of the output is produced using the same update rule applied to the previous outputs (sketched below). This recurrent formulation results in the sharing of parameters through a very deep computational graph.
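As a minimal sketch, assuming one common choice of update rule (a tanh recurrence with hypothetical weight names `U`, `W`, `b`; this is our illustration, not the only possible formulation):

```python
import numpy as np

rng = np.random.default_rng(2)
d_in, d_h, T = 4, 6, 8
U = 0.1 * rng.standard_normal((d_h, d_in))   # input-to-hidden weights, shared across time
W = 0.1 * rng.standard_normal((d_h, d_h))    # hidden-to-hidden weights, shared across time
b = np.zeros(d_h)

x = rng.standard_normal((T, d_in))
h = np.zeros(d_h)                            # initial state h^{(0)}
for t in range(T):
    # The same parameters (U, W, b) are applied at every step, so the state at
    # step t depends, through the recurrence, on every earlier input.
    h = np.tanh(U @ x[t] + W @ h + b)
print(h.shape)                               # (6,)
```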

This chapter extends the idea of a computational graph to include cycles. These cycles represent the influence of the present value of a variable on its own value at a future time step. Such computational graphs allow us to define recurrent neural networks. We then describe many different ways to construct, train, and use recurrent neural networks.
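One way to make this concrete, using the sequence notation from above (here f and \theta are generic placeholders for the update function and its shared parameters), is to write the cycle as a recurrence on a hidden state and unroll it for a few steps, which turns the cyclic graph into an ordinary acyclic one:

```latex
h^{(t)} = f\left(h^{(t-1)}, x^{(t)}; \theta\right)
% Unrolled for three steps, the cycle becomes an acyclic computational graph:
h^{(3)} = f\left(f\left(f\left(h^{(0)}, x^{(1)}; \theta\right), x^{(2)}; \theta\right), x^{(3)}; \theta\right)
```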
