Long Short-Term Memory

A general (feed-forward) neural network can only handle independent inputs; it does not have any "memory".

The first approach is to add a recurrent connection to the network, so the output still carries information from earlier inputs. This is called a Recurrent Neural Network (RNN), but training it often suffers from vanishing gradients.
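For concreteness, one common form of the vanilla RNN recurrence (the weight matrices $W_h$, $W_x$ and bias $b$ here are just illustrative names) is

$$h_t = \tanh(W_h h_{t-1} + W_x x_t + b)$$

so every hidden state depends on the one before it, and gradients must flow back through this whole chain.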

Here $C_{t-1}$ and $C_t$ are the cell state, which lets information pass through the cell almost unchanged.

But for an RNN, learning long-term dependencies with gradient descent is hard: http://ai.dinfo.unifi.it/paolo//ps/tnn-94-gradient.pdf

The LSTM is designed as a special kind of RNN with three gates that control the cell state: the input gate, the forget gate, and the output gate.

Forget Gate

  • $h_{t-1}$ is the output of the previous cell (the whole green box above).
  • $x_t$ is the input of the current cell.
  • $\sigma$ is the sigmoid function: its output is between 0 and 1, where 0 means forget everything and 1 means keep everything.
  • $f_t$ is the result of the sigmoid function (see the equation below).
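For reference, the forget gate is usually written as (with a weight matrix $W_f$ and bias $b_f$ acting on the concatenation $[h_{t-1}, x_t]$):

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$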

Input Gate

A sigmoid layer decides which information needs to be updated. $i_t$ is the result of that sigmoid, so $i_t \cdot \tilde{C}_t$ is the scaled value that says how much we decided to update each state value. Similarly, $f_t \cdot C_{t-1}$ is how much we want to keep from the previous state.

A tanh layer produces the candidate values $\tilde{C}_t$ used to update the old information.

  1. $C_t$ is the cell information or cell state, and $C_{t-1}$ is updated to $C_t$.
  2. We then use $f_t \cdot C_{t-1} + i_t \cdot \tilde{C}_t$ as our new $C_t$ (the full update is written out below).
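Written out explicitly (again with per-gate weight matrices and biases as illustrative names):

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$

$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$$

$$C_t = f_t \cdot C_{t-1} + i_t \cdot \tilde{C}_t$$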

Output Gate

Based on our cell state, we get the output $h_t$, which serves as $h_{t-1}$ for the next cell.

  1. $o_t$ is calculated by the sigmoid layer.
  2. The output $h_t$ is computed as $o_t$ times the cell state passed through a tanh layer (squashed to between -1 and 1), i.e. $h_t = o_t \cdot \tanh(C_t)$ (a NumPy sketch of the full cell is given below).
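Putting all three gates together, here is a minimal NumPy sketch of a single LSTM cell step. It follows the equations above; the function and parameter names (`lstm_step`, `W_f`, `b_f`, ...) are illustrative, not taken from any particular library.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM cell step following the gate equations above."""
    z = np.concatenate([h_prev, x_t])                      # [h_{t-1}, x_t]

    f_t = sigmoid(params["W_f"] @ z + params["b_f"])       # forget gate
    i_t = sigmoid(params["W_i"] @ z + params["b_i"])       # input gate
    c_tilde = np.tanh(params["W_c"] @ z + params["b_c"])   # candidate state
    o_t = sigmoid(params["W_o"] @ z + params["b_o"])       # output gate

    c_t = f_t * c_prev + i_t * c_tilde                     # new cell state
    h_t = o_t * np.tanh(c_t)                               # new hidden state / output
    return h_t, c_t

# Toy usage: 3-dimensional input, 4-dimensional hidden state.
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
params = {}
for name in ["f", "i", "c", "o"]:
    params[f"W_{name}"] = rng.standard_normal((n_hid, n_hid + n_in)) * 0.1
    params[f"b_{name}"] = np.zeros(n_hid)

h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.standard_normal((5, n_in)):                   # a sequence of 5 inputs
    h, c = lstm_step(x, h, c, params)
print(h.shape, c.shape)                                    # (4,) (4,)
```

Note how the cell state $c$ is only modified by element-wise scaling and addition, which is what lets information (and gradients) flow through many time steps.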

LSTM Variants

Gated Recurrent Unit (GRU): it replaces the LSTM's three gates with a reset gate and an update gate.
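One common formulation of the GRU update (again with illustrative weight names) is:

$$z_t = \sigma(W_z \cdot [h_{t-1}, x_t])$$

$$r_t = \sigma(W_r \cdot [h_{t-1}, x_t])$$

$$\tilde{h}_t = \tanh(W \cdot [r_t \cdot h_{t-1}, x_t])$$

$$h_t = (1 - z_t) \cdot h_{t-1} + z_t \cdot \tilde{h}_t$$

where $z_t$ is the update gate and $r_t$ is the reset gate; there is no separate cell state.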

Author: shixuan liu
Link: http://tedlsx.github.io/2019/10/18/lstm/