Long Short-Term Memory (LSTM) is an enhancement to the Recurrent Neural Network (RNN) that improves performance by letting the network retain important information from earlier in a sequence and use it at later steps. When a network is trained on a long sequence, such as a paragraph of text, it may lose information from the first sentence that it needs in order to generate the rest of the paragraph.
The workings of an LSTM cell involve a fair amount of mathematics, but the core idea is that an additional quantity, called the cell state, is carried along from step to step. At a given step, the cell receives three things: the input for the current step, the output activation (the hidden state) from the previous step, and the previous cell state. Through a series of “gates”, the LSTM cell determines which information to retain and which to forget at that point in the sequence. Once it has decided what to discard and what to carry forward, it updates the cell state and passes it to the next step, and the process repeats along the sequence.
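
To make the gate mechanism concrete, here is a minimal sketch of a single LSTM step in plain Python. It follows the commonly used formulation with a forget gate, an input gate, a candidate update, and an output gate; the text above does not specify a particular variant, so treat this as one reasonable reading. All names (`lstm_step`, `init_params`, `run_sequence`, the `W_*` and `b_*` keys) are illustrative assumptions, and the weights are random rather than trained.

```python
import math
import random


def sigmoid(x):
    """Squash a value into (0, 1); used as a soft on/off switch for gates."""
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, v, b):
    """Return W @ v + b, where W is a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b_k for row, b_k in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step: takes the current input, the previous output
    activation (hidden state), and the previous cell state."""
    z = h_prev + x_t  # list concatenation: [h_prev ; x_t]

    # Forget gate: how much of each previous cell-state entry to keep.
    f = [sigmoid(a) for a in affine(params["W_f"], z, params["b_f"])]
    # Input gate: how much of each candidate value to write.
    i = [sigmoid(a) for a in affine(params["W_i"], z, params["b_i"])]
    # Candidate values that could be added to the cell state.
    g = [math.tanh(a) for a in affine(params["W_g"], z, params["b_g"])]
    # Output gate: how much of the cell state to expose as output.
    o = [sigmoid(a) for a in affine(params["W_o"], z, params["b_o"])]

    # Updated cell state: keep part of the old state, add part of the new.
    c_t = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    # New output activation, passed to the next step along with c_t.
    h_t = [ok * math.tanh(ck) for ok, ck in zip(o, c_t)]
    return h_t, c_t


def init_params(input_size, hidden_size, seed=0):
    """Random, untrained weights (hypothetical helper for illustration)."""
    rng = random.Random(seed)
    params = {}
    for gate in "figo":
        params["W_" + gate] = [
            [rng.uniform(-0.5, 0.5) for _ in range(hidden_size + input_size)]
            for _ in range(hidden_size)
        ]
        params["b_" + gate] = [0.0] * hidden_size
    return params


def run_sequence(xs, params, hidden_size):
    """Feed a sequence through the cell, carrying h and c forward."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, params)
    return h, c
```

Calling `run_sequence([[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]], init_params(3, 2), 2)` returns the final output activation and cell state after two steps. The key line is the cell-state update: the forget gate scales the old state while the input gate scales the new candidate, which is how the cell decides what to discard and what to carry through the sequence.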