Improve Long-term Memory Learning Through Rescaling the Error Temporally
Shida Wang, Zhanglu Yan

TL;DR
This paper investigates how error metrics influence long-term memory learning in sequence models, proposing a temporally rescaled error to reduce short-term bias and improve long-term retention, validated through experiments.
Contribution
It introduces a novel temporally rescaled error metric that mitigates short-term bias and enhances long-term memory learning in sequence models.
Findings
All temporally positive-weighted errors, including mean absolute and mean squared error, are biased towards short-term memory when learning linear functionals.
Rescaling errors reduces short-term bias and alleviates vanishing gradients.
Numerical experiments on different long-memory tasks and sequence models confirm the importance of an appropriately rescaled error.
Abstract
This paper studies the error metric selection for long-term memory learning in sequence modelling. We examine the bias towards short-term memory in commonly used errors, including mean absolute/squared error. Our findings show that all temporally positive-weighted errors are biased towards short-term memory in learning linear functionals. To reduce this bias and improve long-term memory learning, we propose the use of a temporally rescaled error. In addition to reducing the bias towards short-term memory, this approach can also alleviate the vanishing gradient issue. We conduct numerical experiments on different long-memory tasks and sequence models to validate our claims. Numerical results confirm the importance of appropriate temporally rescaled error for effective long-term memory learning. To the best of our knowledge, this is the first work that quantitatively analyzes different…
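The short-term bias described in the abstract can be illustrated with a small numerical sketch. This is not the paper's definition of the temporally rescaled error, which this page does not give. It assumes a hypothetical polynomially decaying memory kernel `rho_true`, a model `rho_hat` that captures only the first 10 lags, and one possible rescaling (dividing the per-lag error by the target's magnitude). All names and constants are illustrative.

```python
import numpy as np

def weighted_memory_error(rho_true, rho_hat, weights):
    """Sum over lags s of weights[s] * |rho_true[s] - rho_hat[s]|."""
    return float(np.sum(weights * np.abs(rho_true - rho_hat)))

lags = np.arange(200)

# Hypothetical target memory kernel with polynomial decay (an assumption).
rho_true = 1.0 / (lags + 1.0) ** 2

# A "short-memory" model: exact on the first 10 lags, zero afterwards.
rho_hat = np.where(lags < 10, rho_true, 0.0)

# Uniform positive weights: the missed tail contributes very little,
# so the short-memory model looks almost perfect.
uniform_weights = np.ones_like(rho_true)
plain_error = weighted_memory_error(rho_true, rho_hat, uniform_weights)

# Assumed rescaling: weight each lag by 1 / |rho_true[s]|, so every missed
# lag contributes about 1 regardless of how small the target is there.
rescaled_weights = 1.0 / np.abs(rho_true)
rescaled_error = weighted_memory_error(rho_true, rho_hat, rescaled_weights)

print(f"uniform-weight error: {plain_error:.4f}")
print(f"rescaled error:       {rescaled_error:.4f}")
```

Under the uniform weights the truncated model is penalised only slightly for ignoring 190 of the 200 lags, while the rescaled version penalises every missed lag equally. A learning signal of the second kind keeps long-term memory visible to the optimiser.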
Taxonomy
Topics: Model Reduction and Neural Networks · Control Systems and Identification · Neural Networks and Applications
