Rethinking and Improving Natural Language Generation with Layer-Wise Multi-View Decoding
Fenglin Liu, Xuancheng Ren, Guangxiang Zhao, Chenyu You, Xuewei Ma, Xian Wu, Xu Sun

TL;DR
This paper introduces layer-wise multi-view decoding for sequence-to-sequence models, enhancing information extraction from encoder layers, addressing hierarchy bypassing, and achieving state-of-the-art results across diverse NLP tasks.
Contribution
It proposes a novel multi-view decoding method that incorporates multiple encoder layers for each decoder layer, improving performance and addressing hierarchy bypassing in deep models.
Findings
Effectively addresses the hierarchy bypassing problem.
Achieves state-of-the-art results on ten benchmark datasets.
Requires negligible additional parameters.
Abstract
In sequence-to-sequence learning, e.g., natural language generation, the decoder relies on the attention mechanism to efficiently extract information from the encoder. While it is common practice to draw information from only the last encoder layer, recent work has proposed to use representations from different encoder layers for diversified levels of information. Nonetheless, the decoder still obtains only a single view of the source sequences, which might lead to insufficient training of the encoder layer stack due to the hierarchy bypassing problem. In this work, we propose layer-wise multi-view decoding, where for each decoder layer, together with the representations from the last encoder layer, which serve as a global view, those from other encoder layers are supplemented for a stereoscopic view of the source sequences. Systematic experiments and analyses show that we successfully…
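The mechanism described in the abstract can be sketched in a few lines: each decoder layer attends to the last encoder layer (the global view) and to one other encoder layer, then merges the two contexts. This is a minimal illustration, not the authors' implementation. The names `attend` and `multi_view_context` are hypothetical, and two choices below are assumptions that the abstract does not specify: pairing decoder layer i with encoder layer i, and merging the two views by a plain average.

```python
import math


def attend(query, keys):
    """Scaled dot-product attention of one query vector over a list of vectors.

    The same vectors serve as keys and values to keep the sketch short.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(keys[0])
    return [
        sum(w * key[d] for w, key in zip(weights, keys)) / total
        for d in range(dim)
    ]


def multi_view_context(query, encoder_layers, decoder_layer_index):
    """Combine a global view and an auxiliary view of the source sequence.

    encoder_layers[j] holds the per-position output vectors of encoder layer j.
    """
    # Global view: attention over the last encoder layer, as in standard decoding.
    global_view = attend(query, encoder_layers[-1])

    # ASSUMPTION: decoder layer i is paired with encoder layer i (clamped to range).
    aux_index = min(decoder_layer_index, len(encoder_layers) - 1)
    auxiliary_view = attend(query, encoder_layers[aux_index])

    # ASSUMPTION: the two views are merged by a simple average.
    return [(g + a) / 2 for g, a in zip(global_view, auxiliary_view)]


# Toy input: two encoder layers, two source positions, dimension 2.
encoder_layers = [
    [[1.0, 0.0], [3.0, 0.0]],  # lower encoder layer
    [[0.0, 2.0], [0.0, 4.0]],  # last encoder layer
]
context = multi_view_context([0.0, 0.0], encoder_layers, decoder_layer_index=0)
```

Because the lower encoder layer contributes directly to the decoder's context, it also receives a direct gradient signal from the decoder, which is the intuition behind supplementing the global view. The averaging step adds no parameters in this sketch; a learned merge would add some.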
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling
