Probing Omissions and Distortions in Transformer-based RDF-to-Text Models
Juliette Faille, Albert Gatt, Claire Gardent

TL;DR
This paper investigates how transformer-based models such as BART and T5 omit or distort information in RDF-to-Text generation. It introduces two probing methods that analyze encoder outputs and show that the encoder contributes to this information loss.
Contribution
The paper introduces two new probing methods, including a parameter-free cosine similarity approach, to analyze omissions and distortions in transformer encoder outputs for RDF-to-Text tasks.
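A minimal sketch of the idea behind a parameter-free cosine similarity probe: compare the embedding of a full RDF graph with the embedding of the same graph after one entity has been removed. Everything concrete here is an assumption made for illustration: the toy vectors stand in for real encoder outputs, mean pooling is one possible way to get a graph-level vector, and the helper names (`mean_pool`, `removal_similarities`) are hypothetical. It is not the authors' implementation.

```python
import numpy as np


def mean_pool(token_embeddings):
    """Collapse a (num_tokens, dim) array into one graph-level vector."""
    return np.asarray(token_embeddings, dtype=float).mean(axis=0)


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def removal_similarities(full_tokens, ablated_tokens_by_entity):
    """Similarity between the full-graph embedding and each ablated-graph embedding.

    ablated_tokens_by_entity maps an entity name to the token embeddings of
    the graph with that entity removed.
    """
    full_vec = mean_pool(full_tokens)
    return {
        entity: cosine_similarity(full_vec, mean_pool(tokens))
        for entity, tokens in ablated_tokens_by_entity.items()
    }


# Toy stand-in for encoder output: one row per entity (assumption).
# A real probe would re-encode the ablated graph with BART or T5
# rather than drop rows from the full graph's output.
full = np.array([
    [2.0, 0.0, 0.0, 0.0],   # hypothetical entity_A
    [0.0, 1.0, 0.0, 0.0],   # hypothetical entity_B
    [0.0, 0.0, 0.5, 0.0],   # hypothetical entity_C
])
ablated = {
    "entity_A": full[1:],   # graph without entity_A
    "entity_C": full[:2],   # graph without entity_C
}
sims = removal_similarities(full, ablated)
print(sims)
```

In this toy setup, removing `entity_A` moves the graph embedding further from the original than removing `entity_C`. One plausible reading is that an entity whose removal barely changes the embedding is weakly encoded, but how the scores are interpreted in the paper is not specified here.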
Findings
Both omitted and distorted entities can be detected in encoder embeddings.
Encoder signals are weaker for omitted or distorted entities.
Probing methods can identify mistakes in NLG model outputs.
Abstract
In Natural Language Generation (NLG), important information is sometimes omitted in the output text. To better understand and analyse how this type of mistake arises, we focus on RDF-to-Text generation and explore two methods of probing omissions in the encoder output of BART (Lewis et al, 2020) and of T5 (Raffel et al, 2019): (i) a novel parameter-free probing method based on the computation of cosine similarity between embeddings of RDF graphs and of RDF graphs in which we removed some entities and (ii) a parametric probe which performs binary classification on the encoder embeddings to detect omitted entities. We also extend our analysis to distorted entities, i.e. entities that are not fully correctly mentioned in the generated text (e.g. misspelling of entity, wrong units of measurement). We found that both omitted and distorted entities can be probed in the encoder's output…
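The parametric probe described above is a binary classifier over encoder embeddings. The sketch below illustrates that setup with plain logistic regression trained by gradient descent, which is swapped in here as the simplest possible probe; the probe architecture used in the paper is not stated in this summary. The data is synthetic (two Gaussian clusters standing in for embeddings of mentioned and omitted entities), and all function names and hyperparameters are assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def make_synthetic_embeddings(n_per_class=100, dim=8, shift=3.0, seed=0):
    """Two Gaussian clusters as placeholders for real encoder embeddings.

    Label 0 = entity mentioned in the output, label 1 = entity omitted.
    """
    rng = np.random.default_rng(seed)
    mentioned = rng.normal(-shift / 2, 1.0, size=(n_per_class, dim))
    omitted = rng.normal(shift / 2, 1.0, size=(n_per_class, dim))
    X = np.vstack([mentioned, omitted])
    y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])
    return X, y


def train_probe(X, y, lr=0.1, epochs=200):
    """Fit logistic regression weights with batch gradient descent."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)
        w -= lr * (X.T @ (p - y)) / n   # gradient of mean log loss w.r.t. w
        b -= lr * np.mean(p - y)        # gradient of mean log loss w.r.t. b
    return w, b


def predict(X, w, b):
    """Return 1 where the probe predicts 'omitted', else 0."""
    return (sigmoid(X @ w + b) >= 0.5).astype(int)


X_train, y_train = make_synthetic_embeddings(seed=0)
X_test, y_test = make_synthetic_embeddings(seed=1)
w, b = train_probe(X_train, y_train)
accuracy = float(np.mean(predict(X_test, w, b) == y_test))
print(f"held-out probe accuracy: {accuracy:.2f}")
```

The synthetic clusters are deliberately easy to separate, so the accuracy printed here says nothing about real models. With real data, the inputs would be encoder embeddings from BART or T5 and the labels would come from annotations of which entities were omitted in the generated text.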
Taxonomy
Topics: Semantic Web and Ontologies · Natural Language Processing Techniques
Methods: Attention Is All You Need · Linear Layer · Adafactor · Gated Linear Unit · Adam · SentencePiece · Byte Pair Encoding · Softmax · Layer Normalization
