GateNLP-UShef at SemEval-2022 Task 8: Entity-Enriched Siamese Transformer for Multilingual News Article Similarity

Iknoor Singh; Yue Li; Melissa Thong; Carolina Scarton

arXiv:2205.15812 · cs.CL · June 30, 2022 · 1 citation

TL;DR

This paper presents a multilingual news article similarity system using an entity-enriched Siamese Transformer that captures narrative, entities, location, and time to assess how different outlets report the same events.

Contribution

The paper introduces an entity-enriched Siamese Transformer architecture that combines a document-level narrative representation with auxiliary entity-based features to score the similarity of news articles.

Findings

1. Placed second on the SemEval-2022 Task 8 leaderboard.
2. Showed that combining narrative and entity-based features is effective.
3. Validated the approach through a detailed ablation study.

Abstract

This paper describes the second-placed system on the leaderboard of SemEval-2022 Task 8: Multilingual News Article Similarity. We propose an entity-enriched Siamese Transformer which computes news article similarity based on different sub-dimensions, such as the shared narrative, entities, location and time of the event discussed in the news article. Our system exploits a Siamese network architecture using a Transformer encoder to learn document-level representations for the purpose of capturing the narrative together with the auxiliary entity-based features extracted from the news articles. The intuition behind using all these features together is to capture the similarity between news articles at different granularity levels and to assess the extent to which different news outlets write about "the same events". Our experimental results and detailed ablation study demonstrate the…
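The Siamese pattern described in the abstract can be illustrated with a minimal sketch: the same encoder is applied to both articles, and the resulting narrative similarity is combined with an entity-overlap feature. Everything in the block is an assumption made for illustration and is not the authors' implementation: `encode` is a toy hashed bag-of-words stand-in for the shared Transformer encoder, `jaccard` is one possible entity-overlap feature, and the fixed `narrative_weight` replaces whatever combination the real system learns from data.

```python
import math
import zlib
from typing import List, Set

DIM = 64  # size of the toy document vector (arbitrary choice for this sketch)


def encode(text: str) -> List[float]:
    """Toy stand-in for the shared Transformer encoder (NOT the paper's model).

    Each lowercased token is hashed into one of DIM buckets and the counts
    are L2-normalised. Both articles go through this same function, which is
    the defining property of a Siamese setup.
    """
    vec = [0.0] * DIM
    for token in text.lower().split():
        bucket = zlib.crc32(token.encode("utf-8")) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two vectors that are already L2-normalised."""
    return sum(x * y for x, y in zip(a, b))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Hypothetical auxiliary feature: overlap between two entity sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(
    text_a: str,
    text_b: str,
    entities_a: Set[str],
    entities_b: Set[str],
    narrative_weight: float = 0.5,  # assumed fixed weight; a real system would learn this
) -> float:
    """Combine narrative similarity with the entity-overlap feature."""
    narrative = cosine(encode(text_a), encode(text_b))
    entity = jaccard(entities_a, entities_b)
    return narrative_weight * narrative + (1.0 - narrative_weight) * entity


if __name__ == "__main__":
    score = similarity(
        "Floods hit the city centre on Monday",
        "Heavy flooding struck the city centre early this week",
        {"city centre", "Monday"},
        {"city centre"},
    )
    print(f"similarity score: {score:.3f}")
```

In practice the toy `encode` would be replaced by a multilingual Transformer encoder, and the entity sets would come from a named-entity recogniser run over each article; the two-branch structure with shared weights stays the same.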

Code & Models

Repositories

iknoorjobs/semeval-code (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Advanced Text Analysis Techniques · Natural Language Processing Techniques

Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Softmax · Dense Connections · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Multi-Head Attention · Absolute Position Encodings · Dropout