Wolfies at SemEval-2022 Task 8: Feature extraction pipeline with transformers for Multi-lingual news article similarity

Nikhil Goel; Ranjith Reddy

arXiv:2208.09715·cs.CL·February 7, 2023

TL;DR

This paper presents a multi-lingual news article similarity system using transformer-based feature extraction and neural networks, achieving significant improvements over baseline cosine similarity metrics.

Contribution

It introduces a multi-lingual feature extraction pipeline that combines transformers with a feed-forward neural network for news article similarity, improving on the cosine-similarity baseline.

Findings

1. Significant improvement over the baseline cosine similarity

2. Effective multi-lingual feature extraction pipeline

3. Neural network improves similarity prediction accuracy

Abstract

This work addresses the task of measuring the similarity between a pair of news articles. The dataset provides seven objective similarity metrics for each pair, and the articles are written in multiple languages. Using a pre-trained embedding model, we computed cosine similarity for baseline results and then trained a feed-forward neural network on top of it to improve the results. We also built a separate feature extraction pipeline for each similarity metric. Feature extraction combined with the feed-forward neural network gave a significant improvement over the baseline results.
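
The two stages named in the abstract, a cosine-similarity baseline over article embeddings and a feed-forward network trained on top of it, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy two-dimensional vectors stand in for the output of a pre-trained embedding model, and the names `pair_features` and `FeedForward`, the choice of features, the layer sizes, the target score, and the learning rate are all hypothetical.

```python
import math
import random


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def pair_features(u, v):
    """Hypothetical features for an article pair: cosine plus element-wise |u - v|."""
    return [cosine_similarity(u, v)] + [abs(a - b) for a, b in zip(u, v)]


class FeedForward:
    """One hidden ReLU layer and a scalar linear output, trained by SGD on squared error."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        pre = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(self.w1, self.b1)]
        hidden = [max(0.0, p) for p in pre]  # ReLU
        out = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return out, hidden

    def train_step(self, x, target, lr=0.05):
        """One SGD step on 0.5 * (out - target)^2; returns the loss before the update."""
        out, hidden = self.forward(x)
        err = out - target
        for j, h in enumerate(hidden):
            # Gradient reaching hidden unit j, computed with w2[j] before it is updated.
            grad_h = err * self.w2[j] if h > 0.0 else 0.0
            self.w2[j] -= lr * err * h
            self.b1[j] -= lr * grad_h
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * grad_h * xi
        self.b2 -= lr * err
        return 0.5 * err * err


# Toy stand-ins for the embeddings of two articles (assumed values).
emb_a = [1.0, 0.0]
emb_b = [0.6, 0.8]

baseline = cosine_similarity(emb_a, emb_b)  # baseline score for the pair
features = pair_features(emb_a, emb_b)      # input to the network

model = FeedForward(n_in=len(features), n_hidden=4)
losses = [model.train_step(features, target=4.0) for _ in range(200)]
```

In this sketch a single network regresses a single score. Since the abstract describes a separate pipeline per similarity metric, one way to mirror that would be to train one such model per metric, each with its own features, though the paper's actual arrangement is not specified here.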
