Cross-Language Learning for Entity Matching
Ralph Peeters, Christian Bizer

TL;DR
This paper investigates how supplementing limited non-English training data with larger English datasets can enhance Transformer-based entity matching, especially in low-resource language scenarios, by leveraging web-extracted English pairs.
Contribution
It demonstrates that adding English training pairs improves Transformer-based entity matching performance for non-English data, particularly in low-resource settings.
Findings
Adding English pairs improves matching performance for every Transformer tested.
Performance gains are most significant in low-resource scenarios.
Web-extracted English data is a valuable resource for low-resource language matching.
Abstract
Transformer-based entity matching methods have significantly advanced the state of the art for less-structured matching tasks such as matching product offers in e-commerce. To excel at these tasks, Transformer-based matching methods require a decent amount of training pairs. Providing enough training data can be challenging, especially if a matcher for non-English product descriptions is to be learned. Using the use case of matching product offers from different e-shops, this poster explores to what extent the performance of Transformer-based matchers can be improved by complementing a small set of training pairs in the target language, German in our case, with a larger set of English-language training pairs. Our experiments using different Transformers show that extending the German set with English pairs improves the matching performance in all cases. The impact of…
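The core idea of the abstract is a data-composition step: a small German training set is extended with a larger English one before fine-tuning a Transformer matcher. A minimal sketch of that step follows, assuming each training example is a labelled pair of offer descriptions. The `OfferPair` schema, the `build_training_set` helper, the `english_ratio` cap, and the toy offers are all hypothetical illustrations, not the authors' code or data. The fine-tuning of the (presumably multilingual) Transformer itself is out of scope here.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class OfferPair:
    """One labelled pair of product offer descriptions (hypothetical schema)."""
    left: str
    right: str
    is_match: bool
    lang: str  # language tag, e.g. "de" or "en"


def build_training_set(target_pairs, english_pairs, english_ratio=None, seed=42):
    """Complement a small target-language set with English pairs.

    english_ratio (hypothetical parameter): if given, keep at most
    english_ratio * len(target_pairs) randomly sampled English pairs;
    if None, keep all English pairs.
    """
    rng = random.Random(seed)
    english = list(english_pairs)
    if english_ratio is not None:
        k = min(len(english), int(english_ratio * len(target_pairs)))
        english = rng.sample(english, k)
    combined = list(target_pairs) + english
    # Shuffle so German and English pairs are interleaved rather than grouped.
    rng.shuffle(combined)
    return combined


# Toy usage with made-up offers.
de = [
    OfferPair("Apple iPhone 12 64GB schwarz", "iPhone 12 64 GB Schwarz", True, "de"),
    OfferPair("Samsung Galaxy S21", "Sony WH-1000XM4 Kopfhörer", False, "de"),
]
en = [OfferPair(f"offer {i}", f"offer {i} (listing)", True, "en") for i in range(10)]

train = build_training_set(de, en, english_ratio=2)
print(len(train), sum(p.lang == "en" for p in train))
```

Capping the English share with a ratio is one simple way to study how much English data helps relative to the German set. The poster's actual sampling scheme may differ.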
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Data Quality and Management
Methods: Linear Layer · Residual Connection · Softmax · Attention Is All You Need · Multi-Head Attention · Layer Normalization · Dense Connections · Byte Pair Encoding · Adam
