# CompiLIG at SemEval-2017 Task 1: Cross-Language Plagiarism Detection Methods for Semantic Textual Similarity

**Authors:** Jeremy Ferrero, Frederic Agnes, Laurent Besacier, Didier Schwab

arXiv: 1704.01346 · 2017-04-06

## TL;DR

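This paper describes CompiLIG's systems for cross-language Semantic Textual Similarity (STS) Track 4 at SemEval-2017. The systems combine syntax-based, dictionary-based, context-based, and MT-based methods to score the similarity of Spanish-English sentence pairs on a 0 to 5 scale; the best run ranked 1st on Track 4a with a correlation of 83.02% with human annotations.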

## Contribution

The paper introduces a hybrid approach combining syntax, dictionary, context, and machine translation methods for cross-language semantic similarity detection.

## Key findings

- Best run achieved a correlation of 83.02% with human annotations
- Ranked 1st in SemEval-2017 Track 4a
- Demonstrated effectiveness of combined unsupervised and supervised methods

## Abstract

We present our submitted systems for Semantic Textual Similarity (STS) Track 4 at SemEval-2017. Given a pair of Spanish-English sentences, each system must estimate their semantic similarity by a score between 0 and 5. In our submission, we use syntax-based, dictionary-based, context-based, and MT-based methods. We also combine these methods in both unsupervised and supervised ways. Our best run ranked 1st on Track 4a with a correlation of 83.02% with human annotations.
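
To make the evaluation setup in the abstract concrete, here is a minimal Python sketch of how per-method similarity scores might be combined and then compared with human annotations. It assumes the reported correlation is a Pearson correlation (the usual STS metric) and that the unsupervised combination is a plain arithmetic mean; this summary confirms neither detail. The function names `pearson` and `average_scores` and all numbers are hypothetical, chosen only for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


def average_scores(per_method_scores):
    """Assumed unsupervised combination: mean of the methods' scores per pair.

    per_method_scores holds one list per method, each with one score
    (on the 0-5 scale) per sentence pair.
    """
    return [sum(column) / len(column) for column in zip(*per_method_scores)]


# Made-up scores for four sentence pairs (not data from the paper).
method_a = [4.5, 1.0, 3.0, 0.5]
method_b = [4.0, 2.0, 2.5, 1.0]
gold = [5.0, 1.5, 3.0, 0.0]  # hypothetical human annotations

combined = average_scores([method_a, method_b])
print(f"correlation with gold: {pearson(combined, gold):.4f}")
```

A supervised combination would instead fit a regressor that maps the per-method scores to the gold annotations on training pairs; the sketch omits it because the summary does not say which learner the authors used.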

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1704.01346/full.md

## References

20 references — full list in the complete paper: https://tomesphere.com/paper/1704.01346/full.md

---
Source: https://tomesphere.com/paper/1704.01346