Structure-semantics interplay in complex networks and its effects on the predictability of similarity in texts
Diego R. Amancio, Osvaldo N. Oliveira Jr., Luciano da F. Costa

TL;DR
This paper explores how features of complex networks, which may capture syntactic, semantic, and even pragmatic aspects of a text, can be used in three text classification tasks: identifying machine translation systems, evaluating the quality of machine-translated texts, and recognizing authorship.
Contribution
It introduces a novel methodology integrating topological and semantic features of complex networks for enhanced text classification.
Findings
Topological features of the networks representing texts can improve machine translation (MT) system identification in particular cases.
Semantic features correlate highly with translation quality metrics.
Hybrid approaches outperform individual feature types in authorship recognition.
Abstract
There are different ways to define similarity for grouping similar texts into clusters, as the concept of similarity may depend on the purpose of the task. For instance, in topic extraction similar texts mean those within the same semantic field, whereas in author recognition stylistic features should be considered. In this study, we introduce ways to classify texts employing concepts of complex networks, which may be able to capture syntactic, semantic and even pragmatic features. The interplay between the various metrics of the complex networks is analyzed with three applications, namely identification of machine translation (MT) systems, evaluation of quality of machine translated texts and authorship recognition. We shall show that topological features of the networks representing texts can enhance the ability to identify MT systems in particular cases. For evaluating the quality of…
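To make the idea of "topological features of the networks representing texts" concrete, here is a minimal sketch in Python. It assumes a word adjacency network, in which each distinct word is a node and two words are linked when they appear next to each other. The excerpt above does not state how the authors build their networks, so this construction, the function names, and the choice of mean degree and mean clustering coefficient as features are illustrative assumptions, not the paper's method.

```python
from collections import defaultdict


def build_adjacency_network(text):
    """Build an undirected graph linking each word to its immediate neighbor.

    Assumption: whitespace tokenization and lowercasing only; the paper may
    apply other preprocessing that the excerpt does not describe.
    """
    words = text.lower().split()
    graph = defaultdict(set)
    for a, b in zip(words, words[1:]):
        if a != b:  # skip self-loops from repeated words
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def clustering_coefficient(graph, node):
    """Fraction of pairs of a node's neighbors that are themselves linked."""
    neighbors = graph[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(
        1 for u in neighbors for v in neighbors if u < v and v in graph[u]
    )
    return 2.0 * links / (k * (k - 1))


def topological_features(text):
    """Return a small, hypothetical feature set describing the text network."""
    graph = build_adjacency_network(text)
    n = len(graph)
    if n == 0:
        return {"nodes": 0, "mean_degree": 0.0, "mean_clustering": 0.0}
    mean_degree = sum(len(nb) for nb in graph.values()) / n
    mean_clustering = sum(clustering_coefficient(graph, w) for w in graph) / n
    return {
        "nodes": n,
        "mean_degree": mean_degree,
        "mean_clustering": mean_clustering,
    }


if __name__ == "__main__":
    print(topological_features("the cat sat on the mat near the cat"))
```

Feature dictionaries of this kind, computed per text, could then be passed to any standard classifier or clustering routine; which measurements and which learning method are appropriate depends on the task, as the abstract notes.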
