Similarità per la ricerca del dominio di una frase
Massimiliano Morrelli, Giacomo Pansini, Massimiliano Polito, Arturo Vitale

TL;DR
This paper investigates the most effective algorithms for determining whether a document belongs to a specific domain by comparing vector distance methods within a distributed computing environment using Apache Spark.
Contribution
It presents a comparative study of algorithms for sentence similarity calculation implemented in Apache Spark, building on prior research in big data and distributed text classification.
Findings
Identified the most accurate vector distance methods for domain verification.
Demonstrated the feasibility of distributed sentence similarity computation.
Provided performance benchmarks within the Spark environment.
Abstract
English. This document aims to identify the best algorithms for verifying whether a given document belongs to a related domain, by comparing different methods for calculating the distance between two vectors. The study was made possible by the structures provided by the Apache Spark framework. Starting from the study presented in the publication "New frontier of textual classification: Big data and distributed calculus" by Massimiliano Morrelli et al., we wanted to study a possible implementation of a solution capable of calculating the similarity of a sentence in a distributed environment.
Italiano. Il presente documento persegue l'obiettivo di studiare gli algoritmi migliori per verificare l'appartenenza di un determinato documento a un relativo dominio tramite un confronto di diversi metodi per il calcolo della distanza fra due vettori.…
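To make the idea of "comparing different methods for calculating the distance between two vectors" concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the summary above does not name the measures that were compared, so cosine similarity, Euclidean distance, and Manhattan distance are chosen here only as common examples. The bag-of-words vectorization and all function names (`bag_of_words`, `to_vectors`, `compare`) are likewise hypothetical, and the sketch runs on a single machine rather than on Apache Spark.

```python
import math
from collections import Counter


def bag_of_words(sentence):
    """Naive term counts: lowercase and split on whitespace (an assumption)."""
    return Counter(sentence.lower().split())


def to_vectors(a, b):
    """Align two term-count maps on their combined vocabulary."""
    vocab = sorted(set(a) | set(b))
    # Counter returns 0 for terms it has not seen, so missing terms become 0.
    return [a[t] for t in vocab], [b[t] for t in vocab]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def euclidean_distance(u, v):
    """Straight-line (L2) distance between u and v."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def manhattan_distance(u, v):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(x - y) for x, y in zip(u, v))


def compare(sentence, domain_text):
    """Score one sentence against a reference text with all three measures."""
    u, v = to_vectors(bag_of_words(sentence), bag_of_words(domain_text))
    return {
        "cosine_similarity": cosine_similarity(u, v),
        "euclidean_distance": euclidean_distance(u, v),
        "manhattan_distance": manhattan_distance(u, v),
    }
```

Calling `compare(sentence, domain_text)` returns one score per measure, which makes it easy to see where the measures agree or disagree for a given pair. Note that cosine is a similarity (higher means closer) while the other two are distances (lower means closer), so they cannot be thresholded in the same direction. In a distributed setting, a function like this could plausibly be applied to each record independently, but how the paper actually structures the Spark computation is not stated in this summary.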
Taxonomy
Topics: Natural Language Processing Techniques · Linguistic Studies and Language Acquisition · Authorship Attribution and Profiling
