ERASMO: Leveraging Large Language Models for Enhanced Clustering Segmentation

Fillipe dos Santos Silva, Gabriel Kenzo Kakimoto, Julio Cesar dos Reis, and Marcelo S. Reis

arXiv:2410.03738 · cs.CL · February 5, 2025

TL;DR

ERASMO is a novel framework that fine-tunes large language models on textual representations of tabular data to produce embeddings that improve clustering accuracy across diverse datasets.

Contribution

The paper introduces ERASMO, a method that transforms tabular data into text and fine-tunes language models to generate better embeddings for clustering tasks.

Findings

01. ERASMO improves clustering accuracy over baseline methods.
02. It effectively captures complex relationships in multimodal data.
03. Experimental results show enhanced embedding quality.

Abstract

Cluster analysis plays a crucial role in various domains and applications, such as customer segmentation in marketing. These contexts often involve multimodal data, including both tabular and textual datasets, making it challenging to represent hidden patterns for obtaining meaningful clusters. This study introduces ERASMO, a framework designed to fine-tune a pretrained language model on textually encoded tabular data and generate embeddings from the fine-tuned model. ERASMO employs a textual converter to transform tabular data into a textual format, enabling the language model to process and understand the data more effectively. Additionally, ERASMO produces contextually rich and structurally representative embeddings through techniques such as random feature sequence shuffling and number verbalization. Extensive experimental evaluations were conducted using multiple datasets and…
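
The textual conversion step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the helper names `verbalize_number` and `row_to_text`, the "<column> is <value>" template, the restriction to integers in 0..999, and the sample row are all assumptions made for illustration. It only shows the two techniques the abstract names, random feature sequence shuffling and number verbalization, applied to a single tabular row.

```python
import random

# Hypothetical helpers for illustration only; the abstract does not specify
# the sentence template or how numbers are verbalized.

_ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
_TENS = [
    "", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
    "eighty", "ninety",
]


def verbalize_number(n: int) -> str:
    """Spell out an integer in the range 0..999 in English words."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only handles integers in 0..999")
    if n < 20:
        return _ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return _TENS[tens] + ("" if ones == 0 else "-" + _ONES[ones])
    hundreds, rest = divmod(n, 100)
    head = _ONES[hundreds] + " hundred"
    return head if rest == 0 else head + " " + verbalize_number(rest)


def row_to_text(row: dict, rng: random.Random, verbalize: bool = True) -> str:
    """Encode one tabular row as text, with features in random order."""
    items = list(row.items())
    rng.shuffle(items)  # random feature sequence shuffling
    parts = []
    for column, value in items:
        is_small_int = (
            isinstance(value, int)
            and not isinstance(value, bool)
            and 0 <= value <= 999
        )
        if verbalize and is_small_int:
            value = verbalize_number(value)  # number verbalization
        parts.append(f"{column} is {value}")  # assumed template
    return ", ".join(parts)


# Made-up sample row, not taken from the paper's datasets.
sample_row = {"age": 42, "city": "Campinas", "income": 310}
sample_text = row_to_text(sample_row, random.Random(0))
```

Under these assumptions, each call with a different random state yields the same features in a different order, so a language model fine-tuned on such sentences would not see a fixed column order. The resulting sentences would then be the input to fine-tuning and embedding extraction, which this sketch does not cover.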

Code & Models

Repositories

fsant0s/ERASMO
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques