Graph-Convolutional Networks: Named Entity Recognition and Large Language Model Embedding in Document Clustering
Imed Keraghel, Mohamed Nadif

TL;DR
This paper introduces a graph-based document clustering method that combines Named Entity Recognition and Large Language Model embeddings using graph convolutional networks to improve clustering accuracy for documents with many named entities.
Contribution
It presents a novel integration of NER and LLM embeddings within a GCN framework for enhanced document clustering, addressing limitations of previous co-occurrence methods.
Findings
Outperforms traditional co-occurrence clustering methods
Effective for documents rich in named entities
Improves semantic grouping of related documents
Abstract
Recent advances in machine learning, particularly Large Language Models (LLMs) such as BERT and GPT, provide rich contextual embeddings that improve text representation. However, current document clustering approaches often ignore the deeper relationships between named entities (NEs) and the potential of LLM embeddings. This paper proposes a novel approach that integrates Named Entity Recognition (NER) and LLM embeddings within a graph-based framework for document clustering. The method builds a graph with nodes representing documents and edges weighted by named entity similarity, optimized using a graph-convolutional network (GCN). This ensures a more effective grouping of semantically related documents. Experimental results indicate that our approach outperforms conventional co-occurrence-based methods in clustering, notably for documents rich in named entities.
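The graph construction described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say which entity-similarity measure is used, so Jaccard overlap between per-document entity sets is assumed here. The random matrix `X` stands in for LLM document embeddings, `W` stands in for learned GCN weights, and all function names are hypothetical. The propagation step is the commonly used symmetric-normalized GCN layer with self-loops.

```python
import numpy as np


def jaccard(a, b):
    """Jaccard similarity between two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def entity_adjacency(entity_sets):
    """Weighted adjacency: edge (i, j) = similarity of the documents' entity sets.

    Assumption: Jaccard overlap is used as the named-entity similarity.
    """
    n = len(entity_sets)
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            A[i, j] = A[j, i] = jaccard(entity_sets[i], entity_sets[j])
    return A


def normalize_adjacency(A):
    """Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(A_norm, X, W):
    """One graph-convolution step followed by ReLU."""
    return np.maximum(A_norm @ X @ W, 0.0)


# Hypothetical toy input: entity sets that an NER model might return per document.
entity_sets = [
    {"Paris", "UNESCO"},
    {"Paris", "UNESCO", "Louvre"},
    {"NASA", "Mars"},
]

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 8))  # placeholder for LLM document embeddings
W = rng.normal(size=(8, 4))  # placeholder for trained GCN weights

A = entity_adjacency(entity_sets)
A_norm = normalize_adjacency(A)
H = gcn_layer(A_norm, X, W)  # smoothed document representations, shape (3, 4)
```

In this toy input, the first two documents share two of three distinct entities, so their edge weight is 2/3, while the third document shares none and is connected only to itself through the self-loop. The rows of `H` could then be passed to any clustering routine, such as k-means; the abstract does not specify which clustering step the paper uses.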
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Graph Neural Networks
Methods: Linear Layer · Cosine Annealing · Linear Warmup With Cosine Annealing · Discriminative Fine-Tuning · Attention Is All You Need · Dense Connections · Byte Pair Encoding · Dropout · Multi-Head Attention
