# FLACON: An Information-Theoretic Approach to Flag-Aware Contextual Clustering for Large-Scale Document Organization

**Authors:** Sungwook Yoon

PMC · DOI: 10.3390/e27111133 · 2025-10-31

## TL;DR

FLACON is a new clustering method that organizes enterprise documents by combining content similarity with contextual flags like priority and workflow status.

## Contribution

FLACON frames document clustering as an entropy minimization problem over a six-dimensional flag system (Type, Domain, Priority, Status, Relationship, Temporal), so that documents are grouped by organizational context as well as content.
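
As a rough illustration of the entropy view, the sketch below scores a cluster by summing the Shannon entropy of each flag dimension named in the abstract. The exact objective used in the paper is not given in this summary, so the per-dimension sum, the function names, and the example flag values are all assumptions made for illustration.

```python
import math
from collections import Counter

# The six flag dimensions listed in the abstract (lower-cased here as dict keys).
FLAG_DIMENSIONS = ("type", "domain", "priority", "status", "relationship", "temporal")


def flag_entropy(values):
    """Shannon entropy (in bits) of the flag values observed in one dimension."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def cluster_flag_entropy(cluster):
    """Hypothetical cluster score: the sum of per-dimension flag entropies.

    A cluster whose documents share every flag value scores 0; more mixed
    clusters score higher, so a lower score means a more coherent context.
    """
    return sum(
        flag_entropy([doc[dim] for doc in cluster]) for dim in FLAG_DIMENSIONS
    )


# Illustrative flag values only; they are not taken from the paper.
base = {
    "type": "email",
    "domain": "finance",
    "priority": "high",
    "status": "open",
    "relationship": "reply",
    "temporal": "recent",
}
coherent_cluster = [dict(base), dict(base)]
mixed_cluster = [dict(base), dict(base, priority="low")]

print(cluster_flag_entropy(coherent_cluster))
print(cluster_flag_entropy(mixed_cluster))
```

Under this assumed score, a clustering that keeps same-flag documents together has lower total entropy than one that mixes them, which is the intuition behind an entropy minimization objective.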

## Key findings

- FLACON achieves a 7.8-fold improvement in clustering quality over traditional methods (Silhouette Score: 0.311 vs. 0.040).
- It reaches 89% of GPT-4's clustering quality while running about 7× faster (60 s vs. 420 s for 10,000 documents).
- Incremental updates affecting m documents cost O(m log n), and the method behaves deterministically, which suits compliance requirements.
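
The headline ratios can be checked against the raw figures reported in the abstract (Silhouette Scores of 0.311 and 0.040, runtimes of 60 s and 420 s). The short sketch below only redoes that arithmetic; the variable names are illustrative.

```python
# Figures as reported in the abstract.
silhouette_flacon = 0.311
silhouette_baseline = 0.040
runtime_flacon_s = 60
runtime_gpt4_s = 420

# Ratio of Silhouette Scores: about 7.775, reported as "7.8-fold".
quality_fold = silhouette_flacon / silhouette_baseline

# Ratio of runtimes for 10,000 documents, reported as "~7x faster".
speedup = runtime_gpt4_s / runtime_flacon_s

print(f"quality improvement: {quality_fold:.3f}-fold")
print(f"speedup: {speedup:.1f}x")
```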

## Abstract

Enterprise document management faces a significant challenge: traditional clustering methods focus solely on content similarity while ignoring organizational context, such as priority, workflow status, and temporal relevance. This paper introduces FLACON (Flag-Aware Context-sensitive Clustering), an information-theoretic approach that captures multi-dimensional document context through a six-dimensional flag system encompassing Type, Domain, Priority, Status, Relationship, and Temporal dimensions. FLACON formalizes document clustering as an entropy minimization problem, where the objective is to group documents with similar contextual characteristics. The approach combines a composite distance function—integrating semantic content, contextual flags, and temporal factors—with adaptive hierarchical clustering and efficient incremental updates. This design addresses key limitations of existing solutions, including context-aware systems that lack domain-specific intelligence and LLM-based methods that require prohibitive computational resources. Evaluation across nine dataset variations demonstrates notable improvements over traditional methods, including a 7.8-fold improvement in clustering quality (Silhouette Score: 0.311 vs. 0.040) and performance comparable to GPT-4 (89% of quality) while being ~7× faster (60 s vs. 420 s for 10 K documents). FLACON achieves O(m log n) complexity for incremental updates affecting m documents and provides deterministic behavior, which is suitable for compliance requirements. Consistent performance across business emails, technical discussions, and financial news confirms the practical viability of this approach for large-scale enterprise document organization.
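
The composite distance described above can be sketched as a weighted sum of three terms. This summary does not give the paper's actual formula, so everything here is an assumption for illustration: the weights, the cosine distance for content, the mismatch fraction for categorical flags, the exponential time decay with a 30-day scale, and all function and field names.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def flag_distance(flags_a, flags_b):
    """Fraction of flag dimensions on which two documents disagree."""
    return sum(flags_a[k] != flags_b[k] for k in flags_a) / len(flags_a)


def temporal_distance(day_a, day_b, scale_days=30.0):
    """Grows from 0 toward 1 as the time gap widens (hypothetical decay scale)."""
    return 1.0 - math.exp(-abs(day_a - day_b) / scale_days)


def composite_distance(a, b, w_content=0.5, w_flags=0.3, w_time=0.2):
    """Weighted sum of content, flag, and temporal distances (assumed weights)."""
    return (
        w_content * cosine_distance(a["embedding"], b["embedding"])
        + w_flags * flag_distance(a["flags"], b["flags"])
        + w_time * temporal_distance(a["day"], b["day"])
    )


# Toy documents with made-up embeddings and flag values.
flags = {
    "type": "email",
    "domain": "finance",
    "priority": "high",
    "status": "open",
    "relationship": "reply",
    "temporal": "recent",
}
doc_a = {"embedding": [1.0, 0.0], "flags": dict(flags), "day": 0}
doc_b = {"embedding": [0.0, 1.0], "flags": dict(flags, priority="low"), "day": 0}

print(composite_distance(doc_a, doc_a))
print(composite_distance(doc_a, doc_b))
```

A pairwise matrix built from such a function could then be passed to any hierarchical clustering routine that accepts precomputed distances. Because every term is a deterministic function of its inputs, repeated runs on the same data give the same distances, in line with the deterministic behavior the abstract highlights.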

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12650872/full.md

---
Source: https://tomesphere.com/paper/PMC12650872