Bundle Fragments into a Whole: Mining More Complete Clusters via Submodular Selection of Interesting webpages for Web Topic Detection
Junbiao Pang, Anjing Hu, Qingming Huang

TL;DR
This paper introduces a submodular optimization-based bundling and refining method to assemble more complete hot topic clusters from fragmented web data, significantly improving accuracy over existing methods.
Contribution
It presents a novel bundling-refining approach that leverages submodular optimization to enhance web topic detection by assembling fragments into coherent hot topics.
Findings
Outperforms state-of-the-art by 20% in accuracy
Achieves 10% improvement in another key metric
Demonstrates scalability and effectiveness on public datasets
Abstract
Organizing interesting webpages into hot topics is one of the key steps to understand the trends of multimodal web data. A state-of-the-art solution is first to organize webpages into a large volume of multi-granularity topic candidates; hot topics are then identified by estimating their interestingness. However, these topic candidates contain a large number of fragments of hot topics due to both the inefficient feature representations and the unsupervised topic generation. This paper proposes a bundling-refining approach to mine more complete hot topics from fragments. Concretely, the bundling step organizes the fragment topics into coarse topics; next, the refining step proposes a submodular-based method to refine coarse topics in a scalable way. The proposed unconventional method is simple yet powerful: by leveraging submodular optimization, our approach outperforms the…
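The abstract does not state the paper's submodular objective, so the following is only a generic sketch of what submodular selection inside a coarse topic can look like. It assumes a facility-location objective over a hypothetical pairwise similarity matrix `sim` and uses the standard greedy algorithm for monotone submodular maximization under a cardinality budget. The function names, the matrix values, and the budget are illustrative assumptions, not the authors' method.

```python
from typing import List, Sequence


def facility_location(selected: Sequence[int], sim: List[List[float]]) -> float:
    """Coverage score: each page is credited with its best similarity to a selected page."""
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in sim)


def greedy_select(sim: List[List[float]], budget: int) -> List[int]:
    """Greedily pick up to `budget` pages, each time adding the largest marginal gain."""
    selected: List[int] = []
    remaining = set(range(len(sim)))
    current = 0.0
    while remaining and len(selected) < budget:
        best_item, best_gain = None, 0.0
        for j in sorted(remaining):
            gain = facility_location(selected + [j], sim) - current
            if gain > best_gain:  # strict: pages with zero gain are never added
                best_item, best_gain = j, gain
        if best_item is None:
            break  # no remaining page improves coverage
        selected.append(best_item)
        remaining.remove(best_item)
        current += best_gain
    return selected


# Hypothetical similarities for four webpages in one coarse topic
# (pages 0-1 and pages 2-3 form two tight groups).
sim = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.0, 0.1, 0.8, 1.0],
]
picked = greedy_select(sim, budget=2)
```

With this toy matrix the first pick comes from the first group and the second pick from the other group, because a second page from an already covered group adds little marginal coverage. That diminishing-returns behavior is the property that makes greedy selection a reasonable fit for submodular objectives.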
Taxonomy
Topics: Web Data Mining and Analysis · Complex Network Analysis Techniques · Advanced Text Analysis Techniques
