TL;DR
SocialX is a modular, source-agnostic platform designed to streamline multi-source big data research in Indonesia by integrating data collection, preprocessing, and analysis into a flexible pipeline.
Contribution
SocialX introduces a modular architecture that separates concerns into independent layers, so that new data sources and analysis methods can be added or customized without changing the other layers.
Findings
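The platform integrates heterogeneous data sources (social media, news portals, e-commerce platforms, review sites, and academic databases) into a single pipeline.
The preprocessing layer addresses challenges specific to the Indonesian language.
A typical research workflow demonstrates the platform's utility.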
Abstract
Big data research in Indonesia is constrained by a fundamental fragmentation: relevant data is scattered across social media, news portals, e-commerce platforms, review sites, and academic databases, each with different formats, access methods, and noise characteristics. Researchers must independently build collection pipelines, clean heterogeneous data, and assemble separate analysis tools, a process that often overshadows the research itself. We present SocialX, a modular platform for multi-source big data research that integrates heterogeneous data collection, language-aware preprocessing, and pluggable analysis into a unified, source-agnostic pipeline. The platform separates concerns into three independent layers (collection, preprocessing, and analysis) connected by a lightweight job-coordination mechanism. This modularity allows each layer to grow independently: new data sources,…
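The three-layer separation described in the abstract can be illustrated with a minimal sketch. Everything named here (`Record`, `InMemoryCollector`, `normalize`, `count_by_source`, `run_job`, and the tiny slang map) is a hypothetical stand-in chosen for illustration, not SocialX's actual API, and the plain function call in `run_job` only approximates whatever job-coordination mechanism the platform really uses.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Protocol


@dataclass
class Record:
    """Source-agnostic unit passed between layers (hypothetical schema)."""
    source: str
    text: str
    meta: dict = field(default_factory=dict)


class Collector(Protocol):
    """Collection layer: any object with this method can act as a source."""
    def collect(self, query: str) -> Iterable[Record]: ...


class InMemoryCollector:
    """Stand-in for a real source adapter; serves canned documents."""
    def __init__(self, source: str, docs: list[str]):
        self.source = source
        self.docs = docs

    def collect(self, query: str) -> Iterable[Record]:
        # Case-insensitive substring match replaces a real API call or scrape.
        for doc in self.docs:
            if query.lower() in doc.lower():
                yield Record(source=self.source, text=doc)


# Illustrative abbreviation map only; not the platform's actual lexicon.
SLANG = {"gk": "tidak", "yg": "yang", "dgn": "dengan"}


def normalize(record: Record) -> Record:
    """Preprocessing layer: lowercase and expand a few informal abbreviations."""
    tokens = [SLANG.get(tok, tok) for tok in record.text.lower().split()]
    return Record(record.source, " ".join(tokens), record.meta)


def count_by_source(records: list[Record]) -> dict[str, int]:
    """Analysis layer: a trivial pluggable analysis (records per source)."""
    counts: dict[str, int] = {}
    for rec in records:
        counts[rec.source] = counts.get(rec.source, 0) + 1
    return counts


def run_job(
    query: str,
    collectors: list[Collector],
    preprocess: Callable[[Record], Record],
    analyze: Callable[[list[Record]], dict],
) -> dict:
    """Assumed coordinator: chains the three layers for one query."""
    records = [preprocess(r) for c in collectors for r in c.collect(query)]
    return analyze(records)


collectors = [
    InMemoryCollector("social", ["Harga beras gk turun", "Cuaca panas sekali"]),
    InMemoryCollector("news", ["Harga beras naik di Jakarta"]),
]
result = run_job("beras", collectors, normalize, count_by_source)
print(result)
```

Because each layer depends only on `Record`, adding a source in this sketch means writing one more class with a `collect` method, and swapping the analysis means passing a different function to `run_job`; neither change touches the other layers.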
