TL;DR
This paper introduces HiFACTMix, a new benchmark and graph-aware model for fact verification in Hinglish, addressing the challenge of low-resource, code-mixed political claims in India, and demonstrating improved accuracy and explainability.
Contribution
The paper presents the HiFACT dataset for Hinglish claims and a novel graph-aware, retrieval-augmented model for evidence-based fact verification in low-resource, code-mixed languages.
Findings
HiFACTMix outperforms existing multilingual models in accuracy.
The model provides faithful explanations for its verdicts.
The dataset includes 1,500 claims with evidence annotations.
Abstract
Fact-checking in code-mixed, low-resource languages such as Hinglish remains an underexplored challenge in natural language processing. Existing fact-verification systems largely focus on high-resource, monolingual settings and fail to generalize to real-world political discourse in linguistically diverse regions like India. Given the widespread use of Hinglish by public figures, particularly political figures, and the growing influence of social media on public opinion, there is a critical need for robust, multilingual, and context-aware fact-checking tools. To address this gap, a novel benchmark, the HiFACT dataset, is introduced with 1,500 real-world factual claims made by 28 Indian state Chief Ministers in Hinglish, under a highly code-mixed, low-resource setting. Each claim is annotated with textual evidence and veracity labels. To evaluate this benchmark, a novel graph-aware,…
Peer Reviews
Decision: ICLR 2026 Conference Desk Rejected Submission
1. The primary strength of this paper is the introduction of the HiFACTMix dataset. Fact-checking in low-resource, code-mixed environments is a critical research gap, and this dataset, sourced from real-world political discourse, provides a much-needed resource for the community.
2. The paper addresses a highly relevant and impactful problem. Hinglish is a dominant language in online political and social discourse in India, making it a key vector for misinformation. Developing tools for this spe…
1. The "Quantum-Enhanced RAG" is presented as a novel contribution but is poorly motivated and explained. The paper provides no specific details on the algorithm used, how it is "quantum-inspired," or why this approach is superior to established classical re-ranking methods (e.g., BM25, dense retrievers, or monoT5-based re-rankers). The ablation study merely shows that removing the component (i.e., using a simpler retrieval) hurts performance, but it provides no evidence that the "quantum" aspec…
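For reference, the classical re-ranking baselines the reviewer names are well defined. The sketch below ranks candidate evidence passages for a claim with Okapi BM25. It assumes plain whitespace tokenization, the common defaults k1=1.5 and b=0.75, and a Lucene-style non-negative IDF; the function names and toy Hinglish passages are hypothetical illustrations, not the paper's "Quantum-Enhanced RAG" or its data.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized document for a tokenized query."""
    n_docs = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter()
    for doc in docs_tokens:
        df.update(set(doc))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # Lucene-style IDF variant, which is always non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            # Term-frequency saturation (k1) and length normalization (b).
            norm = freq + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * freq * (k1 + 1) / norm
        scores.append(score)
    return scores


def rerank(claim, passages):
    """Return (passage, score) pairs sorted by BM25 score, highest first."""
    query = claim.lower().split()
    docs = [p.lower().split() for p in passages]
    scores = bm25_scores(query, docs)
    order = sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)
    return [(passages[i], scores[i]) for i in order]


# Hypothetical code-mixed claim and candidate evidence passages.
claim = "sarkar ne 100 naye schools khole"
passages = [
    "Rajya sarkar ne is saal 100 naye schools khole",
    "Cricket match kal shaam hoga",
    "Schools ki fees badhi",
]
ranked = rerank(claim, passages)
```

A lexical scorer like this is a natural ablation point: swapping it in for the proposed retriever isolates whether any gain comes from the "quantum" component rather than from re-ranking in general. Purely lexical matching will, however, miss transliteration variants common in Hinglish text.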
- The paper fills the gap in fact-checking for code-mixed low-resource languages like Hinglish.
- The HiFACTMix framework not only outperforms strong multilingual and code-mixed baselines (such as CM-BERT and VerT5erini) in both veracity prediction and explanation quality but also remains competitive with advanced LLMs like GPT-4.
At least 7 citation errors exist (e.g., Section 5.3, Reproducibility Checklist, Reproducibility Statement). The dataset size is insufficient (1.5k samples), and the annotation standards and quality are not clearly described. Using the widely used RAG approach for empirical validation is not novel. The HiFACTMix framework's integration of graph-based reasoning and quantum-enhanced retrieval introduces substantial computational overhead that requires careful consideration. Experiments are only conducted on self-constr…
- This paper studies a unique problem, that is, fact-checking in code-mixed (Hinglish) political discourse.
- This paper introduces a new annotated dataset (HiFACTMix) for Hinglish political claims, which could be useful.
- The presentation quality is poor. For example, the authors copied and pasted Figure 1 from (Guo et al., 2022). Please consider making original figures and using them in the paper.
- Key methodological details are missing or vague: the architecture, training pipeline, evidence graph construction, and integration of components are described only at a high level, with missing references and incomplete sections.
- Annotation protocol and inter-annotator agreement statistics are not described, rais…
Previous automatic fact-checking efforts do not specifically model code-mixing, which is the core novelty of this paper.
- The dataset collection process is underspecified: the paper states that "claims were collected from diverse sources", but the exact list of sources and how the claims were collected or selected is not specified. It is also not specified how the evidence documents are collected.
- Similarly, the annotation process is underspecified: the veracity labels and meta-data categories are listed, and it is stated that annotations are performed by multiple reviewers, but nothing related to annotator…
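Several reviewers ask for inter-annotator agreement statistics. For two annotators labeling the same claims, Cohen's kappa is a standard choice; the sketch below computes it from raw label lists. The label names and annotations are hypothetical and are not the paper's actual label set or data; with more than two annotators per claim, Fleiss' kappa or Krippendorff's alpha would be the usual substitutes.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label distribution.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both annotators used one identical label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical veracity labels from two annotators on six claims.
annotator_1 = ["true", "false", "true", "misleading", "false", "true"]
annotator_2 = ["true", "false", "false", "misleading", "false", "true"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Here the annotators agree on 5 of 6 claims (p_o = 5/6), chance agreement is 13/36, and kappa works out to 17/23, about 0.74. Reporting kappa alongside raw agreement shows how much of the agreement exceeds what label frequencies alone would produce.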
