A General Framework for Producing Interpretable Semantic Text Embeddings

Yiqun Sun; Qiang Huang; Yixuan Tang; Anthony K. H. Tung; Jun Yu

arXiv:2410.03435·cs.CL·October 7, 2024·2 cites

TL;DR

This paper introduces CQG-MBQA, a versatile framework that generates interpretable semantic text embeddings using discriminative yes/no questions, achieving high quality and interpretability across diverse NLP tasks.

Contribution

The paper presents a novel, general framework combining contrastive question generation and multi-task binary question answering to produce interpretable embeddings without relying on expert-crafted questions.

Findings

01

Achieves embedding quality comparable to black-box models.

02

Outperforms existing interpretable embedding methods.

03

Demonstrates effectiveness across multiple NLP tasks.

Abstract

Semantic text embedding is essential to many tasks in Natural Language Processing (NLP). While black-box models are capable of generating high-quality embeddings, their lack of interpretability limits their use in tasks that demand transparency. Recent approaches have improved interpretability by leveraging domain-expert-crafted or LLM-generated questions, but these methods rely heavily on expert input or careful prompt design, which restricts their generalizability and their ability to generate discriminative questions across a wide range of tasks. To address these challenges, we introduce CQG-MBQA (Contrastive Question Generation - Multi-task Binary Question Answering), a general framework for producing interpretable semantic text embeddings across diverse tasks. Our framework systematically generates highly discriminative, low-cognitive-load yes/no questions through the CQG…
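To make the idea of a question-based embedding concrete, here is a minimal sketch, assuming each embedding dimension records the yes/no answer to one question. The questions, the keyword lists, and the helper names (`embed`, `cosine`, `shared_questions`) are all hypothetical; in particular, keyword matching is only a toy stand-in for a trained binary question-answering model and is not the paper's method.

```python
import math

# Hypothetical yes/no questions; in the framework these would come from
# the question generation step, not be written by hand.
QUESTIONS = [
    "Does the text mention an animal?",
    "Does the text discuss food?",
    "Does the text describe weather?",
]

# Toy keyword sets standing in for a trained binary QA model (assumption).
KEYWORDS = [
    {"cat", "dog", "bird"},
    {"pizza", "ate", "dinner"},
    {"rain", "sunny", "snow"},
]


def embed(text):
    """Return one 0/1 value per question: 1 means the answer is 'yes'."""
    tokens = set(text.lower().replace(".", "").split())
    return [1 if tokens & keywords else 0 for keywords in KEYWORDS]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def shared_questions(a, b):
    """List the questions answered 'yes' for both texts."""
    return [q for q, x, y in zip(QUESTIONS, a, b) if x and y]


a = embed("The dog ate pizza.")
b = embed("A cat sat in the rain.")
similarity = cosine(a, b)
explanation = shared_questions(a, b)
```

Because every dimension maps back to a readable question, `explanation` names the questions that both texts answer with "yes", which is what makes the similarity score inspectable.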

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01 · Rating 8 · Confidence 3

Strengths

1) The question generation component of the framework aims to generate questions that are both discriminative and general. It groups similar texts by clustering before generating questions, so that nuanced questions can be asked for each group, as opposed to the simple questions of the baseline method. The concept is analogous to leveraging hard negatives in regular training of embedding models. 2) The authors show a good understanding of related work; the implementation and the evaluation ar…
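The clustering idea in point 1 can be sketched as follows. This is an illustrative assumption, not the paper's procedure: texts are taken to be already clustered and embedded, the target cluster supplies positives, the cluster with the nearest centroid supplies hard negatives, and all remaining clusters supply easy negatives. The function name `contrastive_groups` and the toy data are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def contrastive_groups(clusters, target):
    """Split texts into positives, hard negatives, and easy negatives.

    clusters: dict mapping cluster id -> list of (text, vector) pairs.
    target:   id of the cluster that questions are generated for.
    """
    centroids = {
        cid: centroid([vec for _, vec in items])
        for cid, items in clusters.items()
    }
    others = [cid for cid in clusters if cid != target]
    # The nearest neighbouring cluster plays the role of hard negatives.
    nearest = min(
        others,
        key=lambda cid: math.dist(centroids[target], centroids[cid]),
    )
    positives = [text for text, _ in clusters[target]]
    hard_negatives = [text for text, _ in clusters[nearest]]
    easy_negatives = [
        text
        for cid in others
        if cid != nearest
        for text, _ in clusters[cid]
    ]
    return positives, hard_negatives, easy_negatives


# Toy 2-D vectors standing in for real text embeddings (assumption).
clusters = {
    "cats": [("Cats purr.", [0.0, 0.0]), ("Kittens nap.", [0.2, 0.0])],
    "dogs": [("Dogs bark.", [1.0, 0.0])],
    "stocks": [("Markets fell.", [9.0, 9.0])],
}
positives, hard_negatives, easy_negatives = contrastive_groups(clusters, "cats")
```

A question that separates the positives from the hard negatives has to capture a finer distinction than one that only separates them from unrelated texts, which is the analogy to hard-negative mining that the reviewer draws.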

Weaknesses

1) Performance ablations over the framework's settings would be very interesting but are currently missing (e.g., performance across different dimensionalities, question difficulties, and encoding models). 2) More of the implementation details could be moved into the main paper, as they are mostly in the appendices.

Reviewer 02 · Rating 8 · Confidence 4

Strengths

1. CQG-MBQA focuses on producing interpretable embeddings, which is important for domains requiring transparency. 2. Compared with QAEmb, CQG produces more discriminative questions. 3. By integrating MBQA, the framework produces embeddings more cost-effectively than LLM-based alternatives. 4. The paper conducts extensive experiments on semantic textual similarity, retrieval, and clustering tasks, showcasing the framework's utility and competitiveness.

Weaknesses

Please refer to the Questions.

Reviewer 03 · Rating 6 · Confidence 3

Strengths

The ideas behind CQG and MBQA are novel and effective, supported by thoughtful experiments and ablation studies. The paper is clearly written and well-structured. Given the increasing demand for model transparency, CQG-MBQA could have significant implications and represent a meaningful approach that would be of interest to the ICLR audience.
- The paper builds upon QAEmb with important innovations: a contrastive approach to question generation that improves discrimination using positive, hard n

Weaknesses

- In retrieval tasks, there is a significant performance gap compared to black-box models, and performance is also lower than BM25. Additional performance comparisons are therefore needed on various downstream tasks such as sentiment classification and retrieval.
- Lack of ablation studies to assess the efficacy of the proposed approach.
- Lack of comparison between different models in Figures 4 and 5, and lack of comparison between the MBQA method and directly using the

Code & Models

Repositories

dukesun99/CQG-MBQA
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques