DoubleCCA: Improving Foundation Model Group Robustness with Random Sentence Embeddings

Hong Liu; Yitong Lu

arXiv:2411.16236 · cs.CL · November 26, 2024

Open Access

TL;DR

DoubleCCA improves the robustness of foundation models to group-based biases by augmenting prompts with random sentences and applying CCA twice to align the resulting embeddings and reconstruct them in the original representation space.

Contribution

We introduce DoubleCCA, a simple method combining random sentence augmentation and CCA to improve foundation model robustness to group biases.

Findings

01 Outperforms existing methods in robustness and performance.

02 Easily integrable into current foundation models.

03 Effective across various tasks and datasets.

Abstract

This paper presents a novel method to improve the robustness of foundation models to group-based biases. We propose a simple yet effective method, called DoubleCCA, that leverages random sentences and Canonical Correlation Analysis (CCA) to enrich the text embeddings of the foundation model. First, we generate various random sentences that extend the original prompts with random words or character sequences. Second, we use an additional sentence embedding model to generate different text embeddings for these random sentences. We then apply CCA twice to align the representations and reconstruct them in the original representation space. We demonstrate the effectiveness of our method on a variety of tasks and datasets, showing that it outperforms existing methods in terms of both performance and robustness. Our method is simple to…
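The first step described in the abstract, extending a prompt with random words or character sequences, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the authors' implementation: the function names `random_sentence` and `augment_prompt`, the choice to append the random text after the prompt, the token counts, and the example prompt are all hypothetical. The embedding and CCA steps are not shown.

```python
import random
import string


def random_sentence(rng, n_tokens=5, vocab=None, min_len=3, max_len=8):
    """Build a random 'sentence' of n_tokens tokens.

    If vocab is given, tokens are sampled from it (random words);
    otherwise each token is a random lowercase character sequence.
    All parameters are illustrative assumptions.
    """
    tokens = []
    for _ in range(n_tokens):
        if vocab:
            tokens.append(rng.choice(vocab))
        else:
            length = rng.randint(min_len, max_len)
            tokens.append(
                "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
            )
    return " ".join(tokens)


def augment_prompt(prompt, n_variants=4, seed=0, **kwargs):
    """Return n_variants copies of prompt, each extended with a random sentence.

    Appending (rather than prepending or interleaving) is an assumption.
    A seeded generator keeps the augmentation reproducible.
    """
    rng = random.Random(seed)
    return [f"{prompt} {random_sentence(rng, **kwargs)}" for _ in range(n_variants)]


# Hypothetical class prompt, extended three different ways.
variants = augment_prompt("a photo of a dog", n_variants=3, seed=0)
for v in variants:
    print(v)
```

Each variant keeps the original prompt intact and differs only in the random suffix, so the variants can be passed to a separate sentence embedding model to obtain several embeddings per prompt.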

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: ALIGN