BAMO at SemEval-2024 Task 9: BRAINTEASER: A Novel Task Defying Common Sense
Baktash Ansari, Mohammadmostafa Rostamkhani, Sauleh Eetemadi

TL;DR
This paper presents BAMO's approach to SemEval-2024 Task 9, where models are tested on creatively challenging questions, using fine-tuning, chain of thought prompting, and consensus techniques to improve accuracy.
Contribution
We introduce a novel combination of fine-tuning, chain of thought prompting, and ReConcile consensus methods for tackling creatively challenging language tasks.
Findings
Achieved 85% overall accuracy on the sentence puzzles subtask with the best method.
Demonstrated the effectiveness of ReConcile for reaching consensus among multiple models in a zero-shot setting.
Combined multiple models and prompting techniques for improved performance.
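To make the consensus idea concrete, here is a minimal sketch of how answers from several models could be merged into one. It assumes each model returns a choice label and a self-reported confidence, and it uses a plain confidence-weighted vote. That voting rule, the function name `weighted_vote`, and the input format are assumptions for illustration; this is not the authors' code, and it leaves out the multi-round discussion that a "round table" method involves.

```python
from collections import defaultdict


def weighted_vote(answers):
    """Pick one answer from (choice, confidence) pairs, one pair per model.

    Hypothetical helper: sums the confidence behind each choice and
    returns the choice with the largest total.
    """
    scores = defaultdict(float)
    for choice, confidence in answers:
        scores[choice] += confidence
    # Iterate over sorted labels so ties resolve to the smallest label.
    return max(sorted(scores), key=lambda c: scores[c])


# Made-up outputs from three models for a single question.
final = weighted_vote([("A", 0.9), ("B", 0.6), ("B", 0.5)])
```

In this made-up case two moderately confident votes for "B" outweigh one confident vote for "A", which is the behavior a weighted vote is meant to capture.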
Abstract
This paper outlines our approach to SemEval 2024 Task 9, BRAINTEASER: A Novel Task Defying Common Sense. The task aims to evaluate the ability of language models to think creatively. The dataset comprises multi-choice questions that challenge models to think "outside of the box". We fine-tune 2 models, BERT and RoBERTa Large. Next, we employ a Chain of Thought (CoT) zero-shot prompting approach with 6 large language models, such as GPT-3.5, Mixtral, and Llama2. Finally, we utilize ReConcile, a technique that employs a "round table conference" approach with multiple agents for zero-shot learning, to generate consensus answers among 3 selected language models. Our best method achieves an overall accuracy of 85 percent on the sentence puzzles subtask.
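As an illustration of zero-shot Chain of Thought prompting on a multiple-choice question, the sketch below assembles a prompt that asks the model to reason before answering. The prompt wording, the function name `build_cot_prompt`, and the sample question are assumptions; the exact prompts used in the paper are not given here.

```python
def build_cot_prompt(question, choices):
    """Build a zero-shot CoT prompt for one multiple-choice question.

    Hypothetical helper: labels choices A, B, C, ... and appends a
    generic "think step by step" instruction.
    """
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    lines = [f"Question: {question}", "Choices:"]
    for letter, choice in zip(letters, choices):
        lines.append(f"{letter}. {choice}")
    lines.append(
        "Let's think step by step, then give the letter of the final answer."
    )
    return "\n".join(lines)


# Made-up puzzle, not taken from the task dataset.
prompt = build_cot_prompt(
    "What has keys but cannot open a lock?",
    ["A piano", "A door", "A safe", "None of the above"],
)
```

The returned string can be sent to any chat or completion model; parsing the final letter out of the model's reply is left out of this sketch.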
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning
Methods: Attention Is All You Need · WordPiece · Linear Warmup With Linear Decay · Cosine Annealing · Softmax · RoBERTa · Layer Normalization · BERT · Weight Decay
