Alloprof: a new French question-answer education dataset and its use in an information retrieval case study
Antoine Lefebvre-Brossard, Stephane Gazaille, Michel C. Desmarais

TL;DR
This paper introduces a comprehensive French question-answering dataset from an educational website, and demonstrates its use in an information retrieval case study using BERT models, highlighting challenges and future directions.
Contribution
It provides a new large-scale French educational dataset with diverse content and a baseline retrieval approach, enabling research in French question-answering and multimodal comprehension.
Findings
The dataset contains 29,349 questions and their explanations, collected from 10,368 students across a variety of school subjects.
Fine-tuned BERT models achieved acceptable retrieval performance on the dataset.
The dataset's complexity requires advanced algorithms for reliable educational applications.
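As a rough illustration of the retrieval task described above (matching a student question to the most relevant reference page), here is a minimal sketch. It is not the paper's method: the `embed` function is a toy bag-of-words stand-in for a fine-tuned BERT encoder, and the page ids and texts are hypothetical examples rather than entries from the dataset. Only the ranking step, cosine similarity between a question vector and each page vector, mirrors the general setup.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a BERT encoder: a bag-of-words count vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_pages(question, pages):
    """Return page ids sorted from most to least similar to the question."""
    q = embed(question)
    scores = {pid: cosine(q, embed(text)) for pid, text in pages.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical reference pages (not taken from the Alloprof dataset).
pages = {
    "fractions": "Additionner des fractions avec un dénominateur commun",
    "photosynthese": "La photosynthèse transforme la lumière en énergie chimique",
    "passe_compose": "Le passé composé se forme avec un auxiliaire et un participe passé",
}

ranking = rank_pages("Comment additionner deux fractions ?", pages)
print(ranking[0])
```

In a setup closer to the one summarized here, `embed` would presumably be replaced by a dense encoder, while the ranking logic would stay essentially the same.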
Abstract
Teachers and students are increasingly relying on online learning resources to supplement the ones provided in school. This increase in the breadth and depth of available resources is a great thing for students, but only provided they are able to find answers to their queries. Question-answering and information retrieval systems have benefited from public datasets to train and evaluate their algorithms, but most of these datasets have been in English text written by and for adults. We introduce a new public French question-answering dataset collected from Alloprof, a Quebec-based primary and high-school help website, containing 29 349 questions and their explanations in a variety of school subjects from 10 368 students, with more than half of the explanations containing links to other questions or some of the 2 596 reference pages on the website. We also present a case study of this…
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
Methods: Attention Is All You Need · Adam · WordPiece · Softmax · Linear Layer · Residual Connection · Dropout · Weight Decay · Layer Normalization
