Pedagogical Alignment of Large Language Models
Shashank Sonkar, Kangqi Ni, Sapana Chaudhary, Richard G. Baraniuk

TL;DR
This paper explores training large language models to better emulate effective teaching strategies by using learning from human preferences and synthetic data, resulting in improved pedagogical alignment and new evaluation metrics.
Contribution
It introduces a novel synthetic data generation approach for training LLMs with learning from human preferences, enhancing pedagogical alignment over standard fine-tuning methods.
Findings
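Learning from Human Preferences (LHP) methods outperform supervised fine-tuning (SFT) in pedagogical alignment accuracy by 13.1% and 8.7% for Llama and Mistral models, respectively.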
Proposed perplexity-based metrics effectively measure pedagogical alignment.
Synthetic data generation reduces the need for manual annotation.
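This summary does not define the paper's perplexity-based metrics, so the following is only a generic sketch of how perplexity can be used to compare two candidate tutor replies. It assumes per-token log-probabilities are already available from a model; the function names and the sample numbers are hypothetical and are not taken from the paper.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the mean negative log-likelihood per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def prefers_guidance(guiding_logprobs, direct_logprobs):
    """True when the model finds the guiding reply less surprising
    (lower perplexity) than the direct-answer reply."""
    return perplexity(guiding_logprobs) < perplexity(direct_logprobs)

# Hypothetical natural-log token probabilities, made up for illustration.
guiding = [-0.2, -0.4, -0.3, -0.1]   # reply that walks the student through steps
direct = [-1.0, -1.5, -0.8, -1.2]    # reply that states the answer immediately

print(prefers_guidance(guiding, direct))
```

Under this assumed setup, a pedagogically aligned model would be expected to assign lower perplexity to the guiding reply more often than an unaligned one.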
Abstract
Large Language Models (LLMs), when used in educational settings without pedagogical fine-tuning, often provide immediate answers rather than guiding students through the problem-solving process. This approach falls short of pedagogical best practices and limits their effectiveness as educational tools. We term the objective of training LLMs to emulate effective teaching strategies 'pedagogical alignment.' In this paper, we investigate Learning from Human Preferences (LHP) algorithms to achieve this alignment objective. A key challenge in this process is the scarcity of high-quality preference datasets to guide the alignment. To address this, we propose a novel approach for constructing a large-scale dataset using synthetic data generation techniques, eliminating the need for time-consuming and costly manual annotation. Leveraging this dataset, our experiments with Llama and Mistral…
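The abstract does not say which LHP algorithms were used, but the released checkpoint names end in dpo and kto, which suggests Direct Preference Optimization was among them. As a rough illustration of how a preference pair (a guiding reply preferred over a direct answer) turns into a training signal, here is a minimal sketch of the standard DPO loss for a single pair. The function name, the default beta, and the example log-probabilities are assumptions for illustration, not the authors' implementation or settings.

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response under
    the trainable policy or the frozen reference model. beta=0.1 is an
    assumed default, not a value reported in the paper.
    """
    # How much more the policy favors the chosen reply than the reference does,
    # minus the same quantity for the rejected reply.
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    z = beta * margin
    # -log(sigmoid(z)) computed as a numerically stable softplus(-z).
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))

# Hypothetical log-probabilities: the policy has shifted toward the guiding reply.
loss = dpo_loss(policy_chosen=-10.0, policy_rejected=-12.0,
                ref_chosen=-11.0, ref_rejected=-11.0)
print(loss)
```

When the policy and reference agree, the loss equals log 2; it falls as the policy raises the preferred (guiding) reply relative to the rejected (direct-answer) one.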
Code & Models
- 🤗 kangqi-ni/Mistral-7B-Instruct-v0.1_bio-tutor_dpo · model · 6 dl · ♡ 1
- 🤗 kangqi-ni/Mistral-7B-Instruct-v0.2_bio-tutor_dpo · model · 5 dl
- 🤗 kangqi-ni/zephyr-7b-beta_bio-tutor_dpo · model · 9 dl
- 🤗 kangqi-ni/Mistral-7B-Instruct-v0.2_bio-tutor_sft · model · 7 dl
- 🤗 kangqi-ni/zephyr-7b-beta_bio-tutor_sft · model · 5 dl
- 🤗 kangqi-ni/Llama-3.1-8B-Instruct_bio-tutor_dpo · model · 3 dl
- 🤗 kangqi-ni/Llama-3.1-8B-Instruct_bio-tutor_sft · model · 7 dl · ♡ 1
- 🤗 kangqi-ni/Llama-3.1-8b-Instruct_bio-tutor_kto · model · 4 dl
- 🤗 kangqi-ni/Mistral-7B-Instruct-v0.2_bio-tutor_kto · model · 1 dl
- 🤗 kangqi-ni/zephyr-7b-beta_bio-tutor_kto · model · 1 dl
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: LLaMA · Shrink and Fine-Tune
