TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization