/
lemmy.ml
Explore
Create
Machine Learning - Theory | Research
manitcor
•
1 year ago
•
100%
Direct Preference Optimization - Your Language Model is Secretly a Reward Model
https://arxiv.org/pdf/2305.18290.pdf
1
0
Comments
0