mltheory

Direct Preference Optimization - Your Language Model is Secretly a Reward Model

Direct Preference Optimization - Your Language Model is Secretly a Reward Model

1
0
Comments 0