RLHFlow/Online-RLHF
by RLHFlow · RLHF & Preference · updated 1y ago
A recipe for online RLHF and online iterative DPO.
Momentum 31 · Stars 544 · Forks 48 · Rank #109
Tags: llama3 · llm · rlhf