RLHFlow/Online-RLHF

by RLHFlow · RLHF & Preference · updated 1y ago

A recipe for online RLHF and online iterative DPO.

Momentum: 31 · Stars: 544 · Forks: 48 · Rank: #109
llama3 · llm · rlhf