FINE‑TUNING/INDEX

RLHFlow/Online-RLHF

by RLHFlow · RLHF & Preference · updated 1y ago

A recipe for online RLHF and online iterative DPO.

31 momentum · 544 stars · 48 forks · rank #109

llama3 · llm · rlhf
