RLHFlow/Reinforce-Ada

by RLHFlow · Training Frameworks · updated 6mo ago

An adaptive sampling framework for Reinforce-style LLM post-training.

22 momentum · 96 stars · 17 forks · rank #152