RLHFlow/Reinforce-Ada
by RLHFlow · Training Frameworks · updated 6mo ago
An adaptive sampling framework for Reinforce-style LLM post-training.
Momentum 22 · Stars 96 · Forks 17 · Rank #152