tanzelin430/The-Scaling-Law-for-Reinforcement-Learning

by tanzelin430 · RLHF & Preference · updated 1mo ago

[ACL 2026] Code repo for the paper "Scaling Behaviors of LLM Reinforcement Learning Post-Training"

Momentum: 38 · Stars: 22 · Forks: 5 · Rank: #95