
Gen-Verse/ReasonFlux

by Gen-Verse · RLHF & Preference · updated 8mo ago

[NeurIPS 2025 Spotlight] LLM post-training suite featuring ReasonFlux, ReasonFlux-PRM, and ReasonFlux-Coder.

31 momentum · 538 stars · 37 forks · #110 rank
chain-of-thought · clawdbot-skill · code-generation · deepseek-r1 · gemini-pro · llm-rlhf · o3-mini · post-training · process-reward-model · reinforcement-learning · sft-data