
bobxwu/learning-from-rewards-llm-papers

by bobxwu · RLHF & Preference · updated 1y ago

A comprehensive collection of papers on learning from rewards in the post-training and test-time scaling of LLMs, covering both reward models and learning strategies across the training, inference, and post-inference stages.

21 momentum · 71 stars · 3 forks · rank #161
guided-decoding · large-language-models · llm · llms · post-training · reinforcement-learning · reward-learning · reward-model · reward-modeling · reward-models · self-correction · test-time-scaling