bobxwu/learning-from-rewards-llm-papers

by bobxwu · RLHF & Preference · updated 1y ago

A comprehensive collection of papers on learning from rewards in the post-training and test-time scaling of LLMs, covering both reward models and learning strategies across the training, inference, and post-inference stages.

Rank: #161

guided-decoding, large-language-models, llm, llms, post-training, reinforcement-learning, reward-learning, reward-model, reward-modeling, reward-models, self-correction, test-time-scaling
