hscspring/rl-llm-nlp

by hscspring · RLHF & Preference · updated 1mo ago

A curated, opinionated index of post-R1 work on LLMs and reinforcement learning. Deep-dive blog posts are cross-linked to papers. Topics covered: GRPO, DAPO, DPO, PPO, RLHF, GSPO, CISPO, VAPO, reward modeling, MoE RL stability, verifier-free RL, training-free RL, agentic RL, and DeepSeek-R1 reproduction.

Rank: #68

Tags: agentic-rl, alignment, awesome, awesome-list, curated-list, deepseek-r1, dpo, grpo, llm, llm-reasoning, llm-training, moe
