hscspring/rl-llm-nlp

by hscspring · RLHF & Preference · updated 1mo ago

Curated, opinionated index of post-R1 reinforcement learning for LLMs. Deep-dive blog posts are cross-linked to the papers they discuss, covering GRPO, DAPO, DPO, PPO, RLHF, GSPO, CISPO, VAPO, reward modeling, MoE RL stability, verifier-free RL, training-free RL, agentic RL, and DeepSeek-R1 reproduction.

momentum 44 · stars 68 · forks 5 · rank #68
agentic-rl · alignment · awesome · awesome-list · curated-list · deepseek-r1 · dpo · grpo · llm · llm-reasoning · llm-training · moe