
zht8506/Easy-LLM-Post-Training

by zht8506 · RLHF & Preference · updated 2d ago

Implementations of popular LLM post-training algorithms (SFT, DFT, DPO, GRPO, etc.) in PyTorch, with easy-to-follow code.

117 stars · rank #48
