Joyce94/LLM-RLHF-Tuning
by Joyce94 · RLHF & Preference · updated 2y ago
LLM Tuning with PEFT (SFT+RM+PPO+DPO with LoRA)
Momentum: 30 · Stars: 451 · Forks: 24 · Rank: #117
Tags: fine-tuning, language-model, llama, llm, lora, peft, ppo, reinforcement-learning, rlhf