Jerry-XDL/AIDoctor

by Jerry-XDL · RLHF & Preference · updated 1y ago

AIDoctor trains a medical GPT model with the ChatGPT training pipeline: an implementation of Pretraining, Supervised Finetuning, RLHF (Reward Modeling and Reinforcement Learning) and DPO (Direct Preferenc…

26 momentum · 188 stars · 16 forks · rank #137