The Fine-Tuning Index / RLHF & Preference / #137

Jerry-XDL/AIDoctor

by Jerry-XDL · RLHF & Preference · updated 1y ago

AIDoctor: training a medical GPT model with the ChatGPT training pipeline, including implementations of Pretraining, Supervised Finetuning, RLHF (Reward Modeling and Reinforcement Learning) and DPO (Direct Preferenc…

188 stars · rank #137
