The Fine-Tuning Index / RLHF & Preference / #146

jackaduma/ChatGLM-LoRA-RLHF-PyTorch

by jackaduma · RLHF & Preference · updated 3y ago

A full pipeline to fine-tune the ChatGLM LLM with LoRA and RLHF on consumer hardware. Implements RLHF (Reinforcement Learning from Human Feedback) on top of the ChatGLM architecture. Basically ChatGPT, but built on ChatGLM.
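The description names two ingredients: LoRA adapters, which keep training cheap enough for consumer hardware, and an RLHF stage (the `ppo` tag suggests PPO). The following is a minimal, framework-free sketch of the standard formulations, not code from the repository. The function names `lora_forward`, `ppo_clip_loss`, and `kl_penalized_reward` are hypothetical, and the KL-penalty reward shaping is an assumption based on the common InstructGPT-style setup.

```python
import math


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, alpha, r):
    """LoRA forward pass: y = W x + (alpha / r) * B (A x).

    W is the frozen d_out x d_in weight. A (r x d_in) and B (d_out x r)
    are the only trainable parameters, so training touches
    r * (d_in + d_out) values instead of d_out * d_in.
    """
    base = matvec(W, x)                  # frozen pretrained path
    delta = matvec(B, matvec(A, x))      # low-rank update path
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]


def kl_penalized_reward(rm_score, logp_policy, logp_ref, beta=0.1):
    """Reward-model score minus a KL penalty against the frozen reference model.

    The penalty uses the summed per-token log-ratio log pi(y|x) - log pi_ref(y|x)
    over the generated response (assumed InstructGPT-style shaping).
    """
    kl = sum(lp - lr for lp, lr in zip(logp_policy, logp_ref))
    return rm_score - beta * kl


def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """PPO clipped surrogate objective, negated so it can be minimized."""
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)                         # pi_new / pi_old
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)   # clip to [1-eps, 1+eps]
        total += min(ratio * adv, clipped * adv)          # pessimistic bound
    return -total / len(advantages)
```

LoRA typically initializes `B` to zero, so the adapted layer starts out identical to the frozen model: with `W` the 2×2 identity, `B = [[0.0], [0.0]]`, and `x = [2.0, 3.0]`, `lora_forward` returns `[2.0, 3.0]`. The clipping in `ppo_clip_loss` stops a single update from pushing the policy too far from the one that generated the samples, while the KL term keeps it close to the original ChatGLM model.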

Momentum · 138 stars · rank #146

chatglm · chatglm-6b · chatgpt · deepspeed · finetune · gpt · llama · llm · lora · peft · ppo · pytorch


