jackaduma/ChatGLM-LoRA-RLHF-PyTorch
by jackaduma · RLHF & Preference · updated 3y ago
A full pipeline for fine-tuning the ChatGLM LLM with LoRA and RLHF on consumer hardware. It implements RLHF (Reinforcement Learning from Human Feedback) on top of the ChatGLM architecture. In effect, ChatGPT-style training with ChatGLM as the base model.
momentum 24 · stars 138 · forks 9 · rank #146
chatglm · chatglm-6b · chatgpt · deepspeed · finetune · gpt · llama · llm · lora · peft · ppo · pytorch