The Fine-Tuning Index / RLHF & Preference / #133

jackaduma/Vicuna-LoRA-RLHF-PyTorch

by jackaduma · RLHF & Preference · updated 2y ago

A full pipeline to fine-tune the Vicuna LLM with LoRA and RLHF on consumer hardware. An implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the Vicuna architecture. Basically ChatGPT, but with Vicuna.

221 stars · rank #133

chatgpt · finetune · gpt · llama · llm · lora · peft · ppo · pytorch · reward-models · rlhf · vicuna
