OpenMOSE/RWKV-LM-RLHF

by OpenMOSE · RLHF & Preference · updated 8mo ago

Reinforcement learning toolkit for RWKV (v6, v7, ARWKV): distillation, SFT, RLHF (DPO, ORPO), infinite-context training, and alignment. Explores the possibilities for deeper fine-tuning of RWKV.

Rank: #163
