wxhcore/bumblecore

by wxhcore · RLHF & Preference · updated 1mo ago

An LLM training framework built from the ground up, featuring a custom BumbleBee architecture and end-to-end support for multiple open-source models across Pretraining → SFT → RLHF/DPO.

Rank: #63

Tags: ai, deepseek, fine-tuning, generative-ai, gpt, instruction-tuning, llm, lora, nlp, peft, pretrain, qwen
