iBacklight/PipelineLLM

by iBacklight · RLHF & Preference · updated 4mo ago

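PipelineLLM is a systematic learning project on large language model (LLM) post-training, covering the full technical stack from supervised fine-tuning (SFT) through preference optimization (DPO) and reinforcement learning (RLHF/PPO/GRPO) to continual learning.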

momentum 23 · stars 31 · forks 3 · rank #151
continual-learning · fine-tuning · llm-infrastructure · llm-processing · llm-reasoning · lora · post-training · preference-optimization · reinforcement-learning · rlhf · sft