Sphere-AI-Lab/orbit

by Sphere-AI-Lab · RLHF & Preference · updated 14d ago

Stable and Efficient Reinforcement Learning for Trillion-Parameter LLMs

Momentum: 64 · Stars: 137 · Forks: 8 · Rank: #35
Tags: cuda · low-precision · peft · reinforcement-learning · transformers