Sphere-AI-Lab/orbit
by Sphere-AI-Lab · RLHF & Preference · updated 14d ago
Stable and Efficient Reinforcement Learning for Trillion-Parameter LLMs
64 momentum · 137 stars · 8 forks · #35 rank
cuda · low-precision · peft · reinforcement-learning · transformers