radixark/miles
by radixark · RLHF & Preference · updated today
Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime.
momentum 68 · stars 1,550 · forks 254 · rank #26