radixark/miles

by radixark · RLHF & Preference · updated today

Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime.

momentum 68 · 1,550 stars · 254 forks · rank #26