radixark/miles

by radixark · RLHF & Preference · updated today

Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime.

Momentum: 1,550 stars · 254 forks · rank #26
