TYH-labs/unsloth-buddy

by TYH-labs · RLHF & Preference · updated 1mo ago

Zero-friction LLM fine-tuning skill for Claude Code, Gemini CLI, and any ACP agent. Uses Unsloth on NVIDIA and TRL with MPS/MLX on Apple Silicon. Automates environment setup, LoRA training (SFT, DPO, GRPO, vision), post-hoc GRPO log diagnostics, evaluation, and export end-to-end. Part of the Gaslamp AI platform.

250 stars · rank #45

apple-silicon · claude-code · dpo · fine-tuning · gaslamp · grpo · huggingface · lora · qlora · rlhf · sft · transformer
