NJUxlj/Travel-Agent-based-on-Qwen2-RLHF
by NJUxlj · RLHF & Preference · updated 1mo ago
A travel agent based on Qwen2.5, fine-tuned with SFT + DPO/PPO/GRPO on a travel question-answer dataset. The model's response can be rendered as a mind map. A RAG system is built on top of the tuned Qwen2 model, using prompt templates, tool use, a Chroma embedding database, and LangChain.
Momentum: 43 · Stars: 79 · Forks: 6 · Rank: #74
Tags: agent · dpo · grpo · langchain · lora · ppo · qwen2 · rag · rlhf · tool-use