The Fine-Tuning Index / RLHF & Preference / #171

dobriban/Principles-of-AI-LLMs

by dobriban · RLHF & Preference · updated 12mo ago

Materials for the course Principles of AI: LLMs at UPenn (Stat 9911, Spring 2025). Topics include LLM architectures, training paradigms (pre- and post-training, alignment), test-time computation, reasoning, safety and robustness (jailbreaking, oversight, uncertainty), representations, and interpretability (circuits).

Rank: #171

Tags: ai, aisafety, alignment, circuits, education, fine-tuning, hallucination, inference, interpretability, jailbreaking, llms, rlhf