dobriban/Principles-of-AI-LLMs

by dobriban · RLHF & Preference · updated 12mo ago

Materials for the course Principles of AI: LLMs at UPenn (Stat 9911, Spring 2025). Topics include LLM architectures, training paradigms (pre- and post-training, alignment), test-time computation, reasoning, safety and robustness (jailbreaking, oversight, uncertainty), representations, and interpretability (circuits).

Momentum: 19 · Stars: 46 · Forks: 4 · Rank: #171
Tags: ai · aisafety · alignment · circuits · education · fine-tuning · hallucination · inference · interpretability · jailbreaking · llms · rlhf