Group Relative Policy Optimization (GRPO)

31 Oct 2024

🚧 Work in progress…

This article will cover Group Relative Policy Optimization (GRPO), a policy optimization method for fine-tuning language models with reinforcement learning against reward or human-preference signals.

Topics to cover:

  • Motivation: improving upon PPO and DPO
  • GRPO algorithm and key innovations
  • Group-based relative preference modeling
  • Empirical results and comparisons
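
As a preview of the group-relative idea in the topics above, here is a minimal sketch of how a group of sampled responses to one prompt could be scored against each other instead of against a learned value function. The function name `group_relative_advantages`, the `eps` constant, and the use of the population standard deviation are assumptions made for illustration, not details taken from this article or from any particular library.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each response relative to the other responses in its group.

    `rewards` holds one scalar reward per sampled response to the SAME prompt.
    Each reward is centered on the group mean and scaled by the group standard
    deviation. Population std and `eps` are illustrative assumptions.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every response gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled responses to a single prompt
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Responses that beat the group average receive a positive advantage and the rest a negative one, so the group itself serves as the baseline. When all rewards in a group are identical, every advantage is zero and the group contributes no learning signal.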