DeepSeek V3 + GRM SPCT: Self-Improving AI Reward Models

May 16, 2025 at 11:00 AM

I recently gave a talk on DeepSeek V3 training improvements and the fascinating ideas behind GRM and SPCT. The talk took a while to get posted, so here it is! Be sure to check out the blog post as well for a bit more on GRM and SPCT.

#ai #deepseek
