hlfshell
DeepSeek V3 + GRM SPCT: Self-Improving AI Reward Models

I recently gave a talk on DeepSeek V3 training improvements and the fascinating ideas behind GRM and SPCT. The talk took a while to get posted, so here it is! Be sure to check out the blog post as well for a bit more on GRM and SPCT.

#ai #deepseek