For the past few months I’ve been working for the ARC Prize - a non-profit organization built around the Abstraction and Reasoning Corpus - a set of reasoning games that humans can quickly and efficiently figure out, play, and solve, but that are near impossible for even the most cutting-edge models. [Paper]
…and today we officially launched the beta of our toolkit and three games for the public to try out!
For ARC-AGI-3, we’ve prepped over 150 games to test AI. I’ve been building out tooling (most notably the benchmarking agent tools) to make it dead simple to test models and research various agentic architectures.
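To give a flavor of the kind of agent-vs-game loop this tooling benchmarks, here is a minimal sketch. Everything in it (`ToyGame`, `RandomAgent`, `run_episode`) is a hypothetical stand-in for illustration, not the real ARC-AGI-3 toolkit or API.

```python
import random

class ToyGame:
    """A stand-in 'game': reach the end of a short 1-D track."""
    def __init__(self, length=5):
        self.length = length
        self.pos = 0

    def observe(self):
        # The agent only ever sees the current state, not the rules.
        return self.pos

    def step(self, action):
        # Actions: +1 (right) or -1 (left), clamped to the track.
        self.pos = max(0, min(self.length, self.pos + action))
        return self.pos >= self.length  # True once the game is solved

class RandomAgent:
    """Baseline agent: acts at random, useful as a benchmark floor."""
    def act(self, observation):
        return random.choice([-1, +1])

def run_episode(game, agent, max_steps=100):
    """Run one agent/game episode; return the step count on success."""
    for step in range(1, max_steps + 1):
        if game.step(agent.act(game.observe())):
            return step
    return None  # agent failed within the step budget

steps = run_episode(ToyGame(), RandomAgent())
print(steps)
```

The benchmarking idea is the same at any scale: swap `RandomAgent` for an LLM-backed agent and compare how many actions (if any) each one needs to solve the same game.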
Using this tooling (and I have more to release soon!) I’ve also been researching how models reason about these games. I’m trying to develop new architectures and techniques that maximize model performance, and to zero in on how these models reason internally and understand uniquely abstract state spaces. This is actually quite difficult; the games are deceptively simple - outright easy for humans - but even so-called superhuman LLMs and AI products can’t solve a single one of them.
Check it out and feel free to reach out to me to chat about it.
#ARC #AI #agents