hlfshell
Maker. Roboticist. Person.
Keith Chester

LATEST ARTICLE:

GRPO in DeepSeek-R1


Lately I'm thinking about...

Cursor + Other AI Tools

I’ve long since embraced AI integrations into my IDE, and I’ve experimented with several of the new app builders. I’m going to avoid commenting on the “vibe coding” meme that’s going around, but I will say I’ve enjoyed some of the efficiencies gained. A few months ago, at the suggestion of several peers, I swapped from VS Code w/ Copilot to Cursor. It took some workflow adjustment to make Cursor a viable tool worthy of its subscription cost, but once certain hurdles were overcome I found it certainly accelerated me. The release of Claude 3.7 Sonnet came with a promise of drastically increased performance on code, which I found to be… not true. In fact, I noticed a significant decrease in performance…

…until I finally upgraded Cursor to its current version. The new agentic approach powered by Claude 3.7 Sonnet has far outperformed my expectations. It is much better with large contexts, at tying together multiple systems, and at behaving in predictable and desirable ways.

I’ve also experimented with Bolt, v0, Lovable, and Replit. While I’ve snagged a few front-end pieces from each, I’ve yet to fully embrace the “one-shot” builders. I’m hesitant to say they’re not ready, though, as my experiments with them have admittedly been small ones - not a full-fledged dive in which I let my workflow warp to the tool rather than the other way around.

There seem to be even more AI tools I’ve yet to experiment with, each with a workflow different from what I’m used to. I hope to dive further into these sometime in the near future. To that end, I will be pausing my Cursor subscription this month; at that point I will embrace Windsurf for a short while and see if I enjoy increased velocity or novel capabilities. I am also very much looking forward to the Replit-sponsored SDx hackathon later this month; I want to see what I’m doing wrong in my workflow that keeps me from fully enjoying the tool.

#AI Tools #Cursor
DeepSeek + Inference-Time Scaling and Generalist Reward Modeling

DeepSeek released another cool paper expanding on reinforcement learning for LLM alignment. Building off their prior work (which I talk about here), they introduce two new methods.

The first is Rejective Fine-Tuning (RFT): they have a pre-trained model produce N responses. The collected responses are then combined into a prompt in which the model is instructed to produce principles for evaluating the responses, critiques of each response, and a reward score for each based on the generated principles. The process utilizes a human judge to critique the model’s critiques, eventually teaching it to produce these evaluations without human feedback.

The goal is to move the reward function from a simple scalar value to a language-derived scalar value, which improves performance when training on domains that don’t translate well to automatic correctness checks. This is why reasoning models have, to this point, generally focused on mathematics, coding, and similar domains - there are easy, clear ground-truth results to evaluate against. In more subjective domains (ethics, open-ended questions, etc.), a scalar value is not easily derived without human intervention.
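To make that concrete, here’s a minimal sketch of the generate-principles-then-critique loop as I understand it - the prompt wording, score format, and the `generate` helper are my own stand-ins, not the paper’s:

```python
# Sketch of RFT-style self-evaluation: the model drafts principles, critiques
# each candidate response against them, and emits scalar scores we can parse.
# `generate(prompt) -> str` stands in for any LLM completion call.

def judge_responses(question: str, responses: list[str], generate) -> list[float]:
    numbered = "\n\n".join(
        f"Response {i + 1}:\n{r}" for i, r in enumerate(responses)
    )
    prompt = (
        f"Question:\n{question}\n\n{numbered}\n\n"
        "First, write a short list of principles for judging answers to this "
        "question. Then critique each response against those principles. "
        "Finally, output one line per response of the form 'SCORE i: x', "
        "where x is a score from 1 to 10."
    )
    evaluation = generate(prompt)

    # Pull the scalar rewards back out of the language-based evaluation.
    scores = [0.0] * len(responses)
    for line in evaluation.splitlines():
        if line.strip().upper().startswith("SCORE"):
            try:
                head, value = line.split(":", 1)
                scores[int(head.split()[1]) - 1] = float(value)
            except (ValueError, IndexError):
                continue  # malformed score line; keep the default
    return scores
```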

Once the model is proficient at generating these evaluations, the second method - Self-Principled Critique Tuning (SPCT) - is introduced. This method uses reinforcement learning to adaptively improve the model’s ability to generate critiques and principles without human intervention. A feedback loop now begins, wherein the model generates its own evaluation criteria, critiques, and responses, assigns scores, and then receives rewards based on how well it ranks the responses against known human preferences. GRPO is still used for policy optimization, but the model learns without direct human ratings or real-time human raters.
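The reward signal in that loop might look something like the following - this is my guess at its shape (ranking agreement against human preference data), not the paper’s exact rule:

```python
# Hypothetical SPCT-style reward: the model's scores over a group of
# responses are rewarded by how closely their ordering matches a known
# human preference ranking.

def ranking_reward(model_scores: list[float], human_ranking: list[int]) -> float:
    # human_ranking[i] is the index of the i-th best response per humans.
    model_ranking = sorted(
        range(len(model_scores)), key=lambda i: model_scores[i], reverse=True
    )
    # Fraction of positions where the model's ordering agrees with humans'.
    matches = sum(1 for m, h in zip(model_ranking, human_ranking) if m == h)
    return matches / len(human_ranking)
```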

Finally, they introduce inference-time sampling and voting to further enhance the robustness of the reward model at inference by sampling multiple outputs and aggregating them through a voting mechanism. Essentially: have a language model generate multiple responses, have the trained reward model judge each response, then choose the highest-rated one. It feels like a great improvement on self-consistency mechanisms.
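In sketch form (reusing the hypothetical judge_responses from above; the aggregation details are my simplification):

```python
# Inference-time sampling + voting, simplified: run several independent
# judging passes over the candidates, sum the scores, pick the winner.

def best_response(question, responses, generate, judge, k=8):
    totals = [0.0] * len(responses)
    for _ in range(k):  # k independent judging passes act as "votes"
        for i, score in enumerate(judge(question, responses, generate)):
            totals[i] += score
    return responses[max(range(len(responses)), key=lambda i: totals[i])]
```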

#AI #deepseek
Interview Practice App

One of the great parts of building out tools like arkaine is that it allows me to sit down and just build for a bit to test the framework. After some quick experimentation I have a great agent that:

  1. Takes in a resume and a job description

  2. Considers additional topics to research to build out a knowledge base of what certain acronyms, skills, and technologies mean, and what the industry standards and expectations are

  3. Searches the web for those topics with arkaine’s research agents, building a knowledge base of relevant information

  4. Builds a list of questions that an interviewer should ask and…

  5. Converts the questions to natural-sounding language and uses TTS to create a virtual interviewer. (This part is… it works. But sometimes it’s too heavy on the pausing and filler words)

So far it’s working surprisingly well for a simple prototype script. I’ll probably expand on it quite a bit, especially since I haven’t yet utilized speech-to-text to let the user answer (with an agent judging their response). In broad strokes, the flow looks like the sketch below.
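Here’s a hypothetical outline of the pipeline condensed into plain prompt calls - `generate` and `web_search` are stand-ins for any LLM and search tool, and none of this is arkaine’s actual API:

```python
# Hypothetical outline of the interview-practice flow. `generate(prompt)`
# and `web_search(query)` stand in for an LLM call and a search tool.

def interview_prep(resume, job_description, generate, web_search):
    # Steps 1-2: identify acronyms, skills, and technologies worth researching.
    topics = generate(
        "List acronyms, skills, and technologies worth researching for this "
        f"role, one per line.\n\nResume:\n{resume}\n\nJob:\n{job_description}"
    ).splitlines()

    # Step 3: build a knowledge base of what each topic means in industry terms.
    knowledge = "\n\n".join(
        f"{t}:\n{web_search(t + ' industry standards and expectations')}"
        for t in topics if t.strip()
    )

    # Step 4: draft interview questions grounded in the knowledge base.
    questions = generate(
        "Using the notes below, write interview questions this candidate "
        f"should practice, one per line.\n\nNotes:\n{knowledge}\n\n"
        f"Resume:\n{resume}"
    ).splitlines()

    # Step 5 (rephrasing to natural speech + TTS) would follow from here.
    return [q for q in questions if q.strip()]
```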

Here are some examples (the TTS is handled by OpenAI’s gpt-4o-mini-tts):

#AI #arkaine
arkaine 0.0.21; next steps

Version 0.0.21 of arkaine is out, including the finalized format for the text-to-speech tooling, equivalent speech-to-text tooling (though, admittedly, I currently lack a locally hosted option for this), and the think tool I mentioned earlier.

There are still a lot of features I want to add, and some I’m in the middle of; adding a chat interface to Spellbook and expanding the number of possible chat interfaces would be fun, and I already started that process a month ago. Similarly, I have a ~60% completed OCR implementation and a redefinition of how researchers utilize resources (necessary for handling non-website object search, like your own books/documents)… but right now I’m thinking of taking a moment to just build with what I have and create something useful for people as-is.

#AI #arkaine
Just give me a second to think...

I simply love when simple ideas get tested and proven to be quite effective. It’s a clear sign of slowly feeling out how best to understand the system at hand. Such a delight popped up when I saw that Anthropic had revealed that simply adding a no-op tool called “think” with a single “thought” argument - letting the agent output its thoughts within the chain of Action -> Result generation - improved performance on complicated tasks.

…of course, I’ve already implemented it in arkaine; I’ll give it a more thorough testing with some more complicated agents later.
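For reference, the tool really is nearly nothing - a minimal sketch, with the schema following Anthropic’s tool-use format and the description wording paraphrased from memory:

```python
# The "think" tool: a no-op whose only effect is letting the model write a
# thought into its own context mid-task. Schema per Anthropic's tool-use
# format; description wording is a paraphrase.

THINK_TOOL = {
    "name": "think",
    "description": (
        "Use this tool to think about something. It will not obtain new "
        "information or change anything; it just logs the thought."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "thought": {
                "type": "string",
                "description": "A thought to think about.",
            }
        },
        "required": ["thought"],
    },
}

def handle_think(thought: str) -> str:
    # Deliberately a no-op: the value is the thought itself appearing in
    # the agent's Action -> Result chain.
    return ""
```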

#AI #arkaine
arkaine docs

My framework arkaine, which I quickly presented a bit ago, finally has some nice documentation. I had v0 do an initial pass on it, which I rather liked. After two quick rounds of prompting on their free tier, I downloaded the project and tried my hand at expanding it from there. It’s my first Tailwind/Next.js project, but it was surprisingly easy. Granted, it’s a simple page relative to a typical SPA or backend service, but hey, I’ll take the wins where I can get them.

Check out the documentation, especially the toolbox, and see if you can get inspired to build anything cool with arkaine.

#arkaine #AI
BitNet b1.58 Reloaded

The paper club I host will be covering BitNet b1.58 Reloaded: State-of-the-art Performance Also on Smaller Networks! I’ve been looking forward to this one since I read the original paper on 1.58-bit nets. Join us and learn about the future of ternary weights and LLMs!

#AI
Mathematica

I picked up David Bessis' Mathematica on a whim. It focuses on what math truly is to those accomplished in it, and how the public’s understanding of what mathematicians do is wildly, grossly inaccurate. The book’s premise: language is a poor medium for transmitting intuition itself, while math and logical proofs are overkill, but required, to express it. Mathematics, he argues, is the art of intuition and imagination - not calculation - doing its damndest to define that intuition. But since intuition lies so far beyond language, we have to invent new concepts and symbols with which to grasp and communicate it.

I enjoyed it, and have certainly spent time thinking on its lessons, but I wish it had delved more into direct, hands-on walkthroughs of intuition to further illustrate the separation of language and logic, or offered more solid advice on directly attacking the problem of growing one’s intuition.

#books #math

GRPO in DeepSeek-R1

Liquid Time Constant Neural Networks

Last night Adam gave a great presentation at the SDx paper club. The idea of using ODE solvers as an activation function was 🤯. The technique is heavily used in robotics, so I’ll likely be doing a deep dive at some point; specifically, building a neuron that uses the paper’s techniques to better understand the inner workings.
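As a starting point for that deep dive, here’s a toy Euler-integrated neuron following the liquid time-constant form dx/dt = -[1/τ + f(x, I)]·x + f(x, I)·A. I’ve swapped the paper’s small neural network for a sigmoid gate to keep f positive and the dynamics stable, and all constants are arbitrary illustration values:

```python
import math

# Toy liquid time-constant neuron: the gate f depends on both the state x
# and the input I, so the effective time constant changes with the input.

def ltc_step(x, I, dt=0.01, tau=1.0, A=1.0, w=1.0, u=1.0, b=0.0):
    f = 1.0 / (1.0 + math.exp(-(w * x + u * I + b)))  # input-dependent gate
    dx = -(1.0 / tau + f) * x + f * A                 # LTC dynamics
    return x + dt * dx

# Drive the neuron with a constant input and watch the state settle toward
# the input-dependent equilibrium f*A / (1/tau + f).
x = 0.0
for _ in range(2000):
    x = ltc_step(x, I=0.5)
print(round(x, 3))
```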

#AI #math
DeepSeek R1

A few weeks ago I gave a talk at an SDx paper club covering the DeepSeek-R1 paper. I talked in depth about the advancements made and the implications of their success with GRPO (group relative policy optimization) powered reinforcement learning.
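The core trick - group-relative advantages - is compact enough to show in a few lines. A minimal illustration of the advantage computation only, not DeepSeek’s training loop:

```python
# GRPO advantage: sample a group of responses per prompt, score them, and
# normalize each reward against the group's mean and standard deviation.
# No learned value network is needed.

def grpo_advantages(rewards: list[float]) -> list[float]:
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5
    return [(r - mean) / (std + 1e-8) for r in rewards]

# e.g. a group of 5 responses where three earned the correctness reward:
print(grpo_advantages([1.0, 0.0, 0.0, 1.0, 1.0]))
```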

The recording at the event got borked, so I re-recorded it the next day. Enjoy!

#AI
eli5-equations

I’ve been working on arkaine’s OCR service all weekend and need a break. I’ve been toying with the idea of an equation explainer that copies the style in which I present complicated math during my paper club presentations. So I’ve decided to step away from working on arkaine and instead try using it a bit in a prototype. Hence: eli5-equations.

Want a walkthrough of a complicated equation? Pass it in with some context and see if your evening is a bit enlightened. I’ll probably do a further write-up on this later.

#arkaine #AI #math
Mini hack-a-thon

Today I attended a mini-hackathon via SDx. I went to solo-work on some arkaine agents and to serve in a mentor/advisory role for other attendees. It was a short six-hour affair, mainly focused on playing with the new OpenAI o3-mini. It also helps to be inspired by seeing other people creatively apply AI to a quick weekend project.

I ended up building a great prototype of a research agent - the original goal of arkaine for myself. It needs some work - I definitely ran into rate-limiting issues and need to get the agent to better understand report generation at the end. Expect this to get added into arkaine soon. Pushing myself to finish the project in the time allotted was also a great exercise in rapid prototyping. As for the other projects - there were quite a few that wowed me. I’m certainly looking forward to the next time I can dive in and code surrounded by other makers.

#arkaine #AI #SDx
Increased creativity by thinking longer

Here’s an ingenious set of hacks to cheaply modify the behavior of existing LLMs so they reason better. Most notable was detecting the initial use of the </think> tag and replacing it with a second-guessing term (the best performing was “Wait”). This forces the model to think longer, which in turn significantly improves performance on tasks.
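A rough sketch of the trick as I understand it - the decoding-level details below are my own assumptions about how one might wire it up:

```python
# Suppress the first end-of-thinking tag and substitute a second-guessing
# word, forcing the model to keep reasoning. `generate_stream` stands in
# for any streaming decode loop yielding text chunks; this assumes the tag
# arrives within a single chunk.

def force_longer_thinking(generate_stream, prompt, max_overrides=1):
    output, overrides = "", 0
    for chunk in generate_stream(prompt):
        if "</think>" in chunk and overrides < max_overrides:
            chunk = chunk.replace("</think>", "Wait")  # second-guess instead
            overrides += 1
        output += chunk
    return output
```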

I’ll likely be doing a deeper dive for my upcoming paper club presentation.

#AI
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

We’re kicking off 2025’s paper club series via SDx again on February 18th @ 6:30 pm. I’ll be presenting DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. Join in if you’re in the area and want to deep-dive some of the recent cutting-edge discoveries.

#AI #Meetup
I'm afraid I can't do that, Dave...

I found myself looking into the effects of censorship removal on LLMs - particularly the recent popular kid on the block, DeepSeek R1. It seems the model becomes uncooperative on certain topics that don’t align with party doctrine. I came across a generic refusal-removal repository, linked here, which made me chuckle - it’s just control vectors fine-tuned into the model, which I discussed here.
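As a refresher, applying a control vector is just nudging a layer’s hidden states along a fixed direction at inference time. A minimal PyTorch sketch under my own assumptions (the layer choice, scale, and random vector are illustrative; real refusal removal derives the direction from contrasting prompt pairs, and the linked repo bakes it into the weights):

```python
import torch

# Steer a model by shifting one layer's activations along a "refusal"
# direction. A negative scale pushes activations away from refusal behavior.

hidden_dim = 768
refusal_direction = torch.randn(hidden_dim)  # stand-in for a derived vector
alpha = -1.5

def steering_hook(module, inputs, output):
    # Returning a value from a forward hook replaces the module's output.
    return output + alpha * refusal_direction

# Usage (hypothetical layer path; pick a block whose output is a plain tensor):
# handle = model.transformer.h[10].mlp.register_forward_hook(steering_hook)
```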

#AI
(Rapidly) introducing arkaine

I recently gave (an unfortunately rushed) talk about arkaine - a maker-focused agentic AI framework I’ve been spending most of my time building. Slides for the talk are here.

#AI

Diffusion Models Are Real-Time Game Engines

Google DeepMind's Grandmaster-Level Chess Without Search

Representation Engineering and Control Vectors - Neuroscience for LLMs

Nerd Sniped - Solving for Jumbles and Letter Boxed

Utilizing LLMs as a Task Planning Agent for Robotics

A Corollary to Conway's Law - Build for The Team You Have

Repeatable Dev Environments for ROS2