hlfshell
Maker. Roboticist. Person.
Keith Chester

LATEST ARTICLE:

GRPO in DeepSeek-R1

Lately I'm thinking about...

go-arkaine-parser

Back when I was working on coppermind during the heyday of GPT-3.5’s initial world-shattering release, I had… difficulty finding good ways to parse the stochastic outputs of LLMs.

I got better at this when I started work on arkaine, eventually developing a pretty useful and reliable parsing pattern.

With an idea in mind that would be best served as a golang app interacting with LLMs, I decided to do a quick port of the parser to an idiomatic golang module.

So if you need AI parsing for your golang project, check out go-arkaine-parser.
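To give a flavor of the pattern - a hedged sketch in Python (arkaine’s home language) rather than the Go module’s actual API - the idea is to scan leniently for the labels you expect and surface what’s missing instead of crashing:

```python
import re

# Hedged sketch of the general labeled-output pattern (not the actual
# go-arkaine-parser API): tolerate casing, markdown bold, and stray
# whitespace, and report missing labels instead of raising.
def parse_labeled_output(text: str, labels: list[str]) -> tuple[dict[str, str], list[str]]:
    label_alt = "|".join(re.escape(l) for l in labels)
    results: dict[str, str] = {}
    errors: list[str] = []
    for label in labels:
        # Capture everything after "Label:" until the next label or the end.
        pattern = re.compile(
            rf"^\**{re.escape(label)}\**\s*:\s*(.*?)(?=^\**(?:{label_alt})\**\s*:|\Z)",
            re.IGNORECASE | re.MULTILINE | re.DOTALL,
        )
        match = pattern.search(text)
        if match:
            results[label.lower()] = match.group(1).strip()
        else:
            errors.append(f"missing expected label: {label}")
    return results, errors

output, errors = parse_labeled_output(
    "Thought: I should search.\n**Action**: web_search\nAction Input: golang parsers",
    ["Thought", "Action", "Action Input"],
)
# output -> {"thought": "...", "action": "web_search", "action input": "golang parsers"}
```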

#arkaine #golang #AI
hiredcoach

I have been busy due to a flood of inspiration to work on projects. Pleasant, but also exhausting.

The only update to arkaine has been a fix for an issue with default arguments in the ParallelList flow tool. The reason for the freeze in development is an accidental startup.

How does one go “oops, I made a startup”? Basically, I decided to quickly develop a series of simple apps that utilized arkaine and would demonstrate its usage, then combine them into a large post walking through the examples.

You saw a prototype of the very first one in an earlier post, wherein I made an interview practice question generator. It was a simple script, but the results struck me as promising. Why not build a nice site around it so that other people can use it? And so started hiredcoach.

I really liked what I started to see, and realized that this could actually be useful to people if offered as a service. So that’s what I’m aiming to do - try to build out a startup from it as quickly as possible. I’m at ~12 days of development effort. I had hoped to get it out in 2 weeks - 3 or 4 still seems a possibility.

I’ll post here when I launch, along with my initial thoughts. I temper my anxiety about launching a startup with the thought that, worst case, failure still leaves a nice line item on my resume.

#hiredcoach #startup #AI #arkaine
reflecta

Given my recent weekend foray into Replit, I wanted to try a simple solo project to “kick the tires” and test drive Replit a bit more.

It took about 2 hours from idea conception to deployed product with integrated DeepSeek AI - not too bad!

reflecta is a simple journal prompting app (one that tries to avoid overly cheesy prompts) that lets you specify categories to focus on and provides a prompt. The UI is designed to be as simple and calming as possible. As an added feature, there’s integrated chat with DeepSeek’s R1 to reword, retheme, or discuss the prompt.
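The chat piece, under the hood, is just an API call. A minimal sketch of talking to R1 through DeepSeek’s OpenAI-compatible endpoint (the prompts and key handling here are illustrative, not reflecta’s actual code):

```python
from openai import OpenAI

# DeepSeek exposes an OpenAI-compatible API; "deepseek-reasoner" is R1.
client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",  # placeholder, obviously
    base_url="https://api.deepseek.com",
)

def reword_prompt(journal_prompt: str, request: str) -> str:
    # e.g. request = "make this prompt about gratitude instead"
    response = client.chat.completions.create(
        model="deepseek-reasoner",
        messages=[
            {"role": "system", "content": "You help refine calm, non-cheesy journal prompts."},
            {"role": "user", "content": f"Prompt: {journal_prompt}\n\n{request}"},
        ],
    )
    return response.choices[0].message.content
```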

It’s always refreshing to quickly pump out small projects like this. Hopefully some people find it helpful!

I’ll be playing with Replit on a few other “mini-app” ideas to see if any of them get into a releasable state.

#reflecta #AI #replit
SDx Replit Hackathon

This past weekend I participated in the Replit Hackathon held by SDx. It was a fun event.

It may seem odd to try my hand at vibe coding, given the weird hype and counterculture built around it, and the fact that I am a skilled and experienced developer. But I wanted to explore it, see what the tools and workflow were like, and learn what is possible outside my traditional approach to problems.

To that end my three-person team worked on a fascinating concept. We called it adict (demonstrating that naming still remains one of the hardest aspects of some projects): an app that used profiles of simulacra of people with varying backgrounds - gender, age, beliefs, region, job, income, education level, etc. - and tasked AI with determining the mental and emotional response of that individual to a given ad. Yes, there’s lots of room for error there, but it was a 4-hour hackathon, so bear with me.

I used arkaine to build out the agent that would consume the ad and a profile, then utilized Gemini 2.5 Pro to develop the response. It worked pretty well!
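Setting aside arkaine’s wiring (which I won’t reproduce here), the heart of the agent is a persona-conditioned prompt. A hypothetical sketch of its shape:

```python
from dataclasses import dataclass

# Illustrative only - not the hackathon code. This just shows the shape of
# the persona-conditioned prompt the model was asked to respond to.
@dataclass
class Profile:
    age: int
    gender: str
    region: str
    job: str
    income: str
    education: str
    beliefs: str

def persona_prompt(profile: Profile, ad_text: str) -> str:
    return (
        f"You are a {profile.age}-year-old {profile.gender} from {profile.region}, "
        f"working as a {profile.job} with {profile.income} income, "
        f"{profile.education} education, and these beliefs: {profile.beliefs}.\n\n"
        "Read the following ad and describe, in first person, your honest "
        f"mental and emotional reaction to it:\n\n{ad_text}"
    )
```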

We did, however, have significant issues tying the backend to the frontend through Replit. A constant cycle of frontend errors that ultimately should have been dealt with manually ate too much of our time and resulted in a poor final product. But that’s ok - we learned a lot!

Replit seems to be awesome at getting a quick outline and mockup done, but then struggles with more complex tasks. I’ve experimented with different workflows for it, but it often seems circuitous to get certain bugs fixed. My own intuition on where the tool would struggle - edge cases that were not predictable or common for the unique piece of software you’re making - proved accurate.

That being said, I do like Replit, if for nothing else than quick prototypes and mockups. I think people underestimate how powerful that can be. Hell, I think most people underestimate how simple most software solutions can (and should) be. Bespoke custom software isn’t a bad thing; it’s where everything we use now grew from. Maybe we’ll see a resurgence of in-house tech solutions without the expensive overhead and the billion subscriptions SaaS has devolved into.

I expect a fair amount of growth from Replit’s tools and I’m excited to keep an eye on it. I’m not as bullish as the hype cycle hucksters are, but I do think it’s a promising tool.

#vibe coding #sdx #replit
Resistance with Data Preservation

I am a reserved and ultimately shy individual, dealing with my own life in a way that best suits me. Recently, however, I have found a way, however small, to provide my own resistance with my skillset and resources.

SciOp is a movement to back up government datasets and sites as they are targeted and removed for not toeing the party line. Since I have plenty of open space on my NAS, I’m dedicating a few terabytes to the effort.

So far I’ve backed up transgender protection and support resources, NOAA and climate change datasets, and some NIH datasets.

SciOp also provides a set of feeds that lets you automate backups of targeted and “endangered” datasets - those we know are actively being removed, not just under threat - to ensure greater success in preservation.

If you have the tech setup for it I recommend checking it out.

#politics
Cursor + Other AI Tools

I’ve long since embraced AI integrations into my IDE, and I’ve experimented with several of the new app builders. While I’m going to avoid commenting on the “vibe coding” meme that’s going on, I will say I’ve enjoyed some of the efficiencies gained. A few months ago, at the suggestion of several peers, I swapped from VS Code w/ Copilot to Cursor. It took some adjustment of workflow to make Cursor a viable tool worthy of its subscription cost, but once certain hurdles were overcome I found it certainly accelerated me. The release of Claude 3.7 Sonnet came with a promise of drastically increased performance on code, which I found to be… not true. In fact, I noticed a significant decrease in performance…

…until I finally upgraded Cursor to its current version. The new agentic approach powered by Claude 3.7 Sonnet has been significantly better, far outperforming my expectations. It is far better with large contexts, tying together multiple systems, and behaving in predictable and desirable ways.

I’ve also experimented with Bolt, v0, lovable, and Replit. While I’ve snagged a few front-end pieces from each, I’ve yet to fully embrace the “one-shot” builders. I’m hesitant to say they’re not ready, though, as my experiments with them have admittedly been small - not a full-fledged dive that lets my workflow warp to the tool rather than the other way around.

It seems there are even more AI tools I’ve yet to experiment with, each with a workflow different from what I’m used to. I hope to dive further into these in the near future. To that end I will be attempting a pause on my Cursor subscription this month; at that point I will embrace Windsurf for a short while and see if I enjoy increased velocity or novel capabilities. I am also very much looking forward to the Replit-sponsored SDx hackathon later this month; I want to see what in my workflow approach I am doing wrong to fully enjoy the tool.

#AI Tools #Cursor
DeepSeek + Inference-Time Scaling and Generalist Reward Modeling

DeepSeek released another cool paper expanding on reinforcement learning for LLM alignment. Building off of their prior work (which I talk about here), they introduce two new methods.

The first is Rejective Fine-Tuning (RFT): they have a pre-trained model produce N responses. The collected responses are then combined into a prompt wherein the model is instructed to produce principles for evaluating the responses, critiques of each response, and a reward score for each based on the generated principles. The process utilizes a human judge to critique its critiques, teaching the model to eventually produce these evaluations without human feedback.
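As I read it, the evaluation prompt has a fixed shape: principles first, then critiques, then scores. A hypothetical sketch of that structure (the paper’s exact wording differs):

```python
# Hypothetical sketch of the evaluation prompt's structure; the paper's
# actual template differs, but the ordering (principles -> critiques ->
# scores) is the important part.
def build_evaluation_prompt(query: str, responses: list[str]) -> str:
    numbered = "\n\n".join(
        f"[Response {i + 1}]\n{response}" for i, response in enumerate(responses)
    )
    return (
        f"Query:\n{query}\n\n{numbered}\n\n"
        "First, write the principles a good response to this query should "
        "satisfy. Then critique each response against those principles. "
        "Finally, assign each response a reward score based on your critiques."
    )
```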

The goal of this is to move the reward function from a simple scalar value to a language-derived scalar value. If the model is being trained on domains that don’t translate well to automatic correctness checks, this improves performance. This is why reasoning models have, to this point, generally focused on mathematics, coding, and similar domains - there are easy and clear ground-truth results to evaluate against. In more subjective domains (like ethics, open-ended questions, etc.), however, a scalar value is not easily derived without human intervention.

Once the model is proficient at generating these evaluations, the second method - Self-Principled Critique Tuning (SPCT) - is introduced. This method uses reinforcement learning to adaptively improve the model’s ability to generate critiques and principles without human intervention. A feedback loop now begins, wherein the model generates its own evaluation criteria, critiques, and responses, assigns scores, and then receives rewards based on how well it ranks the responses against known human preferences. GRPO is still used for policy optimization, but the model is learning without direct human ratings or real-time human raters.

Finally, they introduce inference-time sampling and voting to further enhance the robustness of the reward model during inference by sampling multiple outputs and aggregating them through a voting mechanism. Essentially: have a language model generate multiple responses, have the trained reward model judge each response several times, then choose the highest-rated response at inference time. It feels like a great improvement on self-consistency mechanisms.
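A minimal sketch of that voting step as I understand it (the names and interfaces here are mine, not the paper’s):

```python
import random
from collections import defaultdict
from typing import Callable

# Sketch of inference-time sampling and voting: `sample_scores` stands in for
# one stochastic pass of the generative reward model (fresh principles and
# critiques, then a score per candidate). Summing across passes is the "vote".
def vote_best(
    candidates: list[str],
    sample_scores: Callable[[list[str]], list[float]],
    k_samples: int = 8,
) -> str:
    totals: dict[str, float] = defaultdict(float)
    for _ in range(k_samples):
        for candidate, score in zip(candidates, sample_scores(candidates)):
            totals[candidate] += score
    return max(totals, key=lambda c: totals[c])

# Toy usage with a noisy fake scorer standing in for the reward model:
true_quality = {"answer A": 6.0, "answer B": 8.0, "answer C": 5.0}
best = vote_best(
    list(true_quality),
    lambda cs: [random.gauss(true_quality[c], 1.0) for c in cs],
)
```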

#AI #deepseek
Interview Practice App

One of the great parts of building out tools like arkaine is that it allows me to sit down and just build for a bit to test the framework. After some quick experimentation I have a great agent that:

  1. Takes in a resume and a job description

  2. Considers additional topics to research to build out a knowledge base of what certain acronyms, skills, and technologies mean, and what the industry standards and expectations are

  3. Searches the web for those topics with arkaine’s research agents, building a knowledge base of relevant information

  4. Builds a list of questions that an interviewer should ask and…

  5. Converts the questions to natural-sounding language and uses TTS to create a virtual interviewer; a sketch of this call is below. (This part is… it works. But sometimes it’s too heavy on the pausing and filler words)

So far it’s working surprisingly well for a simple prototype script. I’ll probably expand on it quite a bit, especially since I haven’t utilized speech-to-text to allow the user to answer (and an agent to judge their response) yet.
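For the curious, the TTS in step 5 boils down to a single API call. A sketch against OpenAI’s speech endpoint (the voice and phrasing are my own illustrative choices):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Sketch of step 5: turn one generated interview question into audio.
speech = client.audio.speech.create(
    model="gpt-4o-mini-tts",
    voice="alloy",  # illustrative voice choice
    input="Tell me about a time you had to debug a production outage.",
    instructions="Sound like a friendly interviewer; keep filler words light.",
)
speech.write_to_file("question.mp3")
```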

Here are some examples (the TTS is handled by OpenAI’s gpt-4o-mini-tts):

#AI #arkaine
arkaine 0.0.21; next steps

Version 0.0.21 of arkaine is out, including the finalized format for the text-to-speech tooling, equivalent speech-to-text tooling (though, admittedly, I currently lack a locally hosted option for this), and the think tool I mentioned earlier.

There are still a lot of features that I want to add, and some I’m in the middle of; adding a chat interface to Spellbook and expanding the number of possible chat interfaces would be fun, and I already started that process a month ago. Similarly, I have a ~60% completed OCR implementation and a redefinition of how researchers utilize resources (necessary for handling non-website object search, like your own books/documents)… but right now I’m thinking of taking a moment to just build with what I have and create something useful for people as is.

#AI #arkaine
Just give me a second to think...

I simply love when simple ideas get tested and proven to be quite effective. It’s a clear sign of slowly feeling out how best to understand the system at hand. Such a delight popped up when Anthropic revealed that simply adding a no-op tool called “think” with a single “thought” argument - letting the agent output its thoughts within the chain of Action -> Result generation - improved performance on complicated tasks.
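The tool definition is about as small as tools get. Roughly following Anthropic’s published example (the handler here is my own no-op):

```python
# Roughly the tool definition Anthropic described; the handler is my own.
think_tool = {
    "name": "think",
    "description": (
        "Use the tool to think about something. It will not obtain new "
        "information or change anything; it just records the thought."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "thought": {"type": "string", "description": "A thought to think about."}
        },
        "required": ["thought"],
    },
}

def handle_think(tool_input: dict) -> str:
    # Deliberate no-op: the value is in letting the model write the thought
    # into its own context, not in anything we do with it.
    return "ok"
```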

…of course, I also have already implemented it in arkaine; I’ll give it a more thorough testing with some more complicated agents later.

#AI #arkaine
arkaine docs

My framework arkaine, which I quickly presented a bit ago, finally has some nice documentation. I had v0 do an initial pass on it, which I rather liked. After two quick rounds of prompting on their free tier, I downloaded the project and tried my hand at expanding it from there. It’s my first tailwind/next.js project, but it was surprisingly easy. Granted, it’s a simple page relative to a typical SPA or backend service, but hey, I’ll take the wins where I can get them.

Check out the documentation, especially the toolbox, and see if you can get inspired to build anything cool with arkaine.

#arkaine #AI
Mathematica
Attached image

I picked up David Bessis’ Mathematica on a whim. It focuses on what math truly is to those accomplished in it, and how the public’s understanding of what mathematicians do is wildly, grossly inaccurate. The book’s premise: language is a poor medium for transmitting intuition itself, whereas math and logical proofs are overkill but required to express it. Mathematics, he argues, is the art of intuition and imagination, not calculation - a field trying its damnedest to define the indefinable. But since intuition is so beyond language, we have to invent new concepts and symbols with which to grasp and communicate it.

I enjoyed it, and have certainly spent time thinking on its lessons, but I wished it delved more into direct, hands-on walkthroughs of intuition to further illustrate the separation of language and logic, or perhaps offered more solid advice on directly attacking the problem of growing intuition.

#books #math

GRPO in DeepSeek-R1
Liquid Time Constant Neural Networks
Attached image

Last night Adam gave a great presentation at the SDx paper club. The idea of using ODE solvers as an activation function was 🤯. It’s heavily used in robotics, so I’ll likely be doing a deep dive at some point; specifically building a neuron that uses the paper’s techniques to better understand the inner workings.
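For my own notes, the state equation at the heart of the paper, as I recall it - the hidden state evolves under an ODE whose effective time constant is itself input-dependent:

$$\frac{d\mathbf{x}(t)}{dt} = -\left[\frac{1}{\tau} + f\big(\mathbf{x}(t), \mathbf{I}(t), t, \theta\big)\right]\mathbf{x}(t) + f\big(\mathbf{x}(t), \mathbf{I}(t), t, \theta\big)\,A$$

A numerical ODE solver then plays the role an activation function normally would, which is what lets the dynamics “liquidly” speed up or slow down with the input.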

#AI #math
DeepSeek R1

A few weeks ago I gave a talk at an SDx paper club covering the DeepSeek-R1 paper. I talked in depth about the advancements made and the implications of their success with GRPO (group relative policy optimization) powered reinforcement learning.
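For reference, the core of GRPO: sample a group of $G$ responses per prompt, score each, and use the group itself as the baseline instead of a learned value function:

$$A_i = \frac{r_i - \operatorname{mean}(\{r_1, \ldots, r_G\})}{\operatorname{std}(\{r_1, \ldots, r_G\})}$$

Each response’s advantage is its reward normalized within its own group, which is what lets them drop the critic model entirely.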

The recording at the event got borked, so I re-recorded it the next day. Enjoy!

#AI
eli5-equations
Attached image

I’ve been working on arkaine’s OCR service all weekend and need a break. I’ve been toying with the idea of an equation explainer that copies the style in which I present complicated math in my paper club presentations. I’ve decided to step away from arkaine and try using it a bit in a prototype. Hence: eli5-equations.

Want a walkthrough of a complicated equation? Pass it in with some context and see if your evening is a bit enlightened. I’ll probably do a further write-up on this later.

#arkaine #AI #math
Mini hack-a-thon

Today I attended a mini-hackathon via SDx. I went to solo-work on some arkaine agents and to serve in a mentor/advisory role for other attendees. It was a short 6-hour affair, mainly focused on playing with the new OpenAI o3-mini. It also helps to be inspired by seeing other people creatively apply AI to a quick weekend project.

I ended up building a great prototype of a research agent - the original goal of arkaine for myself. It needs some work - I definitely ran into rate-limiting issues and need to get the agent to better understand report generation at the end. Expect this to get added into arkaine soon. Pushing myself to finish the project in the time allotted was also a great exercise in rapid prototyping. As for the other projects - there were quite a few that wowed me. I’m certainly looking forward to the next time I can dive in and code surrounded by other makers.

#arkaine #AI #SDx
Increased creativity by thinking longer
Attached image

Here’s an ingenious set of hacks to cheaply modify the behavior of existing LLMs to reason better. Most notable was detecting the initial use of the </think> tag and replacing it with a second-guessing term (the best performing was “Wait”). This forced the model to think longer, which in turn significantly improved performance on tasks.
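A sketch of the trick as I understand it (the decoding loop is schematic; next_token stands in for one step of your sampler):

```python
# Schematic sketch of the "Wait" trick: suppress the model's first attempt(s)
# to close its reasoning block and splice in a second-guessing term instead.
# `next_token` is an assumed stand-in for one step of a real decoder.
def force_longer_thinking(next_token, prompt: str, max_forces: int = 1) -> str:
    text = prompt
    forces = 0
    while True:
        token = next_token(text)
        if token == "</think>" and forces < max_forces:
            forces += 1
            text += "Wait"  # the best-performing second-guessing term
            continue
        text += token
        if token in ("</think>", "<|endoftext|>"):
            return text
```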

I’ll likely be doing a deeper dive for my upcoming paper club presentation.

#AI
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

We’re kicking off 2025’s paper club series via SDx again on February 18th @ 6:30 pm. I’ll be presenting DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. Join in if you’re in the area and want to deep-dive into some of the recent cutting-edge discoveries.

#AI #Meetup
I'm afraid I can't do that, Dave...

I found myself looking into the effects of censorship removal from LLMs - particularly the recent popular kid on the block, DeepSeek R1. It seems that the model becomes uncooperative on certain topics that don’t align with party doctrine. I came across a generic refusal-removal repository, linked here, which made me chuckle - it’s just control vectors fine-tuned into the model, which I discussed here.
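I won’t reproduce the repo, but the flavor of the technique (sketched here as the directional-ablation variant with toy numpy; the repo’s details may differ):

```python
import numpy as np

# Toy sketch, not the linked repo's code: derive a "refusal direction" from
# mean hidden states over refusing vs. complying prompts, then remove its
# projection from activations at inference time.
def refusal_direction(h_refuse: np.ndarray, h_comply: np.ndarray) -> np.ndarray:
    v = h_refuse.mean(axis=0) - h_comply.mean(axis=0)
    return v / np.linalg.norm(v)

def ablate(h: np.ndarray, v_hat: np.ndarray) -> np.ndarray:
    # Subtract each activation's component along the refusal direction.
    return h - np.outer(h @ v_hat, v_hat)
```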

#AI
(Rapidly) introducing arkaine

I recently gave (an unfortunately rushed) talk about arkaine - a maker-focused agentic AI framework I’ve been spending most of my time building. Slides for the talk are here.

#AI

Diffusion Models Are Real-Time Game Engines