hlfshell
Maker. Roboticist. Person.
Keith Chester

LATEST ARTICLE:

GRPO in DeepSeek-R1


Lately I'm thinking about...

Cursor + Other AI Tools

I’ve long since embraced AI integrations into my IDE, and I’ve experimented with several of the new app builders. I’m going to avoid commenting on the “vibe coding” meme that’s going around, but I will say I’ve enjoyed some of the efficiencies gained. A few months ago, at the suggestion of several peers, I swapped from VS Code w/ Copilot to Cursor. It took some workflow adjustment to make Cursor a viable tool worthy of its subscription cost, but once certain hurdles were overcome I found it certainly accelerated me. The release of Claude 3.7 Sonnet came with a promise of drastically increased performance on code, which I found to be… not true. In fact, I noticed a significant decrease in performance…

…until I finally upgraded Cursor to its current version. The new agentic approach powered by Claude 3.7 Sonnet has far outperformed my expectations. It is much better with large contexts, at tying together multiple systems, and at behaving in predictable and desirable ways.

I’ve also experimented with Bolt, v0, Lovable, and Replit. While I’ve snagged a few front-end pieces from each, I’ve yet to fully embrace the “one-shot” builders. I’m hesitant to say they’re not ready, though, as my experiments with them have admittedly been small ones - not a full-fledged dive in which I let my workflow warp to the tool rather than the other way around.

There seem to be even more AI tools I’ve yet to experiment with, each with a workflow different from what I’m used to. I hope to dive further into these sometime in the near future. To that end, I will be pausing my Cursor subscription this month; at that point I will embrace Windsurf for a short while and see if I enjoy increased velocity or novel capabilities. I am also very much looking forward to the Replit-sponsored SDx hackathon later this month; I want to see what I’m doing wrong in my workflow that keeps me from fully enjoying the tool.

#AI Tools #Cursor
DeepSeek + Inference-Time Scaling and Generalist Reward Modeling

DeepSeek released another cool paper expanding on reinforcement learning for LLM alignment. Building off their prior work (which I talk about here), they introduce two new methods.

The first is Rejective Fine-Tuning (RFT): they have a pre-trained model produce N responses. The collected responses are then combined into a prompt in which the model is instructed to produce principles for evaluating the responses, critiques of each response, and a reward score for each based on the generated principles. The process utilizes a human judge to critique the model’s critiques, eventually teaching it to produce these evaluations without human feedback.

The goal is to move the reward function from a simple scalar value to a language-derived scalar value, which improves performance when training on domains that don’t translate well to automatic correctness checks. This is why reasoning models have, to this point, generally focused on mathematics, coding, and similar domains - there are easy, clear ground-truth results to evaluate against. In more subjective domains (ethics, open-ended questions, etc.), a scalar value is not easily derived without human intervention.
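To make that concrete, here’s a minimal sketch of the generate-principles-then-critique loop as I understand it - the prompt wording, score format, and the `generate` helper are my own stand-ins, not the paper’s:

```python
# Sketch of RFT-style self-evaluation: the model drafts principles, critiques
# each candidate response against them, and emits scalar scores we can parse.
# `generate(prompt) -> str` stands in for any LLM completion call.

def judge_responses(question: str, responses: list[str], generate) -> list[float]:
    numbered = "\n\n".join(
        f"Response {i + 1}:\n{r}" for i, r in enumerate(responses)
    )
    prompt = (
        f"Question:\n{question}\n\n{numbered}\n\n"
        "First, write a short list of principles for judging answers to this "
        "question. Then critique each response against those principles. "
        "Finally, output one line per response of the form 'SCORE i: x', "
        "where x is a score from 1 to 10."
    )
    evaluation = generate(prompt)

    # Pull the scalar rewards back out of the language-based evaluation.
    scores = [0.0] * len(responses)
    for line in evaluation.splitlines():
        if line.strip().upper().startswith("SCORE"):
            try:
                head, value = line.split(":", 1)
                scores[int(head.split()[1]) - 1] = float(value)
            except (ValueError, IndexError):
                continue  # malformed score line; keep the default
    return scores
```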

Once the model is proficient at generating these evaluations, the second method - Self-Principled Critique Tuning (SPCT) - is introduced. This method uses reinforcement learning to adaptively improve the model’s ability to generate critiques and principles without human intervention. A feedback loop now begins, wherein the model generates its own evaluation criteria, critiques, and responses, assigns scores, and then receives rewards based on how well it ranks the responses against known human preferences. GRPO is still used for policy optimization, but the model learns without direct human ratings or real-time human raters.
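The reward signal in that loop might look something like the following - this is my guess at its shape (ranking agreement against human preference data), not the paper’s exact rule:

```python
# Hypothetical SPCT-style reward: the model's scores over a group of
# responses are rewarded by how closely their ordering matches a known
# human preference ranking.

def ranking_reward(model_scores: list[float], human_ranking: list[int]) -> float:
    # human_ranking[i] is the index of the i-th best response per humans.
    model_ranking = sorted(
        range(len(model_scores)), key=lambda i: model_scores[i], reverse=True
    )
    # Fraction of positions where the model's ordering agrees with humans'.
    matches = sum(1 for m, h in zip(model_ranking, human_ranking) if m == h)
    return matches / len(human_ranking)
```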

Finally, they introduce inference-time sampling and voting to further enhance the robustness of the reward model at inference by sampling multiple outputs and aggregating them through a voting mechanism. Essentially: have a language model generate multiple responses, have the trained reward model judge each response, then choose the highest-rated one. It feels like a great improvement on self-consistency mechanisms.
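In sketch form (reusing the hypothetical judge_responses from above; the aggregation details are my simplification):

```python
# Inference-time sampling + voting, simplified: run several independent
# judging passes over the candidates, sum the scores, pick the winner.

def best_response(question, responses, generate, judge, k=8):
    totals = [0.0] * len(responses)
    for _ in range(k):  # k independent judging passes act as "votes"
        for i, score in enumerate(judge(question, responses, generate)):
            totals[i] += score
    return responses[max(range(len(responses)), key=lambda i: totals[i])]
```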

#AI #deepseek
Interview Practice App

One of the great parts of building out tools like arkaine is that it allows me to sit down and just build for a bit to test the framework. After some quick experimentation I have a great agent that:

  1. Takes in a resume and a job description

  2. Considers additional topics to research to build out a knowledge base of what certain acronyms, skills, and technologies mean, and what the industry standards and expectations are

  3. Searches the web for those topics with arkaine’s research agents, building a knowledge base of relevant information

  4. Builds a list of questions that an interviewer should ask and…

  5. Converts the questions to natural-sounding language and uses TTS to create a virtual interviewer. (This part is… it works. But sometimes it’s too heavy on the pausing and filler words)

So far it’s working surprisingly well for a simple prototype script. I’ll probably expand on it quite a bit, especially since I haven’t yet utilized speech-to-text to let the user answer (with an agent judging their response). In broad strokes, the flow looks like the sketch below.
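Here’s a hypothetical outline of the pipeline condensed into plain prompt calls - `generate` and `web_search` are stand-ins for any LLM and search tool, and none of this is arkaine’s actual API:

```python
# Hypothetical outline of the interview-practice flow. `generate(prompt)`
# and `web_search(query)` stand in for an LLM call and a search tool.

def interview_prep(resume, job_description, generate, web_search):
    # Steps 1-2: identify acronyms, skills, and technologies worth researching.
    topics = generate(
        "List acronyms, skills, and technologies worth researching for this "
        f"role, one per line.\n\nResume:\n{resume}\n\nJob:\n{job_description}"
    ).splitlines()

    # Step 3: build a knowledge base of what each topic means in industry terms.
    knowledge = "\n\n".join(
        f"{t}:\n{web_search(t + ' industry standards and expectations')}"
        for t in topics if t.strip()
    )

    # Step 4: draft interview questions grounded in the knowledge base.
    questions = generate(
        "Using the notes below, write interview questions this candidate "
        f"should practice, one per line.\n\nNotes:\n{knowledge}\n\n"
        f"Resume:\n{resume}"
    ).splitlines()

    # Step 5 (rephrasing to natural speech + TTS) would follow from here.
    return [q for q in questions if q.strip()]
```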

Here are some examples (the TTS is handled by OpenAI’s gpt-4o-mini-tts):

#AI #arkaine
arkaine 0.0.21; next steps

Version 0.0.21 of arkaine is out, including the finalized format for the text-to-speech tooling, equivalent speech-to-text tooling (though, admittedly, I currently lack a locally hosted option for this), and the think tool I mentioned earlier.

There are still a lot of features I want to add, and some I’m in the middle of; adding a chat interface to Spellbook and expanding the number of possible chat interfaces would be fun, and I already started that process a month ago. Similarly, I have a ~60% completed OCR implementation and a redefinition of how researchers utilize resources (necessary for handling non-website object search, like your own books/documents)… but right now I’m thinking of taking a moment to just build with what I have and create something useful for people as-is.

#AI #arkaine
Just give me a second to think...

I simply love when simple ideas get tested and proven to be quite effective. It’s a clear sign of slowly feeling out how best to understand the system at hand. Such a delight popped up when I saw that Anthropic had revealed that simply adding a no-op tool called “think” with a single “thought” argument - letting the agent output its thoughts within the chain of Action -> Result generation - improved performance on complicated tasks.

…of course, I’ve already implemented it in arkaine; I’ll give it a more thorough testing with some more complicated agents later.
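For reference, the tool really is nearly nothing - a minimal sketch, with the schema following Anthropic’s tool-use format and the description wording paraphrased from memory:

```python
# The "think" tool: a no-op whose only effect is letting the model write a
# thought into its own context mid-task. Schema per Anthropic's tool-use
# format; description wording is a paraphrase.

THINK_TOOL = {
    "name": "think",
    "description": (
        "Use this tool to think about something. It will not obtain new "
        "information or change anything; it just logs the thought."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "thought": {
                "type": "string",
                "description": "A thought to think about.",
            }
        },
        "required": ["thought"],
    },
}

def handle_think(thought: str) -> str:
    # Deliberately a no-op: the value is the thought itself appearing in
    # the agent's Action -> Result chain.
    return ""
```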

#AI #arkaine
arkaine docs

My framework arkaine, which I quickly presented a bit ago, finally has some nice documentation. I had v0 do an initial pass on it, which I rather liked. After two quick rounds of prompting on their free tier, I downloaded the project and tried my hand at expanding it from there. It’s my first Tailwind/Next.js project, but it was surprisingly easy. Granted, it’s a simple page relative to a typical SPA or backend service, but hey, I’ll take the wins where I can get them.

Check out the documentation, especially the toolbox, and see if you can get inspired to build anything cool with arkaine.

#arkaine #AI
BitNet b1.58 Reloaded

The paper club I host will be covering BitNet b1.58 Reloaded: State-of-the-art Performance Also on Smaller Networks! I’ve been looking forward to this one since I read the original paper on 1.58-bit nets. Join us and learn about the future of ternary weights and LLMs!

#AI
Mathematica

I picked up David Bessis' Mathematica on a whim. It focuses on what math truly is to those accomplished in it, and how the public’s understanding of what mathematicians do is wildly, grossly inaccurate. The book’s premise: language is a poor medium for transmitting intuition itself, while math and logical proofs are overkill, but required, to express it. Mathematics, he argues, is the art of intuition and imagination - not calculation - doing its damndest to define that intuition. But since intuition lies so far beyond language, we have to invent new concepts and symbols with which to grasp and communicate it.

I enjoyed it, and have certainly spent time thinking on its lessons, but I wish it had delved more into direct, hands-on walkthroughs of intuition to further illustrate the separation of language and logic, or offered more solid advice on directly attacking the problem of growing one’s intuition.

#books #math

GRPO in DeepSeek-R1

Liquid Time Constant Neural Networks

Last night Adam gave a great presentation at the SDx paper club. The idea of using ODE solvers as an activation function was 🤯. The technique is heavily used in robotics, so I’ll likely be doing a deep dive at some point; specifically, building a neuron that uses the paper’s techniques to better understand the inner workings.
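As a starting point for that deep dive, here’s a toy Euler-integrated neuron following the liquid time-constant form dx/dt = -[1/τ + f(x, I)]·x + f(x, I)·A. I’ve swapped the paper’s small neural network for a sigmoid gate to keep f positive and the dynamics stable, and all constants are arbitrary illustration values:

```python
import math

# Toy liquid time-constant neuron: the gate f depends on both the state x
# and the input I, so the effective time constant changes with the input.

def ltc_step(x, I, dt=0.01, tau=1.0, A=1.0, w=1.0, u=1.0, b=0.0):
    f = 1.0 / (1.0 + math.exp(-(w * x + u * I + b)))  # input-dependent gate
    dx = -(1.0 / tau + f) * x + f * A                 # LTC dynamics
    return x + dt * dx

# Drive the neuron with a constant input and watch the state settle toward
# the input-dependent equilibrium f*A / (1/tau + f).
x = 0.0
for _ in range(2000):
    x = ltc_step(x, I=0.5)
print(round(x, 3))
```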

#AI #math
DeepSeek R1

A few weeks ago I gave a talk at an SDx paper club covering the DeepSeek-R1 paper. I talked in depth about the advancements made and the implications of their success with GRPO (group relative policy optimization) powered reinforcement learning.
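The core trick - group-relative advantages - is compact enough to show in a few lines. A minimal illustration of the advantage computation only, not DeepSeek’s training loop:

```python
# GRPO advantage: sample a group of responses per prompt, score them, and
# normalize each reward against the group's mean and standard deviation.
# No learned value network is needed.

def grpo_advantages(rewards: list[float]) -> list[float]:
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5
    return [(r - mean) / (std + 1e-8) for r in rewards]

# e.g. a group of 5 responses where three earned the correctness reward:
print(grpo_advantages([1.0, 0.0, 0.0, 1.0, 1.0]))
```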

The recording at the event got borked, so I re-recorded it the next day. Enjoy!

#AI
eli5-equations

I’ve been working on arkaine’s OCR service all weekend and need a break. I’ve been toying with the idea of an equation explainer that copies the style in which I present complicated math during my paper club presentations. So I’ve decided to step away from working on arkaine and instead try using it a bit in a prototype. Hence: eli5-equations.

Want a walkthrough of a complicated equation? Pass it in with some context and see if your evening is a bit enlightened. I’ll probably do a further write-up on this later.

#arkaine #AI #math
Mini hack-a-thon

Today I attended a mini-hackathon via SDx. I went to solo-work on some arkaine agents and to serve in a mentor/advisory role for other attendees. It was a short six-hour affair, mainly focused on playing with the new OpenAI o3-mini. It also helps to be inspired by seeing other people creatively apply AI to a quick weekend project.

I ended up building a great prototype of a research agent - the original goal of arkaine for myself. It needs some work - I definitely ran into rate-limiting issues and need to get the agent to better understand report generation at the end. Expect this to get added into arkaine soon. Pushing myself to finish the project in the time allotted was also a great exercise in rapid prototyping. As for the other projects - there were quite a few that wowed me. I’m certainly looking forward to the next time I can dive in and code surrounded by other makers.

#arkaine #AI #SDx
Increased creativity by thinking longer

Here’s an ingenious set of hacks to cheaply modify the behavior of existing LLMs so they reason better. Most notable was detecting the initial use of the </think> tag and replacing it with a second-guessing term (the best performing was “Wait”). This forces the model to think longer, which in turn significantly improves performance on tasks.
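A rough sketch of the trick as I understand it - the decoding-level details below are my own assumptions about how one might wire it up:

```python
# Suppress the first end-of-thinking tag and substitute a second-guessing
# word, forcing the model to keep reasoning. `generate_stream` stands in
# for any streaming decode loop yielding text chunks; this assumes the tag
# arrives within a single chunk.

def force_longer_thinking(generate_stream, prompt, max_overrides=1):
    output, overrides = "", 0
    for chunk in generate_stream(prompt):
        if "</think>" in chunk and overrides < max_overrides:
            chunk = chunk.replace("</think>", "Wait")  # second-guess instead
            overrides += 1
        output += chunk
    return output
```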

I’ll likely be doing a deeper dive for my upcoming paper club presentation.

#AI
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

We’re kicking off 2025’s paper club series via SDx again on February 18th @ 6:30 pm. I’ll be presenting DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. Join in if you’re in the area and want to deep-dive some of the recent cutting-edge discoveries.

#AI #Meetup
I'm afraid I can't do that, Dave...

I found myself looking into the effects of censorship removal on LLMs - particularly the recent popular kid on the block, DeepSeek R1. It seems the model becomes uncooperative on certain topics that don’t align with party doctrine. I came across a generic refusal-removal repository, linked here, which made me chuckle - it’s just control vectors fine-tuned into the model, which I discussed here.
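As a refresher, applying a control vector is just nudging a layer’s hidden states along a fixed direction at inference time. A minimal PyTorch sketch under my own assumptions (the layer choice, scale, and random vector are illustrative; real refusal removal derives the direction from contrasting prompt pairs, and the linked repo bakes it into the weights):

```python
import torch

# Steer a model by shifting one layer's activations along a "refusal"
# direction. A negative scale pushes activations away from refusal behavior.

hidden_dim = 768
refusal_direction = torch.randn(hidden_dim)  # stand-in for a derived vector
alpha = -1.5

def steering_hook(module, inputs, output):
    # Returning a value from a forward hook replaces the module's output.
    return output + alpha * refusal_direction

# Usage (hypothetical layer path; pick a block whose output is a plain tensor):
# handle = model.transformer.h[10].mlp.register_forward_hook(steering_hook)
```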

#AI
(Rapidly) introducing arkaine

I recently gave (an unfortunately rushed) talk about arkaine - a maker-focused agentic AI framework I’ve been spending most of my time building. Slides for the talk are here.

#AI

Diffusion Models Are Real-Time Game Engines

Google DeepMind's Grandmaster-Level Chess Without Search

Representation Engineering and Control Vectors - Neuroscience for LLMs

Nerd Sniped - Solving for Jumbles and Letter Boxed

Utilizing LLMs as a Task Planning Agent for Robotics

A Corollary to Conway's Law - Build for The Team You Have

Repeatable Dev Environments for ROS2