Minimizing token spending.

Easy does it. Going to take baby steps here. Step 1

May 12, 2026

Hello everyone, and welcome back!

Today, I am going to list all the steps I have taken to minimize my token spending. I pretty much consider anything that reduces token costs without downgrading my experience a win.

Why Defaults Can Kill Your Budget

The first step I touched on in my previous article. With Hermes Agent, you have to be very careful; fresh out of the box, it often uses Claude 4.5 Sonnet (or a similar high-tier model). I blew through my tokens fast enough to change my perspective completely. To give you an idea of the cost difference, look at the chart below:

It’s crazy how much more expensive Claude 4.5 Sonnet is compared to two of the more popular open-source models. While they vary in strengths and weaknesses, it is generally accepted that Sonnet is the “smarter” model. The question for me is: Is it 5x better?

I can easily drop a million tokens using my agent. For me—and I feel for many others here—I simply cannot afford the cost of running Sonnet for every single task.

Mixing Models for Maximum Value

So, what am I left to do? I set up a workflow to get the best of both worlds. Currently, Kimi 2.6 is my version of “splurging.” I have Kimi 2.6 plan the architecture and develop the core logic.

Side note: I work on several side projects and tend to reuse logic between them. Sometimes I pull code in manually and ask Kimi if it sees anything useful. Usually, it finds a good way to implement it and gives me one tip to “make it even better.” Sometimes I take the advice, sometimes I don’t! :)

Polishing Without the Price Tag

Once the Kimi 2.6 code is generated, I check it using DeepSeek-V4-Flash. My prompt is: “Check this for mistakes, check twice, then give me a plan to either fix or improve it (or both).”

Even a simple app can cost between 500k and 1 million tokens, so the best way to control costs while coding is to “pick the right model.” Honestly, I am still learning this. I am taking steps to improve, but I still have a long way to go.

Why Agentic Workflows Win

To state the obvious: the less I use the “expensive models,” the better off I am. Could I have gone cheaper? Sure, I could have used free “chatbots” from various companies. I’ve done that in the past and it works; however, I’ve moved past that by adopting agentic workflows.

The long and short of it is that utilizing two different models is almost always a good thing. It’s a well-known fact that some models are better at specific tasks than others. You aren’t losing anything but a bit of time by asking a cheaper model to review code for errors and potential improvements.

Cheap and Effective

TL;DR: The most important step when on a budget is to use the cheapest effective model for the job. Yes, it takes some trial and error. You will make mistakes, but you will learn.

While I try to avoid hyperbole, it’s pretty amazing to get the same—or similar—results for 5x or even 10x cheaper! Try it out; when it works, it’s a great feeling.

The Secret: Finding Hidden Gems

On a side note, I would be remiss if I didn’t point out that platforms like OpenRouter and Nous Portal have free models to try out seemingly all the time. Those and other providers offer many alternatives. It’s really up to the user—if you get skillful at working with various models, you would be surprised just how good your results can be.

I hope this was helpful! If you have any questions or suggestions, feel free to comment. I hope everyone enjoyed this and comes back for more!

Regards,

Steven Martin

Steven's Substack

Discussion about this post

Ready for more?