Alex’s Writing Journal

When All You Have Is A Hammer...

Most of what AI can do is a happy accident. Maybe we should just use it for what it's good at.

February 02, 2026

Brittany Ellich
1w

I feel like a lot of automations folks are doing with agents can probably be done mostly by shell scripts in a more deterministic fashion. Maybe just use a LLM for some categorization, instead of to do the whole thing.

LLMs are really good at taking some input, interpreting it, and spitting out some output, where each part of the output is based on the input plus the previous parts of the output. At 30,000 feet, that's all they do. This is a very cool thing, and it enables some really interesting use cases.
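
To make that loop concrete, here's a toy sketch of it in TypeScript. Everything here is illustrative: in a real LLM, `nextToken` would score every token in a huge vocabulary and sample one, while this stand-in just replays a canned reply so the example runs.

```typescript
// Toy model of autoregressive generation: each token depends on the input
// plus everything generated so far.
type Token = string;

const cannedReply: Token[] = ["Hello", ",", " world", "!", "<eos>"];

// Stand-in for the model. A real LLM scores its whole vocabulary given the
// context and samples; this just picks by how much context it has seen.
function nextToken(context: Token[]): Token {
  return cannedReply[Math.min(context.length - 1, cannedReply.length - 1)];
}

function generate(prompt: Token[], maxTokens = 32): Token[] {
  const output: Token[] = [];
  for (let i = 0; i < maxTokens; i++) {
    // Condition on the prompt plus previous parts of the output.
    const token = nextToken([...prompt, ...output]);
    if (token === "<eos>") break; // stop token ends generation
    output.push(token);
  }
  return output;
}

console.log(generate(["Hi"]).join("")); // "Hello, world!"
```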

For example, I can upload a CSV of data and say "Turn this into a JSON object that matches this TypeScript interface." And it will do it, because it has been trained on enough CSV, JSON, and TypeScript to transform between them. 3Blue1Brown has a great video series about how all this works.
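
Here's a hypothetical version of that transformation (the interface and CSV are made up for illustration):

```typescript
// The target shape you hand the model.
interface CrewMember {
  name: string;
  role: string;
  yearsOfService: number;
}

// Given this CSV as input:
//
//   name,role,years_of_service
//   Ada,engineer,4
//   Grace,captain,11
//
// a prompt like "Turn this into a JSON array matching CrewMember[]" should
// come back with something equivalent to:
const expected: CrewMember[] = [
  { name: "Ada", role: "engineer", yearsOfService: 4 },
  { name: "Grace", role: "captain", yearsOfService: 11 },
];

console.log(JSON.stringify(expected, null, 2));
```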

But to do that, they need the entire corpus of human writing in their training data. A nifty side effect of all that training is that they can generate believable responses to queries. They can act like chat bots, they can write something that looks (and often behaves) like code, and they can write something that looks like poetry.

But it's really easy to say that they're thinking, making decisions, or feeling. I don't think any of that is actually happening. I think it's a side effect of the non-deterministic guessing engine under the hood. (I'll concede that whether this is how human consciousness evolved is debatable.) GPT stands for "Generative Pretrained Transformer," for goodness' sake!

AI code generation can get away with it because code is verifiable: you can both check that the code works correctly yourself and have the LLM generate automated tests to verify it. Verification is built into the workflow. But it's easy to find examples where the generated code had performance issues, security vulnerabilities, or unexpected side effects that broke other parts of the program. And pretty much everyone who looks at vibe-coded programs says that the code architecture and organization are terrible. The YOLO, unintentional attitude of vibe coding has negative impacts on the rest of the developer ecosystem, too.

I'm actually kind of horrified by OpenClaw. Don't get me wrong, it's technically impressive. But it seems like the kind of YOLO project where you burn through millions of tokens doing basically nothing. Meanwhile, a lot of stuff can go wrong.

Jeff nails it.

Jeff Geerling
2w

OpenClaw (which was Moltbot, which was ClawdBot, all in the course of a week) is the new `sudo curl | bash`

In other words, it's probably not a good idea to give a bot root access to your computer.

But also, this is the kind of stuff that a good set of well-defined scripts and commands can already do. You just have to be intentional about it, instead of letting the bot do whatever it wants to.

As I've used and thought about LLMs, I've come up with two really great use cases that would be tough to pull off with other tools:

  • Text embedding comparison, which is ultimately what makes LLMs, AI image generators, and speech-to-text tools like Whisper possible. (There's a small similarity sketch right after this list.)

  • Text classification and entity extraction, where you provide a system prompt that defines different classifications and different parts that can be extracted from text.
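
Here's a minimal sketch of that first use case: comparing texts by the angle between their embedding vectors. The vectors below are made up and three-dimensional for readability; a real embedding model produces hundreds or thousands of dimensions.

```typescript
// Cosine similarity: values near 1 mean "pointing the same direction"
// (similar meaning); values near 0 mean unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Made-up embeddings for three phrases.
const oxygenGenerator = [0.8, 0.1, 0.3];
const o2Scrubber = [0.7, 0.2, 0.35];
const spaceHamster = [0.1, 0.9, -0.4];

console.log(cosineSimilarity(oxygenGenerator, o2Scrubber)); // ~0.99, very similar
console.log(cosineSimilarity(oxygenGenerator, spaceHamster)); // ~0.06, unrelated
```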

I did some tests of this second one for my spaceship game. The use case is crews sending messages to NPCs and having the NPCs respond appropriately. I'm really, really worried about handing over response generation to LLMs (it's way too easy for things to go off the rails), but I might tap an LLM for classification with a prompt like this:

Your job is to classify messages into intents, which are provided below. Here is your message:

<message>
Where are you hiding. Give us back the oxygen generator that you stole!
</message>

Your response should be valid JSON that matches the following TypeScript type. Your response should only be JSON. Do not provide any explanation, context, instructions, or clarification. Avoid writing your response as Markdown, only write JSON.

type Intent =
  | { type: "distress"; location?: string }
  | { type: "status-report" }
  | { type: "threat"; threatLevel: "high" | "medium" | "low" }
  | { type: "information"; query: string };

If this looks similar to what Brittany was talking about, yes. Yes, it's exactly what she's talking about.

Something like this allows me to explicitly define how the different NPCs respond to different messages, while staying inside guardrails that I get to define. Best of all, this kind of simple behavior can easily be handled by a local LLM like llama3.1, so if I actually decide to implement this in my game, my users won't be required to pay for LLM tokens.
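
For the curious, here's roughly what that wiring could look like against a local model served by Ollama. This is a sketch under assumptions: the endpoint, field names, and `format: "json"` option follow Ollama's /api/generate API as I understand it, so check them against your version's docs.

```typescript
type Intent =
  | { type: "distress"; location?: string }
  | { type: "status-report" }
  | { type: "threat"; threatLevel: "high" | "medium" | "low" }
  | { type: "information"; query: string };

const VALID_TYPES = new Set(["distress", "status-report", "threat", "information"]);

async function classifyMessage(message: string): Promise<Intent | null> {
  // The same prompt as above, with the player's message spliced in.
  const prompt = [
    "Your job is to classify messages into intents, which are provided below. Here is your message:",
    `<message>\n${message}\n</message>`,
    "Your response should be valid JSON that matches the following TypeScript type. " +
      "Your response should only be JSON. Do not provide any explanation, context, " +
      "instructions, or clarification. Avoid writing your response as Markdown, only write JSON.",
    'type Intent = {type:"distress", location?:string}|{type:"status-report"}|{type:"threat", threatLevel:"high"|"medium"|"low"}|{type:"information",query:string}',
  ].join("\n\n");

  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "llama3.1", prompt, stream: false, format: "json" }),
  });

  try {
    const { response } = await res.json();
    const intent = JSON.parse(response);
    // Never trust the model: only validated intents get to drive game logic.
    return VALID_TYPES.has(intent?.type) ? (intent as Intent) : null;
  } catch {
    return null; // garbled output falls through to scripted default behavior
  }
}
```

The nice property of this shape is that anything the model garbles degrades to `null`, so the NPC's scripted fallback stays in control.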

Does it lack unlimited flexibility? Yes, by design. Good game design (in fact, good design in general) is all about requirements and constraints, and how you work within both of them.

Amy Hoy
4mo

the problem is, the reason people are "into" non-deterministic automation is they're absorbed in fantasy. shell scripts and pipes are too real — YOU have to do the work, to think things through, not wave a magic wand — and therefore unappealing.

Amy also nails it. This isn't about getting things done better or faster. It's about convenience and not having to think through integrations. It all reeks of unintentionality.

I want my life to be full of intentional choices.

Finally, just for giggles:

Kate Compton
2w

You can test new tech ideas using the Seinfeld Test:

Would the product eliminate the plot of an episode? (Google maps, cell phones, paypal, battery packs) Good tech.

Would the product inspire new Seinfeld plots? (NFTs, AI chatbots, crypto currency, blind boxes, metaverse land sales) Bad tech.
