Date: 26-04-26

Building an agent harness

I have been using Ai a lot lately mostly to build my own tool which I will start writing about soon enough, but today I wanted a to write about how you could build your own agent harness. If you don’t know what an LLM or as marketing refers to it “Artificial intelligence” I would watch the 3 blue 1 browns 8 minute video as I think it does the best job of explaining how it maps to our physical world.

I will be walking through how to build your own little agent harness, heavily using Ai to write all the code to build your own little agent that you can do what ever you want with. I will be using swift in this example because I like and it has good openapi spec support for this which you can also use the spec to build a forwarding server (serving the spec you are consuming) if you want to talk to your agent from an iOS app like I setup for my self so that I don’t have to be at my computer to check in on it.

What is an agent harness?

An Agent harness is some code you or another Ai writes to wrap the output of an LLM and provide it with the ability to call tools like read and edit file and provide a UI for you to interact with the Ai. Opencode, Cursor, CodeX, ClaudCode are all examples of agent harnesses. Looking at these examples you can see that there is a lot you can add on top of this basic concept but today i’m going to guide you through the basics as it’s pretty easy to build your own and demystifies the LLM part from the agent and if you want allows you to setup tight control over what the LLM can and can’t do on your machines.

Listening to the LLM

To get started we first need to programmatically be able to “listen” to what the Ai/LLM is outputting so that we can do various things with the output. The easiest way I have found to do this is to trim down OpenAI’s openapi spec located on there github (first link). Once you have the spec downloaded, I hand it to a local coding tool like opencode or any Ai chat window your comfortable with for that matter. I ask it to trim down the spec to just the streaming chat completions and list models endpoints as this is all you need to build an agent and this allows your agent to consume from most inference providers like vllm or Ollama.

Most inference engines or providers support the openai spec. I believe Claude has it’s own spec and probably a few others but bang for buck the openai spec will get you off the ground.

My spec was about 500 lines of yaml after trimming down.

I am using the streaming format as it will give you a more responsive UI then the batch format and swift makes it pretty easy to handle streams like this.

Ollama by default uses it’s own output but if you call a different endpoint it will do the openAi version.

For getting tokens to consume with your up coming agent. I prefer to use my local GPU using something like Ollama as you don’t really need “intelligence” at this points just a stream of tokens and an LLM to send responses back to. If you want you can use your opencode subscription for tokens if you want “smarter” models.

Once you have this spec and your token provider api setup. I like to use the swift-openapi-generator to generate a client that supports the streaming format. I just ask the Ai to do this for me and have it generate a little command line interface (CLI) for you to test with. I have found using CLI’s to test Ai output as very effect as you easily re-run the command over and over and even ask the Ai to run the command it’s self to validate it’s work. This works well to get the beginnings of a user interface working and then you can package up the logic of your CLI into a package or target for a real user interface to display and have shared code paths between the agent and you.

I will leave it up to you on how you want to chat with the LLM but at this point you are pretty much just trying to validate that you get a streaming like response from your client you generated. I encourage having the Ai use ANSI escape codes to give the interface a little bit of flare.

Now that we have a client to consume tokens lets build our agent.

What is an agent

An agent at it’s simplest terms is tool calls in a loop, so lets build that.

I suggest using a small model as it will have the fastest output. I like to turn off thinking (reasoning_content) as at this point you don’t care about quality of the output and a model shouldn’t have to think to call tools. Turning this off helps get faster responses and make it easier to iterator and fix your agent as you will have a lot of bugs that you have to understand and work through.

I really love the elegance that pi pointed out that you only need four tools, everything else is an optimization: shell, read, write, edit. I started with a “get time” tool call as it is pretty easy to tell if the agent is lying and it adds a pretty minimal amount of code that you can read if you want to understand what is happening.

To give a summary of how this works. Your agent will output and receive messages with some context like assistant, system, and user which contains the text contents of that part of the conversation. The tool call will show up as an assistant message where the LLM will provide a json object which your harness will parse and give you to pass to whatever code you want to pass that data. The get time won’t take any arguments but it’s a good way to test that your system prompt and model can make a tool call. You will then pass the output of your tool call to the agent with a JSON object and associated id which the agent understands and the output of a specific tool call it made.

Summary

That’s the basics of building your own agent harness. It’s pretty easy to build with Ai and is pretty fun to toy around with in my opinion. You could make a specific read email agent where you have good sandboxes and rules around the job or you can try and expand it out to your own coding agent which you can use to build another agent harness or whatever.

One thing I did that I thought was pretty cool is once I setup the forwarding server for the LLM I was able to run the tool calls on my phone. So you could give the agent access to your messages, or location if you wanted and because it’s all your code and probably a model running on your own machine you can send it private date.

I started an empty directory and gave this article into the kimi 2.5 model. Copy and pasting the errors back into the agent I had a working get time tool call against Ollama working in under thousand lines of text between swift and yaml. Pretty cool if you ask me.

Thanks for reading.

Zane

Leave Feedback