I want to build an agent that I can use across both Claude and Codex.
But after only about a week of using these tools, I do not think I understand agents deeply enough yet.
My current conclusion is simple: before I design a large shared architecture, I should start with the smallest useful agent behavior.
The main risk right now is not that the agent will be too weak. The real risk is that it will behave as if a nonexistent initiative were already running.
This article explains what I have sorted out so far, what I have decided not to decide yet, and where I think my current stage really is.
Why I am not starting with a big architecture
My first instinct was to design a shared agent architecture for Claude and Codex.
I wanted common roles, common rules, and a common request format.
But I ran into a more basic problem first.
I still had not clearly separated what I should ask an agent to do from what I should never delegate.
That matters more than architecture.
If I build a polished structure on top of fuzzy boundaries, it becomes easier to create the illusion of progress without real execution.
So the first design problem is not system shape.
It is stopping conditions and delegation boundaries.
The main thing I am worried about
The biggest concern is that an agent might treat an idea, a hypothesis, or a rough memo as if it were already an approved initiative.
When that happens, it looks like work is moving, but there may be no real plan underneath it.
This is mostly a problem of how failures are handled.
Success matters, of course. But if failed actions and undefined actions are mixed in with everything else, it becomes hard to tell what helped and what was meaningless.
That is why I care less about making the agent feel active and more about being able to evaluate what actually happened afterward.
What I have clarified so far
After thinking through this, I now believe the agent's scope should be narrow.
My current boundary looks like this:
| Item | Current answer |
|---|---|
| Who starts a new initiative? | A human decides |
| What the agent may do | Execution, logging, and aggregation |
| How much discretion it gets | Only follow the defined steps as written |
| Minimum condition for a delegable initiative | It must have a purpose, steps, an exit condition, and a logging method |
| What happens if assumptions are missing | It stops, reports the missing items, and does not fill gaps on its own |
| Its role after failure | Record results and draft a retrospective, but not decide whether to continue |
In other words, the agent is not the owner of an initiative.
It is the executor of a defined initiative.
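One way to make the owner/executor split concrete is a small approval gate. This is only a sketch under my own assumptions: the names `Status`, `Initiative`, and `may_execute` are hypothetical, and the status values simply mirror the idea, hypothesis, and memo categories mentioned earlier.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    # Hypothetical labels; only APPROVED means a human has signed off.
    IDEA = "idea"
    HYPOTHESIS = "hypothesis"
    MEMO = "memo"
    APPROVED = "approved"


@dataclass(frozen=True)
class Initiative:
    title: str
    status: Status
    approved_by: Optional[str] = None  # name of the human who approved it


def may_execute(initiative: Initiative) -> bool:
    """Return True only for work a named human has explicitly approved."""
    return initiative.status is Status.APPROVED and bool(initiative.approved_by)
```

With this gate, a rough memo can sit in the same list as real work without the agent ever treating it as runnable.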
What I want to standardize across Claude and Codex
I still want a shared way to work across Claude and Codex.
But what I want to standardize is not model behavior. It is the boundary around behavior.
There are three parts I want to keep common.
1. A request format
I want a simple request structure such as:
- target
- expected output
- allowed execution scope
This matters more than tool-specific details because it makes the allowed surface explicit before work starts.
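The three fields above could be sketched as a small tool-agnostic structure. The class name `AgentRequest`, the helper names, and the action strings are assumptions for illustration; nothing here is a Claude or Codex API.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class AgentRequest:
    target: str
    expected_output: str
    allowed_scope: FrozenSet[str]  # hypothetical action names the agent may perform


def is_allowed(request: AgentRequest, action: str) -> bool:
    """Check an action against the explicitly allowed surface."""
    return action in request.allowed_scope


def render(request: AgentRequest) -> str:
    """Render the request as plain text that can be pasted into either tool."""
    scope = ", ".join(sorted(request.allowed_scope))
    return (
        f"Target: {request.target}\n"
        f"Expected output: {request.expected_output}\n"
        f"Allowed execution scope: {scope}"
    )
```

Rendering to plain text keeps the format independent of any one tool, which is the point of standardizing the boundary rather than the model behavior.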
2. Stop rules
I also want the same stopping logic across both tools.
If the task has no exit condition or no logging method, the agent should stop instead of improvising.
That would reduce the chance that one tool quietly turns a vague idea into an active workflow.
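A minimal sketch of that stop rule, assuming a hypothetical `Task` shape whose fields match the four minimum conditions from the table earlier (purpose, steps, exit condition, logging method). The function names are mine, not part of either tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    purpose: str = ""
    steps: List[str] = field(default_factory=list)
    exit_condition: str = ""
    logging_method: str = ""


def missing_items(task: Task) -> List[str]:
    """List every required field that is empty, without guessing a value."""
    required = {
        "purpose": task.purpose.strip(),
        "steps": task.steps,
        "exit_condition": task.exit_condition.strip(),
        "logging_method": task.logging_method.strip(),
    }
    return [name for name, value in required.items() if not value]


def check_before_run(task: Task) -> str:
    """Stop and report gaps instead of improvising around them."""
    missing = missing_items(task)
    if missing:
        return "STOP: missing " + ", ".join(missing)
    return "OK"
```

For example, `check_before_run(Task(purpose="tidy notes", steps=["list files"]))` returns a stop message naming `exit_condition` and `logging_method` rather than starting the work.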
3. A record of what happened
The third shared piece is record-keeping.
I want a way to capture not only what succeeded, but also what failed and what made no difference.
Without that, agent activity becomes hard to evaluate and easy to overestimate.
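As a rough sketch, a record could carry an explicit outcome so that "did nothing useful" is a first-class result rather than a gap in the log. The `Outcome` values, the `RunRecord` fields, and the JSON-lines layout are assumptions, not a settled format.

```python
import json
from collections import Counter
from dataclasses import dataclass, asdict
from enum import Enum
from typing import Dict, List


class Outcome(Enum):
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    NO_EFFECT = "no_effect"  # ran, but made no measurable difference


@dataclass
class RunRecord:
    step: str
    outcome: Outcome
    note: str = ""


def to_log_line(record: RunRecord) -> str:
    """Serialize one record as a single JSON line with stable key order."""
    data = asdict(record)
    data["outcome"] = record.outcome.value
    return json.dumps(data, sort_keys=True)


def summarize(records: List[RunRecord]) -> Dict[str, int]:
    """Count records per outcome so low-value activity stays visible."""
    return dict(Counter(r.outcome.value for r in records))
```

Counting `no_effect` separately from `failed` is the design choice that matters here: it keeps activity from being mistaken for progress.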
What I am not deciding yet
There are also several things I do not think I should decide yet.
I do not want to over-specialize roles too early.
It is tempting to split everything into planner, reviewer, analyst, and writer agents, but that feels premature.
I also do not want complex orchestration yet.
Multiple agents talking to each other sounds powerful, but it makes it harder to see where a bad assumption entered the system.
And I do not want to optimize for intelligence first.
Right now I need an agent that does not overreach more than I need one that feels impressive.
Where I think I am right now
My current stage is not "shared architecture design completed."
It is closer to "minimum operating rules becoming clear."
Before I go further, I want to lock down five things:
- Which kinds of work can be delegated
- The minimum conditions for a valid initiative
- The stopping rule when assumptions are missing
- The logging format for execution results
- The review format for failed or low-value actions
Once those are stable, it will be much easier to build Claude-specific and Codex-specific usage on top of them.
If I build one agent first
If I build only one agent first, it should be a safe execution agent.
Its job would not be to invent or approve new initiatives. Its job would be to run already approved and already defined work.
That means tasks like these:
- follow a fixed checklist
- write logs in a fixed format
- aggregate known results
- stop and return missing items when a requirement is unclear
And it should not do things like these:
- decide to start a new initiative
- turn vague ideas into active plans on its own
- optimize work without an agreed evaluation method
- decide to continue a failed action
The first useful agent does not need to be a brilliant strategist.
It needs to be a reliable operator.
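To illustrate what "reliable operator" could mean in practice, here is a minimal sketch of a checklist runner. The function name, the step representation, and the status strings are all hypothetical; a real agent would wrap tool calls rather than plain Python callables.

```python
from typing import Callable, Dict, List, Set, Tuple

# A step is a (name, action) pair; the action returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_checklist(steps: List[Step], allowed: Set[str]) -> Dict[str, object]:
    """Run steps in order, log each one, and hand control back on any problem."""
    log: List[str] = []
    for name, action in steps:
        if name not in allowed:
            # Outside the agreed scope: stop rather than improvise.
            log.append(f"{name}: stopped (outside allowed scope)")
            return {"status": "stopped", "log": log}
        ok = action()
        log.append(f"{name}: {'ok' if ok else 'failed'}")
        if not ok:
            # No retry and no workaround: continuing is a human decision.
            return {"status": "needs_human_decision", "log": log}
    return {"status": "completed", "log": log}
```

The runner never chooses what to do next after a failure. It records what happened and returns, which matches the "record results, but do not decide whether to continue" boundary.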
Summary
I still want an agent workflow that works across Claude and Codex.
But the immediate step is not to design the biggest possible architecture.
The immediate step is to start with the smallest useful agent behavior.
That means clear boundaries, clear stop rules, and a clear difference between an idea and an approved initiative.
For now, that is where I actually stand.