Working Notes: a commonplace notebook for recording & exploring ideas.
— Kunal
Ego is an Extensible Agent Orchestrator, inspired by emacs' architecture. I always feel compelled to hack on codex and claude, and I wanted something that would let me do that really trivially.
Getting closer: as I think about this, it's going to be a very tiny amount of code that should be easy to reason about but really reusable and powerful. I think I've spent more time with pen and paper than actually typing it out in this case.
There's the core that handles the actual interaction loop with the model, and then there are different UIs I can attach to the core to drive the model behavior, like the REPL which I've defined.
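A tiny sketch of what that core/UI split could look like; all names here are illustrative, not ego's actual API:

```python
from typing import Protocol

class UI(Protocol):
    """Anything that can feed input to the core and display its output."""
    def read_input(self): ...
    def show_output(self, text: str) -> None: ...

class Core:
    """The model-interaction loop; knows nothing about any particular UI."""
    def __init__(self, model):
        self.model = model  # any callable: prompt -> reply

    def run(self, ui: UI) -> None:
        # Pull input from the attached UI until it signals the end (None).
        while (prompt := ui.read_input()) is not None:
            ui.show_output(self.model(prompt))

class ListUI:
    """A scripted 'UI' that drives the loop without a terminal."""
    def __init__(self, prompts):
        self.prompts = iter(prompts)
        self.outputs = []

    def read_input(self):
        return next(self.prompts, None)  # None ends the session

    def show_output(self, text):
        self.outputs.append(text)

ui = ListUI(["hello"])
Core(lambda p: p.upper()).run(ui)
# ui.outputs == ["HELLO"]
```

A REPL would just be another class satisfying the same protocol, reading from stdin instead of a list.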
Keeping tools and agents configurable means I should be trivially able to use a sandboxed python instance running in a carefully crafted container (or even remote lambdas if I so choose), and have agents converse with each other while giving them rules and jobs to do. Leaning into asyncio also gives me some flexibility.
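As a sketch of what "configurable tools" could mean under asyncio: tools are plain named async callables in a registry, so swapping in a sandboxed or remote implementation is just a config change (hypothetical names throughout):

```python
import asyncio

# Hypothetical tool registry: tools are async callables keyed by name.
TOOLS = {}

def tool(name):
    """Register a coroutine function under a name agents can call."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("add")
async def add(a: int, b: int) -> int:
    await asyncio.sleep(0)  # stand-in for real async work (subprocess, RPC, ...)
    return a + b

async def call(name, *args):
    # Agents resolve tools by name only, so the implementation behind a
    # name is free to live anywhere: a container, a lambda, another agent.
    return await TOOLS[name](*args)

result = asyncio.run(call("add", 2, 3))
# result == 5
```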
I'm going to need to maintain an explicit log and some form of state to easily serialize, deserialize and fork contexts; need to spend more time noodling on this. There are so many options open for building truly interesting agent harnesses; I'm sad all of Claude, Codex and Gemini converged to a "browser UI" in the terminal.
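One possible shape for that explicit log, purely as a sketch and not ego's real design: keep the context as plain data so serializing, deserializing and forking are trivial.

```python
import copy
import json
from dataclasses import dataclass, field, asdict

@dataclass
class Context:
    """A conversation context backed by an explicit, serializable log."""
    log: list = field(default_factory=list)

    def append(self, role, content):
        self.log.append({"role": role, "content": content})

    def serialize(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def deserialize(cls, blob: str) -> "Context":
        return cls(**json.loads(blob))

    def fork(self) -> "Context":
        # Deep copy so the branch can diverge without touching the original.
        return copy.deepcopy(self)

ctx = Context()
ctx.append("user", "hi")
branch = ctx.fork()
branch.append("assistant", "hello!")
# ctx.log keeps 1 entry while branch.log has 2: the fork is independent,
# and round-tripping through serialize/deserialize preserves the log.
restored = Context.deserialize(branch.serialize())
```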
I'm still noodling on the design in my head and on paper to figure out how to structure the code. I came up with a couple of design decisions / principles to guide my choices:
ego : claude :: emacs : vscode in terms of design and extensibility; I'm leaning on python as the ~lisp of choice, with the implementation language shaping the design.
Another idea, after trying this out in a project yesterday: I'd really want this to be something I can embed into any live python project -- and make any tool AI-powered almost trivially. Lots of prototyping and exploration to do here!
(I think there's a lot that can be done here just by building the right MCP servers and giving agents access to them -- e.g. gdb integration -- but I'm really curious how far I can get by opening up a Python program's internals to a model.)
The proxy API router doesn't seem to be approved anymore, so I'll have to switch to explicit API keys. Switching to OpenRouter to figure this out better.
Finally, I realized that an embeddable version of ego is a fantastic way to build something that can easily inject intelligence into any python program: it can introspect and change behavior, and generate code with the right tools. Python is remarkably reflection-friendly, including access to documentation, so this should be really powerful.
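A taste of how far the standard library alone gets you -- `get_info` is a made-up helper, but everything it calls is plain `inspect`:

```python
import inspect

def get_info(obj):
    """Summarize a live object for a model: name, docstring, signature."""
    info = {"name": getattr(obj, "__name__", repr(obj)),
            "doc": inspect.getdoc(obj)}
    try:
        info["signature"] = str(inspect.signature(obj))
    except (TypeError, ValueError):
        info["signature"] = None  # some builtins don't expose one
    return info

def greet(name: str) -> str:
    """Say hello."""
    return f"hello {name}"

info = get_info(greet)
# info["signature"] == "(name: str) -> str" and info["doc"] == "Say hello."
```

The same approach extends to `inspect.getsource` and `inspect.getmembers` for walking whole modules.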
Oh, and I need to choose a name to ship to PyPI with. ego is already taken, sadly.
General options
The RLM paper has prompts in the appendix
I think next steps are to make a single process agent in python that can independently improve, and then figure out the human interaction model and instrumentation I'm planning to do.
The general idea is to have a very simple python interpreter loop at the heart of the program that maintains state, and can easily be customized, hooked into, and overridden: exactly like emacs. I frequently feel like hacking on Claude Code, and then it's ... tricky? impossible?
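The hook idea, sketched in a few lines -- a mutable hook table in the spirit of Emacs advice, with all names hypothetical:

```python
# Hypothetical hook table: behavior lives in mutable lists of functions,
# so anything can be advised or overridden at runtime, Emacs-style.
HOOKS = {"pre": [], "post": []}

def run_hooks(phase, value):
    """Thread a value through every hook registered for a phase."""
    for hook in HOOKS[phase]:
        value = hook(value)
    return value

def step(state: dict, command: str) -> dict:
    """One turn of the interpreter loop, kept trivial on purpose."""
    command = run_hooks("pre", command)
    state = dict(state, last=command)  # the 'real' work would go here
    return run_hooks("post", state)

# 'Hacking on the editor' at runtime: install a hook, no restart needed.
HOOKS["pre"].append(str.strip)
state = step({}, "  hello  ")
# state["last"] == "hello"
```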
The other bit is that I can have the AIs modify the orchestrator and save code in real time: and I generally need to make sure this versioning is explicitly maintained so that restarting the interpreter restores old state correctly. Particularly if the AIs want to make their own Python tools instead of installing shell commands.
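One way that versioning could work, purely as a sketch: write each AI-authored tool as a numbered file and track the current version in a manifest, so a restarted interpreter reloads exactly what was running (layout and names are hypothetical):

```python
import json
import pathlib
import tempfile

def save_tool(root: pathlib.Path, name: str, source: str) -> int:
    """Persist source as a new numbered version and record it in a manifest."""
    manifest_path = root / "manifest.json"
    manifest = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    version = manifest.get(name, 0) + 1
    (root / f"{name}_v{version}.py").write_text(source)
    manifest[name] = version
    manifest_path.write_text(json.dumps(manifest))
    return version

def load_tool(root: pathlib.Path, name: str) -> str:
    """Load the source of the latest recorded version of a tool."""
    version = json.loads((root / "manifest.json").read_text())[name]
    return (root / f"{name}_v{version}.py").read_text()

root = pathlib.Path(tempfile.mkdtemp())
save_tool(root, "greet", "def greet(): return 'v1'")
save_tool(root, "greet", "def greet(): return 'v2'")
# load_tool(root, "greet") returns the v2 source
```

Old versions stay on disk, so a forked context that referenced v1 can still find it.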
I'd like to validate this with small experiments to see how far I can get, and also define an interface that works for this. Presumably users (& agents) will also want to install additional dependencies trivially into this application, so I need to figure that part out too.