Agent.sh — Instant LLM Ops On Any Shell
I’ve always wanted a way to drop onto a random Linux box—fresh Arch VM, prod bastion, whatever—and get LLM superpowers without installing half the internet first. Agent.sh is the answer: a single Bash script that spins up a REPL, proxies an OpenAI-compatible chat endpoint, and can execute shell commands (with my approval) on demand. No daemons, no Python runtimes, no magical background services—just curl, jq, and the terminal.
Grab the code here: github.com/surya-prakash-susarla/agent_sh
Why build it?
Setting up Arch from scratch reminded me how much context-switching it takes to go hunt down obscure flags or man pages. I wanted the model to do the reasoning and let me simply approve the commands. That meant:
- Zero-runtime footprint. Pure Bash + standard utilities so it runs anywhere.
- Explicit control. The model must ask before launching anything, and I get to approve/deny.
- Multi-turn tool loops. The agent should chain commands (run → inspect output → run more) without losing context.
- No persistent services. Just download the script and go.
Architecture in one shot
Everything lives under `src/`, with a tiny bashly config generating the CLI:
| Layer | What it does | File(s) |
|---|---|---|
| CLI Scaffolding | `bashly.yml` + `build.sh` emit the final `dist/agent` binary. | `bashly.yml`, `build.sh`, `dist/agent` |
| Entry Point | Prints the resolved config and enters the REPL. | `src/root_command.sh` |
| System Instruction | Long-form prompt that tells the LLM how to behave (tool JSON format, network-safety rules). | `src/lib/instructions.sh` |
| REPL Loop | Handles user input, displays labeled blocks, logs history, and mediates tool approvals. | `src/lib/repl.sh` |
| Network Layer | Posts `.agent_conversation` to the endpoint, parses tool calls vs. assistant replies, propagates API errors. | `src/lib/network.sh` |
| Logging Helpers | Debug output gating (`--debug` flag). | `src/lib/logging.sh` |
Conversation state is just a JSON lines file (`.agent_conversation`). Each turn gets appended as `{role, content}` (system/user/assistant/tool). That history is sent back to the LLM every time so the model remembers previous turns.
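To make that concrete, here is roughly what the history file and the request could look like. The plumbing below is a sketch, not the actual `src/lib/network.sh`: the `/v1/chat/completions` path, the timeout, and the env-var names (borrowed from the test suite) are assumptions for a generic OpenAI-compatible server.

```bash
# Illustrative .agent_conversation contents after one exchange (one JSON object per line):
#   {"role":"system","content":"You are a shell agent. ..."}
#   {"role":"user","content":"can you run git log -n 5 --oneline?"}
#
# Sketch of re-sending the full history: slurp the JSON lines into a
# messages array and POST it to the chat endpoint.
jq -s '{model: env.AGENT_MODEL, messages: .}' .agent_conversation \
  | curl -s --fail --max-time 60 \
      -H "Content-Type: application/json" \
      -d @- "${AGENT_ENDPOINT}/v1/chat/completions"
```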
Tool calls, done right
Originally the loop would re-run the same command whenever the model attempted a multi-step flow. I fixed that by structuring the response from `get_response()` as a typed JSON object:
```json
{
  "type": "tool",
  "assistant_message": { ... raw LLM message ... },
  "command": "git log -n 5 --oneline"
}
```
Whenever `type` is `tool`, the REPL records the assistant's request, prints a `[Tool Request]` block, asks for approval, and only after the command finishes does it append a `tool` message with stdout/stderr. The loop keeps running until the model sends a normal assistant reply (`type: "assistant"`).
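Condensed to its skeleton, the loop behaves like the sketch below. This is illustrative rather than the real `src/lib/repl.sh`: `get_response` is the function named above, but the jq plumbing and the denial message are assumptions.

```bash
# Simplified tool loop (a sketch; the real logic lives in src/lib/repl.sh).
while true; do
  response="$(get_response)"                 # posts .agent_conversation, returns typed JSON
  if [[ "$(jq -r '.type' <<< "$response")" != "tool" ]]; then
    jq -r '.assistant_message.content' <<< "$response"  # plain reply: print it and stop
    break
  fi
  jq -c '.assistant_message' <<< "$response" >> .agent_conversation  # record the request
  cmd="$(jq -r '.command' <<< "$response")"
  printf '[Tool Request]\n%s\n' "$cmd"
  read -r -p "[Approval] Run this command? (y/n) " ok
  if [[ "$ok" == "y" ]]; then
    output="$(bash -c "$cmd" 2>&1)"          # capture stdout and stderr together
  else
    output="User denied this command."       # exact denial message is an assumption
  fi
  jq -cn --arg c "$output" '{role: "tool", content: $c}' >> .agent_conversation
done
```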
The REPL output is now much cleaner:
```text
[User]
> can you run git log -n 5 --oneline?
[Tool Request]
git log -n 5 --oneline
[Approval] Run this command? (y/n) y
[Assistant]
ab367ff test: Add comprehensive denial and approval tests
fd7a639 test: Implement comprehensive test suite for tool use
...
```
That formatting makes long sessions easier to follow, especially when the agent fires multiple commands back-to-back.
Web lookups without tears
I added a prompt section telling the model how to use curl/wget safely:
- Always run with `-s`, `--fail`, and timeouts.
- Prefer plain text or JSON endpoints (DuckDuckGo Lite, Wikipedia API, etc.).
- No binary downloads unless the user explicitly asks.
- Summarize the result and cite the URL.
- Report failures instead of retrying blindly.
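In practice, a lookup that follows those rules comes out looking something like this (the endpoint here is only an example of the pattern; nothing about it is hard-coded in the script):

```bash
# A plain-JSON lookup per the prompt's rules: silent, fail on HTTP errors,
# bounded by a timeout, with the URL kept around as the citation.
curl -s --fail --max-time 10 \
  "https://en.wikipedia.org/api/rest_v1/page/summary/Bash_(Unix_shell)" \
  | jq -r '.extract'
```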
This keeps the shell script lightweight—no complicated HTML parsing—and the model still fetches whatever docs it needs.
Tests you can trust
Everything is exercised via shell scripts under tests/:
- `test_conversation.sh` verifies normal chat and memory.
- `test_tool_use.sh` covers command execution and approval.
- `test_simple_denial.sh`, `test_multi_denial.sh`, and `test_recursive_approval.sh` walk through denial/approval edge cases.
Each script now reads `AGENT_ENDPOINT` and `AGENT_MODEL` from the environment, so you can point the suite at any endpoint without editing code:
```bash
export AGENT_ENDPOINT="http://demo-llm.local:11434"
export AGENT_MODEL="gpt-oss:20b"
bash run_tests.sh
```
Logs drop into `outputs/*.log` so you can review exactly what happened.
Shipping it
To make this actually usable by other people:
- The built binary (`dist/agent`) is versioned in git.
- Releases (e.g., v1.0.0) include the binary as a downloadable asset.
- The README now has an onboarding checklist (Download → Configure → Run) plus embedded transcripts of real sessions.
- Everything's MIT-licensed.
The end result: download a single file, point it at your OpenAI-compatible server, and you instantly have an LLM shell sidekick.
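In shell form, that checklist boils down to something like this. Treat it as a sketch: the release asset URL follows GitHub's standard layout but is unverified, and I'm assuming the binary honors the same env vars as the test suite; the README has the authoritative steps.

```bash
# Hypothetical quick start; asset name/URL and env-var config are assumptions.
curl -sL --fail -o agent \
  "https://github.com/surya-prakash-susarla/agent_sh/releases/download/v1.0.0/agent"
chmod +x agent
export AGENT_ENDPOINT="http://localhost:11434"  # any OpenAI-compatible server
export AGENT_MODEL="gpt-oss:20b"
./agent
```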
What’s next?
Stuff I’d love to explore:
- Optional helper functions for common web APIs (DuckDuckGo, package search, etc.).
- Alternative tool types (`git`, `apt`, `systemctl`) routed to specific helper scripts.
- Maybe a persistent memory layer for long-running projects.
But even as-is, Agent.sh already solves my day-one VM setup pain. Drop the binary onto a fresh Arch box, set the endpoint, and let the LLM sweat the details while you stay in control.
Grab it here: github.com/surya-prakash-susarla/agent_sh