BLOG / December 15, 2025

Why I built a voice-controlled terminal

I don't memorize bash commands. I talk to LLMs in plain English. So I built a terminal that lets me do that with my voice.

@jamditis

I should start with a confession: I'm not a developer. Not in the traditional sense, anyway. I don't have a CS degree. I can't write a sorting algorithm from memory. If you'd asked me to grep something a year ago, I'd have stared at you.

What I am is a journalist and media researcher who got interested in AI tools, started using Claude Code in the terminal, and gradually realized I was spending most of my day talking to an LLM through a command line interface.

And I was doing it by typing. Slowly. With lots of typos.

The actual problem

Here's what my workflow looked like before AudioBash:

Open a terminal
Run claude to start Claude Code
Type out a long request in plain English
Wait for the response
Type another long request
Repeat for hours

I wasn't running ls -la or find . -name "*.tsx". I was writing things like "can you look at the settings component and figure out why the API key field isn't saving when I close the panel." Full sentences. Paragraphs, sometimes.

I can say that out loud in about four seconds. Typing it takes thirty.

There were voice dictation tools out there. macOS has built-in dictation. There are apps like Wispr and Talon. But none of them did what I wanted, which was: press a button, talk, and have the words appear directly in my terminal session. No clipboard. No copy-paste. No switching windows.

So I built one

AudioBash started on December 10, 2025, as a weekend project. The idea was simple: an Electron app with an embedded terminal, a microphone button, and a transcription service. You press a hotkey, talk, and your words get written into the terminal as if you'd typed them.

The first version did exactly that and nothing else. One terminal. One microphone button. Gemini for transcription. It worked.

Then I kept using it and kept finding things that didn't work well enough.

The agent mode gap

Here's the thing about using an LLM in the terminal: most of the time, you're not giving it bash commands. You're giving it instructions in English. "Look at this file." "Fix the bug in the login flow." "Run the tests and tell me what broke."

But sometimes I'm at a plain shell prompt, not inside Claude Code. There, the transcription service would hear me say "list the files in this directory" and transcribe it literally: list the files in this directory. That isn't a command. The shell doesn't know what to do with it.

What I needed was an agent mode — a way for the transcription to understand context. If Claude Code is running in the terminal, just send my words as-is, because Claude understands English. But if I'm at a raw shell prompt, I might actually need the transcription service to figure out that "list the files in this directory" means ls -la.

So I built two modes:

Raw mode — transcribes exactly what you say, sends it straight to the terminal. Use this when you're talking to Claude Code, Gemini CLI, or any LLM.
Agent mode — sends your speech plus context (current directory, recent terminal output) to an LLM, which decides whether to pass it through as English or translate it into an actual shell command.
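
A minimal sketch of how that split could look in TypeScript. This is an illustration, not AudioBash's actual source: the names Mode, TerminalContext, Routed, and routeTranscript are hypothetical, and so is the prompt wording. Raw mode returns the text to type as-is; agent mode only builds the request (speech plus current directory and recent output) that would be handed to an LLM.

```typescript
// Hypothetical sketch of raw vs. agent routing. All names are illustrative.
type Mode = "raw" | "agent";

interface TerminalContext {
  cwd: string;          // current working directory
  recentOutput: string; // last few lines shown in the terminal
}

type Routed =
  | { kind: "type"; text: string }      // write straight into the terminal
  | { kind: "askLlm"; prompt: string }; // let an LLM decide what to type

function routeTranscript(mode: Mode, transcript: string, ctx: TerminalContext): Routed {
  const text = transcript.trim();

  // Raw mode: pass the words through untouched.
  if (mode === "raw") {
    return { kind: "type", text };
  }

  // Agent mode: bundle the speech with terminal context for the LLM.
  // The prompt wording here is an assumption, not the app's real prompt.
  const prompt = [
    "Decide whether the spoken request below should be passed through as English",
    "or translated into a single shell command. Reply with only the text to type.",
    `Current directory: ${ctx.cwd}`,
    `Recent terminal output:\n${ctx.recentOutput}`,
    `Spoken request: ${text}`,
  ].join("\n");

  return { kind: "askLlm", prompt };
}
```

Returning a tagged result instead of calling the LLM directly keeps the routing decision separate from the network call, so the part that picks a mode can be checked without a microphone or an API key.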

Agent mode is for people like me. People who know what they want to do but don't have the command memorized. I can say "find all the TypeScript files that import React" and agent mode turns that into the right grep or find command. I don't have to know the flags. I don't have to look it up.

Most voice coding tools assume you know the commands and just want to say them faster. AudioBash assumes you might not know the commands at all, and that's fine. You're still a developer if your primary language is English and your compiler is an LLM.

Who this is for

If you're a senior backend engineer who thinks in bash and types 120 WPM, you probably don't need this. You're already fast.

AudioBash is for a different kind of developer. The kind that's growing fast right now:

People who build with AI assistants (Claude Code, Gemini CLI, Copilot) and spend most of their time writing natural language prompts
People who came to coding through a non-traditional path and don't have years of shell muscle memory
People who think faster than they type
People who want to keep their hands on the keyboard but use their voice for the long instructions

There's a name for this style of development now: vibe coding. Andrej Karpathy coined it. The idea is that you describe what you want in natural language and let AI handle the implementation. Whether or not you like the term, the practice is real and growing. A huge number of people are building software by talking to LLMs, and most of them are doing it by typing into a chat window or a terminal.

Talking is faster.

What happened next

Within the first week, AudioBash went from a single terminal with a mic button to something I was using all day:

Split view so I could have Claude Code in one pane and a running dev server in another
Custom vocabulary mapping (so "next js" transcribes as "Next.js" instead of "next JS")
CLI notification chimes that play a sound when a tool needs my input, so I can look away from the screen
A preview pane for seeing localhost output without switching to a browser
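
The custom vocabulary idea is the easiest of these to show in code. Here is a rough TypeScript sketch, assuming the mapping is a plain lookup table applied to the transcript after it comes back from the transcription service. The vocabulary table, applyVocabulary, and escapeRegExp are hypothetical names, and the real app may do this differently.

```typescript
// Hypothetical vocabulary table: spoken phrase -> preferred spelling.
const vocabulary: Record<string, string> = {
  "next js": "Next.js",
  "type script": "TypeScript",
};

// Escape characters that have special meaning in a regular expression.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function applyVocabulary(transcript: string, vocab: Record<string, string>): string {
  // Longest phrases first, so a longer phrase wins over a shorter one inside it.
  const entries = Object.entries(vocab).sort(([a], [b]) => b.length - a.length);

  let result = transcript;
  for (const [phrase, replacement] of entries) {
    // Match whole words, ignore case, and tolerate extra spaces between words.
    // Assumes each phrase starts and ends with a letter or digit.
    const body = escapeRegExp(phrase).replace(/ /g, "\\s+");
    const pattern = new RegExp(`\\b${body}\\b`, "gi");
    // A function replacement avoids "$" being read as a special pattern.
    result = result.replace(pattern, () => replacement);
  }
  return result;
}
```

Called as applyVocabulary("open the next JS config", vocabulary), this gives back "open the Next.js config".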

Every feature came from the same place: something annoyed me during my own workflow, so I fixed it. I'd describe the problem to Claude Code, talk through the solution, and ship a new version. Sometimes two or three versions in a day.

Five days in, I'd shipped seven releases.

The part where I admit I don't know what I'm doing

I want to be honest about something. I'm building this app with Claude Code. The AI writes most of the code. I describe what I want, review what it produces, test it, and ship it. I understand the architecture at a high level. I can read the code and follow what it's doing. But I'm not sitting here writing React components from scratch.

Some people would say that means I'm not a "real" developer. I don't care about that argument. The app works. It solves a problem I have. Other people might find it useful too. The code is open source — anyone can look at it, improve it, or tell me I'm doing it wrong.

What I do bring is a clear sense of what the tool should do, because I'm the target user. I'm not building for an imagined persona. I'm building for me, right now, every day.

What's next

I'm working on macOS support (I switch between a Windows desktop and a MacBook). After that, I want to get the pane system right — tmux-style splits where each terminal can have its own voice routing.

The code is at github.com/jamditis/audiobash. If you use Claude Code or Gemini CLI and you're tired of typing long prompts, give it a shot. Press Alt+S, talk, and see what happens.

The best dev tool is the one that matches how you already work. If you think in English and talk to LLMs all day, your input method should support that.