The first version of what became Ugle uploaded files to a server. We built it that way because it was easier — existing transcription APIs, no on-device model to maintain, faster to ship.
We showed it to an investigative journalist we respect. She looked at the interface and said, without hesitating:
“Where does the audio go?”
We told her. She closed the laptop.
That was the answer. Not the architecture decision — the moment before it. The fact that the first question a professional journalist asked about a tool for her source recordings was about data location told us everything about what kind of tool this needed to be.
The cloud version worked. Fast, accurate, reasonably priced. It also required trust we had no right to ask for — trust that the server was secure, that data was not being used to train models, that a legal demand would not reveal a source. Not unreasonable conditions for a transcription tool. Unreasonable conditions for a journalism tool.
On-device transcription was technically harder. Larger models. Tighter engineering constraints. We wrote a custom inference pipeline, managed model distribution, and solved for hardware variation across consumer machines.
Worth it. Not because local-first is a feature. Because for the workflows Ugle is built for, it is the only architecture that respects the user's actual situation.
Your archive never leaves your machine.