We say 95% accuracy. That number is word accuracy, meaning 100% minus word error rate (WER), measured against manually transcribed ground truth across languages, accents, recording conditions, and speaker types.
In practice: in a 1,000-word recording, approximately 50 words will be wrong. Errors cluster around proper nouns, field-specific terminology, heavy regional accents, and poor audio quality.
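To make the arithmetic concrete, here is a minimal sketch of how word error rate is conventionally computed: the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The function names are illustrative, and the normalization (lowercasing, splitting on whitespace) is an assumption, not a description of how Ugle scores its own benchmark.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: text is normalized by lowercasing and whitespace splitting only.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from the output
                curr[j - 1] + 1,     # extra word in the output
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


def expected_wrong_words(word_count: int, accuracy: float) -> int:
    """Rough number of wrong words at a given word accuracy."""
    return round(word_count * (1 - accuracy))


# One misheard proper noun in a six-word sentence.
wer = word_error_rate(
    "councillor singh spoke on rent control",
    "councillor sing spoke on rent control",
)
print(f"WER: {wer:.3f}, accuracy: {1 - wer:.3f}")
print(expected_wrong_words(1000, 0.95))
```

The example also mirrors the pattern described above: the error lands on the proper noun, while the topic phrase survives intact.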
For search, this matters less than it sounds. You are searching for ‘rent control’, not ‘Councillor Singh’. The model is unlikely to miss ‘rent control’ in a clean recording.
For verbatim quotation, always verify against the source. Ugle makes this fast: click any result and the audio plays from that timestamp.
| Condition | Accuracy |
|---|---|
| Studio-quality, single speaker | 97–99% |
| Standard broadcast audio | 95–97% |
| Video conference, single speaker | 94–96% |
| Phone recording | 85–88% |
| Multiple overlapping speakers | 80–88% |
| Consistent background noise (> 60 dB) | 75–85% |
We benchmark every model update against the same test set. If accuracy drops, we do not ship. On that test set, the 95% figure is a floor, not an aspiration.
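The release rule can be sketched as a simple gate. This is an illustration under stated assumptions, not Ugle's actual pipeline: `BenchmarkResult`, `should_ship`, and the aggregate error counts are hypothetical names and inputs, and only the 95% floor and the "do not ship if accuracy drops" rule come from the text.

```python
from dataclasses import dataclass

ACCURACY_FLOOR = 0.95  # the 95% figure quoted in the text


@dataclass(frozen=True)
class BenchmarkResult:
    """Aggregate counts for one model on the fixed test set (hypothetical shape)."""
    reference_words: int
    word_errors: int

    @property
    def accuracy(self) -> float:
        return 1 - self.word_errors / self.reference_words


def should_ship(candidate: BenchmarkResult,
                baseline: BenchmarkResult,
                floor: float = ACCURACY_FLOOR) -> bool:
    """Ship only if the candidate does not regress and stays at or above the floor."""
    no_regression = candidate.accuracy >= baseline.accuracy
    above_floor = candidate.accuracy >= floor
    return no_regression and above_floor


baseline = BenchmarkResult(reference_words=1000, word_errors=40)
improved = BenchmarkResult(reference_words=1000, word_errors=38)
regressed = BenchmarkResult(reference_words=1000, word_errors=45)

print(should_ship(improved, baseline))
print(should_ship(regressed, baseline))
```

Comparing against both the previous model and a fixed floor keeps a slow slide in quality from passing as a series of small, individually acceptable regressions.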