VoiceGenius

Voice-first workflows and agentic UX for mobile and web.

TL;DR: A voice-first product surface designed for speed, clarity, and real-world usage, not demos.

What It Does

Most voice interfaces are built as novelty layers on top of existing UIs. VoiceGenius takes the opposite approach: voice is the primary input, and the interface adapts around it. The system captures intent through natural speech, routes it through agentic workflows, and returns structured, actionable outputs - task creation, data entry, search, navigation - without requiring the user to tap through forms or menus.
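
A minimal sketch of what "structured, actionable output" means in practice; the type and field names here are illustrative placeholders, not the production schema:

```typescript
// Illustrative shape of a structured action produced from speech.
// Type and field names are placeholders, not the production schema.
type VoiceAction =
  | { kind: "create_task"; title: string; due?: string }
  | { kind: "data_entry"; field: string; value: string }
  | { kind: "search"; query: string }
  | { kind: "navigate"; destination: string };

// "Create an inspection task for unit 4B, due Friday" might resolve to:
const action: VoiceAction = {
  kind: "create_task",
  title: "Inspection: unit 4B",
  due: "Friday",
};
```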

Built for professionals working in the field - inspectors, adjusters, clinicians - where hands-free operation isn’t a convenience; it’s a constraint.

Role & Scope

End-to-end ownership: product architecture, voice pipeline design, mobile UX, and the agentic layer that bridges raw transcription to structured action. The core challenge was building a system that feels responsive and accurate enough that users trust it over manual input, which meant treating latency and error recovery as first-class design problems, not afterthoughts.

Technical Notes

The pipeline breaks into three stages: capture, interpretation, and execution. Transcription feeds into an intent classifier backed by structured schemas, which routes to domain-specific agents. Each agent validates its output against expected shapes before committing any action - no “best guess” writes to production data.
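
A sketch of the validate-before-commit step, assuming a zod-style schema library; the schema and field names are hypothetical, not the actual agent contracts:

```typescript
import { z } from "zod";

// Hypothetical schema for a task-creation agent's output.
const TaskSchema = z.object({
  title: z.string().min(1),
  assignee: z.string().optional(),
  due: z.string().optional(),
});

type Task = z.infer<typeof TaskSchema>;

// Validate the agent's draft against the expected shape before any write.
// A failed parse never reaches production data; it falls back to review.
function commitTask(draft: unknown): Task | null {
  const result = TaskSchema.safeParse(draft);
  if (!result.success) {
    console.warn("Agent output rejected:", result.error.issues);
    return null; // surface to the user for correction instead of guessing
  }
  return result.data; // safe to persist
}
```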

Key decisions:

  • Local-first buffering on mobile to handle spotty connectivity without losing input
  • Confidence thresholds with graceful fallback - low-confidence intents surface a confirmation step rather than silently failing or guessing (sketched after this list)
  • Streaming partial results so the user sees the system working, which dramatically reduces perceived latency even when round-trips are slow
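
To make the confidence-threshold decision concrete, a simplified sketch; the threshold value, names, and shapes are assumptions, not production values:

```typescript
// Illustrative confidence gate; the 0.85 cutoff and names are assumptions.
const CONFIDENCE_THRESHOLD = 0.85;

interface ClassifiedIntent {
  intent: string;
  confidence: number; // 0..1 from the intent classifier
}

type Route =
  | { action: "execute"; intent: string }
  | { action: "confirm"; intent: string; prompt: string };

function routeIntent(c: ClassifiedIntent): Route {
  if (c.confidence >= CONFIDENCE_THRESHOLD) {
    return { action: "execute", intent: c.intent };
  }
  // Low confidence: ask the user rather than silently failing or guessing.
  return {
    action: "confirm",
    intent: c.intent,
    prompt: `Did you mean "${c.intent}"?`,
  };
}
```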

The hard part was never the transcription. It was building an interpretation layer that handles the ambiguity of real speech - half-finished sentences, corrections mid-thought, domain jargon - without becoming brittle.
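
As one narrow illustration, consider a trailing self-correction like “schedule the follow-up for Tuesday, no, Wednesday.” This toy heuristic shows the flavor of the problem only; the real interpretation layer is schema-driven and handles far more than this single pattern:

```typescript
// Toy heuristic for trailing self-corrections ("... X, no, Y" -> "... Y").
// Illustrative only; real speech corrections are far messier than this.
function resolveCorrection(utterance: string): string {
  const m = utterance.match(/^(.*\S)[,.]?\s+(?:no|i mean|actually),?\s+(\S+)\s*$/i);
  if (!m) return utterance;
  const [, stem, corrected] = m;
  // Swap the last word of the stem for the corrected word.
  const words = stem.split(/\s+/);
  words[words.length - 1] = corrected;
  return words.join(" ");
}

// resolveCorrection("Schedule the follow-up for Tuesday, no, Wednesday")
// => "Schedule the follow-up for Wednesday"
```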

Outcomes

Reduced average task-entry time by more than half compared to the existing form-based flow. Field users adopted voice as their default input method within weeks, which was the real signal - adoption without mandate means the tool actually works. Error rates on structured data entry dropped, largely because the schema validation layer caught mistakes that manual entry never would.