Executive overview

Classification: Internal · Version: 1.0 · Audience: Leadership, sponsors · Primary focus: Product & architecture alignment

This is the main program document for the Voice Hotkey initiative. It frames the product vision, architecture, and delivery model. Technical depth is in the Software requirements; UX detail is in the Design guide.

1. Executive summary

Voice Hotkey is a Linux desktop application that listens for a double-click of a configurable modifier key (default: Right Control), records voice, transcribes it locally using OpenAI Whisper, and copies the resulting text to the clipboard. The MVP is a headless daemon; no GUI is required.
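
To illustrate the trigger gesture, here is a minimal sketch of double-click detection on a single key. The class name `DoublePressDetector` and the 0.4-second window are assumptions made for this example; this document does not specify a timing threshold. The clock is injectable so the logic can be exercised without a real keyboard.

```python
import time
from typing import Callable, Optional


class DoublePressDetector:
    """Reports True when the trigger key is pressed twice within `window` seconds."""

    def __init__(self, window: float = 0.4,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.window = window  # assumed default; not taken from the program documents
        self._clock = clock   # injectable so tests do not need to sleep
        self._last_press: Optional[float] = None

    def on_press(self) -> bool:
        now = self._clock()
        is_double = (self._last_press is not None
                     and now - self._last_press <= self.window)
        # After a double press, reset so a third quick press starts a new pair.
        self._last_press = None if is_double else now
        return is_double
```

In a daemon built on pynput, the `on_press` callback of `pynput.keyboard.Listener` could call `detector.on_press()` whenever the key equals `pynput.keyboard.Key.ctrl_r`. A production version would probably also track key releases so that key auto-repeat while the key is held does not count as a second press.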

2. Strategic objectives

  1. Frictionless voice-to-text on Linux via a single keyboard gesture.
  2. Local-first transcription: no cloud dependency in the default configuration.
  3. Minimal footprint: background process, configurable via TOML file.
  4. Extensible: future support for TTS, autocomplete, autocorrect, and multi-platform.

3. Architecture (executive view)

Layer          Decision
Language       Python 3.9+: ecosystem maturity for audio, ML, and input capture.
Key capture    pynput: global keyboard listener (X11/XWayland).
Audio          sounddevice + numpy: real-time microphone capture.
Transcription  openai-whisper: local inference, no API key required.
Output         pyperclip, xclip / wl-copy: clipboard.
Config         TOML at ~/.config/voice-hotkey/config.toml.
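
As a sketch of the Config layer, the loader below reads the TOML file and merges it over built-in defaults. The key names (`trigger_key`, `double_click_window`, `model`) and their default values are hypothetical; the actual schema belongs in the Software requirements. Note that `tomllib` is in the standard library only from Python 3.11, so on Python 3.9 or 3.10 this sketch assumes the third-party `tomli` backport is installed.

```python
from pathlib import Path
from typing import Any, Dict

try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # assumed backport on Python 3.9 / 3.10

CONFIG_PATH = Path.home() / ".config" / "voice-hotkey" / "config.toml"

# Hypothetical keys and defaults, for illustration only.
DEFAULTS: Dict[str, Any] = {
    "trigger_key": "ctrl_r",
    "double_click_window": 0.4,
    "model": "base",
}


def load_config(path: Path = CONFIG_PATH) -> Dict[str, Any]:
    """Return the defaults, overridden by any values found in the TOML file."""
    config = dict(DEFAULTS)
    if path.exists():
        with path.open("rb") as fh:  # tomllib requires a binary file handle
            config.update(tomllib.load(fh))
    return config
```

Falling back to defaults when the file is missing keeps the daemon usable with no setup, which fits the minimal-footprint objective.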

4. Governance

5. Investment & risk

6. Roadmap

  1. MVP: Right Control double-click → record → Whisper → clipboard.
  2. Phase 2: Configurable keys, model selection, language detection.
  3. Phase 3: TTS, autocomplete, autocorrect, active-window injection.
  4. Phase 4: Multi-platform support (Windows, macOS).
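
The MVP flow in roadmap item 1 (record → Whisper → clipboard) can be sketched as one function that wires three stages together. The function name `run_capture_cycle` and the callable-based design are assumptions for illustration, not the project's actual structure. Passing the stages in as callables keeps the sketch runnable without a microphone or a Whisper model.

```python
from typing import Any, Callable


def run_capture_cycle(record: Callable[[], Any],
                      transcribe: Callable[[Any], str],
                      copy_to_clipboard: Callable[[str], None]) -> str:
    """Run one record -> transcribe -> clipboard cycle and return the text."""
    audio = record()                 # e.g. microphone samples
    text = transcribe(audio).strip() # local speech-to-text
    if text:                         # leave the clipboard alone on empty results
        copy_to_clipboard(text)
    return text


# Usage with stand-in stages; a real daemon would plug in sounddevice,
# openai-whisper, and pyperclip here.
clipboard = []
result = run_capture_cycle(
    record=lambda: b"fake-pcm-bytes",
    transcribe=lambda audio: "  hello world \n",
    copy_to_clipboard=clipboard.append,
)
```

In the real pipeline, `copy_to_clipboard` could simply be `pyperclip.copy`, and `transcribe` could wrap the `transcribe` method of a model returned by `whisper.load_model`.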