Design guide

Classification: Internal · Version: 1.0 · Audience: Product, design, engineering · Primary focus: UX, key bindings, wireframes

UX patterns and interaction design for the Voice Hotkey application. Infrastructure context is in the Executive overview; technical requirements are in the Software requirements.

1. Interaction model

Voice Hotkey uses a double-tap gesture on a modifier key to toggle recording. This avoids accidental activation from single key presses while remaining fast to trigger.

1.1 Default flow

  1. Idle: Daemon listens for key events in the background.
  2. Double-tap Right Ctrl: Recording begins. Terminal prints status.
  3. Double-tap Right Ctrl again (or reach the 30 s timeout): Recording stops.
  4. Transcription: Whisper processes the audio buffer.
  5. Output: Transcribed text is copied to the system clipboard.
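
The flow above can be read as a small state machine. The sketch below is one hypothetical way to model it, not the application's actual code: the names `State`, `Session`, `on_double_tap`, and `tick` are illustrative, the status strings are placeholders, and the transcription and clipboard steps are passed in as plain callables so the example does not depend on any particular library.

```python
import enum
from typing import Callable, Optional


class State(enum.Enum):
    IDLE = "idle"
    RECORDING = "recording"


class Session:
    """Hypothetical model of the default flow (names are illustrative)."""

    def __init__(self, transcribe: Callable[[], str],
                 copy: Callable[[str], None],
                 timeout_s: float = 30.0) -> None:
        self.state = State.IDLE
        self.transcribe = transcribe  # stands in for the Whisper step
        self.copy = copy              # stands in for the clipboard step
        self.timeout_s = timeout_s
        self.started_at: Optional[float] = None

    def on_double_tap(self, now: float) -> None:
        """Toggle recording: start when idle, stop when recording."""
        if self.state is State.IDLE:
            self.state = State.RECORDING
            self.started_at = now
            print("Recording...")  # placeholder status message
        else:
            self._stop()

    def tick(self, now: float) -> None:
        """Call periodically; stops the recording once the timeout elapses."""
        if self.state is State.RECORDING and now - self.started_at >= self.timeout_s:
            self._stop()

    def _stop(self) -> None:
        print("Transcribing...")  # placeholder status message
        text = self.transcribe()
        self.copy(text)
        self.state = State.IDLE
        self.started_at = None
        print("Copied to clipboard.")  # placeholder status message
```

Passing the current time in as `now` (for example from `time.monotonic()`) keeps the timeout logic easy to test without waiting 30 seconds.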

1.2 Feedback (MVP)

MVP feedback is terminal-only: status messages printed to stdout. Future versions may add desktop notifications, a system tray icon, or an overlay indicator.

2. Key binding design

Key              pynput name  Notes
Right Control    Key.ctrl_r   Default. Rarely used alone; low conflict risk.
Right Alt        Key.alt_r    May conflict with AltGr on some layouts.
Super / Windows  Key.cmd      Often captured by desktop environments.
Fn               Varies       Hardware-level; may not generate key events.
Copilot key      Varies       New hardware; keycode mapping may be needed.
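
As an illustration of how the bindings above might be exposed in configuration, the sketch below maps config-friendly names to the pynput attribute names from the table. The config names (`right_ctrl`, `right_alt`, `super`) and the helper `resolve_key` are assumptions made for this example; only the pynput attribute names come from the table. The key enum is passed in as an argument so the mapping can be checked without pynput installed.

```python
from typing import Any

# Hypothetical config names mapped to pynput Key attribute names.
# Fn and the Copilot key are left out because their key codes vary by hardware.
KEY_ATTRS = {
    "right_ctrl": "ctrl_r",
    "right_alt": "alt_r",
    "super": "cmd",
}


def resolve_key(name: str, key_enum: Any) -> Any:
    """Return the member of key_enum (normally pynput.keyboard.Key) for a config name."""
    try:
        attr = KEY_ATTRS[name]
    except KeyError:
        raise ValueError(f"unsupported key binding: {name!r}") from None
    return getattr(key_enum, attr)
```

With pynput available, `resolve_key("right_ctrl", pynput.keyboard.Key)` would return `Key.ctrl_r`, which can then be compared against the key passed to a listener callback.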

2.1 Double-click timing

Default interval: 400 ms. Two presses within this interval count as a double-tap. The default is in line with typical OS double-click speed settings and is configurable via double_click_ms in config.
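
A minimal sketch of the timing check, assuming key presses arrive as timestamps in seconds. The class name `DoubleTapDetector` and its `tap` method are hypothetical; only the `double_click_ms` setting and its 400 ms default come from this guide.

```python
from typing import Optional


class DoubleTapDetector:
    """Detects two presses within double_click_ms of each other (illustrative)."""

    def __init__(self, double_click_ms: int = 400) -> None:
        self.interval_s = double_click_ms / 1000.0
        self._last_tap: Optional[float] = None

    def tap(self, now: float) -> bool:
        """Register a press at time `now` (seconds); return True on a double-tap."""
        if self._last_tap is not None and now - self._last_tap <= self.interval_s:
            # Consume the pair so a third quick press starts a new sequence.
            self._last_tap = None
            return True
        self._last_tap = now
        return False
```

In a listener callback, `now` would typically come from `time.monotonic()`. A held key may emit repeated press events on some platforms, so a real listener would likely also need to track releases before counting a second press.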

3. Whisper model guidance

Model   Parameters  Speed     Accuracy  RAM
tiny    39M         Fastest   Lower     ~1 GB
base    74M         Fast      Good      ~1 GB
small   244M        Moderate  Better    ~2 GB
medium  769M        Slower    High      ~5 GB
large   1550M       Slowest   Highest   ~10 GB

Recommendation: start with base for a balance of speed and accuracy. Upgrade to small or medium if quality is insufficient.
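
The upgrade path in the recommendation can be sketched as a small helper that steps to the next larger model only if it fits in memory. The function `next_model` and the `available_ram_gb` parameter are hypothetical; the RAM figures are the approximate values from the table above.

```python
# Approximate RAM figures (GB) copied from the model table.
MODEL_RAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}
UPGRADE_ORDER = ["tiny", "base", "small", "medium", "large"]


def next_model(current: str, available_ram_gb: float) -> str:
    """Return the next larger model that fits in RAM, or `current` if none does."""
    idx = UPGRADE_ORDER.index(current)
    if idx + 1 < len(UPGRADE_ORDER):
        candidate = UPGRADE_ORDER[idx + 1]
        if MODEL_RAM_GB[candidate] <= available_ram_gb:
            return candidate
    return current
```

If the application uses the openai-whisper package (an assumption; this guide only says "Whisper"), the returned name could be passed to `whisper.load_model`.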

4. Output targets

5. Wireframe reference

See the mockups on the Voice Hotkey site.
