Design guide

Classification: Internal · Version: 1.0 · Audience: Product, design, engineering · Primary focus: UX, key bindings, wireframes

UX patterns and interaction design for the Voice Hotkey application. Infrastructure context is in the Executive overview; technical requirements are in the Software requirements.

1. Interaction model

Voice Hotkey uses a double-tap gesture on a modifier key to toggle recording. This avoids accidental activation from single key presses while remaining fast to trigger.

1.1 Default flow

Idle: Daemon listens for key events in the background.
Double-tap Right Ctrl: Recording begins. Terminal prints status.
Double-tap Right Ctrl again (or reach the 30 s timeout): Recording stops.
Transcription: Whisper processes the audio buffer.
Output: Transcribed text is copied to the system clipboard.
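
The flow above can be read as a small state machine. The following is a minimal sketch under stated assumptions, not the application's actual code: the names `State`, `RecordingFlow`, `on_double_tap`, `on_tick`, and `on_transcribed` are hypothetical, and an explicit transcribing state (during which further double-taps are ignored) is an assumption the guide does not specify.

```python
from enum import Enum, auto

MAX_RECORDING_S = 30.0  # timeout taken from the default flow


class State(Enum):
    IDLE = auto()
    RECORDING = auto()
    TRANSCRIBING = auto()  # assumed intermediate state, not named in the guide


class RecordingFlow:
    """Hypothetical state machine for idle -> recording -> transcribing -> idle."""

    def __init__(self):
        self.state = State.IDLE
        self.started_at = None

    def on_double_tap(self, now):
        """Toggle recording; `now` is a timestamp in seconds."""
        if self.state is State.IDLE:
            self.state = State.RECORDING
            self.started_at = now
            print("Recording...")  # MVP feedback is terminal-only
        elif self.state is State.RECORDING:
            self._stop()
        # Double-taps while transcribing are ignored (assumption).
        return self.state

    def on_tick(self, now):
        """Called periodically; stops recording once the timeout elapses."""
        if self.state is State.RECORDING and now - self.started_at >= MAX_RECORDING_S:
            self._stop()
        return self.state

    def _stop(self):
        self.state = State.TRANSCRIBING
        print("Transcribing...")

    def on_transcribed(self, text):
        """Return to idle once the transcription result is available."""
        self.state = State.IDLE
        return text
```

Passing timestamps in as arguments, rather than reading the clock inside the class, keeps the timeout behaviour easy to exercise without waiting 30 seconds.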

1.2 Feedback (MVP)

MVP feedback is terminal-only: status messages printed to stdout. Future versions may add desktop notifications, a system tray icon, or an overlay indicator.

2. Key binding design

Key	pynput name	Notes
Right Control	`Key.ctrl_r`	Default. Rarely used alone; low conflict risk.
Right Alt	`Key.alt_r`	May conflict with AltGr on some layouts.
Super / Windows	`Key.cmd`	Often captured by desktop environments.
Fn	Varies	Hardware-level; may not generate key events.
Copilot key	Varies	New hardware; keycode mapping may be needed.

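2.1 Double-tap timing

Default interval: 400 ms. Two taps of the trigger key count as a double-tap when the second follows the first within this interval. The value is comparable to typical OS double-click speed settings. Configurable via double_click_ms in config.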
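
One way to detect the gesture is to remember the time of the previous tap and compare it with the current one. The sketch below is illustrative only: the class name `DoubleTapDetector`, its `tap` method, and the injectable `clock` parameter are assumptions, and only the 400 ms default comes from this guide.

```python
import time


class DoubleTapDetector:
    """Reports True when two taps arrive within the configured interval."""

    def __init__(self, interval_ms=400, clock=time.monotonic):
        self.interval_s = interval_ms / 1000.0
        self._clock = clock      # injectable so the logic can be tested
        self._last_tap = None    # time of the previous unmatched tap

    def tap(self):
        """Register one tap of the trigger key; return True on a double-tap."""
        now = self._clock()
        if self._last_tap is not None and now - self._last_tap <= self.interval_s:
            # Reset so that a third quick tap starts a new gesture
            # instead of firing a second double-tap.
            self._last_tap = None
            return True
        self._last_tap = now
        return False
```

With pynput, `tap()` would plausibly be called from the listener's `on_release` callback when the key equals `Key.ctrl_r`. Counting releases rather than presses is a design choice, since holding a key down can generate repeated press events on some platforms.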

3. Whisper model guidance

Model	Parameters	Speed	Accuracy	RAM
tiny	39M	Fastest	Lower	~1 GB
base	74M	Fast	Good	~1 GB
small	244M	Moderate	Better	~2 GB
medium	769M	Slower	High	~5 GB
large	1550M	Slowest	Highest	~10 GB

Recommendation: start with base for a balance of speed and accuracy. Upgrade to small or medium if quality is insufficient.
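
If the model were chosen automatically, the approximate memory figures in the table could drive the choice. This is a hedged sketch: the function `largest_model_for` and the `MODELS` list are hypothetical, and the memory numbers are copied from the table above as rough guides, not measured requirements.

```python
# (model name, approximate memory in GB), ordered smallest to largest,
# copied from the table above.
MODELS = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large", 10),
]


def largest_model_for(memory_gb):
    """Return the largest model whose approximate footprint fits, or None."""
    fitting = [name for name, needed_gb in MODELS if needed_gb <= memory_gb]
    return fitting[-1] if fitting else None
```

For example, `largest_model_for(4)` returns `"small"`. Memory is only one constraint; the speed column may still argue for a smaller model than the one that fits.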

4. Output targets

Clipboard (MVP): Text is copied via pyperclip. User pastes with Ctrl+V.
Active window (future): Type text directly into the focused app via xdotool type (X11) or wtype (Wayland).
Buffer (future): Hold text in an internal buffer for review before committing.
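
For the future active-window target, the tool would likely depend on the display server. The sketch below only builds the command line and does not run it. The function name `typing_command` is hypothetical, and using the `XDG_SESSION_TYPE` environment variable to distinguish Wayland from X11 is an assumption about how the session would be detected.

```python
import os


def typing_command(text, session_type=None):
    """Build the argv that would type `text` into the focused window.

    session_type: "wayland" or "x11"; when omitted, falls back to the
    XDG_SESSION_TYPE environment variable (assumed detection method).
    """
    session = session_type or os.environ.get("XDG_SESSION_TYPE", "x11")
    if session == "wayland":
        return ["wtype", text]
    return ["xdotool", "type", text]
```

The resulting list could be passed to `subprocess.run`. Passing the text as a single argv element avoids shell quoting issues, though text that begins with a dash may still need extra handling so the tool does not treat it as an option.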

5. Wireframe reference

See the mockups on the Voice Hotkey site:

Settings: Key binding selector, model picker, output target.
Recording flow: Step-by-step flow from trigger to clipboard.
Future features: TTS, autocomplete, autocorrect placeholders.
