VAD, Streaming & More
Multi-task queue, Smart Bubble, Voice Activity Detection (VAD), streaming modes, response profiles, and visual effects.
Multi-Task Queue
What It Is
Multi-Task Queue lets you process multiple voice inputs at once. Instead of waiting for one transcription or optimization to finish before starting another, you can queue several tasks and let them run side by side.
How It Works
- Start a voice recording -- it begins processing.
- While it is processing, start another recording.
- Both tasks run in parallel, up to the Max Active Tasks limit.
- Each task appears as a "pill" indicator.
- When a task completes, its result is copied and you are notified (if task notifications are enabled).
Configuration
Settings > Automation > Multi-Task Queue
| Setting | Range | Default | Description |
|---|---|---|---|
| Max Active Tasks | 2-10 | 2 | Maximum tasks running simultaneously |
| Max Local GPU Tasks | 1-4 | 1 | Maximum local Ollama tasks running at once (GPU-bound) |
| Visible Stacked Pills | 1-5 | 1 | How many task pills to show in the UI |
| Task Notifications | On/Off | ON | Show notifications for task events |
| Notify on Success | On/Off | ON | Notification when a task completes |
| Notify on Errors | On/Off | ON | Notification when a task fails |
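The two concurrency limits act as nested caps: every task needs one of the Max Active Tasks slots, and local GPU-bound tasks additionally need one of the Max Local GPU Tasks slots. The following is a minimal sketch of that behavior using `asyncio.Semaphore`; the `Task` and `TaskQueue` names, the `uses_local_gpu` flag, and the simulated durations are hypothetical and do not describe Vocoding's internal implementation.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Task:
    name: str
    uses_local_gpu: bool  # hypothetical flag: True for local GPU-bound work (e.g. Ollama)
    duration_s: float     # simulated processing time


class TaskQueue:
    """Hypothetical scheduler honoring Max Active Tasks and Max Local GPU Tasks."""

    def __init__(self, max_active: int = 2, max_local_gpu: int = 1) -> None:
        if not 2 <= max_active <= 10:
            raise ValueError("max_active must be between 2 and 10")
        if not 1 <= max_local_gpu <= 4:
            raise ValueError("max_local_gpu must be between 1 and 4")
        self._active = asyncio.Semaphore(max_active)
        self._gpu = asyncio.Semaphore(max_local_gpu)
        self.running = 0      # tasks currently processing
        self.running_gpu = 0  # GPU-bound tasks currently processing
        self.peak = 0
        self.peak_gpu = 0

    async def _work(self, task: Task) -> str:
        self.running += 1
        self.running_gpu += int(task.uses_local_gpu)
        self.peak = max(self.peak, self.running)
        self.peak_gpu = max(self.peak_gpu, self.running_gpu)
        try:
            await asyncio.sleep(task.duration_s)  # stand-in for transcription or LLM work
            return f"{task.name}: done"
        finally:
            self.running -= 1
            self.running_gpu -= int(task.uses_local_gpu)

    async def _run(self, task: Task) -> str:
        # Take the GPU slot first so a task waiting for the GPU does not
        # hold one of the general active slots in the meantime.
        if task.uses_local_gpu:
            async with self._gpu:
                async with self._active:
                    return await self._work(task)
        async with self._active:
            return await self._work(task)

    async def run_all(self, tasks: List[Task]) -> List[str]:
        # gather() returns results in submission order.
        return list(await asyncio.gather(*(self._run(t) for t in tasks)))


async def main() -> Tuple[TaskQueue, List[str]]:
    # Create the queue inside the running event loop.
    queue = TaskQueue(max_active=2, max_local_gpu=1)
    tasks = [
        Task("local-1", True, 0.05),
        Task("local-2", True, 0.05),
        Task("cloud-1", False, 0.05),
        Task("cloud-2", False, 0.05),
    ]
    results = await queue.run_all(tasks)
    return queue, results


queue, results = asyncio.run(main())
print(results, "peak:", queue.peak, "peak GPU:", queue.peak_gpu)
```

With the default settings, at most two tasks process at once and never more than one local GPU task, which is why a GPU-heavy setup benefits from leaving Max Local GPU Tasks at 1 while cloud tasks fill the remaining active slots.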
Tips
- Keep Max Local GPU Tasks at 1 if using large Whisper models (they are GPU-intensive).
- Cloud LLM tasks can safely run with higher parallelism because they don't use local GPU resources.
- Increase Visible Stacked Pills if you frequently have multiple tasks running.
Smart Bubble
What It Is
The Smart Bubble is a small floating pill that appears on your screen when a long-running task moves out of the main Vocoding window. It lets you monitor task progress without keeping the full window open.
How It Works
- You start a voice task (transcribe or optimize).
- If processing takes longer than the Transition Threshold, the task moves to a floating pill.
- The pill shows task status (processing, complete, error).
- When complete, the pill either auto-pastes the result or waits for you to click it, depending on the Paste Behavior setting.
- The pill auto-dismisses after the Auto-Dismiss Timeout (unless the timeout is set to "Never").
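The lifecycle above can be sketched as a small pure function that decides where a task is shown. This is an illustration, not Vocoding's code: the `bubble_phase` name and its parameters are hypothetical, and it assumes the Auto-Dismiss Timeout is counted from the moment the task completes.

```python
from typing import Optional


def bubble_phase(
    processing_s: float,
    done: bool,
    since_done_s: float = 0.0,
    threshold_s: float = 8.0,
    dismiss_after_s: Optional[float] = 60.0,
) -> str:
    """Return where a task is shown: "window", "bubble", or "dismissed".

    processing_s: how long the task has been processing (total time if done).
    since_done_s: seconds since the task completed (ignored while running).
    dismiss_after_s: Auto-Dismiss Timeout; None models the "Never" option.
    """
    if processing_s <= threshold_s:
        # Still within (or finished within) the Transition Threshold:
        # the task never leaves the main window.
        return "window"
    if done and dismiss_after_s is not None and since_done_s >= dismiss_after_s:
        return "dismissed"
    return "bubble"


print(bubble_phase(3, done=False))                   # short task stays in the window
print(bubble_phase(12, done=False))                  # long task has moved to the bubble
print(bubble_phase(12, done=True, since_done_s=61))  # bubble timed out after completion
```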
Configuration
Settings > Automation > Smart Bubble
| Setting | Range | Default | Description |
|---|---|---|---|
| Transition Threshold | 3-30 s | 8 s | Processing time before the task moves to the bubble |
| Pill Corner | 4 options | top-right | Screen corner where the pill appears |
| Auto-Dismiss Timeout | 10-300 s or "Never" | 60 s | How long the bubble stays on screen before disappearing |
| Paste Behavior | 3 options | smart | Whether and when the result is auto-pasted |
| Tray Recent Results | 3-20 | 5 | Number of recent results shown in the system tray menu |
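If you script or test against these settings, the ranges in the table translate directly into validation rules. A minimal sketch, assuming a hypothetical `SmartBubbleSettings` model; the field names are illustrative and are not Vocoding's actual configuration keys.

```python
from dataclasses import dataclass
from typing import Optional

CORNERS = ("top-right", "top-left", "bottom-right", "bottom-left")
PASTE_BEHAVIORS = ("always", "smart", "never")


@dataclass(frozen=True)
class SmartBubbleSettings:
    """Hypothetical in-code model of Settings > Automation > Smart Bubble."""

    transition_threshold_s: int = 8
    pill_corner: str = "top-right"
    auto_dismiss_s: Optional[int] = 60  # None stands for "Never"
    paste_behavior: str = "smart"
    tray_recent_results: int = 5

    def __post_init__(self) -> None:
        # Reject values outside the ranges listed in the settings table.
        if not 3 <= self.transition_threshold_s <= 30:
            raise ValueError("Transition Threshold must be 3-30 seconds")
        if self.pill_corner not in CORNERS:
            raise ValueError(f"unknown Pill Corner: {self.pill_corner!r}")
        if self.auto_dismiss_s is not None and not 10 <= self.auto_dismiss_s <= 300:
            raise ValueError("Auto-Dismiss Timeout must be 10-300 seconds or None")
        if self.paste_behavior not in PASTE_BEHAVIORS:
            raise ValueError(f"unknown Paste Behavior: {self.paste_behavior!r}")
        if not 3 <= self.tray_recent_results <= 20:
            raise ValueError("Tray Recent Results must be 3-20")


defaults = SmartBubbleSettings()
never_dismiss = SmartBubbleSettings(auto_dismiss_s=None, pill_corner="bottom-left")
```

Making the model frozen means an invalid combination can never exist after construction; changing a setting produces a new, re-validated object.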
Paste Behavior Options
| Option | Behavior |
|---|---|
| Always | Auto-paste result at cursor immediately |
| Smart | Auto-paste only if the cursor is in a text field (macOS default) |
| Never | Only copy to clipboard, never auto-paste |
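The three options reduce to a simple decision. In this sketch, `should_auto_paste` and the `cursor_in_text_field` input are hypothetical: detecting whether the focused element is a text field is platform-specific and is not modeled here.

```python
from enum import Enum


class PasteBehavior(Enum):
    ALWAYS = "always"
    SMART = "smart"
    NEVER = "never"


def should_auto_paste(behavior: PasteBehavior, cursor_in_text_field: bool) -> bool:
    """Return True if a finished result should be pasted at the cursor."""
    if behavior is PasteBehavior.ALWAYS:
        return True
    if behavior is PasteBehavior.SMART:
        return cursor_in_text_field
    return False  # NEVER: clipboard only


for behavior in PasteBehavior:
    print(behavior.value, should_auto_paste(behavior, True), should_auto_paste(behavior, False))
```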
Corner Options
| Position | Description |
|---|---|
| top-right | Top-right corner of screen (default) |
| top-left | Top-left corner of screen |
| bottom-right | Bottom-right corner of screen |
| bottom-left | Bottom-left corner of screen |
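Placing the pill in a corner is a matter of offsetting from the matching screen edges. A minimal sketch, assuming a hypothetical `pill_origin` helper, pixel coordinates with (0, 0) at the top-left, and an illustrative 16 px margin; none of these values come from Vocoding.

```python
from typing import Tuple

CORNERS = ("top-right", "top-left", "bottom-right", "bottom-left")


def pill_origin(
    corner: str,
    screen_w: int,
    screen_h: int,
    pill_w: int,
    pill_h: int,
    margin: int = 16,  # hypothetical inset from the screen edge, in pixels
) -> Tuple[int, int]:
    """Top-left pixel coordinate of the pill, with (0, 0) at the screen's top-left."""
    if corner not in CORNERS:
        raise ValueError(f"unknown corner: {corner!r}")
    x = margin if corner.endswith("left") else screen_w - pill_w - margin
    y = margin if corner.startswith("top") else screen_h - pill_h - margin
    return x, y


print(pill_origin("top-right", 1920, 1080, 240, 48))    # (1664, 16)
print(pill_origin("bottom-left", 1920, 1080, 240, 48))  # (16, 1016)
```

This uses the full screen bounds for simplicity; a real window manager integration would typically use the usable work area so the pill does not overlap the macOS menu bar or the Windows taskbar.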
Voice Activity Detection (VAD)
What It Does
Voice Activity Detection automatically detects when you start and stop speaking. This makes the recording experience more natural -- you don't have to manually click "stop" after finishing your thought.
How It Works
- Vocoding continuously monitors the audio input.
- When the input level rises above the sensitivity threshold, VAD treats it as speech and recording is active.
- When silence is detected, VAD either:
  - keeps listening (standard mode, Auto-Stop off), or
  - stops recording after the Auto-Stop Delay (auto-stop mode, Auto-Stop on).
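Vocoding does not document the detector it uses. As an illustration only, a simple energy-gated VAD compares each audio frame's RMS level with a threshold and, in auto-stop mode, ends the recording once silence after speech has lasted the Auto-Stop Delay. The `rms` and `detect_speech` names, the 30 ms frame size, and the synthetic audio are assumptions; production VADs usually rely on trained models rather than a plain energy gate.

```python
import math
from typing import List, Optional, Sequence, Tuple


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_speech(
    frames: Sequence[Sequence[float]],
    threshold: float,
    frame_ms: int = 30,
    auto_stop: bool = False,
    auto_stop_delay_ms: int = 1500,
) -> Tuple[List[bool], Optional[int]]:
    """Label frames as speech (True) or silence (False).

    Returns the labels and, if auto-stop fired, the index of the frame at
    which recording stopped (otherwise None).
    """
    # Consecutive silent frames needed to cover the delay (ceiling division).
    frames_needed = -(-auto_stop_delay_ms // frame_ms)
    labels: List[bool] = []
    heard_speech = False
    silent_frames = 0
    for i, frame in enumerate(frames):
        is_speech = rms(frame) > threshold
        labels.append(is_speech)
        if is_speech:
            heard_speech = True
            silent_frames = 0
        else:
            silent_frames += 1
            # Only auto-stop after the user has actually spoken.
            if auto_stop and heard_speech and silent_frames >= frames_needed:
                return labels, i
    return labels, None


# Synthetic 16 kHz audio: 480 samples = one 30 ms frame.
silence = [0.0] * 480
speech = [0.5] * 480
frames = [silence] * 5 + [speech] * 10 + [silence] * 60

labels, stopped_at = detect_speech(frames, threshold=0.1, auto_stop=True)
print(sum(labels), stopped_at)  # 10 speech frames; stops after 50 silent frames (1.5 s)
```

Counting silent frames as an integer, rather than summing fractional seconds, avoids floating-point drift when comparing against the delay.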
Configuration
Settings > Transcription > VAD Settings
| Setting | Range | Default | Description |
|---|---|---|---|
| Enable VAD | On/Off | ON | Turn voice detection on or off |
| Sensitivity | 0-100% | 50% | How readily input is treated as voice |
| Auto-Stop | On/Off | OFF | Stop recording automatically on silence |
| Auto-Stop Delay | 0.5-5 s | 1.5 s | Wait time after the last speech before stopping |
Sensitivity Tuning
| Level | Best For |
|---|---|
| Low (0-30%) | Noisy environments -- only picks up clear, loud speech |
| Medium (30-70%) | Most environments -- good balance |
| High (70-100%) | Quiet environments -- picks up soft speech |
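Sensitivity works inversely to the detection threshold: a higher percentage lets quieter sound count as voice. One way to express that is a linear interpolation in decibels between a "loud" and a "quiet" threshold. The `sensitivity_to_threshold` name and the -20 dBFS and -60 dBFS endpoints are hypothetical placeholders, not Vocoding's calibration.

```python
def sensitivity_to_threshold(
    sensitivity_pct: float,
    loud_dbfs: float = -20.0,   # hypothetical threshold at 0% (only loud speech passes)
    quiet_dbfs: float = -60.0,  # hypothetical threshold at 100% (soft speech passes)
) -> float:
    """Map a 0-100% sensitivity to a linear RMS threshold (1.0 = full scale)."""
    if not 0 <= sensitivity_pct <= 100:
        raise ValueError("sensitivity must be between 0 and 100")
    db = loud_dbfs + (quiet_dbfs - loud_dbfs) * sensitivity_pct / 100
    return 10 ** (db / 20)


for pct in (0, 30, 50, 70, 100):
    print(pct, round(sensitivity_to_threshold(pct), 4))
```

With these placeholder endpoints, the default 50% maps to -40 dBFS, an RMS threshold of 0.01; moving toward the Low band raises the threshold so that background noise is ignored.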
Auto-Stop Tips
- Short delay (0.5-1s): For quick commands and short phrases.
- Medium delay (1.5-2.5s): For normal speaking with natural pauses.
- Long delay (3-5s): For thoughtful speaking with extended pauses between sentences.
Streaming Modes
Streaming modes determine how transcription results are delivered to you in real time while you record.
Settings > Transcription > Streaming Mode
| Mode | Behavior | CPU Usage | Best For |
|---|---|---|---|
| Quiet | Minimal streaming -- shows final result only | Lowest | Battery saving, background use |
| Balanced | VAD-gated streaming -- shows partial results while you speak | Medium | Daily use (recommended) |
| Max Performance | Continuous streaming -- real-time partial transcription | Highest | Live demos, when you want to see every word instantly |
Choosing the Right Mode
- Quiet: Choose this if you don't need to see partial results and want to minimize CPU usage.
- Balanced: The default -- shows partial transcription only when you are actively speaking.
- Max Performance: Shows every word as it is recognized, but uses more CPU. Great for presentations or when you want immediate visual feedback.
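The difference between the modes comes down to when a partial transcript is shown. A minimal sketch of that policy; the `StreamingMode` enum and `should_emit_partial` function are hypothetical names, and the final transcript is assumed to be shown in every mode.

```python
from enum import Enum


class StreamingMode(Enum):
    QUIET = "quiet"
    BALANCED = "balanced"
    MAX_PERFORMANCE = "max_performance"


def should_emit_partial(mode: StreamingMode, voice_active: bool) -> bool:
    """Decide whether to show a partial transcript for the current audio chunk.

    Only partials are covered here; the final result is shown in every mode.
    """
    if mode is StreamingMode.QUIET:
        return False         # final result only
    if mode is StreamingMode.BALANCED:
        return voice_active  # VAD-gated: partials only while speaking
    return True              # continuous partials


for mode in StreamingMode:
    print(mode.value, should_emit_partial(mode, True), should_emit_partial(mode, False))
```

Gating on `voice_active` is what keeps Balanced cheaper than Max Performance: no partial decoding work is done during pauses.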
Response Profiles
Response Profiles control how detailed the AI's responses are. There are two profiles, Fast and Deep, which you switch between from the header bar.
Fast Profile
| Attribute | Value |
|---|---|
| Temperature | Lower than Deep |
| Style | Concise, focused |
| Best for | Quick answers, simple code generation, rapid iteration |
Deep Profile
| Attribute | Value |
|---|---|
| Temperature | Higher than Fast |
| Style | Thorough, detailed |
| Best for | Architecture decisions, complex debugging, comprehensive analysis |
When to Use Each
| Scenario | Recommended |
|---|---|
| Quick code fix | Fast |
| New feature design | Deep |
| Renaming a variable | Fast |
| Security review | Deep |
| Simple question | Fast |
| Learning a new concept | Deep |
| Rapid prototyping | Fast |
| Code review | Deep |
Click the response profile button in the header bar (between the profile selector and the microphone area) to toggle between Fast and Deep.
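The profiles can be pictured as two presets of generation parameters plus a toggle. This is a sketch under stated assumptions: the documentation only says Fast uses a lower temperature than Deep, so the 0.2 and 0.7 values, the `style_hint` field, and the `Profile`, `GenerationParams`, and `toggle` names are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Profile(Enum):
    FAST = "fast"
    DEEP = "deep"


@dataclass(frozen=True)
class GenerationParams:
    temperature: float
    style_hint: str


# Placeholder values: only the ordering (Fast lower, Deep higher) is documented.
PROFILE_PARAMS = {
    Profile.FAST: GenerationParams(temperature=0.2, style_hint="Be concise and focused."),
    Profile.DEEP: GenerationParams(temperature=0.7, style_hint="Be thorough and detailed."),
}


def toggle(profile: Profile) -> Profile:
    """Mirror the header-bar button: switch between Fast and Deep."""
    return Profile.DEEP if profile is Profile.FAST else Profile.FAST


current = Profile.FAST
current = toggle(current)
print(current.value, PROFILE_PARAMS[current])
```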
Window Chrome & Visual Effects
State-Based Glow
The Vocoding window border shows your current state through a colored glow effect:
| State | Glow Color | Meaning |
|---|---|---|
| Idle | Subtle gray | Ready for input |
| Recording | Red/orange pulse | Actively recording your voice |
| Transcribing | Blue | Converting voice to text |
| Optimizing | Purple | LLM is processing your prompt |
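The glow is a direct mapping from window state to border style. A minimal sketch; the `WindowState` enum, the `glow_for` helper, and the hex colors are hypothetical approximations of the colors named in the State-Based Glow table.

```python
from enum import Enum
from typing import Tuple


class WindowState(Enum):
    IDLE = "idle"
    RECORDING = "recording"
    TRANSCRIBING = "transcribing"
    OPTIMIZING = "optimizing"


# Hypothetical hex colors approximating the State-Based Glow table;
# the boolean marks whether the glow pulses.
GLOW = {
    WindowState.IDLE: ("#9e9e9e", False),          # subtle gray
    WindowState.RECORDING: ("#ff5722", True),      # red/orange pulse
    WindowState.TRANSCRIBING: ("#2196f3", False),  # blue
    WindowState.OPTIMIZING: ("#9c27b0", False),    # purple
}


def glow_for(state: WindowState) -> Tuple[str, bool]:
    """Return (color, pulses) for the window border in the given state."""
    return GLOW[state]


for state in WindowState:
    print(state.value, glow_for(state))
```

Keeping the mapping in one table means adding a new state only requires one new entry, and a missing entry fails loudly with a `KeyError`.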
Overlay Banners
Context-sensitive banners appear at the top of the window:
| Banner | When It Appears |
|---|---|
| Trial banner | During the trial period; shows days or credits remaining and an upgrade prompt |
| Error banner | When an error occurs -- red, with dismiss button |
| Clipboard permission | When clipboard access is needed |
| Model change | When switching models; shows the previous and new model, with an undo option |
| Shortcut busy | When the global shortcut can't be registered |
Title Bar
- Standard drag region for moving the window.
- Native window controls (minimize, maximize, close).
- Styled to match the operating system (macOS or Windows native look).