VAD, Streaming & More

This page covers the Multi-Task Queue, Smart Bubble, Voice Activity Detection (VAD), streaming modes, response profiles, and window visual effects.

Multi-Task Queue

What It Is

The Multi-Task Queue lets you process several voice inputs at the same time. Instead of waiting for one transcription or optimization to finish before starting another, you can start a new recording while earlier ones are still processing.

How It Works

Start a voice recording -- it begins processing.
While it is processing, start another recording.
Both tasks run in parallel.
Each task shows as a "pill" indicator.
When a task completes, you are notified and its result is copied.

Configuration

Settings > Automation > Multi-Task Queue

Setting	Range	Default	Description
Max Active Tasks	2-10	2	Maximum tasks running simultaneously
Max Local GPU Tasks	1-4	1	Maximum local Ollama tasks running simultaneously (GPU-bound)
Visible Stacked Pills	1-5	1	How many task pills to show in the UI
Task Notifications	On/Off	ON	Show notifications for task events
Notify on Success	On/Off	ON	Notification when a task completes
Notify on Errors	On/Off	ON	Notification when a task fails
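
The ranges and defaults in the table can be captured in a small settings object. The following is a minimal sketch, not Vocoding's actual configuration format: the class name `QueueSettings`, its field names, and the choice to clamp out-of-range values (rather than reject them) are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace


def clamp(value: int, low: int, high: int) -> int:
    """Limit value to the inclusive range [low, high]."""
    return max(low, min(high, value))


@dataclass(frozen=True)
class QueueSettings:
    # Hypothetical field names; defaults and ranges come from the table above.
    max_active_tasks: int = 2        # range 2-10
    max_local_gpu_tasks: int = 1     # range 1-4
    visible_stacked_pills: int = 1   # range 1-5
    task_notifications: bool = True
    notify_on_success: bool = True
    notify_on_errors: bool = True

    def normalized(self) -> "QueueSettings":
        """Return a copy with every numeric setting forced into its range."""
        return replace(
            self,
            max_active_tasks=clamp(self.max_active_tasks, 2, 10),
            max_local_gpu_tasks=clamp(self.max_local_gpu_tasks, 1, 4),
            visible_stacked_pills=clamp(self.visible_stacked_pills, 1, 5),
        )


settings = QueueSettings(max_active_tasks=50, max_local_gpu_tasks=0).normalized()
print(settings.max_active_tasks, settings.max_local_gpu_tasks)
```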

Tips

Keep Max Local GPU Tasks at 1 if you use large Whisper models, which are GPU-intensive.
Cloud LLM tasks can run with higher parallelism because they do not use local resources.
Increase Visible Stacked Pills if you frequently have multiple tasks running.
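
To illustrate how the two limits interact, here is a rough sketch using two `asyncio` semaphores: one for all active tasks and one for local GPU tasks only. It is an assumption about how such a queue could be built, not a description of Vocoding's internals; `run_task`, `Gauge`, and the simulated sleep are hypothetical stand-ins.

```python
import asyncio

MAX_ACTIVE_TASKS = 2     # default from the configuration table
MAX_LOCAL_GPU_TASKS = 1  # default from the configuration table


class Gauge:
    """Tracks how many tasks run at once and the highest count seen."""

    def __init__(self) -> None:
        self.current = 0
        self.peak = 0

    def enter(self) -> None:
        self.current += 1
        self.peak = max(self.peak, self.current)

    def leave(self) -> None:
        self.current -= 1


async def run_task(is_local: bool, seconds: float,
                   slots: asyncio.Semaphore, gpu_slots: asyncio.Semaphore,
                   running: Gauge, running_local: Gauge) -> None:
    # Local tasks first wait for a GPU slot; cloud tasks skip this step.
    if is_local:
        await gpu_slots.acquire()
    try:
        async with slots:  # every task also needs one of the active slots
            running.enter()
            if is_local:
                running_local.enter()
            try:
                await asyncio.sleep(seconds)  # simulated processing
            finally:
                running.leave()
                if is_local:
                    running_local.leave()
    finally:
        if is_local:
            gpu_slots.release()


async def main() -> tuple[int, int]:
    slots = asyncio.Semaphore(MAX_ACTIVE_TASKS)
    gpu_slots = asyncio.Semaphore(MAX_LOCAL_GPU_TASKS)
    running, running_local = Gauge(), Gauge()
    # Two local tasks and two cloud tasks, all submitted at once.
    jobs = [(True, 0.05), (True, 0.05), (False, 0.05), (False, 0.05)]
    await asyncio.gather(
        *(run_task(local, secs, slots, gpu_slots, running, running_local)
          for local, secs in jobs)
    )
    return running.peak, running_local.peak


peak_active, peak_local = asyncio.run(main())
print(peak_active, peak_local)
```

With these limits, no more than two tasks run at any moment and no more than one of them is local, however many recordings are submitted.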

Smart Bubble

What It Is

The Smart Bubble is a small floating pill that appears on your screen when a long-running task moves out of the main Vocoding window. It lets you monitor task progress without keeping the full window open.

How It Works

You start a voice task (transcribe or optimize).
If processing takes longer than the Transition Threshold, the task moves to a floating pill.
The pill shows task status (processing, complete, error).
When complete, the pill can auto-paste the result or wait for you to click it.
The pill auto-dismisses after the configured timeout.
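
The two time-based rules in these steps can be written as small predicates. This is only a sketch under assumptions: the function names are hypothetical, the default values are the ones listed under Configuration, and "Never" is modeled as `None`.

```python
from typing import Optional


def task_location(elapsed_s: float, threshold_s: float = 8.0) -> str:
    """A task stays in the window until it runs longer than the threshold."""
    return "bubble" if elapsed_s > threshold_s else "window"


def pill_visible(seconds_since_complete: float,
                 timeout_s: Optional[float] = 60.0) -> bool:
    """The pill stays until the timeout passes; None stands for "Never"."""
    if timeout_s is None:
        return True
    return seconds_since_complete < timeout_s


print(task_location(5.0), task_location(12.0))
print(pill_visible(30.0), pill_visible(90.0), pill_visible(90.0, None))
```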

Configuration

Settings > Automation > Smart Bubble

Setting	Range	Default	Description
Transition Threshold	3-30 seconds	8s	Time before task moves to bubble
Pill Corner	4 options	top-right	Which screen corner
Auto-Dismiss Timeout	10-300s or "Never"	60s	When bubble disappears
Paste Behavior	3 options	smart	How auto-paste works
Tray Recent Results	3-20	5	Results shown in system tray menu

Paste Behavior Options

Option	Behavior
Always	Auto-paste result at cursor immediately
Smart	Auto-paste only if the cursor is in a text field (macOS default)
Never	Only copy to clipboard, never auto-paste
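
The three options reduce to a simple decision. The sketch below is illustrative only; the enum, the function name, and the `cursor_in_text_field` flag are hypothetical, and how the app actually detects a text field is not covered here.

```python
from enum import Enum


class PasteBehavior(Enum):
    ALWAYS = "always"
    SMART = "smart"
    NEVER = "never"


def should_auto_paste(behavior: PasteBehavior, cursor_in_text_field: bool) -> bool:
    """Decide whether a finished result is pasted at the cursor."""
    if behavior is PasteBehavior.ALWAYS:
        return True
    if behavior is PasteBehavior.SMART:
        return cursor_in_text_field
    return False  # NEVER: the result is only copied to the clipboard


print(should_auto_paste(PasteBehavior.SMART, cursor_in_text_field=False))
```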

Corner Options

Position	Description
top-right	Top-right corner of screen (default)
top-left	Top-left corner of screen
bottom-right	Bottom-right corner of screen
bottom-left	Bottom-left corner of screen

Voice Activity Detection (VAD)

What It Does

Voice Activity Detection automatically detects when you start and stop speaking. This makes the recording experience more natural -- you don't have to manually click "stop" after finishing your thought.

How It Works

Vocoding continuously monitors the audio input.
When voice is detected above the sensitivity threshold, recording is active.
When silence is detected, VAD can either:
- Continue listening (standard mode)
- Auto-stop after the configured delay (auto-stop mode)

Configuration

Settings > Transcription > VAD Settings

Setting	Description	Default
Enable VAD	Turn voice detection on/off	ON
Sensitivity	How sensitive to voice (0-100%)	50%
Auto-Stop	Stop recording automatically on silence	OFF
Auto-Stop Delay	Wait time after last speech (0.5-5 seconds)	1.5s
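
As a rough illustration of how Sensitivity and Auto-Stop Delay could interact, the sketch below scans per-frame input levels and reports the frame at which recording would stop. It assumes a simple level-threshold detector and a linear mapping from sensitivity to threshold; both are assumptions for the example, and the detector Vocoding actually uses is not specified here. All function names are hypothetical.

```python
from typing import Optional, Sequence


def level_threshold(sensitivity_pct: float) -> float:
    """Hypothetical mapping: higher sensitivity means a lower level counts as voice."""
    return 1.0 - sensitivity_pct / 100.0


def auto_stop_frame(levels: Sequence[float], frame_s: float,
                    sensitivity_pct: float = 50.0,
                    delay_s: float = 1.5) -> Optional[int]:
    """Return the index of the frame where auto-stop triggers, or None.

    levels holds one normalized input level (0.0-1.0) per audio frame.
    """
    threshold = level_threshold(sensitivity_pct)
    heard_speech = False
    silent_frames = 0
    for index, level in enumerate(levels):
        if level > threshold:
            heard_speech = True
            silent_frames = 0          # speech resets the silence timer
        elif heard_speech:
            silent_frames += 1
            if silent_frames * frame_s >= delay_s:
                return index           # silence has lasted for the full delay
    return None


# Half-second frames: silence, two frames of speech, then silence.
stop_at = auto_stop_frame([0.1, 0.8, 0.9, 0.2, 0.1, 0.1, 0.1], frame_s=0.5)
print(stop_at)
```

Silence before the first detected speech does not count toward the delay in this sketch, so a slow start would not end the recording early.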

Sensitivity Tuning

Level	Best For
Low (0-30%)	Noisy environments -- only picks up clear, loud speech
Medium (30-70%)	Most environments -- good balance
High (70-100%)	Quiet environments -- picks up soft speech

Auto-Stop Tips

Short delay (0.5-1s): For quick commands and short phrases.
Medium delay (1.5-2.5s): For normal speaking with natural pauses.
Long delay (3-5s): For thoughtful speaking with extended pauses between sentences.

Streaming Modes

Streaming modes determine how transcription results are delivered to you in real time during recording.

Settings > Transcription > Streaming Mode

Mode	Behavior	CPU Usage	Best For
Quiet	Minimal streaming -- shows final result only	Lowest	Battery saving, background use
Balanced	VAD-gated streaming -- shows partial results while you speak	Medium	Daily use (recommended)
Max Performance	Continuous streaming -- real-time partial transcription	Highest	Live demos, when you want to see every word instantly

Choosing the Right Mode

Quiet: Choose this if you don't need to see partial results and want to minimize CPU usage.
Balanced: The default -- shows partial transcription only when you are actively speaking.
Max Performance: Shows every word as it is recognized, but uses more CPU. Great for presentations or when you want immediate visual feedback.
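
The difference between the three modes comes down to when partial results are shown. A minimal sketch, assuming a hypothetical `StreamingMode` enum and a `voice_active` flag supplied by VAD:

```python
from enum import Enum


class StreamingMode(Enum):
    QUIET = "quiet"
    BALANCED = "balanced"
    MAX_PERFORMANCE = "max_performance"


def show_partial_results(mode: StreamingMode, voice_active: bool) -> bool:
    """Decide whether partial transcription is shown at this moment."""
    if mode is StreamingMode.QUIET:
        return False             # final result only
    if mode is StreamingMode.BALANCED:
        return voice_active      # VAD-gated: only while speaking
    return True                  # continuous streaming


print(show_partial_results(StreamingMode.BALANCED, voice_active=True))
```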

Response Profiles

Response Profiles control how detailed the AI's responses are. There are two profiles, Fast and Deep, selectable from the header bar.

Fast Profile

Attribute	Value
Temperature	Lower
Style	Concise, focused
Best for	Quick answers, simple code generation, rapid iteration

Deep Profile

Attribute	Value
Temperature	Higher
Style	Thorough, detailed
Best for	Architecture decisions, complex debugging, comprehensive analysis

When to Use Each

Scenario	Recommended
Quick code fix	Fast
New feature design	Deep
Renaming a variable	Fast
Security review	Deep
Simple question	Fast
Learning a new concept	Deep
Rapid prototyping	Fast
Code review	Deep

Click the response profile button in the header bar (between the profile selector and the microphone area) to toggle between Fast and Deep.

Window Chrome & Visual Effects

State-Based Glow

The Vocoding window border shows your current state through a colored glow effect:

State	Glow Color	Meaning
Idle	Subtle gray	Ready for input
Recording	Red/orange pulse	Actively recording your voice
Transcribing	Blue	Converting voice to text
Optimizing	Purple	LLM is processing your prompt
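
The table is effectively a lookup from app state to glow color. The sketch below restates it as code; the `AppState` enum and `GLOW_COLORS` mapping are hypothetical names, and the color strings are taken from the table rather than from any real theme file.

```python
from enum import Enum


class AppState(Enum):
    IDLE = "idle"
    RECORDING = "recording"
    TRANSCRIBING = "transcribing"
    OPTIMIZING = "optimizing"


# Glow color per state, as listed in the table above.
GLOW_COLORS = {
    AppState.IDLE: "subtle gray",
    AppState.RECORDING: "red/orange pulse",
    AppState.TRANSCRIBING: "blue",
    AppState.OPTIMIZING: "purple",
}


def glow_for(state: AppState) -> str:
    """Return the border glow used for the given state."""
    return GLOW_COLORS[state]


print(glow_for(AppState.TRANSCRIBING))
```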

Overlay Banners

Context-sensitive banners appear at the top of the window:

Banner	When It Appears
Trial banner	During trial period -- shows days/credits remaining, upgrade CTA
Error banner	When an error occurs -- red, with dismiss button
Clipboard permission	When clipboard access is needed
Model change	When switching models -- shows the previous and new model, with an undo option
Shortcut busy	When the global shortcut can't be registered

Title Bar

Standard drag region for moving the window.
Native window controls (minimize, maximize, close).
Styled to match the operating system (macOS or Windows native look).
