VAD, Streaming & More

Voice Activity Detection, streaming modes, multi-task queue, Smart Bubble, and visual effects.

Multi-Task Queue

What It Is

The Multi-Task Queue lets you process several voice inputs at once. Instead of waiting for one transcription or optimization to finish, you can start another recording and the tasks run in parallel.

How It Works

  1. Start a voice recording -- it begins processing.
  2. While it is processing, start another recording.
  3. Both tasks run in parallel.
  4. Each task shows as a "pill" indicator.
  5. Completed tasks notify you and copy results.
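The steps above amount to bounded concurrency. The following is a minimal sketch of that idea, not Vocoding's actual implementation: `QueueLimits`, `TaskQueue`, and `demo` are hypothetical names, the limits reuse the documented Max Active Tasks and Max Local GPU Tasks settings, and `asyncio.sleep` stands in for real transcription or optimization work.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass


@dataclass
class QueueLimits:
    # Hypothetical container; field names mirror the documented settings.
    max_active_tasks: int = 2
    max_local_gpu_tasks: int = 1


class TaskQueue:
    """Illustrative scheduler that caps total tasks and local GPU tasks."""

    def __init__(self, limits: QueueLimits) -> None:
        self._active = asyncio.Semaphore(limits.max_active_tasks)
        self._gpu = asyncio.Semaphore(limits.max_local_gpu_tasks)
        self.running = 0
        self.peak_running = 0

    async def run(self, name: str, seconds: float, uses_local_gpu: bool) -> str:
        # Every task needs an "active" slot; local tasks also need a GPU slot.
        async with self._active:
            if uses_local_gpu:
                async with self._gpu:
                    return await self._work(name, seconds)
            return await self._work(name, seconds)

    async def _work(self, name: str, seconds: float) -> str:
        self.running += 1
        self.peak_running = max(self.peak_running, self.running)
        try:
            await asyncio.sleep(seconds)  # placeholder for the real processing
            return f"{name}: done"
        finally:
            self.running -= 1


async def demo() -> tuple[list[str], int]:
    # Three recordings are submitted, but only two may run at the same time.
    queue = TaskQueue(QueueLimits(max_active_tasks=2, max_local_gpu_tasks=1))
    results = await asyncio.gather(
        queue.run("recording-1", 0.05, uses_local_gpu=False),
        queue.run("recording-2", 0.05, uses_local_gpu=False),
        queue.run("recording-3", 0.05, uses_local_gpu=False),
    )
    return list(results), queue.peak_running
```

Calling `asyncio.run(demo())` returns the three results together with the peak number of tasks that ran concurrently. In this sketch a local task holds an active slot while it waits for the GPU slot; a real scheduler might order those acquisitions differently.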

Configuration

Settings > Automation > Multi-Task Queue

Setting | Range | Default | Description
Max Active Tasks | 2-10 | 2 | Maximum tasks running simultaneously
Max Local GPU Tasks | 1-4 | 1 | Maximum Ollama local tasks (GPU-bound)
Visible Stacked Pills | 1-5 | 1 | How many task pills to show in the UI
Task Notifications | On/Off | ON | Show notifications for task events
Notify on Success | On/Off | ON | Notification when a task completes
Notify on Errors | On/Off | ON | Notification when a task fails

Tips

  • Keep Max Local GPU Tasks at 1 if using large Whisper models (they are GPU-intensive).
  • Cloud LLM tasks can run with higher parallelism because they don't use local resources.
  • Increase Visible Stacked Pills if you frequently have multiple tasks running.

Smart Bubble

What It Is

The Smart Bubble is a small floating pill that appears on your screen when a long-running task moves out of the main Vocoding window. It lets you monitor task progress without keeping the full window open.

How It Works

  1. You start a voice task (transcribe or optimize).
  2. If processing takes longer than the Transition Threshold, the task moves to a floating pill.
  3. The pill shows task status (processing, complete, error).
  4. When complete, the pill can auto-paste the result or wait for you to click it.
  5. The pill auto-dismisses after the configured timeout.
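The threshold and timeout logic in these steps can be sketched as two small predicates. This is only an illustration: `PillStatus`, `BubbleSettings`, `should_move_to_bubble`, and `should_dismiss` are hypothetical names, and the text does not say when the dismiss timer starts, so the sketch simply takes the elapsed time as an argument.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PillStatus(Enum):
    # The three statuses the pill is described as showing.
    PROCESSING = "processing"
    COMPLETE = "complete"
    ERROR = "error"


@dataclass
class BubbleSettings:
    # Hypothetical container; defaults follow the documented settings.
    transition_threshold_s: float = 8.0
    auto_dismiss_timeout_s: Optional[float] = 60.0  # None stands for "Never"


def should_move_to_bubble(elapsed_s: float, settings: BubbleSettings) -> bool:
    """True once processing has run longer than the Transition Threshold."""
    return elapsed_s > settings.transition_threshold_s


def should_dismiss(elapsed_s: float, settings: BubbleSettings) -> bool:
    """True once the pill has been shown for the configured timeout."""
    if settings.auto_dismiss_timeout_s is None:
        return False  # "Never": the pill stays until dismissed manually
    return elapsed_s >= settings.auto_dismiss_timeout_s
```

With the default settings, a task that has been processing for 9 seconds would move to the bubble, while one at 5 seconds would stay in the main window.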

Configuration

Settings > Automation > Smart Bubble

Setting | Range | Default | Description
Transition Threshold | 3-30 seconds | 8s | Time before task moves to bubble
Pill Corner | 4 options | top-right | Which screen corner
Auto-Dismiss Timeout | 10-300s or "Never" | 60s | When bubble disappears
Paste Behavior | 3 options | smart | How auto-paste works
Tray Recent Results | 3-20 | 5 | Results shown in system tray menu

Paste Behavior Options

Option | Behavior
Always | Auto-paste result at cursor immediately
Smart | Auto-paste only if the cursor is in a text field (macOS default)
Never | Only copy to clipboard, never auto-paste
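The three options reduce to a simple decision, sketched below. `PasteBehavior` and `should_auto_paste` are hypothetical names, and how Vocoding actually detects a focused text field is not described here, so that check is passed in as a plain boolean.

```python
from enum import Enum


class PasteBehavior(Enum):
    ALWAYS = "always"
    SMART = "smart"
    NEVER = "never"


def should_auto_paste(behavior: PasteBehavior, cursor_in_text_field: bool) -> bool:
    """Decide whether a finished result is pasted at the cursor."""
    if behavior is PasteBehavior.ALWAYS:
        return True
    if behavior is PasteBehavior.SMART:
        # Smart only pastes when there is a text field to receive the text.
        return cursor_in_text_field
    return False  # NEVER: the result is only copied to the clipboard
```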

Corner Options

Position | Description
top-right | Top-right corner of screen (default)
top-left | Top-left corner of screen
bottom-right | Bottom-right corner of screen
bottom-left | Bottom-left corner of screen

Voice Activity Detection (VAD)

What It Does

Voice Activity Detection automatically detects when you start and stop speaking. This makes the recording experience more natural -- you don't have to manually click "stop" after finishing your thought.

How It Works

  1. Vocoding continuously monitors the audio input.
  2. When voice above the sensitivity threshold is detected, recording becomes active.
  3. When silence is detected, VAD can either:
    • Continue listening (standard mode)
    • Auto-stop after the configured delay (auto-stop mode)

Configuration

Settings > Transcription > VAD Settings

Setting | Description | Default
Enable VAD | Turn voice detection on/off | ON
Sensitivity | How sensitive to voice (0-100%) | 50%
Auto-Stop | Stop recording automatically on silence | OFF
Auto-Stop Delay | Wait time after last speech (0.5-5 seconds) | 1.5s
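To make the interaction between Sensitivity and Auto-Stop Delay concrete, here is a minimal energy-based sketch. It is not Vocoding's detector: the mapping from sensitivity to an energy threshold, the per-frame energies normalized to 0-1, and the names `VadSettings`, `energy_threshold`, and `find_auto_stop` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class VadSettings:
    # Hypothetical container; defaults follow the documented settings.
    sensitivity_pct: float = 50.0
    auto_stop: bool = False
    auto_stop_delay_s: float = 1.5


def energy_threshold(sensitivity_pct: float) -> float:
    """Assumed mapping: higher sensitivity means a lower energy threshold."""
    clamped = min(max(sensitivity_pct, 0.0), 100.0)
    return 1.0 - clamped / 100.0


def find_auto_stop(
    frame_energies: Sequence[float], frame_s: float, settings: VadSettings
) -> Optional[float]:
    """Return the time in seconds at which recording would auto-stop, or None."""
    if not settings.auto_stop:
        return None  # standard mode: keep listening through silence
    threshold = energy_threshold(settings.sensitivity_pct)
    heard_speech = False
    silence_s = 0.0
    for index, energy in enumerate(frame_energies):
        if energy > threshold:
            heard_speech = True
            silence_s = 0.0  # any speech resets the silence timer
        elif heard_speech:
            silence_s += frame_s
            if silence_s >= settings.auto_stop_delay_s:
                return (index + 1) * frame_s
    return None
```

For example, with 0.5-second frames, speech in the second and third frames, and the default 1.5-second delay, the sketch stops after three consecutive silent frames. Silence before the first detected speech never triggers a stop.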

Sensitivity Tuning

Level | Best For
Low (0-30%) | Noisy environments -- only picks up clear, loud speech
Medium (30-70%) | Most environments -- good balance
High (70-100%) | Quiet environments -- picks up soft speech

Auto-Stop Tips

  • Short delay (0.5-1s): For quick commands and short phrases.
  • Medium delay (1.5-2.5s): For normal speaking with natural pauses.
  • Long delay (3-5s): For thoughtful speaking with extended pauses between sentences.

Streaming Modes

Streaming modes determine how transcription results are delivered to you in real time while you record.

Settings > Transcription > Streaming Mode

Mode | Behavior | CPU Usage | Best For
Quiet | Minimal streaming -- shows final result only | Lowest | Battery saving, background use
Balanced | VAD-gated streaming -- shows partial results while you speak | Medium | Daily use (recommended)
Max Performance | Continuous streaming -- real-time partial transcription | Highest | Live demos, when you want to see every word instantly

Choosing the Right Mode

  • Quiet: Choose this if you don't need to see partial results and want to minimize CPU usage.
  • Balanced: The default -- shows partial transcription only when you are actively speaking.
  • Max Performance: Shows every word as it is recognized, but uses more CPU. Great for presentations or when you want immediate visual feedback.
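The difference between the three modes comes down to when partial results are emitted. A minimal sketch of that rule, using the hypothetical names `StreamingMode` and `should_emit_partial` and assuming the VAD state is available as a boolean:

```python
from enum import Enum


class StreamingMode(Enum):
    QUIET = "quiet"
    BALANCED = "balanced"
    MAX_PERFORMANCE = "max_performance"


def should_emit_partial(mode: StreamingMode, speech_active: bool) -> bool:
    """Decide whether a partial transcription is shown during recording."""
    if mode is StreamingMode.QUIET:
        return False  # only the final result is shown
    if mode is StreamingMode.BALANCED:
        return speech_active  # VAD-gated: partials only while speaking
    return True  # MAX_PERFORMANCE: continuous partial results
```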

Response Profiles

Response Profiles control how detailed the AI's responses are. There are two profiles, Fast and Deep, both selectable from the header bar.

Fast Profile

Attribute | Value
Temperature | Lower
Style | Concise, focused
Best for | Quick answers, simple code generation, rapid iteration

Deep Profile

Attribute | Value
Temperature | Higher
Style | Thorough, detailed
Best for | Architecture decisions, complex debugging, comprehensive analysis

When to Use Each

Scenario | Recommended
Quick code fix | Fast
New feature design | Deep
Renaming a variable | Fast
Security review | Deep
Simple question | Fast
Learning a new concept | Deep
Rapid prototyping | Fast
Code review | Deep

Click the response profile button in the header bar (between the profile selector and the microphone area) to toggle between Fast and Deep.


Window Chrome & Visual Effects

State-Based Glow

The Vocoding window border shows your current state through a colored glow effect:

State | Glow Color | Meaning
Idle | Subtle gray | Ready for input
Recording | Red/orange pulse | Actively recording your voice
Transcribing | Blue | Converting voice to text
Optimizing | Purple | LLM is processing your prompt

Overlay Banners

Context-sensitive banners appear at the top of the window:

Banner | When It Appears
Trial banner | During trial period -- shows days/credits remaining, upgrade CTA
Error banner | When an error occurs -- red, with dismiss button
Clipboard permission | When clipboard access is needed
Model change | When switching models -- shows the previous and new model, with an undo option
Shortcut busy | When the global shortcut can't be registered

Title Bar

  • Standard drag region for moving the window.
  • Native window controls (minimize, maximize, close).
  • Styled to match the operating system (macOS or Windows native look).