Settings
Transcription
Configure Whisper models, language, Voice Activity Detection, and streaming modes.
All voice transcription in Vocoding is processed 100% locally using Whisper -- no audio data ever leaves your device.
Whisper Model Selection
Choose which Whisper model to use for voice transcription.
| Model | Size | Speed | Accuracy | Languages |
|---|---|---|---|---|
| tiny | 75 MB | Fastest | Basic | Multi |
| tiny.en | 75 MB | Fastest | Basic | English only |
| base | 142 MB | Fast | Good | Multi |
| base.en | 142 MB | Fast | Good | English only |
| small | 466 MB | Medium | Better | Multi |
| small.en | 466 MB | Medium | Better | English only |
| medium | 1.5 GB | Slow | High | Multi |
| large-v3 | 2.9 GB | Slowest | Best | Multi |
| large-v3-turbo | 1.5 GB | Fast | High | Multi |
Status Badges
Each model shows a status badge:
- Active -- Currently selected and ready
- Activate -- Downloaded but not selected (click to activate)
- Download -- Not yet downloaded (click to download)
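The badge rules above amount to a small decision function. The sketch below is illustrative only: `Badge` and `badge_for` are hypothetical names, not part of Vocoding, and it assumes the app knows which models are downloaded and which one is selected.

```python
from enum import Enum
from typing import Optional, Set


class Badge(Enum):
    ACTIVE = "Active"
    ACTIVATE = "Activate"
    DOWNLOAD = "Download"


def badge_for(model: str, downloaded: Set[str], active: Optional[str]) -> Badge:
    """Return the badge for one model (hypothetical helper)."""
    if model not in downloaded:
        # Not on disk yet: offer a download.
        return Badge.DOWNLOAD
    if model == active:
        # On disk and currently selected.
        return Badge.ACTIVE
    # On disk but not selected: offer activation.
    return Badge.ACTIVATE
```

For example, with `base` and `tiny` downloaded and `base` selected, `base` would show Active, `tiny` Activate, and `small` Download.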
Model Storage Location
All models are stored locally at ~/.vocoding/models/.
Language
Select the transcription language:
| Option | Description |
|---|---|
| Auto Detect | Whisper detects the language automatically |
| English | Force English recognition |
| Español | Force Spanish recognition |
| Français | Force French recognition |
| Deutsch | Force German recognition |
| Italiano | Force Italian recognition |
| Português | Force Portuguese recognition |
| Chinese | Force Chinese recognition |
| Japanese | Force Japanese recognition |
| Korean | Force Korean recognition |
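As a rough illustration of how the language option interacts with model choice, the sketch below maps each language to its two-letter Whisper code and checks it against English-only (`.en`) models. It is keyed by the English language names from the Description column. The helper names are hypothetical. Treating Auto Detect as acceptable for an English-only model is an assumption, not documented Vocoding behavior.

```python
from typing import Dict, Optional

# Language names mapped to Whisper's two-letter language codes.
# None stands for Auto Detect (no language forced).
LANGUAGE_CODES: Dict[str, Optional[str]] = {
    "Auto Detect": None,
    "English": "en",
    "Spanish": "es",
    "French": "fr",
    "German": "de",
    "Italian": "it",
    "Portuguese": "pt",
    "Chinese": "zh",
    "Japanese": "ja",
    "Korean": "ko",
}


def is_english_only(model: str) -> bool:
    """English-only models in the table carry the '.en' suffix."""
    return model.endswith(".en")


def is_compatible(model: str, language: str) -> bool:
    """Hypothetical check: can this model serve the chosen language?"""
    code = LANGUAGE_CODES[language]
    if is_english_only(model):
        # Assumption: Auto Detect is allowed and simply yields English.
        return code in (None, "en")
    return True
```

Under these assumptions, `base.en` with French would be flagged as incompatible, while any multilingual model accepts every option in the table.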
Voice Activity Detection (VAD)
VAD detects when you start and stop speaking, making the recording experience more natural.
| Setting | Description | Default |
|---|---|---|
| Enable VAD | Turn VAD on or off | ON |
| Sensitivity | How sensitive detection is to voice (0-100%) | 50% |
| Auto-Stop | Automatically stop recording when silence is detected | OFF |
| Auto-Stop Delay | How long to wait after silence begins before stopping (0.5s - 5s) | 1.5s |
The Auto-Stop Delay slider appears only when Auto-Stop is enabled.
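The defaults and ranges in the table can be captured in a small settings object. This is a minimal sketch, not Vocoding's actual configuration format: the class and field names are hypothetical, and only the defaults and ranges come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VadSettings:
    """Hypothetical container mirroring the VAD settings table."""
    enabled: bool = True            # Enable VAD: ON by default
    sensitivity: int = 50           # percent, 0-100
    auto_stop: bool = False         # Auto-Stop: OFF by default
    auto_stop_delay: float = 1.5    # seconds, 0.5-5

    def __post_init__(self) -> None:
        # Reject values outside the documented ranges.
        if not 0 <= self.sensitivity <= 100:
            raise ValueError("sensitivity must be between 0 and 100")
        if not 0.5 <= self.auto_stop_delay <= 5.0:
            raise ValueError("auto_stop_delay must be between 0.5 and 5 seconds")

    @property
    def delay_slider_visible(self) -> bool:
        # The delay slider is shown only while Auto-Stop is on.
        return self.auto_stop
```

`VadSettings()` yields the documented defaults. `VadSettings(auto_stop=True, auto_stop_delay=2.0)` represents Auto-Stop enabled with a two-second delay.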
Sensitivity Tuning
| Level | Best For |
|---|---|
| Low (0-30%) | Noisy environments -- only picks up clear, loud speech |
| Medium (30-70%) | Most environments -- good balance |
| High (70-100%) | Quiet environments -- picks up soft speech |
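The bands in the table share their boundary values (30% and 70%). The sketch below classifies a slider value into a band. Assigning exactly 30% to Medium and exactly 70% to High is an assumption made for illustration, and `sensitivity_band` is a hypothetical name.

```python
def sensitivity_band(percent: int) -> str:
    """Classify a sensitivity percentage into Low, Medium, or High."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    if percent < 30:
        return "Low"      # noisy environments
    if percent < 70:
        return "Medium"   # most environments
    return "High"         # quiet environments
```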
Auto-Stop Delay Tips
- Short delay (0.5-1s): For quick commands and short phrases
- Medium delay (1.5-2.5s): For normal speaking with natural pauses
- Long delay (3-5s): For thoughtful speaking with extended pauses between sentences
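To see why the delay matters, the following sketch models Auto-Stop over a stream of per-frame speech flags such as a VAD might produce. It is not Vocoding's implementation. The function name and frame-based model are hypothetical. It assumes the silence timer starts only after some speech has been heard, and that any speech resets it.

```python
from typing import Iterable, Optional


def auto_stop_at_ms(frames: Iterable[bool], frame_ms: int, delay_ms: int) -> Optional[int]:
    """Return the elapsed time (ms) at which recording would stop, or None.

    frames: one flag per audio frame, True when the frame contains speech.
    """
    elapsed = 0
    silence = 0
    heard_speech = False
    for is_speech in frames:
        elapsed += frame_ms
        if is_speech:
            heard_speech = True
            silence = 0              # a pause ended; restart the timer
        elif heard_speech:
            silence += frame_ms
            if silence >= delay_ms:
                return elapsed       # silence lasted the full delay
    return None                      # never silent long enough


# A 0.5 s pause mid-sentence, then 1 s of trailing silence (100 ms frames).
frames = [True] * 3 + [False] * 5 + [True] * 3 + [False] * 10
print(auto_stop_at_ms(frames, frame_ms=100, delay_ms=1000))  # 2100
print(auto_stop_at_ms(frames, frame_ms=100, delay_ms=1500))  # None
```

With a 1 s delay the short pause is tolerated and recording stops after the trailing silence. With a 1.5 s delay the same audio never triggers a stop, which is why longer delays suit speakers who pause between sentences.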
Streaming Modes
The streaming mode controls how transcription results are delivered in real time during recording.
| Mode | Description | CPU Usage | Best For |
|---|---|---|---|
| Quiet | Minimal streaming -- shows final result only | Lower | Battery saving, background use |
| Balanced | VAD-gated streaming -- shows partial results while speaking | Medium | Daily use (recommended) |
| Max Performance | Continuous streaming -- real-time partial transcription | Higher | Live demos, immediate visual feedback |
Choosing the Right Mode
- Quiet: Choose this if you don't need to see partial results and want to minimize CPU usage
- Balanced: The default -- shows partial transcription only when you are actively speaking
- Max Performance: Shows every word as it is recognized, at the cost of higher CPU usage. Suited to presentations or any time you want immediate visual feedback
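The difference between the three modes comes down to when partial results are shown. The sketch below expresses that as one decision function. `StreamingMode` and `should_emit_partial` are hypothetical names, and the logic is inferred from the descriptions above rather than taken from Vocoding's code.

```python
from enum import Enum


class StreamingMode(Enum):
    QUIET = "Quiet"
    BALANCED = "Balanced"
    MAX_PERFORMANCE = "Max Performance"


def should_emit_partial(mode: StreamingMode, speech_active: bool) -> bool:
    """Decide whether to show a partial transcription right now.

    speech_active: whether VAD currently reports speech.
    """
    if mode is StreamingMode.QUIET:
        return False              # final result only
    if mode is StreamingMode.BALANCED:
        return speech_active      # VAD-gated: only while speaking
    return True                   # continuous streaming
```

In every mode the final result is still delivered when recording ends. The function only governs the intermediate updates.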