# Changelog
All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/), and this project adheres to [Semantic Versioning](https://semver.org/).
## 0.5.0 (2026-03-04)

### Added
- **Undo Last Dictation** (TF-258): `dictation.undo` command (`Cmd+Z`, context-aware) reverts the last dictation insertion. Works in both editor (via document edit reversal) and terminal (via backspace sequence). Single-level undo with automatic expiry after new edits.
- **Dictation History with Full-Text Search** (TF-264): Persistent dictation history stored via `globalState`. Browse recent dictations via Quick Pick (`dictation.showHistory`), search across raw transcript and cleaned text (`dictation.searchHistory`). Actions: insert at cursor, copy to clipboard, show details. Configurable max entries (`verba.history.maxEntries`, default 500). Privacy: history stays local, never sent to APIs.
- **Continuous Dictation with Deepgram Streaming** (TF-260): Longer dictation sessions using Deepgram Nova-3 WebSocket streaming. ffmpeg captures microphone audio (raw PCM to stdout), piped directly to Deepgram's real-time transcription API. Deepgram's built-in VAD handles pause detection and utterance segmentation. Each completed utterance goes through Claude cleanup, then insertion. New command `dictation.startContinuous` (`Cmd+Shift+Alt+D`). Per-utterance undo and history records.
- **Multi-Language Auto-Detection** (TF-261): Deepgram `language: 'multi'` enables automatic language detection for both single-shot and continuous transcription. The detected language is passed through to Claude post-processing for language-appropriate cleanup. Manual override via the `verba.language` setting.
- **Retry Logic for API Overload** (TF-273): Exponential backoff with up to 3 retries when the Claude API returns HTTP 529 (overloaded). Backoff delays: 2s, 4s, 8s. Prevents dictation loss during transient API capacity issues.
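The backoff described above follows a standard shape; a minimal sketch, assuming the error object carries an HTTP `status` field (the helper name and error shape are illustrative, not the extension's actual code):

```typescript
/**
 * Retry an async call with exponential backoff when the API reports
 * overload (HTTP 529). With the defaults, delays are 2s, 4s, 8s;
 * after the third retry the error propagates to the caller.
 */
async function withOverloadRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 2000,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      const overloaded = err?.status === 529; // only retry on overload
      if (!overloaded || attempt >= maxRetries) throw err;
      const delay = baseDelayMs * 2 ** attempt; // 2s, 4s, 8s
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

Capping at three retries bounds the worst case to roughly 14s of waiting, after which the dictation error surfaces rather than hanging indefinitely.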
### Changed
- **Deepgram Consolidation** (TF-272): Replaced the OpenAI Whisper API with the Deepgram Nova-3 pre-recorded API for single-shot dictation. Both single-shot and continuous now use Deepgram (shared API key). The `openai` npm package is retained for embeddings only. Cost reduced from $0.006/min (Whisper) to $0.0043/min (Deepgram Nova-3).
- Deepgram keyterm token limit reduced to prevent request failures with large glossaries
- Language configuration passed through from transcription service to Deepgram API
### Fixed
- API key deletion now checks for HTTP 401/403 status codes instead of deleting on any transcription error
- Undo records cleared correctly after new editor edits to prevent stale undo operations
- History persistence errors handled gracefully without disrupting the dictation pipeline
- Resource leak fixed when continuous recorder fails to start (status bar listener now disposed)
- Unreachable auth-detection code removed from continuous segment error handler
- Duplicate `historyManager` import statements removed
## [0.4.0] - 2026-03-02

### Added
- **API Key Management**: `dictation.manageApiKeys` command to view (masked), update, or delete stored OpenAI and Anthropic API keys via the Command Palette.
- **LLM Cost Tracking & Overview** (TF-270): `dictation.showCostOverview` command opens a WebView panel showing per-model API usage costs. Tracks Whisper transcription (by audio duration), Claude processing (by input/output tokens), and OpenAI Embeddings (by prompt tokens). Costs are displayed per session and accumulated across all sessions, grouped by provider (OpenAI / Anthropic) in a card layout with VS Code theme support. Total costs reset automatically on the 1st of each month; older records are retained in storage.
- **Adaptive Personal Dictionary** (TF-263): `dictation.generateGlossary` command scans the workspace for project-specific terms (package names, class/interface/function names, README/CLAUDE.md headings and bold terms). Users review suggestions via a Multi-Select Quick Pick before merging into `.verba-glossary.json`. Supports TypeScript, Java, and Python projects.
- **Multi-Cursor / Selection-Aware Dictation** (TF-265): Dictation now respects editor selections and multi-cursors. When text is selected, the dictated output replaces the selection; with multi-cursors, text is inserted at all cursor positions simultaneously. Selected text is passed as context to Claude post-processing via `<selection>` tags. New "Transform Selection" default template for voice-driven text transformation (e.g. translate, refactor, explain).
- **Text Expansion / Abbreviations** (TF-262): User-defined abbreviations that are automatically expanded during post-processing. Configure via the `verba.expansions` setting (e.g. `"mfg"` → `"Mit freundlichen Grüssen"`) or the workspace-specific `.verba-expansions.json`. Global and workspace expansions are merged, with workspace entries taking precedence for the same abbreviation.
- **File-Type-Aware Templates** (TF-259): Templates can now define a `fileTypes` array with VS Code language IDs (e.g. `["java", "kotlin"]`). When `verba.autoSelectTemplate` is enabled (default), the template matching the active editor's file type is automatically selected. Falls back to the last manually chosen template when no match is found. Built-in defaults: JavaDoc → java/kotlin, Markdown → markdown.
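The global/workspace merge rule for expansions can be sketched as a plain object spread, where workspace entries override global ones. The function names and the whole-word replacement heuristic below are illustrative, not the extension's actual implementation:

```typescript
type Expansions = Record<string, string>;

/** Merge global and workspace abbreviation maps; workspace entries win on conflicts. */
function mergeExpansions(globalExp: Expansions, workspaceExp: Expansions): Expansions {
  // Later spread properties overwrite earlier ones, giving workspace precedence.
  return { ...globalExp, ...workspaceExp };
}

/** Expand whole-word abbreviations in a transcript (simplified heuristic). */
function applyExpansions(text: string, expansions: Expansions): string {
  return text.replace(/\p{L}+/gu, (word) => expansions[word] ?? word);
}
```

For example, `applyExpansions("mfg und danke", merged)` would turn `mfg` into `Mit freundlichen Grüssen` while leaving unknown words untouched.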
### Changed
- Release automation migrated from manual workflow to release-please for automatic versioning, changelog generation, and GitHub Releases
## [0.3.0] - 2026-02-26

### Added
- **Offline Transcription** (TF-257): Local transcription via the whisper.cpp CLI as an alternative to the OpenAI Whisper API. Audio never leaves the machine. Strategy pattern on `TranscriptionService` with `setProvider('openai' | 'local')`.
- **Model Download**: GGML models from Hugging Face via the `dictation.downloadModel` command, with progress indicator and cancellation support. Model selection: tiny, base, small, medium, large.
- **Provider Display in Status Bar**: Tooltip shows the active provider (OpenAI Whisper / Local whisper.cpp); the transcribing state displays the provider explicitly.
- **Streaming Post-Processing**: Claude responses are received via streaming, with real-time progress display in the status bar (e.g. "Processing... 182 chars"). Dictation can be cancelled during processing by pressing the shortcut again.
- **Course Correction**: Self-corrections in dictation are automatically detected and removed (e.g. "no wait, actually Friday" → "Friday"). Active in all modes (freeform and templates).
- **Voice Commands**: Spoken formatting commands are recognized and applied (e.g. "New paragraph", "Period", "Bullet point"). Works language-independently in all modes.
- **Glossary/Dictionary**: Terms (product names, technical terms, abbreviations) are preserved exactly during transcription and cleanup. Global terms via the `verba.glossary` setting, project-specific terms via `.verba-glossary.json`.
- **JSDoc Documentation**: All public APIs across 13 source files documented.
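The strategy pattern behind `setProvider` can be sketched as below. The class bodies are placeholders: the real strategies would call the Whisper API and invoke the whisper.cpp CLI, respectively.

```typescript
interface TranscriptionStrategy {
  transcribe(audioFile: string): Promise<string>;
}

// Illustrative stand-in for the API-backed provider.
class OpenAiStrategy implements TranscriptionStrategy {
  async transcribe(audioFile: string): Promise<string> {
    return `openai:${audioFile}`; // would POST the audio to the Whisper API
  }
}

// Illustrative stand-in for the local whisper.cpp provider.
class LocalStrategy implements TranscriptionStrategy {
  async transcribe(audioFile: string): Promise<string> {
    return `local:${audioFile}`; // would spawn the whisper-cli binary
  }
}

class TranscriptionService {
  private strategy: TranscriptionStrategy = new OpenAiStrategy();

  setProvider(provider: "openai" | "local"): void {
    this.strategy = provider === "local" ? new LocalStrategy() : new OpenAiStrategy();
  }

  transcribe(audioFile: string): Promise<string> {
    return this.strategy.transcribe(audioFile);
  }
}
```

The benefit is that the rest of the pipeline calls `service.transcribe(...)` without knowing which provider is active.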
### Fixed
- SIGKILL escalation for hanging whisper-cli processes
- Provider validation with fallback for invalid settings
- Minimum file size check after model download
### Changed
- Marketplace homepage link updated to `talent-factory.xyz`
- Marketplace category changed from `Other` to `Snippets`
## [0.2.0] - 2026-02-24

### Added
- **Configurable Prompt Templates** (TF-248): 5 default templates (Freeform, Commit Message, JavaDoc, Markdown, Email) with full customization via the `verba.templates` setting
- Context-Aware Dictation with semantic code search via grepai or OpenAI Embeddings
- 3 context-aware templates: Code Comment, Explain Code, Claude Code Prompt
- `dictation.indexProject` command to build a local embeddings index for context search
- `verba.contextSearch.provider` setting (`auto`, `grepai`, `openai`) and `verba.contextSearch.maxResults` setting
- GitHub Actions changelog preview workflow: posts a categorized changelog preview as a PR comment when opening PRs to `main`
- Template Quick Pick selection menu integrated into the dictation workflow
- Template auto-reuse: last used template is automatically reused, Quick Pick only on first use
- `dictation.selectTemplate` command (`Cmd+Alt+T`/`Ctrl+Alt+T`) to change the template without starting a recording
- Status bar shows the active template name in the idle state
- Pipeline context architecture for dynamic system prompt passing to Claude
- Cross-Platform Audio Device Selection: macOS (AVFoundation), Linux (PulseAudio), Windows (DirectShow)
- `dictation.selectAudioDevice` command for microphone selection on any platform
- `verba.audioDevice` setting for manual microphone configuration
- Windows-specific ffmpeg search paths (Chocolatey, Scoop, WinGet, Program Files)
- PowerShell fallback (`Win32_SoundDevice`) when ffmpeg finds no Windows audio devices
- Marketplace publishing: extension icon (SVG/PNG), screenshots, workflow GIF
- Open-source governance files: MIT License, Code of Conduct, Security Policy
- Semantic-release configuration with emoji-aware conventional commit parsing
- GitHub Actions release workflow for automated versioning on merge to `main`
### Fixed
- Template shortcut changed from `Cmd+Shift+T` to `Cmd+Alt+T` to avoid a conflict with VS Code's built-in shortcut
- CHANGELOG.md included in the VSIX package for the Marketplace Changelog tab
- Git identity configured for GitHub Actions release commits (`github-actions[bot]`)
- Extension bundled with esbuild to fix an activation failure
- Transcript wrapped in XML tags to prevent Claude generating conversational responses instead of clean output
- Terminal focus detection rewritten with two separate commands and mutually exclusive `when` clauses
- Template prompt wrapped with framing context for a reliable output format
- Windows audio recording: robust device detection supporting both ffmpeg v7 (section-based) and v8+ (inline `(audio)`) output formats
- Audio device selection enabled for macOS and Linux (previously Windows only)
- `make dev` workflow compatible with Windows/Cygwin
- `install:local` npm script now works cross-platform
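Supporting both ffmpeg output formats comes down to accepting two line shapes when parsing the `-list_devices` stderr. A hedged sketch — the regexes approximate typical dshow output and may not match the extension's parser exactly:

```typescript
/**
 * Extract audio device names from `ffmpeg -list_devices true -f dshow -i dummy`
 * stderr. Handles both the section-based layout (v7: device names listed under
 * a "DirectShow audio devices" header) and the inline-tag layout (v8+: each
 * device line ends with "(audio)" or "(video)").
 */
function parseDshowAudioDevices(stderr: string): string[] {
  const devices: string[] = [];
  let inAudioSection = false;
  for (const line of stderr.split(/\r?\n/)) {
    if (/DirectShow audio devices/.test(line)) { inAudioSection = true; continue; }
    if (/DirectShow video devices/.test(line)) { inAudioSection = false; continue; }
    const inline = line.match(/"([^"]+)"\s*\(audio\)/); // v8+: inline tag
    if (inline) { devices.push(inline[1]); continue; }
    const sectioned = line.match(/"([^"]+)"\s*$/);      // v7: name only, rely on section
    if (inAudioSection && sectioned && !/Alternative name/.test(line)) {
      devices.push(sectioned[1]);
    }
  }
  return Array.from(new Set(devices)); // de-duplicate across formats
}
```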
## [0.1.1] - 2026-02-20

### Added
- Terminal dictation: `dictation.startFromTerminal` command inserts dictated text into the active terminal
- `verba.terminal.executeCommand` setting to auto-execute dictated text as a terminal command
- Shared keybinding `Cmd+Shift+D`/`Ctrl+Shift+D` with context-aware routing (terminal vs. editor)
- Whisper silence/hallucination detection (recordings with only dots/ellipsis are rejected)
- Debug logging throughout the transcription and cleanup pipeline
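The silence/hallucination check can be approximated by rejecting transcripts made up only of dots, ellipses, and whitespace — a heuristic sketch, not necessarily the exact rule the extension applies:

```typescript
/**
 * Heuristic: treat a transcript as a silence hallucination when, after
 * trimming, it is empty or consists only of dots, the ellipsis character
 * (U+2026), and whitespace — Whisper's typical output for silent audio.
 */
function isSilenceHallucination(transcript: string): boolean {
  const trimmed = transcript.trim();
  return trimmed.length === 0 || /^[.\u2026\s]+$/.test(trimmed);
}
```

Rejecting these early avoids sending an empty "utterance" through the Claude cleanup and insertion steps.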