Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[0.5.0] - 2026-03-04

Added

  • Undo Last Dictation (TF-258): dictation.undo command (context-aware Cmd+Z) reverts the last dictation insertion. Works in both the editor (via document edit reversal) and the terminal (via a backspace sequence). Single-level undo with automatic expiry after new edits.
  • Dictation History with Full-Text Search (TF-264): Persistent dictation history stored via globalState. Browse recent dictations via Quick Pick (dictation.showHistory), search across raw transcript and cleaned text (dictation.searchHistory). Actions: insert at cursor, copy to clipboard, show details. Configurable max entries (verba.history.maxEntries, default 500). Privacy: history stays local, never sent to APIs.
  • Continuous Dictation with Deepgram Streaming (TF-260): Longer dictation sessions using Deepgram Nova-3 WebSocket streaming. ffmpeg captures microphone audio (raw PCM to stdout), piped directly to Deepgram's real-time transcription API. Deepgram's built-in VAD handles pause detection and utterance segmentation. Each completed utterance goes through Claude cleanup, then insertion. New command dictation.startContinuous (Cmd+Shift+Alt+D). Per-utterance undo and history records. A sketch of the audio path appears after this list.
  • Multi-Language Auto-Detection (TF-261): Deepgram language: 'multi' enables automatic language detection for both single-shot and continuous transcription. Detected language is passed through to Claude post-processing for language-appropriate cleanup. Manual override via verba.language setting.
  • Retry Logic for API Overload (TF-273): Exponential backoff with up to 3 retries when the Claude API returns HTTP 529 (overloaded). Backoff delays: 2s, 4s, 8s. Prevents dictation loss during transient API capacity issues. A sketch of the backoff loop follows this list.
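
A minimal sketch of the TF-273 backoff loop, assuming a hypothetical callClaude wrapper and treating HTTP 529 as the only retryable status:

```typescript
// Hypothetical stand-in for the extension's actual Claude API wrapper.
declare function callClaude(prompt: string): Promise<string>;

// Retry up to 3 times on HTTP 529 (overloaded), waiting 2s, 4s, then 8s.
async function callWithRetry(prompt: string, maxRetries = 3): Promise<string> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await callClaude(prompt);
    } catch (err: unknown) {
      const status = (err as { status?: number }).status;
      if (status !== 529 || attempt >= maxRetries) {
        throw err; // non-retryable error, or retries exhausted
      }
      const delayMs = 2000 * 2 ** attempt; // 2s → 4s → 8s
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```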
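
And a sketch of the TF-260 audio path: ffmpeg writes raw PCM to stdout, which is forwarded chunk-by-chunk over Deepgram's streaming WebSocket. This assumes the ws package and macOS capture; the device string, sample rate, and response handling are illustrative, not the extension's exact values:

```typescript
import { spawn } from "node:child_process";
import WebSocket from "ws";

// Capture 16 kHz mono 16-bit PCM from the default macOS microphone.
// The input flags differ per platform (avfoundation / pulse / dshow).
const ffmpeg = spawn("ffmpeg", [
  "-f", "avfoundation", "-i", ":0",
  "-f", "s16le", "-ar", "16000", "-ac", "1",
  "pipe:1",
]);

const dg = new WebSocket(
  "wss://api.deepgram.com/v1/listen?model=nova-3&encoding=linear16&sample_rate=16000",
  { headers: { Authorization: `Token ${process.env.DEEPGRAM_API_KEY}` } },
);

dg.on("open", () => {
  // Forward each PCM chunk to Deepgram as it arrives from ffmpeg.
  ffmpeg.stdout.on("data", (chunk: Buffer) => dg.send(chunk));
});

dg.on("message", (raw) => {
  const msg = JSON.parse(raw.toString());
  const transcript = msg.channel?.alternatives?.[0]?.transcript;
  // Completed utterances would be handed to Claude cleanup, then inserted.
  if (msg.is_final && transcript) {
    console.log(transcript);
  }
});
```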

Changed

  • Deepgram Consolidation (TF-272): Replaced the OpenAI Whisper API with the Deepgram Nova-3 pre-recorded API for single-shot dictation. Both single-shot and continuous now use Deepgram (shared API key). The openai npm package is retained for embeddings only. Cost reduced from $0.006/min (Whisper) to $0.0043/min (Deepgram Nova-3). A sketch of the pre-recorded call follows this list.
  • Deepgram keyterm token limit reduced to prevent request failures with large glossaries
  • Language configuration passed through from transcription service to Deepgram API
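
For the single-shot path, the pre-recorded API is a plain HTTPS POST. A hedged sketch using Node's built-in fetch; the query parameters and content type are illustrative:

```typescript
import { readFile } from "node:fs/promises";

// Transcribe a finished recording via Deepgram's pre-recorded endpoint.
async function transcribeFile(path: string): Promise<string> {
  const audio = await readFile(path);
  const res = await fetch(
    "https://api.deepgram.com/v1/listen?model=nova-3&language=multi",
    {
      method: "POST",
      headers: {
        Authorization: `Token ${process.env.DEEPGRAM_API_KEY}`,
        "Content-Type": "audio/wav",
      },
      body: audio,
    },
  );
  if (!res.ok) throw new Error(`Deepgram request failed: ${res.status}`);
  const json = await res.json();
  return json.results?.channels?.[0]?.alternatives?.[0]?.transcript ?? "";
}
```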

Fixed

  • API key deletion now checks for HTTP 401/403 status codes instead of deleting on any transcription error
  • Undo records cleared correctly after new editor edits to prevent stale undo operations
  • History persistence errors handled gracefully without disrupting the dictation pipeline
  • Resource leak fixed when continuous recorder fails to start (status bar listener now disposed)
  • Unreachable auth-detection code removed from continuous segment error handler
  • Duplicate historyManager import statements removed

[0.4.0] - 2026-03-02

Added

  • API Key Management: dictation.manageApiKeys command to view (masked), update, or delete stored OpenAI and Anthropic API keys via the Command Palette.
  • LLM Cost Tracking & Overview (TF-270): dictation.showCostOverview command opens a WebView panel showing per-model API usage costs. Tracks Whisper transcription (by audio duration), Claude processing (by input/output tokens), and OpenAI Embeddings (by prompt tokens). Costs displayed per session and accumulated across all sessions, grouped by provider (OpenAI / Anthropic) in a card layout with VS Code theme support. Total costs reset automatically on the 1st of each month; older records are retained in storage.
  • Adaptive Personal Dictionary (TF-263): dictation.generateGlossary command scans workspace for project-specific terms (package names, class/interface/function names, README/CLAUDE.md headings and bold terms). Users review suggestions via Multi-Select Quick Pick before merging into .verba-glossary.json. Supports TypeScript, Java, and Python projects.
  • Multi-Cursor / Selection-aware Dictation (TF-265): Dictation now respects editor selections and multiple cursors. When text is selected, the dictated output replaces the selection; with multiple cursors, text is inserted at all cursor positions simultaneously. Selected text is passed as context to Claude post-processing via <selection> tags. New "Transform Selection" default template for voice-driven text transformation (e.g. translate, refactor, explain). An insertion sketch appears after this list.
  • Text Expansion / Abbreviations (TF-262): User-defined abbreviations that are automatically expanded during post-processing. Configure via the verba.expansions setting (e.g. "mfg" → "Mit freundlichen Grüssen") or workspace-specific .verba-expansions.json. Global and workspace expansions are merged, with workspace entries taking precedence for the same abbreviation (see the merge sketch after this list).
  • File-Type-Aware Templates (TF-259): Templates can now define a fileTypes array with VS Code language IDs (e.g. ["java", "kotlin"]). When verba.autoSelectTemplate is enabled (default), the template matching the active editor's file type is automatically selected. Falls back to the last manually chosen template when no match is found. Built-in defaults: JavaDoc → java/kotlin, Markdown → markdown.
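
A minimal sketch of the TF-265 insertion logic against the standard VS Code editor API; the replace-vs-insert split is simplified:

```typescript
import * as vscode from "vscode";

// Insert dictated text at every cursor; where a selection exists,
// replace it instead. A single edit() call keeps this one undo step.
async function insertDictation(editor: vscode.TextEditor, text: string): Promise<void> {
  await editor.edit((edit) => {
    for (const selection of editor.selections) {
      if (selection.isEmpty) {
        edit.insert(selection.active, text);
      } else {
        edit.replace(selection, text);
      }
    }
  });
}
```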
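
And the TF-262 merge is essentially object spread with the workspace map last; the type and function names here are illustrative:

```typescript
type Expansions = Record<string, string>;

// Merge global (verba.expansions) and workspace (.verba-expansions.json)
// abbreviation maps; spreading the workspace map last gives it precedence.
function mergeExpansions(global: Expansions, workspace: Expansions): Expansions {
  return { ...global, ...workspace };
}

// mergeExpansions({ mfg: "Mit freundlichen Grüssen" }, { mfg: "MFG" })
// yields { mfg: "MFG" }: the workspace entry wins.
```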

Changed

  • Release automation migrated from manual workflow to release-please for automatic versioning, changelog generation, and GitHub Releases

[0.3.0] - 2026-02-26

Added

  • Offline Transcription (TF-257): Local transcription via the whisper.cpp CLI as an alternative to the OpenAI Whisper API. Audio never leaves the machine. Strategy pattern on TranscriptionService with setProvider('openai'|'local'); see the sketch after this list.
  • Model Download: GGML models from Hugging Face via dictation.downloadModel command with progress indicator and cancellation support. Model selection: tiny, base, small, medium, large.
  • Provider Display in Status Bar: Tooltip shows active provider (OpenAI Whisper / Local whisper.cpp), transcribing state displays provider explicitly.
  • Streaming Post-Processing: Claude responses are received via streaming with real-time progress display in the status bar (e.g. "Processing... 182 chars"). Dictation can be cancelled during processing by pressing the shortcut again.
  • Course Correction: Self-corrections in dictation are automatically detected and removed (e.g. "no wait, actually Friday" → "Friday"). Active in all modes (freeform and templates).
  • Voice Commands: Spoken formatting commands are recognized and applied (e.g. "New paragraph", "Period", "Bullet point"). Works language-independently in all modes.
  • Glossary/Dictionary: Terms (product names, technical terms, abbreviations) are preserved exactly during transcription and cleanup. Global terms via verba.glossary setting, project-specific via .verba-glossary.json.
  • JSDoc Documentation: All public APIs across 13 source files documented.
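
The TF-257 provider switch is a textbook strategy pattern. An outline with hypothetical strategy class names and placeholder bodies:

```typescript
interface TranscriptionStrategy {
  transcribe(audioPath: string): Promise<string>;
}

// Hypothetical concrete strategies; real bodies elided to placeholders.
class WhisperApiStrategy implements TranscriptionStrategy {
  async transcribe(audioPath: string): Promise<string> {
    /* POST audio to the OpenAI Whisper API */ return "";
  }
}

class WhisperCppStrategy implements TranscriptionStrategy {
  async transcribe(audioPath: string): Promise<string> {
    /* spawn the local whisper-cli binary */ return "";
  }
}

class TranscriptionService {
  private strategy: TranscriptionStrategy = new WhisperApiStrategy();

  setProvider(provider: "openai" | "local"): void {
    this.strategy = provider === "openai"
      ? new WhisperApiStrategy()
      : new WhisperCppStrategy();
  }

  transcribe(audioPath: string): Promise<string> {
    return this.strategy.transcribe(audioPath);
  }
}
```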

Fixed

  • SIGKILL escalation for hanging whisper-cli processes (sketch after this list)
  • Provider validation with fallback for invalid settings
  • Minimum file size check after model download
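
A sketch of the SIGTERM-then-SIGKILL escalation; the 5-second grace period is an assumed value, not necessarily the extension's:

```typescript
import type { ChildProcess } from "node:child_process";

// Ask the process to exit cleanly, then force-kill if it ignores SIGTERM.
function killWithEscalation(proc: ChildProcess, graceMs = 5000): void {
  proc.kill("SIGTERM");
  const timer = setTimeout(() => {
    // Still running: neither exited normally nor died from a signal.
    if (proc.exitCode === null && proc.signalCode === null) {
      proc.kill("SIGKILL");
    }
  }, graceMs);
  proc.once("exit", () => clearTimeout(timer));
}
```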

Changed

  • Marketplace homepage link updated to talent-factory.xyz
  • Marketplace category changed from Other to Snippets

[0.2.0] - 2026-02-24

Added

  • Configurable Prompt Templates (TF-248): 5 default templates (Freeform, Commit Message, JavaDoc, Markdown, Email) with full customization via verba.templates setting
  • Context-Aware Dictation with semantic code search via grepai or OpenAI Embeddings
  • 3 context-aware templates: Code Comment, Explain Code, Claude Code Prompt
  • dictation.indexProject command to build a local embeddings index for context search
  • verba.contextSearch.provider setting (auto, grepai, openai) and verba.contextSearch.maxResults setting
  • GitHub Actions changelog preview workflow: posts a categorized changelog preview as a PR comment when opening PRs to main
  • Template Quick-Pick selection menu integrated into dictation workflow
  • Template auto-reuse: last used template is automatically reused, Quick Pick only on first use
  • dictation.selectTemplate command (Cmd+Alt+T / Ctrl+Alt+T) to change template without starting a recording
  • Status bar shows active template name in idle state
  • Pipeline context architecture for dynamic system prompt passing to Claude
  • Cross-Platform Audio Device Selection: macOS (AVFoundation), Linux (PulseAudio), Windows (DirectShow); see the sketch after this list
  • dictation.selectAudioDevice command for microphone selection on any platform
  • verba.audioDevice setting for manual microphone configuration
  • Windows-specific ffmpeg search paths (Chocolatey, Scoop, WinGet, Program Files)
  • PowerShell fallback (Win32_SoundDevice) when ffmpeg finds no Windows audio devices
  • Marketplace Publishing: extension icon (SVG/PNG), screenshots, workflow GIF
  • Open-source governance files: MIT License, Code of Conduct, Security Policy
  • Semantic-release configuration with emoji-aware conventional commit parsing
  • GitHub Actions release workflow for automated versioning on merge to main
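
The cross-platform device selection above reduces to picking ffmpeg's capture backend per OS. A sketch; the device name formats follow ffmpeg's documented conventions, but the helper itself is illustrative:

```typescript
import os from "node:os";

// Map the current platform to ffmpeg's audio capture input arguments.
function ffmpegInputArgs(device: string): string[] {
  switch (os.platform()) {
    case "darwin":
      return ["-f", "avfoundation", "-i", `:${device}`]; // e.g. ":0"
    case "linux":
      return ["-f", "pulse", "-i", device];              // PulseAudio source name
    case "win32":
      return ["-f", "dshow", "-i", `audio=${device}`];   // DirectShow device name
    default:
      throw new Error(`Unsupported platform: ${os.platform()}`);
  }
}
```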

Fixed

  • Template shortcut changed from Cmd+Shift+T to Cmd+Alt+T to avoid conflict with VS Code's built-in shortcut
  • CHANGELOG.md included in VSIX package for the Marketplace Changelog tab
  • Git identity configured for GitHub Actions release commits (github-actions[bot])
  • Extension bundled with esbuild to fix activation failure
  • Transcript wrapped in XML tags to prevent Claude from generating conversational responses instead of clean output (sketch after this list)
  • Terminal focus detection rewritten with two separate commands and mutually exclusive when clauses
  • Template prompt wrapped with framing context for reliable output format
  • Windows audio recording: robust device detection supporting both ffmpeg v7 (section-based) and v8+ (inline "(audio)" marker) output formats
  • Audio device selection enabled for macOS and Linux (previously only Windows)
  • make dev workflow compatible with Windows/Cygwin
  • install:local npm script now works cross-platform
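
The XML-tag fix is a small prompt-shaping trick: delimiting the transcript marks it as data to clean up rather than a message to answer. A minimal sketch:

```typescript
// Wrap the raw transcript so Claude treats it as input to transform,
// not a conversational turn to reply to.
function buildUserPrompt(transcript: string): string {
  return `<transcript>\n${transcript}\n</transcript>`;
}
```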

[0.1.1] - 2026-02-20

Added

  • Terminal dictation: dictation.startFromTerminal command inserts dictated text into the active terminal
  • verba.terminal.executeCommand setting to auto-execute dictated text as a terminal command
  • Shared keybinding Cmd+Shift+D / Ctrl+Shift+D with context-aware routing (terminal vs editor)
  • Whisper silence/hallucination detection: recordings transcribed as only dots/ellipses are rejected (sketch after this list)
  • Debug logging throughout the transcription and cleanup pipeline
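
The hallucination guard can be approximated with a character-class check; the exact pattern is assumed:

```typescript
// Reject transcripts consisting only of dots, ellipses, and whitespace,
// a common Whisper hallucination on silent recordings.
function isHallucinatedSilence(transcript: string): boolean {
  return /^[\s.…]*$/.test(transcript);
}

// isHallucinatedSilence("...")    → true
// isHallucinatedSilence("… …")    → true
// isHallucinatedSilence("Hello.") → false
```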