Architecture

Verba follows a pipeline architecture where each stage transforms the data before passing it to the next.

Processing Pipeline

Microphone → ffmpeg (WAV) → Deepgram API → Claude API → Editor/Terminal
                                            (Template)

Recording — ffmpeg captures audio from the microphone as a WAV file.
Transcription — The WAV file is sent to Deepgram's Nova-3 API, which returns raw text.
Post-Processing — The transcript is sent to Claude with the active template's prompt. Context-aware templates include code snippets from the semantic search.
Insertion — The processed text is inserted at the cursor position in the editor, or pasted into the terminal.
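
The four stages above can be sketched as a chain of async functions, each consuming the previous stage's output. This is a minimal illustration and not the actual `pipeline.ts`: the `Stages` interface, `runPipeline`, and the stub implementations are hypothetical names, and the stubs return canned strings where the real modules would call ffmpeg, Deepgram, and Claude.

```typescript
// Hypothetical stage signatures; the real modules wrap ffmpeg, Deepgram, and Claude.
interface Stages {
  record: () => Promise<string>; // resolves to the path of the recorded WAV file
  transcribe: (wavPath: string) => Promise<string>;
  postProcess: (transcript: string, templatePrompt: string) => Promise<string>;
  insert: (text: string) => Promise<void>;
}

// Runs the stages in order and returns the text that was inserted.
async function runPipeline(stages: Stages, templatePrompt: string): Promise<string> {
  const wavPath = await stages.record();
  const transcript = await stages.transcribe(wavPath);
  const processed = await stages.postProcess(transcript, templatePrompt);
  await stages.insert(processed);
  return processed;
}

// Stub stages so the sketch runs without any external service.
const inserted: string[] = [];
const stubStages: Stages = {
  record: async () => "/tmp/recording.wav",
  transcribe: async (wavPath) => `raw text from ${wavPath}`,
  postProcess: async (transcript, prompt) => `[${prompt}] ${transcript}`,
  insert: async (text) => {
    inserted.push(text);
  },
};
```

Passing the stages in as an object keeps the orchestration testable: calling `runPipeline(stubStages, "cleanup")` exercises the ordering without a microphone or API keys.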

Module Overview

Module	Responsibility
`extension.ts`	Extension entry point, command registration, activation
`recorder.ts`	ffmpeg child process for audio recording (macOS/Linux/Windows)
`transcriptionService.ts`	Transcription via Deepgram pre-recorded API or local whisper.cpp CLI, with glossary hints
`cleanupService.ts`	Anthropic Claude API integration (streaming, course correction, voice commands, glossary, text expansions)
`pipeline.ts`	Processing stage orchestration
`templatePicker.ts`	Quick Pick menu for template selection with auto-reuse
`insertText.ts`	Text insertion into editor or terminal (multi-cursor, selection replacement)
`statusBarManager.ts`	Status bar display (Idle/Recording/Transcribing/Processing with character counter)
`costTracker.ts`	API usage cost tracking with persistence via globalState
`costOverviewPanel.ts`	WebView panel for cost overview (card layout, session/total toggle)
`wavDuration.ts`	WAV file duration calculation from PCM header (for Deepgram cost tracking)
`glossaryGenerator.ts`	Scans workspace for project-specific glossary terms (metadata, symbols, docs)
`historyManager.ts`	Dictation history with globalState persistence and full-text search
`historyCommands.ts`	Quick Pick UI for browsing, searching, and acting on history entries
`continuousRecorder.ts`	Deepgram WebSocket streaming, ffmpeg audio capture, EventEmitter
`undoManager.ts`	Single-level undo for dictation insertions (editor + terminal)
`contextProvider.ts`	Unified context search abstraction
`grepaiProvider.ts`	grepai CLI wrapper for semantic code search
`embeddingService.ts`	Embedding generation with OpenAI text-embedding-3-small for the local vector index
`indexer.ts`	File chunking and incremental index updates
`vectorStore.ts`	In-memory vector store with cosine similarity search
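
As an illustration of what `wavDuration.ts` is described as doing, the sketch below derives a duration from a PCM WAV header by dividing the `data` chunk size by the byte rate stored in the `fmt ` chunk. It is an assumption-laden sketch and not the module's actual code: `wavDurationSeconds` and `estimateDeepgramCost` are hypothetical names, and the default rate is the $0.0043/min figure quoted in this document.

```typescript
// Reads a four-character chunk ID at the given byte offset.
function chunkId(view: DataView, offset: number): string {
  return String.fromCharCode(
    view.getUint8(offset),
    view.getUint8(offset + 1),
    view.getUint8(offset + 2),
    view.getUint8(offset + 3),
  );
}

// Returns the duration in seconds of an uncompressed PCM WAV file (hypothetical helper).
function wavDurationSeconds(bytes: Uint8Array): number {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  if (bytes.byteLength < 12 || chunkId(view, 0) !== "RIFF" || chunkId(view, 8) !== "WAVE") {
    throw new Error("not a RIFF/WAVE file");
  }
  let byteRate = 0;
  let offset = 12; // the first chunk follows the 12-byte RIFF header
  while (offset + 8 <= view.byteLength) {
    const id = chunkId(view, offset);
    const size = view.getUint32(offset + 4, true); // chunk sizes are little-endian
    if (id === "fmt ") {
      // The byte rate sits 8 bytes into the fmt chunk body.
      byteRate = view.getUint32(offset + 8 + 8, true);
    } else if (id === "data") {
      if (byteRate === 0) throw new Error("data chunk found before a valid fmt chunk");
      return size / byteRate;
    }
    offset += 8 + size + (size % 2); // chunks are padded to an even length
  }
  throw new Error("no data chunk found");
}

// Estimates transcription cost in USD; the default rate is taken from this document.
function estimateDeepgramCost(seconds: number, usdPerMinute: number = 0.0043): number {
  return (seconds / 60) * usdPerMinute;
}
```

Only the header is inspected, so the audio samples themselves never need to be decoded to produce a cost estimate.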

Context-Aware Pipeline

For context-aware templates, the pipeline includes an additional step before post-processing:

Transcript → Context Search → Claude API (transcript + code snippets) → Result
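
One way to picture the extra step is as prompt assembly: the snippets returned by the context search are placed between the template prompt and the transcript before the request goes to Claude. The sketch below is a guess at the shape of that step, not Verba's actual prompt format; `CodeSnippet`, `buildContextPrompt`, and the section labels are hypothetical.

```typescript
// Hypothetical shape of a search hit; the real provider output may differ.
interface CodeSnippet {
  filePath: string;
  content: string;
}

// Combines the template prompt, retrieved snippets, and transcript into one message.
function buildContextPrompt(
  templatePrompt: string,
  transcript: string,
  snippets: CodeSnippet[],
): string {
  const context = snippets
    .map((s) => `File: ${s.filePath}\n${s.content}`)
    .join("\n\n");
  const parts = [templatePrompt];
  // Skip the context section entirely when the search returned nothing.
  if (context.length > 0) parts.push(`Relevant code:\n${context}`);
  parts.push(`Transcript:\n${transcript}`);
  return parts.join("\n\n");
}
```

When the search returns no snippets, the result reduces to the template prompt plus the transcript, which matches the plain (non-context-aware) pipeline.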

The context search uses one of two providers:

grepai — External CLI tool that provides semantic search over the codebase.
OpenAI Embeddings — Local vector store built from chunked project files, queried via cosine similarity.
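
For the embeddings provider, the cosine similarity query can be sketched as below. This is a minimal in-memory version under stated assumptions: `IndexedChunk`, `cosineSimilarity`, and `topK` are hypothetical names, the vectors are tiny hand-written examples in place of real embeddings, and the actual `vectorStore.ts` may store and rank chunks differently.

```typescript
// Hypothetical record for one chunk of a project file and its embedding.
interface IndexedChunk {
  id: string;
  vector: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|); returns 0 for zero-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns the k chunks most similar to the query vector, best match first.
function topK(query: number[], chunks: IndexedChunk[], k: number): IndexedChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.chunk);
}
```

A linear scan like this is simple and adequate for a modest number of chunks; a larger index would typically call for an approximate nearest-neighbor structure instead.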

Transcription Provider

Verba uses Deepgram Nova-3 for cloud transcription in both single-shot and continuous mode. It replaced OpenAI Whisper after a systematic evaluation of 7 providers: Whisper's tendency to hallucinate on short audio segments made it unreliable for continuous dictation. Deepgram was chosen for its built-in voice activity detection (VAD), WebSocket streaming, lower cost ($0.0043/min vs. $0.006/min for Whisper), and minimal hallucinations.

Local offline transcription via whisper.cpp remains available as an alternative.

For the full evaluation and decision rationale, see ADR: Deepgram Migration.

Cross-Platform Audio

The `recorder.ts` module handles platform differences:

Platform	Audio Framework	Device Listing
macOS	AVFoundation	`ffmpeg -f avfoundation -list_devices`
Linux	PulseAudio	`pactl list sources`
Windows	DirectShow	`ffmpeg -f dshow -list_devices` + PowerShell fallback
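
The platform split in the table could translate into ffmpeg input arguments roughly as follows. This is a sketch and not the contents of `recorder.ts`: `buildRecordArgs` is a hypothetical helper, the platform keys mirror the values of Node's `process.platform`, and the mono 16 kHz output settings are assumptions chosen for illustration.

```typescript
// Platform keys match the values Node reports in process.platform.
type Platform = "darwin" | "linux" | "win32";

// ffmpeg input arguments per platform, keyed by audio framework.
const inputArgs: Record<Platform, (device: string) => string[]> = {
  // AVFoundation takes "[video]:[audio]"; a leading colon selects audio only.
  darwin: (device) => ["-f", "avfoundation", "-i", `:${device}`],
  // PulseAudio accepts a source name such as "default".
  linux: (device) => ["-f", "pulse", "-i", device],
  // DirectShow expects the device name prefixed with "audio=".
  win32: (device) => ["-f", "dshow", "-i", `audio=${device}`],
};

// Builds the argument list for recording one device to a WAV file.
function buildRecordArgs(platform: Platform, device: string, outPath: string): string[] {
  // Assumed output settings: mono (-ac 1), 16 kHz (-ar 16000), overwrite existing file (-y).
  return [...inputArgs[platform](device), "-ac", "1", "-ar", "16000", "-y", outPath];
}
```

Returning an argument array instead of a command string means the list can be handed directly to a child process spawn call, so device names containing spaces need no shell quoting.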