Cora AI
About this project
A real-time AI voice assistant that orchestrates speech-to-text, LLM inference, and text-to-speech into a single low-latency pipeline for fluid voice conversations. Features wake-word activation, three conversation modes (interruptible, patient, and text), and a model-agnostic architecture supporting Claude, GPT, Gemini, and local models.
Architecture Overview
Microphone → Wake Word Engine (Picovoice, runs locally) → Streaming STT (Deepgram Nova-2) → LLM with streaming inference (Claude/GPT/Gemini) → Streaming TTS (ElevenLabs) → Speaker. A Turn Manager enforces user-first priority across all stages, with conversation mode logic controlling interrupt behavior.
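The stage chain above can be sketched as a streaming function composition. This is a minimal illustration, not Cora's actual code; all names (run_pipeline, the stub stages) are assumptions, and the stubs stand in for the real Deepgram/LLM/ElevenLabs clients so the sketch runs offline.

```python
# Minimal sketch of the STT -> LLM -> TTS stage chain.
# All names are illustrative, not the project's real API.
def run_pipeline(audio_frames, stt, llm, tts):
    """Stream captured audio through the three stages, yielding audio out."""
    transcript = stt(audio_frames)         # audio frames -> text
    for text_chunk in llm(transcript):     # text -> streamed response chunks
        yield tts(text_chunk)              # each chunk -> synthesized audio

# Stub stages so the sketch runs without any real services or API keys.
stt = lambda frames: "hello cora"
llm = lambda text: iter(["Hi", " there"])
tts = lambda chunk: chunk.encode()

out = list(run_pipeline([b"..."], stt, llm, tts))
```

The key design point is that downstream stages consume partial output as soon as it exists, rather than waiting for the previous stage to finish.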
Key Features
Voice Pipeline
End-to-end streaming pipeline — Deepgram Nova-2 for speech-to-text, Claude/GPT for inference, ElevenLabs for text-to-speech — targeting sub-1-second total latency.
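A sub-1-second target only works if each stage is measured by time-to-first-output, not time-to-completion. The per-stage numbers below are illustrative assumptions for such a budget, not measured figures from the project.

```python
# Illustrative time-to-first-output budget per stage (assumed numbers,
# not measurements). Wake-word detection is local and always-on, so it
# does not sit on the response path.
budget_ms = {
    "stt_first_partial": 200,   # first partial transcript from STT
    "llm_first_token": 400,     # first streamed token from the LLM
    "tts_first_audio": 250,     # first audio chunk from TTS
}
total_ms = sum(budget_ms.values())
```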
Interruptible Conversations
Speak mid-response and Cora immediately stops, listens, and responds with full context of what it already said — mimicking natural human conversation.
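One way to keep "full context of what it already said" is to commit only the spoken prefix to history when an interrupt lands, discarding the unplayed remainder of the planned reply. The class below is a hypothetical sketch of that bookkeeping, not the project's actual TurnManager.

```python
# Sketch of interrupt handling: on user barge-in, stop playback and
# record only the text that was actually spoken. Names are illustrative.
class TurnManager:
    def __init__(self):
        self.history = []
        self.spoken = ""          # response text actually played so far

    def on_tts_chunk(self, text):
        self.spoken += text       # called as each TTS chunk is played

    def on_user_interrupt(self, user_text):
        # Keep what Cora really said, not the full planned reply,
        # so follow-up turns see an accurate conversation state.
        if self.spoken:
            self.history.append(("assistant", self.spoken))
        self.spoken = ""
        self.history.append(("user", user_text))

tm = TurnManager()
tm.on_tts_chunk("The capital of France")
tm.on_user_interrupt("wait, what about Spain?")
```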
Model Agnostic
Swap the underlying LLM (Claude, GPT-4o, Gemini, or local models via Ollama) without rebuilding the pipeline. The architecture is designed to be provider-independent.
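Provider independence usually comes down to one narrow streaming interface that every backend implements. The interface and the EchoProvider below are hypothetical stand-ins to show the shape of such an abstraction; the project's real one may differ.

```python
# Hypothetical provider abstraction: swap backends without touching
# the pipeline. EchoProvider is a stub so the sketch runs offline.
from abc import ABC, abstractmethod
from typing import Iterator

class LLMProvider(ABC):
    @abstractmethod
    def stream(self, messages: list[dict]) -> Iterator[str]:
        """Yield response chunks for a chat-style message list."""

class EchoProvider(LLMProvider):
    """Stand-in backend: echoes the last user message word by word."""
    def stream(self, messages):
        yield from messages[-1]["content"].split()

def reply(provider: LLMProvider, messages):
    # The pipeline only ever sees the interface, never a vendor SDK.
    return " ".join(provider.stream(messages))

out = reply(EchoProvider(), [{"role": "user", "content": "hello world"}])
```

A Claude, GPT, Gemini, or Ollama adapter would each implement stream() over its own SDK, leaving the rest of the pipeline untouched.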
Wake Word Activation
Always-on local wake word detection via Picovoice Porcupine. No audio leaves the device until activation — privacy by design.
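The privacy property can be expressed as a gate in front of the network: frames are dropped on-device until the detector fires. The sketch below uses a plain callable where Porcupine's per-frame keyword check would sit; everything here is illustrative.

```python
# Sketch of privacy gating: no audio is forwarded until the local
# wake-word detector fires. detect() stands in for the real
# frame-level keyword check (e.g. Porcupine's process loop).
def gate(frames, detect):
    active = False
    for frame in frames:
        if not active and detect(frame):
            active = True
            continue            # the wake frame itself is not forwarded
        if active:
            yield frame         # only post-activation audio leaves the device

frames = ["noise", "noise", "hey cora", "what's the weather"]
sent = list(gate(frames, detect=lambda f: f == "hey cora"))
```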
Three Conversation Modes
Interruptible mode for fast-paced Q&A, Patient mode for long instructions, and Text mode for traditional chat — all sharing one unified conversation history.
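The three modes differ mainly in their barge-in policy while sharing one history. The mode names mirror the feature list; the policy mapping below is an assumption about how each mode treats user speech mid-response.

```python
# Illustrative mode table: names match the feature list; the interrupt
# policy per mode is an assumption, not confirmed project behavior.
from enum import Enum

class Mode(Enum):
    INTERRUPTIBLE = "interruptible"  # user speech cancels TTS immediately
    PATIENT = "patient"              # waits out long pauses before replying
    TEXT = "text"                    # typed chat over the same history

def allows_barge_in(mode: Mode) -> bool:
    """Only interruptible mode lets user speech cut off playback."""
    return mode is Mode.INTERRUPTIBLE
```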
Cross-Platform
Desktop app (Electron), mobile app, and headless CLI mode. Minimal always-on-top floating window with waveform visualizer and status indicators.