# Architecture Overview
Engram uses a hub-and-spoke design. A Python/FastAPI backend acts as the central hub, orchestrating modular "spoke" components through a state machine. A React frontend communicates with the backend via REST API and WebSocket for real-time updates.
## High-Level Diagram

```
                        React Dashboard
       (Dashboard, Review Queue, History, Config Wizard)
                               |
                    WebSocket (with heartbeat)
                               |
                        FastAPI Backend
                               |
                  JobManager (thin orchestrator)
                               |
           +-------------------+-------------------+
           |                   |                   |
    Identification          Matching          Finalization
     Coordinator           Coordinator         Coordinator
   (scan, classify,      (subtitle DL,     (conflict resolution,
    DiscDB, TMDB,         fingerprint,      file organization,
    AI fallback)          DiscDB assign)    review workflow)
           |                   |                   |
           +-------------------+-------------------+
                     |                   |
              CleanupService      SimulationService
            (staging cleanup,      (test-only E2E
             DiscDB export)          simulation)
                               |
       +----------+---------+-----------+-----------+
       |          |         |           |           |
   Sentinel   Analyst   Extractor    Curator    Organizer
    (drive    (TV vs    (MakeMKV    (episode     (file
   monitor)   movie)    wrapper)    matching)  organization)
```
The `JobManager` is a thin orchestrator (~1,166 lines) that wires coordinators together and manages the job lifecycle. Each coordinator owns a focused stage of the pipeline. Every state transition is persisted in SQLite and broadcast to connected clients over WebSocket.
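The orchestration pattern can be sketched as follows. All class and method names here are simplified stand-ins for illustration, not the real coordinator interfaces:

```python
import asyncio

# Hypothetical sketch of a thin orchestrator wiring stage coordinators
# together. The real JobManager's interfaces and job model differ.
class IdentificationCoordinator:
    async def identify(self, job: dict) -> dict:
        job["content_type"] = "tv"  # stand-in for DiscDB/TMDB/AI lookup
        return job

class MatchingCoordinator:
    async def match(self, job: dict) -> dict:
        job["matches"] = ["S01E01"]  # stand-in for fingerprint matching
        return job

class FinalizationCoordinator:
    async def finalize(self, job: dict) -> dict:
        job["state"] = "completed"  # stand-in for organization + completion
        return job

class JobManager:
    """Thin orchestrator: owns no stage logic, only the pipeline order."""

    def __init__(self) -> None:
        self.identification = IdentificationCoordinator()
        self.matching = MatchingCoordinator()
        self.finalization = FinalizationCoordinator()

    async def run(self, job: dict) -> dict:
        job = await self.identification.identify(job)
        job = await self.matching.match(job)
        return await self.finalization.finalize(job)

job = asyncio.run(JobManager().run({"disc": "DISC_01", "state": "idle"}))
print(job["state"])  # completed
```

The point of the "thin" design is that the orchestrator stays swappable and testable: each coordinator can be exercised in isolation, and the pipeline order lives in exactly one place.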
## Backend

The backend lives in `backend/app/` and is built with FastAPI.

- **Entry point** (`main.py`) -- FastAPI app with lifespan management, CORS configured for the Vite dev server, and a WebSocket endpoint at `/ws`.
- **Config** (`config.py`) -- Pydantic Settings for server-level overrides (host, port, debug). No `.env` file is required; all fields have defaults.
- **Database** (`database.py`) -- Async SQLite via SQLModel + aiosqlite. Tables are auto-created on startup. Schema migration uses `ALTER TABLE ADD COLUMN` for additive changes (preserving job history) and drop/recreate only when columns are removed.
### Core Modules (backend/app/core/)

Each module maps to a stage in the disc processing pipeline.

| Module | File | Purpose |
|---|---|---|
| Sentinel | `sentinel.py` | Drive monitor. Polls optical drives on Windows using ctypes/kernel32. Fires async callbacks on disc insert/remove events. |
| Analyst | `analyst.py` | Disc classification. Heuristic-based TV vs Movie detection using cluster analysis of title durations. Outputs `DiscAnalysisResult` with content type, confidence score, and whether review is needed. |
| Extractor | `extractor.py` | MakeMKV CLI wrapper. Async subprocess management for `makemkvcon` scanning and ripping. Emits `RipProgress` callbacks. Tracks processes per job for multi-drive cancel isolation. |
| Curator | `curator.py` | Episode matching via audio fingerprinting. Classifies matches into high-confidence (auto-organize) and needs-review buckets. |
| Organizer | `organizer.py` | File organization. Moves files from staging to the media library with naming conventions: `Movies/Name (Year)/Name (Year).mkv` and `TV/Show/Season XX/Show - SXXEXX.mkv`. |
| DiscDB Classifier | `discdb_classifier.py` | TheDiscDB integration. Identifies discs via content hash fingerprinting (MD5 of concatenated BDMV/STREAM file sizes). Provides title-to-episode mappings. |
| TMDB Classifier | `tmdb_classifier.py` | TMDB-based content type classification. Uses name similarity and popularity ranking to provide strong TV vs Movie signals beyond the heuristic Analyst. |
| Errors | `errors.py` | Custom exception hierarchy (`EngramError` base, with `MakeMKVError`, `MatchingError`, `ConfigurationError`, `OrganizationError`, `SubtitleError`, `DatabaseError`). Includes `@handle_errors` decorator. |
| Logging | `logging.py` | Centralized logging configuration. |
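The content-hash fingerprint described for the DiscDB Classifier can be sketched like this. The sorted filename order and plain decimal size concatenation are assumptions; the exact format `discdb_classifier.py` uses may differ:

```python
import hashlib
from pathlib import Path

def content_hash(stream_dir: Path) -> str:
    """Sketch of a TheDiscDB-style disc fingerprint: MD5 over the
    concatenated sizes of the files under BDMV/STREAM.

    Assumptions (not verified against discdb_classifier.py):
    - files are taken in sorted filename order
    - sizes are concatenated as plain decimal strings
    """
    sizes = [
        str(p.stat().st_size)
        for p in sorted(stream_dir.iterdir())
        if p.is_file()
    ]
    return hashlib.md5("".join(sizes).encode("utf-8")).hexdigest()
```

Hashing file *sizes* rather than file contents keeps identification fast: the fingerprint is computed from directory metadata alone, with no need to read gigabytes of stream data.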
### Services (backend/app/services/)

| Service | File | Purpose |
|---|---|---|
| JobManager | `job_manager.py` | Thin orchestrator (~1,166 lines). Wires coordinators together, manages job lifecycle, handles drive events, and coordinates ripping. |
| IdentificationCoordinator | `identification_coordinator.py` | Disc scanning, DiscDB/TMDB/AI lookup, classification pipeline. |
| MatchingCoordinator | `matching_coordinator.py` | Episode matching, subtitle download, file readiness, DiscDB assignment, extras handling. Owns per-job caches (`_discdb_mappings`, `_episode_runtimes`). |
| FinalizationCoordinator | `finalization_coordinator.py` | Conflict resolution with cascading reassignment, file organization, review workflow, job completion. |
| CleanupService | `cleanup_service.py` | Staging directory cleanup, timed cleanup, TheDiscDB auto-export. |
| SimulationService | `simulation_service.py` | All simulation methods for E2E testing (only active when `DEBUG=true`). |
| JobStateMachine | `job_state_machine.py` | Explicit state machine with validated transitions: IDLE -> IDENTIFYING -> RIPPING -> MATCHING -> ORGANIZING -> COMPLETED, with REVIEW_NEEDED and FAILED branching from most states. Fires terminal-state callbacks. |
| EventBroadcaster | `event_broadcaster.py` | Abstraction layer for broadcasting events to WebSocket clients. Wraps `ConnectionManager` with typed, domain-specific methods for each event type. |
| ConfigService | `config_service.py` | Configuration service with helper functions for loading and updating config from the database. Caches a sync engine for performance. |
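The validated-transition idea behind `JobStateMachine` can be sketched as a plain allow-list map. The states come from the table above; the class shape and method names here are assumptions:

```python
# Sketch of a validated transition map over the states named above.
# The real JobStateMachine's API (method names, callback wiring) is assumed.
ALLOWED: dict[str, set[str]] = {
    "idle": {"identifying"},
    "identifying": {"ripping", "review_needed", "failed"},
    "ripping": {"matching", "review_needed", "failed"},
    "matching": {"organizing", "review_needed", "failed"},
    "organizing": {"completed", "review_needed", "failed"},
    "review_needed": {"identifying", "ripping", "matching", "organizing", "failed"},
    "completed": set(),  # terminal
    "failed": set(),     # terminal
}

class InvalidTransition(Exception):
    """Raised when a job attempts a transition not in the allow-list."""

class JobStateMachine:
    def __init__(self, state: str = "idle") -> None:
        self.state = state

    def transition(self, new_state: str) -> None:
        if new_state not in ALLOWED[self.state]:
            raise InvalidTransition(f"{self.state} -> {new_state}")
        self.state = new_state
```

Because every transition funnels through one `transition()` call, a bug elsewhere in the pipeline can at worst raise `InvalidTransition`; it can never silently put a job into a state the map does not allow.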
### Data Models (backend/app/models/)

- **DiscJob** -- Central entity with `JobState` enum (idle, identifying, review_needed, ripping, matching, organizing, completed, failed) and `ContentType` enum (tv, movie, unknown). Key fields include `cleared_at` (soft-delete from dashboard), `completed_at` (auto-set on terminal state), `content_hash` (TheDiscDB fingerprint), and `discdb_mappings_json` (persisted title mappings).
- **DiscTitle** -- Individual title/track on a disc, linked to a job by foreign key. Stores match results (episode code, confidence), `TitleState`, edition info, conflict resolution, and organization tracking.
- **AppConfig** -- Persisted application configuration including paths, API keys, and preferences.
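The enum values above can be sketched with stdlib enums; the actual SQLModel class definitions in `models/` will look different, but the values are taken from the description:

```python
from enum import Enum

# Values taken from the model description above; the stdlib Enum layout
# is a sketch, not the actual SQLModel definitions.
class JobState(str, Enum):
    IDLE = "idle"
    IDENTIFYING = "identifying"
    REVIEW_NEEDED = "review_needed"
    RIPPING = "ripping"
    MATCHING = "matching"
    ORGANIZING = "organizing"
    COMPLETED = "completed"
    FAILED = "failed"

class ContentType(str, Enum):
    TV = "tv"
    MOVIE = "movie"
    UNKNOWN = "unknown"

# completed_at is auto-set when a job reaches one of these.
TERMINAL_STATES = {JobState.COMPLETED, JobState.FAILED}
```

Subclassing `str` keeps the values JSON- and SQLite-friendly: they serialize as plain strings, which is what a `JobState` column stored in SQLite needs.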
### Matcher (backend/app/matcher/)

Integrated from the standalone mkv-episode-matcher project.

- **ASR** (`asr_provider.py`) -- Speech recognition using faster-whisper/onnxruntime.
- **Subtitle matching** (`subtitle_provider.py`) -- Matches transcribed audio against reference subtitles.
- **Core engine** (`core/engine.py`, `core/matcher.py`) -- Matching engine logic.
- **Subtitle sources** -- `addic7ed_client.py`, `opensubtitles_scraper.py`, `subtitle_utils.py`.
### API (backend/app/api/)

- `routes.py` -- REST endpoints under the `/api` prefix (job CRUD, review actions, config, simulation, staging management, job history, stats, diagnostics).
- `validation.py` -- Tool validation endpoints (`POST /api/validate/makemkv`, `POST /api/validate/ffmpeg`, `GET /api/detect-tools`).
- `test_routes.py` -- Standalone testing endpoints for subtitle download, transcription, and matching.
- `websocket.py` -- `ConnectionManager` singleton for broadcasting real-time updates to all connected clients.
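A minimal sketch of the broadcast-to-all-clients pattern behind `ConnectionManager`. Method names and the dead-connection handling are assumptions, not the actual `websocket.py`:

```python
import json

# Hypothetical sketch of a FastAPI-style WebSocket connection manager;
# the real ConnectionManager in websocket.py may differ in details.
class ConnectionManager:
    def __init__(self) -> None:
        self.active: list = []  # currently connected WebSocket objects

    async def connect(self, websocket) -> None:
        await websocket.accept()
        self.active.append(websocket)

    def disconnect(self, websocket) -> None:
        if websocket in self.active:
            self.active.remove(websocket)

    async def broadcast(self, event: dict) -> None:
        """Send one JSON-encoded event to every connected client."""
        message = json.dumps(event)
        for ws in list(self.active):  # copy: we may mutate while iterating
            try:
                await ws.send_text(message)
            except Exception:
                self.disconnect(ws)  # drop connections that fail to send

ws_manager = ConnectionManager()  # module-level singleton, as described below
```

Every state change funneling through one broadcast method is what lets `EventBroadcaster` wrap it with typed, per-event helpers instead of scattering raw `send_text` calls through the services.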
## Frontend

The frontend is a React 18 + TypeScript + Vite single-page application located in `frontend/src/`. Vite proxies `/api` and `/ws` requests to the backend at `localhost:8000` during development.
Key libraries: React Router v7, Framer Motion, Recharts, React Hook Form, Tailwind CSS v4, shadcn/ui, Lucide React, Sonner.
| Component | Location | Purpose |
|---|---|---|
| Dashboard | `app/App.tsx` | Filterable job card list (Active, Done, All) with real-time progress, speed/ETA, cover art, and browser notifications. Cyberpunk dual-tone cyan/magenta theme. |
| DiscCard | `app/components/DiscCard.tsx` | Main job display component with subcomponents for media type badge, disc metadata, action buttons, and poster image hook. |
| ReviewQueue | `components/ReviewQueue.tsx` | Human-in-the-loop UI for resolving ambiguous episode matches and movie edition selection. |
| HistoryPage | `components/HistoryPage.tsx` | All completed/failed jobs with stats dashboard, filterable table, slide-out detail panel, and deep-linking via `/history/:jobId`. |
| ConfigWizard | `components/ConfigWizard.tsx` | First-run setup and settings modal for library paths, MakeMKV license, TMDB token, and preferences. |
| Supporting | Various | `StateIndicator`, `CyberpunkProgressBar`, `TrackGrid`, `MatchingVisualizer`, `NamePromptModal`. |
Hooks: `useJobManagement` (job lifecycle + WebSocket), `useDiscFilters` (job filtering/transformation), `useWebSocket` (connection management with auto-reconnect).
## Database

Engram uses async SQLite via SQLModel + aiosqlite. The database file is stored at `backend/engram.db` in development mode.
### Schema Migration

Engram uses Alembic for versioned database migrations with SQLModel metadata autogeneration.

- **Startup behavior**: `init_db()` runs `create_all` for new databases, then stamps or upgrades via Alembic.
- **Existing databases**: Auto-stamped at head on first startup after upgrading to Alembic.
- **AppConfig**: Data preserved via backup/restore during schema changes (independent of Alembic).
- **SQLite batch mode**: `render_as_batch=True` enables ALTER TABLE operations that SQLite doesn't natively support.
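The batch-mode flag lives in Alembic's `env.py`. A sketch of the relevant fragment, assuming the standard Alembic online-migration template (Engram's actual `env.py` may be organized differently):

```python
from alembic import context
from sqlmodel import SQLModel

# Autogeneration compares this metadata against the live schema.
target_metadata = SQLModel.metadata

def run_migrations_online(connectable) -> None:
    with connectable.connect() as connection:
        context.configure(
            connection=connection,
            target_metadata=target_metadata,
            # SQLite can't ALTER columns in place; batch mode rewrites the
            # table via a copy-and-move so autogenerated ALTERs still work.
            render_as_batch=True,
        )
        with context.begin_transaction():
            context.run_migrations()
```

With `render_as_batch=True`, autogenerated migrations wrap column changes in `op.batch_alter_table(...)` blocks, which Alembic executes as create-new-table, copy-data, drop-old, rename.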
## Key Patterns

### Async Everywhere

The backend uses async throughout: async SQLAlchemy sessions, asyncio tasks for background jobs, and async subprocess calls for MakeMKV CLI operations.
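The async-subprocess pattern can be sketched like this. The helper name `run_and_stream` is illustrative, and the real `makemkvcon` command lines and progress parsing are not shown:

```python
import asyncio
import sys

# Hedged sketch of the Extractor-style pattern: spawn a CLI tool and
# forward each stdout line to a callback without blocking the event loop.
async def run_and_stream(cmd: list[str], on_line) -> int:
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,  # merge streams for one reader
    )
    while True:
        line = await proc.stdout.readline()
        if not line:  # EOF: process closed its output
            break
        on_line(line.decode(errors="replace").rstrip())
    return await proc.wait()

# Example: stream output from a trivial child process.
lines: list[str] = []
code = asyncio.run(
    run_and_stream([sys.executable, "-c", "print('PRGV:0,100')"], lines.append)
)
```

Because the read loop awaits between lines, a long rip never blocks the event loop: WebSocket broadcasts and API requests keep flowing while `makemkvcon` runs.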
### Singleton Services

Core services are module-level singletons: `job_manager`, `ws_manager`, `curator`, `movie_organizer`, `tv_organizer`. They are initialized once and shared across the application.
### State Machine Driven

The entire job lifecycle is tracked through `JobState` transitions persisted in SQLite. The `JobStateMachine` validates every transition against a map of allowed state changes, ensuring jobs never enter invalid states.
### Subtitle Coordination

Subtitle downloads run in the background during ripping. An `asyncio.Event` synchronizes the two processes -- matching awaits the event before proceeding, ensuring subtitles are available when needed.
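The coordination pattern can be sketched with two tasks sharing one event. Function names and the mapping shape are illustrative, not the real service API:

```python
import asyncio

# Illustrative rip/subtitle coordination via asyncio.Event; the real
# services differ, but the synchronization shape is the same.
async def download_subtitles(ready: asyncio.Event, store: dict) -> None:
    await asyncio.sleep(0.01)  # stand-in for the actual network download
    store["subtitles"] = ["S01E01.srt"]
    ready.set()  # signal: subtitles are now available

async def match_episodes(ready: asyncio.Event, store: dict) -> list[str]:
    await ready.wait()  # block here until the download task signals
    return [f"matched:{s}" for s in store["subtitles"]]

async def main() -> list[str]:
    ready = asyncio.Event()
    store: dict = {}
    # Both tasks run concurrently; gather preserves result order.
    _, matches = await asyncio.gather(
        download_subtitles(ready, store),
        match_episodes(ready, store),
    )
    return matches

print(asyncio.run(main()))  # ['matched:S01E01.srt']
```

`asyncio.Event` fits here because the dependency is one-directional and one-shot: matching only needs to know *that* subtitles have landed, not to exchange further messages with the downloader.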
### DiscDB Mapping Persistence

TheDiscDB title mappings are serialized as JSON in the `discdb_mappings_json` column on `DiscJob`. They are persisted during identification and restored from the database on server startup via `_restore_discdb_mappings()`.
### Custom Error Hierarchy

All domain errors extend `EngramError` with typed subclasses (`MakeMKVError`, `MatchingError`, `ConfigurationError`, etc.). The `@handle_errors` decorator provides standardized error handling in service methods.
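One plausible shape for such a decorator, assuming it logs and wraps unknown exceptions while letting typed domain errors pass through; the real `@handle_errors` in `errors.py` may behave differently:

```python
import functools
import logging

class EngramError(Exception):
    """Base for all domain errors (per errors.py)."""

class MatchingError(EngramError):
    pass

# Sketch only: re-raise vs. wrap semantics and logging detail are assumptions.
def handle_errors(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except EngramError:
            raise  # typed domain errors pass through untouched
        except Exception as exc:
            # Unknown failures get logged once, then surfaced as domain errors.
            logging.getLogger(func.__module__).exception("unhandled error")
            raise EngramError(str(exc)) from exc
    return wrapper
```

The payoff of this shape is that callers only ever catch `EngramError` subclasses, so route handlers can map the hierarchy onto HTTP status codes without a catch-all for arbitrary exceptions.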
### Simulation Endpoints

When `DEBUG=true`, simulation endpoints allow testing the full workflow without a physical disc. These are used extensively by E2E tests and manual development.