Skip to content

Architecture Overview

Engram uses a hub-and-spoke design. A Python/FastAPI backend acts as the central hub, orchestrating modular "spoke" components through a state machine. A React frontend communicates with the backend via REST API and WebSocket for real-time updates.

High-Level Diagram

                        React Dashboard
          (Dashboard, Review Queue, History, Config Wizard)
                             |
                          WebSocket (with heartbeat)
                             |
                        FastAPI Backend
                             |
                   JobManager (thin orchestrator)
                             |
         +-------------------+-------------------+
         |                   |                   |
  Identification      Matching           Finalization
  Coordinator         Coordinator        Coordinator
   (scan, classify,   (subtitle DL,      (conflict resolution,
    DiscDB, TMDB,      fingerprint,       file organization,
    AI fallback)       DiscDB assign)     review workflow)
         |                   |                   |
         +-------------------+-------------------+
         |                                       |
    CleanupService                       SimulationService
    (staging cleanup,                    (test-only E2E
     DiscDB export)                      simulation)
         |
        +----------+---------+-----------+-----------+
        |          |         |           |           |
    Sentinel   Analyst   Extractor   Curator    Organizer
     (drive    (TV vs    (MakeMKV    (episode   (file
     monitor)  movie)    wrapper)    matching)  organization)

The Job Manager is a thin orchestrator (~1,166 lines) that wires coordinators together and manages the job lifecycle. Each coordinator owns a focused stage of the pipeline. Every state transition is persisted in SQLite and broadcast to connected clients over WebSocket.


Backend

The backend lives in backend/app/ and is built with FastAPI.

  • Entry point (main.py) -- FastAPI app with lifespan management, CORS configured for the Vite dev server, and a WebSocket endpoint at /ws.
  • Config (config.py) -- Pydantic Settings for server-level overrides (host, port, debug). No .env file is required; all fields have defaults.
  • Database (database.py) -- Async SQLite via SQLModel + aiosqlite. Tables are auto-created on startup. Schema migration uses ALTER TABLE ADD COLUMN for additive changes (preserving job history) and drop/recreate only when columns are removed.

Core Modules (backend/app/core/)

Each module maps to a stage in the disc processing pipeline.

Module File Purpose
Sentinel sentinel.py Drive monitor. Polls optical drives on Windows using ctypes/kernel32. Fires async callbacks on disc insert/remove events.
Analyst analyst.py Disc classification. Heuristic-based TV vs Movie detection using cluster analysis of title durations. Outputs DiscAnalysisResult with content type, confidence score, and whether review is needed.
Extractor extractor.py MakeMKV CLI wrapper. Async subprocess management for makemkvcon scanning and ripping. Emits RipProgress callbacks. Tracks processes per job for multi-drive cancel isolation.
Curator curator.py Episode matching via audio fingerprinting. Classifies matches into high-confidence (auto-organize) and needs-review buckets.
Organizer organizer.py File organization. Moves files from staging to the media library with naming conventions: Movies/Name (Year)/Name (Year).mkv and TV/Show/Season XX/Show - SXXEXX.mkv.
DiscDB Classifier discdb_classifier.py TheDiscDB integration. Identifies discs via content hash fingerprinting (MD5 of concatenated BDMV/STREAM file sizes). Provides title-to-episode mappings.
TMDB Classifier tmdb_classifier.py TMDB-based content type classification. Uses name similarity and popularity ranking to provide strong TV vs Movie signals beyond the heuristic Analyst.
Errors errors.py Custom exception hierarchy (EngramError base, with MakeMKVError, MatchingError, ConfigurationError, OrganizationError, SubtitleError, DatabaseError). Includes @handle_errors decorator.
Logging logging.py Centralized logging configuration.

Services (backend/app/services/)

Service File Purpose
JobManager job_manager.py Thin orchestrator (~1,166 lines). Wires coordinators together, manages job lifecycle, handles drive events, and coordinates ripping.
IdentificationCoordinator identification_coordinator.py Disc scanning, DiscDB/TMDB/AI lookup, classification pipeline.
MatchingCoordinator matching_coordinator.py Episode matching, subtitle download, file readiness, DiscDB assignment, extras handling. Owns per-job caches (_discdb_mappings, _episode_runtimes).
FinalizationCoordinator finalization_coordinator.py Conflict resolution with cascading reassignment, file organization, review workflow, job completion.
CleanupService cleanup_service.py Staging directory cleanup, timed cleanup, TheDiscDB auto-export.
SimulationService simulation_service.py All simulation methods for E2E testing (only active when DEBUG=true).
JobStateMachine job_state_machine.py Explicit state machine with validated transitions: IDLE -> IDENTIFYING -> RIPPING -> MATCHING -> ORGANIZING -> COMPLETED, with REVIEW_NEEDED and FAILED branching from most states. Fires terminal-state callbacks.
EventBroadcaster event_broadcaster.py Abstraction layer for broadcasting events to WebSocket clients. Wraps ConnectionManager with typed, domain-specific methods for each event type.
ConfigService config_service.py Configuration service with helper functions for loading and updating config from the database. Caches sync engine for performance.

Data Models (backend/app/models/)

  • DiscJob -- Central entity with JobState enum (idle, identifying, review_needed, ripping, matching, organizing, completed, failed) and ContentType enum (tv, movie, unknown). Key fields include cleared_at (soft-delete from dashboard), completed_at (auto-set on terminal state), content_hash (TheDiscDB fingerprint), and discdb_mappings_json (persisted title mappings).
  • DiscTitle -- Individual title/track on a disc, linked to a job by foreign key. Stores match results (episode code, confidence), TitleState, edition info, conflict resolution, and organization tracking.
  • AppConfig -- Persisted application configuration including paths, API keys, and preferences.

Matcher (backend/app/matcher/)

Integrated from the standalone mkv-episode-matcher project.

  • ASR (asr_provider.py) -- Speech recognition using faster-whisper/onnxruntime.
  • Subtitle matching (subtitle_provider.py) -- Matches transcribed audio against reference subtitles.
  • Core engine (core/engine.py, core/matcher.py) -- Matching engine logic.
  • Subtitle sources -- addic7ed_client.py, opensubtitles_scraper.py, subtitle_utils.py.

API (backend/app/api/)

  • routes.py -- REST endpoints under /api prefix (job CRUD, review actions, config, simulation, staging management, job history, stats, diagnostics).
  • validation.py -- Tool validation endpoints (POST /api/validate/makemkv, POST /api/validate/ffmpeg, GET /api/detect-tools).
  • test_routes.py -- Standalone testing endpoints for subtitle download, transcription, and matching.
  • websocket.py -- ConnectionManager singleton for broadcasting real-time updates to all connected clients.

Frontend

The frontend is a React 18 + TypeScript + Vite single-page application located in frontend/src/. Vite proxies /api and /ws requests to the backend at localhost:8000 during development.

Key libraries: React Router v7, Framer Motion, Recharts, React Hook Form, Tailwind CSS v4, shadcn/ui, Lucide React, Sonner.

Component Location Purpose
Dashboard app/App.tsx Filterable job card list (Active, Done, All) with real-time progress, speed/ETA, cover art, and browser notifications. Cyberpunk dual-tone cyan/magenta theme.
DiscCard app/components/DiscCard.tsx Main job display component with subcomponents for media type badge, disc metadata, action buttons, and poster image hook.
ReviewQueue components/ReviewQueue.tsx Human-in-the-Loop UI for resolving ambiguous episode matches and movie edition selection.
HistoryPage components/HistoryPage.tsx All completed/failed jobs with stats dashboard, filterable table, slide-out detail panel, and deep-linking via /history/:jobId.
ConfigWizard components/ConfigWizard.tsx First-run setup and settings modal for library paths, MakeMKV license, TMDB token, and preferences.
Supporting Various StateIndicator, CyberpunkProgressBar, TrackGrid, MatchingVisualizer, NamePromptModal.

Hooks: useJobManagement (job lifecycle + WebSocket), useDiscFilters (job filtering/transformation), useWebSocket (connection management with auto-reconnect).


Database

Engram uses async SQLite via SQLModel + aiosqlite. The database file is stored at backend/engram.db in development mode.

Schema Migration

Engram uses Alembic for versioned database migrations with SQLModel metadata autogeneration.

  • Startup behavior: init_db() runs create_all for new databases, then stamps or upgrades via Alembic.
  • Existing databases: Auto-stamped at head on first startup after upgrading to Alembic.
  • AppConfig: Data preserved via backup/restore during schema changes (independent of Alembic).
  • SQLite batch mode: render_as_batch=True enables ALTER TABLE operations that SQLite doesn't natively support.

Key Patterns

Async Everywhere

The backend uses async throughout: async SQLAlchemy sessions, asyncio tasks for background jobs, and async subprocess calls for MakeMKV CLI operations.

Singleton Services

Core services are module-level singletons: job_manager, ws_manager, curator, movie_organizer, tv_organizer. They are initialized once and shared across the application.

State Machine Driven

All job lifecycle is tracked through JobState transitions persisted in SQLite. The JobStateMachine validates every transition against a map of allowed state changes, ensuring jobs never enter invalid states.

Subtitle Coordination

Subtitle downloads run in the background during ripping. An asyncio.Event synchronizes the two processes -- matching awaits the event before proceeding, ensuring subtitles are available when needed.

DiscDB Mapping Persistence

TheDiscDB title mappings are serialized as JSON in the discdb_mappings_json column on DiscJob. They are persisted during identification and restored from the database on server startup via _restore_discdb_mappings().

Custom Error Hierarchy

All domain errors extend EngramError with typed subclasses (MakeMKVError, MatchingError, ConfigurationError, etc.). The @handle_errors decorator provides standardized error handling in service methods.

Simulation Endpoints

When DEBUG=true, simulation endpoints allow testing the full workflow without a physical disc. These are used extensively by E2E tests and manual development.