# Stripstream Librarian — Features & Business Rules

## Libraries

### Multi-Library Management

- Create and manage multiple independent libraries, each with its own root path
- Enable/disable libraries individually
- Delete a library cascades to all its books, jobs, and metadata

### Scanning & Indexing

- **Incremental scan**: uses directory mtime tracking to skip unchanged directories
- **Full rebuild**: force re-walk all directories, ignoring cached mtimes
- **Rescan**: deep rescan to discover newly supported formats
- **Two-phase pipeline**:
  - Phase 1 (Discovery): fast filename-based metadata extraction (no archive I/O)
  - Phase 2 (Analysis): extract page counts, first page image from archives

### Real-Time Monitoring

- **Automatic periodic scanning**: configurable interval (default 5 seconds)
- **Filesystem watcher**: real-time detection of file changes for instant indexing
- Each can be toggled per library (`monitor_enabled`, `watcher_enabled`)

---

## Books

### Format Support

- **CBZ** (ZIP-based comic archives)
- **CBR** (RAR-based comic archives)
- **PDF**
- **EPUB**
- Automatic format detection from file extension and magic bytes

### Metadata Extraction

- **Title**: derived from filename or external metadata
- **Series**: derived from directory structure (first directory level under library root)
- **Volume**: extracted from filename with pattern detection:
  - `T##` (Tome) — most common for French comics
  - `Vol.##`, `Vol ##`, `Volume ##`
  - `###` (standalone number)
  - `-## ` (dash-separated)
- **Author(s)**: single scalar and array support
- **Page count**: extracted from archive analysis
- **Language**, **kind** (ebook, comic, bd)

### Thumbnails

- Generated from the first page of each archive
- Output format configurable: WebP (default), JPEG, PNG
- Configurable dimensions (default 300×400)
- Lazy generation: created on first access if missing
- Bulk operations: rebuild missing or regenerate all

### CBR to CBZ Conversion

- Convert RAR archives to ZIP format
- Tracked as background job with progress

---

## Series

### Automatic Aggregation

- Series derived from directory structure during scanning
- Books without series grouped as "unclassified"

### Series Metadata

- Description, publisher, start year, status (`ongoing`, `ended`, `completed`, `on_hold`, `hiatus`)
- Total volume count (from external providers)
- Authors (aggregated from books or metadata)

### Filtering & Discovery

- Filter by: series name (partial match), reading status, series status, metadata provider linkage
- Sort by: name, reading status, book count
- **Missing books detection**: identifies gaps in volume numbering within a series

---

## Reading Progress

### Per-Book Tracking

- Three states: `unread` (default), `reading`, `read`
- Current page tracking when status is `reading`
- `last_read_at` timestamp auto-updated

### Series-Level Status

- Calculated from book statuses:
  - All read → series `read`
  - None read → series `unread`
  - Mixed → series `reading`

### Bulk Operations

- Mark entire series as read (updates all books)

---

## Search & Discovery

### Full-Text Search

- PostgreSQL-based (`ILIKE` + `pg_trgm`)
- Searches across: book titles, series names, authors (scalar and array fields), series metadata authors
- Case-insensitive partial matching
- Library-scoped filtering

### Results

- Book hits: title, authors, series, volume, language, kind
- Series hits: name, book count, read count, first book (for linking)
- Processing time included in response

---

## Authors

- Unique author aggregation from books and series metadata
- Per-author book and series count
- Searchable by name (partial match)
- Sortable by name or book count

---

## External Metadata

### Supported Providers

| Provider | Focus |
|----------|-------|
| Google Books | General books (default fallback) |
| ComicVine | Comics |
| BedéThèque | Franco-Belgian comics |
| AniList | Manga/anime |
| Open Library | General books |

### Provider Configuration

- Global default provider with library-level override
- Fallback provider if primary is unavailable

### Matching Workflow

1. **Search**: query a provider, get candidates with confidence scores
2. **Match**: link a series to an external result (status `pending`)
3. **Approve**: validate and sync metadata to series and books
4. **Reject**: discard a match

### Batch Processing

- Auto-match all series in a library via `metadata_batch` job
- Configurable confidence threshold
- Result statuses: `auto_matched`, `no_results`, `too_many_results`, `low_confidence`, `already_linked`

### Metadata Refresh

- Update approved links with latest data from providers
- Change tracking reports per series/book
- Non-destructive: only updates when provider has new data

### Field Locking

- Individual book fields can be locked to prevent external sync from overwriting manual edits

---

## External Integrations

### Komga Sync

- Import reading progress from a Komga server
- Matches local series/books by name
- Detailed sync report: matched, already read, newly marked, unmatched

### Prowlarr (Indexer Search)

- Search Prowlarr for missing volumes in a series
- Volume pattern matching against release titles
- Results: title, size, seeders/leechers, download URL, matched missing volumes

### qBittorrent

- Add torrents directly from Prowlarr search results
- Connection test endpoint

---

## Page Rendering & Caching

### Page Extraction

- Render any page from supported archive formats
- 1-indexed page numbers

### Image Processing

- Output formats: original, JPEG, PNG, WebP
- Quality parameter (1–100)
- Max width parameter (1–2160 px)
- Configurable resampling filter: lanczos3, nearest, triangle/bilinear
- Concurrent render limit (default 8) with semaphore

### Caching

- **LRU in-memory cache**: 512 entries
- **Disk cache**: SHA256-keyed, two-level directory structure
- Cache key = hash(path + page + format + quality + width)
- Configurable cache directory and max size
- Manual cache clear via settings

---

## Background Jobs

### Job Types

| Type | Description |
|------|-------------|
| `rebuild` | Incremental scan |
| `full_rebuild` | Full filesystem rescan |
| `rescan` | Deep rescan for new formats |
| `thumbnail_rebuild` | Generate missing thumbnails |
| `thumbnail_regenerate` | Clear and regenerate all thumbnails |
| `cbr_to_cbz` | Convert RAR to ZIP |
| `metadata_batch` | Auto-match series to metadata |
| `metadata_refresh` | Update approved metadata links |

### Job Lifecycle

- Status flow: `pending` → `running` → `success` | `failed` | `cancelled`
- Intermediate statuses: `extracting_pages`, `generating_thumbnails`
- Real-time progress via **Server-Sent Events** (SSE)
- Per-file error tracking (non-fatal: job continues on errors)
- Cancellation support for pending/running jobs

### Progress Tracking

- Percentage (0–100), current file, processed/total counts
- Timing: started_at, finished_at, phase2_started_at
- Stats JSON blob with job-specific metrics

---

## Authentication & Security

### Token System

- **Bootstrap token**: admin token via `API_BOOTSTRAP_TOKEN` env var
- **API tokens**: create, list, revoke with scopes
- Token format: `stl_{prefix}_{secret}` with Argon2 hashing
- Expiration dates, last usage tracking, revocation

### Access Control

- **Two scopes**: `admin` (full access) and `read` (read-only)
- Route-level middleware enforcement
- Rate limiting: configurable sliding window (default 120 req/s)

---

## Backoffice (Web UI)

### Dashboard

- Statistics cards: books, series, authors, libraries
- Donut charts: reading status breakdown, format distribution
- Bar charts: books per language
- Per-library reading progress bars
- Top series by book/page count
- Monthly addition timeline
- Metadata coverage stats

### Pages

- **Libraries**: list, create, delete, configure monitoring and metadata provider
- **Books**: global list with filtering/sorting, detail view with metadata and page rendering
- **Series**: global list, per-library view, detail with metadata management
- **Authors**: list with book/series counts, detail with author's books
- **Jobs**: history, live progress via SSE, error details
- **Tokens**: create, list, revoke API tokens
- **Settings**: image processing, cache, thumbnails, external services (Prowlarr, qBittorrent)

### Interactive Features

- Real-time search with suggestions
- Metadata search and matching modals
- Prowlarr search modal for missing volumes
- Folder browser/picker for library paths
- Book/series editing forms
- Quick reading status toggles
- CBR to CBZ conversion trigger

---

## API

### Documentation

- OpenAPI/Swagger UI available at `/swagger-ui`
- Health check (`/health`), readiness (`/ready`), Prometheus metrics (`/metrics`)

### Public Endpoints (no auth)

- `GET /health`, `GET /ready`, `GET /metrics`, `GET /swagger-ui`

### Read Endpoints (read scope)

- Libraries, books, series, authors listing and detail
- Book pages and thumbnails
- Reading progress get/update
- Full-text search, collection statistics

### Admin Endpoints (admin scope)

- Library CRUD and configuration
- Book metadata editing, CBR conversion
- Series metadata editing
- Indexing job management (trigger, cancel, stream)
- API token management
- Metadata operations (search, match, approve, reject, batch, refresh)
- External integrations (Prowlarr, qBittorrent, Komga)
- Application settings and cache management

---

## Database

### Key Design Decisions

- PostgreSQL with `pg_trgm` for full-text search (no external search engine)
- All deletions cascade from libraries
- Unique constraints: file paths, token prefixes, metadata links (library + series + provider)
- Directory mtime caching for incremental scan optimization
- Connection pool: 10 (API), 20 (indexer)

### Archive Resilience

- CBZ: fallback streaming reader if central directory corrupted
- CBR: RAR extraction via system `unar`, fallback to CBZ parsing
- PDF: `pdfinfo` for page count, `pdftoppm` for rendering
- EPUB: ZIP-based extraction
- FD exhaustion detection: aborts if too many consecutive IO errors
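
The abort-on-consecutive-errors rule above, combined with the non-fatal per-file error tracking described under Job Lifecycle, could be sketched roughly as follows. This is an illustrative Python sketch only, not the project's implementation: the names `ConsecutiveErrorGuard`, `TooManyIOErrors`, and `process_files`, as well as the threshold of 10, are assumptions introduced for the example.

```python
class TooManyIOErrors(RuntimeError):
    """Raised when a run of consecutive I/O failures suggests a systemic problem."""


class ConsecutiveErrorGuard:
    """Counts consecutive I/O errors; any success resets the streak."""

    def __init__(self, max_consecutive: int = 10):  # threshold is an assumed value
        self.max_consecutive = max_consecutive
        self.streak = 0

    def record_success(self) -> None:
        self.streak = 0

    def record_error(self, exc: OSError) -> None:
        self.streak += 1
        if self.streak >= self.max_consecutive:
            raise TooManyIOErrors(
                f"{self.streak} consecutive I/O errors, aborting"
            ) from exc


def process_files(paths, analyze, guard=None):
    """Run `analyze` on each path, collecting per-file errors without stopping.

    Isolated failures are recorded and the loop continues; only an unbroken
    streak of failures makes the guard abort the whole run.
    """
    guard = guard or ConsecutiveErrorGuard()
    errors = []
    for path in paths:
        try:
            analyze(path)
        except OSError as exc:
            errors.append((path, str(exc)))
            guard.record_error(exc)
        else:
            guard.record_success()
    return errors
```

Resetting the streak on every success is what separates a few corrupt archives scattered through a library (recorded, job continues) from a condition such as exhausted file descriptors, where every subsequent open fails and continuing would only fill the error log.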