docs: add comprehensive features list to README and docs/FEATURES.md

Replace the minimal README features section with a concise categorized summary and link to a detailed docs/FEATURES.md covering all features, business rules, API endpoints, and integrations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 14:34:36 +01:00
parent 41228430cf
commit bd74c9e3e3
2 changed files with 359 additions and 19 deletions
--- a/README.md
+++ b/README.md
@@ -81,28 +81,58 @@ The backoffice will be available at http://localhost:7082
 ## Features
-### Libraries Management
-- Create and manage multiple libraries
-- Configure automatic scanning schedules (hourly, daily, weekly)
-- Real-time file watcher for instant indexing
-- Full and incremental rebuild options
-### Books Management
-- Support for CBZ, CBR, and PDF formats
-- Automatic metadata extraction
-- Series and volume detection
-- Full-text search powered by PostgreSQL
-### Jobs Monitoring
-- Real-time job progress tracking
-- Detailed statistics (scanned, indexed, removed, errors)
-- Job history and logs
-- Cancel pending jobs
+> For the full feature list, business rules, and API details, see [docs/FEATURES.md](docs/FEATURES.md).
+### Libraries
+- Multi-library management with per-library configuration
+- Incremental and full scanning, real-time filesystem watcher
+- Per-library metadata provider selection (Google Books, ComicVine, BedéThèque, AniList, Open Library)
+### Books & Series
+- **Formats**: CBZ, CBR, PDF, EPUB
+- Automatic metadata extraction (title, series, volume, authors, page count) from filenames and directory structure
+- Series aggregation with missing volume detection
+- Thumbnail generation (WebP/JPEG/PNG) with lazy generation and bulk rebuild
 - CBR → CBZ conversion
-### Search
-- Full-text search across titles, authors, and series
-- Library filtering
-- Real-time suggestions
+### Reading Progress
+- Per-book tracking: unread / reading / read with current page
+- Series-level aggregated reading status
+- Bulk mark-as-read for series
+### Search & Discovery
+- Full-text search across titles, authors, and series (PostgreSQL `pg_trgm`)
+- Author listing with book/series counts
+- Filtering by reading status, series status, format, metadata provider
+### External Metadata
+- Search, match, approve/reject workflow with confidence scoring
+- Batch auto-matching and scheduled metadata refresh
+- Field locking to protect manual edits from sync
+### External Integrations
+- **Komga**: import reading progress
+- **Prowlarr**: search for missing volumes
+- **qBittorrent**: add torrents directly from search results
+### Background Jobs
+- Rebuild, rescan, thumbnail generation, metadata batch, CBR conversion
+- Real-time progress via Server-Sent Events (SSE)
+- Job history, error tracking, cancellation
+### Page Rendering
+- On-demand page extraction from all formats
+- Image processing (format, quality, max width, resampling filter)
+- LRU in-memory + disk cache
+### Security
+- Token-based auth (`admin` / `read` scopes) with Argon2 hashing
+- Rate limiting, token expiration and revocation
+### Web UI (Backoffice)
+- Dashboard with statistics, charts, and reading progress
+- Library, book, series, author management
+- Live job monitoring, metadata search modals, settings panel
 ## Environment Variables
--- a/docs/FEATURES.md
+++ b/docs/FEATURES.md
@@ -0,0 +1,310 @@
 # Stripstream Librarian — Features & Business Rules
 ## Libraries
 ### Multi-Library Management
 - Create and manage multiple independent libraries, each with its own root path
 - Enable/disable libraries individually
 - Delete a library cascades to all its books, jobs, and metadata
 ### Scanning & Indexing
 - **Incremental scan**: uses directory mtime tracking to skip unchanged directories
 - **Full rebuild**: forces a re-walk of all directories, ignoring cached mtimes
 - **Rescan**: deep rescan to discover newly supported formats
 - **Two-phase pipeline**:
  - Phase 1 (Discovery): fast filename-based metadata extraction (no archive I/O)
  - Phase 2 (Analysis): page-count and first-page image extraction from archives
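
Here is a minimal sketch of the incremental-scan decision. It assumes the indexer keeps a map from each directory path to the mtime seen on the previous scan. The function name `dirs_to_rescan` and both dictionaries are hypothetical, not the project's API, and Python is used only for illustration:

```python
from typing import Dict, List


def dirs_to_rescan(
    current: Dict[str, float],
    cached: Dict[str, float],
    full_rebuild: bool = False,
) -> List[str]:
    """Return the directories whose entries must be re-listed.

    current: directory path -> mtime observed during this walk
    cached:  directory path -> mtime recorded by the previous scan
    """
    if full_rebuild:
        # A full rebuild ignores cached mtimes and re-walks everything.
        return sorted(current)
    # Incremental scan: new directories and directories whose mtime changed.
    return sorted(d for d, mtime in current.items() if cached.get(d) != mtime)
```

One caveat shapes this design. On POSIX filesystems, a directory's mtime changes only when entries directly inside it are added, removed, or renamed. It does not change when an existing file's contents change or when something inside a nested subdirectory changes. Subdirectories therefore still need to be visited even when their parent is skipped.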
 ### Real-Time Monitoring
 - **Automatic periodic scanning**: configurable interval (default 5 seconds)
 - **Filesystem watcher**: real-time detection of file changes for instant indexing
 - Each can be toggled per library (`monitor_enabled`, `watcher_enabled`)
 ---
 ## Books
 ### Format Support
 - **CBZ** (ZIP-based comic archives)
 - **CBR** (RAR-based comic archives)
 - **PDF**
 - **EPUB**
 - Automatic format detection from file extension and magic bytes
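
The sketch below shows one way to combine the extension with magic bytes. The signatures are the standard ones for ZIP (`PK\x03\x04`), RAR (`Rar!\x1a\x07`), and PDF (`%PDF-`). Two parts are assumptions about the policy: letting the signature win over the extension, and telling EPUB from CBZ by extension alone. The function name is hypothetical:

```python
import os
from typing import Optional

ZIP_MAGIC = b"PK\x03\x04"    # ZIP local file header (CBZ and EPUB containers)
RAR_MAGIC = b"Rar!\x1a\x07"  # prefix shared by the RAR 4.x and RAR 5 signatures
PDF_MAGIC = b"%PDF-"

EXTENSIONS = {".cbz": "cbz", ".cbr": "cbr", ".pdf": "pdf", ".epub": "epub"}


def detect_format(filename: str, header: bytes) -> Optional[str]:
    """Guess the format from the extension, letting the file signature override it."""
    by_ext = EXTENSIONS.get(os.path.splitext(filename)[1].lower())
    if header.startswith(PDF_MAGIC):
        return "pdf"
    if header.startswith(RAR_MAGIC):
        return "cbr"  # also catches a RAR archive misnamed as .cbz
    if header.startswith(ZIP_MAGIC):
        # EPUB and CBZ are both ZIP containers; fall back on the extension.
        return "epub" if by_ext == "epub" else "cbz"
    return by_ext  # unknown signature: trust the extension (may be None)
```

EPUB files can also be recognized by their leading `mimetype` entry containing `application/epub+zip`. The sketch skips that check for brevity.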
 ### Metadata Extraction
 - **Title**: derived from filename or external metadata
 - **Series**: derived from directory structure (first directory level under library root)
 - **Volume**: extracted from filename with pattern detection:
  - `T##` (Tome) — most common for French comics
  - `Vol.##`, `Vol ##`, `Volume ##`
  - `###` (standalone number)
  - `-## ` (dash-separated)
 - **Author(s)**: supports both a single-author field and a multi-author array
 - **Page count**: extracted from archive analysis
 - **Language**, **kind** (ebook, comic, bd)
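
The next sketch applies the series and volume rules above. It assumes POSIX paths and tries the volume patterns in a hypothetical order of specificity. The function names and exact regular expressions are illustrative, not taken from the codebase:

```python
import re
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical precedence: the most explicit marker wins.
VOLUME_PATTERNS = [
    re.compile(r"\bT(\d+)\b"),                                 # "T05" (Tome)
    re.compile(r"\bVol(?:ume)?\.?\s*(\d+)\b", re.IGNORECASE),  # "Vol.5", "Vol 5", "Volume 5"
    re.compile(r"-(\d+)\s"),                                   # "-05 " (dash-separated)
    re.compile(r"\b(\d{3})\b"),                                # "005" (standalone number)
]


def series_from_path(library_root: str, book_path: str) -> Optional[str]:
    """First directory level under the library root; None means "unclassified".

    Raises ValueError if book_path is not inside library_root.
    """
    parts = PurePosixPath(book_path).relative_to(library_root).parts
    return parts[0] if len(parts) > 1 else None


def volume_from_filename(filename: str) -> Optional[int]:
    stem = PurePosixPath(filename).stem  # drop the extension
    for pattern in VOLUME_PATTERNS:
        match = pattern.search(stem)
        if match:
            return int(match.group(1))
    return None
```

Ordering matters here. An explicit marker such as `T05` is more trustworthy than a bare number, so the standalone `###` pattern is tried last.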
 ### Thumbnails
 - Generated from the first page of each archive
 - Output format configurable: WebP (default), JPEG, PNG
 - Configurable dimensions (default 300×400)
 - Lazy generation: created on first access if missing
 - Bulk operations: rebuild missing or regenerate all
 ### CBR to CBZ Conversion
 - Convert RAR archives to ZIP format
 - Tracked as a background job with progress reporting
 ---
 ## Series
 ### Automatic Aggregation
 - Series derived from directory structure during scanning
 - Books without a series are grouped as "unclassified"
 ### Series Metadata
 - Description, publisher, start year, status (`ongoing`, `ended`, `completed`, `on_hold`, `hiatus`)
 - Total volume count (from external providers)
 - Authors (aggregated from books or metadata)
 ### Filtering & Discovery
 - Filter by: series name (partial match), reading status, series status, metadata provider linkage
 - Sort by: name, reading status, book count
 - **Missing books detection**: identifies gaps in volume numbering within a series
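
Gap detection can be sketched as follows. The sketch assumes volumes are numbered from 1 and that the provider's total volume count, when known, extends the expected range. `missing_volumes` is a hypothetical helper:

```python
from typing import Iterable, List, Optional


def missing_volumes(present: Iterable[int], total: Optional[int] = None) -> List[int]:
    """Volume numbers absent from 1..upper, where upper is the highest local volume
    or the provider's total volume count, whichever is larger."""
    have = set(present)
    upper = max(max(have, default=0), total or 0)
    return [v for v in range(1, upper + 1) if v not in have]
```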
 ---
 ## Reading Progress
 ### Per-Book Tracking
 - Three states: `unread` (default), `reading`, `read`
 - Current page tracking when status is `reading`
 - `last_read_at` timestamp auto-updated
 ### Series-Level Status
 - Calculated from book statuses:
  - All read → series `read`
  - None read → series `unread`
  - Mixed → series `reading`
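
Here is a sketch of the aggregation rule. It reads "none read" as "every book is still `unread`", so a single book in progress makes the series `reading`. It also treats an empty series as `unread`. Both readings are assumptions:

```python
from typing import Iterable


def series_status(book_statuses: Iterable[str]) -> str:
    """Aggregate per-book statuses ("unread", "reading", "read") into a series status."""
    statuses = set(book_statuses)
    if statuses == {"read"}:
        return "read"
    if not statuses or statuses == {"unread"}:
        return "unread"
    return "reading"  # any mix, or at least one book in progress
```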
 ### Bulk Operations
 - Mark entire series as read (updates all books)
 ---
 ## Search & Discovery
 ### Full-Text Search
 - PostgreSQL-based (`ILIKE` + `pg_trgm`)
 - Searches across: book titles, series names, authors (scalar and array fields), series metadata authors
 - Case-insensitive partial matching
 - Library-scoped filtering
 ### Results
 - Book hits: title, authors, series, volume, language, kind
 - Series hits: name, book count, read count, first book (for linking)
 - Processing time included in response
 ---
 ## Authors
 - Unique author aggregation from books and series metadata
 - Per-author book and series count
 - Searchable by name (partial match)
 - Sortable by name or book count
 ---
 ## External Metadata
 ### Supported Providers
 | Provider | Focus |
 |----------|-------|
 | Google Books | General books (default fallback) |
 | ComicVine | Comics |
 | BedéThèque | Franco-Belgian comics |
 | AniList | Manga/anime |
 | Open Library | General books |
 ### Provider Configuration
 - Global default provider with library-level override
 - Fallback provider used when the primary is unavailable
 ### Matching Workflow
 1. **Search**: query a provider, get candidates with confidence scores
 2. **Match**: link a series to an external result (status `pending`)
 3. **Approve**: validate and sync metadata to series and books
 4. **Reject**: discard a match
 ### Batch Processing
 - Auto-match all series in a library via `metadata_batch` job
 - Configurable confidence threshold
 - Result statuses: `auto_matched`, `no_results`, `too_many_results`, `low_confidence`, `already_linked`
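
The sketch below gives one plausible mapping from candidate confidence scores to the batch result statuses. The threshold comparison follows the text above. Interpreting `too_many_results` as "more than one candidate clears the threshold" is an assumption, and the function name is hypothetical:

```python
from typing import Sequence


def classify_batch_result(
    scores: Sequence[float],
    threshold: float,
    already_linked: bool = False,
) -> str:
    """Map one series' candidate confidence scores to a batch result status."""
    if already_linked:
        return "already_linked"     # the series already has a provider link
    if not scores:
        return "no_results"
    confident = [s for s in scores if s >= threshold]
    if not confident:
        return "low_confidence"
    if len(confident) > 1:
        return "too_many_results"   # ambiguous: several candidates clear the bar
    return "auto_matched"
```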
 ### Metadata Refresh
 - Update approved links with latest data from providers
 - Change tracking reports per series/book
 - Non-destructive: fields are only updated when the provider returns new data
 ### Field Locking
 - Individual book fields can be locked to prevent external sync from overwriting manual edits
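
The refresh and locking rules combine naturally into a single merge step. In this sketch, a field is skipped when it is locked, when the provider has no value for it, or when the value is unchanged. Every applied change is recorded for the report. The function name and the field names in any example are hypothetical:

```python
from typing import Any, Dict, Set, Tuple


def apply_refresh(
    current: Dict[str, Any],
    incoming: Dict[str, Any],
    locked: Set[str],
) -> Dict[str, Tuple[Any, Any]]:
    """Merge provider data into `current` in place and return {field: (old, new)}."""
    changes: Dict[str, Tuple[Any, Any]] = {}
    for field, new_value in incoming.items():
        if field in locked or new_value is None:
            continue  # protect manual edits; ignore fields the provider lacks
        old_value = current.get(field)
        if old_value != new_value:
            current[field] = new_value
            changes[field] = (old_value, new_value)
    return changes
```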
 ---
 ## External Integrations
 ### Komga Sync
 - Import reading progress from a Komga server
 - Matches local series/books by name
 - Detailed sync report: matched, already read, newly marked, unmatched
 ### Prowlarr (Indexer Search)
 - Search Prowlarr for missing volumes in a series
 - Volume pattern matching against release titles
 - Results: title, size, seeders/leechers, download URL, matched missing volumes
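
Here is a sketch of matching release titles against a series' missing volumes. It assumes releases mark volumes as `T05`, `Tome 5`, `Vol. 5`, or `Volume 5`. The regular expression and helper name are hypothetical:

```python
import re
from typing import List, Set

# Hypothetical volume markers in release titles.
VOLUME_MARKER = re.compile(r"\b(?:T|[Tt]ome\s*|[Vv]ol(?:ume)?\.?\s*)(\d{1,3})\b")


def matched_missing_volumes(release_title: str, missing: Set[int]) -> List[int]:
    """Missing volume numbers that a release title appears to contain."""
    found = {int(n) for n in VOLUME_MARKER.findall(release_title)}
    return sorted(found & missing)
```

With this pattern, a range such as `T01-T05` yields only its endpoints. Expanding ranges would need an extra rule.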
 ### qBittorrent
 - Add torrents directly from Prowlarr search results
 - Connection test endpoint
 ---
 ## Page Rendering & Caching
 ### Page Extraction
 - Render any page from supported archive formats
 - 1-indexed page numbers
 ### Image Processing
 - Output formats: original, JPEG, PNG, WebP
 - Quality parameter (1–100)
 - Max width parameter (1–2160 px)
 - Configurable resampling filter: lanczos3, nearest, triangle/bilinear
 - Concurrent render limit (default 8), enforced with a semaphore
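
The sketch below combines parameter validation with the concurrency cap. `asyncio.Semaphore` stands in for whatever semaphore the server uses. The bounds come from the list above. The default quality and filter values, the field names, and treating `bilinear` as an alias of `triangle` are all assumptions:

```python
import asyncio
from dataclasses import dataclass
from typing import Iterable, List, Tuple

OUTPUT_FORMATS = {"original", "jpeg", "png", "webp"}
# "bilinear" is accepted as an alias of "triangle" (assumption).
RESAMPLE_FILTERS = {"lanczos3", "nearest", "triangle", "bilinear"}


@dataclass(frozen=True)
class RenderParams:
    page: int                          # 1-indexed
    output_format: str = "original"
    quality: int = 80                  # hypothetical default
    max_width: int = 2160
    resample_filter: str = "lanczos3"  # hypothetical default

    def __post_init__(self) -> None:
        if self.page < 1:
            raise ValueError("page numbers are 1-indexed")
        if self.output_format not in OUTPUT_FORMATS:
            raise ValueError(f"unsupported format: {self.output_format}")
        if not 1 <= self.quality <= 100:
            raise ValueError("quality must be between 1 and 100")
        if not 1 <= self.max_width <= 2160:
            raise ValueError("max_width must be between 1 and 2160")
        if self.resample_filter not in RESAMPLE_FILTERS:
            raise ValueError(f"unknown resampling filter: {self.resample_filter}")


async def render_pages(requests: Iterable[RenderParams], limit: int = 8) -> Tuple[List[int], int]:
    """Run stand-in render jobs with at most `limit` executing at once.

    Returns the rendered page numbers and the peak number of concurrent renders.
    """
    semaphore = asyncio.Semaphore(limit)
    active = 0
    peak = 0

    async def render_one(params: RenderParams) -> int:
        nonlocal active, peak
        async with semaphore:
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # stand-in for decoding and resizing a page
            active -= 1
            return params.page

    pages = await asyncio.gather(*(render_one(p) for p in requests))
    return list(pages), peak
```

The semaphore bounds how many pages are decoded and resized at once. This keeps peak memory predictable when many pages are requested together.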
 ### Caching
 - **LRU in-memory cache**: 512 entries
 - **Disk cache**: SHA256-keyed, two-level directory structure
 - Cache key = hash(path + page + format + quality + width)
 - Configurable cache directory and max size
 - Manual cache clear via settings
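
Here is a sketch of the key derivation, disk layout, and in-memory LRU. Two choices are assumptions: the `|` separator between fields, and using the first two bytes of the digest as directory names. Plain concatenation could make different inputs collide: path `/a/T1` with page 2 and path `/a/T` with page 12 both start with `/a/T12`:

```python
import hashlib
from collections import OrderedDict
from pathlib import Path
from typing import Optional


def cache_key(path: str, page: int, fmt: str, quality: int, width: int) -> str:
    """SHA-256 hex digest over the render inputs, joined with an unambiguous separator."""
    raw = "|".join([path, str(page), fmt, str(quality), str(width)])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def disk_cache_path(cache_dir: Path, key: str) -> Path:
    # Two-level fan-out on the first and second digest bytes keeps directories small.
    return cache_dir / key[:2] / key[2:4] / key


class LruCache:
    """In-memory LRU cache (the document states a 512-entry capacity)."""

    def __init__(self, capacity: int = 512) -> None:
        self.capacity = capacity
        self._items: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: str, value: bytes) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry
```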
 ---
 ## Background Jobs
 ### Job Types
 | Type | Description |
 |------|-------------|
 | `rebuild` | Incremental scan |
 | `full_rebuild` | Full filesystem rescan |
 | `rescan` | Deep rescan for new formats |
 | `thumbnail_rebuild` | Generate missing thumbnails |
 | `thumbnail_regenerate` | Clear and regenerate all thumbnails |
 | `cbr_to_cbz` | Convert RAR to ZIP |
 | `metadata_batch` | Auto-match series to metadata |
 | `metadata_refresh` | Update approved metadata links |
 ### Job Lifecycle
 - Status flow: `pending` → `running` → `success` | `failed` | `cancelled`
 - Intermediate statuses: `extracting_pages`, `generating_thumbnails`
 - Real-time progress via **Server-Sent Events** (SSE)
 - Per-file error tracking (non-fatal: job continues on errors)
 - Cancellation support for pending/running jobs
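
The status flow can be expressed as a transition table. Two parts are assumptions: placing the intermediate statuses between `running` and the terminal states, and allowing cancellation from them. `advance` is a hypothetical helper:

```python
from typing import Dict, Set

TRANSITIONS: Dict[str, Set[str]] = {
    "pending": {"running", "cancelled"},
    "running": {"extracting_pages", "success", "failed", "cancelled"},
    "extracting_pages": {"generating_thumbnails", "success", "failed", "cancelled"},
    "generating_thumbnails": {"success", "failed", "cancelled"},
    "success": set(),
    "failed": set(),
    "cancelled": set(),
}
# Terminal statuses are those with no outgoing transitions.
TERMINAL = {status for status, targets in TRANSITIONS.items() if not targets}


def advance(current: str, new: str) -> str:
    """Return the new status, or raise if the transition is not allowed."""
    if new not in TRANSITIONS.get(current, set()):
        raise ValueError(f"invalid job transition: {current} -> {new}")
    return new
```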
 ### Progress Tracking
 - Percentage (0–100), current file, processed/total counts
 - Timing: `started_at`, `finished_at`, `phase2_started_at`
 - Stats JSON blob with job-specific metrics
 ---
 ## Authentication & Security
 ### Token System
 - **Bootstrap token**: admin token via `API_BOOTSTRAP_TOKEN` env var
 - **API tokens**: create, list, revoke with scopes
 - Token format: `stl_{prefix}_{secret}` with Argon2 hashing
 - Expiration dates, last usage tracking, revocation
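
The sketch below shows how the `stl_{prefix}_{secret}` format supports verification. The prefix selects a single stored row (token prefixes are unique, per the Database section below). Only the secret is hashed and compared, in constant time. Argon2 is not in the Python standard library, so PBKDF2-HMAC-SHA256 stands in for it here. All names are hypothetical, and the sketch assumes the prefix contains no underscore:

```python
import hashlib
import hmac
from typing import Optional, Tuple

ITERATIONS = 100_000  # illustrative work factor for the PBKDF2 stand-in


def parse_token(token: str) -> Optional[Tuple[str, str]]:
    """Split "stl_{prefix}_{secret}" into (prefix, secret), or None if malformed."""
    parts = token.split("_", 2)
    if len(parts) != 3 or parts[0] != "stl" or not parts[1] or not parts[2]:
        return None
    return parts[1], parts[2]


def hash_secret(secret: str, salt: bytes) -> bytes:
    # Stand-in for Argon2: PBKDF2-HMAC-SHA256 from the standard library.
    return hashlib.pbkdf2_hmac("sha256", secret.encode("utf-8"), salt, ITERATIONS)


def verify_token(token: str, stored_prefix: str, salt: bytes, stored_hash: bytes) -> bool:
    parsed = parse_token(token)
    if parsed is None or parsed[0] != stored_prefix:
        return False
    # Constant-time comparison so timing does not reveal how many bytes matched.
    return hmac.compare_digest(hash_secret(parsed[1], salt), stored_hash)
```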
 ### Access Control
 - **Two scopes**: `admin` (full access) and `read` (read-only)
 - Route-level middleware enforcement
 - Rate limiting: configurable sliding window (default 120 req/s)
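
Here is a sliding-window-log sketch of the limiter, with an injectable clock for testing. The document does not say whether limits apply per token, per client address, or globally. This sketch keeps a single window, and the class name is hypothetical:

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests in any trailing `window` seconds.

    The document's default corresponds to limit=120, window=1.0.
    """

    def __init__(self, limit: int = 120, window: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self._hits: Deque[float] = deque()

    def allow(self) -> bool:
        now = self.clock()
        # Drop timestamps that have slid out of the window.
        while self._hits and self._hits[0] <= now - self.window:
            self._hits.popleft()
        if len(self._hits) >= self.limit:
            return False  # rejected requests are not recorded
        self._hits.append(now)
        return True
```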
 ---
 ## Backoffice (Web UI)
 ### Dashboard
 - Statistics cards: books, series, authors, libraries
 - Donut charts: reading status breakdown, format distribution
 - Bar charts: books per language
 - Per-library reading progress bars
 - Top series by book/page count
 - Monthly addition timeline
 - Metadata coverage stats
 ### Pages
 - **Libraries**: list, create, delete, configure monitoring and metadata provider
 - **Books**: global list with filtering/sorting, detail view with metadata and page rendering
 - **Series**: global list, per-library view, detail with metadata management
 - **Authors**: list with book/series counts, detail with author's books
 - **Jobs**: history, live progress via SSE, error details
 - **Tokens**: create, list, revoke API tokens
 - **Settings**: image processing, cache, thumbnails, external services (Prowlarr, qBittorrent)
 ### Interactive Features
 - Real-time search with suggestions
 - Metadata search and matching modals
 - Prowlarr search modal for missing volumes
 - Folder browser/picker for library paths
 - Book/series editing forms
 - Quick reading status toggles
 - CBR to CBZ conversion trigger
 ---
 ## API
 ### Documentation
 - OpenAPI/Swagger UI available at `/swagger-ui`
 - Health check (`/health`), readiness (`/ready`), Prometheus metrics (`/metrics`)
 ### Public Endpoints (no auth)
 - `GET /health`, `GET /ready`, `GET /metrics`, `GET /swagger-ui`
 ### Read Endpoints (read scope)
 - Libraries, books, series, authors listing and detail
 - Book pages and thumbnails
 - Reading progress get/update
 - Full-text search, collection statistics
 ### Admin Endpoints (admin scope)
 - Library CRUD and configuration
 - Book metadata editing, CBR conversion
 - Series metadata editing
 - Indexing job management (trigger, cancel, stream)
 - API token management
 - Metadata operations (search, match, approve, reject, batch, refresh)
 - External integrations (Prowlarr, qBittorrent, Komga)
 - Application settings and cache management
 ---
 ## Database
 ### Key Design Decisions
 - PostgreSQL with `pg_trgm` for full-text search (no external search engine)
 - All deletions cascade from libraries
 - Unique constraints: file paths, token prefixes, metadata links (library + series + provider)
 - Directory mtime caching for incremental scan optimization
 - Connection pool size: 10 (API), 20 (indexer)
 ### Archive Resilience
 - CBZ: falls back to a streaming reader if the ZIP central directory is corrupted
 - CBR: RAR extraction via the system `unar` tool, with fallback to CBZ (ZIP) parsing
 - PDF: `pdfinfo` for page count, `pdftoppm` for rendering
 - EPUB: ZIP-based extraction
 - File-descriptor (FD) exhaustion detection: aborts after too many consecutive I/O errors
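
The next sketch combines the per-file error policy described under Background Jobs with the consecutive-error abort. The threshold, the exception type, and the decision to count only `OSError` toward the streak are assumptions, and all names are hypothetical:

```python
from typing import Callable, Dict, Iterable

MAX_CONSECUTIVE_IO_ERRORS = 10  # hypothetical threshold


class TooManyIoErrors(RuntimeError):
    """Raised when back-to-back I/O failures suggest descriptor exhaustion."""


def process_files(paths: Iterable[str], handle: Callable[[str], None]) -> Dict[str, str]:
    """Run `handle` on each file and record per-file errors without stopping,
    but abort once too many I/O errors happen in a row."""
    errors: Dict[str, str] = {}
    consecutive = 0
    for path in paths:
        try:
            handle(path)
            consecutive = 0  # any success resets the streak
        except OSError as exc:
            errors[path] = str(exc)
            consecutive += 1
            if consecutive >= MAX_CONSECUTIVE_IO_ERRORS:
                raise TooManyIoErrors(
                    f"{consecutive} consecutive I/O errors, last at {path}"
                ) from exc
        except Exception as exc:  # non-I/O failure (e.g. a corrupt archive): record and go on
            errors[path] = str(exc)
    return errors
```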