diff --git a/README.md b/README.md
index df4153f..b8c62f2 100644
--- a/README.md
+++ b/README.md
@@ -81,28 +81,58 @@ The backoffice will be available at http://localhost:7082
 
 ## Features
 
-### Libraries Management
-- Create and manage multiple libraries
-- Configure automatic scanning schedules (hourly, daily, weekly)
-- Real-time file watcher for instant indexing
-- Full and incremental rebuild options
+> For the full feature list, business rules, and API details, see [docs/FEATURES.md](docs/FEATURES.md).
 
-### Books Management
-- Support for CBZ, CBR, and PDF formats
-- Automatic metadata extraction
-- Series and volume detection
-- Full-text search powered by PostgreSQL
+### Libraries
+- Multi-library management with per-library configuration
+- Incremental and full scanning, real-time filesystem watcher
+- Per-library metadata provider selection (Google Books, ComicVine, BedéThèque, AniList, Open Library)
 
-### Jobs Monitoring
-- Real-time job progress tracking
-- Detailed statistics (scanned, indexed, removed, errors)
-- Job history and logs
-- Cancel pending jobs
+### Books & Series
+- **Formats**: CBZ, CBR, PDF, EPUB
+- Automatic metadata extraction (title, series, volume, authors, page count) from filenames and directory structure
+- Series aggregation with missing volume detection
+- Thumbnail generation (WebP/JPEG/PNG) with lazy generation and bulk rebuild
+- CBR → CBZ conversion
 
-### Search
-- Full-text search across titles, authors, and series
-- Library filtering
-- Real-time suggestions
+### Reading Progress
+- Per-book tracking: unread / reading / read with current page
+- Series-level aggregated reading status
+- Bulk mark-as-read for series
+
+### Search & Discovery
+- Full-text search across titles, authors, and series (PostgreSQL `pg_trgm`)
+- Author listing with book/series counts
+- Filtering by reading status, series status, format, metadata provider
+
+### External Metadata
+- Search, match, approve/reject workflow with confidence scoring
+- Batch auto-matching and scheduled metadata refresh
+- Field locking to protect manual edits from sync
+
+### External Integrations
+- **Komga**: import reading progress
+- **Prowlarr**: search for missing volumes
+- **qBittorrent**: add torrents directly from search results
+
+### Background Jobs
+- Rebuild, rescan, thumbnail generation, metadata batch, CBR conversion
+- Real-time progress via Server-Sent Events (SSE)
+- Job history, error tracking, cancellation
+
+### Page Rendering
+- On-demand page extraction from all formats
+- Image processing (format, quality, max width, resampling filter)
+- LRU in-memory + disk cache
+
+### Security
+- Token-based auth (`admin` / `read` scopes) with Argon2 hashing
+- Rate limiting, token expiration and revocation
+
+### Web UI (Backoffice)
+- Dashboard with statistics, charts, and reading progress
+- Library, book, series, author management
+- Live job monitoring, metadata search modals, settings panel
 
 ## Environment Variables
 
diff --git a/docs/FEATURES.md b/docs/FEATURES.md
new file mode 100644
index 0000000..3377c0b
--- /dev/null
+++ b/docs/FEATURES.md
@@ -0,0 +1,310 @@
+# Stripstream Librarian — Features & Business Rules
+
+## Libraries
+
+### Multi-Library Management
+- Create and manage multiple independent libraries, each with its own root path
+- Enable/disable libraries individually
+- Delete a library cascades to all its books, jobs, and metadata
+
+### Scanning & Indexing
+- **Incremental scan**: uses directory mtime tracking to skip unchanged directories
+- **Full rebuild**: force re-walk all directories, ignoring cached mtimes
+- **Rescan**: deep rescan to discover newly supported formats
+- **Two-phase pipeline**:
+  - Phase 1 (Discovery): fast filename-based metadata extraction (no archive I/O)
+  - Phase 2 (Analysis): extract page counts, first page image from archives
+
+### Real-Time Monitoring
+- **Automatic periodic scanning**: configurable interval (default 5 seconds)
+- **Filesystem watcher**: real-time detection of file changes for instant indexing
+- Each can be toggled per library (`monitor_enabled`, `watcher_enabled`)
+
+---
+
+## Books
+
+### Format Support
+- **CBZ** (ZIP-based comic archives)
+- **CBR** (RAR-based comic archives)
+- **PDF**
+- **EPUB**
+- Automatic format detection from file extension and magic bytes
+
+### Metadata Extraction
+- **Title**: derived from filename or external metadata
+- **Series**: derived from directory structure (first directory level under library root)
+- **Volume**: extracted from filename with pattern detection:
+  - `T##` (Tome) — most common for French comics
+  - `Vol.##`, `Vol ##`, `Volume ##`
+  - `###` (standalone number)
+  - `-## ` (dash-separated)
+- **Author(s)**: single scalar and array support
+- **Page count**: extracted from archive analysis
+- **Language**, **kind** (ebook, comic, bd)
+
+### Thumbnails
+- Generated from the first page of each archive
+- Output format configurable: WebP (default), JPEG, PNG
+- Configurable dimensions (default 300×400)
+- Lazy generation: created on first access if missing
+- Bulk operations: rebuild missing or regenerate all
+
+### CBR to CBZ Conversion
+- Convert RAR archives to ZIP format
+- Tracked as background job with progress
+
+---
+
+## Series
+
+### Automatic Aggregation
+- Series derived from directory structure during scanning
+- Books without series grouped as "unclassified"
+
+### Series Metadata
+- Description, publisher, start year, status (`ongoing`, `ended`, `completed`, `on_hold`, `hiatus`)
+- Total volume count (from external providers)
+- Authors (aggregated from books or metadata)
+
+### Filtering & Discovery
+- Filter by: series name (partial match), reading status, series status, metadata provider linkage
+- Sort by: name, reading status, book count
+- **Missing books detection**: identifies gaps in volume numbering within a series
+
+---
+
+## Reading Progress
+
+### Per-Book Tracking
+- Three states: `unread` (default), `reading`, `read`
+- Current page tracking when status is `reading`
+- `last_read_at` timestamp auto-updated
+
+### Series-Level Status
+- Calculated from book statuses:
+  - All read → series `read`
+  - None read → series `unread`
+  - Mixed → series `reading`
+
+### Bulk Operations
+- Mark entire series as read (updates all books)
+
+---
+
+## Search & Discovery
+
+### Full-Text Search
+- PostgreSQL-based (`ILIKE` + `pg_trgm`)
+- Searches across: book titles, series names, authors (scalar and array fields), series metadata authors
+- Case-insensitive partial matching
+- Library-scoped filtering
+
+### Results
+- Book hits: title, authors, series, volume, language, kind
+- Series hits: name, book count, read count, first book (for linking)
+- Processing time included in response
+
+---
+
+## Authors
+
+- Unique author aggregation from books and series metadata
+- Per-author book and series count
+- Searchable by name (partial match)
+- Sortable by name or book count
+
+---
+
+## External Metadata
+
+### Supported Providers
+| Provider | Focus |
+|----------|-------|
+| Google Books | General books (default fallback) |
+| ComicVine | Comics |
+| BedéThèque | Franco-Belgian comics |
+| AniList | Manga/anime |
+| Open Library | General books |
+
+### Provider Configuration
+- Global default provider with library-level override
+- Fallback provider if primary is unavailable
+
+### Matching Workflow
+1. **Search**: query a provider, get candidates with confidence scores
+2. **Match**: link a series to an external result (status `pending`)
+3. **Approve**: validate and sync metadata to series and books
+4. **Reject**: discard a match
+
+### Batch Processing
+- Auto-match all series in a library via `metadata_batch` job
+- Configurable confidence threshold
+- Result statuses: `auto_matched`, `no_results`, `too_many_results`, `low_confidence`, `already_linked`
+
+### Metadata Refresh
+- Update approved links with latest data from providers
+- Change tracking reports per series/book
+- Non-destructive: only updates when provider has new data
+
+### Field Locking
+- Individual book fields can be locked to prevent external sync from overwriting manual edits
+
+---
+
+## External Integrations
+
+### Komga Sync
+- Import reading progress from a Komga server
+- Matches local series/books by name
+- Detailed sync report: matched, already read, newly marked, unmatched
+
+### Prowlarr (Indexer Search)
+- Search Prowlarr for missing volumes in a series
+- Volume pattern matching against release titles
+- Results: title, size, seeders/leechers, download URL, matched missing volumes
+
+### qBittorrent
+- Add torrents directly from Prowlarr search results
+- Connection test endpoint
+
+---
+
+## Page Rendering & Caching
+
+### Page Extraction
+- Render any page from supported archive formats
+- 1-indexed page numbers
+
+### Image Processing
+- Output formats: original, JPEG, PNG, WebP
+- Quality parameter (1–100)
+- Max width parameter (1–2160 px)
+- Configurable resampling filter: lanczos3, nearest, triangle/bilinear
+- Concurrent render limit (default 8) with semaphore
+
+### Caching
+- **LRU in-memory cache**: 512 entries
+- **Disk cache**: SHA256-keyed, two-level directory structure
+- Cache key = hash(path + page + format + quality + width)
+- Configurable cache directory and max size
+- Manual cache clear via settings
+
+---
+
+## Background Jobs
+
+### Job Types
+| Type | Description |
+|------|-------------|
+| `rebuild` | Incremental scan |
+| `full_rebuild` | Full filesystem rescan |
+| `rescan` | Deep rescan for new formats |
+| `thumbnail_rebuild` | Generate missing thumbnails |
+| `thumbnail_regenerate` | Clear and regenerate all thumbnails |
+| `cbr_to_cbz` | Convert RAR to ZIP |
+| `metadata_batch` | Auto-match series to metadata |
+| `metadata_refresh` | Update approved metadata links |
+
+### Job Lifecycle
+- Status flow: `pending` → `running` → `success` | `failed` | `cancelled`
+- Intermediate statuses: `extracting_pages`, `generating_thumbnails`
+- Real-time progress via **Server-Sent Events** (SSE)
+- Per-file error tracking (non-fatal: job continues on errors)
+- Cancellation support for pending/running jobs
+
+### Progress Tracking
+- Percentage (0–100), current file, processed/total counts
+- Timing: started_at, finished_at, phase2_started_at
+- Stats JSON blob with job-specific metrics
+
+---
+
+## Authentication & Security
+
+### Token System
+- **Bootstrap token**: admin token via `API_BOOTSTRAP_TOKEN` env var
+- **API tokens**: create, list, revoke with scopes
+- Token format: `stl_{prefix}_{secret}` with Argon2 hashing
+- Expiration dates, last usage tracking, revocation
+
+### Access Control
+- **Two scopes**: `admin` (full access) and `read` (read-only)
+- Route-level middleware enforcement
+- Rate limiting: configurable sliding window (default 120 req/s)
+
+---
+
+## Backoffice (Web UI)
+
+### Dashboard
+- Statistics cards: books, series, authors, libraries
+- Donut charts: reading status breakdown, format distribution
+- Bar charts: books per language
+- Per-library reading progress bars
+- Top series by book/page count
+- Monthly addition timeline
+- Metadata coverage stats
+
+### Pages
+- **Libraries**: list, create, delete, configure monitoring and metadata provider
+- **Books**: global list with filtering/sorting, detail view with metadata and page rendering
+- **Series**: global list, per-library view, detail with metadata management
+- **Authors**: list with book/series counts, detail with author's books
+- **Jobs**: history, live progress via SSE, error details
+- **Tokens**: create, list, revoke API tokens
+- **Settings**: image processing, cache, thumbnails, external services (Prowlarr, qBittorrent)
+
+### Interactive Features
+- Real-time search with suggestions
+- Metadata search and matching modals
+- Prowlarr search modal for missing volumes
+- Folder browser/picker for library paths
+- Book/series editing forms
+- Quick reading status toggles
+- CBR to CBZ conversion trigger
+
+---
+
+## API
+
+### Documentation
+- OpenAPI/Swagger UI available at `/swagger-ui`
+- Health check (`/health`), readiness (`/ready`), Prometheus metrics (`/metrics`)
+
+### Public Endpoints (no auth)
+- `GET /health`, `GET /ready`, `GET /metrics`, `GET /swagger-ui`
+
+### Read Endpoints (read scope)
+- Libraries, books, series, authors listing and detail
+- Book pages and thumbnails
+- Reading progress get/update
+- Full-text search, collection statistics
+
+### Admin Endpoints (admin scope)
+- Library CRUD and configuration
+- Book metadata editing, CBR conversion
+- Series metadata editing
+- Indexing job management (trigger, cancel, stream)
+- API token management
+- Metadata operations (search, match, approve, reject, batch, refresh)
+- External integrations (Prowlarr, qBittorrent, Komga)
+- Application settings and cache management
+
+---
+
+## Database
+
+### Key Design Decisions
+- PostgreSQL with `pg_trgm` for full-text search (no external search engine)
+- All deletions cascade from libraries
+- Unique constraints: file paths, token prefixes, metadata links (library + series + provider)
+- Directory mtime caching for incremental scan optimization
+- Connection pool: 10 (API), 20 (indexer)
+
+### Archive Resilience
+- CBZ: fallback streaming reader if central directory corrupted
+- CBR: RAR extraction via system `unar`, fallback to CBZ parsing
+- PDF: `pdfinfo` for page count, `pdftoppm` for rendering
+- EPUB: ZIP-based extraction
+- FD exhaustion detection: aborts if too many consecutive IO errors
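
Illustrative note on the Caching rules in docs/FEATURES.md (disk cache "SHA256-keyed, two-level directory structure", with cache key = hash(path + page + format + quality + width)): the Python sketch below shows one way such a key and on-disk path could be derived. It is not taken from the Stripstream codebase. The function names `cache_key` and `cache_path`, the NUL field separator, the choice of the first two hex byte pairs as directory levels, and the file extension are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


def cache_key(path: str, page: int, fmt: str, quality: int, width: int) -> str:
    """Return a SHA-256 hex digest over the five render parameters.

    Assumption: fields are joined with a NUL separator so that inputs such
    as ("a", 12) and ("a1", 2) cannot produce the same byte string. The
    actual field encoding is not documented in the patch.
    """
    raw = "\0".join([path, str(page), fmt, str(quality), str(width)])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def cache_path(cache_dir: Path, key: str, fmt: str) -> Path:
    """Map a key to <cache_dir>/<key[0:2]>/<key[2:4]>/<key>.<fmt>.

    Assumption: the two directory levels are the first two hex byte pairs
    of the key, which keeps any single directory from growing too large.
    """
    return cache_dir / key[:2] / key[2:4] / f"{key}.{fmt}"
```

Under these assumptions, any change to the page number, output format, quality, or width yields a different key, so each render variant gets its own cache entry.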