docs: add comprehensive features list to README and docs/FEATURES.md

Replace the minimal README features section with a concise categorized summary and link to a detailed docs/FEATURES.md covering all features, business rules, API endpoints, and integrations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 14:34:36 +01:00
parent 41228430cf
commit bd74c9e3e3
2 changed files with 359 additions and 19 deletions
--- a/README.md
+++ b/README.md
@@ -81,28 +81,58 @@ The backoffice will be available at http://localhost:7082

 ## Features

-### Libraries Management
-- Create and manage multiple libraries
-- Configure automatic scanning schedules (hourly, daily, weekly)
-- Real-time file watcher for instant indexing
-- Full and incremental rebuild options
+> For the full feature list, business rules, and API details, see [docs/FEATURES.md](docs/FEATURES.md).

-### Books Management
-- Support for CBZ, CBR, and PDF formats
-- Automatic metadata extraction
-- Series and volume detection
-- Full-text search powered by PostgreSQL
+### Libraries
+- Multi-library management with per-library configuration
+- Incremental and full scanning, real-time filesystem watcher
+- Per-library metadata provider selection (Google Books, ComicVine, BedéThèque, AniList, Open Library)

-### Jobs Monitoring
-- Real-time job progress tracking
-- Detailed statistics (scanned, indexed, removed, errors)
-- Job history and logs
-- Cancel pending jobs
+### Books & Series
+- **Formats**: CBZ, CBR, PDF, EPUB
+- Automatic metadata extraction (title, series, volume, authors, page count) from filenames and directory structure
+- Series aggregation with missing volume detection
+- Thumbnail generation (WebP/JPEG/PNG) with lazy generation and bulk rebuild
+- CBR → CBZ conversion

-### Search
-- Full-text search across titles, authors, and series
-- Library filtering
-- Real-time suggestions
+### Reading Progress
+- Per-book tracking: unread / reading / read with current page
+- Series-level aggregated reading status
+- Bulk mark-as-read for series
+
+### Search & Discovery
+- Full-text search across titles, authors, and series (PostgreSQL `pg_trgm`)
+- Author listing with book/series counts
+- Filtering by reading status, series status, format, metadata provider
+
+### External Metadata
+- Search, match, approve/reject workflow with confidence scoring
+- Batch auto-matching and scheduled metadata refresh
+- Field locking to protect manual edits from sync
+
+### External Integrations
+- **Komga**: import reading progress
+- **Prowlarr**: search for missing volumes
+- **qBittorrent**: add torrents directly from search results
+
+### Background Jobs
+- Rebuild, rescan, thumbnail generation, metadata batch, CBR conversion
+- Real-time progress via Server-Sent Events (SSE)
+- Job history, error tracking, cancellation
+
+### Page Rendering
+- On-demand page extraction from all formats
+- Image processing (format, quality, max width, resampling filter)
+- LRU in-memory + disk cache
+
+### Security
+- Token-based auth (`admin` / `read` scopes) with Argon2 hashing
+- Rate limiting, token expiration and revocation
+
+### Web UI (Backoffice)
+- Dashboard with statistics, charts, and reading progress
+- Library, book, series, author management
+- Live job monitoring, metadata search modals, settings panel

 ## Environment Variables

--- a/docs/FEATURES.md
+++ b/docs/FEATURES.md
@@ -0,0 +1,310 @@
+# Stripstream Librarian — Features & Business Rules
+
+## Libraries
+
+### Multi-Library Management
+- Create and manage multiple independent libraries, each with its own root path
+- Enable/disable libraries individually
+- Deleting a library cascades to all its books, jobs, and metadata
+
+### Scanning & Indexing
+- **Incremental scan**: uses directory mtime tracking to skip unchanged directories
+- **Full rebuild**: forces a re-walk of all directories, ignoring cached mtimes
+- **Rescan**: deep rescan to discover newly supported formats
+- **Two-phase pipeline**:
+  - Phase 1 (Discovery): fast filename-based metadata extraction (no archive I/O)
+  - Phase 2 (Analysis): extract page counts, first page image from archives
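
The mtime-based incremental scan described above can be sketched roughly as follows. This is a minimal illustration, not the project's indexer: the function name `changed_directories` and the in-memory `cached_mtimes` dictionary are hypothetical stand-ins for whatever the indexer persists in the database.

```python
import os
from typing import Dict, List


def changed_directories(root: str, cached_mtimes: Dict[str, float]) -> List[str]:
    """Return directories whose mtime differs from the cached value.

    A directory's mtime changes when entries are added, removed, or renamed
    directly inside it, so every directory is compared individually.
    `cached_mtimes` (hypothetical) is updated in place for the next scan.
    """
    changed = []
    for dirpath, _dirnames, _filenames in os.walk(root):
        mtime = os.stat(dirpath).st_mtime
        if cached_mtimes.get(dirpath) != mtime:
            changed.append(dirpath)
            cached_mtimes[dirpath] = mtime
    return changed
```

Under this sketch, a full rebuild amounts to calling the function with an empty cache, so every directory is reported as changed.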
+
+### Real-Time Monitoring
+- **Automatic periodic scanning**: configurable interval (default 5 seconds)
+- **Filesystem watcher**: real-time detection of file changes for instant indexing
+- Each can be toggled per library (`monitor_enabled`, `watcher_enabled`)
+
+---
+
+## Books
+
+### Format Support
+- **CBZ** (ZIP-based comic archives)
+- **CBR** (RAR-based comic archives)
+- **PDF**
+- **EPUB**
+- Automatic format detection from file extension and magic bytes
+
+### Metadata Extraction
+- **Title**: derived from filename or external metadata
+- **Series**: derived from directory structure (first directory level under library root)
+- **Volume**: extracted from filename with pattern detection:
+  - `T##` (Tome) — most common for French comics
+  - `Vol.##`, `Vol ##`, `Volume ##`
+  - `###` (standalone number)
+  - `-## ` (dash-separated)
+- **Author(s)**: supported as both a single scalar field and an array field
+- **Page count**: extracted from archive analysis
+- **Language**, **kind** (ebook, comic, bd)
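
The volume patterns listed above could be matched with a small ordered list of regular expressions. The sketch below is only an approximation: the exact regexes, their priority order, and the name `extract_volume` are assumptions, not taken from the codebase.

```python
import re
from typing import Optional

# Tried in order; both the ordering and the regexes are assumptions.
VOLUME_PATTERNS = [
    re.compile(r"\bT(\d+)\b", re.IGNORECASE),                  # T05 (Tome)
    re.compile(r"\bVol(?:ume)?\.?\s*(\d+)\b", re.IGNORECASE),  # Vol.3, Vol 3, Volume 3
    re.compile(r"-\s*(\d+)\s"),                                # "-04 " (dash-separated)
    re.compile(r"\b(\d+)\b"),                                  # standalone number
]


def extract_volume(filename: str) -> Optional[int]:
    """Return the first volume number found in the filename, or None."""
    stem = filename.rsplit(".", 1)[0]  # drop the extension, if any
    for pattern in VOLUME_PATTERNS:
        match = pattern.search(stem)
        if match:
            return int(match.group(1))
    return None
```

For instance, `extract_volume("Asterix T05.cbz")` returns `5` and `extract_volume("Notes.cbz")` returns `None`. Placing the standalone-number pattern last keeps it from shadowing the more specific forms.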
+
+### Thumbnails
+- Generated from the first page of each archive
+- Output format configurable: WebP (default), JPEG, PNG
+- Configurable dimensions (default 300×400)
+- Lazy generation: created on first access if missing
+- Bulk operations: rebuild missing or regenerate all
+
+### CBR to CBZ Conversion
+- Convert RAR archives to ZIP format
+- Tracked as background job with progress
+
+---
+
+## Series
+
+### Automatic Aggregation
+- Series derived from directory structure during scanning
+- Books without a series are grouped as "unclassified"
+
+### Series Metadata
+- Description, publisher, start year, status (`ongoing`, `ended`, `completed`, `on_hold`, `hiatus`)
+- Total volume count (from external providers)
+- Authors (aggregated from books or metadata)
+
+### Filtering & Discovery
+- Filter by: series name (partial match), reading status, series status, metadata provider linkage
+- Sort by: name, reading status, book count
+- **Missing books detection**: identifies gaps in volume numbering within a series
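
Gap detection in volume numbering can be illustrated with a few lines of set arithmetic. This sketch assumes numbering starts at 1 and that the provider's total volume count, when known, extends the expected range; neither assumption is confirmed by the feature list, and `missing_volumes` is a hypothetical name.

```python
from typing import Iterable, List, Optional


def missing_volumes(present: Iterable[int], total: Optional[int] = None) -> List[int]:
    """List volume numbers absent from a series.

    `present` holds the volume numbers found on disk. `total` is the
    optional volume count reported by a metadata provider (assumed usage).
    """
    owned = {v for v in present if v >= 1}
    if not owned:
        return []  # nothing to compare against
    upper = max(max(owned), total or 0)
    return [v for v in range(1, upper + 1) if v not in owned]
```

With volumes 1, 2, 4, and 7 on disk, the function returns `[3, 5, 6]`.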
+
+---
+
+## Reading Progress
+
+### Per-Book Tracking
+- Three states: `unread` (default), `reading`, `read`
+- Current page tracking when status is `reading`
+- `last_read_at` timestamp auto-updated
+
+### Series-Level Status
+- Calculated from book statuses:
+  - All read → series `read`
+  - None read → series `unread`
+  - Mixed → series `reading`
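
The three rules above translate directly into a small function. The sketch follows the rules literally, so a book in the `reading` state counts as "not read"; the real implementation may treat that case differently. An empty series is treated as `unread` here, which is also an assumption.

```python
from typing import Iterable


def series_status(book_statuses: Iterable[str]) -> str:
    """Aggregate per-book statuses into a series-level status."""
    statuses = list(book_statuses)
    read_count = sum(1 for s in statuses if s == "read")
    if statuses and read_count == len(statuses):
        return "read"      # all books read
    if read_count == 0:
        return "unread"    # no book read (also covers an empty series)
    return "reading"       # mixed
```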
+
+### Bulk Operations
+- Mark entire series as read (updates all books)
+
+---
+
+## Search & Discovery
+
+### Full-Text Search
+- PostgreSQL-based (`ILIKE` + `pg_trgm`)
+- Searches across: book titles, series names, authors (scalar and array fields), series metadata authors
+- Case-insensitive partial matching
+- Library-scoped filtering
+
+### Results
+- Book hits: title, authors, series, volume, language, kind
+- Series hits: name, book count, read count, first book (for linking)
+- Processing time included in response
+
+---
+
+## Authors
+
+- Unique author aggregation from books and series metadata
+- Per-author book and series count
+- Searchable by name (partial match)
+- Sortable by name or book count
+
+---
+
+## External Metadata
+
+### Supported Providers
+| Provider | Focus |
+|----------|-------|
+| Google Books | General books (default fallback) |
+| ComicVine | Comics |
+| BedéThèque | Franco-Belgian comics |
+| AniList | Manga/anime |
+| Open Library | General books |
+
+### Provider Configuration
+- Global default provider with library-level override
+- Fallback provider if primary is unavailable
+
+### Matching Workflow
+1. **Search**: query a provider, get candidates with confidence scores
+2. **Match**: link a series to an external result (status `pending`)
+3. **Approve**: validate and sync metadata to series and books
+4. **Reject**: discard a match
+
+### Batch Processing
+- Auto-match all series in a library via `metadata_batch` job
+- Configurable confidence threshold
+- Result statuses: `auto_matched`, `no_results`, `too_many_results`, `low_confidence`, `already_linked`
+
+### Metadata Refresh
+- Update approved links with latest data from providers
+- Change tracking reports per series/book
+- Non-destructive: only updates when provider has new data
+
+### Field Locking
+- Individual book fields can be locked to prevent external sync from overwriting manual edits
+
+---
+
+## External Integrations
+
+### Komga Sync
+- Import reading progress from a Komga server
+- Matches local series/books by name
+- Detailed sync report: matched, already read, newly marked, unmatched
+
+### Prowlarr (Indexer Search)
+- Search Prowlarr for missing volumes in a series
+- Volume pattern matching against release titles
+- Results: title, size, seeders/leechers, download URL, matched missing volumes
+
+### qBittorrent
+- Add torrents directly from Prowlarr search results
+- Connection test endpoint
+
+---
+
+## Page Rendering & Caching
+
+### Page Extraction
+- Render any page from supported archive formats
+- 1-indexed page numbers
+
+### Image Processing
+- Output formats: original, JPEG, PNG, WebP
+- Quality parameter (1–100)
+- Max width parameter (1–2160 px)
+- Configurable resampling filter: lanczos3, nearest, triangle/bilinear
+- Concurrent render limit (default 8) with semaphore
+
+### Caching
+- **LRU in-memory cache**: 512 entries
+- **Disk cache**: SHA256-keyed, two-level directory structure
+- Cache key = hash(path + page + format + quality + width)
+- Configurable cache directory and max size
+- Manual cache clear via settings
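
The cache layers described above might look roughly like this. It is a sketch under several assumptions: the field order and `|` separator in the key, the 2+2 hex-character directory split, and the names `cache_key`, `cache_file`, and `LruCache` are all illustrative rather than taken from the code.

```python
import hashlib
from collections import OrderedDict
from pathlib import Path


def cache_key(path: str, page: int, fmt: str, quality: int, width: int) -> str:
    """SHA-256 over the render parameters (field order and separator assumed)."""
    raw = f"{path}|{page}|{fmt}|{quality}|{width}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def cache_file(cache_dir: Path, key: str) -> Path:
    """Two-level layout, e.g. <cache_dir>/ab/cd/abcd... (split is assumed)."""
    return cache_dir / key[:2] / key[2:4] / key


class LruCache:
    """Minimal in-memory LRU; 512 matches the documented entry count."""

    def __init__(self, capacity: int = 512):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used
```

Spreading files over two directory levels keeps any single directory from accumulating a very large number of entries.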
+
+---
+
+## Background Jobs
+
+### Job Types
+| Type | Description |
+|------|-------------|
+| `rebuild` | Incremental scan |
+| `full_rebuild` | Full filesystem rescan |
+| `rescan` | Deep rescan for new formats |
+| `thumbnail_rebuild` | Generate missing thumbnails |
+| `thumbnail_regenerate` | Clear and regenerate all thumbnails |
+| `cbr_to_cbz` | Convert RAR to ZIP |
+| `metadata_batch` | Auto-match series to metadata |
+| `metadata_refresh` | Update approved metadata links |
+
+### Job Lifecycle
+- Status flow: `pending` → `running` → `success` | `failed` | `cancelled`
+- Intermediate statuses: `extracting_pages`, `generating_thumbnails`
+- Real-time progress via **Server-Sent Events** (SSE)
+- Per-file error tracking (non-fatal: job continues on errors)
+- Cancellation support for pending/running jobs
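
The status flow above can be modelled as a transition table. Where the intermediate statuses sit relative to `running` is not spelled out, so the edges below are a guess, and `can_transition` is a hypothetical helper name.

```python
TERMINAL = {"success", "failed", "cancelled"}

# Allowed transitions; placement of the intermediate statuses is assumed.
TRANSITIONS = {
    "pending": {"running", "cancelled"},
    "running": {"extracting_pages", "generating_thumbnails"} | TERMINAL,
    "extracting_pages": {"generating_thumbnails"} | TERMINAL,
    "generating_thumbnails": set(TERMINAL),
}


def can_transition(current: str, target: str) -> bool:
    """Return True if a job may move from `current` to `target`."""
    return target in TRANSITIONS.get(current, set())
```

Terminal statuses have no outgoing edges, so a finished job cannot be restarted or cancelled in this model.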
+
+### Progress Tracking
+- Percentage (0–100), current file, processed/total counts
+- Timing: `started_at`, `finished_at`, `phase2_started_at`
+- Stats JSON blob with job-specific metrics
+
+---
+
+## Authentication & Security
+
+### Token System
+- **Bootstrap token**: admin token via `API_BOOTSTRAP_TOKEN` env var
+- **API tokens**: create, list, revoke with scopes
+- Token format: `stl_{prefix}_{secret}` with Argon2 hashing
+- Expiration dates, last usage tracking, revocation
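
As an illustration of the `stl_{prefix}_{secret}` format, the sketch below generates and splits a token. The component lengths, the hex encoding, and the function names are assumptions. Argon2 hashing is left out because it needs a third-party library; presumably only a hash of the secret is stored, with the prefix used for lookup.

```python
import secrets
from typing import Tuple


def generate_token() -> str:
    """Build a token in the stl_{prefix}_{secret} shape (lengths assumed)."""
    prefix = secrets.token_hex(4)    # short lookup part
    secret = secrets.token_hex(32)   # part that would be hashed with Argon2
    return f"stl_{prefix}_{secret}"


def parse_token(token: str) -> Tuple[str, str]:
    """Split a token into (prefix, secret); raise ValueError if malformed."""
    parts = token.split("_", 2)
    if len(parts) != 3 or parts[0] != "stl" or not parts[1] or not parts[2]:
        raise ValueError("malformed token")
    return parts[1], parts[2]
```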
+
+### Access Control
+- **Two scopes**: `admin` (full access) and `read` (read-only)
+- Route-level middleware enforcement
+- Rate limiting: configurable sliding window (default 120 req/s)
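
A sliding-window limiter with the documented default of 120 requests per second can be sketched as below. This is a single-bucket illustration with a hypothetical class name; how the real middleware keys its windows (per token, per IP, or globally) is not stated.

```python
import time
from collections import deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most `max_requests` within any `window_seconds` span."""

    def __init__(self, max_requests: int = 120, window_seconds: float = 1.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = deque()  # timestamps of accepted requests

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Drop timestamps that have fallen out of the window.
        while self._hits and now - self._hits[0] >= self.window_seconds:
            self._hits.popleft()
        if len(self._hits) >= self.max_requests:
            return False
        self._hits.append(now)
        return True
```

The optional `now` argument exists only to make the behaviour deterministic in tests.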
+
+---
+
+## Backoffice (Web UI)
+
+### Dashboard
+- Statistics cards: books, series, authors, libraries
+- Donut charts: reading status breakdown, format distribution
+- Bar charts: books per language
+- Per-library reading progress bars
+- Top series by book/page count
+- Monthly addition timeline
+- Metadata coverage stats
+
+### Pages
+- **Libraries**: list, create, delete, configure monitoring and metadata provider
+- **Books**: global list with filtering/sorting, detail view with metadata and page rendering
+- **Series**: global list, per-library view, detail with metadata management
+- **Authors**: list with book/series counts, detail with author's books
+- **Jobs**: history, live progress via SSE, error details
+- **Tokens**: create, list, revoke API tokens
+- **Settings**: image processing, cache, thumbnails, external services (Prowlarr, qBittorrent)
+
+### Interactive Features
+- Real-time search with suggestions
+- Metadata search and matching modals
+- Prowlarr search modal for missing volumes
+- Folder browser/picker for library paths
+- Book/series editing forms
+- Quick reading status toggles
+- CBR to CBZ conversion trigger
+
+---
+
+## API
+
+### Documentation
+- OpenAPI/Swagger UI available at `/swagger-ui`
+- Health check (`/health`), readiness (`/ready`), Prometheus metrics (`/metrics`)
+
+### Public Endpoints (no auth)
+- `GET /health`, `GET /ready`, `GET /metrics`, `GET /swagger-ui`
+
+### Read Endpoints (read scope)
+- Libraries, books, series, authors listing and detail
+- Book pages and thumbnails
+- Reading progress get/update
+- Full-text search, collection statistics
+
+### Admin Endpoints (admin scope)
+- Library CRUD and configuration
+- Book metadata editing, CBR conversion
+- Series metadata editing
+- Indexing job management (trigger, cancel, stream)
+- API token management
+- Metadata operations (search, match, approve, reject, batch, refresh)
+- External integrations (Prowlarr, qBittorrent, Komga)
+- Application settings and cache management
+
+---
+
+## Database
+
+### Key Design Decisions
+- PostgreSQL with `pg_trgm` for full-text search (no external search engine)
+- All deletions cascade from libraries
+- Unique constraints: file paths, token prefixes, metadata links (library + series + provider)
+- Directory mtime caching for incremental scan optimization
+- Connection pool: 10 (API), 20 (indexer)
+
+### Archive Resilience
+- CBZ: falls back to a streaming reader if the central directory is corrupted
+- CBR: RAR extraction via system `unar`, fallback to CBZ parsing
+- PDF: `pdfinfo` for page count, `pdftoppm` for rendering
+- EPUB: ZIP-based extraction
+- File-descriptor (FD) exhaustion detection: aborts after too many consecutive I/O errors