docs: add comprehensive features list to README and docs/FEATURES.md

Replace the minimal README features section with a concise categorized
summary and link to a detailed docs/FEATURES.md covering all features,
business rules, API endpoints, and integrations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 14:34:36 +01:00
parent 41228430cf
commit bd74c9e3e3
2 changed files with 359 additions and 19 deletions


@@ -81,28 +81,58 @@ The backoffice will be available at http://localhost:7082
## Features
### Libraries Management
- Create and manage multiple libraries
- Configure automatic scanning schedules (hourly, daily, weekly)
- Real-time file watcher for instant indexing
- Full and incremental rebuild options
> For the full feature list, business rules, and API details, see [docs/FEATURES.md](docs/FEATURES.md).
### Books Management
- Support for CBZ, CBR, and PDF formats
- Automatic metadata extraction
- Series and volume detection
- Full-text search powered by PostgreSQL
### Libraries
- Multi-library management with per-library configuration
- Incremental and full scanning, real-time filesystem watcher
- Per-library metadata provider selection (Google Books, ComicVine, BedéThèque, AniList, Open Library)
### Jobs Monitoring
- Real-time job progress tracking
- Detailed statistics (scanned, indexed, removed, errors)
- Job history and logs
- Cancel pending jobs
### Books & Series
- **Formats**: CBZ, CBR, PDF, EPUB
- Automatic metadata extraction (title, series, volume, authors, page count) from filenames and directory structure
- Series aggregation with missing volume detection
- Thumbnail generation (WebP/JPEG/PNG) with lazy generation and bulk rebuild
- CBR → CBZ conversion
### Search
- Full-text search across titles, authors, and series
- Library filtering
- Real-time suggestions
### Reading Progress
- Per-book tracking: unread / reading / read with current page
- Series-level aggregated reading status
- Bulk mark-as-read for series
### Search & Discovery
- Full-text search across titles, authors, and series (PostgreSQL `pg_trgm`)
- Author listing with book/series counts
- Filtering by reading status, series status, format, metadata provider
### External Metadata
- Search, match, approve/reject workflow with confidence scoring
- Batch auto-matching and scheduled metadata refresh
- Field locking to protect manual edits from sync
### External Integrations
- **Komga**: import reading progress
- **Prowlarr**: search for missing volumes
- **qBittorrent**: add torrents directly from search results
### Background Jobs
- Rebuild, rescan, thumbnail generation, metadata batch, CBR conversion
- Real-time progress via Server-Sent Events (SSE)
- Job history, error tracking, cancellation
### Page Rendering
- On-demand page extraction from all formats
- Image processing (format, quality, max width, resampling filter)
- LRU in-memory + disk cache
### Security
- Token-based auth (`admin` / `read` scopes) with Argon2 hashing
- Rate limiting, token expiration and revocation
### Web UI (Backoffice)
- Dashboard with statistics, charts, and reading progress
- Library, book, series, author management
- Live job monitoring, metadata search modals, settings panel
## Environment Variables

docs/FEATURES.md Normal file

@@ -0,0 +1,310 @@
# Stripstream Librarian — Features & Business Rules
## Libraries
### Multi-Library Management
- Create and manage multiple independent libraries, each with its own root path
- Enable/disable libraries individually
- Deleting a library cascades to all its books, jobs, and metadata
### Scanning & Indexing
- **Incremental scan**: uses directory mtime tracking to skip unchanged directories
- **Full rebuild**: forces a re-walk of all directories, ignoring cached mtimes
- **Rescan**: deep rescan to discover newly supported formats
- **Two-phase pipeline**:
- Phase 1 (Discovery): fast filename-based metadata extraction (no archive I/O)
- Phase 2 (Analysis): extract page counts, first page image from archives
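The difference between an incremental scan and a full rebuild can be illustrated with a minimal Python sketch. The function name `changed_directories` and the in-memory `cached_mtimes` dict are assumptions made for illustration; the actual indexer presumably persists mtimes in the database and may be structured differently.

```python
import os
from typing import Dict, List

def changed_directories(root: str,
                        cached_mtimes: Dict[str, float],
                        full_rebuild: bool = False) -> List[str]:
    """Return the directories that need to be re-scanned.

    cached_mtimes maps a directory path to the mtime seen on the previous
    scan (hypothetical in-memory stand-in for a persistent cache).
    """
    changed = []
    for dirpath, _dirnames, _filenames in os.walk(root):
        mtime = os.stat(dirpath).st_mtime
        # A full rebuild ignores the cache; an incremental scan skips
        # directories whose mtime has not moved since the last scan.
        if full_rebuild or cached_mtimes.get(dirpath) != mtime:
            changed.append(dirpath)
        cached_mtimes[dirpath] = mtime
    return changed
```

Note that a directory's mtime only changes when entries are added, removed, or renamed directly inside it, which is why every directory level is checked individually rather than only the root.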
### Real-Time Monitoring
- **Automatic periodic scanning**: configurable interval (default 5 seconds)
- **Filesystem watcher**: real-time detection of file changes for instant indexing
- Each can be toggled per library (`monitor_enabled`, `watcher_enabled`)
---
## Books
### Format Support
- **CBZ** (ZIP-based comic archives)
- **CBR** (RAR-based comic archives)
- **PDF**
- **EPUB**
- Automatic format detection from file extension and magic bytes
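As a rough illustration of combining the extension with magic bytes, the Python sketch below checks the well-known leading signatures of ZIP, RAR, and PDF files. The helper name `detect_format` and the precedence rule (signature first, extension as fallback) are assumptions, not a description of the project's actual logic.

```python
from pathlib import Path
from typing import Optional

# Well-known leading signatures of the underlying container formats.
ZIP_MAGIC = b"PK\x03\x04"
RAR_MAGIC = b"Rar!\x1a\x07"
PDF_MAGIC = b"%PDF"

KNOWN_EXTENSIONS = {"cbz", "cbr", "pdf", "epub"}

def detect_format(path: str, header: bytes) -> Optional[str]:
    """Guess the book format from the first bytes of a file and its name."""
    ext = Path(path).suffix.lower().lstrip(".")
    if header.startswith(PDF_MAGIC):
        return "pdf"
    if header.startswith(RAR_MAGIC):
        return "cbr"
    if header.startswith(ZIP_MAGIC):
        # EPUB and CBZ are both ZIP containers, so the extension decides.
        return "epub" if ext == "epub" else "cbz"
    # Unrecognized signature: fall back to the extension alone (assumption).
    return ext if ext in KNOWN_EXTENSIONS else None
```

With this ordering, a ZIP archive misnamed `.cbr` is still treated as CBZ, which is one plausible reason to look at magic bytes at all.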
### Metadata Extraction
- **Title**: derived from filename or external metadata
- **Series**: derived from directory structure (first directory level under library root)
- **Volume**: extracted from filename with pattern detection:
- `T##` (Tome) — most common for French comics
- `Vol.##`, `Vol ##`, `Volume ##`
- `###` (standalone number)
- `-## ` (dash-separated)
- **Author(s)**: supported both as a single scalar value and as an array
- **Page count**: extracted from archive analysis
- **Language**, **kind** (ebook, comic, bd)
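The filename and directory rules above can be sketched in Python as follows. The regular expressions, their priority order, and the function names `extract_volume` and `series_from_path` are assumptions chosen to match the listed patterns; the real extractor may differ in details such as ordering or edge cases.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

# Tried in order; the ordering is an assumption of this sketch.
VOLUME_PATTERNS = [
    re.compile(r"\bT(\d+)\b", re.IGNORECASE),                  # T03 (Tome)
    re.compile(r"\bVol(?:ume)?\.?\s*(\d+)\b", re.IGNORECASE),  # Vol.3, Vol 3, Volume 3
    re.compile(r"-\s*(\d+)(?=\s|$)"),                          # dash-separated: "-07 "
    re.compile(r"\b(\d+)\b"),                                  # standalone number
]

def extract_volume(filename: str) -> Optional[int]:
    """Return the volume number found in a filename, if any."""
    stem = PurePosixPath(filename).stem
    for pattern in VOLUME_PATTERNS:
        match = pattern.search(stem)
        if match:
            return int(match.group(1))
    return None

def series_from_path(library_root: str, file_path: str) -> Optional[str]:
    """Return the first directory level under the library root."""
    rel = PurePosixPath(file_path).relative_to(library_root)
    # A file placed directly in the root has no series directory.
    return rel.parts[0] if len(rel.parts) > 1 else None
```

For example, `/lib/Asterix/Asterix T03.cbz` would yield series `Asterix` and volume `3` under these assumptions.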
### Thumbnails
- Generated from the first page of each archive
- Output format configurable: WebP (default), JPEG, PNG
- Configurable dimensions (default 300×400)
- Lazy generation: created on first access if missing
- Bulk operations: rebuild missing or regenerate all
### CBR to CBZ Conversion
- Convert RAR archives to ZIP format
- Tracked as a background job with progress reporting
---
## Series
### Automatic Aggregation
- Series derived from directory structure during scanning
- Books without a series are grouped as "unclassified"
### Series Metadata
- Description, publisher, start year, status (`ongoing`, `ended`, `completed`, `on_hold`, `hiatus`)
- Total volume count (from external providers)
- Authors (aggregated from books or metadata)
### Filtering & Discovery
- Filter by: series name (partial match), reading status, series status, metadata provider linkage
- Sort by: name, reading status, book count
- **Missing books detection**: identifies gaps in volume numbering within a series
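A minimal sketch of gap detection in Python, assuming gaps are computed between the lowest and highest volume present in the series. Whether the real feature also uses the provider's total volume count is not stated here, so that is left out; `missing_volumes` is a hypothetical name.

```python
from typing import Iterable, List

def missing_volumes(volumes: Iterable[int]) -> List[int]:
    """Return the volume numbers absent between the lowest and highest known volume."""
    present = set(volumes)
    if not present:
        return []
    return [v for v in range(min(present), max(present) + 1) if v not in present]
```

A series holding volumes 1, 2, 4, and 7 would report 3, 5, and 6 as missing.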
---
## Reading Progress
### Per-Book Tracking
- Three states: `unread` (default), `reading`, `read`
- Current page tracking when status is `reading`
- `last_read_at` timestamp auto-updated
### Series-Level Status
- Calculated from book statuses:
- All read → series `read`
- None read → series `unread`
- Mixed → series `reading`
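The aggregation rule can be written as a small Python function. This sketch follows the three rules literally by counting books marked `read`; the name `series_status` and the handling of an empty series are assumptions.

```python
from typing import Iterable

def series_status(book_statuses: Iterable[str]) -> str:
    """Derive a series status from the statuses of its books."""
    statuses = list(book_statuses)
    read_count = sum(1 for s in statuses if s == "read")
    if read_count == 0:
        return "unread"  # also covers an empty series (assumption)
    if read_count == len(statuses):
        return "read"
    return "reading"
```

Read literally, a series with one book in `reading` and none in `read` stays `unread`; an implementation could equally treat any in-progress book as making the series `reading`.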
### Bulk Operations
- Mark entire series as read (updates all books)
---
## Search & Discovery
### Full-Text Search
- PostgreSQL-based (`ILIKE` + `pg_trgm`)
- Searches across: book titles, series names, authors (scalar and array fields), series metadata authors
- Case-insensitive partial matching
- Library-scoped filtering
### Results
- Book hits: title, authors, series, volume, language, kind
- Series hits: name, book count, read count, first book (for linking)
- Processing time included in response
---
## Authors
- Unique author aggregation from books and series metadata
- Per-author book and series count
- Searchable by name (partial match)
- Sortable by name or book count
---
## External Metadata
### Supported Providers
| Provider | Focus |
|----------|-------|
| Google Books | General books (default fallback) |
| ComicVine | Comics |
| BedéThèque | Franco-Belgian comics |
| AniList | Manga/anime |
| Open Library | General books |
### Provider Configuration
- Global default provider with library-level override
- Fallback provider if primary is unavailable
### Matching Workflow
1. **Search**: query a provider, get candidates with confidence scores
2. **Match**: link a series to an external result (status `pending`)
3. **Approve**: validate and sync metadata to series and books
4. **Reject**: discard a match
### Batch Processing
- Auto-match all series in a library via `metadata_batch` job
- Configurable confidence threshold
- Result statuses: `auto_matched`, `no_results`, `too_many_results`, `low_confidence`, `already_linked`
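One way the result statuses could map onto candidate scores is sketched below in Python. The meaning assigned to each status, the `0.85` default threshold, and the function name `classify_batch_result` are all assumptions for illustration; only the status names come from the list above.

```python
from typing import List

def classify_batch_result(candidate_scores: List[float],
                          threshold: float = 0.85,
                          already_linked: bool = False) -> str:
    """Pick a result status for one series (hypothetical decision rules)."""
    if already_linked:
        return "already_linked"
    if not candidate_scores:
        return "no_results"
    confident = [s for s in candidate_scores if s >= threshold]
    if not confident:
        return "low_confidence"
    if len(confident) > 1:
        # Several equally plausible candidates: leave the choice to a human.
        return "too_many_results"
    return "auto_matched"
```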
### Metadata Refresh
- Update approved links with latest data from providers
- Change tracking reports per series/book
- Non-destructive: only updates when provider has new data
### Field Locking
- Individual book fields can be locked to prevent external sync from overwriting manual edits
---
## External Integrations
### Komga Sync
- Import reading progress from a Komga server
- Matches local series/books by name
- Detailed sync report: matched, already read, newly marked, unmatched
### Prowlarr (Indexer Search)
- Search Prowlarr for missing volumes in a series
- Volume pattern matching against release titles
- Results: title, size, seeders/leechers, download URL, matched missing volumes
### qBittorrent
- Add torrents directly from Prowlarr search results
- Connection test endpoint
---
## Page Rendering & Caching
### Page Extraction
- Render any page from supported archive formats
- 1-indexed page numbers
### Image Processing
- Output formats: original, JPEG, PNG, WebP
- Quality parameter (1–100)
- Max width parameter (1–2160 px)
- Configurable resampling filter: lanczos3, nearest, triangle/bilinear
- Concurrent render limit (default 8) with semaphore
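The concurrent render limit can be illustrated with a semaphore in Python's `asyncio`. This is only a sketch of the technique: `render_all` is a hypothetical function, the sleep stands in for real image work, and the `peak` counter exists purely to show that the limit holds.

```python
import asyncio
from typing import Iterable, List, Tuple

async def render_all(pages: Iterable[int], limit: int = 8) -> Tuple[List[str], int]:
    """Render pages with at most `limit` renders in flight at once."""
    semaphore = asyncio.Semaphore(limit)
    active = 0
    peak = 0

    async def render(page: int) -> str:
        nonlocal active, peak
        async with semaphore:  # waits here once `limit` renders are running
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0)  # stand-in for the actual rendering work
            active -= 1
            return f"page-{page}"

    results = await asyncio.gather(*(render(p) for p in pages))
    return list(results), peak
```

Calling `asyncio.run(render_all(range(1, 21)))` renders 20 pages while never exceeding 8 simultaneous renders.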
### Caching
- **LRU in-memory cache**: 512 entries
- **Disk cache**: SHA256-keyed, two-level directory structure
- Cache key = hash(path + page + format + quality + width)
- Configurable cache directory and max size
- Manual cache clear via settings
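A SHA256-keyed, two-level disk layout might look like the following Python sketch. The `|` separator, the use of the first two hex pairs as directory levels, and the name `cache_path` are assumptions; only the key ingredients (path, page, format, quality, width) come from the list above.

```python
import hashlib
from pathlib import Path

def cache_path(cache_dir: str, book_path: str, page: int,
               fmt: str, quality: int, width: int) -> Path:
    """Map render parameters to a file path inside the disk cache."""
    raw = f"{book_path}|{page}|{fmt}|{quality}|{width}".encode("utf-8")
    digest = hashlib.sha256(raw).hexdigest()
    # Two directory levels keep any single directory from growing too large.
    return Path(cache_dir) / digest[:2] / digest[2:4] / digest
```

Because every rendering parameter is part of the key, changing the quality or width produces a different cache entry rather than overwriting the existing one.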
---
## Background Jobs
### Job Types
| Type | Description |
|------|-------------|
| `rebuild` | Incremental scan |
| `full_rebuild` | Full filesystem rescan |
| `rescan` | Deep rescan for new formats |
| `thumbnail_rebuild` | Generate missing thumbnails |
| `thumbnail_regenerate` | Clear and regenerate all thumbnails |
| `cbr_to_cbz` | Convert RAR to ZIP |
| `metadata_batch` | Auto-match series to metadata |
| `metadata_refresh` | Update approved metadata links |
### Job Lifecycle
- Status flow: `pending` → `running` → `success` | `failed` | `cancelled`
- Intermediate statuses: `extracting_pages`, `generating_thumbnails`
- Real-time progress via **Server-Sent Events** (SSE)
- Per-file error tracking (non-fatal: job continues on errors)
- Cancellation support for pending/running jobs
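The lifecycle can be modelled as a small transition table, sketched here in Python. Where the intermediate statuses sit relative to `running`, and the names `TRANSITIONS` and `can_transition`, are assumptions of this sketch.

```python
# Allowed next statuses for each status; terminal statuses have no entry.
TRANSITIONS = {
    "pending": {"running", "cancelled"},
    "running": {"extracting_pages", "generating_thumbnails",
                "success", "failed", "cancelled"},
    "extracting_pages": {"generating_thumbnails", "success", "failed", "cancelled"},
    "generating_thumbnails": {"success", "failed", "cancelled"},
}

def can_transition(current: str, new: str) -> bool:
    """Return True if a job may move from `current` to `new`."""
    return new in TRANSITIONS.get(current, set())
```

A pending job can be cancelled directly, while a finished job (`success`, `failed`, `cancelled`) accepts no further transitions.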
### Progress Tracking
- Percentage (0–100), current file, processed/total counts
- Timing: started_at, finished_at, phase2_started_at
- Stats JSON blob with job-specific metrics
---
## Authentication & Security
### Token System
- **Bootstrap token**: admin token via `API_BOOTSTRAP_TOKEN` env var
- **API tokens**: create, list, revoke with scopes
- Token format: `stl_{prefix}_{secret}` with Argon2 hashing
- Expiration dates, last usage tracking, revocation
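The `stl_{prefix}_{secret}` layout suggests that the prefix serves as a lookup key while only a hash of the secret is stored. The Python sketch below shows generation and parsing only; the prefix and secret lengths and the function names are assumptions, and the Argon2 hashing step is deliberately left out because it needs a third-party library.

```python
import secrets
from typing import Optional, Tuple

def new_token() -> str:
    """Create a token in the stl_{prefix}_{secret} layout (lengths are assumed)."""
    prefix = secrets.token_hex(4)   # hex output never contains underscores
    secret = secrets.token_hex(16)
    return f"stl_{prefix}_{secret}"

def parse_token(token: str) -> Optional[Tuple[str, str]]:
    """Split a token into (prefix, secret), or return None if malformed."""
    parts = token.split("_", 2)
    if len(parts) != 3 or parts[0] != "stl" or not parts[1] or not parts[2]:
        return None
    return parts[1], parts[2]
```

On a request, a server following this scheme would look the token up by prefix and then verify the secret against the stored Argon2 hash.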
### Access Control
- **Two scopes**: `admin` (full access) and `read` (read-only)
- Route-level middleware enforcement
- Rate limiting: configurable sliding window (default 120 req/s)
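A sliding-window limiter keeps the timestamps of recent requests and rejects a request once the window is full. The Python sketch below is a generic illustration of that technique with hypothetical names (`SlidingWindowLimiter`, `allow`); it takes the current time as an argument so the behaviour is easy to follow and test.

```python
from collections import deque

class SlidingWindowLimiter:
    """Allow at most `max_requests` within any `window_seconds` span."""

    def __init__(self, max_requests: int = 120, window_seconds: float = 1.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = deque()  # timestamps of accepted requests, oldest first

    def allow(self, now: float) -> bool:
        cutoff = now - self.window_seconds
        # Drop timestamps that have slid out of the window.
        while self._hits and self._hits[0] <= cutoff:
            self._hits.popleft()
        if len(self._hits) >= self.max_requests:
            return False
        self._hits.append(now)
        return True
```

In practice one limiter instance would be kept per client or token, and `now` would come from a monotonic clock such as `time.monotonic()`.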
---
## Backoffice (Web UI)
### Dashboard
- Statistics cards: books, series, authors, libraries
- Donut charts: reading status breakdown, format distribution
- Bar charts: books per language
- Per-library reading progress bars
- Top series by book/page count
- Monthly addition timeline
- Metadata coverage stats
### Pages
- **Libraries**: list, create, delete, configure monitoring and metadata provider
- **Books**: global list with filtering/sorting, detail view with metadata and page rendering
- **Series**: global list, per-library view, detail with metadata management
- **Authors**: list with book/series counts, detail with author's books
- **Jobs**: history, live progress via SSE, error details
- **Tokens**: create, list, revoke API tokens
- **Settings**: image processing, cache, thumbnails, external services (Prowlarr, qBittorrent)
### Interactive Features
- Real-time search with suggestions
- Metadata search and matching modals
- Prowlarr search modal for missing volumes
- Folder browser/picker for library paths
- Book/series editing forms
- Quick reading status toggles
- CBR to CBZ conversion trigger
---
## API
### Documentation
- OpenAPI/Swagger UI available at `/swagger-ui`
- Health check (`/health`), readiness (`/ready`), Prometheus metrics (`/metrics`)
### Public Endpoints (no auth)
- `GET /health`, `GET /ready`, `GET /metrics`, `GET /swagger-ui`
### Read Endpoints (read scope)
- Libraries, books, series, authors listing and detail
- Book pages and thumbnails
- Reading progress get/update
- Full-text search, collection statistics
### Admin Endpoints (admin scope)
- Library CRUD and configuration
- Book metadata editing, CBR conversion
- Series metadata editing
- Indexing job management (trigger, cancel, stream)
- API token management
- Metadata operations (search, match, approve, reject, batch, refresh)
- External integrations (Prowlarr, qBittorrent, Komga)
- Application settings and cache management
---
## Database
### Key Design Decisions
- PostgreSQL with `pg_trgm` for full-text search (no external search engine)
- All deletions cascade from libraries
- Unique constraints: file paths, token prefixes, metadata links (library + series + provider)
- Directory mtime caching for incremental scan optimization
- Connection pool: 10 (API), 20 (indexer)
### Archive Resilience
- CBZ: fallback streaming reader if central directory corrupted
- CBR: RAR extraction via system `unar`, fallback to CBZ parsing
- PDF: `pdfinfo` for page count, `pdftoppm` for rendering
- EPUB: ZIP-based extraction
- FD exhaustion detection: aborts if too many consecutive IO errors