Files
stripstream-librarian/docs/FEATURES.md
Froidefond Julien bd74c9e3e3
All checks were successful
Deploy with Docker Compose / deploy (push) Successful in 1m1s
docs: add comprehensive features list to README and docs/FEATURES.md
Replace the minimal README features section with a concise categorized
summary and link to a detailed docs/FEATURES.md covering all features,
business rules, API endpoints, and integrations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 14:34:36 +01:00

311 lines
9.9 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Stripstream Librarian — Features & Business Rules
## Libraries
### Multi-Library Management
- Create and manage multiple independent libraries, each with its own root path
- Enable/disable libraries individually
- Delete a library cascades to all its books, jobs, and metadata
### Scanning & Indexing
- **Incremental scan**: uses directory mtime tracking to skip unchanged directories
- **Full rebuild**: force re-walk all directories, ignoring cached mtimes
- **Rescan**: deep rescan to discover newly supported formats
- **Two-phase pipeline**:
- Phase 1 (Discovery): fast filename-based metadata extraction (no archive I/O)
- Phase 2 (Analysis): extract page counts, first page image from archives
### Real-Time Monitoring
- **Automatic periodic scanning**: configurable interval (default 5 seconds)
- **Filesystem watcher**: real-time detection of file changes for instant indexing
- Each can be toggled per library (`monitor_enabled`, `watcher_enabled`)
---
## Books
### Format Support
- **CBZ** (ZIP-based comic archives)
- **CBR** (RAR-based comic archives)
- **PDF**
- **EPUB**
- Automatic format detection from file extension and magic bytes
### Metadata Extraction
- **Title**: derived from filename or external metadata
- **Series**: derived from directory structure (first directory level under library root)
- **Volume**: extracted from filename with pattern detection:
- `T##` (Tome) — most common for French comics
- `Vol.##`, `Vol ##`, `Volume ##`
- `###` (standalone number)
- `-## ` (dash-separated)
- **Author(s)**: single scalar and array support
- **Page count**: extracted from archive analysis
- **Language**, **kind** (ebook, comic, bd)
### Thumbnails
- Generated from the first page of each archive
- Output format configurable: WebP (default), JPEG, PNG
- Configurable dimensions (default 300×400)
- Lazy generation: created on first access if missing
- Bulk operations: rebuild missing or regenerate all
### CBR to CBZ Conversion
- Convert RAR archives to ZIP format
- Tracked as background job with progress
---
## Series
### Automatic Aggregation
- Series derived from directory structure during scanning
- Books without series grouped as "unclassified"
### Series Metadata
- Description, publisher, start year, status (`ongoing`, `ended`, `completed`, `on_hold`, `hiatus`)
- Total volume count (from external providers)
- Authors (aggregated from books or metadata)
### Filtering & Discovery
- Filter by: series name (partial match), reading status, series status, metadata provider linkage
- Sort by: name, reading status, book count
- **Missing books detection**: identifies gaps in volume numbering within a series
---
## Reading Progress
### Per-Book Tracking
- Three states: `unread` (default), `reading`, `read`
- Current page tracking when status is `reading`
- `last_read_at` timestamp auto-updated
### Series-Level Status
- Calculated from book statuses:
- All read → series `read`
- None read → series `unread`
- Mixed → series `reading`
### Bulk Operations
- Mark entire series as read (updates all books)
---
## Search & Discovery
### Full-Text Search
- PostgreSQL-based (`ILIKE` + `pg_trgm`)
- Searches across: book titles, series names, authors (scalar and array fields), series metadata authors
- Case-insensitive partial matching
- Library-scoped filtering
### Results
- Book hits: title, authors, series, volume, language, kind
- Series hits: name, book count, read count, first book (for linking)
- Processing time included in response
---
## Authors
- Unique author aggregation from books and series metadata
- Per-author book and series count
- Searchable by name (partial match)
- Sortable by name or book count
---
## External Metadata
### Supported Providers
| Provider | Focus |
|----------|-------|
| Google Books | General books (default fallback) |
| ComicVine | Comics |
| BedéThèque | Franco-Belgian comics |
| AniList | Manga/anime |
| Open Library | General books |
### Provider Configuration
- Global default provider with library-level override
- Fallback provider if primary is unavailable
### Matching Workflow
1. **Search**: query a provider, get candidates with confidence scores
2. **Match**: link a series to an external result (status `pending`)
3. **Approve**: validate and sync metadata to series and books
4. **Reject**: discard a match
### Batch Processing
- Auto-match all series in a library via `metadata_batch` job
- Configurable confidence threshold
- Result statuses: `auto_matched`, `no_results`, `too_many_results`, `low_confidence`, `already_linked`
### Metadata Refresh
- Update approved links with latest data from providers
- Change tracking reports per series/book
- Non-destructive: only updates when provider has new data
### Field Locking
- Individual book fields can be locked to prevent external sync from overwriting manual edits
---
## External Integrations
### Komga Sync
- Import reading progress from a Komga server
- Matches local series/books by name
- Detailed sync report: matched, already read, newly marked, unmatched
### Prowlarr (Indexer Search)
- Search Prowlarr for missing volumes in a series
- Volume pattern matching against release titles
- Results: title, size, seeders/leechers, download URL, matched missing volumes
### qBittorrent
- Add torrents directly from Prowlarr search results
- Connection test endpoint
---
## Page Rendering & Caching
### Page Extraction
- Render any page from supported archive formats
- 1-indexed page numbers
### Image Processing
- Output formats: original, JPEG, PNG, WebP
- Quality parameter (1100)
- Max width parameter (12160 px)
- Configurable resampling filter: lanczos3, nearest, triangle/bilinear
- Concurrent render limit (default 8) with semaphore
### Caching
- **LRU in-memory cache**: 512 entries
- **Disk cache**: SHA256-keyed, two-level directory structure
- Cache key = hash(path + page + format + quality + width)
- Configurable cache directory and max size
- Manual cache clear via settings
---
## Background Jobs
### Job Types
| Type | Description |
|------|-------------|
| `rebuild` | Incremental scan |
| `full_rebuild` | Full filesystem rescan |
| `rescan` | Deep rescan for new formats |
| `thumbnail_rebuild` | Generate missing thumbnails |
| `thumbnail_regenerate` | Clear and regenerate all thumbnails |
| `cbr_to_cbz` | Convert RAR to ZIP |
| `metadata_batch` | Auto-match series to metadata |
| `metadata_refresh` | Update approved metadata links |
### Job Lifecycle
- Status flow: `pending``running``success` | `failed` | `cancelled`
- Intermediate statuses: `extracting_pages`, `generating_thumbnails`
- Real-time progress via **Server-Sent Events** (SSE)
- Per-file error tracking (non-fatal: job continues on errors)
- Cancellation support for pending/running jobs
### Progress Tracking
- Percentage (0100), current file, processed/total counts
- Timing: started_at, finished_at, phase2_started_at
- Stats JSON blob with job-specific metrics
---
## Authentication & Security
### Token System
- **Bootstrap token**: admin token via `API_BOOTSTRAP_TOKEN` env var
- **API tokens**: create, list, revoke with scopes
- Token format: `stl_{prefix}_{secret}` with Argon2 hashing
- Expiration dates, last usage tracking, revocation
### Access Control
- **Two scopes**: `admin` (full access) and `read` (read-only)
- Route-level middleware enforcement
- Rate limiting: configurable sliding window (default 120 req/s)
---
## Backoffice (Web UI)
### Dashboard
- Statistics cards: books, series, authors, libraries
- Donut charts: reading status breakdown, format distribution
- Bar charts: books per language
- Per-library reading progress bars
- Top series by book/page count
- Monthly addition timeline
- Metadata coverage stats
### Pages
- **Libraries**: list, create, delete, configure monitoring and metadata provider
- **Books**: global list with filtering/sorting, detail view with metadata and page rendering
- **Series**: global list, per-library view, detail with metadata management
- **Authors**: list with book/series counts, detail with author's books
- **Jobs**: history, live progress via SSE, error details
- **Tokens**: create, list, revoke API tokens
- **Settings**: image processing, cache, thumbnails, external services (Prowlarr, qBittorrent)
### Interactive Features
- Real-time search with suggestions
- Metadata search and matching modals
- Prowlarr search modal for missing volumes
- Folder browser/picker for library paths
- Book/series editing forms
- Quick reading status toggles
- CBR to CBZ conversion trigger
---
## API
### Documentation
- OpenAPI/Swagger UI available at `/swagger-ui`
- Health check (`/health`), readiness (`/ready`), Prometheus metrics (`/metrics`)
### Public Endpoints (no auth)
- `GET /health`, `GET /ready`, `GET /metrics`, `GET /swagger-ui`
### Read Endpoints (read scope)
- Libraries, books, series, authors listing and detail
- Book pages and thumbnails
- Reading progress get/update
- Full-text search, collection statistics
### Admin Endpoints (admin scope)
- Library CRUD and configuration
- Book metadata editing, CBR conversion
- Series metadata editing
- Indexing job management (trigger, cancel, stream)
- API token management
- Metadata operations (search, match, approve, reject, batch, refresh)
- External integrations (Prowlarr, qBittorrent, Komga)
- Application settings and cache management
---
## Database
### Key Design Decisions
- PostgreSQL with `pg_trgm` for full-text search (no external search engine)
- All deletions cascade from libraries
- Unique constraints: file paths, token prefixes, metadata links (library + series + provider)
- Directory mtime caching for incremental scan optimization
- Connection pool: 10 (API), 20 (indexer)
### Archive Resilience
- CBZ: fallback streaming reader if central directory corrupted
- CBR: RAR extraction via system `unar`, fallback to CBZ parsing
- PDF: `pdfinfo` for page count, `pdftoppm` for rendering
- EPUB: ZIP-based extraction
- FD exhaustion detection: aborts if too many consecutive IO errors