# Stripstream Librarian — Features & Business Rules

## Libraries

### Multi-Library Management
- Create and manage multiple independent libraries, each with its own root path
- Enable/disable libraries individually
- Deleting a library cascades to all its books, jobs, and metadata

### Scanning & Indexing
- **Incremental scan**: uses directory mtime tracking to skip unchanged directories
- **Full rebuild**: forces a re-walk of all directories, ignoring cached mtimes
- **Rescan**: deep rescan to discover newly supported formats
- **Two-phase pipeline**:
  - Phase 1 (Discovery): fast filename-based metadata extraction (no archive I/O)
  - Phase 2 (Analysis): extracts page counts and the first-page image from archives

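
The incremental scan can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the function name `incremental_scan` and the in-memory `cached_mtimes` dict are hypothetical stand-ins for whatever the indexer actually persists. A directory's mtime only changes when entries are added, removed, or renamed, which is one reason a full rebuild remains useful.

```python
import os
from typing import Dict, List


def incremental_scan(root: str, cached_mtimes: Dict[str, int]) -> List[str]:
    """Return the directories whose mtime differs from the cached value.

    `cached_mtimes` (hypothetical) maps directory path -> mtime in nanoseconds
    and is updated in place. An empty dict behaves like a full rebuild.
    """
    changed = []
    for dirpath, _dirnames, _filenames in os.walk(root):
        mtime = os.stat(dirpath).st_mtime_ns
        if cached_mtimes.get(dirpath) != mtime:
            changed.append(dirpath)  # this directory needs re-indexing
            cached_mtimes[dirpath] = mtime
    return changed
```

Calling the function twice in a row with the same cache returns an empty list the second time, since no directory mtime has changed.
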
### Real-Time Monitoring
- **Automatic periodic scanning**: configurable interval (default 5 seconds)
- **Filesystem watcher**: real-time detection of file changes for instant indexing
- Each can be toggled per library (`monitor_enabled`, `watcher_enabled`)

---

## Books

### Format Support
- **CBZ** (ZIP-based comic archives)
- **CBR** (RAR-based comic archives)
- **PDF**
- **EPUB**
- Automatic format detection from file extension and magic bytes

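
As an illustration of combining extension and magic bytes, here is a small sketch. The signatures are the standard ones for ZIP, RAR, and PDF files, but the precedence rule (magic bytes win over the extension, and the extension only disambiguates CBZ from EPUB) is an assumption, as is the function name `detect_format`.

```python
from pathlib import Path
from typing import Optional

# Standard file signatures; the labels are this sketch's own.
MAGIC = [
    (b"PK\x03\x04", "zip"),    # ZIP local file header: CBZ and EPUB containers
    (b"Rar!\x1a\x07", "cbr"),  # common prefix of RAR 4.x and 5.x signatures
    (b"%PDF-", "pdf"),
]
EXTENSIONS = {".cbz": "cbz", ".cbr": "cbr", ".pdf": "pdf", ".epub": "epub"}


def detect_format(name: str, header: bytes) -> Optional[str]:
    """Guess the format from the first bytes, falling back to the extension."""
    by_ext = EXTENSIONS.get(Path(name).suffix.lower())
    for magic, kind in MAGIC:
        if header.startswith(magic):
            if kind == "zip":
                # Assumption: the extension tells CBZ and EPUB apart;
                # any other ZIP container is treated as CBZ.
                return by_ext if by_ext in ("cbz", "epub") else "cbz"
            return kind
    return by_ext
```

With this rule, a RAR archive mislabelled as `.cbz` is still reported as `cbr`, and a ZIP archive mislabelled as `.cbr` is reported as `cbz`.
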
### Metadata Extraction
- **Title**: derived from filename or external metadata
- **Series**: derived from directory structure (first directory level under library root)
- **Volume**: extracted from filename with pattern detection:
  - `T##` (Tome), most common for French comics
  - `Vol.##`, `Vol ##`, `Volume ##`
  - `###` (standalone number)
  - `-##` (dash-separated)
- **Author(s)**: supported both as a single scalar value and as an array
- **Page count**: extracted from archive analysis
- **Language**, **kind** (ebook, comic, bd)

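
The volume patterns listed above could be expressed as regular expressions along these lines. This is a sketch only: the exact expressions and their precedence (most specific pattern first) are assumptions, and `extract_volume` is a hypothetical name.

```python
import re
from typing import Optional

# Tried in order; the ordering is an assumption of this sketch.
VOLUME_PATTERNS = [
    re.compile(r"\bT(\d+)\b", re.IGNORECASE),                  # T05 (Tome)
    re.compile(r"\bVol(?:ume)?\.?\s*(\d+)\b", re.IGNORECASE),  # Vol.12, Vol 12, Volume 12
    re.compile(r"-\s*(\d+)\b"),                                # "-07" or "- 07"
    re.compile(r"\b(\d+)\b"),                                  # standalone number
]


def extract_volume(filename: str) -> Optional[int]:
    """Return the volume number found in a filename, or None."""
    stem = filename.rsplit(".", 1)[0]  # drop the extension
    for pattern in VOLUME_PATTERNS:
        match = pattern.search(stem)
        if match:
            return int(match.group(1))
    return None
```

For example, `extract_volume("Asterix T05.cbz")` yields `5` and `extract_volume("Saga.cbz")` yields `None`.
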
### Thumbnails
- Generated from the first page of each archive
- Output format configurable: WebP (default), JPEG, PNG
- Configurable dimensions (default 300×400)
- Lazy generation: created on first access if missing
- Bulk operations: rebuild missing or regenerate all

### CBR to CBZ Conversion
- Convert RAR archives to ZIP format
- Tracked as a background job with progress reporting

---

## Series

### Automatic Aggregation
- Series derived from directory structure during scanning
- Books without series grouped as "unclassified"

### Series Metadata
- Description, publisher, start year, status (`ongoing`, `ended`, `completed`, `on_hold`, `hiatus`)
- Total volume count (from external providers)
- Authors (aggregated from books or metadata)

### Filtering & Discovery
- Filter by: series name (partial match), reading status, series status, metadata provider linkage
- Sort by: name, reading status, book count
- **Missing books detection**: identifies gaps in volume numbering within a series

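
Missing books detection amounts to finding gaps in a set of volume numbers. The sketch below assumes numbering starts at 1 and that the upper bound is the provider's total volume count when known, otherwise the highest owned volume; both rules and the name `missing_volumes` are assumptions, not documented behaviour.

```python
from typing import Iterable, List, Optional


def missing_volumes(owned: Iterable[int], total: Optional[int] = None) -> List[int]:
    """Return the volume numbers absent from `owned`, in ascending order."""
    have = set(owned)
    if not have and total is None:
        return []  # nothing known about this series
    upper = max(have | {total}) if total is not None else max(have)
    return [n for n in range(1, upper + 1) if n not in have]
```
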
---

## Reading Progress

### Per-Book Tracking
- Three states: `unread` (default), `reading`, `read`
- Current page tracking when status is `reading`
- `last_read_at` timestamp auto-updated

### Series-Level Status
- Calculated from book statuses:
  - All read → series `read`
  - None read → series `unread`
  - Mixed → series `reading`

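
The three rules translate directly into a small function. Two points are not stated above and are assumptions here: an empty series is treated as `unread`, and "none read" is applied literally, so a series whose books are only `unread` or `reading` is also `unread`.

```python
from typing import Iterable


def series_status(book_statuses: Iterable[str]) -> str:
    """Derive a series status from its books' statuses."""
    statuses = list(book_statuses)
    read_count = sum(1 for s in statuses if s == "read")
    if statuses and read_count == len(statuses):
        return "read"     # all read
    if read_count == 0:
        return "unread"   # none read (also covers an empty series)
    return "reading"      # mixed
```
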
### Bulk Operations
- Mark entire series as read (updates all books)

---

## Search & Discovery

### Full-Text Search
- PostgreSQL-based (`ILIKE` + `pg_trgm`)
- Searches across: book titles, series names, authors (scalar and array fields), series metadata authors
- Case-insensitive partial matching
- Library-scoped filtering

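
A partial-match query of this kind usually wraps the search term in `%` wildcards and escapes any wildcard characters the user typed. The sketch below shows that step; the table and column names in the SQL string are hypothetical and the real schema may differ.

```python
def ilike_pattern(term: str) -> str:
    """Build a `%term%` pattern, escaping LIKE wildcards in user input.

    PostgreSQL uses backslash as the default LIKE/ILIKE escape character.
    """
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return f"%{escaped}%"


# Hypothetical schema: $1 is the pattern, $2 the library id.
SEARCH_BOOKS_SQL = """
SELECT id, title
FROM books
WHERE library_id = $2
  AND (title ILIKE $1 OR series ILIKE $1)
"""
```

A `pg_trgm` GIN index on the searched columns lets PostgreSQL serve such `ILIKE '%...%'` queries without a sequential scan.
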
### Results
- Book hits: title, authors, series, volume, language, kind
- Series hits: name, book count, read count, first book (for linking)
- Processing time included in response

---

## Authors

- Unique author aggregation from books and series metadata
- Per-author book and series count
- Searchable by name (partial match)
- Sortable by name or book count

---

## External Metadata

### Supported Providers
| Provider | Focus |
|----------|-------|
| Google Books | General books (default fallback) |
| ComicVine | Comics |
| BedéThèque | Franco-Belgian comics |
| AniList | Manga/anime |
| Open Library | General books |

### Provider Configuration
- Global default provider with library-level override
- Fallback provider if primary is unavailable

### Matching Workflow
1. **Search**: query a provider, get candidates with confidence scores
2. **Match**: link a series to an external result (status `pending`)
3. **Approve**: validate and sync metadata to series and books
4. **Reject**: discard a match

### Batch Processing
- Auto-match all series in a library via `metadata_batch` job
- Configurable confidence threshold
- Result statuses: `auto_matched`, `no_results`, `too_many_results`, `low_confidence`, `already_linked`

### Metadata Refresh
- Update approved links with latest data from providers
- Change tracking reports per series/book
- Non-destructive: only updates when provider has new data

### Field Locking
- Individual book fields can be locked to prevent external sync from overwriting manual edits

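
Field locking can be pictured as a merge that skips protected keys. In this sketch the function name, the dict-based book representation, and the rule that `None` values from the provider are ignored are all assumptions made for illustration.

```python
from typing import Any, Dict, Iterable


def apply_external_metadata(
    book: Dict[str, Any],
    incoming: Dict[str, Any],
    locked_fields: Iterable[str],
) -> Dict[str, Any]:
    """Return a copy of `book` updated from `incoming`, leaving locked fields alone."""
    locked = set(locked_fields)
    updated = dict(book)
    for field, value in incoming.items():
        if field in locked:
            continue  # manual edit wins over external sync
        if value is None:
            continue  # assumption: missing provider data never erases a value
        updated[field] = value
    return updated
```
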
---

## External Integrations

### Komga Sync
- Import reading progress from a Komga server
- Matches local series/books by name
- Detailed sync report: matched, already read, newly marked, unmatched

### Prowlarr (Indexer Search)
- Search Prowlarr for missing volumes in a series
- Volume pattern matching against release titles
- Results: title, size, seeders/leechers, download URL, matched missing volumes

### qBittorrent
- Add torrents directly from Prowlarr search results
- Connection test endpoint

---

## Page Rendering & Caching

### Page Extraction
- Render any page from supported archive formats
- 1-indexed page numbers

### Image Processing
- Output formats: original, JPEG, PNG, WebP
- Quality parameter (1–100)
- Max width parameter (1–2160 px)
- Configurable resampling filter: lanczos3, nearest, triangle/bilinear
- Concurrent render limit (default 8) with semaphore

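
The concurrent render limit follows the usual semaphore pattern: each render acquires a permit before doing work and releases it afterwards. The sketch below uses `asyncio` purely to illustrate the pattern; the function names are hypothetical and the sleep stands in for actual decoding and resizing.

```python
import asyncio
from typing import Dict, Iterable, List, Tuple

MAX_CONCURRENT_RENDERS = 8  # default from the list above


async def render_page(sem: asyncio.Semaphore, page: int, stats: Dict[str, int]) -> int:
    async with sem:  # at most `limit` renders run inside this block at once
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # placeholder for decode + resize work
        stats["active"] -= 1
        return page


async def render_many(
    pages: Iterable[int], limit: int = MAX_CONCURRENT_RENDERS
) -> Tuple[List[int], int]:
    """Render all pages and report the peak number of simultaneous renders."""
    sem = asyncio.Semaphore(limit)
    stats = {"active": 0, "peak": 0}
    results = await asyncio.gather(*(render_page(sem, p, stats) for p in pages))
    return list(results), stats["peak"]
```

Running `asyncio.run(render_many(range(1, 21)))` renders twenty pages while never exceeding the configured limit.
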
### Caching
- **LRU in-memory cache**: 512 entries
- **Disk cache**: SHA256-keyed, two-level directory structure
- Cache key = hash(path + page + format + quality + width)
- Configurable cache directory and max size
- Manual cache clear via settings

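
A possible shape for the cache key and the two-level disk layout is shown below. The field separator and the choice of the first two hex byte pairs as directory names are assumptions; the source only states that the key is a SHA256 hash of these five inputs and that the directory structure has two levels.

```python
import hashlib
from pathlib import Path


def cache_key(path: str, page: int, fmt: str, quality: int, width: int) -> str:
    """Hash every parameter that affects the rendered bytes."""
    raw = f"{path}|{page}|{fmt}|{quality}|{width}"  # separator is an assumption
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def cache_path(cache_dir: str, key: str) -> Path:
    """Spread entries over two directory levels, e.g. <dir>/ab/cd/abcd..."""
    return Path(cache_dir) / key[:2] / key[2:4] / key
```
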
---

## Background Jobs

### Job Types
| Type | Description |
|------|-------------|
| `rebuild` | Incremental scan |
| `full_rebuild` | Full filesystem rescan |
| `rescan` | Deep rescan for new formats |
| `thumbnail_rebuild` | Generate missing thumbnails |
| `thumbnail_regenerate` | Clear and regenerate all thumbnails |
| `cbr_to_cbz` | Convert RAR to ZIP |
| `metadata_batch` | Auto-match series to metadata |
| `metadata_refresh` | Update approved metadata links |

### Job Lifecycle
- Status flow: `pending` → `running` → `success` | `failed` | `cancelled`
- Intermediate statuses: `extracting_pages`, `generating_thumbnails`
- Real-time progress via **Server-Sent Events** (SSE)
- Per-file error tracking (non-fatal: job continues on errors)
- Cancellation support for pending/running jobs

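
One way to read the status flow is as a transition table. The statuses come from the list above, but the ordering of the intermediate statuses and the idea that they can also be cancelled are assumptions of this sketch.

```python
from typing import Dict, Set

TERMINAL: Set[str] = {"success", "failed", "cancelled"}

# Allowed next statuses per current status (partly assumed, see lead-in).
TRANSITIONS: Dict[str, Set[str]] = {
    "pending": {"running", "cancelled"},
    "running": {"extracting_pages", "generating_thumbnails"} | TERMINAL,
    "extracting_pages": {"generating_thumbnails"} | TERMINAL,
    "generating_thumbnails": set(TERMINAL),
}


def can_transition(current: str, new: str) -> bool:
    """Return True if a job may move from `current` to `new`."""
    return new in TRANSITIONS.get(current, set())
```

Terminal statuses have no entry in the table, so a finished job cannot be restarted or cancelled.
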
### Progress Tracking
- Percentage (0–100), current file, processed/total counts
- Timing: `started_at`, `finished_at`, `phase2_started_at`
- Stats JSON blob with job-specific metrics

---

## Authentication & Security

### Token System
- **Bootstrap token**: admin token via `API_BOOTSTRAP_TOKEN` env var
- **API tokens**: create, list, revoke with scopes
- Token format: `stl_{prefix}_{secret}` with Argon2 hashing
- Expiration dates, last usage tracking, revocation

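
The token format can be generated and parsed as sketched below. The part lengths and the hex alphabet are assumptions chosen so that neither part can contain an underscore. Argon2 hashing of the secret is left out because it needs a third-party library.

```python
import secrets
from typing import Optional, Tuple


def generate_token() -> str:
    """Create a token in the `stl_{prefix}_{secret}` format (lengths assumed)."""
    prefix = secrets.token_hex(4)   # 8 hex characters
    secret = secrets.token_hex(24)  # 48 hex characters
    return f"stl_{prefix}_{secret}"


def parse_token(token: str) -> Optional[Tuple[str, str]]:
    """Split a token into (prefix, secret); return None if it is malformed."""
    parts = token.split("_", 2)
    if len(parts) != 3 or parts[0] != "stl" or not parts[1] or not parts[2]:
        return None
    return parts[1], parts[2]
```

In a design like this, only a hash of the secret would be stored, with the prefix serving as a lookup key.
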
### Access Control
- **Two scopes**: `admin` (full access) and `read` (read-only)
- Route-level middleware enforcement
- Rate limiting: configurable sliding window (default 120 req/s)

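
A sliding-window limiter keeps the timestamps of recent requests and rejects a request once the window is full. The class below is a generic single-client sketch of that technique, not the project's middleware; per-client keying is omitted and the class name is hypothetical.

```python
import time
from collections import deque
from typing import Deque, Optional


class SlidingWindowLimiter:
    """Allow at most `limit` requests in any `window`-second span."""

    def __init__(self, limit: int = 120, window: float = 1.0) -> None:
        self.limit = limit
        self.window = window
        self.hits: Deque[float] = deque()

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Drop timestamps that have slid out of the window.
        while self.hits and now - self.hits[0] >= self.window:
            self.hits.popleft()
        if len(self.hits) >= self.limit:
            return False
        self.hits.append(now)
        return True
```
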
---

## Backoffice (Web UI)

### Dashboard
- Statistics cards: books, series, authors, libraries
- Donut charts: reading status breakdown, format distribution
- Bar charts: books per language
- Per-library reading progress bars
- Top series by book/page count
- Monthly addition timeline
- Metadata coverage stats

### Pages
- **Libraries**: list, create, delete, configure monitoring and metadata provider
- **Books**: global list with filtering/sorting, detail view with metadata and page rendering
- **Series**: global list, per-library view, detail with metadata management
- **Authors**: list with book/series counts, detail with author's books
- **Jobs**: history, live progress via SSE, error details
- **Tokens**: create, list, revoke API tokens
- **Settings**: image processing, cache, thumbnails, external services (Prowlarr, qBittorrent)

### Interactive Features
- Real-time search with suggestions
- Metadata search and matching modals
- Prowlarr search modal for missing volumes
- Folder browser/picker for library paths
- Book/series editing forms
- Quick reading status toggles
- CBR to CBZ conversion trigger

---

## API

### Documentation
- OpenAPI/Swagger UI available at `/swagger-ui`
- Health check (`/health`), readiness (`/ready`), Prometheus metrics (`/metrics`)

### Public Endpoints (no auth)
- `GET /health`, `GET /ready`, `GET /metrics`, `GET /swagger-ui`

### Read Endpoints (read scope)
- Libraries, books, series, authors listing and detail
- Book pages and thumbnails
- Reading progress get/update
- Full-text search, collection statistics

### Admin Endpoints (admin scope)
- Library CRUD and configuration
- Book metadata editing, CBR conversion
- Series metadata editing
- Indexing job management (trigger, cancel, stream)
- API token management
- Metadata operations (search, match, approve, reject, batch, refresh)
- External integrations (Prowlarr, qBittorrent, Komga)
- Application settings and cache management

---

## Database

### Key Design Decisions
- PostgreSQL with `pg_trgm` for full-text search (no external search engine)
- All deletions cascade from libraries
- Unique constraints: file paths, token prefixes, metadata links (library + series + provider)
- Directory mtime caching for incremental scan optimization
- Connection pool: 10 (API), 20 (indexer)

### Archive Resilience
- CBZ: fallback streaming reader if the central directory is corrupted
- CBR: RAR extraction via system `unar`, fallback to CBZ parsing
- PDF: `pdfinfo` for page count, `pdftoppm` for rendering
- EPUB: ZIP-based extraction
- File-descriptor exhaustion detection: aborts after too many consecutive I/O errors
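
The "too many consecutive I/O errors" rule can be modelled as a counter that resets on every success. The threshold of 10 and the class name are assumptions; the source does not state the actual limit.

```python
class ConsecutiveErrorGuard:
    """Signal an abort after `threshold` I/O errors in a row."""

    def __init__(self, threshold: int = 10) -> None:  # threshold is assumed
        self.threshold = threshold
        self.streak = 0

    def record(self, ok: bool) -> bool:
        """Record one file result; return True when the job should abort."""
        self.streak = 0 if ok else self.streak + 1
        return self.streak >= self.threshold
```

Isolated failures stay non-fatal under this rule, while an unbroken run of errors, as seen when file descriptors run out, stops the job.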