stripstream-librarian/docs/FEATURES.md
Froidefond Julien bd74c9e3e3
docs: add comprehensive features list to README and docs/FEATURES.md
Replace the minimal README features section with a concise categorized
summary and link to a detailed docs/FEATURES.md covering all features,
business rules, API endpoints, and integrations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 14:34:36 +01:00

Stripstream Librarian — Features & Business Rules

Libraries

Multi-Library Management

  • Create and manage multiple independent libraries, each with its own root path
  • Enable/disable libraries individually
  • Deleting a library cascades to all of its books, jobs, and metadata

Scanning & Indexing

  • Incremental scan: uses directory mtime tracking to skip unchanged directories
  • Full rebuild: force re-walk all directories, ignoring cached mtimes
  • Rescan: deep rescan to discover newly supported formats
  • Two-phase pipeline:
    • Phase 1 (Discovery): fast filename-based metadata extraction (no archive I/O)
    • Phase 2 (Analysis): extract page counts, first page image from archives
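
As a rough illustration of how mtime tracking can separate an incremental scan from a full rebuild, here is a minimal Python sketch. The function name `dirs_to_rescan` and the dictionary-based mtime cache are assumptions made for this example; they are not taken from the project.

```python
from typing import Dict, List


def dirs_to_rescan(
    cached: Dict[str, float],
    current: Dict[str, float],
    full_rebuild: bool = False,
) -> List[str]:
    """Return the directories that need to be re-walked.

    `cached` maps directory paths to the mtime seen at the previous scan,
    `current` maps them to the mtime observed now (hypothetical structures).
    """
    if full_rebuild:
        # A full rebuild ignores the cached mtimes entirely.
        return sorted(current)
    # Incremental scan: keep only new directories or ones whose mtime changed.
    return sorted(
        path for path, mtime in current.items() if cached.get(path) != mtime
    )
```

With a cache of `{"/lib/a": 1.0, "/lib/b": 2.0}` and a current state where `/lib/b` changed and `/lib/c` is new, only those two directories would be returned.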

Real-Time Monitoring

  • Automatic periodic scanning: configurable interval (default 5 seconds)
  • Filesystem watcher: real-time detection of file changes for instant indexing
  • Each can be toggled per library (monitor_enabled, watcher_enabled)

Books

Format Support

  • CBZ (ZIP-based comic archives)
  • CBR (RAR-based comic archives)
  • PDF
  • EPUB
  • Automatic format detection from file extension and magic bytes

Metadata Extraction

  • Title: derived from filename or external metadata
  • Series: derived from directory structure (first directory level under library root)
  • Volume: extracted from filename with pattern detection:
    • T## (Tome) — most common for French comics
    • Vol.##, Vol ##, Volume ##
    • ### (standalone number)
    • -## (dash-separated)
  • Author(s): supported both as a single scalar value and as an array
  • Page count: extracted from archive analysis
  • Language, kind (ebook, comic, bd)
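
The volume patterns listed above could be approximated with a few ordered regular expressions. This is a sketch only: the exact expressions, their precedence, and the reading of the standalone-number pattern as "a bare number, optionally preceded by #" are assumptions, and `extract_volume` is a hypothetical name.

```python
import re
from typing import Optional

# Ordered from most to least specific; the first pattern that matches wins.
_VOLUME_PATTERNS = [
    re.compile(r"\bT(\d+)\b", re.IGNORECASE),                  # T05 (Tome)
    re.compile(r"\bVol(?:ume)?\.?\s*(\d+)\b", re.IGNORECASE),  # Vol.12, Vol 12, Volume 12
    re.compile(r"-\s*(\d+)\b"),                                # "Name - 07" or "Name-07"
    re.compile(r"#?\b(\d+)\b"),                                # standalone number
]


def extract_volume(filename: str) -> Optional[int]:
    """Return the volume number found in a filename, or None."""
    # Assumes the filename carries an extension, which is dropped first.
    stem = filename.rsplit(".", 1)[0]
    for pattern in _VOLUME_PATTERNS:
        match = pattern.search(stem)
        if match:
            return int(match.group(1))
    return None
```

Because the standalone-number pattern is tried last, a title containing a year or other digits is only interpreted as a volume when no more specific marker is present.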

Thumbnails

  • Generated from the first page of each archive
  • Output format configurable: WebP (default), JPEG, PNG
  • Configurable dimensions (default 300×400)
  • Lazy generation: created on first access if missing
  • Bulk operations: rebuild missing or regenerate all

CBR to CBZ Conversion

  • Convert RAR archives to ZIP format
  • Tracked as background job with progress

Series

Automatic Aggregation

  • Series derived from directory structure during scanning
  • Books without series grouped as "unclassified"

Series Metadata

  • Description, publisher, start year, status (ongoing, ended, completed, on_hold, hiatus)
  • Total volume count (from external providers)
  • Authors (aggregated from books or metadata)

Filtering & Discovery

  • Filter by: series name (partial match), reading status, series status, metadata provider linkage
  • Sort by: name, reading status, book count
  • Missing books detection: identifies gaps in volume numbering within a series
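
Gap detection in volume numbering can be sketched as a set difference over the range of owned volumes. The function below is illustrative; it assumes integer volume numbers and only reports gaps between the lowest and highest owned volume.

```python
from typing import Iterable, List


def missing_volumes(owned: Iterable[int]) -> List[int]:
    """Return volume numbers absent between the lowest and highest owned volume."""
    present = set(owned)
    if not present:
        return []
    return [v for v in range(min(present), max(present) + 1) if v not in present]
```

For a series holding volumes 1, 2, 4, and 7, this yields 3, 5, and 6.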

Reading Progress

Per-Book Tracking

  • Three states: unread (default), reading, read
  • Current page tracking when status is reading
  • last_read_at timestamp auto-updated

Series-Level Status

  • Calculated from book statuses:
    • All read → series read
    • None read → series unread
    • Mixed → series reading
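
Read literally, the three rules above translate into a small function. This sketch assumes book statuses are the strings `unread`, `reading`, and `read`, and that an empty series counts as unread; the second point is an assumption.

```python
from typing import Iterable


def series_status(book_statuses: Iterable[str]) -> str:
    """Derive a series status from the statuses of its books."""
    statuses = list(book_statuses)
    read_count = sum(1 for status in statuses if status == "read")
    if statuses and read_count == len(statuses):
        return "read"      # all books read
    if read_count == 0:
        return "unread"    # no book read (includes the empty case)
    return "reading"       # some, but not all, books read
```

Note that under this literal reading a series whose books are in progress but never finished still counts as unread; an actual implementation may treat that case differently.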

Bulk Operations

  • Mark entire series as read (updates all books)

Search & Discovery

  • PostgreSQL-based (ILIKE + pg_trgm)
  • Searches across: book titles, series names, authors (scalar and array fields), series metadata authors
  • Case-insensitive partial matching
  • Library-scoped filtering
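
A case-insensitive partial match with trigram ranking might be prepared as follows. The table and column names (`books`, `title`, `series`, `library_id`) and the psycopg-style named parameters are assumptions for this sketch; `similarity()` is the function provided by the pg_trgm extension.

```python
from typing import Any, Dict

# Hypothetical query: schema names are illustrative only.
SEARCH_BOOKS_SQL = """
SELECT id, title
FROM books
WHERE library_id = %(library_id)s
  AND (title ILIKE %(pattern)s OR series ILIKE %(pattern)s)
ORDER BY similarity(title, %(term)s) DESC
LIMIT %(limit)s
"""


def ilike_pattern(term: str) -> str:
    """Wrap a search term in wildcards, escaping LIKE metacharacters."""
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return f"%{escaped}%"


def build_search_params(term: str, library_id: int, limit: int = 20) -> Dict[str, Any]:
    """Build the parameter mapping for SEARCH_BOOKS_SQL."""
    return {
        "library_id": library_id,
        "pattern": ilike_pattern(term),
        "term": term,
        "limit": limit,
    }
```

Escaping `%` and `_` keeps user input from being interpreted as wildcards, so a search for `100%` matches that literal text.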

Results

  • Book hits: title, authors, series, volume, language, kind
  • Series hits: name, book count, read count, first book (for linking)
  • Processing time included in response

Authors

  • Unique author aggregation from books and series metadata
  • Per-author book and series count
  • Searchable by name (partial match)
  • Sortable by name or book count

External Metadata

Supported Providers

Provider        Focus
Google Books    General books (default fallback)
ComicVine       Comics
BedéThèque      Franco-Belgian comics
AniList         Manga/anime
Open Library    General books

Provider Configuration

  • Global default provider with library-level override
  • Fallback provider if primary is unavailable

Matching Workflow

  1. Search: query a provider, get candidates with confidence scores
  2. Match: link a series to an external result (status pending)
  3. Approve: validate and sync metadata to series and books
  4. Reject: discard a match

Batch Processing

  • Auto-match all series in a library via metadata_batch job
  • Configurable confidence threshold
  • Result statuses: auto_matched, no_results, too_many_results, low_confidence, already_linked
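
One way the result statuses could be assigned from candidate confidence scores is sketched below. The decision rules are assumptions: in particular, this example treats "more than one candidate at or above the threshold" as `too_many_results`, which the source does not specify.

```python
from typing import Sequence


def classify_batch_result(
    confidences: Sequence[float],
    threshold: float,
    already_linked: bool = False,
) -> str:
    """Map a provider search outcome to a batch result status (hypothetical rules)."""
    if already_linked:
        return "already_linked"
    if not confidences:
        return "no_results"
    confident = [score for score in confidences if score >= threshold]
    if not confident:
        return "low_confidence"
    if len(confident) > 1:
        # Assumption: several strong candidates are too ambiguous to auto-match.
        return "too_many_results"
    return "auto_matched"
```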

Metadata Refresh

  • Update approved links with latest data from providers
  • Change tracking reports per series/book
  • Non-destructive: only updates when provider has new data

Field Locking

  • Individual book fields can be locked to prevent external sync from overwriting manual edits

External Integrations

Komga Sync

  • Import reading progress from a Komga server
  • Matches local series/books by name
  • Detailed sync report: matched, already read, newly marked, unmatched

Prowlarr

  • Search Prowlarr for missing volumes in a series
  • Volume pattern matching against release titles
  • Results: title, size, seeders/leechers, download URL, matched missing volumes

qBittorrent

  • Add torrents directly from Prowlarr search results
  • Connection test endpoint

Page Rendering & Caching

Page Extraction

  • Render any page from supported archive formats
  • 1-indexed page numbers

Image Processing

  • Output formats: original, JPEG, PNG, WebP
  • Quality parameter (1–100)
  • Max width parameter (1–2160 px)
  • Configurable resampling filter: lanczos3, nearest, triangle/bilinear
  • Concurrent render limit (default 8) with semaphore
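
The concurrent render limit can be pictured with a semaphore that bounds how many renders run at once. This is a Python `asyncio` sketch with a stand-in for the actual rendering work; `render_all` and the returned peak counter are illustrative names, not project code.

```python
import asyncio
from typing import Iterable, List, Tuple

MAX_CONCURRENT_RENDERS = 8  # default limit stated above


async def render_all(
    page_numbers: Iterable[int], limit: int = MAX_CONCURRENT_RENDERS
) -> Tuple[List[str], int]:
    """Render pages concurrently, never exceeding `limit` renders at once."""
    semaphore = asyncio.Semaphore(limit)
    active = 0
    peak = 0

    async def render(page: int) -> str:
        nonlocal active, peak
        async with semaphore:  # waits here while `limit` renders are in flight
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0)  # stand-in for decoding and resizing
            active -= 1
            return f"page-{page}"

    results = await asyncio.gather(*(render(p) for p in page_numbers))
    return list(results), peak
```

Calling `asyncio.run(render_all(range(1, 21)))` renders twenty pages while the observed peak concurrency stays at or below eight.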

Caching

  • LRU in-memory cache: 512 entries
  • Disk cache: SHA256-keyed, two-level directory structure
  • Cache key = hash(path + page + format + quality + width)
  • Configurable cache directory and max size
  • Manual cache clear via settings
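
The cache key and two-level directory layout might look like the following sketch. The field separator and the choice of the first two pairs of hex characters as directory levels are assumptions; only the inputs to the hash and the use of SHA256 come from the list above.

```python
import hashlib
from pathlib import Path


def cache_key(path: str, page: int, fmt: str, quality: int, width: int) -> str:
    """Hash the render parameters into a hex SHA256 digest."""
    raw = f"{path}|{page}|{fmt}|{quality}|{width}"  # separator is an assumption
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def cache_path(cache_dir: Path, key: str) -> Path:
    """Place the entry under two directory levels derived from the key."""
    return cache_dir / key[:2] / key[2:4] / key
```

Spreading entries over two directory levels keeps any single directory from accumulating a very large number of files.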

Background Jobs

Job Types

Type                    Description
rebuild                 Incremental scan
full_rebuild            Full filesystem rescan
rescan                  Deep rescan for new formats
thumbnail_rebuild       Generate missing thumbnails
thumbnail_regenerate    Clear and regenerate all thumbnails
cbr_to_cbz              Convert RAR to ZIP
metadata_batch          Auto-match series to metadata
metadata_refresh        Update approved metadata links

Job Lifecycle

  • Status flow: pending → running → success | failed | cancelled
  • Intermediate statuses: extracting_pages, generating_thumbnails
  • Real-time progress via Server-Sent Events (SSE)
  • Per-file error tracking (non-fatal: job continues on errors)
  • Cancellation support for pending/running jobs
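
The status flow can be expressed as a transition table. In this sketch the ordering of the intermediate statuses, and the ability to cancel from them, are assumptions; the source only states the main flow and that pending and running jobs can be cancelled.

```python
from typing import Dict, Set

TERMINAL_STATUSES: Set[str] = {"success", "failed", "cancelled"}

# Hypothetical transition table built from the status flow above.
ALLOWED_TRANSITIONS: Dict[str, Set[str]] = {
    "pending": {"running", "cancelled"},
    "running": {"extracting_pages", "generating_thumbnails"} | TERMINAL_STATUSES,
    "extracting_pages": {"generating_thumbnails"} | TERMINAL_STATUSES,
    "generating_thumbnails": set(TERMINAL_STATUSES),
}


def can_transition(current: str, target: str) -> bool:
    """Return True if a job may move from `current` to `target`."""
    # Terminal statuses have no entry, so nothing can follow them.
    return target in ALLOWED_TRANSITIONS.get(current, set())
```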

Progress Tracking

  • Percentage (0–100), current file, processed/total counts
  • Timing: started_at, finished_at, phase2_started_at
  • Stats JSON blob with job-specific metrics

Authentication & Security

Token System

  • Bootstrap token: admin token via API_BOOTSTRAP_TOKEN env var
  • API tokens: create, list, revoke with scopes
  • Token format: stl_{prefix}_{secret} with Argon2 hashing
  • Expiration dates, last usage tracking, revocation
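
Generating and parsing a token in the `stl_{prefix}_{secret}` format could be sketched as below. The prefix and secret lengths and the hex encoding are assumptions. Hashing the secret with Argon2 needs a third-party library and is deliberately not shown here.

```python
import secrets
from typing import Tuple

TOKEN_SCHEME = "stl"


def generate_token() -> str:
    """Create a token of the form stl_{prefix}_{secret}."""
    prefix = secrets.token_hex(4)   # short lookup key (length is an assumption)
    secret = secrets.token_hex(32)  # only a hash of this part would be stored
    return f"{TOKEN_SCHEME}_{prefix}_{secret}"


def parse_token(token: str) -> Tuple[str, str]:
    """Split a token into (prefix, secret), raising ValueError if malformed."""
    parts = token.split("_", 2)
    if len(parts) != 3 or parts[0] != TOKEN_SCHEME or not parts[1] or not parts[2]:
        raise ValueError("malformed API token")
    return parts[1], parts[2]
```

A server following this scheme would typically look the token up by its prefix and then verify the secret against the stored hash.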

Access Control

  • Two scopes: admin (full access) and read (read-only)
  • Route-level middleware enforcement
  • Rate limiting: configurable sliding window (default 120 req/s)
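
A sliding-window limiter keeps the timestamps of recent requests and rejects a request once the window is full. The class below is a minimal single-client sketch; its name, the explicit `now` argument, and the one-second window are assumptions, and per-client keying is left out.

```python
from collections import deque
from typing import Deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` within any `window_seconds` span."""

    def __init__(self, max_requests: int = 120, window_seconds: float = 1.0) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits: Deque[float] = deque()

    def allow(self, now: float) -> bool:
        """Record a request at time `now` and report whether it is allowed."""
        cutoff = now - self.window_seconds
        # Drop timestamps that have slid out of the window.
        while self._hits and self._hits[0] <= cutoff:
            self._hits.popleft()
        if len(self._hits) >= self.max_requests:
            return False
        self._hits.append(now)
        return True
```

Passing the clock value in explicitly keeps the limiter deterministic to test; a caller would normally supply `time.monotonic()`.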

Backoffice (Web UI)

Dashboard

  • Statistics cards: books, series, authors, libraries
  • Donut charts: reading status breakdown, format distribution
  • Bar charts: books per language
  • Per-library reading progress bars
  • Top series by book/page count
  • Monthly addition timeline
  • Metadata coverage stats

Pages

  • Libraries: list, create, delete, configure monitoring and metadata provider
  • Books: global list with filtering/sorting, detail view with metadata and page rendering
  • Series: global list, per-library view, detail with metadata management
  • Authors: list with book/series counts, detail with author's books
  • Jobs: history, live progress via SSE, error details
  • Tokens: create, list, revoke API tokens
  • Settings: image processing, cache, thumbnails, external services (Prowlarr, qBittorrent)

Interactive Features

  • Real-time search with suggestions
  • Metadata search and matching modals
  • Prowlarr search modal for missing volumes
  • Folder browser/picker for library paths
  • Book/series editing forms
  • Quick reading status toggles
  • CBR to CBZ conversion trigger

API

Documentation

  • OpenAPI/Swagger UI available at /swagger-ui
  • Health check (/health), readiness (/ready), Prometheus metrics (/metrics)

Public Endpoints (no auth)

  • GET /health, GET /ready, GET /metrics, GET /swagger-ui

Read Endpoints (read scope)

  • Libraries, books, series, authors listing and detail
  • Book pages and thumbnails
  • Reading progress get/update
  • Full-text search, collection statistics

Admin Endpoints (admin scope)

  • Library CRUD and configuration
  • Book metadata editing, CBR conversion
  • Series metadata editing
  • Indexing job management (trigger, cancel, stream)
  • API token management
  • Metadata operations (search, match, approve, reject, batch, refresh)
  • External integrations (Prowlarr, qBittorrent, Komga)
  • Application settings and cache management

Database

Key Design Decisions

  • PostgreSQL with pg_trgm for trigram-based text search (no external search engine)
  • All deletions cascade from libraries
  • Unique constraints: file paths, token prefixes, metadata links (library + series + provider)
  • Directory mtime caching for incremental scan optimization
  • Connection pool: 10 (API), 20 (indexer)

Archive Resilience

  • CBZ: fallback streaming reader if central directory corrupted
  • CBR: RAR extraction via system unar, fallback to CBZ parsing
  • PDF: pdfinfo for page count, pdftoppm for rendering
  • EPUB: ZIP-based extraction
  • FD exhaustion detection: aborts if too many consecutive IO errors
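
Detecting a run of consecutive I/O errors can be done with a counter that resets on every success. The threshold of 10 and the class name are hypothetical; the source does not state the actual limit.

```python
class ConsecutiveErrorGuard:
    """Signal an abort after too many I/O errors in a row."""

    def __init__(self, max_consecutive: int = 10) -> None:  # threshold is an assumption
        self.max_consecutive = max_consecutive
        self.consecutive_errors = 0

    def record(self, ok: bool) -> bool:
        """Record one file outcome; return True if the job should abort."""
        if ok:
            # Any success breaks the streak, so isolated errors stay non-fatal.
            self.consecutive_errors = 0
            return False
        self.consecutive_errors += 1
        return self.consecutive_errors >= self.max_consecutive
```

This keeps per-file errors non-fatal while still stopping a job when every file starts failing, as would happen once file descriptors run out.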