
feat(repository): Implement Repository + Crash-Safe Storage (Epic 1.2)

- Story 1.2.1: Data models (3SP)
  * WorkItem: work_id, file_path, status, metadata, progress
  * ChapterItem: work_id, chapter_index, content, translation, status
  * FailureRecord: error_type, error_message, retry_count, resolved
  * WorkStatus enum: PENDING, FINGERPRINTING, CLEANING, etc.
  * ChapterStatus enum: PENDING, PROCESSING, COMPLETED, FAILED

- Story 1.2.2: JSONLStore with atomic write (4SP)
  * JSONL format: one JSON object per line
  * Atomic write: temp file + rename + fsync
  * Append mode for efficient additions
  * Work-specific directories for isolation
  * Skip corrupted lines on read
  * Storage structure: base_dir/{work_items.jsonl, work_id/{chapters,failures}.jsonl}
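The storage layout in the last bullet resolves to concrete paths like these (an illustrative sketch; the work_id value is made up, real ones are MD5 hashes):

```python
from pathlib import Path

base_dir = Path("data")
work_id = "cf5c97a9"  # illustrative work_id

work_items = base_dir / "work_items.jsonl"        # one line per saved WorkItem
chapters = base_dir / work_id / "chapters.jsonl"  # per-work chapter log
failures = base_dir / work_id / "failures.jsonl"  # per-work failure log

print(chapters.as_posix())  # -> data/cf5c97a9/chapters.jsonl
```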

- Story 1.2.3: Repository interface (3SP)
  * CRUD operations for WorkItem, ChapterItem, FailureRecord
  * create_work(), get_work(), list_works(), update_work(), delete_work()
  * save_chapter(), get_chapter(), get_chapters(), get_pending_chapters()
  * record_failure(), get_failures(), resolve_failure()
  * get_work_stats(), get_all_stats() for analytics
  * Custom exceptions: WorkNotFoundError, ChapterNotFoundError

- Crash-Safety guarantees:
  * Atomic writes prevent data corruption on crashes
  * JSONL format limits damage to last line only
  * Fsync ensures data is written to disk
  * Temp file cleanup on errors
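The second guarantee can be sketched standalone (this is not the module's actual API): a crash during an append truncates at most the final line, and a tolerant reader recovers every complete record before it.

```python
import json
import tempfile
from pathlib import Path

def read_valid_lines(path: Path) -> list:
    """Return every line that parses as JSON, skipping truncated ones."""
    records = []
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # a crash mid-append leaves at most one bad line
    return records

# Simulate a crash: the last append was cut off mid-object.
with tempfile.TemporaryDirectory() as d:
    p = Path(d) / "chapters.jsonl"
    p.write_text('{"idx": 0}\n{"idx": 1}\n{"idx": 2, "ti', encoding="utf-8")
    print(read_valid_lines(p))  # -> [{'idx': 0}, {'idx': 1}]
```

Only the interrupted final line is lost; all earlier records survive intact.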

Epic 1.2 completed ✅ (10 SP)

Phase 1a (Core Infrastructure) completed ✅:
  - Epic 1.1: State Machine (25 SP)
  - Epic 1.2: Repository (10 SP)
  - Epic 4: Glossary (26 SP)
  Total: 61 SP

Part of Phase 1a: Core Infrastructure

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
d8dfun committed 2 days ago
Parent
Commit cf5c97a984
5 changed files with 1793 additions and 0 deletions
  1. src/repository/__init__.py    (+20 −0)
  2. src/repository/jsonl_store.py (+359 −0)
  3. src/repository/models.py      (+300 −0)
  4. src/repository/repository.py  (+439 −0)
  5. tests/test_repository.py      (+675 −0)

+ 20 - 0
src/repository/__init__.py

@@ -0,0 +1,20 @@
+"""
+Repository module for data persistence and crash-safe storage.
+
+This module provides a data persistence layer using JSONL format with
+atomic write guarantees to ensure no data loss on crashes.
+"""
+
+from .models import WorkItem, ChapterItem, FailureRecord, WorkStatus, ChapterStatus
+from .jsonl_store import JSONLStore
+from .repository import Repository
+
+__all__ = [
+    "WorkItem",
+    "ChapterItem",
+    "FailureRecord",
+    "WorkStatus",
+    "ChapterStatus",
+    "JSONLStore",
+    "Repository",
+]

+ 359 - 0
src/repository/jsonl_store.py

@@ -0,0 +1,359 @@
+"""
+JSONL (JSON Lines) storage implementation with atomic write guarantees.
+
+JSONL format stores one JSON object per line, making it:
+- Append-friendly (no need to rewrite entire file)
+- Crash-safe (partial writes only affect last line)
+- Easy to stream and process line by line
+"""
+
+import json
+from pathlib import Path
+from typing import Any, Dict, Iterator, Optional
+
+from .models import WorkItem, ChapterItem, FailureRecord
+
+
+class JSONLError(Exception):
+    """Base exception for JSONL storage errors."""
+
+    pass
+
+
+class JSONLStore:
+    """
+    Crash-safe JSONL file storage.
+
+    Features:
+    - Atomic writes using temp file + rename
+    - One JSON object per line (JSONL format)
+    - Append mode for efficient additions
+    - Automatic directory creation
+
+    Directory structure:
+        base_dir/
+        ├── work_items.jsonl      # All work items
+        ├── work_id_1/
+        │   ├── chapters.jsonl    # Chapters for work_id_1
+        │   └── failures.jsonl    # Failures for work_id_1
+        └── work_id_2/
+            ├── chapters.jsonl
+            └── failures.jsonl
+    """
+
+    def __init__(self, base_dir: Path):
+        """
+        Initialize the JSONL store.
+
+        Args:
+            base_dir: Base directory for all storage
+        """
+        self.base_dir = Path(base_dir)
+        self.base_dir.mkdir(parents=True, exist_ok=True)
+
+    def _atomic_write(self, path: Path, content: str) -> None:
+        """
+        Write content to file atomically.
+
+        Uses temp file + rename pattern to ensure atomicity.
+        If the process crashes during write, the original file remains intact.
+
+        Args:
+            path: Target file path
+            content: Content to write
+
+        Raises:
+            JSONLError: If write fails
+        """
+        # Create parent directory if needed
+        path.parent.mkdir(parents=True, exist_ok=True)
+
+        # Write to temporary file first
+        temp_path = path.with_suffix(path.suffix + ".tmp")
+
+        try:
+            with open(temp_path, "w", encoding="utf-8") as f:
+                f.write(content)
+                f.flush()  # Flush Python's buffers to the OS
+                import os
+                os.fsync(f.fileno())  # Force write to disk
+
+            # Atomic rename (overwrites existing if present)
+            temp_path.replace(path)
+
+        except Exception as e:
+            # Clean up temp file on error
+            if temp_path.exists():
+                temp_path.unlink()
+            raise JSONLError(f"Failed to write {path}: {e}") from e
+
+    def _append_line(self, path: Path, line: str) -> None:
+        """
+        Append a single line to a file (not atomic, but safe for appends).
+
+        For JSONL, appends are generally safe because:
+        1. Each line is independent
+        2. A partial write only corrupts the last line
+        3. We can detect and skip incomplete lines on read
+
+        Args:
+            path: Target file path
+            line: Line to append (should include newline)
+        """
+        path.parent.mkdir(parents=True, exist_ok=True)
+
+        with open(path, "a", encoding="utf-8") as f:
+            f.write(line)
+            f.flush()
+            import os
+            os.fsync(f.fileno())
+
+    def _read_lines(self, path: Path) -> Iterator[str]:
+        """
+        Read all valid lines from a JSONL file.
+
+        Skips incomplete/corrupted lines.
+
+        Args:
+            path: File to read
+
+        Yields:
+            Valid JSON lines
+        """
+        if not path.exists():
+            return
+
+        with open(path, "r", encoding="utf-8") as f:
+            for line in f:
+                line = line.strip()
+                if line:  # Skip empty lines
+                    try:
+                        # Validate JSON
+                        json.loads(line)
+                        yield line
+                    except json.JSONDecodeError:
+                        # Skip corrupted lines
+                        continue
+
+    # ========== WorkItem Operations ==========
+
+    def save_work_item(self, item: WorkItem, mode: str = "append") -> None:
+        """
+        Save a work item to storage.
+
+        Args:
+            item: WorkItem to save
+            mode: "append" to add, "overwrite" to replace existing work_id
+
+        Raises:
+            JSONLError: If save fails
+        """
+        path = self.base_dir / "work_items.jsonl"
+        data = json.dumps(item.to_dict(), ensure_ascii=False)
+
+        if mode == "overwrite":
+            # Read all items, replace matching, write back
+            items = list(self.load_work_items())
+            items = [i for i in items if i.work_id != item.work_id]
+            items.append(item)
+
+            lines = [json.dumps(i.to_dict(), ensure_ascii=False) for i in items]
+            content = "\n".join(lines) + "\n"
+            self._atomic_write(path, content)
+        else:
+            # Append mode (efficient)
+            self._append_line(path, data + "\n")
+
+    def load_work_items(self) -> Iterator[WorkItem]:
+        """
+        Load all work items from storage.
+
+        Yields:
+            WorkItem instances
+
+        Note: Returns duplicates if same work_id was saved multiple times.
+              Use Repository.get_work() for deduplication.
+        """
+        path = self.base_dir / "work_items.jsonl"
+
+        for line in self._read_lines(path):
+            data = json.loads(line)
+            yield WorkItem.from_dict(data)
+
+    def get_work_item(self, work_id: str) -> Optional[WorkItem]:
+        """
+        Get a specific work item by ID.
+
+        Args:
+            work_id: Work item ID to fetch
+
+        Returns:
+            WorkItem if found, None otherwise
+        """
+        for item in self.load_work_items():
+            if item.work_id == work_id:
+                return item
+        return None
+
+    # ========== ChapterItem Operations ==========
+
+    def save_chapter(self, chapter: ChapterItem, mode: str = "append") -> None:
+        """
+        Save a chapter to storage.
+
+        Chapters are stored in work-specific directories.
+
+        Args:
+            chapter: ChapterItem to save
+            mode: "append" or "overwrite"
+
+        Raises:
+            JSONLError: If save fails
+        """
+        work_dir = self.base_dir / chapter.work_id
+        path = work_dir / "chapters.jsonl"
+        data = json.dumps(chapter.to_dict(), ensure_ascii=False)
+
+        if mode == "overwrite":
+            # Read all chapters, replace matching, write back
+            chapters = list(self.load_chapters(chapter.work_id))
+            chapters = [c for c in chapters if c.chapter_index != chapter.chapter_index]
+            chapters.append(chapter)
+
+            lines = [json.dumps(c.to_dict(), ensure_ascii=False) for c in chapters]
+            content = "\n".join(lines) + "\n"
+            self._atomic_write(path, content)
+        else:
+            self._append_line(path, data + "\n")
+
+    def load_chapters(self, work_id: str) -> Iterator[ChapterItem]:
+        """
+        Load all chapters for a work item.
+
+        Args:
+            work_id: Work item ID
+
+        Yields:
+            ChapterItem instances
+        """
+        path = self.base_dir / work_id / "chapters.jsonl"
+
+        for line in self._read_lines(path):
+            data = json.loads(line)
+            yield ChapterItem.from_dict(data)
+
+    def get_chapter(self, work_id: str, chapter_index: int) -> Optional[ChapterItem]:
+        """
+        Get a specific chapter by index.
+
+        Args:
+            work_id: Work item ID
+            chapter_index: Chapter index
+
+        Returns:
+            ChapterItem if found, None otherwise
+        """
+        for chapter in self.load_chapters(work_id):
+            if chapter.chapter_index == chapter_index:
+                return chapter
+        return None
+
+    # ========== FailureRecord Operations ==========
+
+    def save_failure(self, failure: FailureRecord) -> None:
+        """
+        Save a failure record.
+
+        Args:
+            failure: FailureRecord to save
+
+        Raises:
+            JSONLError: If save fails
+        """
+        work_dir = self.base_dir / failure.work_id
+        path = work_dir / "failures.jsonl"
+        data = json.dumps(failure.to_dict(), ensure_ascii=False)
+
+        self._append_line(path, data + "\n")
+
+    def load_failures(self, work_id: str, include_resolved: bool = False) -> Iterator[FailureRecord]:
+        """
+        Load failure records for a work item.
+
+        Args:
+            work_id: Work item ID
+            include_resolved: Whether to include resolved failures
+
+        Yields:
+            FailureRecord instances
+        """
+        path = self.base_dir / work_id / "failures.jsonl"
+
+        for line in self._read_lines(path):
+            data = json.loads(line)
+            failure = FailureRecord.from_dict(data)
+            if include_resolved or not failure.resolved:
+                yield failure
+
+    # ========== Utility Methods ==========
+
+    def delete_work(self, work_id: str) -> None:
+        """
+        Delete all data associated with a work item.
+
+        Args:
+            work_id: Work item ID to delete
+
+        Note: This is NOT atomic - use with caution.
+        """
+        work_dir = self.base_dir / work_id
+        if work_dir.exists():
+            import shutil
+            shutil.rmtree(work_dir)
+
+        # Also remove from work_items.jsonl by rewriting
+        path = self.base_dir / "work_items.jsonl"
+        if path.exists():
+            items = [i for i in self.load_work_items() if i.work_id != work_id]
+            if items:
+                lines = [json.dumps(i.to_dict(), ensure_ascii=False) for i in items]
+                content = "\n".join(lines) + "\n"
+                self._atomic_write(path, content)
+            else:
+                path.unlink()
+
+    def work_exists(self, work_id: str) -> bool:
+        """
+        Check if a work item exists.
+
+        Args:
+            work_id: Work item ID
+
+        Returns:
+            True if work item exists
+        """
+        return self.get_work_item(work_id) is not None
+
+    def get_storage_stats(self) -> Dict[str, Any]:
+        """
+        Get storage statistics.
+
+        Returns:
+            Dictionary with storage statistics
+        """
+        work_count = len(list(self.load_work_items()))
+
+        total_chapters = 0
+        total_failures = 0
+
+        for work in self.load_work_items():
+            total_chapters += len(list(self.load_chapters(work.work_id)))
+            total_failures += len(list(self.load_failures(work.work_id)))
+
+        return {
+            "total_works": work_count,
+            "total_chapters": total_chapters,
+            "total_failures": total_failures,
+            "storage_dir": str(self.base_dir),
+        }
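Because append mode never rewrites the file, the same work_id can occur on several lines; consumers resolve this last-wins, as `Repository.list_works()` does in repository.py below. A standalone sketch of that pattern (record shape is illustrative):

```python
import json

# Appended JSONL log: the same work_id may appear several times;
# the most recent line is authoritative.
log = [
    '{"work_id": "a", "progress": 0}',
    '{"work_id": "b", "progress": 10}',
    '{"work_id": "a", "progress": 50}',
]

def dedupe_last_wins(lines):
    latest = {}
    for line in lines:
        rec = json.loads(line)
        latest[rec["work_id"]] = rec  # later lines overwrite earlier ones
    return list(latest.values())

# "a" keeps its first-seen position but carries the latest values.
print(dedupe_last_wins(log))
```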

+ 300 - 0
src/repository/models.py

@@ -0,0 +1,300 @@
+"""
+Data models for the repository module.
+
+This module defines the core data structures used for persisting
+work items, chapters, and failure records.
+"""
+
+from dataclasses import dataclass, field
+from datetime import datetime
+from enum import Enum
+from typing import Any, Dict, Optional
+import hashlib
+import json
+
+
+class WorkStatus(str, Enum):
+    """Status of a work item (translation job)."""
+
+    PENDING = "pending"
+    FINGERPRINTING = "fingerprinting"
+    CLEANING = "cleaning"
+    TERM_EXTRACTION = "term_extraction"
+    TRANSLATING = "translating"
+    UPLOADING = "uploading"
+    PAUSED = "paused"
+    COMPLETED = "completed"
+    FAILED = "failed"
+    CANCELLED = "cancelled"
+
+
+class ChapterStatus(str, Enum):
+    """Status of a chapter within a work item."""
+
+    PENDING = "pending"
+    PROCESSING = "processing"
+    COMPLETED = "completed"
+    FAILED = "failed"
+    SKIPPED = "skipped"
+
+
+@dataclass
+class WorkItem:
+    """
+    Represents a translation work item (a novel or document).
+
+    Attributes:
+        work_id: Unique identifier (MD5 hash of file path)
+        file_path: Path to the source file
+        file_size: Size of the source file in bytes
+        chapter_count: Total number of chapters
+        status: Current work status
+        created_at: Timestamp when work was created
+        updated_at: Timestamp of last update
+        metadata: Additional metadata (title, author, etc.)
+        progress: Current progress (0-100)
+    """
+
+    work_id: str
+    file_path: str
+    file_size: int
+    chapter_count: int
+    status: WorkStatus = WorkStatus.PENDING
+    created_at: datetime = field(default_factory=datetime.utcnow)
+    updated_at: datetime = field(default_factory=datetime.utcnow)
+    metadata: Dict[str, Any] = field(default_factory=dict)
+    progress: int = 0
+
+    def __post_init__(self):
+        """Generate work_id if not provided."""
+        if not self.work_id:
+            # Generate MD5 hash of file path as work_id
+            self.work_id = hashlib.md5(self.file_path.encode()).hexdigest()
+
+    @classmethod
+    def from_file(cls, file_path: str, **metadata) -> "WorkItem":
+        """
+        Create a WorkItem from a file path.
+
+        Args:
+            file_path: Path to the source file
+            **metadata: Additional metadata
+
+        Returns:
+            A new WorkItem instance
+        """
+        from pathlib import Path
+
+        path = Path(file_path)
+        file_size = path.stat().st_size if path.exists() else 0
+
+        return cls(
+            work_id="",  # Will be generated in __post_init__
+            file_path=str(path.absolute()),
+            file_size=file_size,
+            chapter_count=0,  # Will be updated during processing
+            metadata=metadata,
+        )
+
+    def touch(self) -> None:
+        """Update the updated_at timestamp."""
+        self.updated_at = datetime.utcnow()
+
+    def to_dict(self) -> Dict[str, Any]:
+        """Convert to dictionary for JSON serialization."""
+        return {
+            "work_id": self.work_id,
+            "file_path": self.file_path,
+            "file_size": self.file_size,
+            "chapter_count": self.chapter_count,
+            "status": self.status.value if isinstance(self.status, WorkStatus) else self.status,
+            "created_at": self.created_at.isoformat(),
+            "updated_at": self.updated_at.isoformat(),
+            "metadata": self.metadata,
+            "progress": self.progress,
+        }
+
+    @classmethod
+    def from_dict(cls, data: Dict[str, Any]) -> "WorkItem":
+        """Create from dictionary deserialized from JSON."""
+        status = data.get("status", "pending")
+        if isinstance(status, str):
+            try:
+                status = WorkStatus(status)
+            except ValueError:
+                status = WorkStatus.PENDING
+
+        return cls(
+            work_id=data["work_id"],
+            file_path=data["file_path"],
+            file_size=data["file_size"],
+            chapter_count=data["chapter_count"],
+            status=status,
+            created_at=datetime.fromisoformat(data["created_at"]),
+            updated_at=datetime.fromisoformat(data["updated_at"]),
+            metadata=data.get("metadata", {}),
+            progress=data.get("progress", 0),
+        )
+
+
+@dataclass
+class ChapterItem:
+    """
+    Represents a single chapter within a work item.
+
+    Attributes:
+        work_id: Parent work item ID
+        chapter_index: Zero-based chapter index
+        title: Chapter title
+        content: Chapter content (original text)
+        translation: Translated content (if available)
+        word_count: Number of words in the chapter
+        status: Chapter processing status
+        created_at: Timestamp when chapter was created
+        updated_at: Timestamp of last update
+        retry_count: Number of retry attempts
+        error_message: Last error message (if failed)
+    """
+
+    work_id: str
+    chapter_index: int
+    title: str
+    content: str
+    word_count: int = 0
+    translation: Optional[str] = None
+    status: ChapterStatus = ChapterStatus.PENDING
+    created_at: datetime = field(default_factory=datetime.utcnow)
+    updated_at: datetime = field(default_factory=datetime.utcnow)
+    retry_count: int = 0
+    error_message: Optional[str] = None
+
+    def __post_init__(self):
+        """Calculate word count from content."""
+        if self.content and not self.word_count:
+            # Simple word count (split by whitespace)
+            self.word_count = len(self.content.split())
+
+    def touch(self) -> None:
+        """Update the updated_at timestamp."""
+        self.updated_at = datetime.utcnow()
+
+    def to_dict(self) -> Dict[str, Any]:
+        """Convert to dictionary for JSON serialization."""
+        return {
+            "work_id": self.work_id,
+            "chapter_index": self.chapter_index,
+            "title": self.title,
+            "content": self.content,
+            "word_count": self.word_count,
+            "translation": self.translation,
+            "status": self.status.value if isinstance(self.status, ChapterStatus) else self.status,
+            "created_at": self.created_at.isoformat(),
+            "updated_at": self.updated_at.isoformat(),
+            "retry_count": self.retry_count,
+            "error_message": self.error_message,
+        }
+
+    @classmethod
+    def from_dict(cls, data: Dict[str, Any]) -> "ChapterItem":
+        """Create from dictionary deserialized from JSON."""
+        status = data.get("status", "pending")
+        if isinstance(status, str):
+            try:
+                status = ChapterStatus(status)
+            except ValueError:
+                status = ChapterStatus.PENDING
+
+        return cls(
+            work_id=data["work_id"],
+            chapter_index=data["chapter_index"],
+            title=data["title"],
+            content=data["content"],
+            word_count=data.get("word_count", 0),
+            translation=data.get("translation"),
+            status=status,
+            created_at=datetime.fromisoformat(data["created_at"]),
+            updated_at=datetime.fromisoformat(data["updated_at"]),
+            retry_count=data.get("retry_count", 0),
+            error_message=data.get("error_message"),
+        )
+
+
+@dataclass
+class FailureRecord:
+    """
+    Represents a failure that occurred during processing.
+
+    Attributes:
+        work_id: Associated work item ID
+        chapter_index: Chapter that failed (-1 for work-level failures)
+        error_type: Type of error (e.g., "ValueError", "ConnectionError")
+        error_message: Human-readable error message
+        traceback: Full traceback (optional)
+        retry_count: Number of retry attempts
+        timestamp: When the failure occurred
+        resolved: Whether the failure has been resolved
+    """
+
+    work_id: str
+    chapter_index: int
+    error_type: str
+    error_message: str
+    traceback: Optional[str] = None
+    retry_count: int = 0
+    timestamp: datetime = field(default_factory=datetime.utcnow)
+    resolved: bool = False
+
+    def to_dict(self) -> Dict[str, Any]:
+        """Convert to dictionary for JSON serialization."""
+        return {
+            "work_id": self.work_id,
+            "chapter_index": self.chapter_index,
+            "error_type": self.error_type,
+            "error_message": self.error_message,
+            "traceback": self.traceback,
+            "retry_count": self.retry_count,
+            "timestamp": self.timestamp.isoformat(),
+            "resolved": self.resolved,
+        }
+
+    @classmethod
+    def from_dict(cls, data: Dict[str, Any]) -> "FailureRecord":
+        """Create from dictionary deserialized from JSON."""
+        return cls(
+            work_id=data["work_id"],
+            chapter_index=data["chapter_index"],
+            error_type=data["error_type"],
+            error_message=data["error_message"],
+            traceback=data.get("traceback"),
+            retry_count=data.get("retry_count", 0),
+            timestamp=datetime.fromisoformat(data["timestamp"]),
+            resolved=data.get("resolved", False),
+        )
+
+    @classmethod
+    def from_exception(
+        cls,
+        work_id: str,
+        exception: Exception,
+        chapter_index: int = -1,
+        traceback_str: Optional[str] = None,
+    ) -> "FailureRecord":
+        """
+        Create a FailureRecord from an exception.
+
+        Args:
+            work_id: Associated work item ID
+            exception: The exception that occurred
+            chapter_index: Chapter that failed
+            traceback_str: Optional traceback string
+
+        Returns:
+            A new FailureRecord instance
+        """
+        return cls(
+            work_id=work_id,
+            chapter_index=chapter_index,
+            error_type=type(exception).__name__,
+            error_message=str(exception),
+            traceback=traceback_str,
+        )
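The `to_dict()`/`from_dict()` pairs above round-trip through plain JSON-safe values: enums via `.value`, datetimes via ISO-8601. A trimmed-down sketch of the pattern (model and field names are illustrative, not the module's):

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum

class Status(str, Enum):
    PENDING = "pending"
    COMPLETED = "completed"

@dataclass
class Item:
    item_id: str
    status: Status = Status.PENDING
    created_at: datetime = field(default_factory=datetime.utcnow)

    def to_dict(self):
        return {
            "item_id": self.item_id,
            "status": self.status.value,                # enum -> plain string
            "created_at": self.created_at.isoformat(),  # datetime -> ISO-8601
        }

    @classmethod
    def from_dict(cls, data):
        return cls(
            item_id=data["item_id"],
            status=Status(data["status"]),
            created_at=datetime.fromisoformat(data["created_at"]),
        )

item = Item("abc", Status.COMPLETED)
assert Item.from_dict(item.to_dict()) == item  # lossless round-trip
```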

+ 439 - 0
src/repository/repository.py

@@ -0,0 +1,439 @@
+"""
+Repository interface for data access.
+
+This module provides a high-level interface for managing work items,
+chapters, and failure records.
+"""
+
+from pathlib import Path
+from typing import List, Optional, Iterator, Dict, Any
+
+from .models import WorkItem, ChapterItem, FailureRecord, WorkStatus, ChapterStatus
+from .jsonl_store import JSONLStore, JSONLError
+
+
+class RepositoryError(Exception):
+    """Base exception for repository errors."""
+
+    pass
+
+
+class WorkNotFoundError(RepositoryError):
+    """Raised when a work item is not found."""
+
+    pass
+
+
+class ChapterNotFoundError(RepositoryError):
+    """Raised when a chapter is not found."""
+
+    pass
+
+
+class Repository:
+    """
+    High-level repository interface for data persistence.
+
+    This class provides CRUD operations and convenience methods
+    for managing work items, chapters, and failure records.
+
+    Example:
+        >>> repo = Repository(Path("/data/bmad-translator"))
+        >>> work = repo.create_work("/path/to/novel.txt", title="My Novel")
+        >>> chapter = ChapterItem(...)
+        >>> repo.save_chapter(work.work_id, chapter)
+        >>> retrieved = repo.get_work(work.work_id)
+    """
+
+    def __init__(self, storage_dir: Path):
+        """
+        Initialize the repository.
+
+        Args:
+            storage_dir: Directory for data storage
+        """
+        self.storage_dir = Path(storage_dir)
+        self.store = JSONLStore(self.storage_dir)
+
+    # ========== WorkItem Operations ==========
+
+    def create_work(self, file_path: str, **metadata) -> WorkItem:
+        """
+        Create a new work item.
+
+        Args:
+            file_path: Path to the source file
+            **metadata: Additional metadata (title, author, etc.)
+
+        Returns:
+            Created WorkItem
+
+        Raises:
+            RepositoryError: If creation fails
+        """
+        work = WorkItem.from_file(file_path, **metadata)
+        work.touch()
+
+        self.store.save_work_item(work, mode="append")
+        return work
+
+    def get_work(self, work_id: str) -> Optional[WorkItem]:
+        """
+        Get a work item by ID.
+
+        Args:
+            work_id: Work item ID
+
+        Returns:
+            WorkItem if found, None otherwise
+        """
+        return self.store.get_work_item(work_id)
+
+    def get_work_or_raise(self, work_id: str) -> WorkItem:
+        """
+        Get a work item, raising an exception if not found.
+
+        Args:
+            work_id: Work item ID
+
+        Returns:
+            WorkItem
+
+        Raises:
+            WorkNotFoundError: If work not found
+        """
+        work = self.get_work(work_id)
+        if work is None:
+            raise WorkNotFoundError(f"Work {work_id} not found")
+        return work
+
+    def list_works(
+        self, status: Optional[WorkStatus] = None
+    ) -> List[WorkItem]:
+        """
+        List all work items, optionally filtered by status.
+
+        Args:
+            status: Optional status filter
+
+        Returns:
+            List of work items (deduplicated by work_id)
+        """
+        works: Dict[str, WorkItem] = {}
+
+        for work in self.store.load_work_items():
+            if status is None or work.status == status:
+                # Use last seen version (handles duplicates in JSONL)
+                works[work.work_id] = work
+
+        return list(works.values())
+
+    def update_work(self, work: WorkItem) -> None:
+        """
+        Update a work item.
+
+        Args:
+            work: WorkItem with updated fields
+
+        Raises:
+            RepositoryError: If update fails
+        """
+        work.touch()
+        self.store.save_work_item(work, mode="overwrite")
+
+    def update_work_status(self, work_id: str, status: WorkStatus) -> None:
+        """
+        Update work item status.
+
+        Args:
+            work_id: Work item ID
+            status: New status
+
+        Raises:
+            WorkNotFoundError: If work not found
+        """
+        work = self.get_work_or_raise(work_id)
+        work.status = status
+        work.touch()
+        self.store.save_work_item(work, mode="overwrite")
+
+    def update_work_progress(self, work_id: str, progress: int) -> None:
+        """
+        Update work item progress.
+
+        Args:
+            work_id: Work item ID
+            progress: Progress value (0-100)
+
+        Raises:
+            WorkNotFoundError: If work not found
+        """
+        work = self.get_work_or_raise(work_id)
+        work.progress = max(0, min(100, progress))
+        work.touch()
+        self.store.save_work_item(work, mode="overwrite")
+
+    def delete_work(self, work_id: str) -> None:
+        """
+        Delete a work item and all associated data.
+
+        Args:
+            work_id: Work item ID to delete
+        """
+        self.store.delete_work(work_id)
+
+    # ========== ChapterItem Operations ==========
+
+    def save_chapter(self, work_id: str, chapter: ChapterItem) -> None:
+        """
+        Save a chapter.
+
+        Args:
+            work_id: Work item ID
+            chapter: Chapter to save
+
+        Raises:
+            WorkNotFoundError: If work not found
+        """
+        # Verify work exists
+        if not self.store.work_exists(work_id):
+            raise WorkNotFoundError(f"Work {work_id} not found")
+
+        chapter.work_id = work_id
+        chapter.touch()
+        self.store.save_chapter(chapter, mode="overwrite")
+
+    def get_chapter(self, work_id: str, chapter_index: int) -> Optional[ChapterItem]:
+        """
+        Get a specific chapter.
+
+        Args:
+            work_id: Work item ID
+            chapter_index: Chapter index
+
+        Returns:
+            ChapterItem if found, None otherwise
+        """
+        return self.store.get_chapter(work_id, chapter_index)
+
+    def get_chapter_or_raise(self, work_id: str, chapter_index: int) -> ChapterItem:
+        """
+        Get a chapter, raising an exception if not found.
+
+        Args:
+            work_id: Work item ID
+            chapter_index: Chapter index
+
+        Returns:
+            ChapterItem
+
+        Raises:
+            ChapterNotFoundError: If chapter not found
+        """
+        chapter = self.get_chapter(work_id, chapter_index)
+        if chapter is None:
+            raise ChapterNotFoundError(
+                f"Chapter {chapter_index} for work {work_id} not found"
+            )
+        return chapter
+
+    def get_chapters(self, work_id: str) -> List[ChapterItem]:
+        """
+        Get all chapters for a work item.
+
+        Args:
+            work_id: Work item ID
+
+        Returns:
+            List of chapters (deduplicated by index)
+        """
+        chapters: Dict[int, ChapterItem] = {}
+
+        for chapter in self.store.load_chapters(work_id):
+            chapters[chapter.chapter_index] = chapter
+
+        return list(chapters.values())
+
+    def get_pending_chapters(self, work_id: str) -> List[ChapterItem]:
+        """
+        Get all pending chapters for a work item.
+
+        Args:
+            work_id: Work item ID
+
+        Returns:
+            List of pending chapters
+        """
+        return [
+            c for c in self.get_chapters(work_id)
+            if c.status == ChapterStatus.PENDING
+        ]
+
+    def get_failed_chapters(self, work_id: str) -> List[ChapterItem]:
+        """
+        Get all failed chapters for a work item.
+
+        Args:
+            work_id: Work item ID
+
+        Returns:
+            List of failed chapters
+        """
+        return [
+            c for c in self.get_chapters(work_id)
+            if c.status == ChapterStatus.FAILED
+        ]
+
+    def update_chapter_status(
+        self, work_id: str, chapter_index: int, status: ChapterStatus
+    ) -> None:
+        """
+        Update chapter status.
+
+        Args:
+            work_id: Work item ID
+            chapter_index: Chapter index
+            status: New status
+
+        Raises:
+            ChapterNotFoundError: If chapter not found
+        """
+        chapter = self.get_chapter_or_raise(work_id, chapter_index)
+        chapter.status = status
+        chapter.touch()
+        self.store.save_chapter(chapter, mode="overwrite")
+
+    def save_chapter_translation(
+        self, work_id: str, chapter_index: int, translation: str
+    ) -> None:
+        """
+        Save chapter translation.
+
+        Args:
+            work_id: Work item ID
+            chapter_index: Chapter index
+            translation: Translated content
+
+        Raises:
+            ChapterNotFoundError: If chapter not found
+        """
+        chapter = self.get_chapter_or_raise(work_id, chapter_index)
+        chapter.translation = translation
+        chapter.status = ChapterStatus.COMPLETED
+        chapter.touch()
+        self.store.save_chapter(chapter, mode="overwrite")
+
+    # ========== FailureRecord Operations ==========
+
+    def record_failure(
+        self, work_id: str, chapter_index: int, error: Exception,
+        traceback_str: Optional[str] = None
+    ) -> FailureRecord:
+        """
+        Record a failure.
+
+        Args:
+            work_id: Work item ID
+            chapter_index: Chapter index (-1 for work-level failures)
+            error: The exception that occurred
+            traceback_str: Optional traceback string
+
+        Returns:
+            Created FailureRecord
+        """
+        failure = FailureRecord.from_exception(
+            work_id, error, chapter_index, traceback_str
+        )
+        self.store.save_failure(failure)
+        return failure
+
+    def get_failures(
+        self, work_id: str, include_resolved: bool = False
+    ) -> List[FailureRecord]:
+        """
+        Get failure records for a work item.
+
+        Args:
+            work_id: Work item ID
+            include_resolved: Whether to include resolved failures
+
+        Returns:
+            List of failure records
+        """
+        return list(self.store.load_failures(work_id, include_resolved))
+
+    def resolve_failure(self, work_id: str, chapter_index: int) -> None:
+        """
+        Mark a failure as resolved.
+
+        Args:
+            work_id: Work item ID
+            chapter_index: Chapter index
+        """
+        # Mark every unresolved failure for this chapter as resolved and
+        # re-save it; readers filter on the resolved flag.
+        for f in self.get_failures(work_id, include_resolved=False):
+            if f.chapter_index == chapter_index:
+                f.resolved = True
+                self.store.save_failure(f)
+
+    # ========== Query and Utility Methods ==========
+
+    def get_work_stats(self, work_id: str) -> Dict[str, Any]:
+        """
+        Get statistics for a work item.
+
+        Args:
+            work_id: Work item ID
+
+        Returns:
+            Dictionary with work statistics
+        """
+        work = self.get_work_or_raise(work_id)
+        chapters = self.get_chapters(work_id)
+        failures = self.get_failures(work_id, include_resolved=False)
+
+        status_counts = {s.value: 0 for s in ChapterStatus}
+        for chapter in chapters:
+            status_counts[chapter.status.value] += 1
+
+        return {
+            "work_id": work_id,
+            "status": work.status.value,
+            "total_chapters": len(chapters),
+            "completed_chapters": status_counts.get("completed", 0),
+            "pending_chapters": status_counts.get("pending", 0),
+            "failed_chapters": status_counts.get("failed", 0),
+            "processing_chapters": status_counts.get("processing", 0),
+            "active_failures": len(failures),
+            "progress": work.progress,
+        }
+
+    def get_all_stats(self) -> Dict[str, Any]:
+        """
+        Get overall repository statistics.
+
+        Returns:
+            Dictionary with repository statistics
+        """
+        return self.store.get_storage_stats()
+
+    def list_work_ids(self) -> List[str]:
+        """
+        List all work item IDs.
+
+        Returns:
+            List of work IDs
+        """
+        return [w.work_id for w in self.list_works()]
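The repository methods above delegate persistence to the JSONL conventions described in the commit message: append one JSON object per line (so a crash can only tear the last line) and skip corrupted lines on read. A minimal stdlib sketch of those two conventions, independent of the real `JSONLStore` implementation:

```python
import json
import os
from pathlib import Path


def append_record(path: Path, record: dict) -> None:
    # One JSON object per line; appending limits crash damage to the last line.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
        f.flush()
        os.fsync(f.fileno())  # make sure the line actually reaches disk


def load_records(path: Path) -> list[dict]:
    # Skip corrupted lines instead of failing the whole read.
    records: list[dict] = []
    if not path.exists():
        return records
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # tolerate a torn or corrupted line
    return records
```

This mirrors the behavior exercised by `test_corrupted_line_skipped` and `test_append_mode_is_crash_safe` below, but is only an illustration of the pattern, not the shipped code.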

+ 675 - 0
tests/test_repository.py

@@ -0,0 +1,675 @@
+"""
+Unit tests for the repository module.
+
+Tests cover models, JSONLStore, and Repository functionality.
+"""
+
+import tempfile
+import shutil
+from pathlib import Path
+from datetime import datetime
+import time
+
+import pytest
+
+from src.repository.models import (
+    WorkItem,
+    ChapterItem,
+    FailureRecord,
+    WorkStatus,
+    ChapterStatus,
+)
+from src.repository.jsonl_store import JSONLStore, JSONLError
+from src.repository.repository import (
+    Repository,
+    RepositoryError,
+    WorkNotFoundError,
+    ChapterNotFoundError,
+)
+
+
+class TestWorkItem:
+    """Test WorkItem model."""
+
+    def test_create_work_item(self):
+        """Test creating a work item."""
+        work = WorkItem(
+            work_id="test123",
+            file_path="/test/file.txt",
+            file_size=1000,
+            chapter_count=10,
+        )
+        assert work.work_id == "test123"
+        assert work.status == WorkStatus.PENDING
+
+    def test_work_item_auto_generate_id(self):
+        """Test that work_id is auto-generated from file path."""
+        work = WorkItem(
+            work_id="",  # Empty to trigger auto-generation
+            file_path="/test/file.txt",
+            file_size=1000,
+            chapter_count=10,
+        )
+        assert work.work_id != ""
+        assert len(work.work_id) == 32  # MD5 hash length
+
+    def test_work_item_from_file(self):
+        """Test creating WorkItem from file path."""
+        with tempfile.NamedTemporaryFile(delete=False) as f:
+            f.write(b"test content")
+            temp_path = f.name
+
+        try:
+            work = WorkItem.from_file(temp_path, title="Test Novel")
+            assert work.file_size == len(b"test content")
+            assert work.metadata["title"] == "Test Novel"
+        finally:
+            Path(temp_path).unlink()
+
+    def test_work_item_to_dict(self):
+        """Test converting WorkItem to dictionary."""
+        work = WorkItem(
+            work_id="test123",
+            file_path="/test/file.txt",
+            file_size=1000,
+            chapter_count=10,
+            status=WorkStatus.TRANSLATING,
+            metadata={"title": "Test"},
+        )
+        data = work.to_dict()
+
+        assert data["work_id"] == "test123"
+        assert data["status"] == "translating"
+        assert data["metadata"]["title"] == "Test"
+
+    def test_work_item_from_dict(self):
+        """Test creating WorkItem from dictionary."""
+        data = {
+            "work_id": "test123",
+            "file_path": "/test/file.txt",
+            "file_size": 1000,
+            "chapter_count": 10,
+            "status": "completed",
+            "created_at": datetime.utcnow().isoformat(),
+            "updated_at": datetime.utcnow().isoformat(),
+            "metadata": {},
+            "progress": 50,
+        }
+        work = WorkItem.from_dict(data)
+
+        assert work.work_id == "test123"
+        assert work.status == WorkStatus.COMPLETED
+        assert work.progress == 50
+
+    def test_touch_updates_timestamp(self):
+        """Test that touch() updates updated_at."""
+        work = WorkItem(
+            work_id="test",
+            file_path="/test.txt",
+            file_size=100,
+            chapter_count=1,
+        )
+        old_time = work.updated_at
+        time.sleep(0.01)  # Small delay
+        work.touch()
+
+        assert work.updated_at > old_time
+
+
+class TestChapterItem:
+    """Test ChapterItem model."""
+
+    def test_create_chapter(self):
+        """Test creating a chapter."""
+        chapter = ChapterItem(
+            work_id="work123",
+            chapter_index=0,
+            title="Chapter 1",
+            content="This is the content",
+        )
+        assert chapter.work_id == "work123"
+        assert chapter.word_count == 4  # Auto-calculated
+
+    def test_chapter_word_count(self):
+        """Test word count calculation."""
+        chapter = ChapterItem(
+            work_id="work123",
+            chapter_index=0,
+            title="Chapter 1",
+            content="One two three four five",
+        )
+        assert chapter.word_count == 5
+
+    def test_chapter_to_dict(self):
+        """Test converting chapter to dictionary."""
+        chapter = ChapterItem(
+            work_id="work123",
+            chapter_index=0,
+            title="Chapter 1",
+            content="Content",
+            translation="Translated",
+        )
+        data = chapter.to_dict()
+
+        assert data["work_id"] == "work123"
+        assert data["translation"] == "Translated"
+
+    def test_chapter_from_dict(self):
+        """Test creating chapter from dictionary."""
+        data = {
+            "work_id": "work123",
+            "chapter_index": 0,
+            "title": "Chapter 1",
+            "content": "Content",
+            "word_count": 1,
+            "status": "completed",
+            "created_at": datetime.utcnow().isoformat(),
+            "updated_at": datetime.utcnow().isoformat(),
+            "retry_count": 0,
+        }
+        chapter = ChapterItem.from_dict(data)
+
+        assert chapter.work_id == "work123"
+        assert chapter.status == ChapterStatus.COMPLETED
+
+
+class TestFailureRecord:
+    """Test FailureRecord model."""
+
+    def test_create_failure(self):
+        """Test creating a failure record."""
+        failure = FailureRecord(
+            work_id="work123",
+            chapter_index=0,
+            error_type="ValueError",
+            error_message="Invalid value",
+        )
+        assert failure.work_id == "work123"
+        assert failure.resolved is False
+
+    def test_failure_from_exception(self):
+        """Test creating failure from exception."""
+        try:
+            raise ValueError("Test error")
+        except Exception as e:
+            failure = FailureRecord.from_exception("work123", e, chapter_index=5)
+
+        assert failure.error_type == "ValueError"
+        assert failure.error_message == "Test error"
+        assert failure.chapter_index == 5
+
+    def test_failure_to_dict(self):
+        """Test converting failure to dictionary."""
+        failure = FailureRecord(
+            work_id="work123",
+            chapter_index=0,
+            error_type="ValueError",
+            error_message="Test",
+        )
+        data = failure.to_dict()
+
+        assert data["work_id"] == "work123"
+        assert data["error_type"] == "ValueError"
+
+
+class TestJSONLStore:
+    """Test JSONLStore functionality."""
+
+    def test_init_creates_directory(self):
+        """Test that initialization creates base directory."""
+        with tempfile.TemporaryDirectory() as tmpdir:
+            store_dir = Path(tmpdir) / "store"
+            store = JSONLStore(store_dir)
+
+            assert store_dir.exists()
+
+    def test_save_and_load_work_item(self):
+        """Test saving and loading work items."""
+        with tempfile.TemporaryDirectory() as tmpdir:
+            store = JSONLStore(Path(tmpdir))
+
+            work = WorkItem(
+                work_id="test123",
+                file_path="/test.txt",
+                file_size=100,
+                chapter_count=1,
+            )
+            store.save_work_item(work)
+
+            loaded = list(store.load_work_items())
+            assert len(loaded) == 1
+            assert loaded[0].work_id == "test123"
+
+    def test_save_work_item_overwrite_mode(self):
+        """Test overwrite mode removes duplicates."""
+        with tempfile.TemporaryDirectory() as tmpdir:
+            store = JSONLStore(Path(tmpdir))
+
+            work = WorkItem(
+                work_id="test123",
+                file_path="/test.txt",
+                file_size=100,
+                chapter_count=1,
+            )
+
+            store.save_work_item(work, mode="append")
+            store.save_work_item(work, mode="append")  # Duplicate
+
+            loaded = list(store.load_work_items())
+            assert len(loaded) == 2  # JSONL has duplicates
+
+            # Now use overwrite
+            work.chapter_count = 5  # Modified
+            store.save_work_item(work, mode="overwrite")
+
+            loaded = list(store.load_work_items())
+            assert len(loaded) == 1  # Only one after overwrite
+            assert loaded[0].chapter_count == 5
+
+    def test_get_work_item(self):
+        """Test getting specific work item."""
+        with tempfile.TemporaryDirectory() as tmpdir:
+            store = JSONLStore(Path(tmpdir))
+
+            work1 = WorkItem(
+                work_id="work1",
+                file_path="/test1.txt",
+                file_size=100,
+                chapter_count=1,
+            )
+            work2 = WorkItem(
+                work_id="work2",
+                file_path="/test2.txt",
+                file_size=200,
+                chapter_count=2,
+            )
+
+            store.save_work_item(work1)
+            store.save_work_item(work2)
+
+            found = store.get_work_item("work1")
+            assert found is not None
+            assert found.file_size == 100
+
+            not_found = store.get_work_item("work999")
+            assert not_found is None
+
+    def test_save_and_load_chapters(self):
+        """Test saving and loading chapters."""
+        with tempfile.TemporaryDirectory() as tmpdir:
+            store = JSONLStore(Path(tmpdir))
+
+            chapter1 = ChapterItem(
+                work_id="work123",
+                chapter_index=0,
+                title="Chapter 1",
+                content="Content 1",
+            )
+            chapter2 = ChapterItem(
+                work_id="work123",
+                chapter_index=1,
+                title="Chapter 2",
+                content="Content 2",
+            )
+
+            store.save_chapter(chapter1)
+            store.save_chapter(chapter2)
+
+            chapters = list(store.load_chapters("work123"))
+            assert len(chapters) == 2
+
+    def test_get_chapter(self):
+        """Test getting specific chapter."""
+        with tempfile.TemporaryDirectory() as tmpdir:
+            store = JSONLStore(Path(tmpdir))
+
+            chapter = ChapterItem(
+                work_id="work123",
+                chapter_index=5,
+                title="Chapter 5",
+                content="Content",
+            )
+            store.save_chapter(chapter)
+
+            found = store.get_chapter("work123", 5)
+            assert found is not None
+            assert found.title == "Chapter 5"
+
+            not_found = store.get_chapter("work123", 99)
+            assert not_found is None
+
+    def test_save_and_load_failures(self):
+        """Test saving and loading failures."""
+        with tempfile.TemporaryDirectory() as tmpdir:
+            store = JSONLStore(Path(tmpdir))
+
+            failure = FailureRecord(
+                work_id="work123",
+                chapter_index=0,
+                error_type="ValueError",
+                error_message="Test error",
+            )
+            store.save_failure(failure)
+
+            failures = list(store.load_failures("work123"))
+            assert len(failures) == 1
+            assert failures[0].error_message == "Test error"
+
+    def test_delete_work(self):
+        """Test deleting a work item."""
+        with tempfile.TemporaryDirectory() as tmpdir:
+            store = JSONLStore(Path(tmpdir))
+
+            work = WorkItem(
+                work_id="test123",
+                file_path="/test.txt",
+                file_size=100,
+                chapter_count=1,
+            )
+            store.save_work_item(work)
+
+            # Create chapter directory
+            chapter = ChapterItem(
+                work_id="test123",
+                chapter_index=0,
+                title="Chapter 1",
+                content="Content",
+            )
+            store.save_chapter(chapter)
+
+            # Delete
+            store.delete_work("test123")
+
+            # Verify deletion
+            assert not store.work_exists("test123")
+            assert not (Path(tmpdir) / "test123").exists()
+
+    def test_atomic_write_handles_empty_data(self):
+        """Test atomic write with empty data."""
+        with tempfile.TemporaryDirectory() as tmpdir:
+            store = JSONLStore(Path(tmpdir))
+            path = Path(tmpdir) / "test.jsonl"
+
+            store._atomic_write(path, "")
+            assert path.exists()
+            assert path.read_text() == ""
+
+    def test_corrupted_line_skipped(self):
+        """Test that corrupted lines are skipped during read."""
+        with tempfile.TemporaryDirectory() as tmpdir:
+            store_dir = Path(tmpdir)
+            store = JSONLStore(store_dir)
+
+            work = WorkItem(
+                work_id="test123",
+                file_path="/test.txt",
+                file_size=100,
+                chapter_count=1,
+            )
+            store.save_work_item(work)
+
+            # Corrupt the file by appending invalid JSON
+            path = store_dir / "work_items.jsonl"
+            with open(path, "a") as f:
+                f.write("\n{invalid json}\n")
+
+            # Should still load valid entries
+            loaded = list(store.load_work_items())
+            assert len(loaded) == 1
+            assert loaded[0].work_id == "test123"
+
+
+class TestRepository:
+    """Test Repository interface."""
+
+    def test_init(self):
+        """Test repository initialization."""
+        with tempfile.TemporaryDirectory() as tmpdir:
+            repo = Repository(Path(tmpdir))
+            assert repo.storage_dir == Path(tmpdir)
+
+    def test_create_and_get_work(self):
+        """Test creating and getting a work item."""
+        with tempfile.TemporaryDirectory() as tmpdir:
+            repo = Repository(Path(tmpdir))
+
+            # Create a test file
+            test_file = Path(tmpdir) / "test.txt"
+            test_file.write_text("test content")
+
+            work = repo.create_work(str(test_file), title="Test Novel")
+
+            retrieved = repo.get_work(work.work_id)
+            assert retrieved is not None
+            assert retrieved.metadata["title"] == "Test Novel"
+
+    def test_get_work_or_raise(self):
+        """Test get_work_or_raise raises on not found."""
+        with tempfile.TemporaryDirectory() as tmpdir:
+            repo = Repository(Path(tmpdir))
+
+            with pytest.raises(WorkNotFoundError):
+                repo.get_work_or_raise("nonexistent")
+
+    def test_list_works(self):
+        """Test listing all work items."""
+        with tempfile.TemporaryDirectory() as tmpdir:
+            repo = Repository(Path(tmpdir))
+
+            # Create test files
+            for i in range(3):
+                test_file = Path(tmpdir) / f"test{i}.txt"
+                test_file.write_text(f"content {i}")
+                repo.create_work(str(test_file))
+
+            works = repo.list_works()
+            assert len(works) == 3
+
+    def test_list_works_by_status(self):
+        """Test filtering works by status."""
+        with tempfile.TemporaryDirectory() as tmpdir:
+            repo = Repository(Path(tmpdir))
+
+            test_file1 = Path(tmpdir) / "test1.txt"
+            test_file1.write_text("content")
+            test_file2 = Path(tmpdir) / "test2.txt"
+            test_file2.write_text("content")
+
+            # Use distinct files so the two works get distinct IDs
+            work1 = repo.create_work(str(test_file1))
+            work2 = repo.create_work(str(test_file2))
+
+            repo.update_work_status(work1.work_id, WorkStatus.TRANSLATING)
+
+            translating = repo.list_works(status=WorkStatus.TRANSLATING)
+            assert len(translating) == 1
+            assert translating[0].work_id == work1.work_id
+
+    def test_update_work_status(self):
+        """Test updating work status."""
+        with tempfile.TemporaryDirectory() as tmpdir:
+            repo = Repository(Path(tmpdir))
+
+            test_file = Path(tmpdir) / "test.txt"
+            test_file.write_text("content")
+
+            work = repo.create_work(str(test_file))
+            repo.update_work_status(work.work_id, WorkStatus.TRANSLATING)
+
+            retrieved = repo.get_work(work.work_id)
+            assert retrieved.status == WorkStatus.TRANSLATING
+
+    def test_save_and_get_chapter(self):
+        """Test saving and getting chapters."""
+        with tempfile.TemporaryDirectory() as tmpdir:
+            repo = Repository(Path(tmpdir))
+
+            test_file = Path(tmpdir) / "test.txt"
+            test_file.write_text("content")
+
+            work = repo.create_work(str(test_file))
+
+            chapter = ChapterItem(
+                work_id=work.work_id,
+                chapter_index=0,
+                title="Chapter 1",
+                content="Content",
+            )
+            repo.save_chapter(work.work_id, chapter)
+
+            retrieved = repo.get_chapter(work.work_id, 0)
+            assert retrieved is not None
+            assert retrieved.title == "Chapter 1"
+
+    def test_get_pending_chapters(self):
+        """Test getting pending chapters."""
+        with tempfile.TemporaryDirectory() as tmpdir:
+            repo = Repository(Path(tmpdir))
+
+            test_file = Path(tmpdir) / "test.txt"
+            test_file.write_text("content")
+
+            work = repo.create_work(str(test_file))
+
+            # Add chapters with different statuses
+            for i in range(3):
+                chapter = ChapterItem(
+                    work_id=work.work_id,
+                    chapter_index=i,
+                    title=f"Chapter {i}",
+                    content="Content",
+                    status=ChapterStatus.COMPLETED if i == 0 else ChapterStatus.PENDING,
+                )
+                repo.save_chapter(work.work_id, chapter)
+
+            pending = repo.get_pending_chapters(work.work_id)
+            assert len(pending) == 2
+
+    def test_record_failure(self):
+        """Test recording a failure."""
+        with tempfile.TemporaryDirectory() as tmpdir:
+            repo = Repository(Path(tmpdir))
+
+            test_file = Path(tmpdir) / "test.txt"
+            test_file.write_text("content")
+
+            work = repo.create_work(str(test_file))
+
+            try:
+                raise ValueError("Test error")
+            except Exception as e:
+                failure = repo.record_failure(work.work_id, 0, e)
+
+            assert failure.error_type == "ValueError"
+
+            failures = repo.get_failures(work.work_id)
+            assert len(failures) == 1
+
+    def test_get_work_stats(self):
+        """Test getting work statistics."""
+        with tempfile.TemporaryDirectory() as tmpdir:
+            repo = Repository(Path(tmpdir))
+
+            test_file = Path(tmpdir) / "test.txt"
+            test_file.write_text("content")
+
+            work = repo.create_work(str(test_file))
+
+            # Add chapters
+            for i in range(5):
+                chapter = ChapterItem(
+                    work_id=work.work_id,
+                    chapter_index=i,
+                    title=f"Chapter {i}",
+                    content="Content",
+                    status=ChapterStatus.COMPLETED if i < 3 else ChapterStatus.PENDING,
+                )
+                repo.save_chapter(work.work_id, chapter)
+
+            stats = repo.get_work_stats(work.work_id)
+            assert stats["total_chapters"] == 5
+            assert stats["completed_chapters"] == 3
+            assert stats["pending_chapters"] == 2
+
+    def test_save_chapter_translation(self):
+        """Test saving chapter translation."""
+        with tempfile.TemporaryDirectory() as tmpdir:
+            repo = Repository(Path(tmpdir))
+
+            test_file = Path(tmpdir) / "test.txt"
+            test_file.write_text("content")
+
+            work = repo.create_work(str(test_file))
+
+            chapter = ChapterItem(
+                work_id=work.work_id,
+                chapter_index=0,
+                title="Chapter 1",
+                content="Original",
+            )
+            repo.save_chapter(work.work_id, chapter)
+
+            repo.save_chapter_translation(work.work_id, 0, "Translated")
+
+            retrieved = repo.get_chapter(work.work_id, 0)
+            assert retrieved.translation == "Translated"
+            assert retrieved.status == ChapterStatus.COMPLETED
+
+
+class TestCrashSafety:
+    """Test crash-safety features."""
+
+    def test_atomic_write_preserves_original_on_error(self):
+        """Test that atomic write preserves original file on error."""
+        with tempfile.TemporaryDirectory() as tmpdir:
+            store = JSONLStore(Path(tmpdir))
+
+            work = WorkItem(
+                work_id="test",
+                file_path="/test.txt",
+                file_size=100,
+                chapter_count=1,
+            )
+            store.save_work_item(work)
+
+            # Get original content
+            path = Path(tmpdir) / "work_items.jsonl"
+            original_content = path.read_text()
+
+            # Simulate crash by creating a temp file and stopping
+            temp_path = path.with_suffix(".tmp")
+            temp_path.write_text("corrupted data")
+
+            # Original should be unchanged
+            assert path.read_text() == original_content
+
+            # Clean up temp file
+            temp_path.unlink()
+
+            # Verify we can still load
+            loaded = list(store.load_work_items())
+            assert len(loaded) == 1
+
+    def test_append_mode_is_crash_safe(self):
+        """Test that append mode is crash-safe at line level."""
+        with tempfile.TemporaryDirectory() as tmpdir:
+            store = JSONLStore(Path(tmpdir))
+
+            # Add initial work
+            work = WorkItem(
+                work_id="work1",
+                file_path="/test1.txt",
+                file_size=100,
+                chapter_count=1,
+            )
+            store.save_work_item(work)
+
+            # Add another work
+            work2 = WorkItem(
+                work_id="work2",
+                file_path="/test2.txt",
+                file_size=200,
+                chapter_count=2,
+            )
+            store.save_work_item(work2)
+
+            # Both should be readable
+            works = list(store.load_work_items())
+            assert len(works) == 2