---
stepsCompleted: [1, 2, 3, 4, 5]
inputDocuments: ['prd.md']
workflowType: 'architecture'
project_name: '223-236-template-6'
user_name: 'User'
date: '2026-03-13'
status: 'complete'
version: '1.0'
---

# Architecture Decision Document

_This document builds collaboratively through step-by-step discovery. Sections are appended as we work through each architectural decision together._

## Project Context Analysis

### Requirements Overview

**Functional Requirements:**

项目包含 52 个功能需求，分为六个核心模块：

1. **指纹模块 (FR1-FR8)**: 章节指纹查重，支持批量检测与人工审核
2. **清洗模块 (FR9-FR16)**: 正则替换规则引擎、格式标准化、HTML/Markdown 处理
3. **术语模块 (FR17-FR24)**: 术语库管理、智能提取、锁定标记 (§Ti§)
4. **翻译模块 (FR25-FR33)**: M2M100 模型推理、GPU/CPU 自适应、批处理优化
5. **上传模块 (FR34-FR40)**: 平台 API 对接、失败重试、CU 扣费
6. **任务调度器 (FR41-FR47)**: 流水线编排、并发控制、断点续传
7. **系统集成 (FR48-FR52)**: 配置管理、日志系统、版本检测

**Non-Functional Requirements:**

| 类别 | 关键要求 | 架构影响 |
|------|----------|----------|
| 性能 | 3000-5000 词/分钟 (RTX 3060) | 需要批处理优化、GPU 内存管理 |
| 可靠性 | Crash-Safe 原子写 | 所有持久化操作需使用 .tmp + fsync + rename 模式 |
| 安全性 | 零数据泄露 | 全流程本地处理，禁止数据外传 |
| 兼容性 | NVIDIA GTX 1650+ (4GB+ VRAM) | 需要优雅的 GPU 降级策略 |
| 许可证 | 零授权费依赖 | 所有依赖必须为标准库或 MIT 协议 |

**Scale & Complexity:**

- Primary domain: Desktop Application + AI Inference
- Complexity level: Medium
- Estimated architectural components: 7 major components

### Technical Constraints & Dependencies

**硬约束:**
- 必须使用 MIT 协议库（排除 GPL 污染）
- 必须 100% 本地处理（无云 API 调用）
- 必须支持 Crash-Safe 原子写

**外部依赖:**
- CTranslate2 (MIT): 模型推理引擎
- facebook/m2m100_418M: 翻译模型
- PyQt6: GUI 框架
- PyTorch (CUDA): GPU 加速

**集成接口:**
- 指纹查重 API: POST /api/fingerprint/check
- 平台上传 API: 章节提交接口
- CU 扣费 API: 按字数计费

### Cross-Cutting Concerns Identified

1. **Crash-Safe 持久化**: 影响所有写操作（进度、清洗结果、翻译结果）
2. **GPU 资源管理**: 翻译模块独占，需协调与其他模块的并发
3. **术语一致性**: 术语锁定机制需跨越清洗→翻译流程传递
4. **进度可见性**: 六个阶段进度需统一展示
5. **错误恢复**: 每个模块的失败处理与断点续传
6. **许可证合规性**: 所有新增依赖需验证许可证类型

## Starter Template Evaluation

### Primary Technology Domain

**Python Desktop Application** (PyQt6 + CTranslate2 GPU Inference)

基于项目需求分析，这是一个本地桌面应用，需要：
- GUI框架：PyQt6
- AI推理引擎：CTranslate2 (GPU加速)
- 系统集成：文件I/O、网络API调用

### Starter Options Considered

由于Python桌面应用领域没有统一的"启动模板"生态系统，我们评估了以下选项：

| 选项 | 优点 | 缺点 | 适用性 |
|------|------|------|--------|
| **从零构建** | 完全控制，无技术债 | 需要手动配置所有工具 | ✅ 推荐 - 特定需求较多 |
| **Python Boilerplate** | 标准结构，包含测试/代码质量 | 针对Web/服务端优化 | ⚠️ 部分适用 |
| **Cookiecutter模板** | 快速启动，最佳实践 | 需要定制化修改 | ⚠️ 部分适用 |

### Selected Approach: 自定义项目结构 (基于2025年最佳实践)

**Rationale for Selection:**

本项目有以下独特约束，标准模板无法满足：
1. **Crash-Safe 原子写机制**：需要在所有持久化点实现
2. **GPU 资源管理**：CTranslate2 需要特定配置
3. **零授权费约束**：需要严格验证所有依赖的许可证
4. **六模块流水线架构**：需要特定的模块划分

**项目初始化命令：**

```bash
# 1. 创建项目目录结构
mkdir -p xling-matrix-assistant/src/xling_matrix/{core,modules,ui,infrastructure}
mkdir -p xling-matrix-assistant/tests/{unit,integration}
mkdir -p xling-matrix-assistant/{data,models,logs,docs}

# 2. 创建虚拟环境
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# 3. 安装核心依赖
pip install PyQt6 ctranslate2 torch numpy requests pyyaml

# 4. 安装开发工具
pip install pytest pytest-qt pytest-cov black ruff mypy
```

**Architectural Decisions Established:**

**Language & Runtime:**
- Python 3.11+ (类型注解支持，性能优化)
- 类型检查：mypy (严格模式)
- 代码格式化：black
- 代码检查：ruff

**项目结构 (src layout):**

```
xling-matrix-assistant/
├── src/
│   └── xling_matrix/
│       ├── __init__.py
│       ├── __main__.py           # 应用入口点
│       ├── core/                 # 核心领域模型
│       │   ├── __init__.py
│       │   ├── models.py         # 数据模型
│       │   ├── state.py          # 状态机
│       │   └── pipeline.py       # 流水线编排
│       ├── modules/              # 六大核心模块
│       │   ├── __init__.py
│       │   ├── fingerprint/      # FR1-FR8
│       │   ├── cleaning/         # FR9-FR16
│       │   ├── terminology/      # FR17-FR24
│       │   ├── translation/      # FR25-FR33
│       │   └── upload/           # FR34-FR40
│       ├── ui/                   # PyQt6 GUI
│       │   ├── __init__.py
│       │   ├── main_window.py
│       │   ├── widgets/          # 自定义控件
│       │   └── dialogs/          # 对话框
│       └── infrastructure/       # 基础设施层
│           ├── __init__.py
│           ├── storage.py        # Crash-Safe 持久化
│           ├── gpu_manager.py    # GPU 资源管理
│           ├── api_client.py     # 外部 API 客户端
│           └── logger.py         # 日志系统
├── tests/
│   ├── unit/
│   └── integration/
├── models/                       # 翻译模型存储
├── data/                         # 用户数据目录
├── logs/                         # 日志目录
├── pyproject.toml                # 项目配置
├── pyproject.toml                # 打包配置
└── README.md
```

**Build Tooling & Packaging:**

```toml
# pyproject.toml
[project]
name = "xling-matrix-assistant"
version = "0.1.0"
requires-python = ">=3.11"
dependencies = [
    "PyQt6>=6.6.0",
    "ctranslate2>=4.0.0",
    "torch>=2.1.0",
    "numpy>=1.24.0",
    "requests>=2.31.0",
    "pyyaml>=6.0.0",
]

[project.optional-dependencies]
dev = ["pytest>=7.4.0", "pytest-qt>=4.2.0", "pytest-cov>=4.1.0", "black>=23.12.0", "ruff>=0.1.0", "mypy>=1.7.0"]

[project.scripts]
xling-matrix = "xling_matrix.__main__:main"

[tool.black]
line-length = 100
target-version = ["py311"]

[tool.ruff]
line-length = 100
select = ["E", "F", "I", "N", "W"]

[tool.mypy]
python_version = "3.11"
strict = true
```

**Testing Framework:**
- pytest (单元测试)
- pytest-qt (PyQt6 测试工具)
- pytest-cov (覆盖率报告)

**Development Experience:**
- 虚拟环境隔离
- 类型检查 (mypy strict)
- 即时重载 (开发模式)
- 调试配置 (VS Code / PyCharm)

**GPU Inference Configuration (CTranslate2):**

```python
# 推荐配置
import ctranslate2

translator = ctranslate2.Translator(
    "models/m2m100_418m_ct2/",
    device="cuda",                    # GPU 加速
    device_index=0,                   # 主 GPU
    compute_type="float16",           # Tensor Core 优化
    inter_threads=4,                  # 并发批处理
)

# 批处理优化
batch_size = 16  # 根据显存调整 (RTX 3060: 16-32)
```

**Note:** 项目初始化应作为第一个实现故事执行。

## Core Architectural Decisions

### Decision Priority Analysis

**Critical Decisions (Block Implementation):**

1. **Crash-Safe 原子写机制**: 采用 .tmp + fsync + rename 模式，所有持久化操作必须遵循
2. **数据文件格式**: 使用 JSON 格式存储进度、清洗结果、翻译结果、术语库
3. **GPU 推理配置**: CTranslate2 + float16 + 批处理优化
4. **六模块流水线架构**: Fingerprint → Cleaning → Terminology → Translation → Upload

**Important Decisions (Shape Architecture):**

1. **PyQt6 ModelView 架构**: 使用 Qt Model/View 分离，实现数据驱动UI更新
2. **Repository 模式**: 抽象数据持久化层，统一 Crash-Safe 机制
3. **Observer 模式**: 进度事件通知机制，解耦业务逻辑与UI
4. **打包策略**: PyInstaller 打包为可执行文件

**Deferred Decisions (Post-MVP):**

1. **自动更新机制**: Growth 阶段功能，使用第三方库 (如 PyUpdater)
2. **插件系统**: Vision 阶段功能，允许扩展自定义模块
3. **云同步**: Vision 阶段功能，可选的云端备份

### Data Architecture

**数据存储策略:**

| 数据文件 | 格式 | 访问模式 | Crash-Safe 实现 |
|---------|------|-----------|----------------|
| progress.json | JSON | 读写频繁 | 原子替换 + 锁机制 |
| novel_cleaned.json | JSON | 写入一次 | 原子写入 |
| terms_temp.json | JSON | 读写频繁 | 原子替换 + 锁机制 |
| novel_translated.json | JSON | 写入一次 | 原子写入 |
| upload_failed.jsonl | JSONL | 追加写入 | 原子追加 + 锁机制 |
| terms_library.json | JSON | 读写频繁 | 原子替换 + 锁机制 |

**数据验证策略:**

- **Pydantic 模型**: 定义数据模型的类型约束
- **运行时验证**: 所有外部输入必须经过验证
- **Schema 迁移**: 版本化数据格式，支持自动升级

**Crash-Safe 持久化实现:**

```python
# infrastructure/storage.py
import os
import fcntl

class AtomicWriter:
    """Crash-Safe 原子写工具"""

    @staticmethod
    def write(filepath: str, data: dict | str) -> None:
        tmp_path = f"{filepath}.tmp"

        # 写入临时文件
        with open(tmp_path, 'w', encoding='utf-8') as f:
            if isinstance(data, dict):
                json.dump(data, f, ensure_ascii=False, indent=2)
            else:
                f.write(data)
            f.flush()  # 强制写入磁盘
            os.fsync(f.fileno())  # 强制同步

        # 原子重命名
        os.replace(tmp_path, filepath)
```

### Authentication & Security

**不适用**: 本地桌面应用，无需认证/授权机制

**数据安全:**
- 所有数据 100% 本地存储
- 禁止任何网络数据上传（除平台API上传外）
- GPU 模型本地推理，无云端API调用

### API & Communication Patterns

**外部 API 集成:**

```python
# infrastructure/api_client.py
import requests
from typing import Dict, Optional

class PlatformAPIClient:
    """平台 API 客户端"""

    def __init__(self, base_url: str, api_key: str):
        self.base_url = base_url
        self.api_key = api_key
        self.timeout = 30  # 30秒超时

    def check_fingerprint(self, text: str) -> Dict:
        """指纹查重 API"""
        response = requests.post(
            f"{self.base_url}/api/fingerprint/check",
            json={"text": text},
            headers={"Authorization": f"Bearer {self.api_key}"},
            timeout=self.timeout
        )
        response.raise_for_status()
        return response.json()

    def upload_chapter(self, chapter_data: Dict) -> Dict:
        """章节上传 API"""
        response = requests.post(
            f"{self.base_url}/api/chapters",
            json=chapter_data,
            headers={"Authorization": f"Bearer {self.api_key}"},
            timeout=self.timeout
        )
        response.raise_for_status()
        return response.json()

    def deduct_cu(self, word_count: int) -> Dict:
        """CU 扣费 API"""
        response = requests.post(
            f"{self.base_url}/api/cu/deduct",
            json={"words": word_count},
            headers={"Authorization": f"Bearer {self.api_key}"},
            timeout=self.timeout
        )
        response.raise_for_status()
        return response.json()
```

**重试策略:**
- 指数退避重试
- 最大重试次数：3次
- 超时配置：30秒

### Frontend Architecture

**PyQt6 ModelView 架构:**

```python
# ui/models/task_model.py
from PyQt6.QtCore import QAbstractTableModel, Qt

class TaskModel(QAbstractTableModel):
    """任务数据模型"""

    def __init__(self):
        super().__init__()
        self._tasks = []

    def rowCount(self, parent=None):
        return len(self._tasks)

    def columnCount(self, parent=None):
        return 5  # work_id, status, progress, start_time, end_time

    def data(self, index, role=Qt.ItemDataRole.DisplayRole):
        if not index.isValid() or role != Qt.ItemDataRole.DisplayRole:
            return None
        return self._tasks[index.row()][index.column()]

    def update_task(self, work_id: str, status: str, progress: int):
        """更新任务状态"""
        row = self._find_row(work_id)
        if row is not None:
            self._tasks[row]['status'] = status
            self._tasks[row]['progress'] = progress
            self.dataChanged.emit(self.index(row, 0), self.index(row, 4))

# ui/main_window.py
from PyQt6.QtWidgets import QMainWindow, QTableView
from ui.models.task_model import TaskModel

class MainWindow(QMainWindow):
    def __init__(self):
        super().__init__()
        self.task_model = TaskModel()

        self.task_table = QTableView()
        self.task_table.setModel(self.task_model)
```

**进度通知机制 (Observer 模式):**

```python
# core/events.py
from PyQt6.QtCore import QObject, pyqtSignal

class ProgressEmitter(QObject):
    """进度事件发射器"""

    stage_progress = pyqtSignal(str, int)  # (work_id, percentage)
    stage_completed = pyqtSignal(str, str)    # (work_id, stage_name)
    stage_failed = pyqtSignal(str, str, str)  # (work_id, stage_name, error)
    task_finished = pyqtSignal(str)           # (work_id)

# modules/cleaning/cleaner.py
from core.events import ProgressEmitter

class TextCleaner:
    def __init__(self, emitter: ProgressEmitter):
        self.emitter = emitter

    def clean(self, text: str, work_id: str) -> str:
        # 执行清洗
        cleaned = self._apply_rules(text)

        # 发送进度通知
        self.emitter.stage_progress.emit(work_id, 100)
        self.emitter.stage_completed.emit(work_id, "cleaning")

        return cleaned
```

### Infrastructure & Deployment

**打包策略:**

```python
# pyproject.toml
[build-system]
requires = ["setuptools>=68.0", "wheel", "pyinstaller>=6.0"]
build-backend = "setuptools.build_meta"

[tool.pyinstaller]
name = "序灵Matrix助手"
console = true
onefile = true
icon = "assets/icon.ico"
add-data = [
    ("models/*", "models/"),
    ("assets/*", "assets/")
]
hiddenimports = [
    "PyQt6.sip",
    "ctranslate2"
]
```

**环境配置:**

| 配置项 | 位置 | 说明 |
|--------|------|------|
| 配置文件 | `~/.config/xling-matrix/config.yaml` | API密钥、GPU设置 |
| 数据目录 | `~/Documents/xling-matrix/` | 输入/输出文件 |
| 日志目录 | `~/Documents/xling-matrix/logs/` | 运行日志 |
| 模型目录 | `~/.local/share/xling-matrix/models/` | 翻译模型 |

**日志系统:**

```python
# infrastructure/logger.py
import logging
from pathlib import Path

def setup_logger(name: str, log_dir: Path) -> logging.Logger:
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)

    # 文件处理器
    file_handler = logging.FileHandler(
        log_dir / f"{name}.log",
        encoding='utf-8'
    )
    file_handler.setFormatter(
        logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
    )
    logger.addHandler(file_handler)

    # 控制台处理器
    console_handler = logging.StreamHandler()
    console_handler.setFormatter(
        logging.Formatter('%(levelname)s: %(message)s')
    )
    logger.addHandler(console_handler)

    return logger
```

### Decision Impact Analysis

**Implementation Sequence:**

1. Crash-Safe 持久化层 → 所有模块的基础
2. PyQt6 ModelView 架构 → UI 层的基础
3. 六个核心模块 → 业务逻辑实现
4. GPU 推理优化 → 性能优化
5. API 集成与上传 → 外部对接

**Cross-Component Dependencies:**

```
                    ┌─────────────┐
                    │   GUI UI   │
                    └──────┬──────┘
                           │ Observer
                    ┌──────▼──────┐
                    │  Scheduler   │
                    └──────┬──────┘
                           │
        ┌──────────────────┼──────────────────┐
        │                  │                  │
   ┌────▼────┐     ┌────▼────┐     ┌────▼────┐
   │Fingerprint│    │Cleaning  │    │Translation│
   └────┬────┘     └────┬────┘     └────┬────┘
        │                  │                  │
        └────────┬────────┴──────────────────┘
                 │
         ┌───────▼────────┐
         │  Storage Layer  │
         │ (Crash-Safe)  │
         └─────────────────┘
```

## Implementation Patterns & Consistency Rules

### Pattern Categories Defined

**Critical Conflict Points Identified:**
8 个领域需要一致性规则以确保 AI 代理代码兼容

### Core Design Patterns

**1. Pipeline 模式（翻译流水线）**

所有翻译任务必须通过统一的 Pipeline 执行：

```python
# core/pipeline.py
from dataclasses import dataclass
from typing import Protocol

@dataclass
class PipelineContext:
    """流水线上下文"""
    work_id: str
    input_file: str
    output_dir: str
    current_stage: str
    error: str | None = None
    metadata: dict = None

class PipelineStage(Protocol):
    """流水线阶段协议"""

    def name(self) -> str:
        """返回阶段名称"""
        ...

    def execute(self, context: PipelineContext) -> PipelineContext:
        """执行阶段逻辑"""
        ...

class TranslationPipeline:
    """翻译流水线"""

    def __init__(self):
        self.stages: list[PipelineStage] = []

    def add_stage(self, stage: PipelineStage) -> None:
        self.stages.append(stage)

    def execute(self, context: PipelineContext) -> PipelineContext:
        for stage in self.stages:
            context.current_stage = stage.name()
            try:
                context = stage.execute(context)
                if context.error:
                    context.error = f"{stage.name()}: {context.error}"
                    return context
            except Exception as e:
                context.error = f"{stage.name()}: {str(e)}"
                return context
        return context
```

**2. State Machine（任务状态）**

任务状态转换必须遵循状态机规则：

```python
# core/state.py
from enum import Enum
from dataclasses import dataclass

class TaskState(Enum):
    """任务状态枚举"""
    PENDING = "pending"
    RUNNING = "running"
    PAUSED = "paused"
    SUCCESS = "success"
    FAILED = "failed"

@dataclass
class TaskTransition:
    """状态转换"""
    from_state: TaskState
    to_state: TaskState
    is_valid: bool
    error: str | None = None

class TaskStateMachine:
    """任务状态机"""

    # 允许的状态转换
    VALID_TRANSITIONS = {
        TaskState.PENDING: [TaskState.RUNNING],
        TaskState.RUNNING: [TaskState.PAUSED, TaskState.SUCCESS, TaskState.FAILED],
        TaskState.PAUSED: [TaskState.RUNNING, TaskState.FAILED],
        TaskState.SUCCESS: [],  # 终态
        TaskState.FAILED: [TaskState.PENDING],  # 可重试
    }

    def can_transition(self, from_state: TaskState, to_state: TaskState) -> bool:
        return to_state in self.VALID_TRANSITIONS.get(from_state, [])

    def transition(self, current: TaskState, target: TaskState) -> TaskTransition:
        if not self.can_transition(current, target):
            valid_targets = ", ".join(s.value for s in self.VALID_TRANSITIONS.get(current, []))
            return TaskTransition(current, target, False,
                f"Invalid transition: {current.value} -> {target.value}. Valid targets: {valid_targets}")
        return TaskTransition(current, target, True)
```

**3. Repository 模式（数据持久化）**

所有数据访问必须通过 Repository 接口：

```python
# core/repository.py
from abc import ABC, abstractmethod
from typing import TypeVar, Generic

T = TypeVar('T')

class Repository(ABC, Generic[T]):
    """Repository 接口"""

    @abstractmethod
    def save(self, entity: T) -> None:
        """保存实体"""
        pass

    @abstractmethod
    def load(self, id: str) -> T | None:
        """加载实体"""
        pass

# infrastructure/repositories/progress_repository.py
from core.repository import Repository
from infrastructure.storage import AtomicWriter

class CrashSafeProgressRepository(Repository[Progress]):
    """Crash-Safe 进度仓储"""

    def __init__(self, file_path: str):
        self.file_path = file_path

    def save(self, progress: Progress) -> None:
        AtomicWriter.write(self.file_path, progress.to_dict())

    def load(self, work_id: str) -> Progress | None:
        if not os.path.exists(self.file_path):
            return None
        with open(self.file_path, 'r', encoding='utf-8') as f:
            data = json.load(f)
            return Progress.from_dict(data.get(work_id))
```

**4. Observer 模式（进度通知）**

使用 PyQt6 信号槽机制实现进度通知：

```python
# core/events.py
from PyQt6.QtCore import QObject, pyqtSignal
from typing import Protocol

class ProgressObserver(Protocol):
    """进度观察者协议"""

    def on_stage_start(self, work_id: str, stage: str) -> None:
        """阶段开始"""
        ...

    def on_stage_progress(self, work_id: str, stage: str, percent: int) -> None:
        """阶段进度"""
        ...

    def on_stage_complete(self, work_id: str, stage: str) -> None:
        """阶段完成"""
        ...

    def on_stage_error(self, work_id: str, stage: str, error: str) -> None:
        """阶段错误"""
        ...

class ProgressEmitter(QObject):
    """进度事件发射器"""

    # 定义信号
    stage_started = pyqtSignal(str, str)      # (work_id, stage)
    stage_progress = pyqtSignal(str, str, int)  # (work_id, stage, percent)
    stage_completed = pyqtSignal(str, str)    # (work_id, stage)
    stage_failed = pyqtSignal(str, str, str)  # (work_id, stage, error)
    task_finished = pyqtSignal(str, str)      # (work_id, final_state)

# 使用示例
class TranslationStage:
    def __init__(self, emitter: ProgressEmitter):
        self.emitter = emitter

    def execute(self, context: PipelineContext) -> PipelineContext:
        self.emitter.stage_started.emit(context.work_id, "translation")

        try:
            for i, batch in enumerate(batches):
                # 执行翻译
                self._translate_batch(batch)
                progress = int((i + 1) / len(batches) * 100)
                self.emitter.stage_progress.emit(context.work_id, "translation", progress)

            self.emitter.stage_completed.emit(context.work_id, "translation")
            return context
        except Exception as e:
            self.emitter.stage_failed.emit(context.work_id, "translation", str(e))
            context.error = str(e)
            return context
```

### Naming Patterns

**代码命名约定:**

| 类别 | 约定 | 示例 |
|------|------|------|
| 类名 | PascalCase | `TranslationPipeline`, `TaskStateMachine` |
| 函数名 | snake_case | `execute_pipeline()`, `load_progress()` |
| 变量名 | snake_case | `work_id`, `batch_size` |
| 常量 | UPPER_SNAKE_CASE | `MAX_BATCH_SIZE`, `DEFAULT_TIMEOUT` |
| 私有成员 | 前缀下划线 | `_internal_state`, `_helper()` |
| 协议/接口 | PascalCase + Protocol 后缀 | `ProgressObserver`, `Repository` |

**文件命名约定:**

| 类型 | 命名 | 示例 |
|------|------|------|
| 模块文件 | snake_case.py | `translation_stage.py`, `progress_repository.py` |
| 测试文件 | test_<module>.py | `test_pipeline.py`, `test_translation.py` |
| 包目录 | snake_case | `translation/`, `cleaning/` |

### Structure Patterns

**项目组织原则:**

```
src/xling_matrix/
├── core/              # 核心领域模型（无依赖）
│   ├── models.py      # 数据模型
│   ├── state.py       # 状态机
│   ├── pipeline.py    # 流水线
│   ├── events.py      # 事件系统
│   └── repository.py  # Repository 接口
│
├── modules/           # 业务模块（依赖 core）
│   └── <module>/
│       ├── __init__.py
│       ├── <module>_stage.py    # 阶段实现
│       ├── <module>_service.py  # 服务逻辑
│       └── models.py            # 模块特定模型
│
├── ui/                # UI 层（依赖 core）
│   ├── main_window.py
│   ├── widgets/
│   └── dialogs/
│
└── infrastructure/    # 基础设施（可依赖任何层）
    ├── storage/
    ├── gpu/
    ├── network/
    └── logging/
```

**测试组织原则:**

```
tests/
├── unit/              # 单元测试
│   ├── test_core/
│   │   ├── test_pipeline.py
│   │   ├── test_state.py
│   │   └── test_events.py
│   └── test_modules/
│       ├── test_translation.py
│       └── test_cleaning.py
│
├── integration/       # 集成测试
│   ├── test_workflow_integration.py
│   └── test_api_integration.py
│
└── fixtures/          # 测试数据
    ├── sample_novels/
    └── expected_outputs/
```

### Format Patterns

**数据文件格式:**

所有 JSON 文件必须遵循以下格式：

```python
# 通用 JSON 结构
{
    "version": "1.0",           # 数据版本
    "work_id": "uuid",          # 工作ID
    "timestamp": "ISO-8601",    # 时间戳
    "data": { ... }             # 实际数据
}
```

**进度文件格式 (progress.json):**

```json
{
    "version": "1.0",
    "work_id": "abc123",
    "state": "running",
    "current_stage": "translation",
    "stages": {
        "fingerprint": {"status": "success", "progress": 100},
        "cleaning": {"status": "success", "progress": 100},
        "terminology": {"status": "success", "progress": 100},
        "translation": {"status": "running", "progress": 45},
        "upload": {"status": "pending", "progress": 0}
    },
    "created_at": "2026-03-13T12:00:00Z",
    "updated_at": "2026-03-13T12:30:00Z"
}
```

**错误响应格式:**

```python
# 统一错误格式
@dataclass
class ErrorInfo:
    code: str           # 错误代码 (如 "STAGE_FAILED", "GPU_OOM")
    message: str        # 用户友好的错误消息
    detail: str | None  # 详细错误信息（日志级别）
    stage: str | None   # 失败的阶段

# 错误代码规范
class ErrorCode:
    # 通用错误
    UNKNOWN_ERROR = "UNKNOWN_ERROR"
    INVALID_INPUT = "INVALID_INPUT"
    FILE_NOT_FOUND = "FILE_NOT_FOUND"

    # 阶段错误
    FINGERPRINT_FAILED = "FINGERPRINT_FAILED"
    CLEANING_FAILED = "CLEANING_FAILED"
    TERMINOLOGY_FAILED = "TERMINOLOGY_FAILED"
    TRANSLATION_FAILED = "TRANSLATION_FAILED"
    UPLOAD_FAILED = "UPLOAD_FAILED"

    # GPU 错误
    GPU_NOT_AVAILABLE = "GPU_NOT_AVAILABLE"
    GPU_OOM = "GPU_OOM"

    # 网络错误
    API_CONNECTION_FAILED = "API_CONNECTION_FAILED"
    API_TIMEOUT = "API_TIMEOUT"
```

### Communication Patterns

**事件命名约定:**

```python
# 事件命名格式: <subject>_<action>
class Events:
    # 阶段事件
    STAGE_STARTED = "stage.started"
    STAGE_PROGRESS = "stage.progress"
    STAGE_COMPLETED = "stage.completed"
    STAGE_FAILED = "stage.failed"

    # 任务事件
    TASK_CREATED = "task.created"
    TASK_STARTED = "task.started"
    TASK_PAUSED = "task.paused"
    TASK_RESUMED = "task.resumed"
    TASK_FINISHED = "task.finished"
```

**日志级别使用:**

| 级别 | 用途 | 示例 |
|------|------|------|
| DEBUG | 详细调试信息 | `"Batch size: 16, GPU memory: 3.2GB"` |
| INFO | 正常操作流程 | `"Stage 'translation' started for work_id: abc123"` |
| WARNING | 可恢复的问题 | `"GPU memory low, reducing batch size to 8"` |
| ERROR | 操作失败但可恢复 | `"API request failed, retrying (1/3)"` |
| CRITICAL | 严重错误需人工介入 | `"GPU OOM, cannot continue"` |

### Process Patterns

**Crash-Safe 写入模式（强制执行）:**

```python
# 所有持久化操作必须使用此模式
from infrastructure.storage import AtomicWriter

# 正确示例
def save_progress(progress: Progress) -> None:
    AtomicWriter.write("progress.json", progress.to_dict())

# 错误示例 - 禁止直接写入
def save_progress_WRONG(progress: Progress) -> None:
    with open("progress.json", "w") as f:
        json.dump(progress.to_dict(), f)  # ❌ 非 Crash-Safe
```

**错误处理模式:**

```python
# 统一错误处理流程
def execute_stage(context: PipelineContext) -> PipelineContext:
    try:
        # 业务逻辑
        result = do_work(context)
        return context
    except GPUOutOfMemoryError as e:
        # 特定错误处理
        return handle_gpu_oom(context, e)
    except APIError as e:
        # 重试逻辑
        return retry_with_backoff(context, e)
    except Exception as e:
        # 通用错误处理
        context.error = str(e)
        logger.error(f"Stage failed: {e}", exc_info=True)
        return context
```

**GPU 资源管理模式:**

```python
# infrastructure/gpu/manager.py
import ctranslate2
from typing import ContextManager

class GPUManager:
    """GPU 资源管理器"""

    _instance = None
    _translator = None

    @classmethod
    def get_instance(cls) -> 'GPUManager':
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def initialize(self, model_path: str) -> None:
        """初始化 GPU 翻译器"""
        if self._translator is None:
            self._translator = ctranslate2.Translator(
                model_path,
                device=self._detect_device(),
                device_index=0,
                compute_type="float16",
                inter_threads=4
            )

    def _detect_device(self) -> str:
        """检测可用设备"""
        try:
            import torch
            if torch.cuda.is_available():
                return "cuda"
        except:
            pass
        return "cpu"  # 降级到 CPU

    def translate_batch(self, tokens: list[list[str]]) -> list[list[str]]:
        """执行批处理翻译"""
        return self._translator.translate_batch(tokens)
```

### Enforcement Guidelines

**All AI Agents MUST:**

1. **使用 Crash-Safe 写入**: 所有持久化操作必须通过 `AtomicWriter`
2. **遵循状态机规则**: 状态转换必须通过 `TaskStateMachine` 验证
3. **使用 Repository 接口**: 数据访问必须实现 `Repository` 协议
4. **通过信号通知进度**: 使用 `ProgressEmitter` 发送进度事件
5. **遵循命名约定**: 代码命名必须符合定义的约定
6. **返回统一错误格式**: 所有错误必须返回 `ErrorInfo` 结构

## Complete Project Structure

### Directory Layout

```
xling-matrix-assistant/
├── src/
│   └── xling_matrix/
│       ├── __init__.py
│       ├── __main__.py
│       │
│       ├── core/                          # 核心领域层
│       │   ├── __init__.py
│       │   ├── models.py                  # 数据模型定义
│       │   ├── state.py                   # 状态机实现
│       │   ├── pipeline.py                # 流水线编排
│       │   ├── events.py                  # 事件系统
│       │   └── repository.py              # Repository 接口
│       │
│       ├── modules/                       # 业务模块层
│       │   ├── __init__.py
│       │   │
│       │   ├── fingerprint/               # 指纹模块 (FR1-FR8)
│       │   │   ├── __init__.py
│       │   │   ├── fingerprint_stage.py   # 指纹查重阶段
│       │   │   ├── fingerprint_service.py # 指纹服务
│       │   │   └── models.py              # 指纹数据模型
│       │   │
│       │   ├── cleaning/                  # 清洗模块 (FR9-FR16)
│       │   │   ├── __init__.py
│       │   │   ├── cleaning_stage.py
│       │   │   ├── rule_engine.py         # 正则替换引擎
│       │   │   ├── formatter.py           # 格式标准化
│       │   │   └── models.py
│       │   │
│       │   ├── terminology/               # 术语模块 (FR17-FR24)
│       │   │   ├── __init__.py
│       │   │   ├── terminology_stage.py
│       │   │   ├── extractor.py           # 术语提取器
│       │   │   ├── library.py             # 术语库管理
│       │   │   └── models.py
│       │   │
│       │   ├── translation/               # 翻译模块 (FR25-FR33)
│       │   │   ├── __init__.py
│       │   │   ├── translation_stage.py
│       │   │   ├── translator.py          # CTranslate2 封装
│       │   │   ├── batch_processor.py     # 批处理优化
│       │   │   └── models.py
│       │   │
│       │   └── upload/                    # 上传模块 (FR34-FR40)
│       │       ├── __init__.py
│       │       ├── upload_stage.py
│       │       ├── uploader.py            # 平台上传
│       │       └── models.py
│       │
│       ├── ui/                            # 表示层
│       │   ├── __init__.py
│       │   ├── main_window.py             # 主窗口
│       │   ├── widgets/
│       │   │   ├── __init__.py
│       │   │   ├── task_list_widget.py    # 任务列表
│       │   │   ├── progress_widget.py     # 进度显示
│       │   │   └── log_widget.py          # 日志显示
│       │   ├── dialogs/
│       │   │   ├── __init__.py
│       │   │   ├── new_task_dialog.py     # 新建任务对话框
│       │   │   ├── settings_dialog.py     # 设置对话框
│       │   │   └── fingerprint_dialog.py  # 指纹审核对话框
│       │   └── models/
│       │       ├── __init__.py
│       │       └── task_model.py          # 任务数据模型
│       │
│       └── infrastructure/                # 基础设施层
│           ├── __init__.py
│           ├── storage/
│           │   ├── __init__.py
│           │   ├── atomic_writer.py       # Crash-Safe 写入
│           │   └── file_lock.py           # 文件锁机制
│           ├── gpu/
│           │   ├── __init__.py
│           │   └── manager.py             # GPU 资源管理
│           ├── network/
│           │   ├── __init__.py
│           │   ├── api_client.py          # 平台 API 客户端
│           │   └── retry.py               # 重试策略
│           └── logging/
│               ├── __init__.py
│               └── logger.py              # 日志配置
│
├── tests/
│   ├── __init__.py
│   ├── conftest.py                        # pytest 配置
│   │
│   ├── unit/
│   │   ├── test_core/
│   │   │   ├── __init__.py
│   │   │   ├── test_pipeline.py
│   │   │   ├── test_state.py
│   │   │   └── test_events.py
│   │   ├── test_modules/
│   │   │   ├── test_fingerprint.py
│   │   │   ├── test_cleaning.py
│   │   │   ├── test_terminology.py
│   │   │   ├── test_translation.py
│   │   │   └── test_upload.py
│   │   └── test_infrastructure/
│   │       ├── test_storage.py
│   │       ├── test_gpu_manager.py
│   │       └── test_api_client.py
│   │
│   ├── integration/
│   │   ├── __init__.py
│   │   ├── test_workflow.py               # 完整工作流测试
│   │   └── test_api_integration.py
│   │
│   └── fixtures/
│       ├── novels/
│       │   └── sample_chinese.txt
│       └── expected/
│           └── sample_translated.json
│
├── models/                                # 翻译模型文件
│   └── m2m100_418m_ct2/
│
├── assets/                                # 资源文件
│   ├── icons/
│   │   └── app_icon.ico
│   └── config/
│       └── default_config.yaml
│
├── docs/
│   ├── architecture.md                    # 架构文档
│   ├── api.md                             # API 文档
│   └── user_guide.md                      # 用户指南
│
├── pyproject.toml                         # 项目配置
├── README.md
├── LICENSE
└── .gitignore
```

### Module Dependencies

```
┌─────────────────────────────────────────────────────────────┐
│                        UI Layer                             │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐      │
│  │ MainWindow   │  │   Widgets    │  │   Dialogs    │      │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘      │
└─────────┼──────────────────┼──────────────────┼─────────────┘
          │                  │                  │
          ▼                  ▼                  ▼
┌─────────────────────────────────────────────────────────────┐
│                     Application Layer                        │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐      │
│  │  Scheduler   │  │  Workflows   │  │ State Machine│      │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘      │
└─────────┼──────────────────┼──────────────────┼─────────────┘
          │                  │                  │
          ▼                  ▼                  ▼
┌─────────────────────────────────────────────────────────────┐
│                      Domain Layer                            │
│  ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌─────────┐  │
│  │Fingerprint │ │  Cleaning  │ │Terminology │ │Translation│ │
│  └────┬───────┘ └────┬───────┘ └────┬───────┘ └────┬────┘  │
│       └────────────────┴────────────────┴────────┘         │
│                          │                                  │
│                    ┌─────▼─────┐                            │
│                    │   Core    │                            │
│                    │(Pipeline, │                            │
│                    │ Events,   │                            │
│                    │ Models)   │                            │
│                    └───────────┘                            │
└─────────────────────────────────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────┐
│                  Infrastructure Layer                        │
│  ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌─────────┐  │
│  │  Storage   │ │     GPU    │ │  Network   │ │ Logging │  │
│  │(Crash-Safe)│ │  Manager   │ │  API Client│ │         │  │
│  └────────────┘ └────────────┘ └────────────┘ └─────────┘  │
└─────────────────────────────────────────────────────────────┘
```

## API Interface Design

### Internal Module APIs

**Pipeline Stage 接口:**

```python
# core/pipeline.py
class PipelineStage(Protocol):
    """流水线阶段协议 - 所有阶段必须实现"""

    def name(self) -> str:
        """返回阶段唯一标识符"""
        ...

    def execute(self, context: PipelineContext) -> PipelineContext:
        """执行阶段逻辑

        Args:
            context: 流水线上下文

        Returns:
            更新后的上下文。如果失败，设置 context.error
        """
        ...

    def estimate_progress(self, context: PipelineContext) -> int:
        """估算当前进度百分比"""
        ...
```

**Repository 接口:**

```python
# core/repository.py
class Repository(ABC, Generic[T]):
    """数据仓储接口 - 所有数据访问必须实现"""

    @abstractmethod
    def save(self, entity: T) -> None:
        """保存实体（使用 Crash-Safe 写入）"""
        pass

    @abstractmethod
    def load(self, id: str) -> T | None:
        """加载实体"""
        pass

    @abstractmethod
    def delete(self, id: str) -> bool:
        """删除实体"""
        pass
```

### External API Integration

**平台 API 客户端接口:**

```python
# infrastructure/network/platform_api.py
class PlatformAPIClient:
    """平台 API 客户端 - 对接外部平台"""

    BASE_URL: str
    API_KEY: str
    TIMEOUT: int = 30

    # 指纹查重 API
    def check_fingerprint(self, text: str) -> FingerprintResult:
        """检查文本指纹

        Args:
            text: 待检查的文本

        Returns:
            FingerprintResult: 包含相似度、匹配章节等

        Raises:
            APIConnectionError: 网络连接失败
            APITimeoutError: 请求超时
            APIError: API 返回错误
        """
        ...

    # 章节上传 API
    def upload_chapter(self, chapter: ChapterData) -> UploadResult:
        """上传翻译章节

        Args:
            chapter: 章节数据（标题、内容、字数等）

        Returns:
            UploadResult: 包含章节 ID、URL 等

        Raises:
            APIConnectionError, APITimeoutError, APIError
        """
        ...

    # CU 扣费 API
    def deduct_cu(self, word_count: int) -> DeductResult:
        """扣除 CU

        Args:
            word_count: 字数

        Returns:
            DeductResult: 包含剩余 CU
        """
        ...

    # 健康检查
    def health_check(self) -> bool:
        """检查 API 连接状态"""
        ...
```

**API 请求/响应格式:**

**1. 指纹查重 API**

```http
POST /api/fingerprint/check
Content-Type: application/json
Authorization: Bearer {api_key}

# Request
{
  "fingerprint": "md5hash",
  "sample": "第一章样本文本...",
  "work_id": "uuid"
}

# Response
{
  "exists": false,
  "work_id": "uuid",
  "similarity": 0.0,
  "matches": []
}
```

**2. 上传章节 API**

```http
POST /api/chapters
Content-Type: application/json
Authorization: Bearer {api_key}

# Request
{
  "work_id": "uuid",
  "chapter_id": "Chapter 0001",
  "title": "第一章 开始",
  "content_en": "Chapter 1 The Beginning...",
  "word_count": 1234,
  "source_language": "zh",
  "target_language": "en"
}

# Response
{
  "success": true,
  "chapter_id": "Chapter 0001",
  "chapter_url": "https://platform.com/novels/uuid/chapters/Chapter%200001",
  "uploaded_at": "2026-03-13T12:00:00Z"
}
```

**3. CU 扣费 API**

```http
POST /api/cu/deduct
Content-Type: application/json
Authorization: Bearer {api_key}

# Request
{
  "work_id": "uuid",
  "words": 1234,
  "chapter_id": "Chapter 0001"
}

# Response
{
  "success": true,
  "deducted": 12.34,
  "balance": 987.66,
  "transaction_id": "txn_abc123"
}
```

**错误响应格式:**

```http
# Error Response
{
  "error": {
    "code": "INVALID_API_KEY",
    "message": "API密钥无效或已过期",
    "detail": "请联系客服获取新的API密钥"
  }
}
```

## Data Model Design

### Core Data Models

```python
# core/models.py
from dataclasses import dataclass, field
from datetime import datetime
from typing import Literal
from enum import Enum

class TaskState(Enum):
    PENDING = "pending"
    RUNNING = "running"
    PAUSED = "paused"
    SUCCESS = "success"
    FAILED = "failed"

class StageStatus(Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"
    SKIPPED = "skipped"

@dataclass
class StageProgress:
    """阶段进度"""
    status: StageStatus
    progress: int  # 0-100
    error: str | None = None
    started_at: datetime | None = None
    completed_at: datetime | None = None

@dataclass
class Progress:
    """任务进度"""
    work_id: str
    state: TaskState
    current_stage: str
    stages: dict[str, StageProgress] = field(default_factory=dict)
    input_file: str = ""
    output_dir: str = ""
    created_at: datetime = field(default_factory=datetime.now)
    updated_at: datetime = field(default_factory=datetime.now)

    def to_dict(self) -> dict:
        """序列化为字典"""
        return {
            "version": "1.0",
            "work_id": self.work_id,
            "state": self.state.value,
            "current_stage": self.current_stage,
            "stages": {
                name: {
                    "status": stage.status.value,
                    "progress": stage.progress,
                    "error": stage.error,
                    "started_at": stage.started_at.isoformat() if stage.started_at else None,
                    "completed_at": stage.completed_at.isoformat() if stage.completed_at else None,
                }
                for name, stage in self.stages.items()
            },
            "input_file": self.input_file,
            "output_dir": self.output_dir,
            "created_at": self.created_at.isoformat(),
            "updated_at": self.updated_at.isoformat(),
        }

    @classmethod
    def from_dict(cls, data: dict) -> 'Progress':
        """从字典反序列化"""
        stages = {
            name: StageProgress(
                status=StageStatus(stage["status"]),
                progress=stage["progress"],
                error=stage.get("error"),
                started_at=datetime.fromisoformat(stage["started_at"]) if stage.get("started_at") else None,
                completed_at=datetime.fromisoformat(stage["completed_at"]) if stage.get("completed_at") else None,
            )
            for name, stage in data.get("stages", {}).items()
        }
        return cls(
            work_id=data["work_id"],
            state=TaskState(data["state"]),
            current_stage=data["current_stage"],
            stages=stages,
            input_file=data.get("input_file", ""),
            output_dir=data.get("output_dir", ""),
            created_at=datetime.fromisoformat(data["created_at"]),
            updated_at=datetime.fromisoformat(data["updated_at"]),
        )

@dataclass
class Term:
    """术语条目"""
    source: str           # 原文
    target: str           # 译文
    category: str = ""    # 分类
    locked: bool = False  # 是否锁定

@dataclass
class TerminologyLibrary:
    """术语库"""
    version: str = "1.0"
    terms: list[Term] = field(default_factory=list)

@dataclass
class ChapterData:
    """章节数据"""
    title: str
    content: str
    word_count: int
    source_language: str = "zh"
    target_language: str = "en"

@dataclass
class Chapter:
    """章节实体"""
    chapter_id: str        # "Chapter 0001"
    part_index: int        # 卷索引
    title_src: str         # 原文标题
    content: str           # 原文内容
    content_en: str | None = None  # 译文内容
    word_count: int = 0
    translated_at: datetime | None = None

@dataclass
class Term:
    """术语条目"""
    source: str              # 原文
    translation: str | None  # 译文
    count: int = 0           # 出现次数
    chapters: int = 0        # 涉及章节数
    locked: bool = False     # 是否锁定

    def to_dict(self) -> dict:
        return {
            "source": self.source,
            "translation": self.translation,
            "count": self.count,
            "chapters": self.chapters,
            "locked": self.locked
        }

    @classmethod
    def from_dict(cls, data: dict) -> 'Term':
        return cls(
            source=data["source"],
            translation=data.get("translation"),
            count=data.get("count", 0),
            chapters=data.get("chapters", 0),
            locked=data.get("locked", False)
        )
```

### Extended Data Models

**指纹数据模型:**

```python
@dataclass
class FingerprintData:
    """指纹查重数据"""
    work_id: str
    fingerprint: str       # MD5 hash
    sample: str            # 样本文本（前1000字）
    exists: bool = False
    similarity: float = 0.0
    matches: list[str] = field(default_factory=list)  # 匹配的 work_id 列表

@dataclass
class FingerprintResult:
    """指纹查重结果"""
    exists: bool
    work_id: str
    similarity: float
    matches: list[dict] = field(default_factory=list)
    # matches format: [{"work_id": "uuid", "similarity": 0.95, "chapter": "Chapter 0001"}]
```

**上传队列模型:**

```python
@dataclass
class UploadQueueItem:
    """上传队列项"""
    work_id: str
    chapter_id: str
    title: str
    content_en: str
    word_count: int
    retry_count: int = 0
    max_retries: int = 3
    created_at: datetime = field(default_factory=datetime.now)

@dataclass
class UploadFailedItem:
    """上传失败项（JSONL 格式）"""
    work_id: str
    chapter_id: str
    error_code: str
    error_message: str
    failed_at: datetime = field(default_factory=datetime.now)
    retry_count: int = 0
```

## Security Design

### Data Protection

**1. 本地数据存储策略:**

- 所有用户数据 100% 存储在本地
- 不上传任何原文到云端（除平台 API 上传翻译结果外）
- 配置文件（API 密钥）使用系统密钥环存储

**2. API 密钥管理:**

```python
# infrastructure/security/secret_manager.py
import keyring
from typing import Optional

class SecretManager:
    """密钥管理器 - 使用系统密钥环"""

    SERVICE_NAME = "xling-matrix-assistant"

    def set_api_key(self, key: str) -> None:
        """存储 API 密钥"""
        keyring.set_password(self.SERVICE_NAME, "platform_api", key)

    def get_api_key(self) -> Optional[str]:
        """获取 API 密钥"""
        return keyring.get_password(self.SERVICE_NAME, "platform_api")

    def delete_api_key(self) -> None:
        """删除 API 密钥"""
        keyring.delete_password(self.SERVICE_NAME, "platform_api")
```

**3. 文件权限控制:**

```python
# infrastructure/storage/permissions.py
import os
import stat

def set_secure_permissions(filepath: str) -> None:
    """设置安全的文件权限（仅用户可读写）"""
    os.chmod(filepath, stat.S_IRUSR | stat.S_IWUSR)
```

### License Compliance

**依赖许可证验证:**

```python
# tools/license_checker.py
import subprocess
import json

ALLOWED_LICENSES = {"MIT", "Apache-2.0", "BSD-3-Clause", "PSF-2.0"}
BANNED_LICENSES = {"GPL", "AGPL", "LGPL", "SSPL", "CPAL"}

def check_dependency_licenses() -> dict:
    """检查所有依赖的许可证"""
    result = subprocess.run(
        ["pip", "show", "--json"],
        capture_output=True,
        text=True
    )
    packages = json.loads(result.stdout)

    issues = []
    for pkg in packages:
        license_ = pkg.get("License", "UNKNOWN")
        if any(banned in license_ for banned in BANNED_LICENSES):
            issues.append({
                "package": pkg["Name"],
                "license": license_,
                "severity": "BLOCKING",
                "reason": "Contains GPL contamination"
            })
        elif license_ not in ALLOWED_LICENSES and license_ != "UNKNOWN":
            issues.append({
                "package": pkg["Name"],
                "license": license_,
                "severity": "WARNING",
                "reason": "License not in whitelist"
            })

    return {"valid": len(issues) == 0, "issues": issues}
```

**4. 许可证管理（Growth 阶段）:**

```python
# infrastructure/security/license_manager.py
import hashlib
import platform
import requests

class LicenseManager:
    """许可证管理器 - 硬件绑定与在线激活验证"""

    def generate_fingerprint(self) -> str:
        """生成硬件指纹（用于软件绑定）"""
        # 获取硬件信息
        machine_id = platform.node()
        cpu_info = platform.processor()
        mac_address = self._get_mac_address()

        # 组合生成指纹
        fingerprint_data = f"{machine_id}:{cpu_info}:{mac_address}"
        return hashlib.md5(fingerprint_data.encode()).hexdigest()

    def _get_mac_address(self) -> str:
        """获取本机 MAC 地址"""
        try:
            import uuid
            return ':'.join(['{:02x}'.format((uuid.getnode() >> elements) & 0xff)
                                   for elements in range(0, 2*6, 8)][::-1])
        except:
            return "unknown"

    def verify_activation(self, activation_key: str) -> bool:
        """在线验证激活密钥

        Args:
            activation_key: 用户输入的激活密钥

        Returns:
            bool: 激活是否有效
        """
        try:
            fingerprint = self.generate_fingerprint()
            response = requests.post(
                "https://license.xling-matrix.com/verify",
                json={
                    "activation_key": activation_key,
                    "fingerprint": fingerprint,
                    "version": "0.1.0"
                },
                timeout=10
            )
            response.raise_for_status()
            return response.json().get("valid", False)
        except Exception:
            return False

    def check_expiration(self, activation_key: str) -> dict | None:
        """检查激活是否过期"""
        try:
            response = requests.post(
                "https://license.xling-matrix.com/check",
                json={"activation_key": activation_key},
                timeout=10
            )
            response.raise_for_status()
            data = response.json()
            return {
                "expired": data.get("expired", False),
                "expires_at": data.get("expires_at"),
                "days_remaining": data.get("days_remaining")
            }
        except Exception:
            return None

    def activate_offline(self, activation_key: str, max_credits: int = 1000) -> bool:
        """离线激活（本地验证签名）"""
        # TODO: 实现离线激活逻辑（需要服务器生成签名密钥对）
        return True
```

**激活状态存储:**

```python
# infrastructure/storage/license_storage.py
import json
from pathlib import Path

class LicenseStorage:
    """激活状态存储"""

    ACTIVATION_FILE = Path.home() / ".config" / "xling-matrix" / "activation.json"

    def save_activation(self, activation_key: str, fingerprint: str) -> None:
        """保存激活信息"""
        data = {
            "activation_key": activation_key,
            "fingerprint": fingerprint,
            "activated_at": datetime.now().isoformat(),
            "version": "0.1.0"
        }
        self.ACTIVATION_FILE.parent.mkdir(parents=True, exist_ok=True)
        AtomicWriter.write(str(self.ACTIVATION_FILE), data)

    def load_activation(self) -> dict | None:
        """加载激活信息"""
        if not self.ACTIVATION_FILE.exists():
            return None
        with open(self.ACTIVATION_FILE, 'r', encoding='utf-8') as f:
            return json.load(f)

    def is_activated(self) -> bool:
        """检查是否已激活"""
        activation = self.load_activation()
        if not activation:
            return False
        # 检查硬件指纹是否匹配
        current_fingerprint = LicenseManager().generate_fingerprint()
        return activation.get("fingerprint") == current_fingerprint
```

**许可证验证流程:**

```
启动 → 检查本地激活 → [无激活] 显示激活对话框
                ↓
         [有激活] 检查硬件指纹 → [不匹配] 重新激活
                ↓
         [匹配] 检查过期 → [已过期] 提示续费
                ↓
         [有效] 验证 CU 余额 → [不足] 提示充值
                ↓
         [有效] 允许使用
```

## Performance Optimization

### GPU Optimization

**1. 批处理策略:**

```python
# modules/translation/batch_processor.py
class BatchProcessor:
    """批处理优化器"""

    def __init__(self, max_tokens: int = 4096):
        self.max_tokens = max_tokens

    def create_batches(self, texts: list[str]) -> list[list[str]]:
        """将文本分割为最优批次

        策略：
        1. 按 token 数量分组
        2. 每批接近 max_tokens 但不超过
        3. 相邻文本尽量在同一批（保持上下文）
        """
        batches = []
        current_batch = []
        current_tokens = 0

        for text in texts:
            tokens = self._count_tokens(text)
            if current_tokens + tokens > self.max_tokens and current_batch:
                batches.append(current_batch)
                current_batch = [text]
                current_tokens = tokens
            else:
                current_batch.append(text)
                current_tokens += tokens

        if current_batch:
            batches.append(current_batch)

        return batches

    def _count_tokens(self, text: str) -> int:
        """估算 token 数量（中文约 1.5 字符/token）"""
        return int(len(text) / 1.5)
```

**2. 动态批次大小调整:**

```python
# infrastructure/gpu/batch_optimizer.py
import torch

class BatchSizeOptimizer:
    """动态批次大小优化器"""

    def __init__(self, initial_size: int = 16):
        self.current_size = initial_size
        self.min_size = 4
        self.max_size = 32

    def adjust_for_memory(self, oom_occurred: bool) -> int:
        """根据显存使用情况调整批次大小"""
        if oom_occurred:
            self.current_size = max(self.min_size, self.current_size // 2)
        else:
            # 逐步增加以找到最优值
            self.current_size = min(self.max_size, int(self.current_size * 1.2))
        return self.current_size

    def get_memory_info(self) -> dict:
        """获取 GPU 显存信息"""
        if not torch.cuda.is_available():
            return {"available": False}
        return {
            "available": True,
            "total_gb": torch.cuda.get_device_properties(0).total_memory / 1e9,
            "allocated_gb": torch.cuda.memory_allocated(0) / 1e9,
            "free_gb": (torch.cuda.get_device_properties(0).total_memory -
                      torch.cuda.memory_allocated(0)) / 1e9,
        }
```

### I/O Optimization

**1. 增量进度保存:**

```python
# infrastructure/storage/incremental.py
class IncrementalProgressSaver:
    """增量进度保存器 - 减少磁盘写入"""

    def __init__(self, threshold: int = 5):
        self.threshold = threshold  # 进度变化超过 5% 才保存
        self.last_saved = 0

    def should_save(self, current_progress: int) -> bool:
        return abs(current_progress - self.last_saved) >= self.threshold

    def mark_saved(self, progress: int) -> None:
        self.last_saved = progress
```

**2. 文件读取优化:**

```python
# infrastructure/storage/chunked_reader.py
class ChunkedFileReader:
    """分块文件读取器 - 支持大文件"""

    def __init__(self, chunk_size: int = 8192):
        self.chunk_size = chunk_size

    def read_by_paragraphs(self, filepath: str) -> list[str]:
        """按段落读取文件（更适合小说）"""
        with open(filepath, 'r', encoding='utf-8') as f:
            content = f.read()
        # 按双换行符分割段落
        paragraphs = [p.strip() for p in content.split('\n\n') if p.strip()]
        return paragraphs
```

## Deployment Architecture

### Application Packaging

**1. PyInstaller 配置:**

```python
# build/pyinstaller_spec.py
import sys
from PyInstaller.utils.hooks import collect_data_files, collect_submodules

block_cipher = None

datas = [
    ('models', 'models'),
    ('assets', 'assets'),
]

hiddenimports = [
    'PyQt6.sip',
    'ctranslate2',
    'torch',
]

a = Analysis(
    ['src/xling_matrix/__main__.py'],
    pathex=[],
    binaries=[],
    datas=datas,
    hiddenimports=hiddenimports,
    hookspath=[],
    hooksconfig={},
    runtime_hooks=[],
    excludes=[],
    win_no_prefer_redirects=False,
    win_private_assemblies=False,
    cipher=block_cipher,
    noarchive=False,
)

pyz = PYZ(a.pure, a.zipped_data, cipher=block_cipher)

exe = EXE(
    pyz,
    a.scripts,
    a.binaries,
    a.zipfiles,
    a.datas,
    [],
    name='序灵Matrix助手',
    debug=False,
    bootloader_ignore_signals=False,
    strip=False,
    upx=True,
    upx_exclude=[],
    runtime_tmpdir=None,
    console=True,
    disable_windowed_traceback=False,
    argv_emulation=False,
    target_arch=None,
    codesign_identity=None,
    entitlements_file=None,
    icon='assets/icons/app_icon.ico',
)
```

**2. 安装程序配置:**

```
安装结构:
序灵Matrix助手/
├── 序灵Matrix助手.exe          # 主程序
├── models/                      # 翻译模型（首次运行时下载）
│   └── m2m100_418m_ct2/
├── configs/
│   └── default_config.yaml
└── README.txt

用户数据目录:
~/Documents/xling-matrix/        # Windows
~/Documents/xling-matrix/        # macOS
~/xling-matrix/                  # Linux
```

### Distribution Strategy

**1. 版本管理:**

```python
# core/version.py
__version__ = "0.1.0"
__build__ = "20260313"

def get_version() -> str:
    return f"{__version__}+{__build__}"
```

**2. 更新检查 (Growth 阶段):**

```python
# infrastructure/update/update_checker.py
import requests

class UpdateChecker:
    """更新检查器"""

    UPDATE_URL = "https://updates.xling-matrix.com/version.json"

    def check_for_updates(self, current_version: str) -> dict | None:
        """检查是否有新版本"""
        try:
            response = requests.get(self.UPDATE_URL, timeout=5)
            response.raise_for_status()
            data = response.json()

            if self._is_newer(current_version, data["latest_version"]):
                return {
                    "has_update": True,
                    "latest_version": data["latest_version"],
                    "download_url": data["download_url"],
                    "release_notes": data.get("release_notes", ""),
                }
        except Exception:
            pass
        return None

    def _is_newer(self, current: str, latest: str) -> bool:
        """比较版本号"""
        from packaging import version
        return version.parse(latest) > version.parse(current)
```

---

## Architecture Summary

本架构设计文档定义了序灵 Matrix 助手的完整技术架构，包括：

- **分层架构**: Presentation / Application / Domain / Infrastructure
- **核心设计模式**: Pipeline, State Machine, Repository, Observer
- **Crash-Safe 机制**: 原子写入确保数据安全
- **GPU 加速**: CTranslate2 + 动态批处理优化
- **六模块流水线**: Fingerprint → Cleaning → Terminology → Translation → Upload
- **本地优先**: 100% 本地处理，零数据泄露
- **零授权费**: 仅使用 MIT 协议依赖

本架构确保所有 AI 代理可以协同工作，编写一致、兼容的代码。

---

**文档版本**: 1.0
**最后更新**: 2026-03-13
**状态**: 完成 ✅