Initial version

2026-04-23 16:30:57 +08:00
commit 0d0683a6e6
538 changed files with 113042 additions and 0 deletions
--- a/public/docs/rag.md
+++ b/public/docs/rag.md
@@ -0,0 +1,269 @@
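+# Building a RAG Knowledge Base
+
+> Upload documents, store them as vectors, and build an enterprise-grade knowledge base Q&A system.
+
+## 🧠 What Is RAG?
+
+RAG (Retrieval-Augmented Generation) is a technique that combines knowledge retrieval with AI generation:
+
+1. **Retrieval**: find relevant content in the knowledge base
+2. **Augmentation**: add the retrieved results to the prompt as context
+3. **Generation**: have the AI produce an answer based on that context
+
+```
+User question → Retrieve relevant documents → Append to prompt → AI generates an answer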
+```
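+
+The three steps can be sketched in a few lines of self-contained TypeScript. This is a toy illustration only. It uses word counts in place of a real embedding model, and a hypothetical `generate` callback stands in for the LLM call. It does not reflect how the platform implements retrieval.
+
+```typescript
+type Vector = Record<string, number>;
+
+// Toy "embedding": word counts (a stand-in for a real embedding model)
+function embed(text: string): Vector {
+  const vec: Vector = {};
+  for (const word of text.toLowerCase().match(/[a-z0-9]+/g) || []) {
+    vec[word] = (vec[word] || 0) + 1;
+  }
+  return vec;
+}
+
+// Cosine similarity between two sparse vectors
+function cosine(a: Vector, b: Vector): number {
+  let dot = 0;
+  for (const k of Object.keys(a)) dot += a[k] * (b[k] || 0);
+  const norm = (v: Vector) =>
+    Math.sqrt(Object.keys(v).reduce((s, k) => s + v[k] * v[k], 0));
+  const denom = norm(a) * norm(b);
+  return denom === 0 ? 0 : dot / denom;
+}
+
+// 1. Retrieve: rank chunks by similarity to the question and keep the top k
+function retrieve(question: string, chunks: string[], k: number): string[] {
+  const q = embed(question);
+  return chunks
+    .map((text) => ({ text, score: cosine(q, embed(text)) }))
+    .sort((x, y) => y.score - x.score)
+    .slice(0, k)
+    .map((c) => c.text);
+}
+
+// 2. Augment: put the retrieved chunks into the prompt as context
+function buildPrompt(question: string, context: string[]): string {
+  return `Context:\n${context.join("\n---\n")}\n\nQuestion: ${question}`;
+}
+
+// 3. Generate: pass the augmented prompt to a model (hypothetical callback)
+function answer(question: string, chunks: string[], generate: (prompt: string) => string): string {
+  const context = retrieve(question, chunks, 2);
+  return generate(buildPrompt(question, context));
+}
+```
+
+A real system replaces `embed` with an embedding model and `generate` with a chat-completion call. The shape of the pipeline stays the same.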
+
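+## 📋 Use Cases
+
+- 📄 Q&A over internal company documents
+- 🎯 Automated product FAQs
+- 📚 Training material search
+- 🔍 Contract and policy lookup
+- 🏥 Specialized domain knowledge bases
+
+## 🚀 Setup Steps
+
+### Step 1: Create a Knowledge Base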
+
+```typescript
+const knowledgeBase = await client.ai.createKnowledgeBase({
+  name: 'Product User Manual',
+  description: 'Documentation for company products',
+  // Embedding model
+  embeddingModel: 'text-embedding-3-small',
+  // Chunking strategy
+  chunking: {
+    type: 'recursive',
+    chunkSize: 1000,
+    chunkOverlap: 200
+  }
+})
+
+console.log('Knowledge base ID:', knowledgeBase.id)
+```
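+
+The simplified fixed-size splitter below shows what `chunkSize` and `chunkOverlap` mean in practice. It is a sketch that assumes sizes are measured in characters. It does not reproduce the platform's `recursive` strategy, which presumably also tries to break on paragraph and sentence boundaries.
+
+```typescript
+// Split text into windows of `chunkSize` characters; each window repeats
+// the last `chunkOverlap` characters of the previous one.
+function chunkText(text: string, chunkSize: number, chunkOverlap: number): string[] {
+  if (chunkOverlap >= chunkSize) {
+    throw new Error("chunkOverlap must be smaller than chunkSize");
+  }
+  const step = chunkSize - chunkOverlap; // how far each window advances
+  const chunks: string[] = [];
+  for (let start = 0; start < text.length; start += step) {
+    chunks.push(text.slice(start, start + chunkSize));
+    if (start + chunkSize >= text.length) break; // this window reached the end
+  }
+  return chunks;
+}
+```
+
+With `chunkSize: 1000` and `chunkOverlap: 200`, a 2,600-character text yields three chunks, starting at offsets 0, 800, and 1600. Because of the overlap, a sentence cut at one boundary still appears whole in the neighboring chunk.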
+
+### Step 2: Upload Documents
+
+Supported formats: PDF, Word, TXT, Markdown, HTML
+
+```typescript
+// Upload a single file
+const doc = await client.ai.uploadDocument(knowledgeBase.id, {
+  file: './docs/user-guide.pdf',
+  metadata: {
+    category: 'manual',
+    version: '2.0',
+    language: 'zh-CN'
+  }
+})
+
+console.log('Document ID:', doc.id)
+console.log('Status:', doc.status)  // processing → ready
+
+// Batch upload
+const docs = await client.ai.uploadDocuments(knowledgeBase.id, [
+  { file: './docs/faq.md', metadata: { category: 'faq' } },
+  { file: './docs/api.md', metadata: { category: 'api' } },
+  { file: './docs/tutorial.md', metadata: { category: 'tutorial' } }
+])
+```
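+
+Only PDF, Word, TXT, Markdown, and HTML are supported, so you may want to filter files locally before a batch upload. The extension list below is an assumption. For example, this guide does not say whether legacy `.doc` files or the `.markdown` extension are accepted. Adjust the list to match what your deployment actually supports.
+
+```typescript
+// Assumed mapping of the supported formats listed above to file extensions
+const SUPPORTED_EXTENSIONS = [".pdf", ".doc", ".docx", ".txt", ".md", ".markdown", ".html", ".htm"];
+
+// Split a list of paths into uploadable files and rejected ones (by extension only)
+function partitionByFormat(paths: string[]): { accepted: string[]; rejected: string[] } {
+  const accepted: string[] = [];
+  const rejected: string[] = [];
+  for (const path of paths) {
+    const fileName = path.split("/").pop() || "";
+    const dot = fileName.lastIndexOf(".");
+    const ext = dot > 0 ? fileName.slice(dot).toLowerCase() : "";
+    if (SUPPORTED_EXTENSIONS.indexOf(ext) >= 0) {
+      accepted.push(path);
+    } else {
+      rejected.push(path);
+    }
+  }
+  return { accepted, rejected };
+}
+```
+
+The `accepted` list can then be mapped to `{ file, metadata }` items for the batch upload.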
+
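+### Step 3: Wait for Processing
+
+Each document goes through three stages:
+1. **Parsing**: extract the text content
+2. **Chunking**: split the text into small chunks according to the chunking strategy
+3. **Embedding**: convert each chunk into a numeric vector
+
+```typescript
+// Check the document status
+const status = await client.ai.getDocumentStatus(doc.id)
+console.log('Processing status:', status.processing_status)
+console.log('Chunk count:', status.chunk_count)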
+```
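+
+Processing is asynchronous, so a common pattern is to poll until the status becomes `ready`. The helper below is a sketch that takes the status lookup as a callback, for example a wrapper around `client.ai.getDocumentStatus`. The `failed` status is an assumption, since only `processing` and `ready` appear in this guide.
+
+```typescript
+type ProcessingStatus = "processing" | "ready" | "failed"; // "failed" is assumed
+
+// Poll `getStatus` every `intervalMs` until the document is ready.
+// Rejects on a "failed" status or once `timeoutMs` has elapsed.
+async function waitUntilReady(
+  getStatus: () => Promise<ProcessingStatus>,
+  intervalMs = 2000,
+  timeoutMs = 5 * 60 * 1000,
+): Promise<void> {
+  const deadline = Date.now() + timeoutMs;
+  while (true) {
+    const status = await getStatus();
+    if (status === "ready") return;
+    if (status === "failed") throw new Error("Document processing failed");
+    if (Date.now() >= deadline) throw new Error("Timed out waiting for processing");
+    await new Promise((resolve) => setTimeout(resolve, intervalMs));
+  }
+}
+```
+
+For example: `await waitUntilReady(async () => (await client.ai.getDocumentStatus(doc.id)).processing_status)`. For large batches, the webhook notifications mentioned in the FAQ avoid polling entirely.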
+
+### Step 4: Configure Retrieval
+
+```typescript
+await client.ai.configureRetrieval(knowledgeBase.id, {
+  // Retrieval parameters
+  retrieval: {
+    topK: 5,              // Return the top 5 results
+    scoreThreshold: 0.7,  // Similarity threshold
+    hybridSearch: true    // Hybrid search (keyword + vector)
+  },
+  // Rerank settings
+  rerank: {
+    enabled: true,
+    model: 'bge-reranker'
+  }
+})
+```
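+
+Conceptually, `topK` and `scoreThreshold` act as two filters on the scored candidates, and hybrid search blends a keyword score with a vector score. The sketch below assumes a simple weighted sum with a hypothetical weight `alpha` and scores normalized to [0, 1]. The platform's actual fusion method is not documented here and may differ; reciprocal rank fusion is another common choice.
+
+```typescript
+interface Candidate {
+  id: string;
+  vectorScore: number;  // semantic similarity, assumed normalized to [0, 1]
+  keywordScore: number; // keyword (e.g. BM25) score, assumed normalized to [0, 1]
+}
+
+// Blend the two scores: alpha = 1 is pure vector search, alpha = 0 pure keyword search
+function hybridScore(c: Candidate, alpha: number): number {
+  return alpha * c.vectorScore + (1 - alpha) * c.keywordScore;
+}
+
+// Drop candidates below the threshold, then return the best topK by blended score
+function selectTopK(
+  candidates: Candidate[],
+  topK: number,
+  scoreThreshold: number,
+  alpha = 0.5,
+): { id: string; score: number }[] {
+  return candidates
+    .map((c) => ({ id: c.id, score: hybridScore(c, alpha) }))
+    .filter((c) => c.score >= scoreThreshold)
+    .sort((a, b) => b.score - a.score)
+    .slice(0, topK);
+}
+```
+
+A reranker, when enabled, would then reorder this short list with a more expensive model such as `bge-reranker`.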
+
+### Step 5: Ask a Question
+
+```typescript
+const response = await client.ai.ask({
+  knowledgeBaseId: knowledgeBase.id,
+  question: 'How do I create a new project?',
+  // Optional parameters
+  options: {
+    maxTokens: 1000,
+    temperature: 0.7,
+    includeSources: true  // Return the cited document chunks
+  }
+})
+
+console.log('Answer:', response.answer)
+console.log('Citations:', response.citations)
+```
+
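+### Sample Response
+
+```json
+{
+  "answer": "To create a new project:\n1. Click the 'New Project' button\n2. Enter a project name and description\n3. Choose a project template\n4. Click 'Create' to finish",
+  "citations": [
+    {
+      "document_id": "doc_abc123",
+      "chunk_text": "Click the 'New Project' button to open the project creation page...",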
+      "score": 0.95
+    }
+  ],
+  "model": "gpt-4",
+  "tokens_used": 850
+}
+```
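+
+In TypeScript, the response above could be described with types like the following. They are inferred from this single sample, so optional fields and the exact shape of `citations` may differ in practice. `formatCitations` is a hypothetical helper for showing sources to the user.
+
+```typescript
+// Shape inferred from the sample response above
+interface Citation {
+  document_id: string;
+  chunk_text: string;
+  score: number;
+}
+
+interface AskResponse {
+  answer: string;
+  citations: Citation[];
+  model: string;
+  tokens_used: number;
+}
+
+// Render citations as a numbered list, highest score first
+function formatCitations(res: AskResponse): string {
+  return res.citations
+    .slice() // copy so the original array is not reordered
+    .sort((a, b) => b.score - a.score)
+    .map((c, i) => `[${i + 1}] ${c.document_id} (score ${c.score.toFixed(2)}): ${c.chunk_text}`)
+    .join("\n");
+}
+```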
+
+## 🔧 Advanced Configuration
+
+### Custom Chunking Strategy
+
+```typescript
+const kb = await client.ai.createKnowledgeBase({
+  name: 'Technical Documentation',
+  chunking: {
+    type: 'custom',
+    // Split on headings
+    delimiters: ['# ', '## ', '### '],
+    maxTokens: 500
+  }
+})
+```
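+
+A `custom` strategy that splits on headings can be approximated as follows. This sketch makes two assumptions. Headings must start at the beginning of a line, and tokens are approximated by whitespace-separated words. Real tokenizers count differently, so treat `maxTokens` here as a rough limit.
+
+```typescript
+// Split Markdown into sections at lines starting with "# ", "## " or "### ",
+// then cut any section longer than maxTokens (approximated as words).
+function chunkByHeadings(markdown: string, maxTokens: number): string[] {
+  const sections: string[] = [];
+  let current: string[] = [];
+  for (const line of markdown.split("\n")) {
+    if (/^#{1,3} /.test(line) && current.length > 0) {
+      sections.push(current.join("\n")); // a new heading closes the previous section
+      current = [];
+    }
+    current.push(line);
+  }
+  if (current.length > 0) sections.push(current.join("\n"));
+
+  const chunks: string[] = [];
+  for (const section of sections) {
+    const words = section.split(/\s+/).filter((w) => w.length > 0);
+    if (words.length === 0) continue; // skip empty sections
+    if (words.length <= maxTokens) {
+      chunks.push(section.trim()); // short sections keep their original formatting
+      continue;
+    }
+    for (let i = 0; i < words.length; i += maxTokens) {
+      chunks.push(words.slice(i, i + maxTokens).join(" "));
+    }
+  }
+  return chunks;
+}
+```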
+
+### Metadata Filtering
+
+```typescript
+const response = await client.ai.ask({
+  knowledgeBaseId: knowledgeBase.id,
+  question: 'What is the refund policy?',
+  filters: {
+    category: 'policy',
+    version: '>=2.0'
+  }
+})
+```
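+
+The exact filter syntax is defined by the platform. As an illustration only, the matcher below makes two assumptions. A plain value means equality, and a value prefixed with `>=`, `<=`, `>`, or `<` compares dotted version numbers. That is one plausible reading of `version: '>=2.0'`.
+
+```typescript
+type Metadata = Record<string, string>;
+
+// Compare dotted versions numerically ("2.10" > "2.9"); returns -1, 0 or 1
+function compareVersions(a: string, b: string): number {
+  const pa = a.split(".").map(Number);
+  const pb = b.split(".").map(Number);
+  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
+    const x = pa[i] || 0; // missing or non-numeric parts count as 0
+    const y = pb[i] || 0;
+    if (x !== y) return x < y ? -1 : 1;
+  }
+  return 0;
+}
+
+// True if the document metadata satisfies every filter
+function matchesFilters(meta: Metadata, filters: Metadata): boolean {
+  return Object.keys(filters).every((key) => {
+    const value = meta[key];
+    if (value === undefined) return false;
+    const m = /^(>=|<=|>|<)(.+)$/.exec(filters[key]);
+    if (!m) return value === filters[key]; // plain equality
+    const cmp = compareVersions(value, m[2]);
+    switch (m[1]) {
+      case ">=": return cmp >= 0;
+      case "<=": return cmp <= 0;
+      case ">": return cmp > 0;
+      default: return cmp < 0; // "<"
+    }
+  });
+}
+```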
+
+### Querying Multiple Knowledge Bases
+
+```typescript
+const response = await client.ai.ask({
+  knowledgeBaseIds: ['kb-product', 'kb-faq', 'kb-manual'],
+  question: 'How do I use this feature?',
+  // Per-knowledge-base weights
+  weights: {
+    'kb-product': 0.5,
+    'kb-faq': 0.3,
+    'kb-manual': 0.2
+  }
+})
+```
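+
+One plausible way to apply `weights` is to scale each hit's score by its knowledge base's weight before merging the result lists. The sketch below assumes exactly that. This guide does not specify how the platform actually combines results across knowledge bases.
+
+```typescript
+interface Hit {
+  knowledgeBaseId: string;
+  chunkText: string;
+  score: number; // similarity within its own knowledge base
+}
+
+// Scale each hit by its knowledge base weight (default 1), merge, and keep the best topK
+function mergeWeighted(hits: Hit[], weights: Record<string, number>, topK: number): Hit[] {
+  return hits
+    .map((h) => {
+      const w = weights[h.knowledgeBaseId];
+      return { ...h, score: h.score * (w === undefined ? 1 : w) };
+    })
+    .sort((a, b) => b.score - a.score)
+    .slice(0, topK);
+}
+```
+
+Under this scheme, a strong match in a low-weight knowledge base can still outrank a weak match in a high-weight one, but it needs a proportionally higher raw score.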
+
+## 📊 Knowledge Base Management
+
+### Viewing Statistics
+
+```typescript
+const stats = await client.ai.getKnowledgeBaseStats(knowledgeBase.id)
+
+console.log('Documents:', stats.document_count)
+console.log('Chunks:', stats.chunk_count)
+console.log('Total tokens:', stats.total_tokens)
+```
+
+### Updating a Document
+
+```typescript
+// Upload the new file; the document is updated automatically
+await client.ai.updateDocument(doc.id, {
+  file: './docs/user-guide-v3.pdf'
+})
+```
+
+### Deleting a Document
+
+```typescript
+await client.ai.deleteDocument(doc.id)
+```
+
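+## 🧪 Best Practices
+
+### Preparing Documents
+
+1. ✅ Clean up formatting and remove irrelevant content
+2. ✅ Add a table of contents and a clear heading structure
+3. ✅ Documents in Q&A format work best
+4. ❌ Avoid long, unstructured text
+5. ❌ Avoid heavy use of tables (they are hard to chunk correctly)
+
+### Optimizing Retrieval
+
+```typescript
+// Tune the retrieval parameters
+await client.ai.configureRetrieval(knowledgeBase.id, {
+  retrieval: {
+    topK: 3,              // Keep this small; more chunks can add noise
+    scoreThreshold: 0.75  // Use a reasonably strict threshold
+  },
+  rerank: {
+    enabled: true         // Turn on reranking
+  }
+})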
+```
+
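+### Prompt Engineering
+
+```typescript
+const response = await client.ai.ask({
+  knowledgeBaseId: knowledgeBase.id,
+  question: '...',
+  systemPrompt: `You are a professional customer support agent. Answer the user's question based on the provided knowledge base content.
+  - If the knowledge base contains nothing relevant, say so honestly
+  - Keep answers concise, professional, and easy to understand
+  - Quote the original text where it helps the user understand`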
+})
+```
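+
+Before the model is called, a system prompt like this is typically combined with the retrieved chunks. The sketch below shows one way to do that. Each chunk gets a number so the answer can cite `[1]`, `[2]`, and so on. When nothing is retrieved, the message says so, which lets the model follow the "say so honestly" instruction. The message layout is an assumption, not the platform's internal format.
+
+```typescript
+interface RetrievedChunk {
+  documentId: string;
+  text: string;
+}
+
+interface ChatMessage {
+  role: "system" | "user";
+  content: string;
+}
+
+// Instructions go in the system message; numbered context plus the question go in the user message
+function buildMessages(systemPrompt: string, chunks: RetrievedChunk[], question: string): ChatMessage[] {
+  const context =
+    chunks.length === 0
+      ? "(no relevant content found)"
+      : chunks.map((c, i) => `[${i + 1}] (${c.documentId}) ${c.text}`).join("\n");
+  return [
+    { role: "system", content: systemPrompt },
+    { role: "user", content: `Knowledge base content:\n${context}\n\nQuestion: ${question}` },
+  ];
+}
+```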
+
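+## ❓ FAQ
+
+### Q: Retrieval finds no relevant content?
+
+1. Check that the document has finished processing (its status is `ready`)
+2. Lower the `scoreThreshold`
+3. Try hybrid search with `hybridSearch: true`
+4. Increase `topK` to get more candidates
+
+### Q: Answers are inaccurate?
+
+1. Check the quality of the source documents
+2. Adjust `chunkSize`; chunks that are too small can lose context
+3. Enable `rerank` to improve relevance
+4. Refine the system prompt
+
+### Q: Processing is slow?
+
+- Use a faster embedding model for small documents
+- Process large batches outside peak hours
+- Use asynchronous processing (with webhook notifications)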
+
+---
+
+**Previous:** [Complete REST API Reference](./api-reference.md)  
+**Next:** [AI Workflow Configuration](./workflow.md)