# API Authentication

Source: https://asyncfunc.mintlify.app/endpoints/authentication

Authentication and security configuration for the DeepWiki-Open API, including authorization modes and access control.

The DeepWiki-Open API supports multiple authentication methods to secure your documentation generation service. This guide covers all authentication options, security configurations, and best practices.

## Authentication Overview

DeepWiki-Open uses a multi-layered authentication approach:

* API provider keys configured via environment variables
* Optional secret code protection for wiki generation
* Personal access tokens for private repository access
* Input validation and rate limiting protection

## Environment-Based Authentication

### API Provider Authentication

The API automatically validates configured AI provider credentials.

**Google** — uses `GOOGLE_API_KEY` from the environment automatically.

**Validation endpoint:**

```bash
curl -X GET "http://localhost:8001/auth/validate/google"
```

**Response:**

```json
{
  "provider": "google",
  "status": "valid",
  "models_available": [
    "gemini-2.0-flash",
    "gemini-1.5-flash",
    "gemini-1.0-pro"
  ],
  "quota": {
    "requests_per_minute": 15,
    "tokens_per_minute": 32000,
    "usage_today": "12%"
  }
}
```

**OpenAI** — uses `OPENAI_API_KEY` from the environment automatically.

**Validation endpoint:**

```bash
curl -X GET "http://localhost:8001/auth/validate/openai"
```

**Response:**

```json
{
  "provider": "openai",
  "status": "valid",
  "models_available": [
    "gpt-4o",
    "gpt-4.1",
    "o1",
    "o4-mini"
  ],
  "quota": {
    "tier": "tier-2",
    "requests_per_minute": 500,
    "tokens_per_minute": 30000,
    "current_usage": "8%"
  }
}
```

**Azure OpenAI** — uses Azure credentials from the environment.

**Required environment variables:**

```env
AZURE_OPENAI_API_KEY=your_api_key
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com
AZURE_OPENAI_VERSION=2024-02-15-preview
```

**Validation:**

```bash
curl -X GET "http://localhost:8001/auth/validate/azure"
```

### Authentication Status

Check overall authentication status:

```bash cURL
curl -X GET "http://localhost:8001/auth/status"
```

```python Python
import requests

response = requests.get("http://localhost:8001/auth/status")
auth_status = response.json()
```

```javascript JavaScript
const response = await fetch('http://localhost:8001/auth/status');
const authStatus = await response.json();
```

```json Response
{
  "authentication": {
    "overall_status": "healthy",
    "providers": {
      "google": { "configured": true, "valid": true, "models_count": 3 },
      "openai": { "configured": true, "valid": true, "models_count": 4 },
      "openrouter": { "configured": false, "valid": null, "models_count": 0 },
      "azure": { "configured": true, "valid": true, "models_count": 2 },
      "ollama": { "configured": true, "valid": true, "models_count": 3, "local": true }
    },
    "recommended_provider": "google",
    "fallback_providers": ["openai", "azure"]
  },
  "authorization": {
    "mode": "disabled",
    "required": false
  }
}
```

## Authorization Mode

Optional access control requiring a secret code for wiki generation.

### Enabling Authorization Mode

Set the authorization environment variables:

```env
DEEPWIKI_AUTH_MODE=true
DEEPWIKI_AUTH_CODE=your-secret-code-here
```

Choose a strong, unique authorization code. This provides basic access control.
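One way to produce a strong code is Python's `secrets` module; this is a minimal sketch, and the 32-byte length is an arbitrary choice, not a DeepWiki requirement:

```python
import secrets

# Generate a URL-safe random value for DEEPWIKI_AUTH_CODE.
# token_urlsafe(32) draws 32 random bytes (~43 characters once encoded).
code = secrets.token_urlsafe(32)
print(f"DEEPWIKI_AUTH_CODE={code}")
```

Paste the printed line into your `.env` file rather than committing it to version control.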
Authorization mode requires a server restart:

```bash
# Stop the current server, then restart with the new configuration
python -m api.main
```

Server logs should show: `Authorization mode: ENABLED`

Verify via the status endpoint:

```bash
curl -X GET "http://localhost:8001/auth/status"
```

Should show:

```json
{
  "authorization": {
    "mode": "enabled",
    "required": true
  }
}
```

### Using Authorization Mode

When authorization is enabled, all wiki generation requests must include the auth code:

```bash cURL with Auth
curl -X POST "http://localhost:8001/wiki/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "repo_url": "https://github.com/microsoft/vscode",
    "model_provider": "google",
    "auth_code": "your-secret-code-here"
  }'
```

```python Python with Auth
import requests

response = requests.post("http://localhost:8001/wiki/generate", json={
    "repo_url": "https://github.com/microsoft/vscode",
    "model_provider": "google",
    "auth_code": "your-secret-code-here"
})
```

```javascript JavaScript with Auth
const response = await fetch('http://localhost:8001/wiki/generate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    repo_url: 'https://github.com/microsoft/vscode',
    model_provider: 'google',
    auth_code: 'your-secret-code-here'
  })
});
```

### Authorization Validation

Test authorization codes before use:

```bash Validate Auth Code
curl -X POST "http://localhost:8001/auth/validate" \
  -H "Content-Type: application/json" \
  -d '{ "auth_code": "your-secret-code-here" }'
```

```json Valid Code
{
  "valid": true,
  "message": "Authorization code accepted"
}
```

```json Invalid Code
{
  "valid": false,
  "message": "Invalid authorization code",
  "error": "AUTHORIZATION_FAILED"
}
```

## Repository Access Tokens

For accessing private repositories, provide personal access tokens.

### Supported Platforms

**GitHub**

Token creation:

1. Go to GitHub Settings → Developer settings → Personal access tokens
2. Generate a new token (classic or fine-grained)
3. Select scopes: `repo` (full repository access)

Token format: `ghp_xxxxxxxxxxxxxxxxxxxx` (classic) or `github_pat_xxxx` (fine-grained)

Usage in requests:

```json
{
  "repo_url": "https://github.com/company/private-repo",
  "access_token": "ghp_xxxxxxxxxxxxxxxxxxxx",
  "model_provider": "google"
}
```

**GitLab**

Token creation:

1. Go to GitLab User Settings → Access Tokens
2. Create a personal access token
3. Select scopes: `read_repository`

Token format: `glpat-xxxxxxxxxxxxxxxxxxxx`

Usage in requests:

```json
{
  "repo_url": "https://gitlab.com/company/private-repo",
  "access_token": "glpat-xxxxxxxxxxxxxxxxxxxx",
  "model_provider": "google"
}
```

**Bitbucket**

Token creation:

1. Go to Bitbucket Account Settings → App passwords
2. Create an app password
3. Select permissions: `Repositories: Read`

Token format: app-specific password

Usage in requests:

```json
{
  "repo_url": "https://bitbucket.org/company/private-repo",
  "access_token": "your-app-password",
  "model_provider": "google"
}
```

### Token Validation

Validate repository access tokens:

```bash Validate GitHub Token
curl -X POST "http://localhost:8001/auth/validate-token" \
  -H "Content-Type: application/json" \
  -d '{
    "platform": "github",
    "token": "ghp_xxxxxxxxxxxxxxxxxxxx"
  }'
```

```json Valid Token
{
  "valid": true,
  "platform": "github",
  "scopes": ["repo", "user"],
  "user": "your-username",
  "expires_at": "2024-12-31T23:59:59Z"
}
```

```json Invalid Token
{
  "valid": false,
  "platform": "github",
  "error": "BAD_CREDENTIALS",
  "message": "Token is invalid or expired"
}
```

## Security Best Practices

### Environment Security

**Secure storage:**

```bash
# Use environment files with restricted permissions
chmod 600 .env

# Never commit API keys to version control
echo ".env" >> .gitignore

# Use different keys for development and production
# .env.development vs .env.production
```

**Key rotation:**

```bash
# Regular key rotation schedule:
# 1. Generate new API keys
# 2. Test with new keys
# 3. Update production environment
# 4. Revoke old keys
# 5. Monitor for any issues
```

**Monitoring:**

```json
{
  "api_key_monitoring": {
    "check_quota_daily": true,
    "alert_on_high_usage": true,
    "track_failed_requests": true,
    "log_security_events": true
  }
}
```

**HTTPS configuration:**

```nginx
# Nginx configuration for HTTPS
server {
    listen 443 ssl;
    server_name deepwiki.yourdomain.com;

    ssl_certificate /path/to/certificate.crt;
    ssl_certificate_key /path/to/private.key;

    location / {
        proxy_pass http://localhost:8001;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```

**Firewall rules:**

```bash
# Only allow necessary ports
sudo ufw allow 22    # SSH
sudo ufw allow 80    # HTTP (redirect to HTTPS)
sudo ufw allow 443   # HTTPS
sudo ufw deny 8001   # Block direct API access
sudo ufw enable
```

**Input sanitization:**

```python
# Example validation middleware
from urllib.parse import urlparse

def validate_repo_url(url: str) -> bool:
    allowed_domains = ['github.com', 'gitlab.com', 'bitbucket.org']
    parsed = urlparse(url)
    return (
        parsed.scheme == 'https'
        and parsed.netloc in allowed_domains
        and len(parsed.path.split('/')) >= 3
    )
```

**Rate limiting:**

```json
{
  "rate_limits": {
    "requests_per_minute": 10,
    "requests_per_hour": 100,
    "requests_per_day": 1000,
    "burst_allowance": 5
  }
}
```

### Production Security

```env
# Production security settings
DEEPWIKI_AUTH_MODE=true
DEEPWIKI_AUTH_CODE=complex-random-string-here

# Additional security
NODE_ENV=production
LOG_LEVEL=WARNING
LOG_SENSITIVE_DATA=false
```

```bash
# Restrict API access to authorized IPs
# using a firewall or reverse proxy
iptables -A INPUT -p tcp --dport 8001 -s 192.168.1.0/24 -j ACCEPT
iptables -A INPUT -p tcp --dport 8001 -j DROP
```

```json
{
  "security_monitoring": {
    "failed_auth_attempts": {
      "threshold": 10,
      "window": "1h",
      "action": "alert"
    },
    "unusual_usage_patterns": {
      "detect_anomalies": true,
      "baseline_days": 7
    },
    "api_abuse_detection": {
      "large_repository_attempts": true,
      "rapid_fire_requests": true
    }
  }
}
```

## Error Handling

### Authentication Errors

```json Missing API Keys
{
  "error": {
    "code": "MISSING_API_KEYS",
    "message": "No valid AI provider API keys configured",
    "details": {
      "configured_providers": [],
      "required_providers": ["google", "openai", "openrouter"],
      "suggestion": "Configure at least one AI provider API key"
    }
  }
}
```

```json Invalid Authorization
{
  "error": {
    "code": "AUTHORIZATION_REQUIRED",
    "message": "Authorization code required but not provided",
    "details": {
      "auth_mode": "enabled",
      "provided_code": null,
      "suggestion": "Include 'auth_code' in request body"
    }
  }
}
```

```json Repository Access Denied
{
  "error": {
    "code": "REPOSITORY_ACCESS_DENIED",
    "message": "Cannot access private repository",
    "details": {
      "repo_url": "https://github.com/company/private-repo",
      "platform": "github",
      "suggestion": "Provide valid access token with repository permissions"
    }
  }
}
```

### Token Validation Errors

```json
{
  "error": {
    "code": "TOKEN_EXPIRED",
    "message": "Repository access token has expired",
    "details": {
      "platform": "github",
      "expired_at": "2024-01-15T10:30:00Z",
      "suggestion": "Generate new personal access token"
    }
  }
}
```

**Resolution:**

1. Generate a new personal access token
2. Update the request with the new token
3. Consider using longer-lived tokens for automation

```json
{
  "error": {
    "code": "INSUFFICIENT_PERMISSIONS",
    "message": "Token lacks required repository permissions",
    "details": {
      "platform": "github",
      "required_scopes": ["repo"],
      "current_scopes": ["user"],
      "suggestion": "Create token with 'repo' scope for private repositories"
    }
  }
}
```

**Resolution:**

1. Create a new token with the correct scopes
2. For GitHub: include the `repo` scope for private repositories
3. For GitLab: include the `read_repository` scope

## Integration Examples

### Middleware Authentication

```python Python Middleware
from functools import wraps
from flask import request, jsonify

def require_auth(f):
    @wraps(f)
    def decorated_function(*args, **kwargs):
        auth_code = request.json.get('auth_code')
        if not validate_auth_code(auth_code):
            return jsonify({
                'error': 'Invalid or missing authorization code'
            }), 401
        return f(*args, **kwargs)
    return decorated_function

@app.route('/wiki/generate', methods=['POST'])
@require_auth
def generate_wiki():
    # Wiki generation logic
    pass
```

```javascript Node.js Middleware
const authenticateRequest = (req, res, next) => {
  const { auth_code } = req.body;
  if (!validateAuthCode(auth_code)) {
    return res.status(401).json({
      error: 'Invalid or missing authorization code'
    });
  }
  next();
};

app.post('/wiki/generate', authenticateRequest, (req, res) => {
  // Wiki generation logic
});
```

### Client-Side Authentication

```javascript React Hook
import { useState, useEffect } from 'react';

export function useDeepWikiAuth() {
  const [authStatus, setAuthStatus] = useState(null);

  useEffect(() => {
    checkAuthStatus();
  }, []);

  const checkAuthStatus = async () => {
    try {
      const response = await fetch('/api/auth/status');
      const status = await response.json();
      setAuthStatus(status);
    } catch (error) {
      console.error('Auth check failed:', error);
    }
  };

  const validateAuthCode = async (code) => {
    try {
      const response = await fetch('/api/auth/validate', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ auth_code: code })
      });
      return response.ok;
    } catch (error) {
      return false;
    }
  };

  return { authStatus, validateAuthCode, checkAuthStatus };
}
```

## Next Steps

* Learn about wiki generation and management endpoints
* Implement comprehensive security measures
* Deploy with proper authentication in production
* Set up authentication monitoring and alerts

# Chat API Endpoints

Source: https://asyncfunc.mintlify.app/endpoints/chat-endpoints
Complete API reference for chat functionality with streaming, research, and real-time features.

DeepWikiOpen provides powerful chat capabilities with streaming responses, deep research mode, conversation management, and real-time WebSocket connections.

## Authentication

All chat endpoints require authentication using Bearer tokens:

```bash
Authorization: Bearer your-api-key
```

## Rate Limits

* **Standard Chat**: 100 requests per minute
* **Research Mode**: 20 requests per minute
* **WebSocket**: 1000 messages per minute
* **History Operations**: 200 requests per minute

***

## POST /chat/stream

Stream chat responses with RAG (Retrieval-Augmented Generation) capabilities.

### Request Parameters

* `message` (string, required) — the user's message or question
* `conversation_id` (string) — unique identifier for conversation continuity; auto-generated if not provided
* `model` (string) — AI model to use for response generation. Options: `gpt-4-turbo`, `gpt-3.5-turbo`, `claude-3-opus`, `claude-3-sonnet`
* `temperature` (number) — response creativity (0.0 to 2.0)
* Maximum response length (1 to 4096)
* `context_sources` (array) — specific knowledge sources to query. Options: `wikipedia`, `academic_papers`, `news`, `documentation`, `all`
* `stream_options` (object) — streaming configuration options:
  * `include_sources` (boolean) — include source citations in the stream
  * `include_metadata` (boolean) — include processing metadata

### Request Example

```bash curl
curl -X POST "https://api.deepwikiopen.com/chat/stream" \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Explain quantum computing and its applications",
    "conversation_id": "conv_123",
    "model": "gpt-4-turbo",
    "temperature": 0.7,
    "context_sources": ["wikipedia", "academic_papers"],
    "stream_options": {
      "include_sources": true,
      "include_metadata": true
    }
  }'
```

```python Python
import requests
import json

def stream_chat(message):
    url = "https://api.deepwikiopen.com/chat/stream"
    headers = {
        "Authorization": "Bearer your-api-key",
        "Content-Type": "application/json"
    }
    data = {
        "message": message,
        "model": "gpt-4-turbo",
        "temperature": 0.7,
        "context_sources": ["wikipedia", "academic_papers"],
        "stream_options": {
            "include_sources": True
        }
    }

    response = requests.post(url, headers=headers, json=data, stream=True)
    for line in response.iter_lines():
        if line:
            decoded_line = line.decode('utf-8')
            if decoded_line.startswith('data: '):
                try:
                    chunk = json.loads(decoded_line[6:])
                    if chunk.get('type') == 'content':
                        print(chunk['content'], end='', flush=True)
                except json.JSONDecodeError:
                    pass

stream_chat("Explain quantum computing and its applications")
```

```javascript JavaScript
async function streamChat(message) {
  const response = await fetch('https://api.deepwikiopen.com/chat/stream', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer your-api-key',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      message: message,
      model: 'gpt-4-turbo',
      temperature: 0.7,
      context_sources: ['wikipedia', 'academic_papers'],
      stream_options: { include_sources: true }
    })
  });

  const reader = response.body.getReader();
  const decoder = new TextDecoder();

  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      const chunk = decoder.decode(value);
      const lines = chunk.split('\n');
      for (const line of lines) {
        if (line.startsWith('data: ')) {
          try {
            const data = JSON.parse(line.slice(6));
            if (data.type === 'content') {
              process.stdout.write(data.content);
            }
          } catch (e) {
            // Skip invalid JSON
          }
        }
      }
    }
  } finally {
    reader.releaseLock();
  }
}

streamChat('Explain quantum computing and its applications');
```

### Streaming Response Format (SSE)

The response is streamed using Server-Sent Events (SSE) format:

* `type` — type of stream chunk. Values: `start`, `content`, `sources`, `metadata`, `end`, `error`
* `content` — text content (for `content` type chunks)
* `sources` — retrieved knowledge sources (for `sources` type chunks):
  * `title` — source document title
  * `url` — source URL or identifier
  * `snippet` — relevant text excerpt
  * `relevance_score` — relevance score (0.0 to 1.0)
* `metadata` — processing metadata (for `metadata` type chunks):
  * `tokens_used` — total tokens consumed
  * `processing_time` — processing time in milliseconds
  * `sources_queried` — number of sources searched

### Example Stream Response

```
data: {"type": "start", "conversation_id": "conv_123", "timestamp": "2024-01-15T10:30:00Z"}

data: {"type": "content", "content": "Quantum computing is a revolutionary"}

data: {"type": "content", "content": " approach to computation that leverages"}

data: {"type": "sources", "sources": [{"title": "Quantum Computing - Wikipedia", "url": "https://en.wikipedia.org/wiki/Quantum_computing", "snippet": "Quantum computing is a type of computation whose operations can harness the phenomena of quantum mechanics...", "relevance_score": 0.92}]}

data: {"type": "content", "content": " quantum mechanical phenomena..."}

data: {"type": "metadata", "metadata": {"tokens_used": 156, "processing_time": 1250, "sources_queried": 5}}

data: {"type": "end", "conversation_id": "conv_123", "total_tokens": 156}
```

***

## POST /chat/research

Trigger deep research mode for comprehensive, multi-source analysis.

### Request Parameters

* `query` (string, required) — research question or topic
* `research_depth` (string) — depth of research analysis. Options: `quick`, `standard`, `deep`, `comprehensive`
* `source_types` (array) — types of sources to include in research. Options: `academic`, `wikipedia`, `news`, `documentation`, `books`, `patents`
* `time_range` (object) — filter sources by publication date:
  * `start_date` — start date (ISO 8601 format)
  * `end_date` — end date (ISO 8601 format)
* `max_sources` (integer) — maximum number of sources to analyze (5 to 100)
* `output_format` (string) — response format structure. Options: `structured`, `narrative`, `bullet_points`, `academic`

### Request Example

```bash curl
curl -X POST "https://api.deepwikiopen.com/chat/research" \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Latest developments in CRISPR gene editing 2024",
    "research_depth": "deep",
    "source_types": ["academic", "news", "patents"],
    "time_range": {
      "start_date": "2024-01-01",
      "end_date": "2024-12-31"
    },
    "max_sources": 50,
    "output_format": "structured"
  }'
```

```python Python
import requests

def deep_research(query):
    url = "https://api.deepwikiopen.com/chat/research"
    headers = {
        "Authorization": "Bearer your-api-key",
        "Content-Type": "application/json"
    }
    data = {
        "query": query,
        "research_depth": "deep",
        "source_types": ["academic", "news", "patents"],
        "time_range": {
            "start_date": "2024-01-01",
            "end_date": "2024-12-31"
        },
        "max_sources": 50,
        "output_format": "structured"
    }
    response = requests.post(url, headers=headers, json=data)
    return response.json()

result = deep_research("Latest developments in CRISPR gene editing 2024")
print(result)
```

```javascript JavaScript
async function deepResearch(query) {
  const response = await fetch('https://api.deepwikiopen.com/chat/research', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer your-api-key',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      query: query,
      research_depth: 'deep',
      source_types: ['academic', 'news', 'patents'],
      time_range: {
        start_date: '2024-01-01',
        end_date: '2024-12-31'
      },
      max_sources: 50,
      output_format: 'structured'
    })
  });
  return await response.json();
}

deepResearch('Latest developments in CRISPR gene editing 2024')
  .then(result => console.log(result));
```

### Response Format

* `research_id` — unique identifier for the research session
* `query` — original research query
* `summary` — executive summary of findings
* `key_findings` — main discoveries and insights:
  * `finding` — key insight or discovery
  * `confidence` — confidence score (0.0 to 1.0)
  * `supporting_sources` — sources supporting this finding
* Comprehensive analysis text
* `sources` — all sources analyzed:
  * `title` — source title
  * `authors` — author names
  * `publication_date` — publication date
  * `url` — source URL
  * `relevance_score` — relevance to query (0.0 to 1.0)
  * `key_points` — important points from source
* Research session metadata: total sources processed, time taken in seconds, overall confidence in findings

***

## GET /chat/history

Retrieve conversation history for a specific conversation or user.
### Query Parameters

* `conversation_id` — specific conversation ID to retrieve
* `user_id` — user ID to get all conversations for
* `limit` — maximum number of messages to return (1 to 1000)
* `offset` — number of messages to skip for pagination
* Filter messages after this date (ISO 8601)
* Filter messages before this date (ISO 8601)

### Request Example

```bash curl
curl -X GET "https://api.deepwikiopen.com/chat/history?conversation_id=conv_123&limit=100" \
  -H "Authorization: Bearer your-api-key"
```

```python Python
import requests

def get_chat_history(conversation_id, limit=50):
    url = "https://api.deepwikiopen.com/chat/history"
    headers = {"Authorization": "Bearer your-api-key"}
    params = {
        "conversation_id": conversation_id,
        "limit": limit
    }
    response = requests.get(url, headers=headers, params=params)
    return response.json()

history = get_chat_history("conv_123", limit=100)
```

```javascript JavaScript
async function getChatHistory(conversationId, limit = 50) {
  const params = new URLSearchParams({
    conversation_id: conversationId,
    limit: limit.toString()
  });

  const response = await fetch(`https://api.deepwikiopen.com/chat/history?${params}`, {
    headers: { 'Authorization': 'Bearer your-api-key' }
  });
  return await response.json();
}

getChatHistory('conv_123', 100);
```

### Response Format

* `conversation_id` — conversation identifier
* `messages` — array of conversation messages:
  * Unique message identifier
  * `role` — message sender role. Values: `user`, `assistant`, `system`
  * `content` — message content
  * Message timestamp (ISO 8601)
  * `metadata` — additional message data: AI model that generated the response, tokens consumed for this message, knowledge sources used
* Pagination information: total messages in the conversation, whether more messages are available, offset for the next page

***

## DELETE /chat/history

Clear conversation history for a specific conversation or user.
### Request Parameters

* `conversation_id` — specific conversation to clear (optional if `user_id` provided)
* `user_id` — clear all conversations for a user (optional if `conversation_id` provided)
* Only clear messages before this date (ISO 8601)

### Request Example

```bash curl
curl -X DELETE "https://api.deepwikiopen.com/chat/history" \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{ "conversation_id": "conv_123" }'
```

```python Python
import requests

def clear_chat_history(conversation_id):
    url = "https://api.deepwikiopen.com/chat/history"
    headers = {
        "Authorization": "Bearer your-api-key",
        "Content-Type": "application/json"
    }
    data = {"conversation_id": conversation_id}
    response = requests.delete(url, headers=headers, json=data)
    return response.json()

result = clear_chat_history("conv_123")
```

```javascript JavaScript
async function clearChatHistory(conversationId) {
  const response = await fetch('https://api.deepwikiopen.com/chat/history', {
    method: 'DELETE',
    headers: {
      'Authorization': 'Bearer your-api-key',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ conversation_id: conversationId })
  });
  return await response.json();
}

clearChatHistory('conv_123');
```

### Response Format

* Whether the operation was successful
* Number of messages deleted
* Number of conversations affected

***

## WebSocket /ws/chat

Real-time bidirectional chat connection with live streaming and collaborative features.
### Connection URL

```
wss://api.deepwikiopen.com/ws/chat?token=your-api-key&conversation_id=conv_123
```

### Query Parameters

* `token` — your API authentication token
* `conversation_id` — conversation ID to connect to (auto-generated if not provided)
* User identifier for multi-user conversations

### Connection Example

```javascript JavaScript
const ws = new WebSocket('wss://api.deepwikiopen.com/ws/chat?token=your-api-key&conversation_id=conv_123');

ws.onopen = function(event) {
  console.log('Connected to chat WebSocket');
  // Send initial message
  ws.send(JSON.stringify({
    type: 'message',
    content: 'Hello, can you help me with quantum computing?',
    model: 'gpt-4-turbo'
  }));
};

ws.onmessage = function(event) {
  const data = JSON.parse(event.data);
  console.log('Received:', data);

  switch (data.type) {
    case 'content':
      // Stream AI response content
      process.stdout.write(data.content);
      break;
    case 'user_message':
      // Another user's message in group chat
      console.log(`${data.user}: ${data.content}`);
      break;
    case 'typing':
      // Typing indicator
      console.log(`${data.user} is typing...`);
      break;
  }
};

ws.onerror = function(error) {
  console.error('WebSocket error:', error);
};

ws.onclose = function(event) {
  console.log('Disconnected from chat WebSocket');
};
```

```python Python
import asyncio
import websockets
import json

async def chat_websocket():
    uri = "wss://api.deepwikiopen.com/ws/chat?token=your-api-key&conversation_id=conv_123"

    async with websockets.connect(uri) as websocket:
        # Send initial message
        await websocket.send(json.dumps({
            "type": "message",
            "content": "Hello, can you help me with quantum computing?",
            "model": "gpt-4-turbo"
        }))

        # Listen for responses
        async for message in websocket:
            data = json.loads(message)
            if data["type"] == "content":
                print(data["content"], end="", flush=True)
            elif data["type"] == "user_message":
                print(f"\n{data['user']}: {data['content']}")
            elif data["type"] == "typing":
                print(f"\n{data['user']} is typing...")

# Run the WebSocket client
asyncio.run(chat_websocket())
```

### Message Types

#### Client-to-Server Messages

* `type` — message type. Values: `message`, `typing`, `stop_generation`, `join_room`, `leave_room`
* `content` — message content (for `message` type)
* `model` — AI model to use (for `message` type)
* `room_id` — room to join/leave (for room operations)

#### Server-to-Client Messages

* `type` — response type. Values: `content`, `sources`, `user_message`, `typing`, `error`, `connection_info`
* `content` — response content or user message
* `user` — username for user messages and typing indicators
* `conversation_id` — current conversation identifier
* `error` — error information (for `error` type): error code and error description

### Real-time Features

#### Typing Indicators

```javascript
// Send typing indicator
ws.send(JSON.stringify({ type: 'typing', is_typing: true }));

// Stop typing indicator
ws.send(JSON.stringify({ type: 'typing', is_typing: false }));
```

#### Multi-user Chat Rooms

```javascript
// Join a chat room
ws.send(JSON.stringify({
  type: 'join_room',
  room_id: 'quantum_physics_discussion'
}));

// Leave a chat room
ws.send(JSON.stringify({
  type: 'leave_room',
  room_id: 'quantum_physics_discussion'
}));
```

#### Stop AI Generation

```javascript
// Stop the AI from generating more content
ws.send(JSON.stringify({ type: 'stop_generation' }));
```

***

## Error Handling

### HTTP Error Codes

* `400` — invalid request parameters or malformed JSON
* `401` — missing or invalid API key
* `403` — insufficient permissions or quota exceeded
* `404` — conversation or resource not found
* `429` — rate limit exceeded
* `500` — server error; try again later

### Error Response Format

```json
{
  "error": {
    "code": "RATE_LIMIT_EXCEEDED",
    "message": "Rate limit exceeded. Try again in 60 seconds.",
    "details": {
      "retry_after": 60,
      "current_usage": 100,
      "limit": 100
    }
  }
}
```

### WebSocket Error Codes

* Authentication token is invalid or expired
* Too many messages sent too quickly
* Message format is invalid or unsupported
* Specified conversation ID doesn't exist

***

## Real Conversation Examples

### Basic Q\&A with Streaming

```bash
# Request
curl -X POST "https://api.deepwikiopen.com/chat/stream" \
  -H "Authorization: Bearer demo-key" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "What causes climate change?",
    "context_sources": ["wikipedia", "academic_papers"]
  }'

# Streaming Response
data: {"type": "start", "conversation_id": "conv_abc123"}
data: {"type": "content", "content": "Climate change is primarily caused by human activities that increase"}
data: {"type": "content", "content": " greenhouse gas concentrations in the atmosphere. The main causes include:"}
data: {"type": "sources", "sources": [{"title": "Climate Change - IPCC Report 2023", "url": "https://www.ipcc.ch/report/ar6/", "snippet": "Human influence has unequivocally warmed the planet...", "relevance_score": 0.95}]}
data: {"type": "content", "content": "\n\n**1. Fossil Fuel Combustion**\nBurning coal, oil, and natural gas..."}
data: {"type": "end", "conversation_id": "conv_abc123", "total_tokens": 342}
```

### Deep Research Mode

```bash
# Research Request
curl -X POST "https://api.deepwikiopen.com/chat/research" \
  -H "Authorization: Bearer demo-key" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Impact of AI on healthcare diagnostics",
    "research_depth": "comprehensive",
    "source_types": ["academic", "news", "documentation"],
    "max_sources": 25
  }'
```

Response (truncated for brevity):

```json
{
  "research_id": "research_xyz789",
  "query": "Impact of AI on healthcare diagnostics",
  "summary": "AI is revolutionizing healthcare diagnostics through machine learning algorithms that can analyze medical images, predict disease outcomes, and assist in early detection with unprecedented accuracy...",
  "key_findings": [
    {
      "finding": "AI diagnostic systems show 94% accuracy in detecting skin cancer vs 86% for dermatologists",
      "confidence": 0.92,
      "supporting_sources": ["Nature Medicine 2024", "JAMA Dermatology 2024"]
    }
  ],
  "sources": [
    {
      "title": "AI in Medical Diagnosis: A Systematic Review",
      "authors": ["Smith, J.", "Chen, L.", "Kumar, R."],
      "publication_date": "2024-03-15",
      "url": "https://doi.org/10.1038/s41591-024-2847-2",
      "relevance_score": 0.96,
      "key_points": [
        "Machine learning models outperform traditional diagnostic methods",
        "Reduced diagnostic errors by 23% in clinical trials"
      ]
    }
  ]
}
```

### WebSocket Multi-user Chat

```javascript
// User A connects and sends a message
const wsA = new WebSocket('wss://api.deepwikiopen.com/ws/chat?token=user-a-key&room_id=ai_discussion');
wsA.send(JSON.stringify({
  type: 'message',
  content: 'What are the latest developments in transformer models?'
}));

// User B receives the message and AI response
// Message from User A: {"type": "user_message", "user": "UserA", "content": "What are..."}
// AI starts responding: {"type": "content", "content": "The latest developments..."}

// User B sends a follow-up
const wsB = new WebSocket('wss://api.deepwikiopen.com/ws/chat?token=user-b-key&room_id=ai_discussion');
wsB.send(JSON.stringify({
  type: 'message',
  content: 'Can you elaborate on attention mechanisms?'
}));
```

## SDK Libraries

For easier integration, use our official SDK libraries:

* **Python**: `pip install deepwikiopen`
* **JavaScript/Node.js**: `npm install deepwikiopen-sdk`
* **Go**: `go get github.com/deepwikiopen/go-sdk`
* **Rust**: `cargo add deepwikiopen`

Each SDK provides typed interfaces, automatic retry logic, streaming support, and WebSocket management.

# Model Endpoints API

Source: https://asyncfunc.mintlify.app/endpoints/model-endpoints

DeepWiki provides a flexible provider-based model selection system that supports multiple LLM providers. This documentation covers the model-related API endpoints and how to work with different model providers.

## Overview

DeepWiki's model provider system allows you to choose from various AI model providers, including:

* **Google** - Gemini models
* **OpenAI** - GPT models
* **OpenRouter** - Access to multiple model providers through a unified API
* **Azure OpenAI** - Azure-hosted OpenAI models
* **Ollama** - Locally running open-source models
* **AWS Bedrock** - Amazon's managed AI models
* **DashScope** - Alibaba's AI models

Each provider offers different models with specific capabilities and pricing. The system is designed to be extensible, allowing service providers to add custom models as needed.
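As a quick local sanity check before calling the API, you can verify which providers have their credentials present in the environment. This is a minimal sketch, not a DeepWiki endpoint; the variable names mirror those listed in the Authentication section below, and the `PROVIDER_ENV_KEYS` mapping is this example's own construction.

```python
import os

# Environment variables each provider expects (per the Authentication section).
PROVIDER_ENV_KEYS = {
    "google": ["GOOGLE_API_KEY"],
    "openai": ["OPENAI_API_KEY"],
    "openrouter": ["OPENROUTER_API_KEY"],
    "azure": ["AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_VERSION"],
    "bedrock": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION"],
    "dashscope": ["DASHSCOPE_API_KEY"],
}

def configured_providers(env=os.environ):
    """Return the providers whose required variables are all set and non-empty."""
    return [
        provider
        for provider, keys in PROVIDER_ENV_KEYS.items()
        if all(env.get(k) for k in keys)
    ]

print(configured_providers())
```

A provider missing from this list will fail validation at `/auth/validate/<provider>`, so checking locally first saves a round trip.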
## Authentication

Before using any model provider, you need to configure the appropriate API keys as environment variables:

```bash
# Google Gemini
GOOGLE_API_KEY=your_google_api_key

# OpenAI
OPENAI_API_KEY=your_openai_api_key

# OpenRouter
OPENROUTER_API_KEY=your_openrouter_api_key

# Azure OpenAI
AZURE_OPENAI_API_KEY=your_azure_openai_api_key
AZURE_OPENAI_ENDPOINT=your_azure_openai_endpoint
AZURE_OPENAI_VERSION=your_azure_openai_version

# AWS Bedrock
AWS_ACCESS_KEY_ID=your_aws_access_key
AWS_SECRET_ACCESS_KEY=your_aws_secret_key
AWS_REGION=your_aws_region

# Ollama (if not local)
OLLAMA_HOST=http://your-ollama-host:11434

# DashScope
DASHSCOPE_API_KEY=your_dashscope_api_key
```

## Endpoints

### Get Model Configuration

Retrieves the available model providers and their supported models.

```http
GET /models/config
```

#### Response

```json
{
  "providers": [
    {
      "id": "google",
      "name": "Google",
      "supportsCustomModel": true,
      "models": [
        { "id": "gemini-2.0-flash", "name": "gemini-2.0-flash" },
        { "id": "gemini-2.5-flash-preview-05-20", "name": "gemini-2.5-flash-preview-05-20" },
        { "id": "gemini-2.5-pro-preview-03-25", "name": "gemini-2.5-pro-preview-03-25" }
      ]
    },
    {
      "id": "openai",
      "name": "Openai",
      "supportsCustomModel": true,
      "models": [
        { "id": "gpt-4o", "name": "gpt-4o" },
        { "id": "gpt-4.1", "name": "gpt-4.1" },
        { "id": "o1", "name": "o1" },
        { "id": "o3", "name": "o3" },
        { "id": "o4-mini", "name": "o4-mini" }
      ]
    }
  ],
  "defaultProvider": "google"
}
```

#### Example Requests

**cURL:**

```bash
curl -X GET "http://localhost:8001/models/config" \
  -H "Accept: application/json"
```

**Python:**

```python
import requests

response = requests.get("http://localhost:8001/models/config")
config = response.json()

# List all providers
for provider in config["providers"]:
    print(f"Provider: {provider['name']}")
    for model in provider["models"]:
        print(f"  - {model['id']}")
```

**JavaScript:**

```javascript
const response = await fetch('http://localhost:8001/models/config');
const config = await response.json();

// Get available models for a specific provider
const googleModels = config.providers
  .find(p => p.id === 'google')
  ?.models || [];
```

### Using Models in Chat Completions

Model selection is integrated into the chat completions endpoint: you specify the provider and model when making requests.

```http
POST /chat/completions/stream
```

#### Request Body

```json
{
  "repo_url": "https://github.com/user/repo",
  "messages": [
    {
      "role": "user",
      "content": "Explain the main functionality of this repository"
    }
  ],
  "provider": "google",
  "model": "gemini-2.0-flash",
  "language": "en",
  "token": "optional_github_token_for_private_repos"
}
```

#### Parameters

| Parameter  | Type   | Required | Description |
| ---------- | ------ | -------- | ----------- |
| `repo_url` | string | Yes | URL of the repository to analyze |
| `messages` | array  | Yes | Array of chat messages |
| `provider` | string | No | Model provider ID (default: "google") |
| `model`    | string | No | Model ID for the specified provider (uses provider's default if not specified) |
| `language` | string | No | Language for content generation (default: "en") |
| `token`    | string | No | Personal access token for private repositories |
| `type`     | string | No | Repository type: "github", "gitlab", or "bitbucket" (default: "github") |

#### Example Requests

**cURL with Google Gemini:**

```bash
curl -X POST "http://localhost:8001/chat/completions/stream" \
  -H "Content-Type: application/json" \
  -d '{
    "repo_url": "https://github.com/asyncfuncai/deepwiki-open",
    "messages": [
      {
        "role": "user",
        "content": "What is the main purpose of this project?"
      }
    ],
    "provider": "google",
    "model": "gemini-2.0-flash"
  }'
```

**Python with OpenAI:**

```python
import requests
import json

url = "http://localhost:8001/chat/completions/stream"
data = {
    "repo_url": "https://github.com/asyncfuncai/deepwiki-open",
    "messages": [
        {
            "role": "user",
            "content": "Explain the architecture of this application"
        }
    ],
    "provider": "openai",
    "model": "gpt-4o"
}

response = requests.post(url, json=data, stream=True)
for line in response.iter_lines():
    if line:
        print(line.decode('utf-8'))
```

**JavaScript with OpenRouter:**

```javascript
const response = await fetch('http://localhost:8001/chat/completions/stream', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    repo_url: 'https://github.com/asyncfuncai/deepwiki-open',
    messages: [
      {
        role: 'user',
        content: 'What are the key features of this repository?'
      }
    ],
    provider: 'openrouter',
    model: 'anthropic/claude-3.5-sonnet'
  })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  console.log(decoder.decode(value));
}
```

## Model Provider Details

### Google (Gemini)

Default provider with fast and capable models.

**Available Models:**

* `gemini-2.0-flash` - Fast, efficient model (default)
* `gemini-2.5-flash-preview-05-20` - Preview of upcoming flash model
* `gemini-2.5-pro-preview-03-25` - Preview of pro model

**Configuration:**

```json
{
  "provider": "google",
  "model": "gemini-2.0-flash"
}
```

### OpenAI

Industry-standard GPT models.

**Available Models:**

* `gpt-4o` - Latest GPT-4 model (default)
* `gpt-4.1` - Updated GPT-4 version
* `o1` - Reasoning model
* `o3` - Advanced model
* `o4-mini` - Smaller, faster model

**Configuration:**

```json
{
  "provider": "openai",
  "model": "gpt-4o"
}
```

### OpenRouter

Access multiple model providers through a unified API.
**Available Models:** * `openai/gpt-4o` - OpenAI GPT-4 (default) * `deepseek/deepseek-r1` - DeepSeek reasoning model * `anthropic/claude-3.7-sonnet` - Claude 3.7 Sonnet * `anthropic/claude-3.5-sonnet` - Claude 3.5 Sonnet * And many more... **Configuration:** ```json { "provider": "openrouter", "model": "anthropic/claude-3.5-sonnet" } ``` ### Azure OpenAI Azure-hosted OpenAI models with enterprise features. **Available Models:** * `gpt-4o` - GPT-4 on Azure (default) * `gpt-4` - Standard GPT-4 * `gpt-35-turbo` - GPT-3.5 Turbo * `gpt-4-turbo` - GPT-4 Turbo **Configuration:** ```json { "provider": "azure", "model": "gpt-4o" } ``` **Note:** Requires Azure OpenAI endpoint and API version configuration. ### Ollama Run models locally for privacy and cost efficiency. **Available Models:** * `qwen3:1.7b` - Small, fast model (default) * `llama3:8b` - Llama 3 8B model * `qwen3:8b` - Qwen 3 8B model **Configuration:** ```json { "provider": "ollama", "model": "llama3:8b" } ``` **Note:** Requires Ollama to be running locally or accessible via OLLAMA\_HOST. ### AWS Bedrock Amazon's managed AI service. **Available Models:** * `anthropic.claude-3-sonnet-20240229-v1:0` - Claude 3 Sonnet (default) * `anthropic.claude-3-haiku-20240307-v1:0` - Claude 3 Haiku * `anthropic.claude-3-opus-20240229-v1:0` - Claude 3 Opus * `amazon.titan-text-express-v1` - Amazon Titan * `cohere.command-r-v1:0` - Cohere Command R * `ai21.j2-ultra-v1` - AI21 Jurassic **Configuration:** ```json { "provider": "bedrock", "model": "anthropic.claude-3-sonnet-20240229-v1:0" } ``` ### DashScope Alibaba's AI models. **Available Models:** * `qwen-plus` - Qwen Plus (default) * `qwen-turbo` - Qwen Turbo * `deepseek-r1` - DeepSeek R1 **Configuration:** ```json { "provider": "dashscope", "model": "qwen-plus" } ``` ## Custom Models Providers that support custom models (where `supportsCustomModel: true`) allow you to specify model IDs not listed in the predefined options. 
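Before sending a request, a client can check a provider/model selection against the structure returned by `GET /models/config`, treating `supportsCustomModel` as permission to pass unlisted model IDs. This is a minimal sketch; `validate_selection` is an illustrative helper, not part of the DeepWiki API itself:

```python
def validate_selection(config: dict, provider_id: str, model_id: str) -> bool:
    """True if the model is listed for the provider, or if the
    provider accepts custom model IDs (supportsCustomModel)."""
    for provider in config.get("providers", []):
        if provider["id"] == provider_id:
            listed = any(m["id"] == model_id for m in provider.get("models", []))
            return listed or provider.get("supportsCustomModel", False)
    return False  # unknown provider

# Abbreviated /models/config payload, mirroring the example earlier in this page
config = {
    "providers": [
        {"id": "google", "supportsCustomModel": True,
         "models": [{"id": "gemini-2.0-flash"}]},
    ],
    "defaultProvider": "google",
}

print(validate_selection(config, "google", "gemini-2.0-flash"))   # True
print(validate_selection(config, "google", "my-finetuned-model"))  # True (custom allowed)
print(validate_selection(config, "bogus", "gpt-4o"))               # False
```

Running this check client-side avoids a round trip that would otherwise fail with a 404 "Model not found for provider" error.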
This is useful for: * Newly released models * Fine-tuned models * Private or custom deployments **Example with custom model:** ```json { "provider": "openai", "model": "ft:gpt-3.5-turbo-0125:custom:model:id" } ``` ## Error Handling The API returns standard HTTP status codes and error messages. ### Common Errors **400 Bad Request:** ```json { "detail": "No messages provided" } ``` **401 Unauthorized:** ```json { "detail": "Invalid API key for provider" } ``` **404 Not Found:** ```json { "detail": "Model not found for provider" } ``` **500 Internal Server Error:** ```json { "detail": "Error preparing retriever: No valid document embeddings found" } ``` ### Error Handling Examples **Python:** ```python try: response = requests.post(url, json=data) response.raise_for_status() result = response.json() except requests.exceptions.HTTPError as e: if e.response.status_code == 400: print(f"Bad request: {e.response.json()['detail']}") elif e.response.status_code == 500: print(f"Server error: {e.response.json()['detail']}") ``` **JavaScript:** ```javascript try { const response = await fetch(url, options); if (!response.ok) { const error = await response.json(); throw new Error(error.detail); } const data = await response.json(); } catch (error) { console.error('API Error:', error.message); } ``` ## Rate Limiting Rate limiting depends on the model provider being used: * **Google Gemini**: Subject to Google AI Studio quotas * **OpenAI**: Based on your OpenAI tier and usage * **OpenRouter**: Depends on the specific model and your OpenRouter credits * **Azure OpenAI**: Based on your Azure deployment quotas * **Ollama**: Limited by local hardware resources * **AWS Bedrock**: Subject to AWS service quotas * **DashScope**: Based on Alibaba Cloud quotas It's recommended to implement retry logic with exponential backoff for production applications. ## Best Practices 1. 
**Model Selection**: Choose models based on your specific needs: * Use faster models (e.g., `gemini-2.0-flash`, `gpt-4o-mini`) for simple queries * Use more capable models (e.g., `gpt-4o`, `claude-3.5-sonnet`) for complex analysis 2. **Error Handling**: Always implement proper error handling for API calls 3. **Streaming**: The chat endpoint supports streaming responses for better user experience 4. **Caching**: DeepWiki automatically caches wiki generation results to improve performance 5. **Security**: Never expose API keys in client-side code; use environment variables 6. **Cost Optimization**: Monitor usage and costs, especially with premium models ## Configuration Files DeepWiki uses JSON configuration files to manage model settings: * `api/config/generator.json` - Model provider configurations * `api/config/embedder.json` - Embedding model settings * `api/config/repo.json` - Repository processing settings You can customize these files or use the `DEEPWIKI_CONFIG_DIR` environment variable to specify a custom configuration directory. # API Overview Source: https://asyncfunc.mintlify.app/endpoints/overview Complete guide to the DeepWiki-Open REST API for programmatic wiki generation and repository analysis The DeepWiki-Open API provides programmatic access to all wiki generation and repository analysis features. Built with FastAPI, it offers high-performance endpoints for integration into your development workflows. ## API Base URL Base URL: `http://localhost:8001` (development) For production deployments, replace with your actual API server URL. ## API Architecture The DeepWiki API is organized into several key areas: Generate comprehensive documentation wikis from repository URLs Interactive Q\&A with repository content using RAG Configure and manage AI model providers Real-time streaming for generation progress and chat ## Quick Start ### Authentication Most endpoints require authentication via environment-configured API keys. 
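As a quick local check, a client can confirm that the environment variables listed in the authentication section are set for at least one provider before making requests. This is a sketch under stated assumptions: the mapping covers only a few providers, and `configured_providers` is a hypothetical helper, not part of the API:

```python
import os

# Required environment variables per provider (subset, from the docs above)
PROVIDER_ENV_KEYS = {
    "google": ["GOOGLE_API_KEY"],
    "openai": ["OPENAI_API_KEY"],
    "openrouter": ["OPENROUTER_API_KEY"],
    "azure": ["AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT",
              "AZURE_OPENAI_VERSION"],
}

def configured_providers(env=None):
    """Return provider IDs whose required variables are all non-empty."""
    env = os.environ if env is None else env
    return [p for p, keys in PROVIDER_ENV_KEYS.items()
            if all(env.get(k) for k in keys)]

print(configured_providers({"GOOGLE_API_KEY": "abc"}))  # ['google']
```

Note that Azure only counts as configured when all three of its variables are present, matching the environment setup shown earlier.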
The API validates your configured providers automatically. ```bash Test API Health curl -X GET "http://localhost:8001/health" ``` ```json Response { "status": "healthy", "version": "0.1.0", "timestamp": "2024-01-15T10:30:00Z" } ``` ### Basic Wiki Generation ```bash cURL curl -X POST "http://localhost:8001/wiki/generate" \ -H "Content-Type: application/json" \ -d '{ "repo_url": "https://github.com/microsoft/TypeScript-Node-Starter", "model_provider": "google", "model_name": "gemini-2.0-flash", "force_regenerate": false }' ``` ```python Python import requests response = requests.post("http://localhost:8001/wiki/generate", json={ "repo_url": "https://github.com/microsoft/TypeScript-Node-Starter", "model_provider": "google", "model_name": "gemini-2.0-flash", "force_regenerate": False }) wiki_data = response.json() ``` ```javascript JavaScript const response = await fetch('http://localhost:8001/wiki/generate', { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify({ repo_url: 'https://github.com/microsoft/TypeScript-Node-Starter', model_provider: 'google', model_name: 'gemini-2.0-flash', force_regenerate: false }) }); const wikiData = await response.json(); ``` ```json Success Response { "success": true, "wiki_id": "wiki_1234567890", "repo_info": { "url": "https://github.com/microsoft/TypeScript-Node-Starter", "name": "TypeScript-Node-Starter", "owner": "microsoft", "platform": "github" }, "generation_status": "completed", "wiki_pages": [ { "id": "overview", "title": "Project Overview", "content": "TypeScript Node Starter is...", "importance": "high", "filePaths": ["README.md", "package.json"], "relatedPages": ["setup", "architecture"] } ], "metadata": { "generated_at": "2024-01-15T10:30:00Z", "model_used": "gemini-2.0-flash", "processing_time": 45.2, "total_files_analyzed": 127 } } ``` ## Core Endpoints ### Wiki Generation Generate a comprehensive wiki from a repository URL. 
**Request Body:** * `repo_url` (string, required): Repository URL * `model_provider` (string): AI provider (`google`, `openai`, `openrouter`, `azure`, `ollama`) * `model_name` (string): Specific model to use * `force_regenerate` (boolean): Force regeneration even if cached * `access_token` (string): Repository access token for private repos * `auth_code` (string): Authorization code (if auth mode enabled) **Response:** Complete wiki structure with pages and metadata List all processed repositories and their wiki status. **Query Parameters:** * `limit` (integer): Number of results to return * `offset` (integer): Pagination offset **Response:** Array of processed projects with generation status Retrieve a specific generated wiki by project ID. **Path Parameters:** * `project_id` (string): Unique project identifier **Response:** Complete wiki data including all pages ### Chat & RAG Stream chat responses using RAG on repository content. **Request Body:** * `message` (string, required): User question * `repo_url` (string, required): Repository URL for context * `conversation_history` (array): Previous conversation messages * `model_provider` (string): AI provider for responses * `deep_research` (boolean): Enable multi-turn research mode **Response:** Server-sent events with streaming chat responses ### Model Configuration Get available model providers and configurations. **Response:** Available providers, models, and their parameters Validate API keys and model availability. 
**Request Body:** * `provider` (string): Provider to validate * `model_name` (string): Specific model to test **Response:** Validation status and model information ## Data Models ### WikiPage * `id` (string): Unique identifier for the wiki page * `title` (string): Human-readable page title * `content` (string): Full page content in Markdown format with Mermaid diagrams * `filePaths` (array): Source file paths that contributed to this page * `importance` (string): Page importance level: `high`, `medium`, or `low` * `relatedPages` (array): Array of related page IDs for cross-references ### RepoInfo * `url` (string): Full repository URL * `name` (string): Repository name * `owner` (string): Repository owner/organization * `platform` (string): Git platform: `github`, `gitlab`, or `bitbucket` * Whether the repository is private * Default branch name (usually `main` or `master`) ## Error Handling The API uses standard HTTP status codes and returns detailed error information: ```json Error Response { "error": { "code": "INVALID_REPOSITORY", "message": "Repository not found or not accessible", "details": { "repo_url": "https://github.com/invalid/repo", "status_code": 404, "suggestion": "Verify the repository URL and access permissions" }, "timestamp": "2024-01-15T10:30:00Z" } } ``` ### Common Error Codes **Common causes:** * Invalid repository URL format * Missing required parameters * Invalid model provider/name combination **Example:** ```json { "error": { "code": "VALIDATION_ERROR", "message": "Invalid repository URL format", "details": { "field": "repo_url", "provided": "not-a-valid-url", "expected": "https://github.com/owner/repo" } } } ``` **Common causes:** * Missing or invalid API keys * Repository access token required but not provided * Invalid authorization code **Example:** ```json { "error": { "code": "API_KEY_INVALID", "message": "Google API key is invalid or expired", "details": { "provider": "google", "suggestion": "Check your GOOGLE_API_KEY environment variable" } } } ``` **Common causes:** * Repository doesn't exist * Repository is private and requires access token * Wiki not found for the given project ID **Example:** ```json { "error": { "code": 
"REPOSITORY_NOT_FOUND", "message": "Repository not accessible", "details": { "repo_url": "https://github.com/private/repo", "suggestion": "Provide access_token for private repositories" } } } ``` **Common causes:** * API rate limits exceeded * AI provider rate limits reached **Example:** ```json { "error": { "code": "RATE_LIMIT_EXCEEDED", "message": "OpenAI API rate limit exceeded", "details": { "provider": "openai", "retry_after": 60, "suggestion": "Wait 60 seconds before retrying or use a different provider" } } } ``` **Common causes:** * AI model generation failures * Repository processing errors * Configuration issues **Example:** ```json { "error": { "code": "GENERATION_FAILED", "message": "Wiki generation failed due to model error", "details": { "model": "gpt-4o", "stage": "content_generation", "suggestion": "Try a different model or contact support" } } } ``` ## Rate Limits Rate limits depend on your AI provider's limits. DeepWiki doesn't impose additional rate limits. ### Provider Rate Limits * **GPT-4**: 500 requests/minute, 30,000 tokens/minute * **GPT-3.5**: 3,500 requests/minute, 90,000 tokens/minute * Varies by usage tier and model * **Free tier**: 15 requests/minute, 32,000 tokens/minute * **Paid tier**: 360 requests/minute, 120,000 tokens/minute * Varies by model and payment plan * Some models have per-request costs * Check [OpenRouter pricing](https://openrouter.ai/docs#limits) * Configurable limits per deployment * Enterprise-grade quotas available * Contact Microsoft for high-volume needs ## WebSocket API For real-time updates during wiki generation and chat: ```javascript WebSocket Connection const ws = new WebSocket('ws://localhost:8001/ws/wiki/generate'); ws.onopen = () => { ws.send(JSON.stringify({ repo_url: 'https://github.com/microsoft/vscode', model_provider: 'google', model_name: 'gemini-2.0-flash' })); }; ws.onmessage = (event) => { const data = JSON.parse(event.data); switch(data.type) { case 'progress': console.log(`Progress: 
${data.progress}%`); break; case 'page_generated': console.log(`Generated page: ${data.page.title}`); break; case 'completed': console.log('Wiki generation completed!'); break; case 'error': console.error('Error:', data.error); break; } }; ``` ## SDK Examples ### Python SDK Usage ```python import asyncio import aiohttp import json class DeepWikiClient: def __init__(self, base_url="http://localhost:8001"): self.base_url = base_url async def generate_wiki(self, repo_url, model_provider="google", model_name="gemini-2.0-flash"): async with aiohttp.ClientSession() as session: payload = { "repo_url": repo_url, "model_provider": model_provider, "model_name": model_name } async with session.post( f"{self.base_url}/wiki/generate", json=payload ) as response: return await response.json() async def ask_question(self, question, repo_url): async with aiohttp.ClientSession() as session: payload = { "message": question, "repo_url": repo_url } async with session.post( f"{self.base_url}/chat/stream", json=payload ) as response: async for line in response.content: yield json.loads(line.decode()) # Usage async def main(): client = DeepWikiClient() # Generate wiki wiki = await client.generate_wiki( "https://github.com/fastapi/fastapi" ) print(f"Generated {len(wiki['wiki_pages'])} pages") # Ask questions async for response in client.ask_question( "How does FastAPI handle dependency injection?", "https://github.com/fastapi/fastapi" ): print(response['content'], end='', flush=True) asyncio.run(main()) ``` ### Node.js SDK Usage ```javascript class DeepWikiClient { constructor(baseUrl = 'http://localhost:8001') { this.baseUrl = baseUrl; } async generateWiki(repoUrl, options = {}) { const { modelProvider = 'google', modelName = 'gemini-2.0-flash', forceRegenerate = false } = options; const response = await fetch(`${this.baseUrl}/wiki/generate`, { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify({ repo_url: repoUrl, model_provider: modelProvider, 
model_name: modelName, force_regenerate: forceRegenerate }) }); if (!response.ok) { throw new Error(`API Error: ${response.status}`); } return await response.json(); } async *askQuestion(question, repoUrl) { const response = await fetch(`${this.baseUrl}/chat/stream`, { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify({ message: question, repo_url: repoUrl }) }); const reader = response.body.getReader(); const decoder = new TextDecoder(); while (true) { const { done, value } = await reader.read(); if (done) break; const chunk = decoder.decode(value); const lines = chunk.split('\n').filter(line => line.trim()); for (const line of lines) { if (line.startsWith('data: ')) { try { yield JSON.parse(line.slice(6)); } catch (e) { console.warn('Failed to parse SSE data:', line); } } } } } } // Usage const client = new DeepWikiClient(); async function example() { try { // Generate wiki const wiki = await client.generateWiki( 'https://github.com/expressjs/express' ); console.log(`Generated ${wiki.wiki_pages.length} pages`); // Ask question with streaming response console.log('\nAsking question...'); for await (const chunk of client.askQuestion( 'Explain Express.js middleware system', 'https://github.com/expressjs/express' )) { if (chunk.type === 'content') { process.stdout.write(chunk.content); } } } catch (error) { console.error('Error:', error.message); } } example(); ``` ## Next Steps Detailed wiki generation and management endpoints Interactive chat and RAG endpoints for repository Q\&A Real-time streaming APIs for live updates API authentication and security configuration # WebSocket API Source: https://asyncfunc.mintlify.app/endpoints/websocket-api Real-time streaming API for chat completions and wiki generation # WebSocket API DeepWiki uses WebSocket connections to provide real-time streaming responses for chat completions and wiki generation. 
This enables a more responsive user experience with lower latency compared to traditional HTTP streaming. ## Overview The WebSocket API replaces HTTP streaming endpoints with persistent WebSocket connections that: * Stream AI responses in real-time as they're generated * Provide immediate feedback for long-running operations * Support bidirectional communication for future enhancements * Automatically fall back to HTTP if WebSocket connection fails ## Connection Establishment ### Endpoint ``` ws://localhost:8001/ws/chat ``` For production deployments with SSL: ``` wss://your-domain.com/ws/chat ``` ### Client Connection Example ```typescript // Convert HTTP URL to WebSocket URL const getWebSocketUrl = () => { const baseUrl = process.env.SERVER_BASE_URL || 'http://localhost:8001'; // Replace http:// with ws:// or https:// with wss:// const wsBaseUrl = baseUrl.replace(/^http/, 'ws'); return `${wsBaseUrl}/ws/chat`; }; // Create WebSocket connection const ws = new WebSocket(getWebSocketUrl()); ws.onopen = () => { console.log('WebSocket connection established'); // Send the request after connection is open ws.send(JSON.stringify(requestData)); }; ``` ## Message Formats ### Request Format After establishing the connection, send a JSON message with the following structure: ```typescript interface ChatCompletionRequest { repo_url: string; // Repository URL (GitHub, GitLab, BitBucket) messages: ChatMessage[]; // Conversation history filePath?: string; // Optional: Focus on specific file token?: string; // Optional: Access token for private repos type?: string; // Repository type: 'github' | 'gitlab' | 'bitbucket' provider?: string; // AI provider: 'google' | 'openai' | 'openrouter' | 'ollama' | 'azure' model?: string; // Model name for the provider language?: string; // Response language: 'en' | 'ja' | 'zh' | 'es' | 'kr' | 'vi' excluded_dirs?: string; // Newline-separated list of directories to exclude excluded_files?: string; // Newline-separated list of file patterns to 
exclude included_dirs?: string; // Newline-separated list of directories to include included_files?: string; // Newline-separated list of file patterns to include } interface ChatMessage { role: 'user' | 'assistant' | 'system'; content: string; } ``` ### Response Format The server streams text responses as they're generated: ```typescript // Each WebSocket message contains a text chunk ws.onmessage = (event) => { const textChunk = event.data; // Plain text chunk console.log('Received:', textChunk); // Append to the full response fullResponse += textChunk; }; ``` ## Event Types ### Connection Events ```typescript // Connection established ws.onopen = (event) => { console.log('Connected to DeepWiki WebSocket'); // Send your request here }; // Connection closed ws.onclose = (event) => { console.log('WebSocket connection closed'); // Handle completion or reconnection logic }; // Connection error ws.onerror = (error) => { console.error('WebSocket error:', error); // Fall back to HTTP streaming }; ``` ### Message Flow 1. **Client connects** to WebSocket endpoint 2. **Client sends** JSON request after connection opens 3. **Server streams** text responses 4. **Server closes** connection when complete 5. 
**Client handles** close event ## Streaming Features ### Wiki Generation For wiki page generation, the WebSocket streams Markdown content in real-time: ```typescript const generateWikiPage = async (page: WikiPage) => { const ws = new WebSocket(wsUrl); await new Promise((resolve, reject) => { ws.onopen = () => { ws.send(JSON.stringify({ repo_url: repoUrl, messages: [{ role: 'user', content: generatePagePrompt(page) }], provider: selectedProvider, model: selectedModel, language: language })); }; let content = ''; ws.onmessage = (event) => { content += event.data; // Update UI with streaming content updatePageContent(content); }; ws.onclose = () => resolve(content); ws.onerror = (error) => reject(error); }); }; ``` ### Chat/Ask Feature The Ask feature uses WebSocket for real-time streaming with support for: * **Conversation History**: Maintains context across multiple questions * **Deep Research Mode**: Multi-turn research with automatic continuation * **File Context**: Include specific file content in queries ```typescript // Example: Deep Research request const deepResearchRequest = { repo_url: 'https://github.com/user/repo', messages: [ { role: 'user', content: '[DEEP RESEARCH] How does the authentication system work?' 
} ], provider: 'google', model: 'gemini-2.0-flash', language: 'en' }; ws.send(JSON.stringify(deepResearchRequest)); ``` ## Connection Lifecycle ### Automatic Closure The server automatically closes the WebSocket connection after: * Completing the response stream * Encountering an error * Detecting client disconnect ### Reconnection Strategy ```typescript const createResilientWebSocket = ( request: ChatCompletionRequest, maxRetries: number = 3 ) => { let retryCount = 0; const connect = () => { const ws = new WebSocket(getWebSocketUrl()); ws.onerror = (error) => { if (retryCount < maxRetries) { retryCount++; console.log(`Retrying connection (${retryCount}/${maxRetries})...`); setTimeout(connect, 1000 * 2 ** retryCount); // Exponential backoff: 2s, 4s, 8s } else { console.error('Max retries reached, falling back to HTTP'); fallbackToHttp(request); } }; ws.onopen = () => { retryCount = 0; // Reset on successful connection ws.send(JSON.stringify(request)); }; return ws; }; return connect(); }; ``` ## Error Handling ### Server-Side Errors The server sends error messages as text before closing the connection: ```typescript ws.onmessage = (event) => { const message = event.data; if (message.startsWith('Error:')) { // Handle error message console.error('Server error:', message); // Error types: // - "Error: No valid document embeddings found..." // - "Error: No messages provided" // - "Error: Last message must be from the user" // - "Error preparing retriever: ..." 
} else { // Handle normal response processResponse(message); } }; ``` ### Client-Side Error Handling ```typescript const handleWebSocketError = (error: Event) => { console.error('WebSocket error:', error); // Fallback to HTTP streaming return fetch('/api/chat/stream', { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify(request) }); }; ``` ## Status Codes WebSocket connections use standard close codes: | Code | Status | Description | | ---- | ---------------- | ------------------------------ | | 1000 | Normal Closure | Request completed successfully | | 1001 | Going Away | Server is shutting down | | 1002 | Protocol Error | Invalid message format | | 1003 | Unsupported Data | Invalid request data | | 1006 | Abnormal Closure | Connection lost unexpectedly | | 1011 | Internal Error | Server encountered an error | ## Security Considerations ### Authentication For private repositories, include the access token in the request: ```typescript const secureRequest = { repo_url: 'https://github.com/org/private-repo', token: 'ghp_xxxxxxxxxxxx', // GitHub personal access token type: 'github', messages: [...] }; ``` ### Connection Security 1. **Use WSS in Production**: Always use `wss://` (WebSocket Secure) in production 2. **Token Validation**: Tokens are validated server-side before accessing repositories 3. **Origin Validation**: Consider implementing origin checks for CORS security 4. 
**Rate Limiting**: Implement connection rate limiting to prevent abuse ### Example Security Headers ```typescript // Server-side WebSocket upgrade with security headers app.add_websocket_route("/ws/chat", handle_websocket_chat, { headers: { 'X-Content-Type-Options': 'nosniff', 'X-Frame-Options': 'DENY', 'X-XSS-Protection': '1; mode=block' } }); ``` ## Complete Client Implementation Example ```typescript import { ChatCompletionRequest } from '@/types/chat'; class DeepWikiWebSocketClient { private ws: WebSocket | null = null; private messageBuffer: string = ''; async streamCompletion( request: ChatCompletionRequest, onChunk: (chunk: string) => void, onError: (error: Error) => void, onComplete: () => void ): Promise { return new Promise((resolve, reject) => { try { // Close existing connection if any this.close(); // Create new WebSocket connection this.ws = new WebSocket(this.getWebSocketUrl()); // Set binary type for potential future binary support this.ws.binaryType = 'arraybuffer'; // Connection opened this.ws.onopen = () => { console.log('WebSocket connection established'); this.ws!.send(JSON.stringify(request)); }; // Message received this.ws.onmessage = (event) => { try { const chunk = event.data; this.messageBuffer += chunk; onChunk(chunk); } catch (error) { console.error('Error processing message:', error); onError(error as Error); } }; // Connection closed this.ws.onclose = (event) => { console.log('WebSocket closed:', event.code, event.reason); onComplete(); resolve(); }; // Connection error this.ws.onerror = (error) => { console.error('WebSocket error:', error); onError(new Error('WebSocket connection failed')); reject(error); }; // Set timeout for connection setTimeout(() => { if (this.ws?.readyState === WebSocket.CONNECTING) { this.close(); reject(new Error('WebSocket connection timeout')); } }, 5000); } catch (error) { reject(error); } }); } private getWebSocketUrl(): string { const baseUrl = process.env.NEXT_PUBLIC_SERVER_BASE_URL || 
'http://localhost:8001'; return baseUrl.replace(/^http/, 'ws') + '/ws/chat'; } close(): void { if (this.ws && this.ws.readyState === WebSocket.OPEN) { this.ws.close(1000, 'Client closing connection'); } this.ws = null; this.messageBuffer = ''; } getFullResponse(): string { return this.messageBuffer; } } // Usage example const client = new DeepWikiWebSocketClient(); await client.streamCompletion( { repo_url: 'https://github.com/user/repo', messages: [{ role: 'user', content: 'Explain the main functionality' }], provider: 'google', model: 'gemini-2.0-flash', language: 'en' }, (chunk) => { // Update UI with streaming chunk setResponse(prev => prev + chunk); }, (error) => { console.error('Streaming error:', error); // Fall back to HTTP }, () => { console.log('Streaming complete'); // Enable UI for next question } ); ``` ## Performance Considerations ### Advantages over HTTP Streaming 1. **Lower Latency**: No HTTP overhead for each chunk 2. **Bidirectional**: Enables future features like progress updates 3. **Connection Reuse**: Single connection for entire session 4. **Binary Support**: Can handle binary data if needed ### Best Practices 1. **Implement Fallback**: Always have HTTP streaming as fallback 2. **Handle Disconnects**: Gracefully handle unexpected disconnections 3. **Buffer Management**: Clear buffers after each completion 4. **Resource Cleanup**: Close connections when component unmounts ```typescript // React cleanup example useEffect(() => { return () => { client.close(); // Clean up WebSocket on unmount }; }, []); ``` ## Troubleshooting ### Common Issues 1. **Connection Refused** * Check if the API server is running on port 8001 * Verify WebSocket endpoint URL is correct * Check for proxy/firewall blocking WebSocket connections 2. **Immediate Disconnect** * Verify request JSON format is valid * Check for missing required fields * Ensure messages array is not empty 3. 
**No Response** * Check server logs for errors * Verify model provider credentials are set * Ensure repository URL is accessible ### Debug Logging Enable detailed logging for troubleshooting: ```typescript const debugWebSocket = (ws: WebSocket) => { ws.addEventListener('open', (e) => console.log('WS Open:', e)); ws.addEventListener('message', (e) => console.log('WS Message:', e.data)); ws.addEventListener('error', (e) => console.log('WS Error:', e)); ws.addEventListener('close', (e) => console.log('WS Close:', e.code, e.reason)); }; ``` ## Future Enhancements The WebSocket infrastructure enables future real-time features: * Progress indicators for long operations * Cancel/interrupt ongoing generation * Real-time collaboration features * Live repository updates * Streaming file analysis * Interactive debugging sessions # Wiki Generation API Source: https://asyncfunc.mintlify.app/endpoints/wiki-endpoints Complete API reference for DeepWikiOpen wiki generation and management endpoints ## Overview The DeepWikiOpen API provides endpoints for generating comprehensive documentation from repositories using AI-powered analysis. All endpoints require authentication and support JSON request/response formats. **Base URL**: `https://api.deepwikiopen.com/v1` **Authentication**: All endpoints require an API key in the Authorization header. *** ## POST /wiki/generate Generate comprehensive wiki documentation from a repository URL or uploaded codebase. 
The GitHub repository URL to analyze and generate documentation for Custom name for the project (defaults to repository name) Generation configuration options Include code examples in generated documentation Maximum directory depth to analyze (1-10) File patterns to exclude from analysis (e.g., \["*.test.js", "node\_modules/*"]) Include private/internal functions in documentation Output format: "markdown", "html", or "json" Optional webhook URL for completion notifications ### Request Example ```bash curl curl -X POST "https://api.deepwikiopen.com/v1/wiki/generate" \ -H "Authorization: Bearer YOUR_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "repository_url": "https://github.com/username/awesome-project", "project_name": "Awesome Project", "config": { "include_code_examples": true, "max_depth": 3, "exclude_patterns": ["*.test.js", "dist/*"], "output_format": "markdown" }, "webhook_url": "https://your-domain.com/webhooks/wiki-complete" }' ``` ```python Python import requests url = "https://api.deepwikiopen.com/v1/wiki/generate" headers = { "Authorization": "Bearer YOUR_API_KEY", "Content-Type": "application/json" } data = { "repository_url": "https://github.com/username/awesome-project", "project_name": "Awesome Project", "config": { "include_code_examples": True, "max_depth": 3, "exclude_patterns": ["*.test.js", "dist/*"], "output_format": "markdown" } } response = requests.post(url, headers=headers, json=data) result = response.json() print(f"Task ID: {result['task_id']}") ``` ```javascript JavaScript const response = await fetch('https://api.deepwikiopen.com/v1/wiki/generate', { method: 'POST', headers: { 'Authorization': 'Bearer YOUR_API_KEY', 'Content-Type': 'application/json' }, body: JSON.stringify({ repository_url: 'https://github.com/username/awesome-project', project_name: 'Awesome Project', config: { include_code_examples: true, max_depth: 3, exclude_patterns: ['*.test.js', 'dist/*'], output_format: 'markdown' } }) }); const result = await 
response.json(); console.log('Task ID:', result.task_id); ``` ### Response Unique identifier for tracking the generation task Unique project identifier for accessing results Current task status: "queued", "processing", "completed", or "failed" Estimated completion time in seconds Whether webhook notification was successfully registered ```json Response Example { "task_id": "task_12345abcde", "project_id": "proj_awesome_project_67890", "status": "queued", "estimated_duration": 180, "webhook_registered": true, "created_at": "2024-01-15T10:30:00Z" } ``` *** ## GET /wiki/projects Retrieve a list of all processed repositories and their current status. Page number for pagination Number of projects per page (max: 100) Filter by status: "completed", "processing", "failed" Search projects by name or repository URL ### Request Example ```bash curl curl -X GET "https://api.deepwikiopen.com/v1/wiki/projects?page=1&limit=10&status=completed" \ -H "Authorization: Bearer YOUR_API_KEY" ``` ```python Python import requests url = "https://api.deepwikiopen.com/v1/wiki/projects" headers = {"Authorization": "Bearer YOUR_API_KEY"} params = { "page": 1, "limit": 10, "status": "completed" } response = requests.get(url, headers=headers, params=params) projects = response.json() ``` ```javascript JavaScript const params = new URLSearchParams({ page: '1', limit: '10', status: 'completed' }); const response = await fetch(`https://api.deepwikiopen.com/v1/wiki/projects?${params}`, { headers: { 'Authorization': 'Bearer YOUR_API_KEY' } }); const projects = await response.json(); ``` ### Response Array of project objects Unique project identifier Display name of the project Original repository URL Current project status ISO timestamp of creation ISO timestamp of completion (if applicable) Number of files processed Number of documentation pages generated Pagination information Current page number Total number of pages Total number of projects Whether there are more pages *** ## GET 
/wiki/{project_id} Retrieve the complete wiki documentation for a specific project. The unique project identifier Response format: "json", "markdown", or "html" Include project metadata in response ### Request Example ```bash curl curl -X GET "https://api.deepwikiopen.com/v1/wiki/proj_awesome_project_67890?format=json" \ -H "Authorization: Bearer YOUR_API_KEY" ``` ```python Python import requests project_id = "proj_awesome_project_67890" url = f"https://api.deepwikiopen.com/v1/wiki/{project_id}" headers = {"Authorization": "Bearer YOUR_API_KEY"} params = {"format": "json", "include_metadata": True} response = requests.get(url, headers=headers, params=params) wiki_data = response.json() ``` ```javascript JavaScript const projectId = 'proj_awesome_project_67890'; const params = new URLSearchParams({ format: 'json', include_metadata: 'true' }); const response = await fetch(`https://api.deepwikiopen.com/v1/wiki/${projectId}?${params}`, { headers: { 'Authorization': 'Bearer YOUR_API_KEY' } }); const wikiData = await response.json(); ``` ### Response Project identifier Project display name Original repository URL ISO timestamp of wiki generation Array of documentation pages Unique page identifier Page title Page content (format depends on query parameter) File path in repository Page type: "overview", "api", "guide", "reference" Display order Project metadata (if include\_metadata=true) Programming language distribution Total files processed Project complexity rating (0-10) Last commit hash from repository *** ## DELETE /wiki/{project_id} Delete a project's cached wiki data and generated documentation. 
The unique project identifier to delete ### Request Example ```bash curl curl -X DELETE "https://api.deepwikiopen.com/v1/wiki/proj_awesome_project_67890" \ -H "Authorization: Bearer YOUR_API_KEY" ``` ```python Python import requests project_id = "proj_awesome_project_67890" url = f"https://api.deepwikiopen.com/v1/wiki/{project_id}" headers = {"Authorization": "Bearer YOUR_API_KEY"} response = requests.delete(url, headers=headers) result = response.json() ``` ```javascript JavaScript const projectId = 'proj_awesome_project_67890'; const response = await fetch(`https://api.deepwikiopen.com/v1/wiki/${projectId}`, { method: 'DELETE', headers: { 'Authorization': 'Bearer YOUR_API_KEY' } }); const result = await response.json(); ``` ### Response Whether the deletion was successful The deleted project identifier Confirmation message ISO timestamp of deletion ```json Response Example { "success": true, "project_id": "proj_awesome_project_67890", "message": "Project wiki data successfully deleted", "deleted_at": "2024-01-15T14:30:00Z" } ``` *** ## POST /wiki/regenerate Force regenerate wiki documentation for an existing project with updated configuration. 
The project identifier to regenerate Updated generation configuration (same structure as /wiki/generate) Force refresh from repository (ignore cache) Optional webhook URL for completion notifications ### Request Example ```bash curl curl -X POST "https://api.deepwikiopen.com/v1/wiki/regenerate" \ -H "Authorization: Bearer YOUR_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "project_id": "proj_awesome_project_67890", "config": { "include_code_examples": true, "max_depth": 5, "output_format": "html" }, "force_refresh": true }' ``` ```python Python import requests url = "https://api.deepwikiopen.com/v1/wiki/regenerate" headers = { "Authorization": "Bearer YOUR_API_KEY", "Content-Type": "application/json" } data = { "project_id": "proj_awesome_project_67890", "config": { "include_code_examples": True, "max_depth": 5, "output_format": "html" }, "force_refresh": True } response = requests.post(url, headers=headers, json=data) result = response.json() ``` ```javascript JavaScript const response = await fetch('https://api.deepwikiopen.com/v1/wiki/regenerate', { method: 'POST', headers: { 'Authorization': 'Bearer YOUR_API_KEY', 'Content-Type': 'application/json' }, body: JSON.stringify({ project_id: 'proj_awesome_project_67890', config: { include_code_examples: true, max_depth: 5, output_format: 'html' }, force_refresh: true }) }); const result = await response.json(); ``` ### Response Same response structure as POST /wiki/generate. *** ## GET /wiki/status/{task_id} Check the status of a wiki generation task and get progress updates. 
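Wiki generation is asynchronous, so clients typically poll this status endpoint until the task reaches a terminal state. Below is a minimal polling sketch — the endpoint path and the `status` values (`"queued"`, `"processing"`, `"completed"`, `"failed"`) are the ones documented for this endpoint, while the `wait_for_task` helper name and the interval/timeout defaults are illustrative choices, not part of the API:

```python
import time
import requests

API_KEY = "YOUR_API_KEY"  # placeholder, as in the other examples
BASE_URL = "https://api.deepwikiopen.com/v1"

def wait_for_task(task_id: str, interval: float = 5.0, timeout: float = 600.0) -> dict:
    """Poll GET /wiki/status/{task_id} until the task completes or fails."""
    headers = {"Authorization": f"Bearer {API_KEY}"}
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        resp = requests.get(f"{BASE_URL}/wiki/status/{task_id}", headers=headers)
        resp.raise_for_status()
        status = resp.json()
        if status["status"] in ("completed", "failed"):
            return status  # terminal state reached; inspect result or error fields
        time.sleep(interval)  # still "queued" or "processing": wait and retry
    raise TimeoutError(f"Task {task_id} did not finish within {timeout}s")
```

On a `"failed"` result, the returned payload carries the error object described in the response fields; on `"completed"`, the result object includes the generated wiki URL.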
The task identifier returned from generation request ### Request Example ```bash curl curl -X GET "https://api.deepwikiopen.com/v1/wiki/status/task_12345abcde" \ -H "Authorization: Bearer YOUR_API_KEY" ``` ```python Python import requests task_id = "task_12345abcde" url = f"https://api.deepwikiopen.com/v1/wiki/status/{task_id}" headers = {"Authorization": "Bearer YOUR_API_KEY"} response = requests.get(url, headers=headers) status = response.json() ``` ```javascript JavaScript const taskId = 'task_12345abcde'; const response = await fetch(`https://api.deepwikiopen.com/v1/wiki/status/${taskId}`, { headers: { 'Authorization': 'Bearer YOUR_API_KEY' } }); const status = await response.json(); ``` ### Response The task identifier Associated project identifier Current status: "queued", "processing", "completed", "failed" Progress information (when status is "processing") Completion percentage (0-100) Current processing step Number of files processed so far Total files to process Estimated seconds remaining Results (when status is "completed") URL to access generated wiki Number of pages created Total processing time in seconds Error information (when status is "failed") Error code Error description Additional error details ISO timestamp when task was created ISO timestamp of last status update ```json Response Example (Processing) { "task_id": "task_12345abcde", "project_id": "proj_awesome_project_67890", "status": "processing", "progress": { "percentage": 65, "current_step": "Generating API documentation", "files_processed": 42, "total_files": 65, "estimated_remaining": 45 }, "created_at": "2024-01-15T10:30:00Z", "updated_at": "2024-01-15T10:32:15Z" } ``` *** ## Authentication All API endpoints require authentication using an API key passed in the Authorization header: ``` Authorization: Bearer YOUR_API_KEY ``` ### Getting an API Key 1. Sign up for a DeepWikiOpen account at [console.deepwikiopen.com](https://console.deepwikiopen.com) 2. 
Navigate to the API Keys section 3. Generate a new API key 4. Store it securely (it won't be shown again) *** ## Rate Limiting API requests are rate limited to prevent abuse: * **Standard Plan**: 100 requests per hour * **Pro Plan**: 1,000 requests per hour * **Enterprise Plan**: 10,000 requests per hour Rate limit headers are included in all responses: ```http X-RateLimit-Limit: 100 X-RateLimit-Remaining: 95 X-RateLimit-Reset: 1642248000 ``` When rate limited, you'll receive a 429 status code: ```json { "error": { "code": "RATE_LIMIT_EXCEEDED", "message": "Too many requests. Limit resets at 2024-01-15T11:00:00Z" } } ``` *** ## Error Responses All endpoints follow consistent error response format: ### HTTP Status Codes | Code | Description | | ---- | --------------------- | | 200 | Success | | 201 | Created | | 400 | Bad Request | | 401 | Unauthorized | | 403 | Forbidden | | 404 | Not Found | | 429 | Rate Limited | | 500 | Internal Server Error | | 503 | Service Unavailable | ### Error Response Format ```json { "error": { "code": "ERROR_CODE", "message": "Human-readable error message", "details": "Additional error context", "request_id": "req_12345abcde" } } ``` ### Common Error Codes | Code | Description | | ------------------------ | ---------------------------------------------- | | `INVALID_API_KEY` | API key is missing or invalid | | `PROJECT_NOT_FOUND` | Specified project doesn't exist | | `TASK_NOT_FOUND` | Specified task doesn't exist | | `INVALID_REPOSITORY_URL` | Repository URL is malformed or inaccessible | | `GENERATION_FAILED` | Wiki generation failed due to processing error | | `RATE_LIMIT_EXCEEDED` | Too many requests within time window | | `INSUFFICIENT_CREDITS` | Not enough API credits remaining | *** ## Webhooks When providing a `webhook_url` in generation requests, DeepWikiOpen will send HTTP POST notifications when tasks complete: ### Webhook Payload ```json { "event": "wiki.generation.completed", "task_id": "task_12345abcde", "project_id": 
"proj_awesome_project_67890", "status": "completed", "result": { "wiki_url": "https://api.deepwikiopen.com/v1/wiki/proj_awesome_project_67890", "pages_generated": 15, "processing_time": 142 }, "timestamp": "2024-01-15T10:35:00Z" } ``` ### Webhook Security Webhooks include a signature header for verification: ```http X-DeepWiki-Signature: sha256=1234567890abcdef... ``` Verify the signature using your webhook secret (available in your dashboard). *** ## SDK Examples ### Python SDK ```python from deepwikiopen import DeepWikiClient client = DeepWikiClient(api_key="YOUR_API_KEY") # Generate wiki task = client.generate_wiki( repository_url="https://github.com/username/repo", config={ "include_code_examples": True, "max_depth": 3 } ) # Wait for completion result = client.wait_for_completion(task.task_id) print(f"Wiki generated: {result.wiki_url}") ``` ### Node.js SDK ```javascript const { DeepWikiClient } = require('deepwikiopen'); const client = new DeepWikiClient({ apiKey: 'YOUR_API_KEY' }); // Generate wiki const task = await client.generateWiki({ repositoryUrl: 'https://github.com/username/repo', config: { includeCodeExamples: true, maxDepth: 3 } }); // Poll for completion const result = await client.waitForCompletion(task.taskId); console.log(`Wiki generated: ${result.wikiUrl}`); ``` *** ## Support For API support and questions: * **Documentation**: [docs.deepwikiopen.com](https://docs.deepwikiopen.com) * **Support Email**: [support@deepwikiopen.com](mailto:support@deepwikiopen.com) * **Status Page**: [status.deepwikiopen.com](https://status.deepwikiopen.com) * **Community Discord**: [discord.gg/deepwikiopen](https://discord.gg/deepwikiopen) # null Source: https://asyncfunc.mintlify.app/examples/basic-usage # Basic Usage This guide provides practical examples of using DeepWiki's core features. All examples are designed to be runnable and progressively introduce more complex functionality. ## Table of Contents 1. [Simple Wiki Generation](#simple-wiki-generation) 2. 
[Using the Ask Feature](#using-the-ask-feature) 3. [Using DeepResearch Feature](#using-deepresearch-feature) 4. [Working with Public Repositories](#working-with-public-repositories) 5. [Basic API Calls with curl](#basic-api-calls-with-curl) 6. [Python Client Example](#python-client-example) 7. [JavaScript/Node.js Example](#javascriptnodejs-example) 8. [Common Use Cases and Workflows](#common-use-cases-and-workflows) ## Simple Wiki Generation The most basic use of DeepWiki is generating documentation from your codebase. ### Example 1: Generate a basic wiki ```bash # Using the CLI deepwiki generate --input ./my-project --output ./wiki # With specific file types deepwiki generate --input ./my-project --output ./wiki --include "*.py,*.js" # Exclude certain directories deepwiki generate --input ./my-project --output ./wiki --exclude "node_modules,dist" ``` ### Example 2: Generate with custom configuration ```yaml # deepwiki.config.yaml input: path: ./my-project include: - "src/**/*.py" - "lib/**/*.js" exclude: - "**/test/**" - "**/__pycache__/**" output: path: ./documentation format: markdown generation: include_source_links: true max_file_size: 1MB language_support: - python - javascript ``` ```bash # Use the configuration file deepwiki generate --config deepwiki.config.yaml ``` ## Using the Ask Feature The Ask feature allows you to query your codebase using natural language. ### Example 3: Basic questions about your code ```bash # Ask about function usage deepwiki ask "How does the authentication system work?" # Ask about specific components deepwiki ask "What does the UserService class do?" # Ask about dependencies deepwiki ask "What external libraries does this project use?" 
``` ### Example 4: Advanced Ask queries with context ```python from deepwiki import DeepWikiClient client = DeepWikiClient(api_key="your-api-key") # Ask with specific context response = client.ask( question="How can I add a new payment method?", context={ "files": ["src/payments/*.py", "src/models/payment.py"], "focus": "implementation details" } ) print(response.answer) print(f"Confidence: {response.confidence}") print(f"Sources: {response.sources}") ``` ## Using DeepResearch Feature DeepResearch performs in-depth analysis of your codebase to answer complex questions. ### Example 5: Basic research query ```bash # Research architectural patterns deepwiki research "What architectural patterns are used in this project?" # Research performance optimizations deepwiki research "What are the potential performance bottlenecks?" # Research security concerns deepwiki research "Are there any security vulnerabilities in the authentication flow?" ``` ### Example 6: Comprehensive research with multiple aspects ```python from deepwiki import DeepWikiClient client = DeepWikiClient(api_key="your-api-key") # Perform comprehensive research research_config = { "topic": "Database optimization opportunities", "aspects": [ "query performance", "indexing strategies", "connection pooling", "caching mechanisms" ], "depth": "comprehensive", "include_recommendations": True } result = client.deep_research(research_config) # Process results for finding in result.findings: print(f"Finding: {finding.title}") print(f"Impact: {finding.impact}") print(f"Recommendation: {finding.recommendation}") print("---") ``` ## Working with Public Repositories DeepWiki can analyze public repositories directly from GitHub. 
### Example 7: Analyze a public GitHub repository ```bash # Analyze a public repository deepwiki analyze --repo https://github.com/username/repo-name # Generate documentation for a specific branch deepwiki generate --repo https://github.com/username/repo-name --branch develop # Analyze with specific focus deepwiki analyze --repo https://github.com/username/repo-name --focus "api-design" ``` ### Example 8: Batch analysis of multiple repositories ```python from deepwiki import DeepWikiClient client = DeepWikiClient(api_key="your-api-key") # Analyze multiple related repositories repos = [ "https://github.com/org/frontend", "https://github.com/org/backend", "https://github.com/org/shared-lib" ] for repo in repos: analysis = client.analyze_repository(repo) print(f"Repository: {repo}") print(f"Tech Stack: {', '.join(analysis.tech_stack)}") print(f"Complexity Score: {analysis.complexity_score}") print(f"Documentation Coverage: {analysis.doc_coverage}%") print("---") ``` ## Basic API Calls with curl ### Example 9: Generate wiki via API ```bash # Generate wiki for a repository curl -X POST https://api.deepwiki.com/v1/generate \ -H "Authorization: Bearer YOUR_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "repository": "https://github.com/username/repo", "output_format": "markdown", "include_diagrams": true }' # Check generation status curl -X GET https://api.deepwiki.com/v1/jobs/JOB_ID \ -H "Authorization: Bearer YOUR_API_KEY" ``` ### Example 10: Ask questions via API ```bash # Ask a question about your codebase curl -X POST https://api.deepwiki.com/v1/ask \ -H "Authorization: Bearer YOUR_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "question": "How does the caching system work?", "repository": "https://github.com/username/repo", "context_depth": "detailed" }' ``` ### Example 11: Deep research via API ```bash # Initiate deep research curl -X POST https://api.deepwiki.com/v1/research \ -H "Authorization: Bearer YOUR_API_KEY" \ -H "Content-Type: 
application/json" \ -d '{ "topic": "Security vulnerabilities in API endpoints", "repository": "https://github.com/username/repo", "analysis_type": "security", "include_fixes": true }' ``` ## Python Client Example ### Example 12: Complete Python workflow ````python from deepwiki import DeepWikiClient import json # Initialize client client = DeepWikiClient( api_key="your-api-key", base_url="https://api.deepwiki.com/v1" # Optional: for self-hosted ) # 1. Generate documentation def generate_documentation(repo_path): """Generate comprehensive documentation for a repository.""" job = client.generate_wiki( input_path=repo_path, output_format="markdown", options={ "include_api_docs": True, "include_examples": True, "generate_diagrams": True, "languages": ["python", "javascript"] } ) # Wait for completion result = job.wait_for_completion(timeout=300) if result.status == "completed": print(f"Documentation generated at: {result.output_path}") return result.output_path else: print(f"Generation failed: {result.error}") return None # 2. Interactive Q&A session def interactive_qa_session(repo_path): """Run an interactive Q&A session.""" print("DeepWiki Q&A Session (type 'exit' to quit)") while True: question = input("\nYour question: ") if question.lower() == 'exit': break response = client.ask( question=question, repository=repo_path, options={ "include_code_snippets": True, "max_sources": 5 } ) print(f"\nAnswer: {response.answer}") if response.code_snippets: print("\nRelevant code:") for snippet in response.code_snippets: print(f"File: {snippet.file}") print(f"```{snippet.language}") print(snippet.code) print("```") # 3. 
Automated code analysis def analyze_code_quality(repo_path): """Perform automated code quality analysis.""" analysis = client.analyze( repository=repo_path, analysis_types=[ "code_quality", "security", "performance", "maintainability" ] ) # Generate report report = { "summary": analysis.summary, "scores": { "quality": analysis.quality_score, "security": analysis.security_score, "performance": analysis.performance_score, "maintainability": analysis.maintainability_score }, "issues": [ { "type": issue.type, "severity": issue.severity, "file": issue.file, "line": issue.line, "message": issue.message, "suggestion": issue.suggestion } for issue in analysis.issues ], "recommendations": analysis.recommendations } with open("analysis_report.json", "w") as f: json.dump(report, f, indent=2) return report # 4. Research complex topics def research_topic(repo_path, topic): """Conduct deep research on a specific topic.""" research = client.deep_research( topic=topic, repository=repo_path, research_config={ "depth": "comprehensive", "include_external_resources": True, "compare_with_best_practices": True, "generate_recommendations": True } ) # Process findings for finding in research.findings: print(f"\n## {finding.title}") print(f"Importance: {finding.importance}/10") print(f"\nDescription: {finding.description}") if finding.code_examples: print("\nCode Examples:") for example in finding.code_examples: print(f"```{example.language}") print(example.code) print("```") if finding.recommendations: print("\nRecommendations:") for rec in finding.recommendations: print(f"- {rec}") # Main workflow if __name__ == "__main__": repo_path = "./my-project" # Generate documentation doc_path = generate_documentation(repo_path) # Analyze code quality report = analyze_code_quality(repo_path) print(f"Overall quality score: {report['scores']['quality']}/100") # Research specific topic research_topic(repo_path, "API performance optimization strategies") # Start Q&A session 
interactive_qa_session(repo_path) ```` ## JavaScript/Node.js Example ### Example 13: Complete Node.js workflow ````javascript const { DeepWikiClient } = require('deepwiki-js'); // Initialize client const client = new DeepWikiClient({ apiKey: process.env.DEEPWIKI_API_KEY, baseUrl: 'https://api.deepwiki.com/v1' // Optional: for self-hosted }); // 1. Generate documentation with progress tracking async function generateDocumentation(repoPath) { console.log('Starting documentation generation...'); const job = await client.generateWiki({ inputPath: repoPath, outputFormat: 'markdown', options: { includeApiDocs: true, includeExamples: true, generateDiagrams: true, languages: ['javascript', 'typescript'] } }); // Track progress job.on('progress', (progress) => { console.log(`Progress: ${progress.percentage}% - ${progress.status}`); }); try { const result = await job.waitForCompletion(); console.log(`Documentation generated at: ${result.outputPath}`); return result; } catch (error) { console.error('Generation failed:', error); throw error; } } // 2. Real-time Q&A with streaming async function askWithStreaming(question, repoPath) { const stream = await client.askStream({ question, repository: repoPath, options: { includeCodeSnippets: true, streamResponse: true } }); console.log('\nAnswer: '); for await (const chunk of stream) { process.stdout.write(chunk.text); // Handle code snippets if (chunk.codeSnippet) { console.log(`\n\nCode from ${chunk.codeSnippet.file}:`); console.log('```' + chunk.codeSnippet.language); console.log(chunk.codeSnippet.code); console.log('```\n'); } } } // 3. 
Analyze repository with webhooks async function analyzeWithWebhooks(repoUrl, webhookUrl) { const analysis = await client.analyze({ repository: repoUrl, analysisTypes: ['all'], webhook: { url: webhookUrl, events: ['started', 'progress', 'completed', 'failed'] } }); console.log(`Analysis started with ID: ${analysis.id}`); // Poll for results let result; while (!result || result.status === 'processing') { await new Promise(resolve => setTimeout(resolve, 5000)); result = await client.getAnalysis(analysis.id); console.log(`Status: ${result.status}`); } return result; } // 4. Batch operations async function batchAnalyzeRepositories(repositories) { const batch = client.createBatch(); // Add operations to batch for (const repo of repositories) { batch.addOperation('analyze', { repository: repo, analysisTypes: ['code_quality', 'security'] }); } // Execute batch const results = await batch.execute(); // Process results results.forEach((result, index) => { console.log(`\nRepository: ${repositories[index]}`); if (result.success) { console.log(`Quality Score: ${result.data.qualityScore}`); console.log(`Security Score: ${result.data.securityScore}`); } else { console.log(`Failed: ${result.error}`); } }); } // 5. Research with caching class CachedResearchClient { constructor(client) { this.client = client; this.cache = new Map(); } async research(topic, repoPath, useCache = true) { const cacheKey = `${topic}:${repoPath}`; if (useCache && this.cache.has(cacheKey)) { console.log('Returning cached result'); return this.cache.get(cacheKey); } const result = await this.client.deepResearch({ topic, repository: repoPath, config: { depth: 'comprehensive', includeExternalResources: true } }); this.cache.set(cacheKey, result); return result; } } // 6. 
Express.js integration const express = require('express'); const app = express(); app.use(express.json()); // Wiki generation endpoint app.post('/api/generate-wiki', async (req, res) => { const { repository, format = 'markdown' } = req.body; try { const job = await client.generateWiki({ repository, outputFormat: format }); res.json({ jobId: job.id, status: 'processing', checkUrl: `/api/jobs/${job.id}` }); } catch (error) { res.status(500).json({ error: error.message }); } }); // Ask endpoint app.post('/api/ask', async (req, res) => { const { question, repository } = req.body; try { const response = await client.ask({ question, repository }); res.json({ answer: response.answer, confidence: response.confidence, sources: response.sources }); } catch (error) { res.status(500).json({ error: error.message }); } }); // Job status endpoint app.get('/api/jobs/:jobId', async (req, res) => { try { const job = await client.getJob(req.params.jobId); res.json(job); } catch (error) { res.status(404).json({ error: 'Job not found' }); } }); // Main execution async function main() { const repoPath = './my-project'; try { // Generate documentation await generateDocumentation(repoPath); // Ask questions with streaming await askWithStreaming( 'What are the main components of this application?', repoPath ); // Batch analyze multiple repos await batchAnalyzeRepositories([ 'https://github.com/org/repo1', 'https://github.com/org/repo2', 'https://github.com/org/repo3' ]); // Start API server app.listen(3000, () => { console.log('DeepWiki API server running on port 3000'); }); } catch (error) { console.error('Error:', error); process.exit(1); } } // Run if called directly if (require.main === module) { main(); } module.exports = { generateDocumentation, askWithStreaming, analyzeWithWebhooks, batchAnalyzeRepositories, CachedResearchClient }; ```` ## Common Use Cases and Workflows ### Example 14: Onboarding new developers ```bash #!/bin/bash # onboard-developer.sh # Generate comprehensive 
documentation deepwiki generate \ --input . \ --output ./docs/wiki \ --include-getting-started \ --include-architecture # Create developer guide deepwiki research "What should a new developer know about this codebase?" \ --output ./docs/developer-guide.md # Generate API reference deepwiki generate-api-docs \ --input ./src/api \ --output ./docs/api-reference # Create interactive tutorial deepwiki create-tutorial \ --topics "setup,first-feature,testing" \ --output ./docs/tutorial ``` ### Example 15: Continuous documentation workflow ```yaml # .github/workflows/deepwiki.yml name: DeepWiki Documentation on: push: branches: [main] pull_request: branches: [main] jobs: update-docs: runs-on: ubuntu-latest steps: - uses: actions/checkout@v2 - name: Setup DeepWiki uses: deepwiki/setup-action@v1 with: api-key: ${{ secrets.DEEPWIKI_API_KEY }} - name: Generate Documentation run: | deepwiki generate \ --input . \ --output ./docs \ --config .deepwiki.yml - name: Analyze Changes if: github.event_name == 'pull_request' run: | deepwiki analyze-pr \ --base ${{ github.base_ref }} \ --head ${{ github.head_ref }} \ --comment - name: Update Wiki if: github.ref == 'refs/heads/main' run: | deepwiki publish \ --source ./docs \ --destination wiki ``` ### Example 16: API documentation generation ```python from deepwiki import DeepWikiClient from pathlib import Path client = DeepWikiClient(api_key="your-api-key") def generate_api_documentation(api_dir): """Generate comprehensive API documentation.""" # Analyze API endpoints api_analysis = client.analyze_api({ "directory": api_dir, "frameworks": ["express", "fastapi", "django"], "include_examples": True }) # Generate OpenAPI spec openapi_spec = client.generate_openapi({ "analysis": api_analysis, "version": "3.0.0", "include_schemas": True }) # Create interactive documentation interactive_docs = client.generate_interactive_docs({ "openapi_spec": openapi_spec, "theme": "modern", "include_playground": True }) # Save outputs 
Path("./docs/api/openapi.yaml").write_text(openapi_spec) Path("./docs/api/interactive").mkdir(parents=True, exist_ok=True) return { "openapi": "./docs/api/openapi.yaml", "interactive": "./docs/api/interactive/index.html" } ``` ### Example 17: Code migration assistant ```javascript const { DeepWikiClient } = require('deepwiki-js'); async function migrationAssistant(sourcePath, targetFramework) { const client = new DeepWikiClient({ apiKey: process.env.DEEPWIKI_API_KEY }); // Analyze current codebase const analysis = await client.analyze({ repository: sourcePath, focus: ['architecture', 'dependencies', 'patterns'] }); // Research migration path const migrationPlan = await client.deepResearch({ topic: `Migrating from ${analysis.mainFramework} to ${targetFramework}`, context: analysis, includeStepByStep: true }); // Generate migration checklist const checklist = await client.generateChecklist({ plan: migrationPlan, includeTimeEstimates: true, includeDependencies: true }); // Create migration documentation const docs = await client.generateMigrationDocs({ currentState: analysis, targetState: targetFramework, plan: migrationPlan, checklist: checklist }); return { analysis, plan: migrationPlan, checklist, documentation: docs }; } // Usage migrationAssistant('./legacy-app', 'react') .then(result => { console.log('Migration plan generated successfully'); console.log(`Total steps: ${result.checklist.steps.length}`); console.log(`Estimated time: ${result.checklist.totalHours} hours`); }); ``` ### Example 18: Security audit workflow ```python import asyncio from deepwiki import DeepWikiClient async def security_audit(repo_path): """Perform comprehensive security audit.""" client = DeepWikiClient(api_key="your-api-key") # 1. Initial security scan scan_result = await client.security_scan({ "repository": repo_path, "scan_types": [ "dependency_vulnerabilities", "code_vulnerabilities", "configuration_issues", "secret_detection" ] }) # 2. 
Deep research on vulnerabilities vulnerabilities = [] for vuln in scan_result.vulnerabilities: research = await client.deep_research({ "topic": f"Fix for {vuln.type}: {vuln.description}", "context": { "file": vuln.file, "line": vuln.line, "severity": vuln.severity } }) vulnerabilities.append({ "vulnerability": vuln, "research": research, "fix_recommendation": research.recommendations[0] }) # 3. Generate security report report = await client.generate_security_report({ "scan_result": scan_result, "vulnerability_research": vulnerabilities, "include_remediation": True, "format": "html" }) # 4. Create fix PRs for critical issues for vuln in vulnerabilities: if vuln['vulnerability'].severity == 'critical': await client.create_fix_pr({ "vulnerability": vuln['vulnerability'], "fix": vuln['fix_recommendation'], "branch_name": f"fix-{vuln['vulnerability'].id}", "pr_title": f"Security: Fix {vuln['vulnerability'].type}", "pr_body": report.get_section(vuln['vulnerability'].id) }) return report # Run security audit if __name__ == "__main__": report = asyncio.run(security_audit("./my-app")) print(f"Security audit complete. Report saved to: {report.path}") ``` ## Best Practices 1. **Start Simple**: Begin with basic wiki generation before moving to advanced features 2. **Use Configuration Files**: Store settings in `deepwiki.config.yaml` for consistency 3. **Leverage Caching**: Cache results for frequently asked questions or repeated analyses 4. **Batch Operations**: Group multiple operations together for better performance 5. **Monitor Progress**: Use progress callbacks for long-running operations 6. **Handle Errors Gracefully**: Implement proper error handling and retries 7. **Secure API Keys**: Never hardcode API keys; use environment variables 8. 
**Regular Updates**: Keep documentation in sync with code changes using CI/CD ## Next Steps * Explore [Advanced Features](./advanced-features.mdx) for more complex scenarios * Check out [API Reference](../api-reference/index.mdx) for detailed API documentation * Join our [Community](https://deepwiki.com/community) for support and updates # Custom Models Source: https://asyncfunc.mintlify.app/examples/custom-models Configure and use custom AI models with DeepWiki # Using Custom Models with DeepWiki DeepWiki supports a wide range of AI models through various providers. This guide covers how to configure and use custom models for optimal performance and cost efficiency. ## Overview DeepWiki's flexible architecture allows you to use models from: * OpenRouter (access to 100+ models) * Ollama (local models) * Azure OpenAI * Any OpenAI-compatible endpoint * Custom API endpoints ## OpenRouter Integration OpenRouter provides access to multiple model providers through a single API. ### Configuration ```json // generator.json { "provider": "openrouter", "apiKey": "YOUR_OPENROUTER_API_KEY", "model": "anthropic/claude-3-opus", "baseURL": "https://openrouter.ai/api/v1", "headers": { "HTTP-Referer": "https://yourapp.com", "X-Title": "DeepWiki" } } ``` ### Available Models Popular models on OpenRouter: * `anthropic/claude-3-opus` - Best for complex reasoning * `anthropic/claude-3-sonnet` - Balanced performance/cost * `openai/gpt-4-turbo` - Latest GPT-4 variant * `google/gemini-pro` - Google's latest model * `meta-llama/llama-3-70b` - Open source alternative ### Usage Example ```typescript // app/lib/ai/generator.ts import { OpenRouter } from '@openrouter/sdk'; const client = new OpenRouter({ apiKey: process.env.OPENROUTER_API_KEY, defaultHeaders: { 'HTTP-Referer': process.env.APP_URL, 'X-Title': 'DeepWiki' } }); export async function generateContent(prompt: string) { const response = await client.chat.completions.create({ model: 'anthropic/claude-3-opus', messages: [{ role: 
'user', content: prompt }], temperature: 0.7, max_tokens: 4000 }); return response.choices[0].message.content; } ``` ## Ollama for Local Models Run models locally for privacy and zero API costs. ### Installation ```bash # Install Ollama curl -fsSL https://ollama.ai/install.sh | sh # Pull models ollama pull llama3 ollama pull mistral ollama pull codellama ``` ### Configuration ```json // generator.json { "provider": "ollama", "baseURL": "http://localhost:11434", "model": "llama3:70b", "options": { "temperature": 0.7, "num_predict": 4096 } } ``` ### Integration ```typescript // app/lib/ai/ollama-provider.ts export class OllamaProvider { private baseURL: string; constructor(baseURL = 'http://localhost:11434') { this.baseURL = baseURL; } async generate(prompt: string, model = 'llama3') { const response = await fetch(`${this.baseURL}/api/generate`, { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify({ model, prompt, stream: false, options: { temperature: 0.7, num_predict: 4096 } }) }); const data = await response.json(); return data.response; } } ``` ## Azure OpenAI Configuration Use Azure's enterprise-grade OpenAI deployment. 
### Setup ```json // generator.json { "provider": "azure-openai", "apiKey": "YOUR_AZURE_API_KEY", "baseURL": "https://YOUR_RESOURCE.openai.azure.com", "apiVersion": "2024-02-15-preview", "deployment": "gpt-4-turbo", "model": "gpt-4-turbo" } ``` ### Environment Variables ```bash # .env.local AZURE_OPENAI_API_KEY=your_key_here AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com AZURE_OPENAI_DEPLOYMENT=gpt-4-turbo AZURE_OPENAI_API_VERSION=2024-02-15-preview ``` ### Implementation ```typescript // app/lib/ai/azure-provider.ts import { AzureOpenAI } from '@azure/openai'; const client = new AzureOpenAI({ apiKey: process.env.AZURE_OPENAI_API_KEY, endpoint: process.env.AZURE_OPENAI_ENDPOINT, apiVersion: process.env.AZURE_OPENAI_API_VERSION }); export async function generateWithAzure(prompt: string) { const result = await client.getChatCompletions( process.env.AZURE_OPENAI_DEPLOYMENT, [{ role: 'user', content: prompt }], { temperature: 0.7, maxTokens: 4000 } ); return result.choices[0].message?.content; } ``` ## Custom Model Selection UI Implement a model selector in your DeepWiki interface. 
### Model Selector Component

```tsx
// app/components/model-selector.tsx
import { useState } from 'react';
import {
  Select,
  SelectContent,
  SelectItem,
  SelectTrigger,
  SelectValue
} from '@/components/ui/select';

const AVAILABLE_MODELS = [
  { id: 'gpt-4-turbo', name: 'GPT-4 Turbo', provider: 'openai' },
  { id: 'claude-3-opus', name: 'Claude 3 Opus', provider: 'anthropic' },
  { id: 'llama3:70b', name: 'Llama 3 70B', provider: 'ollama' },
  { id: 'mistral-large', name: 'Mistral Large', provider: 'mistral' }
];

export function ModelSelector({ onModelChange }: { onModelChange: (model: string) => void }) {
  const [selectedModel, setSelectedModel] = useState('gpt-4-turbo');

  const handleChange = (value: string) => {
    setSelectedModel(value);
    onModelChange(value);
  };

  return (
    <Select value={selectedModel} onValueChange={handleChange}>
      <SelectTrigger>
        <SelectValue placeholder="Select a model" />
      </SelectTrigger>
      <SelectContent>
        {AVAILABLE_MODELS.map((model) => (
          <SelectItem key={model.id} value={model.id}>
            {model.name} ({model.provider})
          </SelectItem>
        ))}
      </SelectContent>
    </Select>
  );
}
```

### Dynamic Model Configuration

```typescript
// app/lib/ai/model-config.ts
export interface ModelConfig {
  provider: string;
  model: string;
  apiKey?: string;
  baseURL?: string;
  temperature?: number;
  maxTokens?: number;
}

export const MODEL_CONFIGS: Record<string, ModelConfig> = {
  'gpt-4-turbo': {
    provider: 'openai',
    model: 'gpt-4-turbo-preview',
    temperature: 0.7,
    maxTokens: 4000
  },
  'claude-3-opus': {
    provider: 'openrouter',
    model: 'anthropic/claude-3-opus',
    baseURL: 'https://openrouter.ai/api/v1',
    temperature: 0.7,
    maxTokens: 4000
  },
  'llama3:70b': {
    provider: 'ollama',
    model: 'llama3:70b',
    baseURL: 'http://localhost:11434',
    temperature: 0.8,
    maxTokens: 4096
  }
};
```

## Modifying generator.json

The `generator.json` file controls model configuration.
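The `"${OPENAI_API_KEY}"`-style values in the configuration below imply that the loader expands environment variables when `generator.json` is read, so secrets never live in the file itself. DeepWiki's actual loader is not shown in these docs; the helper below is an illustrative sketch of how such interpolation is commonly implemented (the function name and fallback-to-empty behavior are assumptions):

```python
import os
import re

# Matches ${VAR_NAME} placeholders inside string values.
_PLACEHOLDER = re.compile(r"\$\{(\w+)\}")

def expand_env(value):
    """Recursively replace ${VAR} placeholders with os.environ values
    in a parsed JSON config (dicts, lists, and strings)."""
    if isinstance(value, str):
        return _PLACEHOLDER.sub(lambda m: os.environ.get(m.group(1), ""), value)
    if isinstance(value, dict):
        return {k: expand_env(v) for k, v in value.items()}
    if isinstance(value, list):
        return [expand_env(v) for v in value]
    return value  # numbers, booleans, None pass through unchanged

# Example
os.environ["OPENAI_API_KEY"] = "sk-demo"
config = {"provider": "openai", "apiKey": "${OPENAI_API_KEY}", "maxTokens": 4000}
print(expand_env(config)["apiKey"])  # sk-demo
```

Resolving placeholders at load time (rather than committing keys) keeps the config file safe to check into version control.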
### Basic Structure ```json { "provider": "openai", "model": "gpt-4-turbo", "apiKey": "${OPENAI_API_KEY}", "temperature": 0.7, "maxTokens": 4000, "systemPrompt": "You are a helpful wiki content generator...", "retryAttempts": 3, "retryDelay": 1000 } ``` ### Multi-Provider Configuration ```json { "providers": { "primary": { "provider": "openai", "model": "gpt-4-turbo", "apiKey": "${OPENAI_API_KEY}" }, "fallback": { "provider": "openrouter", "model": "meta-llama/llama-3-70b", "apiKey": "${OPENROUTER_API_KEY}", "baseURL": "https://openrouter.ai/api/v1" }, "local": { "provider": "ollama", "model": "llama3", "baseURL": "http://localhost:11434" } }, "strategy": "fallback", "timeout": 30000 } ``` ## OpenAI-Compatible Endpoints Many providers offer OpenAI-compatible APIs. ### Generic Configuration ```typescript // app/lib/ai/openai-compatible.ts export class OpenAICompatibleProvider { private apiKey: string; private baseURL: string; constructor(config: { apiKey: string; baseURL: string }) { this.apiKey = config.apiKey; this.baseURL = config.baseURL; } async chat(messages: any[], options: any = {}) { const response = await fetch(`${this.baseURL}/v1/chat/completions`, { method: 'POST', headers: { 'Authorization': `Bearer ${this.apiKey}`, 'Content-Type': 'application/json' }, body: JSON.stringify({ messages, ...options }) }); return response.json(); } } ``` ### Supported Providers * **Perplexity AI**: `https://api.perplexity.ai` * **Together AI**: `https://api.together.xyz/v1` * **Anyscale**: `https://api.endpoints.anyscale.com/v1` * **Groq**: `https://api.groq.com/openai/v1` ## Performance Comparisons ### Benchmark Results | Model | Tokens/Second | Quality Score | Cost/1M Tokens | | ------------------- | ------------- | ------------- | -------------- | | GPT-4 Turbo | 50 | 9.5/10 | \$10.00 | | Claude 3 Opus | 40 | 9.3/10 | \$15.00 | | Llama 3 70B (Local) | 30 | 8.5/10 | \$0.00 | | Mistral Large | 60 | 8.8/10 | \$8.00 | | GPT-3.5 Turbo | 80 | 7.5/10 | \$0.50 | ### Performance 
Testing Script

```typescript
// scripts/benchmark-models.ts
async function benchmarkModel(provider: any, prompt: string) {
  const startTime = Date.now();
  let tokens = 0;

  try {
    const response = await provider.generate(prompt);
    tokens = response.usage?.total_tokens || 0;
    const duration = Date.now() - startTime;

    return {
      duration,
      tokens,
      tokensPerSecond: tokens / (duration / 1000),
      cost: calculateCost(provider.model, tokens)
    };
  } catch (error) {
    return { error: error.message };
  }
}
```

## Cost Optimization Strategies

### 1. Model Cascading

Use cheaper models first, escalating to expensive ones only when needed.

```typescript
// app/lib/ai/cascade-strategy.ts
export async function generateWithCascade(prompt: string, complexity: 'low' | 'medium' | 'high') {
  const models = {
    low: 'gpt-3.5-turbo',
    medium: 'claude-3-sonnet',
    high: 'gpt-4-turbo'
  };

  const model = models[complexity];
  return await generate(prompt, { model });
}
```

### 2. Caching Responses

```typescript
// app/lib/ai/cache-manager.ts
import { createHash } from 'crypto';
import { Redis } from '@upstash/redis';

const redis = new Redis({
  url: process.env.UPSTASH_REDIS_URL,
  token: process.env.UPSTASH_REDIS_TOKEN
});

export async function getCachedOrGenerate(
  prompt: string,
  generator: () => Promise<string>
) {
  const cacheKey = `ai:${createHash('sha256').update(prompt).digest('hex')}`;

  // Check cache
  const cached = await redis.get(cacheKey);
  if (cached) return cached;

  // Generate and cache
  const result = await generator();
  await redis.set(cacheKey, result, { ex: 3600 }); // 1 hour TTL
  return result;
}
```

### 3. Batch Processing

```typescript
// app/lib/ai/batch-processor.ts
export async function processBatch(prompts: string[], model: string) {
  const batchSize = 10;
  const results = [];

  for (let i = 0; i < prompts.length; i += batchSize) {
    const batch = prompts.slice(i, i + batchSize);
    const batchResults = await Promise.all(
      batch.map(prompt => generate(prompt, { model }))
    );
    results.push(...batchResults);
  }

  return results;
}
```

### 4. Token Optimization

```typescript
// app/lib/ai/token-optimizer.ts
import GPT3Tokenizer from 'gpt3-tokenizer';

export function optimizePrompt(prompt: string, maxTokens: number = 2000) {
  // Remove unnecessary whitespace
  let optimized = prompt.replace(/\s+/g, ' ').trim();

  // Truncate if too long (encode returns { bpe, text }; bpe holds the token ids)
  const encoder = new GPT3Tokenizer({ type: 'gpt3' });
  const { bpe } = encoder.encode(optimized);

  if (bpe.length > maxTokens) {
    optimized = encoder.decode(bpe.slice(0, maxTokens));
  }

  return optimized;
}
```

## Best Practices

### 1. Error Handling

```typescript
export async function generateWithRetry(
  prompt: string,
  options: any,
  maxRetries = 3
) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await generate(prompt, options);
    } catch (error) {
      if (i === maxRetries - 1) throw error;
      await new Promise(resolve => setTimeout(resolve, 1000 * (i + 1)));
    }
  }
}
```

### 2. Model Selection Logic

```typescript
export function selectOptimalModel(requirements: {
  maxCost?: number;
  minQuality?: number;
  maxLatency?: number;
}) {
  const models = getAvailableModels();

  return models
    .filter(m => m.costPer1M <= (requirements.maxCost || Infinity))
    .filter(m => m.qualityScore >= (requirements.minQuality || 0))
    .filter(m => m.avgLatency <= (requirements.maxLatency || Infinity))
    .sort((a, b) => b.qualityScore - a.qualityScore)[0];
}
```

### 3. Monitoring and Logging

```typescript
export async function trackModelUsage(
  model: string,
  tokens: number,
  duration: number
) {
  await db.modelUsage.create({
    data: {
      model,
      tokens,
      duration,
      cost: calculateCost(model, tokens),
      timestamp: new Date()
    }
  });
}
```

## Conclusion

DeepWiki's flexible model system allows you to optimize for your specific needs:

* Use **OpenRouter** for access to multiple models
* Deploy **Ollama** for privacy and zero API costs
* Choose **Azure OpenAI** for enterprise requirements
* Implement **cascading strategies** for cost optimization
* Monitor usage and performance to make informed decisions

Remember to review your model usage and costs regularly to ensure you're using the most appropriate models for your use case.

# Enterprise Setup
Source: https://asyncfunc.mintlify.app/examples/enterprise-setup

# Enterprise Deployment Guide

This guide covers comprehensive enterprise deployment strategies for DeepWikiOpen, focusing on security, scalability, and compliance requirements for production environments.

## Table of Contents

1. [Private API Channel Configuration](#private-api-channel-configuration)
2. [Custom Base URL Setup](#custom-base-url-setup)
3. [Authorization Mode Configuration](#authorization-mode-configuration)
4. [Multi-User Deployment Strategies](#multi-user-deployment-strategies)
5. [Security Hardening](#security-hardening)
6. [Corporate Authentication Integration](#corporate-authentication-integration)
7. [Monitoring and Logging](#monitoring-and-logging)
8. [High Availability Configuration](#high-availability-configuration)
9. [Backup and Disaster Recovery](#backup-and-disaster-recovery)
10. [Compliance Considerations](#compliance-considerations)

## Private API Channel Configuration

Configure DeepWikiOpen to use private API channels for enhanced security and control.
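The gateway configuration below declares a `retryPolicy` of `maxRetries: 3` with `backoffMultiplier: 2`. As a rough illustration of what a client honoring that policy does, the sketch below retries a failing call with exponentially growing waits (the helper is illustrative, not part of DeepWikiOpen):

```python
import time

def call_with_retry(request, max_retries=3, backoff_multiplier=2, base_delay=1.0):
    """Call `request` and retry on connection failure, multiplying the
    wait between attempts by `backoff_multiplier` each time."""
    delay = base_delay
    for attempt in range(max_retries + 1):
        try:
            return request()
        except ConnectionError:
            if attempt == max_retries:
                raise  # retries exhausted; surface the error
            time.sleep(delay)  # 1s, 2s, 4s ... for multiplier 2
            delay *= backoff_multiplier

# Example: a request that succeeds on the third attempt
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("gateway unavailable")
    return "ok"

print(call_with_retry(flaky, base_delay=0.01))  # ok
```

Capping retries (rather than looping forever) keeps a misbehaving upstream from pinning gateway connections open.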
### API Gateway Setup ```yaml # api-gateway-config.yaml apiVersion: v1 kind: ConfigMap metadata: name: deepwikiopen-api-config data: api-config.json: | { "channels": { "private": { "endpoint": "https://api-internal.company.com/deepwiki", "protocol": "https", "port": 443, "timeout": 30000, "retryPolicy": { "maxRetries": 3, "backoffMultiplier": 2 } } }, "routing": { "defaultChannel": "private", "fallbackEnabled": false } } ``` ### Environment Configuration ```bash # .env.production DEEPWIKI_API_CHANNEL=private DEEPWIKI_API_ENDPOINT=https://api-internal.company.com/deepwiki DEEPWIKI_API_KEY_ROTATION_DAYS=30 DEEPWIKI_API_RATE_LIMIT=1000 DEEPWIKI_API_BURST_LIMIT=2000 ``` ### Network Isolation ```yaml # network-policy.yaml apiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: name: deepwikiopen-api-isolation spec: podSelector: matchLabels: app: deepwikiopen policyTypes: - Ingress - Egress egress: - to: - namespaceSelector: matchLabels: name: api-gateway ports: - protocol: TCP port: 443 ``` ## Custom Base URL Setup Configure custom base URLs for corporate proxy environments. 
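The proxy configuration that follows carries a `bypassList` (`localhost`, `127.0.0.1`, `.company.internal`). The convention usually applied to such lists is exact match for plain hostnames and suffix match for entries that start with a dot; a small sketch of that rule (the helper name is hypothetical):

```python
def should_bypass_proxy(host, bypass_list):
    """Return True if requests to `host` should skip the corporate proxy."""
    for entry in bypass_list:
        if entry.startswith("."):
            # Leading dot: match the bare domain and any subdomain of it
            if host == entry[1:] or host.endswith(entry):
                return True
        elif host == entry:
            return True
    return False

BYPASS = ["localhost", "127.0.0.1", ".company.internal"]
print(should_bypass_proxy("wiki.company.internal", BYPASS))  # True
print(should_bypass_proxy("deepwiki.company.com", BYPASS))   # False
```

The suffix check is what lets a single `.company.internal` entry cover every internal service without enumerating hosts.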
### Proxy Configuration ```javascript // config/proxy.js module.exports = { production: { baseUrl: process.env.CORPORATE_BASE_URL || 'https://deepwiki.company.com', proxy: { host: process.env.PROXY_HOST || 'proxy.company.com', port: process.env.PROXY_PORT || 8080, auth: { username: process.env.PROXY_USERNAME, password: process.env.PROXY_PASSWORD }, protocol: 'https' }, bypassList: [ 'localhost', '127.0.0.1', '.company.internal' ] } } ``` ### Reverse Proxy Setup (NGINX) ```nginx # nginx.conf upstream deepwikiopen_backend { least_conn; server backend1.internal:8080 weight=5; server backend2.internal:8080 weight=5; server backend3.internal:8080 weight=5; keepalive 32; keepalive_requests 100; keepalive_timeout 60s; } server { listen 443 ssl http2; server_name deepwiki.company.com; ssl_certificate /etc/nginx/certs/company.crt; ssl_certificate_key /etc/nginx/certs/company.key; ssl_protocols TLSv1.2 TLSv1.3; ssl_ciphers HIGH:!aNULL:!MD5; location / { proxy_pass https://deepwikiopen_backend; proxy_set_header Host $host; proxy_set_header X-Real-IP $remote_addr; proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; proxy_set_header X-Forwarded-Proto $scheme; # Corporate headers proxy_set_header X-Company-User $http_x_company_user; proxy_set_header X-Company-Department $http_x_company_department; # Security headers add_header X-Content-Type-Options nosniff; add_header X-Frame-Options DENY; add_header X-XSS-Protection "1; mode=block"; add_header Strict-Transport-Security "max-age=31536000; includeSubDomains"; } } ``` ## Authorization Mode Configuration Implement robust authorization modes for enterprise environments. 
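The RBAC configuration below grants each role a list of resource patterns such as `"projects/*"` or the catch-all `"*"`. A hedged sketch of how a request might be checked against that structure — the role table mirrors the `roles.json` that follows, but the evaluation function itself is illustrative (DeepWikiOpen's enforcement code is not shown in these docs):

```python
from fnmatch import fnmatch

# Mirrors the roles.json structure defined in the RBAC ConfigMap below.
ROLES = {
    "admin":     {"permissions": {"read", "write", "delete", "admin"}, "resources": ["*"]},
    "developer": {"permissions": {"read", "write"}, "resources": ["projects/*", "documents/*"]},
    "viewer":    {"permissions": {"read"}, "resources": ["documents/*", "public/*"]},
}

def is_allowed(role, permission, resource):
    """True if `role` holds `permission` on a resource matching one of
    its shell-style wildcard patterns."""
    spec = ROLES.get(role)
    if spec is None or permission not in spec["permissions"]:
        return False
    return any(fnmatch(resource, pattern) for pattern in spec["resources"])

print(is_allowed("developer", "write", "projects/deepwiki"))  # True
print(is_allowed("viewer", "write", "documents/readme"))      # False
```

Both checks must pass — permission on the role *and* a matching resource pattern — so a developer with `write` still cannot touch resources outside `projects/*` and `documents/*`.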
### RBAC Configuration ```yaml # rbac-config.yaml apiVersion: v1 kind: ConfigMap metadata: name: deepwikiopen-rbac data: roles.json: | { "roles": { "admin": { "permissions": ["read", "write", "delete", "admin"], "resources": ["*"] }, "developer": { "permissions": ["read", "write"], "resources": ["projects/*", "documents/*"] }, "viewer": { "permissions": ["read"], "resources": ["documents/*", "public/*"] } }, "defaultRole": "viewer", "superAdmins": ["admin@company.com"] } ``` ### OAuth2/OIDC Integration ```javascript // auth/oauth-config.js const OIDCStrategy = require('passport-openidconnect').Strategy; module.exports = { strategy: new OIDCStrategy({ issuer: process.env.OIDC_ISSUER || 'https://auth.company.com', authorizationURL: 'https://auth.company.com/authorize', tokenURL: 'https://auth.company.com/token', userInfoURL: 'https://auth.company.com/userinfo', clientID: process.env.OIDC_CLIENT_ID, clientSecret: process.env.OIDC_CLIENT_SECRET, callbackURL: 'https://deepwiki.company.com/auth/callback', scope: ['openid', 'profile', 'email', 'groups'] }, (issuer, profile, done) => { // Map corporate groups to DeepWiki roles const roles = mapCorporateGroups(profile.groups); return done(null, { id: profile.id, email: profile.email, roles: roles, department: profile.department }); }), sessionConfig: { secret: process.env.SESSION_SECRET, resave: false, saveUninitialized: false, cookie: { secure: true, httpOnly: true, maxAge: 8 * 60 * 60 * 1000, // 8 hours sameSite: 'strict' } } }; ``` ### API Key Management ```javascript // auth/api-key-manager.js class EnterpriseAPIKeyManager { constructor(config) { this.rotationDays = config.rotationDays || 30; this.keyVault = config.keyVault; } async rotateKeys() { const keys = await this.keyVault.listKeys(); const now = Date.now(); for (const key of keys) { if (now - key.createdAt > this.rotationDays * 24 * 60 * 60 * 1000) { // Generate new key const newKey = await this.generateSecureKey(); // Store in vault await this.keyVault.store({ 
id: key.id, key: newKey, createdAt: now, previousKey: key.key, rotatedAt: now }); // Notify administrators await this.notifyKeyRotation(key.id); } } } async generateSecureKey() { const crypto = require('crypto'); return crypto.randomBytes(32).toString('base64'); } } ``` ## Multi-User Deployment Strategies ### Kubernetes Deployment ```yaml # deployment.yaml apiVersion: apps/v1 kind: Deployment metadata: name: deepwikiopen labels: app: deepwikiopen spec: replicas: 3 strategy: type: RollingUpdate rollingUpdate: maxSurge: 1 maxUnavailable: 0 selector: matchLabels: app: deepwikiopen template: metadata: labels: app: deepwikiopen spec: affinity: podAntiAffinity: requiredDuringSchedulingIgnoredDuringExecution: - labelSelector: matchExpressions: - key: app operator: In values: - deepwikiopen topologyKey: kubernetes.io/hostname containers: - name: deepwikiopen image: company.registry.io/deepwikiopen:enterprise-1.0 ports: - containerPort: 8080 env: - name: NODE_ENV value: "production" - name: MULTI_TENANT_MODE value: "true" resources: requests: memory: "512Mi" cpu: "500m" limits: memory: "1Gi" cpu: "1000m" livenessProbe: httpGet: path: /health port: 8080 initialDelaySeconds: 30 periodSeconds: 10 readinessProbe: httpGet: path: /ready port: 8080 initialDelaySeconds: 5 periodSeconds: 5 ``` ### Multi-Tenant Configuration ```javascript // config/multi-tenant.js module.exports = { multiTenant: { enabled: true, isolation: 'database', // 'database' | 'schema' | 'row' strategy: { identification: 'subdomain', // 'subdomain' | 'header' | 'jwt' headerName: 'X-Tenant-ID', defaultTenant: 'default' }, database: { poolSize: 10, idleTimeout: 30000, connectionLimit: 100 }, cache: { ttl: 3600, checkPeriod: 600, maxKeys: 10000 } } }; ``` ## Security Hardening ### Security Headers ```javascript // middleware/security.js const helmet = require('helmet'); module.exports = { configure: (app) => { app.use(helmet({ contentSecurityPolicy: { directives: { defaultSrc: ["'self'"], styleSrc: ["'self'", 
"'unsafe-inline'"], scriptSrc: ["'self'"], imgSrc: ["'self'", "data:", "https:"], connectSrc: ["'self'", "wss:", "https:"], fontSrc: ["'self'"], objectSrc: ["'none'"], mediaSrc: ["'self'"], frameSrc: ["'none'"], }, }, hsts: { maxAge: 31536000, includeSubDomains: true, preload: true } })); // Additional security middleware app.use(require('./rate-limiter')); app.use(require('./input-validation')); app.use(require('./sql-injection-prevention')); } }; ``` ### Encryption at Rest ```yaml # encryption-config.yaml apiVersion: v1 kind: EncryptionConfiguration resources: - resources: - secrets - configmaps providers: - aescbc: keys: - name: key1 secret: - identity: {} ``` ### Network Security ```yaml # pod-security-policy.yaml apiVersion: policy/v1beta1 kind: PodSecurityPolicy metadata: name: deepwikiopen-psp spec: privileged: false allowPrivilegeEscalation: false requiredDropCapabilities: - ALL volumes: - 'configMap' - 'emptyDir' - 'projected' - 'secret' - 'downwardAPI' - 'persistentVolumeClaim' hostNetwork: false hostIPC: false hostPID: false runAsUser: rule: 'MustRunAsNonRoot' seLinux: rule: 'RunAsAny' supplementalGroups: rule: 'RunAsAny' fsGroup: rule: 'RunAsAny' readOnlyRootFilesystem: true ``` ## Corporate Authentication Integration ### LDAP Integration ```javascript // auth/ldap-config.js const LdapAuth = require('ldapauth-fork'); module.exports = { ldap: new LdapAuth({ url: process.env.LDAP_URL || 'ldaps://ldap.company.com:636', bindDN: process.env.LDAP_BIND_DN, bindCredentials: process.env.LDAP_BIND_PASSWORD, searchBase: process.env.LDAP_SEARCH_BASE || 'ou=users,dc=company,dc=com', searchFilter: '(&(objectClass=user)(sAMAccountName={{username}}))', searchAttributes: ['displayName', 'mail', 'memberOf', 'department'], groupSearchBase: 'ou=groups,dc=company,dc=com', groupSearchFilter: '(&(objectClass=group)(member={{dn}}))', groupSearchAttributes: ['cn', 'description'], cache: true, cacheTime: 3600000, // 1 hour tlsOptions: { ca: 
[fs.readFileSync('./certs/company-ca.crt')] } }), mapUserRoles: (ldapUser) => { const roles = []; if (ldapUser.memberOf) { const groups = Array.isArray(ldapUser.memberOf) ? ldapUser.memberOf : [ldapUser.memberOf]; groups.forEach(group => { if (group.includes('CN=DeepWiki-Admins')) roles.push('admin'); if (group.includes('CN=DeepWiki-Developers')) roles.push('developer'); if (group.includes('CN=DeepWiki-Users')) roles.push('viewer'); }); } return roles.length > 0 ? roles : ['viewer']; } }; ``` ### SAML Integration ```javascript // auth/saml-config.js const saml = require('passport-saml'); module.exports = { strategy: new saml.Strategy({ callbackUrl: 'https://deepwiki.company.com/auth/saml/callback', entryPoint: process.env.SAML_ENTRY_POINT, issuer: 'deepwikiopen', cert: fs.readFileSync('./certs/saml-cert.pem', 'utf-8'), privateKey: fs.readFileSync('./certs/saml-key.pem', 'utf-8'), identifierFormat: 'urn:oasis:names:tc:SAML:2.0:nameid-format:persistent', wantAssertionsSigned: true, signatureAlgorithm: 'sha256' }, (profile, done) => { return done(null, { id: profile.nameID, email: profile.email, displayName: profile.displayName, department: profile['http://schemas.company.com/claims/department'], roles: profile['http://schemas.company.com/claims/roles'] }); }) }; ``` ## Monitoring and Logging ### Prometheus Metrics ```javascript // monitoring/metrics.js const prometheus = require('prom-client'); // Custom metrics const httpRequestDuration = new prometheus.Histogram({ name: 'deepwiki_http_request_duration_seconds', help: 'Duration of HTTP requests in seconds', labelNames: ['method', 'route', 'status_code'], buckets: [0.1, 0.3, 0.5, 0.7, 1, 3, 5, 7, 10] }); const activeUsers = new prometheus.Gauge({ name: 'deepwiki_active_users', help: 'Number of active users', labelNames: ['tenant', 'department'] }); const apiCallsTotal = new prometheus.Counter({ name: 'deepwiki_api_calls_total', help: 'Total number of API calls', labelNames: ['endpoint', 'method', 'status'] }); // 
Export metrics endpoint module.exports = { endpoint: '/metrics', register: prometheus.register, metrics: { httpRequestDuration, activeUsers, apiCallsTotal } }; ``` ### ELK Stack Configuration ```yaml # filebeat.yml filebeat.inputs: - type: container paths: - /var/lib/docker/containers/*/*.log processors: - add_kubernetes_metadata: host: ${NODE_NAME} matchers: - logs_path: logs_path: "/var/lib/docker/containers/" multiline.pattern: '^\d{4}-\d{2}-\d{2}' multiline.negate: true multiline.match: after output.elasticsearch: hosts: ['${ELASTICSEARCH_HOST:elasticsearch}:${ELASTICSEARCH_PORT:9200}'] username: ${ELASTICSEARCH_USERNAME} password: ${ELASTICSEARCH_PASSWORD} indices: - index: "deepwiki-%{[agent.version]}-%{+yyyy.MM.dd}" when.contains: kubernetes.labels.app: "deepwikiopen" processors: - add_host_metadata: when.not.contains.tags: forwarded - add_docker_metadata: ~ - add_kubernetes_metadata: ~ ``` ### Application Performance Monitoring ```javascript // monitoring/apm.js const apm = require('elastic-apm-node'); module.exports = { init: () => { apm.start({ serviceName: 'deepwikiopen', secretToken: process.env.APM_SECRET_TOKEN, serverUrl: process.env.APM_SERVER_URL, environment: process.env.NODE_ENV, captureBody: 'all', captureHeaders: true, transactionSampleRate: 0.1, spanFramesMinDuration: '5ms', usePathAsTransactionName: true, ignoreUrls: ['/health', '/metrics'], errorOnAbortedRequests: true, captureErrorLogStackTraces: 'always', logLevel: 'info' }); } }; ``` ## High Availability Configuration ### Database Clustering ```yaml # postgres-ha.yaml apiVersion: postgresql.cnpg.io/v1 kind: Cluster metadata: name: deepwiki-postgres-cluster spec: instances: 3 primaryUpdateStrategy: unsupervised postgresql: parameters: max_connections: "200" shared_buffers: "256MB" effective_cache_size: "1GB" maintenance_work_mem: "64MB" checkpoint_completion_target: "0.9" wal_buffers: "16MB" default_statistics_target: "100" random_page_cost: "1.1" effective_io_concurrency: "200" work_mem: 
"4MB" min_wal_size: "1GB" max_wal_size: "2GB" bootstrap: initdb: database: deepwiki owner: deepwiki_user secret: name: deepwiki-db-secret monitoring: enabled: true customQueriesConfigMap: - name: deepwiki-metrics key: queries.yaml ``` ### Redis Sentinel Configuration ```yaml # redis-sentinel.yaml apiVersion: v1 kind: ConfigMap metadata: name: redis-sentinel-config data: sentinel.conf: | port 26379 bind 0.0.0.0 sentinel announce-ip ${HOSTNAME}.redis-sentinel sentinel down-after-milliseconds mymaster 5000 sentinel parallel-syncs mymaster 1 sentinel failover-timeout mymaster 60000 sentinel auth-pass mymaster ${REDIS_PASSWORD} sentinel monitor mymaster redis-master 6379 2 ``` ### Load Balancer Configuration ```yaml # service.yaml apiVersion: v1 kind: Service metadata: name: deepwikiopen-lb annotations: service.beta.kubernetes.io/aws-load-balancer-type: "nlb" service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true" spec: type: LoadBalancer selector: app: deepwikiopen ports: - port: 443 targetPort: 8080 protocol: TCP sessionAffinity: ClientIP sessionAffinityConfig: clientIP: timeoutSeconds: 10800 ``` ## Backup and Disaster Recovery ### Automated Backup Strategy ```yaml # velero-backup.yaml apiVersion: velero.io/v1 kind: Schedule metadata: name: deepwiki-daily-backup spec: schedule: "0 2 * * *" template: ttl: 720h0m0s includedNamespaces: - deepwikiopen includedResources: - '*' storageLocation: deepwiki-backup-location volumeSnapshotLocations: - deepwiki-snapshots hooks: resources: - name: database-backup includedNamespaces: - deepwikiopen labelSelector: matchLabels: app: postgres pre: - exec: container: postgres command: - /bin/bash - -c - pg_dump -U $POSTGRES_USER -d $POSTGRES_DB > /backup/dump.sql ``` ### Disaster Recovery Plan ```javascript // dr/recovery-plan.js class DisasterRecoveryPlan { constructor(config) { this.rto = config.rto || '4 hours'; // Recovery Time Objective this.rpo = config.rpo || '1 hour'; // Recovery Point Objective } 
async executeRecovery(disaster) { const steps = [ this.assessDamage(disaster), this.activateSecondaryRegion(), this.restoreFromBackup(), this.validateDataIntegrity(), this.switchTraffic(), this.notifyStakeholders() ]; for (const step of steps) { await step; await this.logRecoveryStep(step); } } async testDRPlan() { // Automated DR testing const scenarios = [ 'database-failure', 'region-outage', 'data-corruption', 'security-breach' ]; for (const scenario of scenarios) { await this.simulateDisaster(scenario); await this.measureRecoveryMetrics(); } } } ``` ### Cross-Region Replication ```yaml # cross-region-sync.yaml apiVersion: batch/v1 kind: CronJob metadata: name: cross-region-sync spec: schedule: "*/15 * * * *" jobTemplate: spec: template: spec: containers: - name: sync image: company.registry.io/deepwiki-sync:latest env: - name: SOURCE_REGION value: "us-east-1" - name: TARGET_REGION value: "us-west-2" - name: SYNC_MODE value: "incremental" command: - /bin/sh - -c - | rsync -avz --delete \ --exclude='temp/*' \ --exclude='cache/*' \ /data/source/ /data/target/ ``` ## Compliance Considerations ### GDPR Compliance ```javascript // compliance/gdpr.js class GDPRCompliance { constructor() { this.consentManager = new ConsentManager(); this.dataProcessor = new PersonalDataProcessor(); } async handleDataRequest(userId, requestType) { switch (requestType) { case 'access': return await this.exportUserData(userId); case 'rectification': return await this.correctUserData(userId); case 'erasure': return await this.deleteUserData(userId); case 'portability': return await this.exportPortableData(userId); default: throw new Error('Invalid request type'); } } async anonymizeData(data) { // Implement data anonymization const anonymized = { ...data, email: this.hashEmail(data.email), name: 'ANONYMIZED', ipAddress: this.anonymizeIP(data.ipAddress) }; return anonymized; } } ``` ### Audit Logging ```javascript // compliance/audit-logger.js class AuditLogger { constructor(config) { 
this.storage = config.storage; this.encryption = config.encryption; } async log(event) { const auditEntry = { timestamp: new Date().toISOString(), userId: event.userId, action: event.action, resource: event.resource, result: event.result, ipAddress: event.ipAddress, userAgent: event.userAgent, department: event.department, metadata: event.metadata }; // Encrypt sensitive data const encrypted = await this.encryption.encrypt(auditEntry); // Store with integrity check await this.storage.store({ data: encrypted, hash: this.calculateHash(encrypted), signature: await this.sign(encrypted) }); } async generateComplianceReport(startDate, endDate) { const entries = await this.storage.query({ startDate, endDate }); return { summary: this.summarizeActivity(entries), accessPatterns: this.analyzeAccess(entries), anomalies: this.detectAnomalies(entries), privilegedActions: this.filterPrivileged(entries) }; } } ``` ### Data Retention Policies ```yaml # data-retention-policy.yaml apiVersion: v1 kind: ConfigMap metadata: name: data-retention-policy data: policy.json: | { "policies": { "user-data": { "retentionDays": 2555, "archiveAfterDays": 365, "deletePersonalInfoAfterDays": 1095 }, "audit-logs": { "retentionDays": 2555, "immutable": true, "compressionAfterDays": 90 }, "application-logs": { "retentionDays": 90, "compressionAfterDays": 7 }, "metrics": { "retentionDays": 365, "aggregationRules": { "1m": 24, "5m": 168, "1h": 730 } } } } ``` ## Security Best Practices Summary 1. **Zero Trust Architecture**: Never trust, always verify 2. **Principle of Least Privilege**: Grant minimum required permissions 3. **Defense in Depth**: Multiple layers of security controls 4. **Regular Security Audits**: Quarterly penetration testing 5. **Incident Response Plan**: Documented and tested procedures 6. **Encryption Everywhere**: In transit and at rest 7. **Continuous Monitoring**: Real-time threat detection 8. **Patch Management**: Automated security updates 9. 
**Access Reviews**: Regular audit of user permissions 10. **Security Training**: Regular employee security awareness ## Deployment Checklist * [ ] Configure private API channels * [ ] Set up custom base URLs and proxies * [ ] Implement authentication and authorization * [ ] Deploy multi-user infrastructure * [ ] Apply security hardening measures * [ ] Integrate corporate authentication * [ ] Set up monitoring and logging * [ ] Configure high availability * [ ] Implement backup strategy * [ ] Ensure compliance requirements * [ ] Test disaster recovery plan * [ ] Conduct security audit * [ ] Document operational procedures * [ ] Train operations team * [ ] Schedule regular reviews ## Support and Resources * **Enterprise Support Portal**: [https://support.deepwikiopen.enterprise](https://support.deepwikiopen.enterprise) * **Security Updates**: [https://security.deepwikiopen.enterprise](https://security.deepwikiopen.enterprise) * **Compliance Documentation**: [https://compliance.deepwikiopen.enterprise](https://compliance.deepwikiopen.enterprise) * **Operations Runbook**: [https://runbook.deepwikiopen.enterprise](https://runbook.deepwikiopen.enterprise) *** For additional enterprise features or custom requirements, please contact your DeepWikiOpen enterprise support team. # Troubleshooting Scenarios Source: https://asyncfunc.mintlify.app/examples/troubleshooting-scenarios Real-world troubleshooting examples and solutions for common DeepWikiOpen issues # Troubleshooting Scenarios This guide provides step-by-step solutions for common issues you may encounter when using DeepWikiOpen. ## 1. API Key Validation Issues ### Scenario: "Invalid API key" error when starting the application #### Symptoms ```bash Error: Invalid API key provided Status: 401 Unauthorized ``` #### Debugging Steps 1. **Verify API key format** ```bash # Check if API key is properly set echo $ANTHROPIC_API_KEY # Should start with "sk-ant-api03-" ``` 2. 
**Check environment file** ```bash # Ensure .env file exists ls -la .env # Verify contents cat .env | grep ANTHROPIC_API_KEY ``` 3. **Test API key directly** ```bash curl https://api.anthropic.com/v1/messages \ -H "x-api-key: $ANTHROPIC_API_KEY" \ -H "anthropic-version: 2023-06-01" \ -H "content-type: application/json" \ -d '{ "model": "claude-3-5-sonnet-20241022", "max_tokens": 10, "messages": [{"role": "user", "content": "Hello"}] }' ``` #### Solutions ```bash # Solution 1: Export API key correctly export ANTHROPIC_API_KEY="sk-ant-api03-your-actual-key" # Solution 2: Fix .env file formatting echo 'ANTHROPIC_API_KEY="sk-ant-api03-your-actual-key"' > .env # Solution 3: Use Docker with proper env passing docker run -e ANTHROPIC_API_KEY="$ANTHROPIC_API_KEY" \ deepwikiopen/app:latest # Solution 4: Check for special characters # Ensure no trailing spaces or newlines ANTHROPIC_API_KEY=$(echo "$ANTHROPIC_API_KEY" | tr -d ' \n') ``` ## 2. Connection Problems (CORS, Network) ### Scenario: CORS errors when accessing the API #### Symptoms ``` Access to XMLHttpRequest at 'http://localhost:8000/api' from origin 'http://localhost:3000' has been blocked by CORS policy ``` #### Debugging Steps 1. **Check server CORS configuration** ```python # backend/main.py from fastapi.middleware.cors import CORSMiddleware app.add_middleware( CORSMiddleware, allow_origins=["http://localhost:3000"], # Add your frontend URL allow_credentials=True, allow_methods=["*"], allow_headers=["*"], ) ``` 2. **Verify network connectivity** ```bash # Test backend is running curl http://localhost:8000/health # Check ports are not blocked netstat -an | grep -E "8000|3000" ``` 3. 
**Inspect browser network tab**
   * Open Developer Tools → Network tab
   * Look for preflight OPTIONS requests
   * Check response headers for CORS headers

#### Solutions

```javascript
// Solution 1: Frontend proxy configuration
// frontend/package.json
{
  "proxy": "http://localhost:8000"
}

// Solution 2: Use environment-specific API URLs
const API_BASE = process.env.REACT_APP_API_URL || 'http://localhost:8000';
```

```nginx
# Solution 3: nginx reverse proxy (nginx.conf)
server {
  location /api {
    proxy_pass http://backend:8000;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
  }
}
```

## 3. Large Repository Handling

### Scenario: Application crashes when analyzing large repositories

#### Symptoms

```
Error: JavaScript heap out of memory
FATAL ERROR: Reached heap limit Allocation failed
```

#### Debugging Steps

1. **Monitor memory usage**

   ```bash
   # Check current memory usage
   docker stats

   # Raise the Node.js heap limit
   node --max-old-space-size=4096 server.js
   ```

2. **Profile repository size**

   ```bash
   # Check repository size
   du -sh /path/to/repo

   # Count files
   find /path/to/repo -type f | wc -l
   ```

#### Solutions

```javascript
// Solution 1: Implement streaming file processing
const processLargeRepo = async (repoPath) => {
  const stream = fs.createReadStream(repoPath);
  const chunks = [];

  for await (const chunk of stream) {
    // Process in chunks
    await processChunk(chunk);
    chunks.push(chunk);

    // Flush buffered chunks periodically to bound memory use
    if (chunks.length > 1000) {
      await flushToDatabase(chunks);
      chunks.length = 0;
    }
  }

  // Flush any remaining chunks
  if (chunks.length > 0) {
    await flushToDatabase(chunks);
  }
};

// Solution 2: Use worker threads
const { Worker } = require('worker_threads');

const analyzeInWorker = (filePath) => {
  return new Promise((resolve, reject) => {
    const worker = new Worker('./analyzer.js', {
      workerData: { filePath }
    });
    worker.on('message', resolve);
    worker.on('error', reject);
  });
};
```

```yaml
# Solution 3: Docker memory limits (docker-compose.yml)
services:
  app:
    mem_limit: 4g
    memswap_limit: 4g
```

## 4.
Memory and Performance Issues

### Scenario: Slow response times and high memory usage

#### Debugging Steps

1. **Profile application performance**

   ```bash
   # Node.js profiling
   node --inspect server.js

   # Python profiling
   python -m cProfile -o profile.stats app.py
   ```

2. **Monitor resource usage**

   ```bash
   # Real-time monitoring
   htop
   iotop

   # Docker resource usage
   docker stats --format "table {{.Container}}\t{{.CPUPerc}}\t{{.MemUsage}}"
   ```

#### Solutions

```python
# Solution 1: Implement caching
import json
from functools import lru_cache

import redis

cache = redis.Redis(host='localhost', port=6379)

@lru_cache(maxsize=1000)
def expensive_operation(repo_path):
    # Check the shared cache before recomputing
    cached = cache.get(f"analysis:{repo_path}")
    if cached:
        return json.loads(cached)

    result = perform_analysis(repo_path)
    cache.set(f"analysis:{repo_path}", json.dumps(result), ex=3600)
    return result

# Solution 2: Batch processing
import asyncio
import gc

async def batch_analyze(files, batch_size=100):
    for i in range(0, len(files), batch_size):
        batch = files[i:i + batch_size]
        await asyncio.gather(*[analyze_file(f) for f in batch])
        # Allow garbage collection between batches
        gc.collect()
```

## 5. Docker Container Problems

### Scenario: Container fails to start or crashes immediately

#### Symptoms

```
docker: Error response from daemon: OCI runtime create failed
Container exited with code 137 (Out of Memory)
```

#### Debugging Steps

1. **Check container logs**

   ```bash
   docker logs deepwikiopen-app
   docker logs --tail 50 -f deepwikiopen-app
   ```

2. **Inspect container**

   ```bash
   docker inspect deepwikiopen-app
   docker exec -it deepwikiopen-app /bin/sh
   ```

#### Solutions

```dockerfile
# Solution 1: Rebuild with proper base image
FROM node:18-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production

# Solution 2: Fix permission issues
RUN addgroup -g 1001 -S nodejs
RUN adduser -S nodejs -u 1001
USER nodejs

# Solution 3: Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=40s \
  CMD node healthcheck.js || exit 1
```

## 6.
Model Selection Errors ### Scenario: "Model not found" or incorrect model responses #### Symptoms ``` Error: Model 'claude-3-opus' not found Available models: ['claude-3-5-sonnet-20241022', 'claude-3-5-haiku-20241022'] ``` #### Debugging Steps 1. **List available models** ```python import anthropic client = anthropic.Anthropic() # Check model availability based on your API tier ``` 2. **Verify model configuration** ```javascript // config/models.js export const AVAILABLE_MODELS = { 'claude-3-5-sonnet-20241022': { maxTokens: 8192, contextWindow: 200000 }, 'claude-3-5-haiku-20241022': { maxTokens: 8192, contextWindow: 200000 } }; ``` #### Solutions ```javascript // Solution 1: Implement model fallback const getModel = (preferred) => { const fallbackChain = [ 'claude-3-5-sonnet-20241022', 'claude-3-5-haiku-20241022', 'claude-3-haiku-20240307' ]; if (AVAILABLE_MODELS[preferred]) { return preferred; } return fallbackChain.find(model => AVAILABLE_MODELS[model]); }; // Solution 2: Dynamic model selection const selectOptimalModel = (contextLength, speed = 'balanced') => { if (speed === 'fast' || contextLength < 10000) { return 'claude-3-5-haiku-20241022'; } return 'claude-3-5-sonnet-20241022'; }; ``` ## 7. Private Repository Access Issues ### Scenario: Cannot access private GitHub repositories #### Symptoms ``` Error: Repository not found or you don't have access Status: 404 ``` #### Debugging Steps 1. **Verify GitHub token** ```bash # Test token permissions curl -H "Authorization: token $GITHUB_TOKEN" \ https://api.github.com/user/repos ``` 2. 
**Check token scopes**

   ```bash
   curl -H "Authorization: token $GITHUB_TOKEN" \
     -I https://api.github.com/user
   # Look for X-OAuth-Scopes header
   ```

#### Solutions

```bash
# Solution 1: Create token with correct scopes
# Go to GitHub Settings → Developer settings → Personal access tokens
# Required scopes: repo, read:org (for private repos)
```

```javascript
// Solution 2: Use GitHub App authentication
// github-app-auth.js
const { createAppAuth } = require("@octokit/auth-app");

const auth = createAppAuth({
  appId: process.env.GITHUB_APP_ID,
  privateKey: process.env.GITHUB_PRIVATE_KEY,
  installationId: process.env.GITHUB_INSTALLATION_ID,
});
```

```bash
# Solution 3: SSH key authentication for cloning
ssh-keygen -t ed25519 -C "deepwikiopen@example.com"
# Add to GitHub account settings
```

## 8. WebSocket Connection Failures

### Scenario: Real-time updates not working, WebSocket disconnects

#### Symptoms

```
WebSocket connection to 'ws://localhost:8000/ws' failed
Error: Connection closed before established
```

#### Debugging Steps

1. **Test WebSocket endpoint**

   ```javascript
   // Test WebSocket connection
   const ws = new WebSocket('ws://localhost:8000/ws');
   ws.onopen = () => console.log('Connected');
   ws.onerror = (error) => console.error('Error:', error);
   ws.onclose = (event) => console.log('Closed:', event.code, event.reason);
   ```

2.
**Check server WebSocket implementation**

   ```python
   # Ensure the WebSocket endpoint exists
   @app.websocket("/ws")
   async def websocket_endpoint(websocket: WebSocket):
       await websocket.accept()
       # Handle connection
   ```

#### Solutions

```javascript
// Solution 1: Implement reconnection logic
class ReconnectingWebSocket {
  constructor(url, options = {}) {
    this.url = url;
    this.reconnectInterval = options.reconnectInterval || 1000;
    this.maxReconnectInterval = options.maxReconnectInterval || 30000;
    this.reconnectDecay = options.reconnectDecay || 1.5;
    this.reconnectAttempts = 0;
    this.connect();
  }

  connect() {
    this.ws = new WebSocket(this.url);

    this.ws.onopen = () => {
      // Reset the backoff once a connection succeeds
      this.reconnectAttempts = 0;
    };

    this.ws.onclose = () => {
      this.reconnect();
    };

    this.ws.onerror = (error) => {
      console.error('WebSocket error:', error);
      this.ws.close();
    };
  }

  reconnect() {
    this.reconnectAttempts++;
    const timeout = Math.min(
      this.reconnectInterval * Math.pow(this.reconnectDecay, this.reconnectAttempts),
      this.maxReconnectInterval
    );
    setTimeout(() => this.connect(), timeout);
  }
}

// Solution 2: Use Socket.IO for better reliability
// Server side: CORS configuration
const io = require('socket.io')(server, {
  cors: {
    origin: "http://localhost:3000",
    methods: ["GET", "POST"]
  }
});

// Client side: reconnection options are client options, not server options
const socket = require('socket.io-client')('http://localhost:8000', {
  reconnection: true,
  reconnectionAttempts: 5,
  reconnectionDelay: 1000
});
```

## 9. Rate Limiting Scenarios

### Scenario: API rate limit exceeded errors

#### Symptoms

```
Error: Rate limit exceeded. Please retry after 30 seconds.
Status: 429 Too Many Requests
```

#### Debugging Steps

1.
**Monitor rate limit headers** ```javascript // Check response headers response.headers.get('X-RateLimit-Limit') response.headers.get('X-RateLimit-Remaining') response.headers.get('X-RateLimit-Reset') ``` #### Solutions ```javascript // Solution 1: Implement rate limit handling class RateLimiter { constructor(maxRequests = 50, windowMs = 60000) { this.maxRequests = maxRequests; this.windowMs = windowMs; this.requests = []; } async acquire() { const now = Date.now(); this.requests = this.requests.filter(time => now - time < this.windowMs); if (this.requests.length >= this.maxRequests) { const oldestRequest = this.requests[0]; const waitTime = this.windowMs - (now - oldestRequest); await new Promise(resolve => setTimeout(resolve, waitTime)); return this.acquire(); } this.requests.push(now); } } // Solution 2: Implement exponential backoff async function makeRequestWithRetry(fn, maxRetries = 3) { for (let i = 0; i < maxRetries; i++) { try { return await fn(); } catch (error) { if (error.status === 429 && i < maxRetries - 1) { const delay = Math.pow(2, i) * 1000; await new Promise(resolve => setTimeout(resolve, delay)); continue; } throw error; } } } // Solution 3: Queue requests const pQueue = require('p-queue').default; const queue = new pQueue({ concurrency: 2, interval: 1000, intervalCap: 10 }); ``` ## 10. 
Common Error Messages and Solutions

### Error: "Context length exceeded"

#### Solution

```javascript
// Implement context window management
const truncateContext = (messages, maxTokens = 150000) => {
  let totalTokens = 0;
  const truncated = [];

  for (let i = messages.length - 1; i >= 0; i--) {
    const tokens = estimateTokens(messages[i].content);
    if (totalTokens + tokens > maxTokens) break;
    totalTokens += tokens;
    truncated.unshift(messages[i]);
  }

  return truncated;
};
```

### Error: "WebContainer initialization failed"

#### Solution

```javascript
// Ensure proper WebContainer setup
import { WebContainer } from '@webcontainer/api';

let webcontainerInstance;

async function initWebContainer() {
  try {
    webcontainerInstance = await WebContainer.boot();
    console.log('WebContainer initialized');
  } catch (error) {
    console.error('Failed to initialize WebContainer:', error);
    // Fallback to server-side execution
    return initServerSideContainer();
  }
}
```

### Error: "Database connection timeout"

#### Solution

```python
# Implement connection pooling and retry logic
import time

from sqlalchemy import create_engine, text
from sqlalchemy.pool import QueuePool

def create_db_engine(retry_count=3, retry_delay=5):
    for attempt in range(retry_count):
        try:
            engine = create_engine(
                DATABASE_URL,
                poolclass=QueuePool,
                pool_size=10,
                max_overflow=20,
                pool_timeout=30,
                pool_recycle=3600
            )
            # Test the connection (text() is required by SQLAlchemy 2.x)
            with engine.connect() as conn:
                conn.execute(text("SELECT 1"))
            return engine
        except Exception as e:
            if attempt < retry_count - 1:
                time.sleep(retry_delay)
                continue
            raise e
```

## Best Practices for Troubleshooting

1. **Enable verbose logging**

   ```bash
   export DEBUG=deepwikiopen:*
   export LOG_LEVEL=debug
   ```

2. **Use health check endpoints**

   ```bash
   curl http://localhost:8000/health
   curl http://localhost:8000/api/status
   ```

3. **Monitor system resources**

   ```bash
   # Create monitoring script
   #!/bin/bash
   while true; do
     echo "=== $(date) ==="
     docker stats --no-stream
     sleep 5
   done
   ```

4.
**Implement comprehensive error handling** ```javascript app.use((err, req, res, next) => { console.error('Error details:', { message: err.message, stack: err.stack, timestamp: new Date().toISOString(), request: { method: req.method, url: req.url, headers: req.headers } }); res.status(err.status || 500).json({ error: err.message, timestamp: new Date().toISOString(), requestId: req.id }); }); ``` ## Getting Help If you encounter issues not covered here: 1. Check the [GitHub Issues](https://github.com/deepwikiopen/deepwikiopen/issues) 2. Enable debug logging and collect logs 3. Provide system information: ```bash node --version npm --version docker --version uname -a ``` 4. Include minimal reproduction steps 5. Join our [Discord community](https://discord.gg/deepwikiopen) for real-time support # Authorization Mode Source: https://asyncfunc.mintlify.app/getting-started/authorization-mode Secure your DeepWiki-Open instance with authorization mode - requiring authentication codes for wiki generation and API access Authorization mode provides access control for your DeepWiki-Open instance by requiring users to enter a secret authorization code before generating wikis. This is essential for production deployments, shared environments, or when you want to control who can generate documentation from repositories. ## What is Authorization Mode? 
Authorization mode is a simple but effective security feature that: Users must enter an authorization code in the web interface All API requests must include the authorization code Prevents unauthorized usage of AI provider quotas Tracks and logs authentication attempts ## When to Use Authorization Mode ### Recommended Scenarios **Always enable authorization mode in production environments:** * Public-facing instances * Shared hosting environments * Corporate deployments * Any instance accessible from the internet **Benefits:** * Prevents unauthorized access to AI providers * Controls API usage and costs * Maintains audit trail of access **Shared development environments:** * Team wikis for private repositories * Collaborative documentation projects * Educational environments * Demo instances with controlled access **Benefits:** * Ensures only authorized team members can generate docs * Prevents accidental quota exhaustion * Maintains consistent access control **AI provider quota protection:** * Limited API budgets * Pay-per-use providers * Rate-limited accounts * Enterprise cost control **Benefits:** * Prevents unexpected charges * Controls usage patterns * Enables usage tracking per authorization ### When NOT to Use Authorization Mode Authorization mode can be disabled for: * Local development instances * Personal single-user setups * Internal networks with existing security * Testing and experimentation environments ## Configuration ### Environment Variables Configure authorization mode using these environment variables: DEEPWIKI\_AUTH\_MODE Enable or disable authorization mode. DEEPWIKI\_AUTH\_CODE Secret authorization code required for access. ### Basic Configuration Add to your `.env` file: ```env # Enable authorization mode DEEPWIKI_AUTH_MODE=true # Set your secret authorization code DEEPWIKI_AUTH_CODE=your-secure-code-here ``` Choose a strong, unique authorization code. Avoid common passwords or easily guessable codes. 
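Server-side, the validation can reduce to a timing-safe comparison of the submitted code against `DEEPWIKI_AUTH_CODE`. The sketch below is illustrative only; the helper names `auth_enabled` and `validate_code` are assumptions for this example, not DeepWiki-Open's actual functions:

```python
import hmac
import os

def auth_enabled() -> bool:
    # DEEPWIKI_AUTH_MODE arrives as a string from the environment
    return os.getenv("DEEPWIKI_AUTH_MODE", "").lower() in ("true", "1", "yes")

def validate_code(submitted: str) -> bool:
    # When authorization mode is off, every request is allowed
    if not auth_enabled():
        return True
    expected = os.getenv("DEEPWIKI_AUTH_CODE", "")
    # compare_digest does not short-circuit on the first mismatched byte,
    # so response timing leaks nothing about the code's prefix
    return bool(expected) and hmac.compare_digest(submitted.encode(), expected.encode())
```

Using `hmac.compare_digest` instead of `==` matters here because the code is sent repeatedly by clients, which would otherwise give an attacker a timing oracle for guessing it character by character.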
Authorization mode requires a full restart: ```bash # Stop containers docker-compose down # Start with new configuration docker-compose up -d # Verify authorization mode is enabled docker-compose logs api | grep "Authorization" ``` ```bash # Stop backend API pkill -f "python -m api.main" # Stop frontend pkill -f "npm run dev" # Start backend with new configuration python -m api.main & # Start frontend npm run dev & ``` Check authorization status: ```bash curl -X GET "http://localhost:8001/auth/status" ``` Expected response: ```json { "auth_required": true } ``` Server logs should show: "Authorization mode: ENABLED" ### Advanced Configuration ```env Production Environment # Production authorization settings DEEPWIKI_AUTH_MODE=true DEEPWIKI_AUTH_CODE=prod-secure-auth-2024-XYZ789 # Additional security settings NODE_ENV=production LOG_LEVEL=WARNING LOG_SENSITIVE_DATA=false # Server configuration SERVER_BASE_URL=https://deepwiki.yourdomain.com PORT=8001 ``` ```env Development Environment # Development with light authorization DEEPWIKI_AUTH_MODE=true DEEPWIKI_AUTH_CODE=dev-auth-123 # Development settings NODE_ENV=development LOG_LEVEL=DEBUG LOG_SENSITIVE_DATA=true # Local server configuration SERVER_BASE_URL=http://localhost:8001 PORT=8001 ``` ```env Team Shared Environment # Team authorization settings DEEPWIKI_AUTH_MODE=true DEEPWIKI_AUTH_CODE=team-wiki-access-2024 # Shared environment configuration NODE_ENV=production LOG_LEVEL=INFO # Team server settings SERVER_BASE_URL=https://team-wiki.company.com PORT=8001 ``` ## Frontend Usage ### Authorization Code Input When authorization mode is enabled, users see an authorization code input field:
> **Authorization Required**: This code is required to generate wikis
### User Experience Flow User visits the DeepWiki-Open interface and sees: * Normal repository URL input * Authorization code field (when mode is enabled) * Clear indication that authorization is required User enters the authorization code: * Code is masked (password field) * Real-time validation (optional) * Clear error messages for invalid codes After successful authorization: * Normal wiki generation flow continues * Code is included in API requests automatically * User doesn't need to re-enter code for the session ### Frontend Implementation The frontend automatically detects authorization mode: ```javascript // Authorization status check const checkAuthStatus = async () => { const response = await fetch('/api/auth/status'); const { auth_required } = await response.json(); if (auth_required) { // Show authorization code input setShowAuthInput(true); } }; // Authorization validation const validateAuthCode = async (code) => { const response = await fetch('/api/auth/validate', { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify({ code }) }); const { success } = await response.json(); return success; }; ``` ## API Usage ### Including Authorization Codes All API requests must include the authorization code when mode is enabled: ```bash cURL curl -X POST "http://localhost:8001/wiki/generate" \ -H "Content-Type: application/json" \ -d '{ "repo_url": "https://github.com/facebook/react", "model_provider": "google", "authorization_code": "your-secure-code-here" }' ``` ```python Python import requests # Wiki generation with authorization response = requests.post("http://localhost:8001/wiki/generate", json={ "repo_url": "https://github.com/facebook/react", "model_provider": "google", "authorization_code": "your-secure-code-here" }) if response.status_code == 401: print("Invalid authorization code") else: wiki_data = response.json() ``` ```javascript JavaScript // API request with authorization const generateWiki = async (repoUrl, 
authCode) => { try { const response = await fetch('/api/wiki/generate', { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify({ repo_url: repoUrl, model_provider: 'google', authorization_code: authCode }) }); if (response.status === 401) { throw new Error('Invalid authorization code'); } return await response.json(); } catch (error) { console.error('Wiki generation failed:', error); throw error; } }; ``` ### Authorization Validation Endpoint Test authorization codes before use: ```bash Validate Code curl -X POST "http://localhost:8001/auth/validate" \ -H "Content-Type: application/json" \ -d '{ "code": "your-secure-code-here" }' ``` ```python Python Validation import requests def validate_auth_code(code): response = requests.post("http://localhost:8001/auth/validate", json={"code": code}) result = response.json() return result.get("success", False) # Usage if validate_auth_code("test-code"): print("Authorization code is valid") else: print("Invalid authorization code") ``` ```javascript JavaScript Validation const validateAuthCode = async (code) => { const response = await fetch('/api/auth/validate', { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify({ code }) }); const { success } = await response.json(); return success; }; ``` ```json Valid Code { "success": true } ``` ```json Invalid Code { "success": false } ``` ## Security Considerations ### Authorization Code Security **Strong authorization codes:** ```bash # Good examples team-wiki-2024-XRT789-secure prod-deepwiki-auth-94KL23P company-docs-access-2024-QW45 # Avoid these 123456 password admin test deepwiki ``` **Generation methods:** ```bash # Generate random codes openssl rand -hex 16 # Output: a3f7c9e2b4d8f6a1c5e9b7d3f8a2c6e4 # Generate pronounceable codes openssl rand -base64 12 | tr -d "=+/" | cut -c1-16 # Output: kH7mP9qR3sL4wX8t # Custom format echo "wiki-$(date +%Y)-$(openssl rand -hex 4)" # Output: wiki-2024-7a8b9c2d ``` **Secure 
storage:** * Store codes in environment variables, not source code * Use secrets management in production (AWS Secrets Manager, Azure Key Vault) * Restrict file permissions on `.env` files (`chmod 600`) * Never commit authorization codes to version control **Regular rotation:** ```bash # Rotation schedule # - Development: Monthly # - Staging: Bi-weekly # - Production: Weekly or after security incidents # Rotation process: # 1. Generate new authorization code # 2. Test with new code in staging # 3. Update production environment variables # 4. Restart services # 5. Notify authorized users # 6. Monitor for authentication failures ``` **Code management:** ```bash # Environment-specific codes DEEPWIKI_AUTH_CODE_DEV=dev-auth-code-123 DEEPWIKI_AUTH_CODE_STAGING=staging-auth-456 DEEPWIKI_AUTH_CODE_PROD=prod-auth-789 # Use deployment scripts to select appropriate code export DEEPWIKI_AUTH_CODE=${DEEPWIKI_AUTH_CODE_PROD} ``` **HTTPS requirement:** ```nginx # Always use HTTPS in production server { listen 443 ssl; server_name deepwiki.yourdomain.com; # SSL configuration ssl_certificate /path/to/certificate.crt; ssl_certificate_key /path/to/private.key; ssl_protocols TLSv1.2 TLSv1.3; ssl_ciphers HIGH:!aNULL:!MD5; location / { proxy_pass http://localhost:8001; proxy_set_header X-Forwarded-Proto $scheme; } } # Redirect HTTP to HTTPS server { listen 80; server_name deepwiki.yourdomain.com; return 301 https://$server_name$request_uri; } ``` **Access restrictions:** ```bash # Firewall rules for additional security # Only allow specific IPs to access API directly iptables -A INPUT -p tcp --dport 8001 -s 192.168.1.0/24 -j ACCEPT iptables -A INPUT -p tcp --dport 8001 -j DROP # Allow HTTPS from anywhere (proxied) iptables -A INPUT -p tcp --dport 443 -j ACCEPT ``` ### Attack Prevention Authorization mode provides basic access control but is not a complete security solution. 
Consider additional measures: * Rate limiting to prevent brute force attacks * IP whitelisting for sensitive environments * Web Application Firewall (WAF) for production * Regular security audits and monitoring * Multi-factor authentication for critical deployments **Rate limiting example:** ```python # Basic rate limiting implementation from collections import defaultdict from time import time auth_attempts = defaultdict(list) MAX_ATTEMPTS = 5 WINDOW_SECONDS = 300 # 5 minutes def check_rate_limit(ip_address): now = time() attempts = auth_attempts[ip_address] # Remove old attempts attempts[:] = [t for t in attempts if now - t < WINDOW_SECONDS] if len(attempts) >= MAX_ATTEMPTS: return False # Rate limited attempts.append(now) return True ``` ## Troubleshooting ### Common Issues **Symptoms:** * "Invalid authorization code" errors * Authentication consistently fails * API returns 401 status codes **Solutions:** 1. **Verify environment variables:** ```bash echo "Auth Mode: $DEEPWIKI_AUTH_MODE" echo "Auth Code: $DEEPWIKI_AUTH_CODE" ``` 2. **Check code formatting:** ```bash # Ensure no extra spaces or characters export DEEPWIKI_AUTH_CODE=$(echo "$DEEPWIKI_AUTH_CODE" | xargs) ``` 3. **Restart services completely:** ```bash # Kill all processes pkill -f "python -m api.main" pkill -f "npm" # Wait a moment, then restart sleep 2 python -m api.main & npm run dev & ``` 4. **Test with curl:** ```bash curl -X POST "http://localhost:8001/auth/validate" \ -H "Content-Type: application/json" \ -d '{"code": "your-exact-code-here"}' ``` **Symptoms:** * Authorization mode enabled but no auth input field * Frontend behaves as if authorization is disabled * No indication that auth is required **Solutions:** 1. **Check API status endpoint:** ```bash curl -X GET "http://localhost:8001/auth/status" ``` 2. **Verify frontend API connection:** ```javascript // Check in browser console fetch('/api/auth/status') .then(r => r.json()) .then(console.log); ``` 3. 
**Clear browser cache:** ```bash # Or use browser dev tools: # Application > Storage > Clear Site Data ``` 4. **Check Next.js API routes:** ```bash # Verify routes exist ls -la src/app/api/auth/ ``` **Symptoms:** * Frontend shows auth required but validation fails * Network errors in browser console * Proxy/forwarding failures **Solutions:** 1. **Check SERVER\_BASE\_URL:** ```bash echo "Server URL: $SERVER_BASE_URL" # Should match your backend API location ``` 2. **Test direct API access:** ```bash # Test backend directly curl -X GET "http://localhost:8001/auth/status" # Test frontend proxy curl -X GET "http://localhost:3000/api/auth/status" ``` 3. **Verify port configuration:** ```bash # Backend should be running on PORT netstat -tulpn | grep :8001 # Frontend should be running on 3000 netstat -tulpn | grep :3000 ``` ### Debugging Steps ```bash # Verify environment variables are loaded python -c " import os from dotenv import load_dotenv load_dotenv() print('Auth Mode:', os.getenv('DEEPWIKI_AUTH_MODE')) print('Auth Code Set:', bool(os.getenv('DEEPWIKI_AUTH_CODE'))) print('Server URL:', os.getenv('SERVER_BASE_URL')) " ``` ```bash # Test status endpoint curl -v -X GET "http://localhost:8001/auth/status" # Test validation endpoint curl -v -X POST "http://localhost:8001/auth/validate" \ -H "Content-Type: application/json" \ -d '{"code": "test-code"}' # Test wiki generation (should require auth) curl -v -X POST "http://localhost:8001/wiki/generate" \ -H "Content-Type: application/json" \ -d '{"repo_url": "https://github.com/octocat/Hello-World", "model_provider": "google"}' ``` ```bash # Backend logs tail -f api/logs/application.log # Look for: # - "Authorization mode: ENABLED/DISABLED" # - Authentication attempt logs # - Error messages related to auth ``` ```javascript // In browser console // Check authorization status fetch('/api/auth/status').then(r => r.json()).then(console.log); // Test validation fetch('/api/auth/validate', { method: 'POST', headers: { 
'Content-Type': 'application/json' }, body: JSON.stringify({ code: 'test-code' }) }).then(r => r.json()).then(console.log); ``` ## Production Deployment ### Docker Configuration ```yaml docker-compose.yml services: deepwiki-api: build: . environment: # Authorization configuration - DEEPWIKI_AUTH_MODE=true - DEEPWIKI_AUTH_CODE=${DEEPWIKI_AUTH_CODE} # API keys - GOOGLE_API_KEY=${GOOGLE_API_KEY} - OPENAI_API_KEY=${OPENAI_API_KEY} # Server configuration - PORT=8001 - SERVER_BASE_URL=https://api.deepwiki.yourdomain.com # Security settings - NODE_ENV=production - LOG_LEVEL=WARNING ports: - "8001:8001" restart: unless-stopped deepwiki-frontend: build: context: . dockerfile: Dockerfile.frontend environment: - SERVER_BASE_URL=https://api.deepwiki.yourdomain.com - NEXT_PUBLIC_API_URL=https://api.deepwiki.yourdomain.com ports: - "3000:3000" depends_on: - deepwiki-api restart: unless-stopped ``` ```env .env.production # Production authorization settings DEEPWIKI_AUTH_MODE=true DEEPWIKI_AUTH_CODE=prod-secure-auth-2024-XYZ789 # API Keys GOOGLE_API_KEY=your_production_google_key OPENAI_API_KEY=your_production_openai_key # Server configuration SERVER_BASE_URL=https://api.deepwiki.yourdomain.com PORT=8001 # Security NODE_ENV=production LOG_LEVEL=WARNING LOG_SENSITIVE_DATA=false ``` ### Kubernetes Deployment ```yaml deepwiki-secret.yaml apiVersion: v1 kind: Secret metadata: name: deepwiki-auth type: Opaque stringData: auth-code: "prod-secure-auth-2024-XYZ789" google-api-key: "your_google_api_key" openai-api-key: "your_openai_api_key" ``` ```yaml deepwiki-deployment.yaml apiVersion: apps/v1 kind: Deployment metadata: name: deepwiki-api spec: replicas: 3 selector: matchLabels: app: deepwiki-api template: metadata: labels: app: deepwiki-api spec: containers: - name: api image: deepwiki/api:latest ports: - containerPort: 8001 env: - name: DEEPWIKI_AUTH_MODE value: "true" - name: DEEPWIKI_AUTH_CODE valueFrom: secretKeyRef: name: deepwiki-auth key: auth-code - name: GOOGLE_API_KEY 
valueFrom: secretKeyRef: name: deepwiki-auth key: google-api-key - name: SERVER_BASE_URL value: "https://deepwiki.yourdomain.com" - name: NODE_ENV value: "production" ``` ### Monitoring and Alerting ```python # Example monitoring setup import logging from collections import defaultdict from time import time logger = logging.getLogger("deepwiki.auth") FAILED_AUTH_ALERT_THRESHOLD = 50 # Metrics collection auth_metrics = { 'successful_auths': 0, 'failed_auths': 0, 'unique_ips': set(), 'requests_per_hour': defaultdict(int) } # Wire this up to email/Slack/PagerDuty in production def send_security_alert(message): logger.warning("SECURITY ALERT: %s", message) def track_auth_attempt(ip_address, success): current_hour = int(time()) // 3600 if success: auth_metrics['successful_auths'] += 1 else: auth_metrics['failed_auths'] += 1 # Alert once when failures cross the threshold if auth_metrics['failed_auths'] == FAILED_AUTH_ALERT_THRESHOLD: send_security_alert("High number of failed authentications") auth_metrics['unique_ips'].add(ip_address) auth_metrics['requests_per_hour'][current_hour] += 1 ``` ```json { "dashboard": { "title": "DeepWiki Authorization", "panels": [ { "title": "Authentication Success Rate", "type": "stat", "targets": [ { "expr": "rate(deepwiki_auth_success_total[5m]) / rate(deepwiki_auth_attempts_total[5m]) * 100" } ] }, { "title": "Failed Authentication Attempts", "type": "graph", "targets": [ { "expr": "rate(deepwiki_auth_failures_total[1m])" } ] }, { "title": "Unique IPs per Hour", "type": "bargraph", "targets": [ { "expr": "increase(deepwiki_unique_ips_total[1h])" } ] } ] } } ``` ```bash # Analyze authentication logs grep "Authorization" api/logs/application.log | tail -100 # Failed authentication attempts grep "Authorization.*invalid" api/logs/application.log | wc -l # Unique IP addresses attempting authentication grep "Authorization" api/logs/application.log | \ grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' | \ sort | uniq -c | sort -nr # Successful authentications per hour grep "Authorization.*success" api/logs/application.log | \ awk '{print $1" "$2}' | cut -c1-13 | uniq -c ``` ## Best Practices ### Development Workflow ```bash # Use simple auth codes for development DEEPWIKI_AUTH_MODE=true
DEEPWIKI_AUTH_CODE=dev-123 # Enable detailed logging LOG_LEVEL=DEBUG ``` ```bash # Use production-like codes DEEPWIKI_AUTH_MODE=true DEEPWIKI_AUTH_CODE=staging-secure-2024 # Test authorization workflows thoroughly ``` ```bash # Strong authorization codes DEEPWIKI_AUTH_MODE=true DEEPWIKI_AUTH_CODE=complex-random-production-code # Minimal logging LOG_LEVEL=WARNING LOG_SENSITIVE_DATA=false ``` ### Team Management **Secure methods for sharing authorization codes:** 1. **Encrypted communication:** * Use encrypted messaging (Signal, encrypted email) * Password managers with secure sharing * Company secrets management systems 2. **Documentation:** ```markdown # Team Wiki Access Authorization code: [Available in team password manager] Location: Team Vault > DeepWiki > Production Access Updated: 2024-01-15 Next rotation: 2024-02-15 ``` 3. **Onboarding process:** * Include authorization code in new team member setup * Document where to find current codes * Explain rotation schedule and notifications **Multiple authorization codes for different access levels:** ```env # Different codes for different access levels DEEPWIKI_AUTH_CODE_READONLY=readonly-access-2024 DEEPWIKI_AUTH_CODE_FULL=full-access-2024-secure DEEPWIKI_AUTH_CODE_ADMIN=admin-access-ultra-secure-2024 ``` **Implementation:** ```python # Multiple authorization codes import os VALID_AUTH_CODES = { 'readonly': os.getenv('DEEPWIKI_AUTH_CODE_READONLY'), 'full': os.getenv('DEEPWIKI_AUTH_CODE_FULL'), 'admin': os.getenv('DEEPWIKI_AUTH_CODE_ADMIN') } def validate_auth_code(code, required_level='readonly'): allowed_levels = { 'readonly': ['readonly', 'full', 'admin'], 'full': ['full', 'admin'], 'admin': ['admin'] } # Skip unset codes so an empty submission can never match for level in allowed_levels[required_level]: if code and VALID_AUTH_CODES[level] and VALID_AUTH_CODES[level] == code: return True, level return False, None ``` ### Security Maintenance ```bash #!/bin/bash # auth-rotation.sh - Automated code rotation # Generate new auth code NEW_CODE="wiki-$(date +%Y%m%d)-$(openssl rand -hex 6)" # Update environment echo "New
authorization code: $NEW_CODE" echo "Update DEEPWIKI_AUTH_CODE in production environment" # Log rotation echo "$(date): Code rotated to $NEW_CODE" >> auth-rotation.log ``` ```python # Audit script def audit_auth_access(): with open('api/logs/application.log', 'r') as f: auth_events = [ line for line in f if 'Authorization' in line ] print(f"Total auth events: {len(auth_events)}") print(f"Successful: {len([e for e in auth_events if 'success' in e])}") print(f"Failed: {len([e for e in auth_events if 'invalid' in e])}") ``` ```bash # Monitor for security issues tail -f api/logs/application.log | grep -i "auth\|security\|unauthorized" # Set up alerts for suspicious activity grep -c "Authorization.*invalid" api/logs/application.log | \ awk '{if ($1 > 10) print "ALERT: High auth failure rate"}' ``` ## Next Steps Complete production deployment with authorization mode Learn about complete API authentication and security Configure access to private repositories with tokens Implement comprehensive security measures # Environment Variables Source: https://asyncfunc.mintlify.app/getting-started/environment-variables Complete guide to configuring DeepWiki-Open environment variables for all supported AI providers and features DeepWiki-Open uses environment variables to configure AI providers, server settings, authentication, and advanced features. This guide covers all available environment variables and their usage. 
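DeepWiki reads its configuration from the process environment, with a `.env` file loaded via python-dotenv (as the validation snippets in this guide show). As a rough illustration of the precedence rule, where values already exported in your shell win over the file, here is a minimal stdlib sketch; `load_env_file` is a hypothetical helper, not part of DeepWiki:

```python
# Illustrative sketch of .env precedence; not DeepWiki code.
# Values already set in the real environment are NOT overridden.
import os

def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines and apply them without
    overriding variables already set in the environment."""
    values = {}
    try:
        with open(path) as f:
            for raw in f:
                line = raw.strip()
                if not line or line.startswith("#") or "=" not in line:
                    continue  # skip blanks, comments, malformed lines
                key, _, val = line.partition("=")
                values[key.strip()] = val.strip()
    except FileNotFoundError:
        pass  # a missing .env is fine; real env vars may suffice
    for key, val in values.items():
        os.environ.setdefault(key, val)
    return values
```

python-dotenv's `load_dotenv()` follows the same rule by default; pass `override=True` if you want the file to win.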
## Required Environment Variables At minimum, you need API keys for at least one AI provider: ```bash Minimal Setup # Choose ONE of these AI providers GOOGLE_API_KEY=your_google_api_key_here # OR OPENAI_API_KEY=your_openai_api_key_here ``` ```bash Complete .env File # AI Provider API Keys (choose at least one) GOOGLE_API_KEY=your_google_api_key_here OPENAI_API_KEY=your_openai_api_key_here OPENROUTER_API_KEY=your_openrouter_api_key_here # Azure OpenAI (Enterprise) AZURE_OPENAI_API_KEY=your_azure_openai_api_key_here AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com AZURE_OPENAI_VERSION=2024-02-15-preview # AWS Bedrock AWS_ACCESS_KEY_ID=your_aws_access_key AWS_SECRET_ACCESS_KEY=your_aws_secret_key AWS_REGION=us-east-1 # Local/Self-Hosted Models OLLAMA_HOST=http://localhost:11434 OPENAI_BASE_URL=https://api.openai.com/v1 # Server Configuration PORT=8001 SERVER_BASE_URL=http://localhost:8001 # Authorization & Security DEEPWIKI_AUTH_MODE=false DEEPWIKI_AUTH_CODE=your_secret_code_here # Logging & Debugging LOG_LEVEL=INFO LOG_FILE_PATH=./api/logs/application.log # Advanced Configuration DEEPWIKI_CONFIG_DIR=./api/config REDIS_URL=redis://localhost:6379 ``` ## AI Provider Configuration ### Google Gemini GOOGLE\_API\_KEY Google AI Studio API key for Gemini models. **How to get:** 1. Visit [Google AI Studio](https://makersuite.google.com/app/apikey) 2. Click "Create API Key" 3. Copy the generated key **Supported models:** * `gemini-2.0-flash` (default, recommended) * `gemini-1.5-flash` * `gemini-1.0-pro` Google Gemini offers generous free tier limits and excellent performance for documentation generation. ### OpenAI OPENAI\_API\_KEY OpenAI API key for GPT models. OPENAI\_BASE\_URL
Default: `https://api.openai.com/v1`
Custom OpenAI API endpoint (for OpenAI-compatible services). **How to get:** 1. Visit [OpenAI Platform](https://platform.openai.com/api-keys) 2. Create new secret key 3. Copy the key (starts with `sk-`) **Supported models:** * `gpt-4o` (default) * `gpt-4.1` * `o1` * `o3` * `o4-mini` OpenAI requires a paid account. Free tier users cannot access the API. ### OpenRouter OPENROUTER\_API\_KEY OpenRouter API key for access to multiple model providers. **How to get:** 1. Sign up at [OpenRouter](https://openrouter.ai/) 2. Go to Keys section 3. Create new API key **Available models:** * `openai/gpt-4o` * `anthropic/claude-3.5-sonnet` * `deepseek/deepseek-r1` * `google/gemini-pro` * And 100+ more models OpenRouter provides access to multiple AI providers through a single API, perfect for comparing models. ### Azure OpenAI AZURE\_OPENAI\_API\_KEY Azure OpenAI service API key. AZURE\_OPENAI\_ENDPOINT Your Azure OpenAI resource endpoint URL. AZURE\_OPENAI\_VERSION API version (e.g., `2024-02-15-preview`). **How to get:** 1. Create Azure OpenAI resource in [Azure Portal](https://portal.azure.com/) 2. Deploy a model (GPT-4, GPT-3.5-turbo, etc.) 3. Get endpoint and API key from resource overview 4. Note the API version from the deployment **Example configuration:** ```env AZURE_OPENAI_API_KEY=abc123def456ghi789 AZURE_OPENAI_ENDPOINT=https://my-resource.openai.azure.com AZURE_OPENAI_VERSION=2024-02-15-preview ``` ### AWS Bedrock AWS\_ACCESS\_KEY\_ID AWS access key for Bedrock access. AWS\_SECRET\_ACCESS\_KEY AWS secret access key. AWS\_REGION
Default: `us-east-1`
AWS region where Bedrock models are available. **Supported models:** * `anthropic.claude-3-sonnet-20240229-v1:0` * `anthropic.claude-3-haiku-20240307-v1:0` * `anthropic.claude-3-opus-20240229-v1:0` * `amazon.titan-text-express-v1` ### Ollama (Local Models) OLLAMA\_HOST
Default: `http://localhost:11434`
Ollama server URL for local AI models. **Setup Ollama:** ```bash # Install Ollama curl -fsSL https://ollama.ai/install.sh | sh # Start service ollama serve # Pull a model ollama pull llama3:8b ``` ```bash # Install Ollama curl -fsSL https://ollama.ai/install.sh | sh # Start as service sudo systemctl start ollama sudo systemctl enable ollama # Pull a model ollama pull qwen3:8b ``` ```bash # Run Ollama in container docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama # Pull a model docker exec -it ollama ollama pull llama3:8b ``` **Supported models:** * `qwen3:1.7b` (lightweight) * `llama3:8b` (balanced) * `qwen3:8b` (high context) ### DashScope (Alibaba) DASHSCOPE\_API\_KEY Alibaba DashScope API key for Qwen models. **How to get:** 1. Sign up at [DashScope](https://dashscope.aliyun.com/) 2. Create API key in console 3. Add key to environment **Supported models:** * `qwen-plus` * `qwen-turbo` * `deepseek-r1` ## Server Configuration PORT
Default: `8001`
Port for the FastAPI backend server. SERVER\_BASE\_URL
Default: `http://localhost:8001`
Base URL for API server (used by frontend). NODE\_ENV
Default: `production`
Environment mode (`development`, `production`, `test`). **Example server configuration:** ```env PORT=8002 SERVER_BASE_URL=https://api.deepwiki.example.com NODE_ENV=production ``` ## Security & Authentication ### Authorization Mode DEEPWIKI\_AUTH\_MODE
Default: `false`
Enable authorization requirement for wiki generation. DEEPWIKI\_AUTH\_CODE Secret code required when authorization mode is enabled. **Usage:** ```env DEEPWIKI_AUTH_MODE=true DEEPWIKI_AUTH_CODE=my-secret-code-123 ``` When enabled, users must enter the auth code to generate wikis. Authorization mode provides basic frontend protection but doesn't secure direct API access. ## Logging & Debugging LOG\_LEVEL
Default: `INFO`
Logging verbosity level. **Options:** `DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL` LOG\_FILE\_PATH
Default: `./api/logs/application.log`
Path for log file output. **Example logging configuration:** ```env LOG_LEVEL=DEBUG LOG_FILE_PATH=./logs/deepwiki-debug.log ``` In production, use `INFO` or `WARNING` level to reduce log volume. ## Advanced Configuration DEEPWIKI\_CONFIG\_DIR
Default: `./api/config`
Directory containing configuration JSON files. REDIS\_URL Redis connection URL for caching (optional). **Example:** ```env DEEPWIKI_CONFIG_DIR=/custom/config/path REDIS_URL=redis://localhost:6379/0 ``` ## Environment File Templates ### Development ```env .env.development # Development Environment NODE_ENV=development LOG_LEVEL=DEBUG LOG_FILE_PATH=./api/logs/development.log # Server Config PORT=8001 SERVER_BASE_URL=http://localhost:8001 # API Keys GOOGLE_API_KEY=your_development_google_key OPENAI_API_KEY=your_development_openai_key # Local Ollama OLLAMA_HOST=http://localhost:11434 # No authentication for dev DEEPWIKI_AUTH_MODE=false ``` ### Production ```env .env.production # Production Environment NODE_ENV=production LOG_LEVEL=INFO LOG_FILE_PATH=/var/log/deepwiki/application.log # Server Config PORT=8001 SERVER_BASE_URL=https://api.yourdomain.com # Production API Keys GOOGLE_API_KEY=your_production_google_key OPENAI_API_KEY=your_production_openai_key AZURE_OPENAI_API_KEY=your_azure_key AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com AZURE_OPENAI_VERSION=2024-02-15-preview # Enable authentication DEEPWIKI_AUTH_MODE=true DEEPWIKI_AUTH_CODE=your_secure_production_code # Redis caching REDIS_URL=redis://redis-server:6379/0 ``` ### Docker ```env .env.docker # Docker Environment NODE_ENV=production LOG_LEVEL=INFO # Container networking PORT=8001 SERVER_BASE_URL=http://deepwiki-api:8001 # API Keys GOOGLE_API_KEY=your_google_key OPENAI_API_KEY=your_openai_key # External Ollama OLLAMA_HOST=http://ollama-server:11434 # Persistent data LOG_FILE_PATH=/app/logs/application.log DEEPWIKI_CONFIG_DIR=/app/config ``` ## Validation & Testing ```bash # Check environment variables are loaded python -c " import os from dotenv import load_dotenv load_dotenv() # Check API keys providers = { 'Google': os.getenv('GOOGLE_API_KEY'), 'OpenAI': os.getenv('OPENAI_API_KEY'), 'OpenRouter': os.getenv('OPENROUTER_API_KEY'), 'Azure': os.getenv('AZURE_OPENAI_API_KEY'), } for name, key in 
providers.items(): status = '✓ Configured' if key else '✗ Missing' print(f'{name}: {status}') " ``` ```bash # Test backend startup python -m api.main # Check for successful startup messages # Look for: "Starting Streaming API on port 8001" # No API key warnings for your configured providers ``` ```bash # Start frontend npm run dev # Test API connection at http://localhost:3000 # Model selection should show your configured providers ``` ## Security Best Practices * Never commit `.env` files to version control * Use different API keys for development and production * Regularly rotate API keys * Monitor API usage for unexpected activity * Use environment-specific keys when possible ```bash # Secure file permissions chmod 600 .env # Use secrets management in production # - AWS Secrets Manager # - Azure Key Vault # - Kubernetes secrets # - Docker secrets ``` * Use HTTPS in production * Configure proper CORS settings * Use private networks for internal components * Enable authorization mode for public deployments ## Troubleshooting **Symptoms:** API key errors, default values used **Solutions:** 1. Verify `.env` file is in project root 2. Check file permissions (readable by application) 3. Ensure no syntax errors in `.env` file 4. Restart application after changes **Symptoms:** "Invalid API key" errors **Solutions:** 1. Test API keys with provider's documentation 2. Check for extra spaces or characters 3. Verify key has correct permissions/scopes 4. Confirm key hasn't expired or been revoked **Symptoms:** "Port already in use" errors **Solutions:** 1. Change PORT environment variable 2. Kill existing processes on the port 3. Use Docker with port mapping 4. 
Configure reverse proxy ## Next Steps Configure specific AI model providers and their settings Deploy DeepWiki with production-ready configuration Learn about JSON configuration files for advanced customization Implement security best practices for production deployments # Generate Your First Wiki Source: https://asyncfunc.mintlify.app/getting-started/first-wiki Step-by-step guide to creating your first repository wiki with DeepWiki-Open Now that DeepWiki-Open is running, let's generate your first repository wiki! This guide walks you through the complete process from entering a repository URL to exploring the generated documentation. ## Quick Start Navigate to [http://localhost:3000](http://localhost:3000) in your browser. You should see the main DeepWiki interface with: * Repository URL input field * Model provider selection * Generate Wiki button The interface loads without errors and shows all input fields. For your first wiki, start with a small, well-structured repository: ``` https://github.com/octocat/Hello-World https://github.com/microsoft/TypeScript-Node-Starter https://github.com/expressjs/express ``` ``` https://github.com/microsoft/autogen https://github.com/openai/openai-python https://github.com/vercel/next.js ``` Enter your own repository URL: * GitHub: `https://github.com/username/repo-name` * GitLab: `https://gitlab.com/username/repo-name` * BitBucket: `https://bitbucket.org/username/repo-name` Start with repositories under 100 files for faster initial generation and testing. 
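The supported URL shapes can be sanity-checked before you paste one in. The validator below is an illustrative sketch (not DeepWiki's actual parsing code) that accepts the three hosting services listed above:

```python
# Hypothetical helper: check a repository URL against the
# GitHub/GitLab/BitBucket shapes DeepWiki accepts.
from urllib.parse import urlparse

SUPPORTED_HOSTS = {"github.com", "gitlab.com", "bitbucket.org"}

def parse_repo_url(url):
    """Return (host, owner, repo) for a supported URL, else None."""
    parts = urlparse(url)
    if parts.scheme != "https" or parts.netloc not in SUPPORTED_HOSTS:
        return None
    segments = [p for p in parts.path.strip("/").split("/") if p]
    if len(segments) < 2:
        return None  # need at least owner and repository name
    owner, repo = segments[0], segments[1].removesuffix(".git")
    return parts.netloc, owner, repo
```

For example, `parse_repo_url("https://github.com/octocat/Hello-World")` yields `("github.com", "octocat", "Hello-World")`, while an unsupported host returns `None`.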
Select your AI model provider and specific model: * **Provider**: Google * **Model**: `gemini-2.0-flash` (default) * **Why**: Fast, reliable, generous free tier * **Best for**: General documentation, good balance of speed and quality * **Provider**: OpenAI * **Model**: `gpt-4o` (default) * **Why**: High-quality, detailed documentation * **Best for**: Complex codebases requiring detailed analysis * **Provider**: OpenRouter * **Model**: `openai/gpt-4o` or `anthropic/claude-3.5-sonnet` * **Why**: Access to multiple providers through one API * **Best for**: Experimenting with different models Ensure you have valid API keys configured for your chosen provider. Click the **"Generate Wiki"** button and watch the progress: 1. **Repository Analysis**: DeepWiki clones and analyzes the repository structure 2. **Code Processing**: Creates embeddings for intelligent code understanding 3. **AI Generation**: Generates documentation using your selected AI model 4. **Diagram Creation**: Creates Mermaid diagrams for visualization 5. **Wiki Assembly**: Organizes everything into a structured wiki Generation typically takes 30 seconds to 2 minutes for small repos, 5-10 minutes for larger ones. 
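If you prefer scripting to the UI, the same generation can be triggered against the backend's `/wiki/generate` endpoint, the one used for testing elsewhere in these docs. Treat the payload below as a sketch; field names beyond `repo_url` and `model_provider` may vary by version:

```python
# Hedged sketch: build the generation request used by the UI.
# Endpoint and payload fields mirror the curl examples in the
# debugging section; your DeepWiki version's schema may differ.
import json
import urllib.request

payload = {
    "repo_url": "https://github.com/octocat/Hello-World",
    "model_provider": "google",
}
req = urllib.request.Request(
    "http://localhost:8001/wiki/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# Uncomment to send once the backend is running:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read().decode())
```

Polling or streaming of progress depends on your deployment; the web UI remains the simplest way to watch the stages listed above.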
## Understanding the Generated Wiki Once generation completes, you'll see your interactive wiki with several key sections: ### Wiki Structure **What you'll see:** * Repository summary and purpose * Key features and functionality * Architecture overview * Main technologies used **Example content:** * Project description * Installation instructions * Usage examples * Contributing guidelines **What you'll see:** * File structure breakdown * Component relationships * Data flow analysis * Key functions and classes **Includes:** * Code snippets with explanations * Function parameter details * Class inheritance diagrams * Module dependencies **Mermaid diagrams showing:** * System architecture * Data flow * Component relationships * Process workflows * Database schemas (if applicable) **Interactive features:** * Zoom and pan diagrams * Clickable nodes * Responsive design ### Navigation Features Hierarchical navigation showing all wiki pages and sections Full-text search across all generated documentation Links between related pages and code sections Direct links to source files and line numbers ## Using the Ask Feature The Ask feature lets you chat with your repository using RAG (Retrieval Augmented Generation): Look for the "Ask" or chat interface in your generated wiki. Try these example questions: ```text General Questions What is the main purpose of this repository? How do I install and run this project? What are the key features? ``` ```text Technical Questions How does the authentication system work? What databases does this project use? Explain the API endpoints ``` ```text Code-Specific Questions Show me the main entry point How is error handling implemented? What testing framework is used? 
``` The AI will provide answers based on your actual repository code, including: * Direct code references * File locations * Relevant context * Example usage ## Deep Research Mode For complex analysis, try the Deep Research feature: Toggle the "Deep Research" switch in the Ask interface. Deep Research is perfect for questions like: * "Analyze the security implementation across the entire codebase" * "How is performance optimized in this application?" * "Explain the complete data flow from user input to database" Deep Research conducts multiple investigation rounds: 1. **Initial Analysis**: Basic research and findings 2. **Deep Dive**: Detailed investigation of specific aspects 3. **Cross-References**: Analysis of relationships and dependencies 4. **Final Synthesis**: Comprehensive conclusion with recommendations ## Customizing Your Wiki ### Model Selection You can change models and regenerate sections: 1. Go to model configuration 2. Select a different provider/model 3. Regenerate specific sections 4. Compare results from different models For advanced users, you can adjust: * Temperature (creativity vs. consistency) * Top-p (response diversity) * Max tokens (response length) * Custom model names (for compatible providers) ### Private Repository Access 1. Go to GitHub Settings → Developer settings → Personal access tokens 2. Generate new token (classic) 3. Select scopes: `repo` (for private repos) 4. Copy the generated token 1. Go to GitLab User Settings → Access Tokens 2. Create token with `read_repository` scope 3. Copy the generated token 1. Go to BitBucket Account Settings → App passwords 2. Create password with `Repositories: Read` permission 3. Copy the generated password 1. Click "+ Add access tokens" in the DeepWiki interface 2. Paste your personal access token 3. The token is stored locally and used for repository access Now you can generate wikis for your private repositories using the same process. Tokens are stored locally in your browser. 
Never share your access tokens. ## Troubleshooting Your First Wiki **Common causes:** * Invalid repository URL * Private repository without access token * API key issues * Repository too large **Solutions:** 1. Verify the repository URL is accessible 2. Check API key configuration 3. Try a smaller repository first 4. Check the browser developer console for errors **Possible issues:** * Wrong model selected * Repository has unusual structure * Limited code documentation **Improvements:** 1. Try a different AI model 2. Use OpenAI GPT-4 for complex repositories 3. Enable Deep Research mode 4. Ask specific questions to get better answers **Normal for:** * Large repositories (1000+ files) * Complex codebases * Detailed analysis requests **Speed tips:** 1. Start with smaller repositories 2. Use faster models like `gemini-2.0-flash` 3. Avoid Deep Research for initial testing 4. Check your internet connection ## Example Walkthrough Let's generate a wiki for the Express.js repository: * Repository: `https://github.com/expressjs/express` * Model: Google Gemini 2.0 Flash * No authentication needed (public repo) The generated wiki should include: * Express.js framework overview * Middleware architecture explanation * Router and route handling * Request/response cycle diagrams * API reference for core methods Try asking: * "How does Express.js middleware work?" * "Show me how to create a simple server" * "Explain the routing system" You should get detailed answers with code examples from the actual Express.js codebase. ## Next Steps Learn about advanced wiki generation options and customization Configure multiple AI providers to compare results Deploy DeepWiki for team or production use Integrate DeepWiki into your development workflow Congratulations! You've successfully generated your first repository wiki with DeepWiki-Open. 
The AI-powered analysis and documentation generation should give you a comprehensive understanding of any codebase in minutes rather than hours of manual exploration. # Installation Guide Source: https://asyncfunc.mintlify.app/getting-started/installation Detailed installation instructions for DeepWiki-Open including dependencies, system requirements, and setup options This guide covers detailed installation instructions for DeepWiki-Open, including system requirements, dependencies, and various setup options. ## System Requirements * **OS**: Linux, macOS, or Windows 10+ * **Python**: 3.8 or higher * **Node.js**: 16.0 or higher * **Memory**: 4GB RAM minimum * **Storage**: 2GB free space * **Network**: Internet connection for AI API calls * **OS**: Linux (Ubuntu 20.04+) or macOS * **Python**: 3.9 or 3.10 * **Node.js**: 18.0 or higher * **Memory**: 8GB RAM or more * **Storage**: 10GB free space (for repository caches) * **Network**: Stable broadband connection ## Prerequisites Before installing DeepWiki-Open, ensure you have: ```bash # Using Homebrew brew install python@3.10 # Or download from python.org # https://www.python.org/downloads/ ``` ```bash sudo apt update sudo apt install python3.10 python3.10-venv python3-pip ``` ```powershell # Using Chocolatey choco install python3 # Or download from python.org # https://www.python.org/downloads/ ``` Verify Python installation: `python --version` should show 3.8+ ```bash # Using Homebrew brew install node # Or using Node Version Manager curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.0/install.sh | bash nvm install 18 nvm use 18 ``` ```bash # Using NodeSource repository curl -fsSL https://deb.nodesource.com/setup_18.x | sudo -E bash - sudo apt-get install -y nodejs # Or using NVM curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.0/install.sh | bash nvm install 18 nvm use 18 ``` ```powershell # Using Chocolatey choco install nodejs # Or download from nodejs.org # https://nodejs.org/ ``` Verify 
Node.js installation: `node --version` should show 16.0+ ```bash # Git comes with Xcode Command Line Tools xcode-select --install # Or using Homebrew brew install git ``` ```bash sudo apt update sudo apt install git ``` ```powershell # Using Chocolatey choco install git # Or download from git-scm.com # https://git-scm.com/download/win ``` Verify Git installation: `git --version` ## Installation Methods Choose the installation method that best fits your needs: ### Docker Installation Docker provides the easiest and most consistent setup experience. #### Prerequisites for Docker Download and install [Docker Desktop for Mac](https://docs.docker.com/desktop/mac/install/) ```bash # Ubuntu/Debian curl -fsSL https://get.docker.com -o get-docker.sh sudo sh get-docker.sh # Add user to docker group sudo usermod -aG docker $USER # Log out and back in for group change to take effect ``` Download and install [Docker Desktop for Windows](https://docs.docker.com/desktop/windows/install/) Docker Compose is typically included with Docker Desktop. For Linux: ```bash sudo curl -L "https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose sudo chmod +x /usr/local/bin/docker-compose ``` #### Docker Setup ```bash git clone https://github.com/AsyncFuncAI/deepwiki-open.git cd deepwiki-open ``` ```bash # Create .env file cp .env.example .env # Edit with your API keys nano .env # or use your preferred editor ``` ```bash docker-compose up -d ``` This starts both backend (port 8001) and frontend (port 3000) services. ### Manual Installation For development or when you need more control over the setup. 
```bash git clone https://github.com/AsyncFuncAI/deepwiki-open.git cd deepwiki-open ``` ```bash # Create Python virtual environment python -m venv venv # Activate virtual environment # On macOS/Linux: source venv/bin/activate # On Windows: # venv\Scripts\activate # Install Python dependencies pip install -r api/requirements.txt ``` Using a virtual environment is strongly recommended to avoid dependency conflicts. ```bash # Install Node.js dependencies npm install # or if you prefer yarn: # yarn install ``` Create a `.env` file in the project root: ```bash # Copy example environment file cp .env.example .env # Edit environment variables nano .env ``` Required variables: ```env GOOGLE_API_KEY=your_google_api_key OPENAI_API_KEY=your_openai_api_key ``` Open two terminal windows: Terminal 1 (Backend): ```bash # Ensure virtual environment is activated source venv/bin/activate # or venv\Scripts\activate on Windows python -m api.main ``` Terminal 2 (Frontend): ```bash npm run dev # or: yarn dev ``` ### Development Setup For contributors and developers who want to modify DeepWiki-Open. 1. Fork the repository on GitHub 2. 
Clone your fork: ```bash git clone https://github.com/YOUR_USERNAME/deepwiki-open.git cd deepwiki-open ``` ```bash # Install Python development dependencies pip install -r api/requirements.txt pip install -r api/requirements-dev.txt # if exists # Install Node.js development dependencies npm install # Install pre-commit hooks (if available) pre-commit install ``` Create `.env.development`: ```env # Development environment NODE_ENV=development LOG_LEVEL=DEBUG LOG_FILE_PATH=./api/logs/development.log # API Configuration PORT=8001 SERVER_BASE_URL=http://localhost:8001 # Your API keys GOOGLE_API_KEY=your_google_api_key OPENAI_API_KEY=your_openai_api_key ``` For development with hot reloading: ```bash # Terminal 1: Backend with auto-reload python -m api.main # Already includes reload in development # Terminal 2: Frontend with Turbopack npm run dev # Uses Next.js Turbopack for fast refresh ``` ## Verification After installation, verify that everything is working correctly: ```bash # Test backend API curl http://localhost:8001/health # Or visit in browser open http://localhost:8001/docs ``` Expected response: ```json {"status": "healthy", "version": "0.1.0"} ``` Visit [http://localhost:3000](http://localhost:3000) in your browser. You should see: * DeepWiki-Open interface * Repository URL input field * Model selection dropdown * Generate Wiki button 1. Enter a small public repository (e.g., `https://github.com/octocat/Hello-World`) 2. Select an AI model provider 3. Click "Generate Wiki" 4. 
Verify wiki generation completes successfully ## Optional Dependencies To run AI models locally using Ollama: ```bash # Install Ollama curl -fsSL https://ollama.ai/install.sh | sh # Or using Homebrew brew install ollama # Start Ollama service ollama serve # Pull a model ollama pull llama3:8b ``` ```bash # Install Ollama curl -fsSL https://ollama.ai/install.sh | sh # Start as service sudo systemctl start ollama sudo systemctl enable ollama # Pull a model ollama pull llama3:8b ``` Download and install from [Ollama website](https://ollama.ai/download) Then in PowerShell: ```powershell ollama pull llama3:8b ``` Update your `.env`: ```env OLLAMA_HOST=http://localhost:11434 ``` For improved performance in production: ```bash brew install redis brew services start redis ``` ```bash sudo apt update sudo apt install redis-server sudo systemctl start redis sudo systemctl enable redis ``` ```bash docker run -d -p 6379:6379 redis:alpine ``` Add to `.env`: ```env REDIS_URL=redis://localhost:6379 ``` ## Troubleshooting Installation **Problem**: `python: command not found` **Solutions**: ```bash # Check if python3 is available python3 --version # Create alias (add to ~/.bashrc or ~/.zshrc) alias python=python3 # Or install Python properly sudo apt install python-is-python3 # Ubuntu/Debian ``` **Problem**: `npm ERR! 
EACCES: permission denied` **Solutions**: ```bash # Use Node Version Manager (recommended) curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.0/install.sh | bash nvm install 18 nvm use 18 # Or fix npm permissions sudo chown -R $(whoami) ~/.npm ``` **Problem**: `Cannot connect to Docker daemon` **Solutions**: ```bash # Start Docker service sudo systemctl start docker # Add user to docker group sudo usermod -aG docker $USER # Then log out and back in # Or run with sudo (not recommended for development) sudo docker-compose up ``` **Problem**: `Port already in use` **Solutions**: ```bash # Find what's using the port lsof -i :3000 # or :8001 # Kill the process kill -9 PID # Or use different ports in .env PORT=8002 # Update frontend to use new backend port ``` ## Next Steps Configure API keys and environment settings Set up AI model providers for documentation generation Create your first repository wiki Deploy DeepWiki for production use # Introduction to DeepWiki-Open Source: https://asyncfunc.mintlify.app/getting-started/introduction AI-powered documentation generator that creates beautiful, interactive wikis for any GitHub, GitLab, or BitBucket repository DeepWiki-Open Banner **DeepWiki-Open** is an innovative AI-powered documentation generator that automatically creates comprehensive, interactive wikis for any GitHub, GitLab, or BitBucket repository. Transform your codebase into beautiful, navigable documentation in seconds. ## What is DeepWiki-Open? 
DeepWiki-Open analyzes your repository's code structure, understands the relationships between components, and generates: Comprehensive documentation with code analysis and explanations Automatic Mermaid diagrams showing architecture and data flow RAG-powered Q\&A system to chat with your repository Multi-turn AI research for complex code analysis ## Key Features ### 🚀 Instant Documentation Generation Simply enter a repository URL and DeepWiki will analyze the code structure, generate comprehensive documentation, and create visual diagrams to explain how everything works. ### 🔐 Private Repository Support Securely access private repositories with personal access tokens from GitHub, GitLab, or BitBucket. ### 🤖 Multiple AI Model Providers Choose from leading AI providers: * **Google Gemini** (default: gemini-2.0-flash) * **OpenAI** (gpt-4o, o1, o3, o4-mini) * **OpenRouter** (access to Claude, Llama, Mistral, and more) * **Azure OpenAI** (enterprise-grade deployment) * **Ollama** (local open-source models) * **Amazon Bedrock** (AWS-hosted models) * **DashScope** (Alibaba Qwen models) ### 💬 Intelligent Q\&A System The **Ask Feature** uses Retrieval Augmented Generation (RAG) to provide accurate answers based on your repository's actual code. ### 🔍 Advanced Research Capabilities **DeepResearch** conducts multi-turn investigations for complex topics, providing comprehensive analysis through multiple research iterations. ### 🌍 International Support Built-in support for 10+ languages including English, Chinese, Japanese, Spanish, Korean, Vietnamese, Portuguese, French, and Russian. ## How It Works DeepWiki clones and analyzes your repository structure, including private repos with token authentication. Creates embeddings of the code for smart retrieval and uses AI to generate documentation with context-aware understanding. Generates Mermaid diagrams to explain code relationships and system architecture. 
Organizes everything into a structured, navigable wiki with intelligent Q\&A capabilities. ## Architecture Overview ```mermaid graph TD A[User inputs GitHub/GitLab/BitBucket repo] --> AA{Private repo?} AA -->|Yes| AB[Add access token] AA -->|No| B[Clone Repository] AB --> B B --> C[Analyze Code Structure] C --> D[Create Code Embeddings] D --> M{Select Model Provider} M -->|Google Gemini| E1[Generate with Gemini] M -->|OpenAI| E2[Generate with OpenAI] M -->|OpenRouter| E3[Generate with OpenRouter] M -->|Local Ollama| E4[Generate with Ollama] M -->|Azure| E5[Generate with Azure] E1 --> E[Generate Documentation] E2 --> E E3 --> E E4 --> E E5 --> E D --> F[Create Visual Diagrams] E --> G[Organize as Wiki] F --> G G --> H[Interactive DeepWiki] classDef process stroke-width:2px; classDef data stroke-width:2px; classDef result stroke-width:2px; classDef decision stroke-width:2px; class A,D data; class AA,M decision; class B,C,E,F,G,AB,E1,E2,E3,E4,E5 process; class H result; ``` ## Project Structure The DeepWiki-Open project consists of: ``` src/ ├── app/ # Next.js app directory │ ├── page.tsx # Main application page │ └── api/ # API routes ├── components/ # React components │ ├── Ask.tsx # Q&A interface │ ├── Mermaid.tsx # Diagram renderer │ └── ... └── utils/ # Utility functions ``` ``` api/ ├── main.py # API entry point ├── api.py # FastAPI implementation ├── rag.py # RAG system ├── data_pipeline.py # Data processing ├── config/ # Configuration files │ ├── generator.json # Model configurations │ ├── embedder.json # Embedding settings │ └── repo.json # Repository settings └── requirements.txt # Python dependencies ``` ## Quick Start Ready to get started? Choose your preferred setup method: Get up and running in minutes with Docker Compose Install and configure manually for development You'll need API keys from your chosen AI provider (Google, OpenAI, etc.) to get started. See the [Environment Variables](/getting-started/environment-variables) guide for details. 
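The flow in the architecture diagram above can be sketched as a staged pipeline. This is an illustrative sketch only — the `WikiJob` class and stage names are hypothetical, not DeepWiki's actual internals:

```python
from dataclasses import dataclass, field

@dataclass
class WikiJob:
    """Hypothetical job object recording the stages from the diagram."""
    repo_url: str
    completed: list = field(default_factory=list)

    def run(self, stage: str) -> "WikiJob":
        # Each stage would do real work (clone, embed, generate, ...);
        # here we only record the ordering the diagram prescribes.
        self.completed.append(stage)
        return self

job = (
    WikiJob("https://github.com/example/repo")
    .run("clone_repository")
    .run("analyze_code_structure")
    .run("create_embeddings")
    .run("generate_documentation")
    .run("create_diagrams")
    .run("organize_wiki")
)
print(job.completed[0], "->", job.completed[-1])
```

Each stage feeds the next, which is why a failure early on (for example, a bad access token during cloning) surfaces before any AI calls are made.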
## Next Steps Get DeepWiki running in under 5 minutes Configure your AI model providers Learn about all DeepWiki features Explore the REST API endpoints # Model Providers Setup Source: https://asyncfunc.mintlify.app/getting-started/model-providers Configure multiple AI model providers for optimal documentation generation with DeepWiki-Open DeepWiki-Open supports multiple AI model providers, each with unique strengths for different documentation needs. This guide covers setup, configuration, and optimization for all supported providers. ## Supported Providers Fast, reliable, generous free tier High-quality, detailed documentation Access to 100+ models through one API Enterprise-grade with enhanced security AWS-hosted models with enterprise features Local, private, cost-free AI models ## Google Gemini Google's Gemini models offer excellent performance with generous free tiers, making them ideal for getting started. ### Setup 1. Visit [Google AI Studio](https://makersuite.google.com/app/apikey) 2. Sign in with your Google account 3. Click "Create API Key" 4. Copy the generated key (starts with `AIza`) API key generated and copied Add to your `.env` file: ```env GOOGLE_API_KEY=AIzaSyC...your_actual_key_here ``` Never commit API keys to version control. Add `.env` to your `.gitignore`. 
Test the configuration by starting DeepWiki: ```bash python -m api.main # Should show: "Google API key configured successfully" ``` ### Available Models **Best for:** Most documentation tasks * **Speed:** Very fast (1-3 seconds per request) * **Quality:** Excellent for code analysis * **Context:** 1M+ tokens input, 8K output * **Cost:** Free tier: 15 RPM, 1M TPM **Ideal for:** * General repository documentation * Quick prototyping and testing * Regular development workflows * Small to medium repositories **Best for:** Stable, proven performance * **Speed:** Fast (2-4 seconds per request) * **Quality:** Very good, well-tested * **Context:** 1M+ tokens input, 8K output * **Cost:** Free tier: 15 RPM, 1M TPM **Ideal for:** * Production environments requiring stability * Projects where consistency is critical * Long-term documentation maintenance **Best for:** Detailed analysis * **Speed:** Moderate (3-6 seconds per request) * **Quality:** High detail and accuracy * **Context:** 32K tokens input/output * **Cost:** Free tier: 60 RPM **Ideal for:** * Complex architectural analysis * Detailed technical documentation * Academic or research projects ### Optimization Tips **Free tier limits:** * 15 requests per minute (Flash models) * 60 requests per minute (Pro models) * 32,000 tokens per minute **Best practices:** ```javascript { "rate_limiting": { "requests_per_minute": 12, // Stay below limit "retry_delay": 5, // Wait 5s on rate limit "batch_processing": true // Process files in batches } } ``` Gemini models have large context windows. 
Optimize usage: * **Large repositories**: Use full context for better understanding * **Complex files**: Include more surrounding context * **API documentation**: Include related endpoints together ```javascript { "context_settings": { "max_file_size": 100000, // 100KB per file "include_dependencies": true, // Include related files "context_overlap": 0.1 // 10% overlap between chunks } } ``` ## OpenAI OpenAI's GPT models provide exceptional quality documentation with advanced reasoning capabilities. ### Setup 1. Sign up at [OpenAI Platform](https://platform.openai.com/) 2. Add payment method (required for API access) 3. Purchase credits or set up billing 4. Navigate to [API Keys](https://platform.openai.com/api-keys) OpenAI requires a paid account. Free ChatGPT accounts cannot access the API. 1. Click "Create new secret key" 2. Add a name (e.g., "DeepWiki-Development") 3. Copy the key (starts with `sk-`) 4. Store securely (you won't see it again) API key generated and stored securely ```env OPENAI_API_KEY=sk-proj-...your_actual_key_here # Optional: Custom endpoint for compatible services OPENAI_BASE_URL=https://api.openai.com/v1 ``` ### Available Models **Best for:** State-of-the-art documentation generation with advanced reasoning * **Speed:** Fast to moderate (3-8 seconds per request) * **Quality:** Next-generation AI capabilities with superior understanding * **Context:** 256K tokens input/output (estimated) * **Temperature:** 1.0 (default for creative yet accurate responses) * **Availability:** Rolling out to API users (check availability in your region) **Ideal for:** * Cutting-edge documentation projects * Complex architectural documentation * Multi-language codebases * Advanced technical analysis * Projects requiring latest AI capabilities GPT-5 is now the default model in DeepWiki as of commit 05693d5. Ensure your OpenAI account has access to GPT-5 API. 
**Best for:** High-quality, comprehensive documentation * **Speed:** Moderate (5-10 seconds per request) * **Quality:** Exceptional writing and analysis * **Context:** 128K tokens input/output * **Cost:** $15/1M input tokens, $60/1M output tokens * **Temperature:** 0.7 (default) * **Top-p:** 0.8 (default) **Ideal for:** * Production documentation * Complex enterprise applications * Publication-quality content * Detailed architectural analysis * Fallback when GPT-5 is unavailable **Best for:** Enhanced reasoning and analysis * **Speed:** Moderate to slow (8-15 seconds) * **Quality:** Superior technical analysis * **Context:** 128K tokens * **Cost:** Premium pricing **Ideal for:** * Complex system analysis * Advanced architectural documentation * Research and academic projects **Best for:** Complex problem solving and analysis * **o1-preview:** Advanced reasoning, slower but thorough * **o1-mini:** Faster reasoning for simpler tasks * **Cost:** Higher than standard GPT-4 **Ideal for:** * Complex debugging documentation * System optimization analysis * Security assessment documentation **Best for:** Budget-conscious high-quality documentation * **Speed:** Fast (3-6 seconds per request) * **Quality:** Very good for most tasks * **Context:** 128K tokens * **Cost:** Lower than GPT-4o **Ideal for:** * Regular documentation updates * Smaller projects with quality requirements * Development and testing workflows ### Cost Optimization Monitor and optimize token consumption: ```javascript { "token_optimization": { "max_input_tokens": 100000, // Limit input size "target_output_tokens": 4000, // Reasonable output length "preprocessing": true, // Clean input before sending "compression": "smart" // Remove redundant content } } ``` **Cost calculation example:** * Large repository: \~200K input tokens, 8K output tokens * GPT-5 cost: Pricing to be announced (expected similar or slightly higher than GPT-4o) * GPT-4o cost: $3.00 input + $0.48 output = \$3.48 per generation * Monthly 
usage (10 repos): \~\$35-50/month (estimated) **Match model to task complexity:** 1. **Simple projects:** Use o4-mini for cost savings 2. **Standard projects:** Use gpt-5 for latest capabilities or gpt-4o for proven reliability 3. **Complex analysis:** Use gpt-5 for advanced reasoning or o1 series for deep insights 4. **Budget constraints:** Start with o4-mini, upgrade if needed 5. **Cutting-edge needs:** Use gpt-5 for state-of-the-art performance ```javascript { "auto_model_selection": { "repository_size": { "small": "o4-mini", // < 100 files "medium": "gpt-5", // 100-1000 files (if available, else gpt-4o) "large": "gpt-5" // 1000+ files (if available, else gpt-4o) }, "complexity_factors": [ "multiple_languages", "microservice_architecture", "complex_algorithms" ] } } ``` ## OpenRouter OpenRouter provides access to 100+ AI models through a single API, perfect for comparison and specialized needs. ### Setup 1. Sign up at [OpenRouter](https://openrouter.ai/) 2. Verify your email address 3. Add payment method for paid models 4. Navigate to the Keys section Some models are free, others require credits. Check individual model pricing. 1. Click "Create Key" 2. Name your key (e.g., "DeepWiki-Prod") 3. Copy the key (starts with `sk-or-`) 4. 
Optionally set spending limits OpenRouter API key generated with spending limits configured ```env OPENROUTER_API_KEY=sk-or-...your_actual_key_here ``` ### Popular Models **Models:** `anthropic/claude-3.5-sonnet`, `anthropic/claude-3-haiku` **Best for:** * Excellent code analysis and explanation * Clear, structured documentation * Complex reasoning tasks * Safe, helpful responses **Pricing:** $3/1M input tokens, $15/1M output tokens (3.5 Sonnet) **Use cases:** * API documentation generation * Code architecture explanation * Security-focused analysis **Models:** `google/gemini-pro`, `google/gemini-pro-vision` **Best for:** * Multimodal analysis (code + diagrams) * Fast processing * Good balance of quality and speed **Pricing:** Often lower than direct Google API **Use cases:** * Visual diagram analysis * Multi-language projects * Quick documentation updates **Models:** `meta-llama/llama-3-70b`, `mistralai/mixtral-8x7b` **Best for:** * Cost-effective documentation * Privacy-conscious projects * Experimentation and development **Pricing:** Usually $0.50-$2.00 per 1M tokens **Use cases:** * Large-scale documentation projects * Internal/proprietary code analysis * Development and testing **Models:** `deepseek/deepseek-coder`, `phind/phind-codellama` **Best for:** * Code-specific analysis * Programming language expertise * Technical documentation **Use cases:** * Algorithm explanation * Code optimization documentation * Programming tutorial generation ### Model Comparison Strategy Start with a reliable, fast model: ```javascript { "baseline_model": "anthropic/claude-3.5-sonnet", "test_repository": "https://github.com/small/test-repo" } ``` Compare models for your specific use case: ```javascript { "comparison_models": [ "openai/gpt-4o", "google/gemini-pro", "meta-llama/llama-3-70b" ], "evaluation_criteria": [ "accuracy", "completeness", "code_understanding", "diagram_quality", "cost_per_generation" ] } ``` Select the best model based on results: ```javascript { 
"selected_model": "anthropic/claude-3.5-sonnet", "reason": "Best code analysis with reasonable cost", "fallback_model": "google/gemini-pro", "reason_fallback": "Faster generation when speed needed" } ``` ## Azure OpenAI Enterprise-grade OpenAI models with enhanced security, compliance, and control. ### Setup 1. Sign in to [Azure Portal](https://portal.azure.com/) 2. Create new Azure OpenAI resource 3. Choose region (check model availability) 4. Configure pricing tier and network settings 5. Wait for deployment completion Azure OpenAI may require approval for access. Check the application status. 1. Go to Azure OpenAI Studio 2. Navigate to Deployments 3. Deploy required models (GPT-4, GPT-3.5-turbo, etc.) 4. Note deployment names and endpoints Models deployed and endpoints configured Collect the required information: * **Endpoint:** `https://your-resource.openai.azure.com` * **API Key:** From resource keys section * **API Version:** e.g., `2024-02-15-preview` ```env AZURE_OPENAI_API_KEY=abc123...your_actual_key_here AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com AZURE_OPENAI_VERSION=2024-02-15-preview ``` ### Enterprise Features **Key benefits:** * Data processed within your Azure tenant * No data used for model training * GDPR, SOC 2, HIPAA compliance available * Private networking with VNet integration **Configuration:** ```javascript { "privacy_settings": { "data_residency": "eu-west", // Keep data in specific region "logging": "minimal", // Reduce data logging "retention": "30_days", // Automatic data deletion "private_endpoint": true // Use private networking } } ``` **Built-in safety features:** * Automatic content filtering for harmful content * Customizable filter levels * Compliance with organizational policies **Configuration:** ```javascript { "content_filter": { "severity_level": "medium", // low, medium, high "categories": ["hate", "violence", "self_harm", "sexual"], "custom_blocklists": ["internal_terms"], "action_on_filter": "block" // 
block, warn, log } } ``` **Enterprise-grade performance:** * Dedicated capacity options * Predictable performance * Custom rate limits * Multi-region deployment **Configuration:** ```javascript { "performance_settings": { "capacity_type": "provisioned", // provisioned vs pay-per-token "throughput_units": 100, // Dedicated throughput "auto_scaling": true, // Scale with demand "load_balancing": "round_robin" // Distribute across regions } } ``` ## AWS Bedrock AWS-hosted AI models with enterprise features and AWS service integration. ### Setup 1. Ensure you have an AWS account 2. Enable AWS Bedrock in your region 3. Request access to required models (may require approval) 4. Create IAM user with Bedrock permissions Bedrock is not available in all AWS regions. Check regional availability. Create IAM policy for Bedrock access: ```json { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "bedrock:InvokeModel", "bedrock:InvokeModelWithResponseStream" ], "Resource": "*" } ] } ``` ```env AWS_ACCESS_KEY_ID=AKIA...your_access_key_here AWS_SECRET_ACCESS_KEY=your_secret_access_key_here AWS_REGION=us-east-1 ``` AWS credentials configured and Bedrock access verified ### Available Models **Models:** * `anthropic.claude-3-sonnet-20240229-v1:0` * `anthropic.claude-3-haiku-20240307-v1:0` * `anthropic.claude-3-opus-20240229-v1:0` **Best for:** Code analysis, documentation, safety-conscious generation **Pricing:** \$3-15 per 1M tokens depending on model **Models:** * `amazon.titan-text-express-v1` * `amazon.titan-text-lite-v1` **Best for:** Cost-effective text generation **Pricing:** \$0.50-2.00 per 1M tokens **Models:** * `ai21.j2-ultra-v1` * `ai21.j2-mid-v1` **Best for:** Long-form documentation, detailed analysis **Pricing:** Varies by model ## Ollama (Local Models) Run AI models locally for complete privacy, cost control, and offline capability. 
### Setup ```bash # Using Homebrew brew install ollama # Or download installer from https://ollama.ai curl -fsSL https://ollama.ai/install.sh | sh ``` ```bash # Install Ollama curl -fsSL https://ollama.ai/install.sh | sh # Start as system service sudo systemctl enable ollama sudo systemctl start ollama ``` Download and install from [Ollama website](https://ollama.ai/download/windows) Or use Windows Subsystem for Linux (WSL) with Linux instructions. ```bash # Run Ollama in Docker docker run -d \ -v ollama:/root/.ollama \ -p 11434:11434 \ --name ollama \ ollama/ollama # Pull and run a model docker exec -it ollama ollama run llama3:8b ``` Download models you want to use: ```bash # Recommended models for documentation ollama pull qwen3:8b # Excellent for code ollama pull llama3:8b # Good general model ollama pull qwen3:1.7b # Lightweight option # Verify installation ollama list ``` Models downloaded and verified ```env OLLAMA_HOST=http://localhost:11434 ``` For remote Ollama servers: ```env OLLAMA_HOST=http://ollama-server.internal:11434 ``` ### Model Selection **qwen3:8b (Recommended)** * **Size:** 4.8GB download * **RAM:** 8GB required * **Strengths:** Excellent code understanding, multilingual * **Best for:** Most documentation tasks **deepseek-coder:6.7b** * **Size:** 3.8GB download * **RAM:** 6GB required * **Strengths:** Specialized for code generation and analysis * **Best for:** Technical documentation, API docs **llama3:8b** * **Size:** 4.7GB download * **RAM:** 8GB required * **Strengths:** Well-balanced, good reasoning * **Best for:** General documentation, explanations **llama3:70b** (High-end) * **Size:** 40GB download * **RAM:** 64GB+ required * **Strengths:** Excellent quality, very detailed * **Best for:** High-quality documentation with powerful hardware **qwen3:1.7b** * **Size:** 1.0GB download * **RAM:** 2GB required * **Strengths:** Fast, efficient, good for simple tasks * **Best for:** Quick documentation, low-resource environments 
**tinyllama:1.1b** * **Size:** 637MB download * **RAM:** 1GB required * **Strengths:** Very fast, minimal resources * **Best for:** Testing, simple explanations ### Performance Optimization **Minimum specs by model size:** * **1B-3B models:** 4GB RAM, any modern CPU * **7B-8B models:** 8GB RAM, modern CPU (preferably 8+ cores) * **13B models:** 16GB RAM, high-performance CPU * **70B+ models:** 64GB+ RAM, server-grade hardware **GPU acceleration (optional):** ```bash # Enable GPU support (NVIDIA) ollama pull llama3:8b CUDA_VISIBLE_DEVICES=0 ollama run llama3:8b # Check GPU usage nvidia-smi ``` **Optimize memory usage:** ```bash # Set memory limits export OLLAMA_MAX_LOADED_MODELS=2 export OLLAMA_MAX_QUEUE=4 # Configure model parameters export OLLAMA_NUM_PARALLEL=2 export OLLAMA_FLASH_ATTENTION=1 ``` **Model configuration:** ```javascript { "model_config": { "num_ctx": 4096, // Context window size "num_predict": 2048, // Max output tokens "temperature": 0.7, // Randomness "top_p": 0.8, // Nucleus sampling "repeat_penalty": 1.1 // Avoid repetition } } ``` ## Multi-Provider Strategy ### Provider Selection Matrix | Project Type | Primary | Fallback | Reason | | --------------- | ------------- | --------------------- | ----------------------- | | **Open Source** | Google Gemini | OpenRouter | Free tier, good quality | | **Enterprise** | Azure OpenAI | OpenAI | Security, compliance | | **Startup** | OpenRouter | Google | Cost optimization | | **Research** | OpenAI GPT-4o | Claude via OpenRouter | Highest quality | | **Personal** | Ollama | Google | Privacy, no cost | | Size | Primary | Reason | | ----------------------- | ------------------- | ---------------------------- | | **Small (\<100 files)** | Google Gemini Flash | Fast, sufficient quality | | **Medium (100-1000)** | OpenAI GPT-4o | Better architecture analysis | | **Large (1000+)** | Claude 3.5 Sonnet | Excellent at large contexts | | **Enterprise** | Azure OpenAI | Security and compliance | | Use Case | Best 
Provider | Model | Why | | --------------------- | -------------------------- | ----------------- | ------------------- | | **API Documentation** | OpenAI | GPT-4o | Structured output | | **Architecture Docs** | Anthropic (via OpenRouter) | Claude 3.5 Sonnet | System thinking | | **Code Comments** | Google | Gemini Flash | Speed + accuracy | | **Security Docs** | Azure OpenAI | GPT-4o | Enterprise security | | **Cost-Conscious** | Ollama | Qwen3:8b | No API costs | ### Auto-Failover Configuration ```javascript { "provider_strategy": { "primary": { "provider": "google", "model": "gemini-2.0-flash", "timeout": 30 }, "fallback_chain": [ { "provider": "openrouter", "model": "anthropic/claude-3.5-sonnet", "condition": "rate_limit_exceeded" }, { "provider": "ollama", "model": "qwen3:8b", "condition": "network_error" } ], "retry_logic": { "max_retries": 3, "backoff_factor": 2, "jitter": true } } } ``` ## Next Steps Set up access control for your DeepWiki deployment Create your first repository documentation Deploy with multiple providers for production use Integrate provider selection into your workflows # Quick Start Guide Source: https://asyncfunc.mintlify.app/getting-started/quick-start Get DeepWiki-Open running in under 5 minutes with Docker or manual setup Get DeepWiki-Open up and running quickly with either Docker (recommended) or manual setup. ## Docker Setup (Recommended) The fastest way to get started with DeepWiki-Open is using Docker Compose. 
```bash git clone https://github.com/AsyncFuncAI/deepwiki-open.git cd deepwiki-open ``` Repository cloned successfully Create a `.env` file with your API keys: ```bash .env # Required: Choose at least one AI provider GOOGLE_API_KEY=your_google_api_key OPENAI_API_KEY=your_openai_api_key # Optional: Additional providers OPENROUTER_API_KEY=your_openrouter_api_key AZURE_OPENAI_API_KEY=your_azure_openai_api_key AZURE_OPENAI_ENDPOINT=your_azure_openai_endpoint AZURE_OPENAI_VERSION=your_azure_openai_version OLLAMA_HOST=http://localhost:11434 ``` At minimum, you need either `GOOGLE_API_KEY` or `OPENAI_API_KEY` to get started. ```bash docker-compose up ``` This will start both the backend API server (port 8001) and frontend web app (port 3000). Open your browser to [http://localhost:3000](http://localhost:3000) You should see the DeepWiki interface ready to generate your first wiki! ## Manual Setup For development or custom configurations, you can set up DeepWiki manually. Create a `.env` file in the project root: ```bash Minimal Setup GOOGLE_API_KEY=your_google_api_key OPENAI_API_KEY=your_openai_api_key ``` ```bash Complete Setup # AI Provider Keys GOOGLE_API_KEY=your_google_api_key OPENAI_API_KEY=your_openai_api_key OPENROUTER_API_KEY=your_openrouter_api_key # Azure OpenAI (Enterprise) AZURE_OPENAI_API_KEY=your_azure_openai_api_key AZURE_OPENAI_ENDPOINT=your_azure_openai_endpoint AZURE_OPENAI_VERSION=your_azure_openai_version # Local/External Services OLLAMA_HOST=http://localhost:11434 OPENAI_BASE_URL=https://api.openai.com/v1 # Server Configuration PORT=8001 SERVER_BASE_URL=http://localhost:8001 # Authorization (Optional) DEEPWIKI_AUTH_MODE=false DEEPWIKI_AUTH_CODE=your_secret_code # Logging LOG_LEVEL=INFO LOG_FILE_PATH=./api/logs/application.log ``` ```bash # Install Python dependencies pip install -r api/requirements.txt # Start the API server python -m api.main ``` The API server will start on port 8001 by default. 
Open a new terminal and run: ```bash npm install npm run dev ``` ```bash yarn install yarn dev ``` The frontend will be available at [http://localhost:3000](http://localhost:3000) ## Generate Your First Wiki In the DeepWiki interface, enter a GitHub, GitLab, or BitBucket repository URL: * `https://github.com/openai/codex` * `https://github.com/microsoft/autogen` * `https://gitlab.com/gitlab-org/gitlab` * `https://bitbucket.org/redradish/atlassian_app_versions` Start with a smaller repository for your first test to see faster results. If accessing a private repository: 1. Click "+ Add access tokens" 2. Enter your GitHub, GitLab, or BitBucket personal access token Ensure your token has appropriate repository access permissions. Choose your preferred AI model provider and model: * **Default**: `gemini-2.0-flash` * **Also available**: `gemini-1.5-flash`, `gemini-1.0-pro` * **Best for**: Fast, reliable documentation generation * **Default**: `gpt-4o` * **Also available**: `o1`, `o3`, `o4-mini` * **Best for**: High-quality, detailed documentation * **Access to**: Claude, Llama, Mistral, and 100+ models * **Best for**: Trying different models without multiple API keys Click "Generate Wiki" and watch the magic happen! Generation time varies by repository size. Smaller repos take 30 seconds to 2 minutes, while larger ones may take 5-10 minutes. ## API Key Setup 1. Visit [Google AI Studio](https://makersuite.google.com/app/apikey) 2. Create a new API key 3. Add to `.env` as `GOOGLE_API_KEY=your_key_here` Google Gemini offers generous free tier limits and fast performance. 1. Visit [OpenAI Platform](https://platform.openai.com/api-keys) 2. Create a new API key 3. Add to `.env` as `OPENAI_API_KEY=your_key_here` OpenAI requires a paid account for API access. Check your usage limits. 1. Sign up at [OpenRouter](https://openrouter.ai/) 2. Get your API key from the dashboard 3. 
Add to `.env` as `OPENROUTER_API_KEY=your_key_here` OpenRouter provides access to 100+ models through one API, great for experimentation. 1. Create an Azure OpenAI resource in [Azure Portal](https://portal.azure.com/) 2. Deploy a model (e.g., GPT-4o) 3. Get your endpoint, API key, and API version 4. Add all three to your `.env` file Azure OpenAI is ideal for enterprise deployments with enhanced security. ## Verification Visit [http://localhost:8001/docs](http://localhost:8001/docs) to see the FastAPI documentation and test endpoints. ```bash # Test API health curl http://localhost:8001/health ``` The frontend at [http://localhost:3000](http://localhost:3000) should show: * Repository input field * Model selection dropdown * Generate Wiki button Check that your environment variables are loaded correctly: ```bash # In the API logs, you should see python -m api.main # INFO - Starting Streaming API on port 8001 # No warnings about missing API keys for your chosen provider ``` ## Next Steps Explore wiki generation, Ask feature, and DeepResearch Set up multiple AI providers and choose the best models Deploy DeepWiki for production use Integrate DeepWiki into your existing workflows ## Troubleshooting * Ensure the backend is running on port 8001 * Check firewall settings * Verify no other services are using port 8001 * Check `.env` file exists in project root * Verify API keys are correctly formatted * Ensure no extra spaces in environment variables * Double-check API key accuracy * Verify API key permissions and quotas * Test API key with provider's documentation For more detailed troubleshooting, see the [Troubleshooting Guide](/reference/troubleshooting). 
# Ask Feature Guide Source: https://asyncfunc.mintlify.app/guides/ask-feature Learn how to use the RAG-powered Q&A system to interact with your code documentation The Ask feature is a powerful RAG (Retrieval Augmented Generation) system that enables natural language conversations with your code documentation. It combines advanced retrieval techniques with AI to provide accurate, contextual answers about your codebase. ## What is the Ask Feature? The Ask feature transforms your static documentation into an interactive knowledge base where you can: * **Ask natural questions** about your code, APIs, and documentation * **Get contextual answers** with relevant code snippets and explanations * **Maintain conversation history** for follow-up questions * **Access real-time streaming responses** for immediate feedback * **Query programmatically** via API for integration into other tools The Ask feature uses Retrieval Augmented Generation (RAG) to combine the best of both worlds: precise information retrieval from your docs with the natural language capabilities of large language models. ## How RAG Works with Code ### 1. Document Indexing When your wiki is generated, DeepWiki-Open creates semantic embeddings of your documentation: ```mermaid graph TD A[Code Repository] --> B[Documentation Generation] B --> C[Text Chunking] C --> D[Embedding Creation] D --> E[Vector Database] E --> F[Searchable Index] ``` ### 2. Query Processing When you ask a question, the system: 1. **Converts your question** into a semantic embedding 2. **Searches the vector database** for relevant documentation chunks 3. **Retrieves contextual information** from multiple sources 4. **Generates a comprehensive answer** using the retrieved context ```typescript Query Processing Flow // 1. Question embedding const questionEmbedding = await embedQuery(userQuestion); // 2.
Similarity search const relevantChunks = await vectorStore.similaritySearch( questionEmbedding, { k: 5, threshold: 0.7 } ); // 3. Context preparation const context = relevantChunks.map(chunk => ({ content: chunk.content, source: chunk.metadata.source, score: chunk.score })); // 4. Response generation const response = await generateResponse(userQuestion, context); ``` ```python Python Example # Vector similarity search import numpy as np from sentence_transformers import SentenceTransformer model = SentenceTransformer('all-MiniLM-L6-v2') def find_relevant_docs(question, doc_embeddings, docs, top_k=5): question_embedding = model.encode([question]) # Calculate cosine similarity similarities = np.dot(question_embedding, doc_embeddings.T) top_indices = np.argsort(similarities[0])[-top_k:][::-1] return [docs[i] for i in top_indices] ``` ## Using the Ask Interface ### Basic Question Interface The Ask feature provides an intuitive chat interface in your generated wiki: Ask Feature Interface ### Conversation Flow Type your question in natural language in the Ask input field ``` How do I authenticate users in this API? ``` Watch as the system processes your query with streaming responses ``` Searching documentation... ⚡ Found 3 relevant sections... 📚 Generating response... 🤖 ``` Get a comprehensive answer with code examples and source references Continue the conversation with contextual follow-up queries ## Types of Questions That Work Best ### ✅ Excellent Question Types **Examples:** * "How do I set up authentication?" * "How to handle errors in the API?" * "How do I deploy this application?" **Examples:** * "What is the UserService class?" * "What does the config.json file contain?" * "What are the available endpoints?" **Examples:** * "Where is the database connection configured?" * "Where are the API routes defined?" * "Which file contains the main application logic?" **Examples:** * "What's the recommended way to handle validation?" 
* "Best practices for error handling?" * "How should I structure my tests?" ### ❌ Less Effective Question Types These types of questions may not yield optimal results: * Vague questions without context ("How does this work?") * Questions about code not in your documentation * Highly specific implementation details not documented * Questions requiring real-time data or external information ## Example Questions and Responses ### API Documentation Questions **User:** "How do I authenticate requests to the API?" **AI:** Based on your API documentation, authentication is handled using JWT tokens. Here's how to authenticate requests: ## Authentication Method ```typescript // Include the JWT token in the Authorization header const response = await fetch('/api/users', { headers: { 'Authorization': `Bearer ${jwtToken}`, 'Content-Type': 'application/json' } }); ``` ## Getting a Token First, authenticate with your credentials: ```typescript const authResponse = await fetch('/api/auth/login', { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify({ email: 'user@example.com', password: 'yourpassword' }) }); const { token } = await authResponse.json(); ``` **Sources:** * `/docs/api/authentication.md` * `/examples/auth-example.js` ### Code Structure Questions **User:** "What's the structure of the database models?" 
**AI:** Your application uses several database models organized in the `/models` directory: ## Core Models ```typescript // User Model interface User { id: string; email: string; name: string; createdAt: Date; updatedAt: Date; } // Project Model interface Project { id: string; title: string; description: string; ownerId: string; members: string[]; } ``` ## Model Relationships ```mermaid graph TD A[User] -->|owns| B[Project] A -->|member of| B B -->|contains| C[Task] C -->|assigned to| A ``` **Sources:** * `/models/User.ts` * `/models/Project.ts` * `/docs/database-schema.md` ## Conversation History and Context ### Context Retention The Ask feature maintains conversation context throughout your session: ```typescript // Example conversation flow const conversation = [ { role: 'user', content: 'How do I set up the database?', timestamp: '2024-01-15T10:00:00Z' }, { role: 'assistant', content: 'To set up the database, you need to...', sources: ['/docs/setup.md', '/config/database.js'], timestamp: '2024-01-15T10:00:05Z' }, { role: 'user', content: 'What about migrations?', // Context: Still about database setup timestamp: '2024-01-15T10:02:00Z' } ]; ``` ### Follow-up Question Examples **Q:** "What API endpoints are available?" **A:** Lists all available endpoints with descriptions... **Follow-up:** "How do I use the user endpoint?" **Follow-up:** "What's the rate limiting for these endpoints?" **Follow-up:** "Are there any authentication requirements?" **Q:** "How does error handling work?" **A:** Explains the error handling strategy with examples... **Follow-up:** "What about async errors?" **Follow-up:** "How do I log these errors?" **Follow-up:** "Are there custom error types?" 
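The context retention described above has to be assembled into model input on every turn. The sketch below shows one minimal way to do that; the `ConversationSession` class and the prompt layout are hypothetical illustrations, not DeepWikiOpen's actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class ConversationSession:
    """Accumulates turns so follow-up questions keep their context."""
    history: list = field(default_factory=list)

    def add_turn(self, role: str, content: str) -> None:
        self.history.append({"role": role, "content": content})

    def build_prompt(self, question: str, retrieved_chunks: list, max_turns: int = 6) -> str:
        # Keep only the most recent turns so the prompt stays inside the context window
        recent = self.history[-max_turns:]
        history_text = "\n".join(f"{t['role']}: {t['content']}" for t in recent)
        context_text = "\n---\n".join(retrieved_chunks)
        return (
            f"Documentation context:\n{context_text}\n\n"
            f"Conversation so far:\n{history_text}\n\n"
            f"User question: {question}"
        )

session = ConversationSession()
session.add_turn("user", "How do I set up the database?")
session.add_turn("assistant", "Run the migrations described in the setup guide ...")
# The follow-up "What about migrations?" carries its database context implicitly
prompt = session.build_prompt("What about migrations?", ["Migrations are created with ..."])
```

Truncating to the most recent turns is the simplest retention policy; a real implementation might instead summarize older turns before dropping them.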
## Streaming Responses and Real-time Interaction ### Streaming Implementation The Ask feature provides real-time streaming responses for immediate feedback: ```typescript Client-Side Streaming const askQuestion = async (question: string) => { const response = await fetch('/api/ask', { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify({ question }) }); const reader = response.body?.getReader(); if (!reader) return; const decoder = new TextDecoder(); while (true) { const { done, value } = await reader.read(); if (done) break; const chunk = decoder.decode(value); const lines = chunk.split('\n'); for (const line of lines) { if (line.startsWith('data: ')) { const data = JSON.parse(line.slice(6)); updateUI(data); // Real-time UI updates } } } }; ``` ```python Server-Side Streaming from fastapi import FastAPI from fastapi.responses import StreamingResponse import json app = FastAPI() @app.post("/api/ask") async def ask_question(question: dict): def generate_response(): # Search for relevant documents yield f"data: {json.dumps({'type': 'status', 'message': 'Searching...'})}\n\n" relevant_docs = search_documents(question['question']) yield f"data: {json.dumps({'type': 'status', 'message': f'Found {len(relevant_docs)} relevant sections'})}\n\n" # Generate streaming response for chunk in generate_answer(question['question'], relevant_docs): yield f"data: {json.dumps({'type': 'content', 'chunk': chunk})}\n\n" yield f"data: {json.dumps({'type': 'complete'})}\n\n" return StreamingResponse(generate_response(), media_type="text/event-stream") ``` ### Visual Feedback Streaming Response Animation The interface provides visual indicators for: * **Search progress** with animated loading states * **Token-by-token streaming** for real-time response building * **Source highlighting** as references are found * **Completion status** when the response is finished ## API Usage for Programmatic Q\&A ### REST API Endpoints ```bash cURL Example curl -X POST 
"https://your-wiki.com/api/ask" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer YOUR_API_KEY" \ -d '{ "question": "How do I implement user authentication?", "conversation_id": "conv_123", "include_sources": true, "stream": false }' ``` ```javascript JavaScript SDK import { DeepWikiClient } from '@deepwiki/client'; const client = new DeepWikiClient({ apiKey: 'your-api-key', baseUrl: 'https://your-wiki.com' }); // Simple question const response = await client.ask({ question: "How do I set up the development environment?", includeSource: true }); console.log(response.answer); console.log(response.sources); // Streaming question const stream = client.askStreaming({ question: "Explain the authentication flow", conversationId: "conv_123" }); for await (const chunk of stream) { if (chunk.type === 'content') { process.stdout.write(chunk.text); } } ``` ```python Python SDK from deepwiki import DeepWikiClient client = DeepWikiClient( api_key="your-api-key", base_url="https://your-wiki.com" ) # Synchronous question response = client.ask( question="What are the available API endpoints?", include_sources=True ) print(response.answer) for source in response.sources: print(f"Source: {source.file} (Score: {source.relevance_score})") # Asynchronous streaming async for chunk in client.ask_streaming( question="How does error handling work?", conversation_id="conv_456" ): if chunk.type == "content": print(chunk.text, end="", flush=True) ``` ### API Response Format ```json { "id": "ask_123456789", "question": "How do I authenticate users?", "answer": "To authenticate users in this API, you use JWT tokens...", "sources": [ { "file": "/docs/authentication.md", "title": "Authentication Guide", "relevance_score": 0.95, "excerpt": "JWT tokens are used for authentication..." 
}, { "file": "/examples/auth.js", "title": "Authentication Example", "relevance_score": 0.87, "excerpt": "const token = jwt.sign(payload, secret);" } ], "conversation_id": "conv_123", "timestamp": "2024-01-15T10:30:00Z", "processing_time_ms": 1250 } ``` ## Best Practices for Effective Questions ### 🎯 Writing Better Questions **Instead of:** "How does this work?" **Try:** "How does user authentication work in the login endpoint?" **Instead of:** "Fix this error" **Try:** "How do I handle the 'Database connection failed' error when starting the server?" **Good:** "What configuration options are available for the email service?" **Less Good:** "How do I integrate with an undocumented third-party service?" **Better:** "How do I format a POST request to create a new user with email and password fields?" **vs:** "How do I create users?" ### 📝 Question Templates ``` How do I [specific action] in [specific context]? Examples: - How do I configure Redis caching in the production environment? - How do I validate user input in the registration form? - How do I set up database migrations for the User model? ``` ``` What is [specific component/concept] and how does it [function/relate]? Examples: - What is the AuthMiddleware class and how does it validate tokens? - What is the config.database object and how does it connect to PostgreSQL? - What is the UserService and how does it interact with the database? ``` ``` Where can I find [specific functionality/configuration] for [context]? Examples: - Where can I find the API rate limiting configuration? - Where are the email templates stored for password reset? - Where is the logging configuration for production deployment? ``` ``` What's the recommended way to [specific task] in this codebase? Examples: - What's the recommended way to handle async errors in API endpoints? - What's the recommended way to structure test files for components? - What's the recommended way to manage environment variables? 
``` ### 💡 Advanced Query Techniques Break complex questions into parts: **Instead of:** "How do I set up the database, configure authentication, and deploy to production?" **Try:** 1. "How do I set up the database for development?" 2. "How do I configure JWT authentication?" (follow-up) 3. "What's the deployment process for production?" (follow-up) Build on previous answers: **After asking about authentication:** * "How do I test these authentication endpoints?" * "What error codes does this authentication return?" * "How do I refresh expired tokens mentioned earlier?" Request concrete examples: **Good questions:** * "Can you show me an example of making a POST request to create a user?" * "What does a typical error response look like?" * "Can you provide a sample configuration file?" ## Limitations and Troubleshooting ### Known Limitations **Current limitations of the Ask feature:** * **Documentation Scope**: Only searches indexed documentation and code files * **Real-time Data**: Cannot access live databases or external APIs * **Code Execution**: Cannot run or test code, only explain existing code * **Version Specificity**: May not distinguish between different versions of your code * **Complex Logic**: May struggle with highly complex business logic without clear documentation ### Common Issues and Solutions **Symptoms:** The AI responds with "I couldn't find relevant information" **Solutions:** * Check if your question relates to documented features * Try rephrasing with different terminology * Ensure your documentation was properly indexed * Ask more specific questions about documented components **Example Fix:** * ❌ "How do I use advanced features?" * ✅ "How do I configure the email notification system?" 
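When the AI reports that it found nothing relevant, the usual cause is that no indexed chunk cleared the retrieval cutoff (the earlier examples use a 0.7 similarity threshold). A small diagnostic sketch can make this visible; the `diagnose_retrieval` helper and the toy vectors here are hypothetical stand-ins for real embeddings, not a DeepWikiOpen API:

```python
import numpy as np

def diagnose_retrieval(question_vec, doc_vecs, doc_names, threshold=0.7):
    """Score every chunk so you can see whether anything clears the cutoff."""
    # Normalize so the dot product equals cosine similarity
    q = question_vec / np.linalg.norm(question_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q
    # Sort highest-similarity first and collect the chunks that pass the threshold
    report = sorted(zip(doc_names, scores), key=lambda item: -item[1])
    hits = [name for name, score in report if score >= threshold]
    return report, hits

# Toy 3-dimensional vectors standing in for real embeddings
question = np.array([1.0, 0.2, 0.0])
docs = np.array([[1.0, 0.1, 0.0],   # close to the question
                 [0.0, 1.0, 1.0]])  # unrelated
report, hits = diagnose_retrieval(question, docs, ["auth.md", "billing.md"])
```

If `hits` comes back empty but the top scores sit just below the threshold, rephrasing the question with terminology from the documentation usually helps more than lowering the cutoff.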
**Symptoms:** The response doesn't match your actual codebase **Solutions:** * Verify your documentation is up-to-date * Re-index your documentation if you've made recent changes * Provide more context in your question * Ask follow-up questions for clarification **Example:** ``` Q: "The authentication process you described doesn't match my code" Follow-up: "Can you show me specifically how JWT tokens are validated in the middleware?" ``` **Symptoms:** Questions take a long time to process **Solutions:** * Check if you're asking very broad questions (try being more specific) * Verify your internet connection for streaming responses * Consider if your documentation is very large (may require longer processing) * Use the API with appropriate timeout settings **Optimization:** ```javascript // Set appropriate timeouts for API calls const response = await client.ask({ question: "How do I configure logging?", timeout: 30000 // 30 seconds }); ``` **Symptoms:** Follow-up questions don't reference previous conversation **Solutions:** * Ensure you're using the same conversation session * Check that cookies/session storage is enabled * For API usage, pass the same `conversation_id` * Start a new conversation if context becomes confusing **API Example:** ```javascript // Maintain conversation context const conversationId = 'conv_' + Date.now(); const response1 = await client.ask({ question: "How do I set up authentication?", conversationId }); const response2 = await client.ask({ question: "What about authorization for these endpoints?", conversationId // Same ID maintains context }); ``` ### Performance Optimization Tips **Optimize your questions for better performance:** ```markdown ✅ Good Performance: - Specific questions about documented features - Questions referencing specific file names or classes - Well-structured questions with clear context ❌ Poor Performance: - Extremely broad questions requiring full codebase analysis - Questions about undocumented or external 
dependencies - Vague questions without specific context ``` **Optimize API calls:** ```javascript // Use appropriate parameters const response = await client.ask({ question: "How do I configure the database?", maxSources: 3, // Limit sources for faster processing includeSourceText: false, // Skip full source text if not needed stream: true // Use streaming for better perceived performance }); ``` **Manage conversations efficiently:** ```javascript // Clear conversation context when changing topics if (isNewTopic(question)) { conversationId = generateNewConversationId(); } // Archive old conversations to improve performance if (conversationAge > MAX_CONVERSATION_AGE) { await client.archiveConversation(conversationId); conversationId = generateNewConversationId(); } ``` ### Debug Mode Enable debug mode to troubleshoot issues: ```javascript Client Debug const client = new DeepWikiClient({ apiKey: 'your-api-key', debug: true // Enables detailed logging }); const response = await client.ask({ question: "How does caching work?", debug: true // Request debug information }); console.log('Debug Info:', response.debug); // Shows: query processing time, sources found, embedding similarity scores ``` ```python Python Debug client = DeepWikiClient( api_key="your-api-key", debug=True ) response = client.ask( question="How does error handling work?", debug=True ) print("Sources found:", len(response.sources)) print("Processing time:", response.processing_time_ms) print("Query embedding dimensions:", response.debug.embedding_dimensions) ``` *** **Still having issues?** * Check the [API Documentation](/api-reference) for detailed endpoint information * Review [Common Patterns](/guides/common-patterns) for question examples * Contact support with specific error messages and question examples * Join our community Discord for real-time help ## Next Steps Detailed API documentation for programmatic access Learn how to integrate Ask into your applications Explore advanced Ask feature 
capabilities Tips and patterns for effective documentation Q\&A Source: https://asyncfunc.mintlify.app/guides/deep-research # Deep Research Guide Deep Research is DeepWikiOpen's advanced multi-turn analysis feature that enables comprehensive investigation of complex topics through iterative AI-powered research cycles. ## What is Deep Research? Deep Research is an intelligent research methodology that breaks down complex questions into manageable components, conducting thorough analysis through multiple iterative turns. Unlike traditional single-response queries, Deep Research employs a systematic approach: 1. **Planning Phase**: AI analyzes the question and creates a structured research plan 2. **Execution Phase**: Multiple research turns explore different aspects 3. **Synthesis Phase**: Findings are consolidated into comprehensive conclusions ### Core Components * **Multi-turn Analysis**: Sequential research iterations building on previous findings * **Adaptive Planning**: Research direction adjusts based on discovered information * **Source Integration**: Combines multiple authoritative sources * **Progressive Refinement**: Each turn deepens understanding and fills knowledge gaps ## Deep Research vs Regular Ask | Feature | Regular Ask | Deep Research | | -------------- | ------------------------------- | ----------------------------- | | **Scope** | Single focused question | Complex, multi-faceted topics | | **Process** | One-turn response | Multi-turn iterative analysis | | **Depth** | Surface-level or specific facts | Comprehensive investigation | | **Sources** | Limited source consultation | Extensive source integration | | **Time** | Immediate response | Extended research process | | **Complexity** | Simple to moderate | High complexity topics | ### When to Use Deep Research **Choose Deep Research for:** * Complex historical events with multiple perspectives * Scientific topics requiring interdisciplinary analysis * Market research spanning multiple 
industries * Policy analysis with various stakeholder viewpoints * Technical investigations requiring multiple sources * Comparative studies across different domains **Choose Regular Ask for:** * Quick fact checks * Simple definitions * Specific date/number queries * Direct procedural questions * Basic how-to inquiries ## The Deep Research Process ### 1. Research Planning The AI begins by analyzing your question and creating a structured research plan: ```json { "research_question": "How has climate change affected global food security?", "research_plan": [ "Analyze climate change impacts on agriculture", "Examine regional variations in food production", "Investigate supply chain disruptions", "Study adaptation strategies and technologies", "Evaluate policy responses and effectiveness" ], "estimated_turns": "4-6", "key_sources": ["IPCC reports", "FAO data", "agricultural journals"] } ``` ### 2. Iterative Research Turns Each research turn focuses on specific aspects: **Turn 1: Foundation Building** * Establish core concepts and definitions * Identify primary data sources * Map the research landscape **Turn 2-N: Deep Dive Analysis** * Explore specific subtopics * Gather supporting evidence * Identify patterns and connections **Final Turn: Synthesis** * Integrate findings across all turns * Draw comprehensive conclusions * Highlight limitations and future research needs ### 3. Progress Updates Throughout the process, you receive real-time updates: ``` 🔍 Research Progress: Turn 2/5 📊 Sources analyzed: 23 🎯 Current focus: Regional climate impacts on crop yields ⏱️ Estimated completion: 3-4 minutes ``` ## Types of Questions Suitable for Deep Research ### Scientific & Technical Topics **Example**: "What are the latest developments in quantum computing and their potential applications?" 
**Research Approach**: * Current quantum computing technologies * Recent breakthroughs and publications * Industry applications and use cases * Challenges and limitations * Future prospects and timeline ### Historical Analysis **Example**: "How did the Silk Road influence cultural exchange between East and West?" **Research Approach**: * Historical context and timeline * Trade routes and major cities * Cultural, religious, and technological exchanges * Economic impacts on civilizations * Long-term historical significance ### Policy & Social Issues **Example**: "What are the effects of universal basic income pilot programs worldwide?" **Research Approach**: * UBI pilot program overview * Implementation models across countries * Economic outcomes and metrics * Social and behavioral impacts * Policy implications and scalability ### Business & Market Analysis **Example**: "How is artificial intelligence transforming the healthcare industry?" **Research Approach**: * AI applications in healthcare sectors * Market size and growth projections * Key players and technologies * Regulatory challenges and compliance * Patient outcomes and case studies ## Research Stages and Iteration Process ### Stage 1: Question Decomposition ```mermaid graph TD A[Complex Question] --> B[Identify Key Components] B --> C[Create Research Subtopics] C --> D[Prioritize Investigation Areas] D --> E[Establish Success Criteria] ``` ### Stage 2: Information Gathering Each turn systematically explores different aspects: 1. **Primary Source Analysis**: Academic papers, official reports, government data 2. **Cross-referencing**: Validating information across multiple sources 3. **Trend Identification**: Recognizing patterns and emerging themes 4. 
**Gap Analysis**: Identifying areas needing additional investigation ### Stage 3: Synthesis and Conclusions ```mermaid graph LR A[Turn 1 Findings] --> D[Synthesis Engine] B[Turn 2 Findings] --> D C[Turn N Findings] --> D D --> E[Integrated Analysis] E --> F[Comprehensive Conclusions] ``` ## Example Deep Research Session ### Question: "How has remote work adoption changed business operations post-pandemic?" #### Research Plan ``` 📋 Research Plan (5 turns estimated) 1. Pre-pandemic vs current remote work statistics 2. Technology infrastructure adaptations 3. Management and productivity impacts 4. Employee experience and workplace culture 5. Long-term business strategy changes ``` #### Turn 1: Statistical Foundation ``` 🔍 Turn 1/5: Baseline Statistics 📊 Key Findings: - Remote work adoption increased from 24% (2019) to 71% (2023) - Hybrid models now represent 49% of work arrangements - Technology spending on collaboration tools up 300% 📚 Sources: Bureau of Labor Statistics, McKinsey Global Institute, Gartner ⏱️ Next: Technology infrastructure analysis ``` #### Turn 2: Technology Infrastructure ``` 🔍 Turn 2/5: Technology Adaptations 🛠️ Key Findings: - Cloud infrastructure adoption accelerated by 3-5 years - Cybersecurity spending increased 40% for remote access - Video conferencing usage up 2000% since 2020 💡 Emerging Trend: AI-powered productivity tools gaining adoption ⏱️ Next: Management impact analysis ``` #### Turn 3: Management Evolution ``` 🔍 Turn 3/5: Management & Productivity 📈 Key Findings: - 67% of managers report improved productivity metrics - Performance evaluation shifted to outcome-based models - Middle management roles evolved toward coaching ⚠️ Challenge: Maintaining team cohesion and company culture ⏱️ Next: Employee experience deep dive ``` #### Turn 4: Employee Experience ``` 🔍 Turn 4/5: Employee Perspectives 👥 Key Findings: - 87% prefer hybrid/remote work options - Work-life balance improved for 74% of workers - Career development concerns 
increased 23% 🔄 Adaptation: New onboarding and mentorship programs ⏱️ Next: Strategic business implications ``` #### Turn 5: Strategic Business Changes ``` 🔍 Turn 5/5: Long-term Strategy Impact 🏢 Key Findings: - Office space reduction: average 30% smaller footprint - Geographic talent acquisition expanded 340% - Cost savings average $11K per remote employee annually 🎯 Strategic Shift: From location-based to talent-based hiring ⏱️ Finalizing comprehensive analysis... ``` #### Final Synthesis ``` 📋 COMPREHENSIVE ANALYSIS: Remote Work Business Transformation 🔍 EXECUTIVE SUMMARY The pandemic-driven shift to remote work has fundamentally transformed business operations, creating a permanent change in workplace dynamics with significant strategic implications. 📊 KEY TRANSFORMATIONS 1. Operational Model: Hybrid-first approach now standard (71% adoption) 2. Technology Infrastructure: Accelerated digital transformation ($2.3T investment) 3. Management Philosophy: Outcome-based performance evaluation 4. Talent Strategy: Geographic boundaries eliminated for recruitment 5. Cost Structure: Average 20-30% reduction in operational expenses 🎯 STRATEGIC IMPLICATIONS - Companies maintaining remote flexibility report 25% better talent retention - Productivity gains sustained at 13-50% above pre-pandemic levels - New competitive advantages through expanded talent pools - Risk mitigation through distributed workforce models ⚠️ ONGOING CHALLENGES - Maintaining organizational culture and team cohesion - Ensuring equitable career development opportunities - Managing cybersecurity in distributed environments - Balancing flexibility with collaboration needs 🔮 FUTURE OUTLOOK The remote work transformation appears permanent, with 83% of companies planning long-term hybrid models. 
Success factors include: - Investment in collaborative technologies - Adaptive management training - Culture-building initiatives - Flexible policy frameworks 📚 Research based on 47 sources including McKinsey, Deloitte, Harvard Business Review, and primary data from Fortune 500 companies. ``` ## API Usage for Automated Research ### Basic Deep Research Request ```python import requests # Start deep research session response = requests.post('https://api.deepwikiopen.com/v1/deep-research', json={ 'question': 'How is blockchain technology affecting supply chain management?', 'max_turns': 6, 'focus_areas': ['technology', 'case_studies', 'challenges', 'future_trends'], 'source_preferences': ['academic', 'industry_reports', 'case_studies'] }) research_id = response.json()['research_id'] ``` ### Monitor Research Progress ```python # Check research status status_response = requests.get(f'https://api.deepwikiopen.com/v1/deep-research/{research_id}/status') progress = status_response.json() print(f"Progress: {progress['current_turn']}/{progress['total_turns']}") print(f"Current focus: {progress['current_focus']}") print(f"Sources analyzed: {progress['sources_count']}") ``` ### Retrieve Results ```python # Get final results results_response = requests.get(f'https://api.deepwikiopen.com/v1/deep-research/{research_id}/results') research_data = results_response.json() # Access structured findings for turn in research_data['turns']: print(f"Turn {turn['number']}: {turn['focus']}") print(f"Key findings: {turn['key_findings']}") print(f"Sources: {len(turn['sources'])}") # Get final synthesis synthesis = research_data['synthesis'] print(f"Comprehensive analysis: {synthesis['executive_summary']}") ``` ### Advanced API Configuration ```python # Advanced research configuration advanced_config = { 'question': 'Impact of AI on financial services', 'research_parameters': { 'depth_level': 'comprehensive', # standard, comprehensive, exhaustive 'time_constraint': '15_minutes', 'source_diversity': 
'high', 'regional_focus': ['north_america', 'europe', 'asia'], 'time_range': '2020-2024', 'confidence_threshold': 0.8 }, 'output_preferences': { 'include_citations': True, 'visual_elements': ['charts', 'timelines'], 'summary_length': 'detailed', 'technical_level': 'expert' } } ``` ## Performance Considerations ### Research Duration | Question Complexity | Typical Turns | Duration | Sources | | ------------------- | ------------- | ------------- | -------- | | **Simple** | 2-3 turns | 2-4 minutes | 10-20 | | **Moderate** | 3-5 turns | 5-8 minutes | 25-50 | | **Complex** | 5-8 turns | 8-15 minutes | 50-100+ | | **Exhaustive** | 8-12 turns | 15-25 minutes | 100-200+ | ### Optimization Strategies **For Faster Results:** * Use focused question framing * Specify known time constraints * Limit source types if appropriate * Set explicit scope boundaries **For Deeper Analysis:** * Allow flexible turn limits * Include interdisciplinary sources * Enable follow-up question generation * Request comprehensive citations ### Resource Usage ```python # Monitor token consumption research_stats = { 'input_tokens': 2500, # Question + context 'processing_tokens': 45000, # Multi-turn analysis 'output_tokens': 8000, # Final synthesis 'total_cost': '$0.32' # Estimated API cost } ``` ## Best Practices and Optimization Tips ### Question Formulation **✅ Effective Deep Research Questions:** * "How has cryptocurrency regulation evolved globally and what are the implications for adoption?" * "What are the environmental and economic impacts of vertical farming technologies?" * "How do different countries approach data privacy legislation and enforcement?" **❌ Less Suitable for Deep Research:** * "What is the capital of France?" (too simple) * "How do I change a tire?" (procedural, not analytical) * "What's the weather today?" 
(current, specific data) ### Scope Management ```markdown ## Research Scope Framework ### 🎯 Well-Scoped Questions - Clear temporal boundaries (e.g., "since 2020", "in the next decade") - Defined geographic focus (e.g., "in developed nations", "across Asia-Pacific") - Specific industry/domain (e.g., "in healthcare", "for SMB companies") - Measurable aspects (e.g., "economic impact", "user adoption rates") ### ⚠️ Overly Broad Questions - "How does technology affect society?" → Too general - "What is the future of business?" → Too vague - "How does everything connect?" → No clear scope ``` ### Source Quality Optimization **High-Quality Source Preferences:** * Peer-reviewed academic papers * Government and institutional reports * Industry analysis from recognized firms * Primary research and surveys * Expert interviews and case studies **Source Validation Indicators:** * Author credentials and affiliations * Publication date and relevance * Citation count and impact factor * Methodology transparency * Cross-reference validation ### Iterative Refinement ```python # Example of research refinement workflow refinement_process = { 'initial_question': 'How is AI changing education?', 'refined_questions': [ 'How is AI being implemented in K-12 education systems?', 'What are the learning outcomes of AI-powered educational tools?', 'How do educators perceive AI integration in classrooms?', 'What are the equity implications of AI in education?' ], 'focus_areas': [ 'implementation_models', 'effectiveness_metrics', 'teacher_training', 'accessibility_concerns' ] } ``` ### Quality Assurance **Research Quality Metrics:** * **Source Diversity**: Multiple perspectives and methodologies * **Temporal Coverage**: Recent and historical context * **Evidence Strength**: Primary data and expert consensus * **Bias Mitigation**: Balanced viewpoint representation * **Completeness**: Addressing all aspects of the question ### Common Pitfalls to Avoid 1. 
**Question Too Narrow**: Results in shallow, limited analysis 2. **Unrealistic Time Expectations**: Complex topics need adequate research time 3. **Source Bias**: Over-relying on single source types or viewpoints 4. **Scope Creep**: Allowing research to expand beyond useful boundaries 5. **Insufficient Context**: Not providing relevant background information ### Integration with Regular Ask ```mermaid flowchart TD A[User Question] --> B{Complexity Assessment} B -->|Simple/Direct| C[Regular Ask] B -->|Complex/Multi-faceted| D[Deep Research] C --> E[Immediate Response] D --> F[Multi-turn Analysis] F --> G[Comprehensive Report] G --> H{Follow-up Needed?} H -->|Yes| I[Additional Regular Ask] H -->|No| J[Research Complete] ``` ## Advanced Features ### Real-time Research Updates Enable live progress tracking: ```javascript // WebSocket connection for real-time updates const ws = new WebSocket('wss://api.deepwikiopen.com/v1/research-stream'); ws.onmessage = function(event) { const update = JSON.parse(event.data); console.log(`Turn ${update.turn}: ${update.status}`); console.log(`Sources analyzed: ${update.sources_count}`); console.log(`Current focus: ${update.current_focus}`); }; ``` ### Collaborative Research Multiple researchers can contribute to the same deep research session: ```python # Add collaborative researchers collaboration_config = { 'research_id': research_id, 'collaborators': [ {'role': 'domain_expert', 'focus': 'technical_analysis'}, {'role': 'market_analyst', 'focus': 'commercial_implications'}, {'role': 'policy_expert', 'focus': 'regulatory_landscape'} ], 'coordination_mode': 'parallel_with_synthesis' } ``` ### Research Templates Use pre-configured research templates for common analysis types: ```python # Market analysis template market_template = { 'template_id': 'market_analysis_v2', 'standard_turns': [ 'market_size_and_trends', 'competitive_landscape', 'customer_segments', 'regulatory_environment', 'future_projections' ], 'required_sources': 
['industry_reports', 'financial_data', 'market_surveys'], 'output_format': 'executive_summary_with_appendices' } ``` ## Conclusion Deep Research represents a paradigm shift in AI-powered information analysis, enabling comprehensive investigation of complex topics through systematic, iterative research processes. By understanding when and how to leverage this powerful feature, users can obtain insights that go far beyond traditional search and single-turn AI responses. The key to successful deep research lies in proper question formulation, appropriate scope management, and leveraging the full potential of multi-turn analysis. As AI capabilities continue to evolve, Deep Research will become an increasingly valuable tool for researchers, analysts, and decision-makers across all industries. *** *For more information about DeepWikiOpen's Deep Research capabilities, visit our [API documentation](/api/deep-research) or explore our [example research sessions](/examples/deep-research).* Source: https://asyncfunc.mintlify.app/guides/docker-deployment # Docker Deployment Guide This comprehensive guide covers everything you need to know about deploying DeepWikiOpen using Docker, from basic setups to production-grade deployments. 
## Table of Contents * [Prerequisites](#prerequisites) * [Quick Start with Pre-built Images](#quick-start-with-pre-built-images) * [Building Custom Images](#building-custom-images) * [Docker Compose Setup](#docker-compose-setup) * [Environment Configuration](#environment-configuration) * [Volume Mounts and Data Persistence](#volume-mounts-and-data-persistence) * [Container Networking](#container-networking) * [Health Checks and Monitoring](#health-checks-and-monitoring) * [Scaling Strategies](#scaling-strategies) * [Security Considerations](#security-considerations) * [Troubleshooting](#troubleshooting) * [Production Deployments](#production-deployments) ## Prerequisites Before you begin, ensure you have the following installed: * **Docker** (version 20.10+) * **Docker Compose** (version 2.0+) * **Git** (for cloning repositories) ### Installation Verification ```bash # Check Docker version docker --version docker-compose --version # Verify Docker is running docker info ``` ## Quick Start with Pre-built Images ### Using GitHub Container Registry DeepWikiOpen provides pre-built images through GitHub Container Registry (GHCR). This is the fastest way to get started. 
```bash # Pull the latest image docker pull ghcr.io/your-org/deepwikiopen:latest # Run a simple container docker run -d \ --name deepwikiopen \ -p 3000:3000 \ -e NODE_ENV=production \ ghcr.io/your-org/deepwikiopen:latest ``` ### Available Image Tags | Tag | Description | Use Case | | --------- | --------------------- | --------------------------------- | | `latest` | Latest stable release | Production | | `develop` | Development branch | Testing | | `v1.2.3` | Specific version | Production pinning | | `slim` | Minimal image size | Resource-constrained environments | ### Basic Docker Run Command ```bash docker run -d \ --name deepwikiopen-app \ --restart unless-stopped \ -p 3000:3000 \ -e DATABASE_URL="postgresql://user:pass@host:5432/db" \ -e JWT_SECRET="your-secret-key" \ -e REDIS_URL="redis://redis:6379" \ -v deepwikiopen-data:/app/data \ ghcr.io/your-org/deepwikiopen:latest ``` ## Building Custom Images ### Dockerfile Explanation Here's a production-ready Dockerfile with explanations: ```dockerfile # Use Node.js LTS Alpine for smaller image size FROM node:18-alpine AS base # Install system dependencies (curl is needed by the HEALTHCHECK below) RUN apk add --no-cache \ curl \ python3 \ make \ g++ \ cairo-dev \ jpeg-dev \ pango-dev \ musl-dev \ giflib-dev \ pixman-dev \ pangomm-dev \ libjpeg-turbo-dev \ freetype-dev # Set working directory WORKDIR /app # Copy package files (package-lock.json is matched by the glob) COPY package*.json ./ # Install production dependencies (npm ci already installs strictly from the lockfile) FROM base AS dependencies RUN npm ci --omit=dev # Development dependencies for building FROM base AS dev-dependencies RUN npm ci # Build stage FROM dev-dependencies AS build COPY . . 
RUN npm run build RUN npm run test:unit # Production stage FROM base AS production # Create non-root user for security RUN addgroup -g 1001 -S nodejs && \ adduser -S nextjs -u 1001 # Copy production dependencies COPY --from=dependencies /app/node_modules ./node_modules # Copy built application COPY --from=build --chown=nextjs:nodejs /app/.next ./.next COPY --from=build --chown=nextjs:nodejs /app/public ./public COPY --from=build --chown=nextjs:nodejs /app/package.json ./package.json # Create data directory RUN mkdir -p /app/data && chown -R nextjs:nodejs /app/data # Switch to non-root user USER nextjs # Expose port EXPOSE 3000 # Health check HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \ CMD curl -f http://localhost:3000/api/health || exit 1 # Start the application CMD ["npm", "start"] ``` ### Building the Image ```bash # Build with default tag docker build -t deepwikiopen:local . # Build with specific tag and build args docker build \ --build-arg NODE_ENV=production \ --build-arg BUILD_DATE=$(date -u +'%Y-%m-%dT%H:%M:%SZ') \ -t deepwikiopen:v1.0.0 . # Multi-platform build docker buildx build \ --platform linux/amd64,linux/arm64 \ -t deepwikiopen:multi-arch \ --push . ``` ### Optimized Development Dockerfile ```dockerfile FROM node:18-alpine AS development # Install development tools RUN apk add --no-cache \ git \ curl \ vim WORKDIR /app # Copy package files COPY package*.json ./ RUN npm install # Copy source code COPY . . 
# Expose port and start dev server EXPOSE 3000 CMD ["npm", "run", "dev"] ``` ## Docker Compose Setup ### Complete Production Setup ```yaml # docker-compose.yml version: '3.8' services: # Main application app: image: ghcr.io/your-org/deepwikiopen:latest container_name: deepwikiopen-app restart: unless-stopped ports: - "3000:3000" environment: - NODE_ENV=production - DATABASE_URL=postgresql://postgres:${POSTGRES_PASSWORD}@postgres:5432/${POSTGRES_DB} - REDIS_URL=redis://redis:6379 - JWT_SECRET=${JWT_SECRET} - NEXTAUTH_URL=${NEXTAUTH_URL} - NEXTAUTH_SECRET=${NEXTAUTH_SECRET} volumes: - app-data:/app/data - app-logs:/app/logs depends_on: postgres: condition: service_healthy redis: condition: service_healthy networks: - deepwikiopen-network healthcheck: test: ["CMD", "curl", "-f", "http://localhost:3000/api/health"] interval: 30s timeout: 10s retries: 3 start_period: 40s # PostgreSQL database postgres: image: postgres:15-alpine container_name: deepwikiopen-postgres restart: unless-stopped environment: - POSTGRES_USER=postgres - POSTGRES_PASSWORD=${POSTGRES_PASSWORD} - POSTGRES_DB=${POSTGRES_DB} - POSTGRES_INITDB_ARGS=--encoding=UTF-8 --lc-collate=C --lc-ctype=C volumes: - postgres-data:/var/lib/postgresql/data - ./docker/postgres/init:/docker-entrypoint-initdb.d ports: - "5432:5432" networks: - deepwikiopen-network healthcheck: test: ["CMD-SHELL", "pg_isready -U postgres"] interval: 10s timeout: 5s retries: 5 # Redis cache redis: image: redis:7-alpine container_name: deepwikiopen-redis restart: unless-stopped command: redis-server --appendonly yes --requirepass ${REDIS_PASSWORD} volumes: - redis-data:/data - ./docker/redis/redis.conf:/usr/local/etc/redis/redis.conf ports: - "6379:6379" networks: - deepwikiopen-network healthcheck: test: ["CMD", "redis-cli", "ping"] interval: 10s timeout: 3s retries: 3 # Elasticsearch for search elasticsearch: image: elasticsearch:8.8.0 container_name: deepwikiopen-elasticsearch restart: unless-stopped environment: - 
discovery.type=single-node - ES_JAVA_OPTS=-Xms512m -Xmx512m - xpack.security.enabled=false volumes: - elasticsearch-data:/usr/share/elasticsearch/data ports: - "9200:9200" networks: - deepwikiopen-network healthcheck: test: ["CMD-SHELL", "curl -f http://localhost:9200/_cluster/health"] interval: 30s timeout: 10s retries: 3 # Nginx reverse proxy nginx: image: nginx:alpine container_name: deepwikiopen-nginx restart: unless-stopped ports: - "80:80" - "443:443" volumes: - ./docker/nginx/nginx.conf:/etc/nginx/nginx.conf - ./docker/nginx/conf.d:/etc/nginx/conf.d - ./docker/ssl:/etc/nginx/ssl - nginx-logs:/var/log/nginx depends_on: - app networks: - deepwikiopen-network # Monitoring with Prometheus prometheus: image: prom/prometheus:latest container_name: deepwikiopen-prometheus restart: unless-stopped ports: - "9090:9090" volumes: - ./docker/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml - prometheus-data:/prometheus command: - '--config.file=/etc/prometheus/prometheus.yml' - '--storage.tsdb.path=/prometheus' - '--web.console.libraries=/etc/prometheus/console_libraries' - '--web.console.templates=/etc/prometheus/consoles' networks: - deepwikiopen-network # Grafana for visualization grafana: image: grafana/grafana:latest container_name: deepwikiopen-grafana restart: unless-stopped ports: - "3001:3000" environment: - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD} volumes: - grafana-data:/var/lib/grafana - ./docker/grafana/provisioning:/etc/grafana/provisioning networks: - deepwikiopen-network volumes: app-data: driver: local app-logs: driver: local postgres-data: driver: local redis-data: driver: local elasticsearch-data: driver: local nginx-logs: driver: local prometheus-data: driver: local grafana-data: driver: local networks: deepwikiopen-network: driver: bridge ipam: config: - subnet: 172.20.0.0/16 ``` ### Development Compose Setup ```yaml # docker-compose.dev.yml version: '3.8' services: app: build: context: . 
dockerfile: Dockerfile.dev container_name: deepwikiopen-dev ports: - "3000:3000" - "9229:9229" # Node.js debugger environment: - NODE_ENV=development - DATABASE_URL=postgresql://postgres:password@postgres:5432/deepwikiopen_dev - REDIS_URL=redis://redis:6379 volumes: - .:/app - /app/node_modules - dev-logs:/app/logs depends_on: - postgres - redis networks: - dev-network command: npm run dev postgres: image: postgres:15-alpine container_name: deepwikiopen-postgres-dev environment: - POSTGRES_USER=postgres - POSTGRES_PASSWORD=password - POSTGRES_DB=deepwikiopen_dev volumes: - postgres-dev-data:/var/lib/postgresql/data ports: - "5432:5432" networks: - dev-network redis: image: redis:7-alpine container_name: deepwikiopen-redis-dev ports: - "6379:6379" networks: - dev-network volumes: postgres-dev-data: dev-logs: networks: dev-network: driver: bridge ``` ## Environment Configuration ### Environment Variables Structure Create a comprehensive `.env` file: ```bash # .env # Application Settings NODE_ENV=production PORT=3000 APP_URL=https://your-domain.com APP_NAME="DeepWikiOpen" # Database Configuration DATABASE_URL=postgresql://username:password@postgres:5432/deepwikiopen DB_HOST=postgres DB_PORT=5432 DB_NAME=deepwikiopen DB_USER=username DB_PASSWORD=secure_password # Redis Configuration REDIS_URL=redis://redis:6379 REDIS_HOST=redis REDIS_PORT=6379 REDIS_PASSWORD=redis_password # Authentication JWT_SECRET=your-super-secret-jwt-key-change-this NEXTAUTH_URL=https://your-domain.com NEXTAUTH_SECRET=another-super-secret-key # OAuth Providers GITHUB_CLIENT_ID=your_github_client_id GITHUB_CLIENT_SECRET=your_github_client_secret GOOGLE_CLIENT_ID=your_google_client_id GOOGLE_CLIENT_SECRET=your_google_client_secret # Email Configuration SMTP_HOST=smtp.gmail.com SMTP_PORT=587 SMTP_USER=your-email@gmail.com SMTP_PASS=your-app-password # Storage Configuration AWS_ACCESS_KEY_ID=your_access_key AWS_SECRET_ACCESS_KEY=your_secret_key AWS_REGION=us-east-1 S3_BUCKET_NAME=your-s3-bucket # 
Monitoring SENTRY_DSN=https://your-sentry-dsn NEW_RELIC_LICENSE_KEY=your-new-relic-key # Security CORS_ORIGIN=https://your-domain.com RATE_LIMIT_MAX=100 RATE_LIMIT_WINDOW_MS=900000 # Feature Flags ENABLE_SEARCH=true ENABLE_ANALYTICS=true ENABLE_NOTIFICATIONS=true ``` ### Environment Files for Different Stages ```bash # .env.local (development) NODE_ENV=development DATABASE_URL=postgresql://postgres:password@localhost:5432/deepwikiopen_dev REDIS_URL=redis://localhost:6379 JWT_SECRET=dev-secret # .env.staging NODE_ENV=staging DATABASE_URL=postgresql://user:pass@staging-db:5432/deepwikiopen_staging REDIS_URL=redis://staging-redis:6379 # .env.production NODE_ENV=production DATABASE_URL=postgresql://user:pass@prod-db:5432/deepwikiopen REDIS_URL=redis://prod-redis:6379 ``` ### Docker Compose Environment Override ```yaml # docker-compose.override.yml version: '3.8' services: app: environment: - DEBUG=true - LOG_LEVEL=debug volumes: - ./logs:/app/logs ``` ## Volume Mounts and Data Persistence ### Volume Types and Use Cases ```yaml # Named volumes (recommended for production) volumes: # Database data persistence postgres-data: driver: local driver_opts: type: none o: bind device: /opt/deepwikiopen/postgres # Application data app-data: driver: local driver_opts: type: none o: bind device: /opt/deepwikiopen/data # Logs app-logs: driver: local driver_opts: type: none o: bind device: /opt/deepwikiopen/logs # Backups backup-data: driver: local driver_opts: type: none o: bind device: /opt/deepwikiopen/backups ``` ### Bind Mounts for Development ```yaml services: app: volumes: # Source code (development) - ./src:/app/src - ./public:/app/public - ./package.json:/app/package.json # Configuration files - ./config:/app/config - ./docker/app/entrypoint.sh:/app/entrypoint.sh # Exclude node_modules - /app/node_modules ``` ### NFS Volumes for Multi-Host Setup ```yaml volumes: shared-data: driver: local driver_opts: type: nfs o: addr=nfs.example.com,rw device: ":/path/to/shared/data" ``` 
### Volume Backup Strategy ```bash # Backup script #!/bin/bash # Create backup directory mkdir -p /backups/$(date +%Y%m%d) # Backup PostgreSQL docker exec deepwikiopen-postgres pg_dump -U postgres deepwikiopen > \ /backups/$(date +%Y%m%d)/postgres-backup.sql # Backup volumes docker run --rm \ -v deepwikiopen_postgres-data:/source \ -v /backups/$(date +%Y%m%d):/backup \ alpine tar czf /backup/postgres-data.tar.gz -C /source . docker run --rm \ -v deepwikiopen_app-data:/source \ -v /backups/$(date +%Y%m%d):/backup \ alpine tar czf /backup/app-data.tar.gz -C /source . ``` ## Container Networking ### Custom Bridge Network ```yaml networks: deepwikiopen-network: driver: bridge ipam: driver: default config: - subnet: 172.20.0.0/16 gateway: 172.20.0.1 ``` ### Service Discovery ```yaml services: app: networks: deepwikiopen-network: aliases: - webapp - api postgres: networks: deepwikiopen-network: aliases: - database - db ``` ### Port Configuration ```yaml services: # Internal communication only app-internal: expose: - "3000" networks: - internal # External access app-external: ports: - "80:3000" # HTTP - "443:3000" # HTTPS - "3000:3000" # Direct access networks: - external networks: internal: driver: bridge internal: true external: driver: bridge ``` ### Network Security ```yaml # docker-compose.security.yml version: '3.8' services: app: networks: - frontend - backend postgres: networks: - backend # No external ports exposed redis: networks: - backend # No external ports exposed nginx: networks: - frontend ports: - "80:80" - "443:443" networks: frontend: driver: bridge backend: driver: bridge internal: true ``` ## Health Checks and Monitoring ### Application Health Checks ```dockerfile # Dockerfile health check HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \ CMD curl -f http://localhost:3000/api/health || exit 1 ``` ```yaml # Docker Compose health checks services: app: healthcheck: test: ["CMD", "curl", "-f", "http://localhost:3000/api/health"] 
interval: 30s timeout: 10s retries: 3 start_period: 40s depends_on: postgres: condition: service_healthy redis: condition: service_healthy ``` ### Custom Health Check Endpoint ```javascript // pages/api/health.js export default function handler(req, res) { const checks = { status: 'ok', timestamp: new Date().toISOString(), checks: { database: 'checking...', redis: 'checking...', memory: process.memoryUsage(), uptime: process.uptime() } }; // Check database connection try { // Your database check logic checks.checks.database = 'healthy'; } catch (error) { checks.checks.database = 'unhealthy'; checks.status = 'error'; } // Check Redis connection try { // Your Redis check logic checks.checks.redis = 'healthy'; } catch (error) { checks.checks.redis = 'unhealthy'; checks.status = 'error'; } const statusCode = checks.status === 'ok' ? 200 : 503; res.status(statusCode).json(checks); } ``` ### Monitoring with Docker Stats ```bash # Monitor container resources docker stats deepwikiopen-app # Get detailed container information docker inspect deepwikiopen-app # View container logs docker logs -f deepwikiopen-app # Execute commands in running container docker exec -it deepwikiopen-app sh ``` ### Prometheus Metrics ```yaml # docker/prometheus/prometheus.yml global: scrape_interval: 15s scrape_configs: - job_name: 'deepwikiopen' static_configs: - targets: ['app:3000'] metrics_path: '/api/metrics' - job_name: 'postgres' static_configs: - targets: ['postgres-exporter:9187'] - job_name: 'redis' static_configs: - targets: ['redis-exporter:9121'] - job_name: 'nginx' static_configs: - targets: ['nginx-exporter:9113'] ``` ## Scaling Strategies ### Docker Compose Scale ```bash # Scale application horizontally docker-compose up -d --scale app=3 # Scale with load balancer docker-compose -f docker-compose.yml -f docker-compose.scale.yml up -d ``` ```yaml # docker-compose.scale.yml version: '3.8' services: app: deploy: replicas: 3 nginx: depends_on: - app volumes: - 
./docker/nginx/nginx-scale.conf:/etc/nginx/nginx.conf ``` ### Load Balancer Configuration ```nginx # docker/nginx/nginx-scale.conf upstream app_servers { server app_1:3000; server app_2:3000; server app_3:3000; } server { listen 80; server_name localhost; location / { proxy_pass http://app_servers; proxy_set_header Host $host; proxy_set_header X-Real-IP $remote_addr; proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; proxy_set_header X-Forwarded-Proto $scheme; } } ``` ### Docker Swarm Setup ```bash # Initialize Docker Swarm docker swarm init # Deploy stack docker stack deploy -c docker-compose.swarm.yml deepwikiopen # Scale services docker service scale deepwikiopen_app=5 ``` ```yaml # docker-compose.swarm.yml version: '3.8' services: app: image: ghcr.io/your-org/deepwikiopen:latest deploy: replicas: 3 placement: constraints: - node.role == worker resources: limits: cpus: '0.5' memory: 512M reservations: cpus: '0.25' memory: 256M restart_policy: condition: on-failure delay: 5s max_attempts: 3 update_config: parallelism: 1 delay: 10s failure_action: rollback order: start-first postgres: image: postgres:15-alpine deploy: replicas: 1 placement: constraints: - node.role == manager resources: limits: memory: 1G reservations: memory: 512M networks: deepwikiopen-network: driver: overlay attachable: true ``` ### Auto-scaling with Docker Swarm ```bash # Create auto-scaling service docker service create \ --name deepwikiopen-app \ --replicas 2 \ --limit-cpu 0.5 \ --limit-memory 512m \ --reserve-cpu 0.25 \ --reserve-memory 256m \ --update-parallelism 1 \ --update-delay 10s \ ghcr.io/your-org/deepwikiopen:latest ``` ## Security Considerations ### Container Security Best Practices ```dockerfile # Use non-root user FROM node:18-alpine RUN addgroup -g 1001 -S nodejs && \ adduser -S nextjs -u 1001 # Set proper file permissions COPY --chown=nextjs:nodejs . . 
USER nextjs ``` ```bash # Use read-only root filesystem docker run --read-only --tmpfs /tmp deepwikiopen:latest # Drop capabilities docker run --cap-drop=ALL --cap-add=NET_BIND_SERVICE deepwikiopen:latest # Set security options docker run --security-opt=no-new-privileges:true deepwikiopen:latest ``` ### Secrets Management ```yaml # docker-compose.secrets.yml version: '3.8' services: app: secrets: - db_password - jwt_secret environment: - DATABASE_PASSWORD_FILE=/run/secrets/db_password - JWT_SECRET_FILE=/run/secrets/jwt_secret secrets: db_password: file: ./secrets/db_password.txt jwt_secret: file: ./secrets/jwt_secret.txt ``` ### Network Security ```yaml services: app: networks: - frontend # Only expose necessary ports expose: - "3000" postgres: networks: - backend # No external ports # Use internal network only networks: frontend: driver: bridge backend: driver: bridge internal: true # No external access ``` ### Security Scanning ```bash # Scan images for vulnerabilities docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \ -v /tmp:/tmp anchore/grype:latest \ ghcr.io/your-org/deepwikiopen:latest # Use Trivy for comprehensive scanning docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \ aquasec/trivy image ghcr.io/your-org/deepwikiopen:latest ``` ### Resource Limits ```yaml services: app: deploy: resources: limits: cpus: '0.50' memory: 512M pids: 100 reservations: cpus: '0.25' memory: 256M ulimits: nofile: soft: 1024 hard: 2048 ``` ## Troubleshooting ### Common Issues and Solutions #### Container Won't Start ```bash # Check container logs docker logs deepwikiopen-app # Check container configuration docker inspect deepwikiopen-app # Debug with shell access docker run -it --entrypoint /bin/sh ghcr.io/your-org/deepwikiopen:latest ``` #### Database Connection Issues ```bash # Test database connection docker exec deepwikiopen-app nc -zv postgres 5432 # Check PostgreSQL logs docker logs deepwikiopen-postgres # Test with psql docker exec -it deepwikiopen-postgres 
psql -U postgres -d deepwikiopen ``` #### Memory Issues ```bash # Monitor memory usage docker stats deepwikiopen-app # Check for memory leaks docker exec deepwikiopen-app cat /proc/meminfo # Increase memory limits docker run -m 1g deepwikiopen:latest ``` #### Permission Issues ```bash # Check file permissions docker exec deepwikiopen-app ls -la /app # Fix ownership docker exec deepwikiopen-app chown -R nextjs:nodejs /app/data # Run as different user docker run --user 1001:1001 deepwikiopen:latest ``` ### Debugging Tools ```bash # Enter running container docker exec -it deepwikiopen-app sh # Copy files from container docker cp deepwikiopen-app:/app/logs ./local-logs # Run health checks manually docker exec deepwikiopen-app curl -f http://localhost:3000/api/health # Check network connectivity docker exec deepwikiopen-app nslookup postgres docker exec deepwikiopen-app ping redis ``` ### Performance Debugging ```bash # Monitor container performance docker stats --no-stream deepwikiopen-app # Profile application docker exec deepwikiopen-app node --prof app.js # Check disk usage docker exec deepwikiopen-app df -h docker system df ``` ## Production Deployments ### Production-Ready Compose File ```yaml # docker-compose.prod.yml version: '3.8' x-common-variables: &common-variables POSTGRES_DB: ${POSTGRES_DB:-deepwikiopen} POSTGRES_USER: ${POSTGRES_USER:-postgres} POSTGRES_PASSWORD: ${POSTGRES_PASSWORD} services: app: image: ghcr.io/your-org/deepwikiopen:${TAG:-latest} restart: unless-stopped environment: <<: *common-variables NODE_ENV: production DATABASE_URL: postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@postgres:5432/${POSTGRES_DB} volumes: - app-data:/app/data:rw - app-logs:/app/logs:rw networks: - app-network depends_on: postgres: condition: service_healthy healthcheck: test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:3000/api/health"] interval: 30s timeout: 10s retries: 3 deploy: resources: limits: memory: 1G cpus: '1.0' 
reservations: memory: 512M cpus: '0.5' postgres: image: postgres:15-alpine restart: unless-stopped environment: <<: *common-variables POSTGRES_INITDB_ARGS: "--encoding=UTF-8 --lc-collate=C --lc-ctype=C" volumes: - postgres-data:/var/lib/postgresql/data:rw - postgres-backups:/backups:rw networks: - app-network healthcheck: test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER} -d ${POSTGRES_DB}"] interval: 10s timeout: 5s retries: 5 deploy: resources: limits: memory: 2G cpus: '1.0' redis: image: redis:7-alpine restart: unless-stopped command: redis-server --appendonly yes --requirepass ${REDIS_PASSWORD} volumes: - redis-data:/data:rw networks: - app-network healthcheck: test: ["CMD", "redis-cli", "ping"] interval: 30s timeout: 3s retries: 3 nginx: image: nginx:alpine restart: unless-stopped ports: - "80:80" - "443:443" volumes: - ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro - ./nginx/ssl:/etc/nginx/ssl:ro - nginx-logs:/var/log/nginx:rw depends_on: - app networks: - app-network volumes: app-data: driver: local driver_opts: type: none o: bind device: /opt/deepwikiopen/data app-logs: driver: local driver_opts: type: none o: bind device: /opt/deepwikiopen/logs postgres-data: driver: local driver_opts: type: none o: bind device: /opt/deepwikiopen/postgres postgres-backups: driver: local driver_opts: type: none o: bind device: /opt/deepwikiopen/backups redis-data: driver: local nginx-logs: driver: local networks: app-network: driver: bridge ``` ### Deployment Scripts ```bash #!/bin/bash # deploy.sh set -e echo "🚀 Starting production deployment..." # Load environment variables source .env.production # Pull latest images echo "📥 Pulling latest images..." docker-compose -f docker-compose.prod.yml pull # Create backup before deployment echo "💾 Creating backup..." ./scripts/backup.sh # Stop services gracefully echo "🛑 Stopping services..." docker-compose -f docker-compose.prod.yml down --remove-orphans # Start services echo "▶️ Starting services..." 
docker-compose -f docker-compose.prod.yml up -d # Wait for health checks echo "⏳ Waiting for services to be healthy..." timeout 300 docker-compose -f docker-compose.prod.yml exec app \ sh -c 'while [ "$(curl -s -o /dev/null -w %{http_code} localhost:3000/api/health)" != "200" ]; do sleep 5; done' # Run database migrations echo "🔄 Running database migrations..." docker-compose -f docker-compose.prod.yml exec app npm run migrate # Clean up old images echo "🧹 Cleaning up..." docker image prune -f echo "✅ Deployment completed successfully!" ``` ### Zero-Downtime Deployment ```bash #!/bin/bash # zero-downtime-deploy.sh set -e # Blue-green deployment: track the active color in a state file # (a container ID from `docker-compose ps -q` cannot tell us the color) CURRENT_COLOR=$(cat .active-color 2>/dev/null || echo "blue") NEW_COLOR=$([ "$CURRENT_COLOR" = "blue" ] && echo "green" || echo "blue") echo "🔄 Starting zero-downtime deployment (switching to $NEW_COLOR)..." # Start new version alongside current docker-compose -f docker-compose.$NEW_COLOR.yml up -d # Health check new version echo "⏳ Waiting for new version to be ready..." timeout 300 bash -c "until curl -f http://localhost:3001/api/health; do sleep 5; done" # Switch traffic echo "🔀 Switching traffic..." ./scripts/switch-traffic.sh $NEW_COLOR echo "$NEW_COLOR" > .active-color # Stop old version echo "🛑 Stopping old version..." docker-compose -f docker-compose.$CURRENT_COLOR.yml down echo "✅ Zero-downtime deployment completed!" 
``` ### Monitoring and Alerting ```yaml # monitoring/docker-compose.monitoring.yml version: '3.8' services: prometheus: image: prom/prometheus:latest volumes: - ./prometheus.yml:/etc/prometheus/prometheus.yml - prometheus-data:/prometheus ports: - "9090:9090" grafana: image: grafana/grafana:latest environment: - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD} volumes: - grafana-data:/var/lib/grafana - ./grafana/provisioning:/etc/grafana/provisioning ports: - "3001:3000" alertmanager: image: prom/alertmanager:latest volumes: - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml ports: - "9093:9093" volumes: prometheus-data: grafana-data: ``` ### SSL/TLS Configuration ```nginx # nginx/nginx.conf server { listen 443 ssl http2; server_name your-domain.com; ssl_certificate /etc/nginx/ssl/cert.pem; ssl_certificate_key /etc/nginx/ssl/key.pem; ssl_protocols TLSv1.2 TLSv1.3; ssl_ciphers ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES256-GCM-SHA384; ssl_prefer_server_ciphers off; ssl_session_cache shared:SSL:10m; location / { proxy_pass http://app:3000; proxy_set_header Host $host; proxy_set_header X-Real-IP $remote_addr; proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; proxy_set_header X-Forwarded-Proto $scheme; } } server { listen 80; server_name your-domain.com; return 301 https://$server_name$request_uri; } ``` This comprehensive Docker deployment guide covers everything from basic setups to production-grade deployments. Use the appropriate sections based on your deployment needs and gradually implement more advanced features as your application grows. 
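Both deployment scripts above implement the same wait-for-healthy pattern: poll an HTTP health endpoint until it returns 200 or a deadline passes. A minimal Python sketch of that pattern (the `/api/health` URL is this guide's example endpoint; the injectable `probe` parameter is an addition here purely for testability, not part of any DeepWikiOpen API):

```python
import time
import urllib.request

def wait_until_healthy(url, timeout=300, interval=5, probe=None):
    """Poll `url` until it answers HTTP 200 or `timeout` seconds elapse.

    `probe` may be supplied for testing; by default a real GET is issued.
    Returns True once healthy, False if the deadline passes first.
    """
    if probe is None:
        def probe(u):
            try:
                with urllib.request.urlopen(u, timeout=interval or 5) as resp:
                    return resp.status
            except OSError:
                return None  # connection refused, DNS failure, HTTP error, ...
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe(url) == 200:
            return True
        time.sleep(interval)
    return False
```

`wait_until_healthy("http://localhost:3000/api/health")` mirrors the `until curl -f ...` loop in `zero-downtime-deploy.sh`, with the timeout made explicit instead of delegated to the `timeout` command.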
Remember to: * Always test deployments in a staging environment first * Keep your images updated with security patches * Monitor resource usage and adjust limits accordingly * Implement proper backup and disaster recovery procedures * Use secrets management for sensitive data * Perform regular security audits of your containers and configurations # Internationalization (i18n) Guide Source: https://asyncfunc.mintlify.app/guides/internationalization Comprehensive guide to DeepWiki's multi-language support, language detection, and global configuration options DeepWiki offers robust internationalization support with built-in language detection, comprehensive translation coverage, and intelligent AI model responses in your preferred language. ## Overview DeepWiki provides comprehensive internationalization (i18n) support designed for global teams and diverse repositories. The platform automatically detects languages, provides localized interfaces, and generates documentation that respects cultural and linguistic preferences. 
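The automatic detection mentioned above ultimately starts from the language preferences a client advertises. A minimal Python sketch of ranking an `Accept-Language` header by q-values (standard RFC 9110 header syntax — an illustrative helper, not DeepWiki's actual implementation):

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language header, highest q first.

    Example header: 'ja,en;q=0.9,fr;q=0.5'. A missing q parameter
    means q=1.0; entries with q=0 are excluded.
    """
    ranked = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        q = 1.0
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:
            ranked.append((q, tag.strip()))
    # Stable sort keeps header order among equal q values.
    ranked.sort(key=lambda item: item[0], reverse=True)
    return [tag for _, tag in ranked]
```

For the header used later in this guide, `parse_accept_language("ja,en;q=0.9")` yields `["ja", "en"]`: Japanese first, English as the fallback.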
### Key Features * **10+ Supported Languages**: Native support for major world languages * **Automatic Language Detection**: Smart detection from browser settings and repository content * **Contextual AI Responses**: AI models understand and respond in the appropriate language * **Cultural Adaptation**: Documentation generated with cultural considerations * **Multi-language Repository Handling**: Support for repositories containing multiple languages ## Supported Languages DeepWiki currently supports the following languages: **Code**: `en` Default language with full feature support **Code**: `zh` 中文 - Full localization support **Code**: `zh-tw` 繁體中文 - Traditional Chinese variant **Code**: `ja` 日本語 - Complete Japanese localization **Code**: `kr` 한국어 - Korean language support **Code**: `es` Español - Spanish localization **Code**: `vi` Tiếng Việt - Vietnamese support **Code**: `pt-br` Português Brasileiro - Brazilian variant **Code**: `fr` Français - French localization **Code**: `ru` Русский - Russian language support ## Language Detection ### Automatic Browser Detection DeepWiki automatically detects your preferred language using a sophisticated algorithm: ```typescript // Language detection priority 1. Stored user preference (localStorage) 2. Browser language settings (navigator.language) 3. Repository primary language 4. 
System default (English) ``` The system examines `navigator.language` and extracts the language code: ```javascript // Examples of browser language detection 'en-US' → 'en' (English) 'ja-JP' → 'ja' (Japanese) 'zh-CN' → 'zh' (Simplified Chinese) 'zh-TW' → 'zh-tw' (Traditional Chinese) 'es-ES' → 'es' (Spanish) ``` When a language isn't supported, DeepWiki uses intelligent fallbacks: ```javascript // Fallback examples 'pt-PT' → 'pt-br' (Portuguese variants) 'zh-HK' → 'zh-tw' (Traditional Chinese variants) 'en-GB' → 'en' (English variants) 'fr-CA' → 'fr' (French variants) ``` ### Repository Language Detection For documentation generation, DeepWiki analyzes repository content: Detects the main programming language used in the repository Identifies existing documentation language (README, comments) Considers regional coding patterns and naming conventions ## UI Language Configuration ### Manual Language Selection Users can manually override automatic detection through the language selector: ```tsx Language Selector Component import { useLanguage } from '@/contexts/LanguageContext'; function LanguageSelector() { const { language, setLanguage, supportedLanguages } = useLanguage(); // Rendering sketch; supportedLanguages is assumed to be an array of codes return ( <select value={language} onChange={(e) => setLanguage(e.target.value)}> {supportedLanguages.map((lang) => ( <option key={lang} value={lang}>{lang}</option> ))} </select> ); } ``` ```javascript Storage Persistence // Language preference is automatically stored localStorage.setItem('language', 'ja'); // Persists across sessions document.documentElement.lang = 'ja'; // Updates HTML lang attribute ``` ### Language Context Integration ```tsx import { useLanguage } from '@/contexts/LanguageContext'; function MyComponent() { const { messages, language } = useLanguage(); return (

<div> <h1>{messages.common.appName}</h1> <p>Current language: {language}</p> </div> ); } ```
```typescript import { getRequestConfig } from 'next-intl/server'; export default getRequestConfig(async ({ locale }) => { return { locale: locale, messages: (await import(`./messages/${locale}.json`)).default }; }); ```
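Putting the detection priority and fallback tables from the previous section together, the resolution logic can be sketched as follows (Python; `SUPPORTED` mirrors the language codes listed above — note DeepWiki uses `kr` rather than ISO `ko` for Korean — and the function names are illustrative, not DeepWiki's actual API):

```python
# Sketch of DeepWiki's resolution order: stored preference ->
# browser language -> repository language -> system default.
SUPPORTED = {"en", "zh", "zh-tw", "ja", "kr", "es", "vi", "pt-br", "fr", "ru"}

# Regional fallbacks from the examples above.
FALLBACKS = {"pt-pt": "pt-br", "zh-hk": "zh-tw"}

def normalize(tag):
    """Map a BCP 47 tag such as 'ja-JP' onto a supported code, or ''."""
    tag = tag.lower()
    if tag in SUPPORTED:
        return tag
    if tag in FALLBACKS:
        return FALLBACKS[tag]
    base = tag.split("-")[0]
    return base if base in SUPPORTED else ""

def resolve(stored=None, browser=None, repo=None, default="en"):
    """Return the first candidate that normalizes to a supported code."""
    for candidate in (stored, browser, repo):
        if candidate and normalize(candidate):
            return normalize(candidate)
    return default
```

For example, `resolve(browser="pt-PT")` falls back to `"pt-br"`, while `resolve(browser="zh-TW")` matches the supported `"zh-tw"` code directly.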
## API Language Configuration ### Request Headers Configure language preferences through API requests: ```bash curl -X POST https://api.deepwiki.ai/generate-wiki \ -H "Accept-Language: ja,en;q=0.9" \ -H "Content-Type: application/json" \ -d '{ "repo_url": "owner/repo", "language": "ja", "wiki_language": "ja" }' ``` ### Configuration Files Set default languages in your configuration: ```json lang.json { "supported_languages": { "en": "English", "ja": "Japanese (日本語)", "zh": "Mandarin Chinese (中文)", "zh-tw": "Traditional Chinese (繁體中文)", "es": "Spanish (Español)", "kr": "Korean (한국어)", "vi": "Vietnamese (Tiếng Việt)", "pt-br": "Brazilian Portuguese (Português Brasileiro)", "fr": "Français (French)", "ru": "Русский (Russian)" }, "default": "en" } ``` ```yaml Environment Variables # .env.local DEEPWIKI_DEFAULT_LANGUAGE=en DEEPWIKI_UI_LANGUAGE=auto DEEPWIKI_WIKI_LANGUAGE=auto DEEPWIKI_AI_LANGUAGE_CONTEXT=true ``` ## AI Model Language Context ### Language-Aware Generation DeepWiki's AI models understand linguistic context and generate appropriate documentation: ```markdown # User Authentication System This module provides secure user authentication with JWT tokens. 
## Features - Password hashing with bcrypt - JWT token generation and validation - Role-based access control (RBAC) ``` ```markdown # ユーザー認証システム このモジュールはJWTトークンを使用した安全なユーザー認証を提供します。 ## 機能 - bcryptによるパスワードハッシュ化 - JWTトークンの生成と検証 - ロールベースアクセス制御(RBAC) ``` ```markdown # 用户身份验证系统 该模块提供基于JWT令牌的安全用户身份验证。 ## 功能特性 - 使用bcrypt进行密码哈希加密 - JWT令牌生成和验证 - 基于角色的访问控制(RBAC) ``` ### Model Configuration Configure AI models for multilingual responses: ```json { "generator": { "language_context": true, "cultural_adaptation": true, "preserve_technical_terms": true, "localization_depth": "comprehensive" }, "embedder": { "multilingual_embeddings": true, "cross_language_similarity": true } } ``` ## Multi-Language Repository Handling ### Repository Analysis DeepWiki intelligently handles repositories with multiple languages: Identifies the main programming language and documentation language Recognizes additional languages and their contexts Appropriately processes files with mixed language content ### Documentation Strategy Generate single documentation in the dominant language with technical terms preserved Create separate documentation versions for major languages in the repository ### Code Examples ```typescript // English-dominant repository with Japanese comments class UserService { /** * ユーザーを作成します * Creates a new user account */ async createUser(userData: UserData): Promise { // データバリデーション (Data validation) const validatedData = this.validateUserData(userData); // ユーザー保存 (Save user) return await this.userRepository.save(validatedData); } } ``` Generated documentation preserves both contexts: ```markdown ## UserService クラス `UserService` class provides comprehensive user management functionality with bilingual support. ### createUser メソッド Creates a new user account with data validation. 
**Parameters:**

- `userData: UserData` - User information to be processed

**Returns:**

- `Promise<User>` - Created user object

**Implementation Notes:**

- データバリデーション (Data validation) ensures input integrity
- ユーザー保存 (User saving) persists data to repository
```

## Cultural Considerations

### Regional Preferences

DeepWiki adapts to regional documentation preferences:

```markdown
# Quick Start Guide

Get started with DeepWiki in 3 easy steps:

1. Clone the repository
2. Install dependencies
3. Run the application

## Prerequisites

- Node.js 18+
- Git
```

```markdown
# クイックスタートガイド

DeepWikiを3つの簡単なステップで始めましょう:

【手順1】リポジトリをクローンする
【手順2】依存関係をインストールする
【手順3】アプリケーションを実行する

## 前提条件

- Node.js 18以上
- Git
```

### Technical Term Handling

Keep technical terms in the original language. **Example**: `useState` remains `useState` in all languages.

Provide local explanations for complex terms. **Example**: "JWT (JSON Web Token / JSONウェブトークン)".

## Best Practices for International Teams

### 1. Language Strategy

- Choose a primary language for technical documentation (usually English)
- Decide which content needs full localization vs. technical preservation
- Establish terminology guidelines for mixed-language projects

### 2. Repository Organization

```
docs/
├── en/          # English documentation
├── ja/          # Japanese documentation
├── zh/          # Chinese documentation
└── shared/      # Language-neutral resources
    ├── diagrams/
    └── code-samples/
```

### 3. Development Guidelines

```typescript
// Use primary language for code comments
// 主要言語でコードコメントを記述

/**
 * User authentication service
 * ユーザー認証サービス
 * @param credentials - Login credentials / ログイン資格情報
 */
```

```yaml
# Generate API docs in multiple languages
openapi: 3.0.0
info:
  title:
    en: "DeepWiki API"
    ja: "DeepWiki API"
    zh: "DeepWiki API"
  description:
    en: "AI-powered documentation generation API"
    ja: "AI駆動ドキュメント生成API"
    zh: "AI驱动的文档生成API"
```

### 4.
Quality Assurance Have native speakers review translated documentation for accuracy and cultural appropriateness Ensure technical terms and concepts remain accurate across all languages Use automated tools to check consistency between language versions Test documentation with users from different linguistic backgrounds ## Advanced Configuration ### Custom Language Support Extend DeepWiki with additional languages: ```json Add New Language { "supported_languages": { "de": "Deutsch (German)", "it": "Italiano (Italian)", "nl": "Nederlands (Dutch)" } } ``` ```javascript Message Files // messages/de.json { "common": { "appName": "DeepWiki-Open", "generateWiki": "Wiki Generieren", "loading": "Laden..." } } ``` ### Language-Specific Features ```json { "language_features": { "ja": { "ruby_annotations": true, "vertical_text_support": true }, "zh": { "traditional_characters": true, "pinyin_support": true }, "ar": { "rtl_support": true, "arabic_numerals": true } } } ``` ## Troubleshooting ### Common Issues **Solution**: Check browser language settings and clear localStorage: ```javascript localStorage.removeItem('language'); location.reload(); ``` **Cause**: Repository contains multiple languages without clear primary language. **Solution**: Explicitly set the wiki language in generation settings. 
**Check**: * API language header configuration * Model language context settings * Repository language detection accuracy **Solution**: Add missing keys to message files: ```json // messages/[lang].json { "newFeature": { "title": "Translated Title", "description": "Translated Description" } } ``` ### Debug Mode Enable language debugging for troubleshooting: ```javascript // Enable in browser console localStorage.setItem('deepwiki_debug_i18n', 'true'); // View language detection logs console.log('Browser language:', navigator.language); console.log('Detected language:', detectedLanguage); console.log('Available languages:', supportedLanguages); ``` ## Migration Guide ### Updating Language Support When upgrading DeepWiki versions: ```bash cp api/config/lang.json api/config/lang.json.backup ``` Merge new language options with existing preferences Add new translation keys to all language files Verify all languages work correctly after update Always test language functionality after updates, as new features may require additional translations. ## API Reference ### Language Configuration Endpoints ```bash # Get supported languages GET /api/lang/config # Set user language preference POST /api/user/language { "language": "ja", "persist": true } # Generate wiki with specific language POST /api/wiki/generate { "repo_url": "owner/repo", "language": "zh", "wiki_language": "zh" } ``` ### Language Detection API ```bash # Detect repository language POST /api/detect/language { "repo_url": "owner/repo" } # Response { "primary_language": "JavaScript", "documentation_language": "en", "suggested_wiki_language": "en", "confidence": 0.95 } ``` DeepWiki's internationalization system ensures that teams around the world can generate high-quality documentation in their preferred languages while maintaining technical accuracy and cultural appropriateness. 
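The `Accept-Language` header used in the API examples above encodes preference order through q-values (a missing q defaults to 1.0). A minimal parser sketch, illustrative only and not part of the DeepWiki API:

```python
def parse_accept_language(header: str) -> list[tuple[str, float]]:
    """Parse an Accept-Language header into (tag, quality) pairs, best first.

    Simplified sketch: handles the compact 'tag;q=0.9' form used in this
    guide's examples; a production parser would also tolerate whitespace
    around ';' and '='.
    """
    entries = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        if ";q=" in part:
            tag, q = part.split(";q=", 1)
            try:
                quality = float(q)
            except ValueError:
                quality = 0.0
        else:
            tag, quality = part, 1.0   # missing q defaults to 1.0
        entries.append((tag.strip(), quality))
    # Highest preference first (sort is stable for ties)
    return sorted(entries, key=lambda e: e[1], reverse=True)

print(parse_accept_language("ja,en;q=0.9"))  # [('ja', 1.0), ('en', 0.9)]
```

For the header `ja,en;q=0.9` from the earlier curl example, the server would try Japanese first and fall back to English.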
# Manual Setup Guide

Source: https://asyncfunc.mintlify.app/guides/manual-setup

A comprehensive guide for developers who prefer hands-on control over their DeepWiki-Open development environment.

## Prerequisites

Before starting, ensure you have the following installed on your system:

* **Python 3.12+** (Required by pyproject.toml)
* **Node.js 18+** (Required for Next.js)
* **Git** (For repository cloning)
* **Basic terminal/command line knowledge**

## 1. Environment Setup

### 1.1 Python Environment Setup

#### Option A: Using Virtual Environment (Recommended)

```bash
# Create a virtual environment
python -m venv deepwiki-env

# Activate the virtual environment
# On Windows:
deepwiki-env\Scripts\activate
# On macOS/Linux:
source deepwiki-env/bin/activate

# Verify Python version
python --version  # Should be 3.12+
```

#### Option B: Using Conda

```bash
# Create conda environment
conda create -n deepwiki python=3.12
conda activate deepwiki

# Verify installation
python --version
which python  # Should point to conda environment
```

#### Option C: Using pyenv (Advanced)

```bash
# Install Python 3.12 if not available
pyenv install 3.12.0
pyenv local 3.12.0

# Create virtual environment
python -m venv deepwiki-env
source deepwiki-env/bin/activate
```

### 1.2 Node.js and Package Manager Setup

#### Install Node.js

**Option A: Using Node Version Manager (Recommended)**

```bash
# Install nvm (macOS/Linux)
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.0/install.sh | bash
source ~/.bashrc

# Install and use Node.js LTS
nvm install --lts
nvm use --lts
nvm alias default node
```

**Option B: Direct Installation**

Download from [nodejs.org](https://nodejs.org/) or use package managers:

```bash
# macOS with Homebrew
brew install node

# Ubuntu/Debian
curl -fsSL https://deb.nodesource.com/setup_lts.x | sudo -E bash -
sudo apt-get install -y nodejs

# CentOS/RHEL/Fedora
curl -fsSL https://rpm.nodesource.com/setup_lts.x | sudo bash -
sudo yum install -y nodejs
```

####
Choose Package Manager ```bash # npm (comes with Node.js) npm --version # Yarn (optional, faster alternative) npm install -g yarn yarn --version # pnpm (optional, efficient alternative) npm install -g pnpm pnpm --version ``` ## 2. Project Setup ### 2.1 Clone and Initial Setup ```bash # Clone the repository git clone https://github.com/AsyncFuncAI/deepwiki-open.git cd deepwiki-open # Create necessary directories mkdir -p logs mkdir -p ~/.adalflow/{repos,databases,wikicache} ``` ### 2.2 Python Dependencies Installation #### Using pip with requirements.txt ```bash # Ensure virtual environment is activated # Install backend dependencies pip install -r api/requirements.txt # Verify installation pip list | grep fastapi pip list | grep uvicorn ``` #### Using uv (Modern Python Package Manager) ```bash # Install uv if not available pip install uv # Install dependencies using uv uv pip install -r api/requirements.txt # Alternative: Use pyproject.toml uv pip install -e . ``` #### Troubleshooting Python Dependencies ```bash # If you encounter version conflicts pip install --upgrade pip pip install --no-cache-dir -r api/requirements.txt # For Apple Silicon Macs (M1/M2) pip install --no-cache-dir --compile --no-use-pep517 numpy pip install -r api/requirements.txt # For systems with limited resources pip install --no-cache-dir -r api/requirements.txt ``` ### 2.3 Node.js Dependencies Installation ```bash # Using npm npm install # Using yarn yarn install # Using pnpm pnpm install # Verify installation npm list --depth=0 # or ls node_modules/ ``` ## 3. 
Environment Configuration ### 3.1 Environment Variables Setup Create a `.env` file in the project root: ```bash # Create .env file touch .env ``` **Basic Configuration:** ```env # Required API Keys (choose at least one) GOOGLE_API_KEY=your_google_api_key_here OPENAI_API_KEY=your_openai_api_key_here # Optional API Keys OPENROUTER_API_KEY=your_openrouter_api_key_here AZURE_OPENAI_API_KEY=your_azure_openai_api_key_here AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/ AZURE_OPENAI_VERSION=2023-12-01-preview # Ollama Configuration (if using local models) OLLAMA_HOST=http://localhost:11434 # Server Configuration PORT=8001 SERVER_BASE_URL=http://localhost:8001 # Authorization (optional) DEEPWIKI_AUTH_MODE=false DEEPWIKI_AUTH_CODE=your_secret_code_here # Logging Configuration LOG_LEVEL=INFO LOG_FILE_PATH=./api/logs/application.log # Custom Configuration Directory (optional) DEEPWIKI_CONFIG_DIR=./api/config # OpenAI Base URL (for custom endpoints) OPENAI_BASE_URL=https://api.openai.com/v1 ``` **Development Configuration:** ```env # Development-specific settings LOG_LEVEL=DEBUG NODE_ENV=development NEXT_PUBLIC_API_URL=http://localhost:8001 ``` **Production Configuration:** ```env # Production-specific settings LOG_LEVEL=WARNING NODE_ENV=production NEXT_PUBLIC_API_URL=https://your-domain.com/api ``` ### 3.2 API Key Acquisition #### Google AI Studio 1. Visit [Google AI Studio](https://makersuite.google.com/app/apikey) 2. Create a new project or select existing 3. Generate API key 4. Copy to `GOOGLE_API_KEY` in `.env` #### OpenAI Platform 1. Visit [OpenAI Platform](https://platform.openai.com/api-keys) 2. Create account and add billing information 3. Generate new secret key 4. Copy to `OPENAI_API_KEY` in `.env` #### OpenRouter 1. Visit [OpenRouter](https://openrouter.ai/) 2. Sign up and add credits 3. Generate API key from dashboard 4. Copy to `OPENROUTER_API_KEY` in `.env` #### Azure OpenAI 1. Go to [Azure Portal](https://portal.azure.com/) 2. 
Create Azure OpenAI resource 3. Get keys and endpoint from resource 4. Configure all three Azure variables in `.env` ## 4. Database and Storage Setup ### 4.1 Local Storage Directories DeepWiki-Open uses local file storage. Create required directories: ```bash # Create storage directories mkdir -p ~/.adalflow/repos # Cloned repositories mkdir -p ~/.adalflow/databases # Vector embeddings mkdir -p ~/.adalflow/wikicache # Generated wikis mkdir -p ./api/logs # Application logs # Set appropriate permissions chmod 755 ~/.adalflow chmod 755 ~/.adalflow/repos chmod 755 ~/.adalflow/databases chmod 755 ~/.adalflow/wikicache chmod 755 ./api/logs ``` ### 4.2 FAISS Vector Database DeepWiki uses FAISS for vector storage (included in requirements): ```bash # Verify FAISS installation python -c "import faiss; print('FAISS version:', faiss.__version__)" # For GPU acceleration (optional) pip install faiss-gpu # Only if you have CUDA ``` ### 4.3 Storage Configuration Edit `api/config/embedder.json` to customize storage settings: ```json { "embedder": { "model": "text-embedding-ada-002", "provider": "openai" }, "retriever": { "similarity_top_k": 5, "vector_store_type": "faiss" }, "text_splitter": { "type": "recursive_character", "chunk_size": 1000, "chunk_overlap": 200 } } ``` ## 5. Service Configuration ### 5.1 Backend API Configuration #### FastAPI Server Settings Create `api/config/server.json`: ```json { "host": "0.0.0.0", "port": 8001, "reload": true, "workers": 1, "log_config": { "version": 1, "disable_existing_loggers": false, "formatters": { "default": { "format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s" } }, "handlers": { "default": { "formatter": "default", "class": "logging.StreamHandler", "stream": "ext://sys.stdout" } }, "root": { "level": "INFO", "handlers": ["default"] } } } ``` #### CORS Configuration The API allows all origins by default. 
For production, modify `api/api.py`: ```python from fastapi.middleware.cors import CORSMiddleware app.add_middleware( CORSMiddleware, allow_origins=["http://localhost:3000", "https://yourdomain.com"], allow_credentials=True, allow_methods=["*"], allow_headers=["*"], ) ``` ### 5.2 Frontend Configuration #### Next.js Configuration Edit `next.config.ts`: ```typescript import type { NextConfig } from 'next'; const nextConfig: NextConfig = { env: { NEXT_PUBLIC_API_URL: process.env.NEXT_PUBLIC_API_URL || 'http://localhost:8001', }, async rewrites() { return [ { source: '/api/:path*', destination: `${process.env.NEXT_PUBLIC_API_URL}/api/:path*`, }, ]; }, }; export default nextConfig; ``` #### Internationalization Setup Configure supported languages in `src/i18n.ts`: ```typescript import {notFound} from 'next/navigation'; import {getRequestConfig} from 'next-intl/server'; export const locales = ['en', 'zh', 'ja', 'es', 'fr', 'ko', 'vi', 'pt-br', 'ru', 'zh-tw']; export default getRequestConfig(async ({locale}) => { if (!locales.includes(locale as any)) notFound(); return { messages: (await import(`./messages/${locale}.json`)).default }; }); ``` ## 6. 
Development vs Production Configurations ### 6.1 Development Configuration **Backend Development:** ```bash # Install development dependencies pip install -r api/requirements.txt pip install pytest black flake8 mypy # Additional dev tools # Run in development mode cd api python -m uvicorn main:app --reload --port 8001 --log-level debug ``` **Frontend Development:** ```bash # Enable development features export NODE_ENV=development export NEXT_PUBLIC_API_URL=http://localhost:8001 # Run development server npm run dev # or yarn dev ``` **Development `.env`:** ```env NODE_ENV=development LOG_LEVEL=DEBUG NEXT_PUBLIC_API_URL=http://localhost:8001 DEEPWIKI_AUTH_MODE=false ``` ### 6.2 Production Configuration **Backend Production:** ```bash # Install production server pip install gunicorn # Create gunicorn configuration touch gunicorn.conf.py ``` `gunicorn.conf.py`: ```python import multiprocessing bind = "0.0.0.0:8001" workers = multiprocessing.cpu_count() * 2 + 1 worker_class = "uvicorn.workers.UvicornWorker" worker_connections = 1000 max_requests = 10000 max_requests_jitter = 1000 timeout = 300 keepalive = 5 preload_app = True ``` **Frontend Production:** ```bash # Build for production npm run build # Start production server npm start ``` **Production `.env`:** ```env NODE_ENV=production LOG_LEVEL=WARNING NEXT_PUBLIC_API_URL=https://your-domain.com DEEPWIKI_AUTH_MODE=true DEEPWIKI_AUTH_CODE=your-secure-code ``` ## 7. 
Process Management ### 7.1 Using PM2 (Recommended) #### Install PM2 ```bash npm install -g pm2 ``` #### Create PM2 Configuration Create `ecosystem.config.js`: ```javascript module.exports = { apps: [ { name: 'deepwiki-api', script: 'python', args: '-m uvicorn api.main:app --host 0.0.0.0 --port 8001', cwd: '/path/to/deepwiki-open', interpreter: '/path/to/deepwiki-env/bin/python', env: { NODE_ENV: 'production', LOG_LEVEL: 'INFO' }, instances: 1, autorestart: true, watch: false, max_memory_restart: '2G', error_file: './logs/api-error.log', out_file: './logs/api-out.log', log_file: './logs/api-combined.log' }, { name: 'deepwiki-frontend', script: 'npm', args: 'start', cwd: '/path/to/deepwiki-open', env: { NODE_ENV: 'production', PORT: 3000 }, instances: 1, autorestart: true, watch: false, max_memory_restart: '1G', error_file: './logs/frontend-error.log', out_file: './logs/frontend-out.log', log_file: './logs/frontend-combined.log' } ] }; ``` #### PM2 Commands ```bash # Start services pm2 start ecosystem.config.js # Monitor services pm2 monit # View logs pm2 logs # Restart services pm2 restart all # Stop services pm2 stop all # Save PM2 configuration pm2 save # Setup PM2 to start on boot pm2 startup ``` ### 7.2 Using systemd (Linux) #### Backend Service Create `/etc/systemd/system/deepwiki-api.service`: ```ini [Unit] Description=DeepWiki API Server After=network.target [Service] Type=exec User=yourusername Group=yourusername WorkingDirectory=/path/to/deepwiki-open Environment=PATH=/path/to/deepwiki-env/bin EnvironmentFile=/path/to/deepwiki-open/.env ExecStart=/path/to/deepwiki-env/bin/python -m uvicorn api.main:app --host 0.0.0.0 --port 8001 Restart=always RestartSec=10 [Install] WantedBy=multi-user.target ``` #### Frontend Service Create `/etc/systemd/system/deepwiki-frontend.service`: ```ini [Unit] Description=DeepWiki Frontend Server After=network.target deepwiki-api.service [Service] Type=exec User=yourusername Group=yourusername 
WorkingDirectory=/path/to/deepwiki-open Environment=NODE_ENV=production Environment=PORT=3000 EnvironmentFile=/path/to/deepwiki-open/.env ExecStart=/usr/bin/npm start Restart=always RestartSec=10 [Install] WantedBy=multi-user.target ``` #### systemd Commands ```bash # Reload systemd configuration sudo systemctl daemon-reload # Enable services to start on boot sudo systemctl enable deepwiki-api.service sudo systemctl enable deepwiki-frontend.service # Start services sudo systemctl start deepwiki-api.service sudo systemctl start deepwiki-frontend.service # Check status sudo systemctl status deepwiki-api.service sudo systemctl status deepwiki-frontend.service # View logs sudo journalctl -u deepwiki-api.service -f sudo journalctl -u deepwiki-frontend.service -f ``` ## 8. Monitoring and Logging Setup ### 8.1 Application Logging #### Python Logging Configuration Create `api/logging_config.py`: ```python import logging import logging.handlers import os from pathlib import Path def setup_logging(): log_level = os.getenv('LOG_LEVEL', 'INFO').upper() log_file = os.getenv('LOG_FILE_PATH', './api/logs/application.log') # Create logs directory Path(log_file).parent.mkdir(parents=True, exist_ok=True) # Configure logging logging.basicConfig( level=getattr(logging, log_level), format='%(asctime)s - %(name)s - %(levelname)s - %(message)s', handlers=[ logging.StreamHandler(), logging.handlers.RotatingFileHandler( log_file, maxBytes=10*1024*1024, # 10MB backupCount=5 ) ] ) ``` #### Next.js Logging Create `src/utils/logger.ts`: ```typescript interface LogEntry { timestamp: string; level: 'info' | 'warn' | 'error' | 'debug'; message: string; data?: any; } class Logger { private isDevelopment = process.env.NODE_ENV === 'development'; private log(level: LogEntry['level'], message: string, data?: any) { const entry: LogEntry = { timestamp: new Date().toISOString(), level, message, data }; if (this.isDevelopment) { console[level](entry); } // Send to backend logging endpoint in production 
if (!this.isDevelopment && level === 'error') { this.sendToServer(entry); } } private async sendToServer(entry: LogEntry) { try { await fetch('/api/logs', { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify(entry) }); } catch (error) { console.error('Failed to send log to server:', error); } } info(message: string, data?: any) { this.log('info', message, data); } warn(message: string, data?: any) { this.log('warn', message, data); } error(message: string, data?: any) { this.log('error', message, data); } debug(message: string, data?: any) { this.log('debug', message, data); } } export const logger = new Logger(); ``` ### 8.2 Health Monitoring #### Health Check Endpoint Add to `api/api.py`: ```python @app.get("/health") async def health_check(): return { "status": "healthy", "timestamp": datetime.utcnow().isoformat(), "version": "0.1.0", "services": { "api": "running", "storage": "accessible" if os.path.exists(os.path.expanduser("~/.adalflow")) else "unavailable" } } ``` #### Monitoring Script Create `scripts/monitor.py`: ```python #!/usr/bin/env python3 import requests import time import sys import os def check_service(url, service_name): try: response = requests.get(url, timeout=10) if response.status_code == 200: print(f"✅ {service_name} is healthy") return True else: print(f"❌ {service_name} returned status {response.status_code}") return False except requests.exceptions.RequestException as e: print(f"❌ {service_name} is unreachable: {e}") return False def main(): api_url = os.getenv('SERVER_BASE_URL', 'http://localhost:8001') frontend_url = os.getenv('FRONTEND_URL', 'http://localhost:3000') services = [ (f"{api_url}/health", "API Server"), (frontend_url, "Frontend Server") ] all_healthy = True for url, name in services: if not check_service(url, name): all_healthy = False if not all_healthy: sys.exit(1) print("🎉 All services are healthy!") if __name__ == "__main__": main() ``` ### 8.3 Performance Monitoring #### Simple 
Performance Tracking Create `scripts/performance_monitor.sh`: ```bash #!/bin/bash # Configuration API_URL="http://localhost:8001" LOG_FILE="./logs/performance.log" # Create logs directory mkdir -p logs # Function to log with timestamp log_with_timestamp() { echo "$(date '+%Y-%m-%d %H:%M:%S') $1" >> "$LOG_FILE" } # Monitor API response time monitor_api() { start_time=$(date +%s.%N) response=$(curl -s -w "%{http_code}" -o /dev/null "$API_URL/health") end_time=$(date +%s.%N) response_time=$(echo "$end_time - $start_time" | bc) if [ "$response" = "200" ]; then log_with_timestamp "API_HEALTH_OK response_time=${response_time}s" else log_with_timestamp "API_HEALTH_ERROR http_code=$response" fi } # Monitor system resources monitor_resources() { # CPU usage cpu_usage=$(top -bn1 | grep "Cpu(s)" | awk '{print $2}' | cut -d'%' -f1) # Memory usage memory_usage=$(free | grep Mem | awk '{printf "%.1f", $3/$2 * 100.0}') # Disk usage disk_usage=$(df -h . | tail -1 | awk '{print $5}' | cut -d'%' -f1) log_with_timestamp "RESOURCES cpu=${cpu_usage}% memory=${memory_usage}% disk=${disk_usage}%" } # Main monitoring loop while true; do monitor_api monitor_resources sleep 60 # Monitor every minute done ``` ## 9. Backup and Maintenance ### 9.1 Data Backup Strategy #### Backup Script Create `scripts/backup.sh`: ```bash #!/bin/bash # Configuration BACKUP_DIR="$HOME/deepwiki-backups" DATE=$(date +%Y%m%d_%H%M%S) BACKUP_NAME="deepwiki_backup_$DATE" # Create backup directory mkdir -p "$BACKUP_DIR" # Backup function create_backup() { echo "🔄 Starting backup process..." # Create backup folder BACKUP_PATH="$BACKUP_DIR/$BACKUP_NAME" mkdir -p "$BACKUP_PATH" # Backup configuration echo "📁 Backing up configuration..." cp -r api/config "$BACKUP_PATH/" cp .env "$BACKUP_PATH/" 2>/dev/null || echo "No .env file found" # Backup generated wikis echo "📚 Backing up wiki cache..." 
if [ -d "$HOME/.adalflow/wikicache" ]; then cp -r "$HOME/.adalflow/wikicache" "$BACKUP_PATH/" fi # Backup vector databases echo "🗄️ Backing up databases..." if [ -d "$HOME/.adalflow/databases" ]; then cp -r "$HOME/.adalflow/databases" "$BACKUP_PATH/" fi # Backup logs echo "📊 Backing up logs..." cp -r logs "$BACKUP_PATH/" 2>/dev/null || echo "No logs directory found" # Create archive echo "🗜️ Creating archive..." cd "$BACKUP_DIR" tar -czf "$BACKUP_NAME.tar.gz" "$BACKUP_NAME" rm -rf "$BACKUP_NAME" echo "✅ Backup completed: $BACKUP_DIR/$BACKUP_NAME.tar.gz" # Cleanup old backups (keep last 7 days) find "$BACKUP_DIR" -name "deepwiki_backup_*.tar.gz" -mtime +7 -delete echo "🧹 Cleaned up old backups" } # Restore function restore_backup() { if [ -z "$1" ]; then echo "Usage: $0 restore " exit 1 fi BACKUP_FILE="$1" if [ ! -f "$BACKUP_FILE" ]; then echo "❌ Backup file not found: $BACKUP_FILE" exit 1 fi echo "🔄 Restoring from backup: $BACKUP_FILE" # Extract backup TEMP_DIR=$(mktemp -d) tar -xzf "$BACKUP_FILE" -C "$TEMP_DIR" # Restore configuration echo "📁 Restoring configuration..." cp -r "$TEMP_DIR"/*/config api/ 2>/dev/null || echo "No config backup found" cp "$TEMP_DIR"/*/.env . 2>/dev/null || echo "No .env backup found" # Restore wiki cache echo "📚 Restoring wiki cache..." mkdir -p "$HOME/.adalflow" cp -r "$TEMP_DIR"/*/wikicache "$HOME/.adalflow/" 2>/dev/null || echo "No wikicache backup found" # Restore databases echo "🗄️ Restoring databases..." 
cp -r "$TEMP_DIR"/*/databases "$HOME/.adalflow/" 2>/dev/null || echo "No databases backup found" # Cleanup rm -rf "$TEMP_DIR" echo "✅ Restore completed" } # Main script case "$1" in "backup") create_backup ;; "restore") restore_backup "$2" ;; *) echo "Usage: $0 {backup|restore }" echo "Example: $0 backup" echo "Example: $0 restore ~/deepwiki-backups/deepwiki_backup_20231201_120000.tar.gz" exit 1 ;; esac ``` ### 9.2 Maintenance Tasks #### Database Cleanup Script Create `scripts/maintenance.py`: ```python #!/usr/bin/env python3 import os import shutil import glob from datetime import datetime, timedelta from pathlib import Path def cleanup_old_repositories(days_old=30): """Remove repositories older than specified days""" repos_dir = Path.home() / ".adalflow" / "repos" if not repos_dir.exists(): print("No repositories directory found") return cutoff_date = datetime.now() - timedelta(days=days_old) cleaned_count = 0 for repo_dir in repos_dir.iterdir(): if repo_dir.is_dir(): mod_time = datetime.fromtimestamp(repo_dir.stat().st_mtime) if mod_time < cutoff_date: print(f"Removing old repository: {repo_dir.name}") shutil.rmtree(repo_dir) cleaned_count += 1 print(f"Cleaned up {cleaned_count} old repositories") def cleanup_old_wikis(days_old=30): """Remove wiki cache older than specified days""" wiki_dir = Path.home() / ".adalflow" / "wikicache" if not wiki_dir.exists(): print("No wiki cache directory found") return cutoff_date = datetime.now() - timedelta(days=days_old) cleaned_count = 0 for wiki_file in wiki_dir.glob("*.json"): mod_time = datetime.fromtimestamp(wiki_file.stat().st_mtime) if mod_time < cutoff_date: print(f"Removing old wiki: {wiki_file.name}") wiki_file.unlink() cleaned_count += 1 print(f"Cleaned up {cleaned_count} old wiki files") def cleanup_logs(days_old=7): """Remove log files older than specified days""" logs_dir = Path("logs") if not logs_dir.exists(): print("No logs directory found") return cutoff_date = datetime.now() - timedelta(days=days_old) 
cleaned_count = 0 for log_file in logs_dir.glob("*.log*"): if log_file.is_file(): mod_time = datetime.fromtimestamp(log_file.stat().st_mtime) if mod_time < cutoff_date: print(f"Removing old log: {log_file.name}") log_file.unlink() cleaned_count += 1 print(f"Cleaned up {cleaned_count} old log files") def optimize_vector_databases(): """Optimize vector databases by removing unused indexes""" db_dir = Path.home() / ".adalflow" / "databases" if not db_dir.exists(): print("No databases directory found") return repos_dir = Path.home() / ".adalflow" / "repos" active_repos = set() if repos_dir.exists(): active_repos = {repo.name for repo in repos_dir.iterdir() if repo.is_dir()} cleaned_count = 0 for db_dir_item in db_dir.iterdir(): if db_dir_item.is_dir() and db_dir_item.name not in active_repos: print(f"Removing unused database: {db_dir_item.name}") shutil.rmtree(db_dir_item) cleaned_count += 1 print(f"Cleaned up {cleaned_count} unused databases") def main(): print(f"🧹 Starting maintenance tasks at {datetime.now()}") try: cleanup_old_repositories(30) cleanup_old_wikis(30) cleanup_logs(7) optimize_vector_databases() print("✅ Maintenance tasks completed successfully") except Exception as e: print(f"❌ Error during maintenance: {e}") if __name__ == "__main__": main() ``` #### Automated Maintenance with Cron Add to crontab (`crontab -e`): ```bash # Daily maintenance at 2 AM 0 2 * * * /path/to/deepwiki-open/scripts/maintenance.py >> /path/to/deepwiki-open/logs/maintenance.log 2>&1 # Weekly backup on Sundays at 3 AM 0 3 * * 0 /path/to/deepwiki-open/scripts/backup.sh backup >> /path/to/deepwiki-open/logs/backup.log 2>&1 # Performance monitoring every minute * * * * * /path/to/deepwiki-open/scripts/performance_monitor.sh ``` ## 10. 
Troubleshooting ### 10.1 Common Issues and Solutions #### Python Environment Issues ```bash # Issue: ModuleNotFoundError # Solution: Verify virtual environment activation which python pip list | grep fastapi # Issue: Permission denied # Solution: Check file permissions chmod +x scripts/*.sh chmod +x scripts/*.py # Issue: Port already in use # Solution: Find and kill process lsof -ti:8001 | xargs kill -9 lsof -ti:3000 | xargs kill -9 ``` #### Node.js Issues ```bash # Issue: npm ERR! permission denied # Solution: Use nvm or fix npm permissions npm config set prefix '~/.npm-global' export PATH=~/.npm-global/bin:$PATH # Issue: Module not found # Solution: Clear cache and reinstall rm -rf node_modules package-lock.json npm cache clean --force npm install ``` #### API Connection Issues ```bash # Check if services are running curl -I http://localhost:8001/health curl -I http://localhost:3000 # Check firewall settings # Ubuntu/Debian sudo ufw status sudo ufw allow 8001 sudo ufw allow 3000 # CentOS/RHEL sudo firewall-cmd --list-ports sudo firewall-cmd --add-port=8001/tcp --permanent sudo firewall-cmd --add-port=3000/tcp --permanent sudo firewall-cmd --reload ``` ### 10.2 Performance Optimization #### System Optimization ```bash # Increase file descriptor limits echo "* soft nofile 65536" | sudo tee -a /etc/security/limits.conf echo "* hard nofile 65536" | sudo tee -a /etc/security/limits.conf # Optimize Python performance export PYTHONUNBUFFERED=1 export PYTHONDONTWRITEBYTECODE=1 # Node.js optimization export NODE_OPTIONS="--max-old-space-size=4096" ``` #### Application Optimization Edit `api/main.py` for production optimizations: ```python import uvicorn from api.api import app if __name__ == "__main__": uvicorn.run( "api.api:app", host="0.0.0.0", port=8001, workers=4, # Adjust based on CPU cores loop="uvloop", # Performance improvement http="httptools", # Performance improvement access_log=False, # Disable in production server_header=False, # Security date_header=False, # 
Performance )
```

## 11. Security Considerations

### 11.1 API Security

#### Rate Limiting

Add to `api/api.py`:

```python
from fastapi import Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.util import get_remote_address
from slowapi.errors import RateLimitExceeded

limiter = Limiter(key_func=get_remote_address)
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@app.post("/api/wiki/generate")
@limiter.limit("5/minute")
async def generate_wiki(request: Request, ...):
    # Implementation
    pass
```

#### Input Validation

```python
import re
from typing import Optional

from pydantic import BaseModel, validator

class RepositoryRequest(BaseModel):
    repo_url: str
    access_token: Optional[str] = None

    @validator('repo_url')
    def validate_repo_url(cls, v):
        pattern = r'^https?://(github|gitlab|bitbucket)\.(com|org)/[\w\-\.]+/[\w\-\.]+/?$'
        if not re.match(pattern, v):
            raise ValueError('Invalid repository URL format')
        return v
```

### 11.2 Environment Security

```bash
# Secure .env file
chmod 600 .env

# Use environment-specific configurations
# Development
export DEEPWIKI_ENV=development

# Production
export DEEPWIKI_ENV=production
```

## 12.
Advanced Configuration ### 12.1 Custom Model Configurations Edit `api/config/generator.json`: ```json { "providers": { "google": { "default_model": "gemini-2.0-flash", "models": ["gemini-2.0-flash", "gemini-1.5-flash", "gemini-1.0-pro"], "api_base": "https://generativelanguage.googleapis.com/v1beta", "parameters": { "temperature": 0.7, "top_p": 0.9, "max_tokens": 8192 } }, "openai": { "default_model": "gpt-4o", "models": ["gpt-4o", "gpt-4-turbo", "gpt-3.5-turbo"], "api_base": "https://api.openai.com/v1", "parameters": { "temperature": 0.7, "top_p": 1.0, "max_tokens": 4096 } } } } ``` ### 12.2 Custom Embedding Configuration Edit `api/config/embedder.json`: ```json { "embedder": { "provider": "openai", "model": "text-embedding-ada-002", "dimensions": 1536, "batch_size": 100 }, "retriever": { "similarity_top_k": 5, "similarity_threshold": 0.7, "vector_store_type": "faiss", "index_type": "IndexFlatIP" }, "text_splitter": { "type": "recursive_character", "chunk_size": 1000, "chunk_overlap": 200, "separators": ["\n\n", "\n", " ", ""] } } ``` ## Conclusion This manual setup guide provides comprehensive control over your DeepWiki-Open installation. The manual approach offers: * **Full Control**: Complete visibility into every component and configuration * **Customization**: Ability to modify any aspect of the system * **Debugging**: Direct access to logs and processes for troubleshooting * **Performance Tuning**: Fine-grained control over resource allocation * **Security**: Implementation of custom security measures Choose the components and configurations that best fit your development workflow and production requirements. Regular maintenance and monitoring will ensure optimal performance and reliability of your DeepWiki-Open installation. For additional support, refer to the project's GitHub repository or community forums. 
# Mermaid Diagrams in DeepWiki Source: https://asyncfunc.mintlify.app/guides/mermaid-diagrams Comprehensive guide to creating, customizing, and managing Mermaid diagrams in DeepWiki documentation # Mermaid Diagrams in DeepWiki DeepWiki leverages the power of Mermaid to automatically generate beautiful, interactive diagrams from your codebase and documentation. This guide covers everything you need to know about working with Mermaid diagrams in DeepWiki. ## What are Mermaid Diagrams? Mermaid is a JavaScript-based diagramming tool that renders markdown-inspired text definitions to create diagrams dynamically. In DeepWiki, these diagrams help visualize: * Code architecture and relationships * Data flow and process workflows * System interactions and dependencies * Database schemas and entity relationships * Project timelines and milestones ### Benefits of Mermaid Diagrams * **Version Control Friendly**: Text-based definitions that work with Git * **Automatically Generated**: DeepWiki creates diagrams from your code structure * **Interactive**: Zoom, pan, and click functionality * **Customizable**: Extensive styling and theming options * **Lightweight**: No external image files needed * **Accessible**: Screen reader compatible with proper markup ## Types of Diagrams Generated DeepWiki automatically generates various diagram types based on your codebase analysis: ### 1. Flowcharts Perfect for visualizing code logic, decision trees, and process flows. ```mermaid graph TD A[Start] --> B{Is user authenticated?} B -->|Yes| C[Load Dashboard] B -->|No| D[Show Login Form] D --> E[Validate Credentials] E -->|Valid| F[Create Session] E -->|Invalid| G[Show Error] F --> C G --> D C --> H[End] ``` ### 2. Sequence Diagrams Ideal for API interactions, user flows, and system communications. 
```mermaid
sequenceDiagram
    participant User
    participant Frontend
    participant API
    participant Database
    User->>Frontend: Submit Form
    Frontend->>API: POST /api/users
    API->>Database: INSERT user data
    Database-->>API: Success response
    API-->>Frontend: User created (201)
    Frontend-->>User: Success message
```

### 3. Class Diagrams

Generated from your codebase to show object-oriented relationships.

```mermaid
classDiagram
    class User {
        -String id
        -String email
        -String password
        +login()
        +logout()
        +updateProfile()
    }
    class Admin {
        -List~Permission~ permissions
        +manageUsers()
        +viewAnalytics()
    }
    class Article {
        -String title
        -String content
        -Date publishedAt
        +publish()
        +archive()
    }
    User <|-- Admin
    User "1" --> "*" Article : creates
```

### 4. Entity Relationship Diagrams

Database schema visualization from your models and migrations.

```mermaid
erDiagram
    USERS {
        uuid id PK
        string email UK
        string password_hash
        timestamp created_at
        timestamp updated_at
    }
    ARTICLES {
        uuid id PK
        uuid author_id FK
        string title
        text content
        enum status
        timestamp published_at
        timestamp created_at
        timestamp updated_at
    }
    COMMENTS {
        uuid id PK
        uuid article_id FK
        uuid user_id FK
        text content
        timestamp created_at
    }
    USERS ||--o{ ARTICLES : writes
    ARTICLES ||--o{ COMMENTS : has
    USERS ||--o{ COMMENTS : writes
```

### 5. State Diagrams

System state transitions and workflow states.

```mermaid
stateDiagram-v2
    [*] --> Draft
    Draft --> Review : submit
    Review --> Approved : approve
    Review --> Rejected : reject
    Review --> Draft : request_changes
    Approved --> Published : publish
    Published --> Archived : archive
    Rejected --> Draft : revise
    Archived --> [*]
```

### 6. Git Graphs

Version control branch visualization.

```mermaid
gitGraph
    commit
    commit
    branch feature
    checkout feature
    commit
    commit
    checkout main
    commit
    merge feature
    commit
    branch hotfix
    checkout hotfix
    commit
    checkout main
    merge hotfix
```

### 7. Gantt Charts

Project timelines and development schedules.
```mermaid gantt title DeepWiki Development Timeline dateFormat YYYY-MM-DD section Phase 1 Research & Planning :a1, 2024-01-01, 30d Design Architecture :a2, after a1, 20d section Phase 2 Core Development :b1, after a2, 45d Testing & QA :b2, after b1, 15d section Phase 3 Documentation :c1, after b2, 10d Deployment :c2, after c1, 5d ``` ## Automatic Diagram Generation DeepWiki's AI analyzes your codebase and automatically generates appropriate diagrams: ### Code Analysis Process 1. **File Structure Analysis**: Scans directories, files, and imports 2. **Dependency Mapping**: Identifies relationships between modules 3. **Data Flow Detection**: Traces function calls and data transformations 4. **Pattern Recognition**: Identifies common architectural patterns 5. **Diagram Selection**: Chooses optimal diagram types for each context ### Automatic Generation Features * **Smart Context Detection**: Different diagrams for different code contexts * **Relationship Inference**: Automatically detects class inheritance and composition * **API Flow Mapping**: Generates sequence diagrams from API endpoint analysis * **Database Schema Extraction**: Creates ERDs from model definitions * **Workflow Recognition**: Identifies state machines and business processes ## Interactive Features DeepWiki's Mermaid diagrams are fully interactive: ### Navigation Controls * **Zoom**: Mouse wheel or pinch gestures * **Pan**: Click and drag to move around large diagrams * **Reset**: Double-click to return to original view * **Full Screen**: Expand diagrams for detailed viewing ### Click Interactions ```mermaid graph LR A[Component A] --> B[Component B] B --> C[Component C] C --> D[Component D] click A "https://github.com/user/repo/blob/main/src/ComponentA.js" click B "https://github.com/user/repo/blob/main/src/ComponentB.js" click C "https://github.com/user/repo/blob/main/src/ComponentC.js" click D "https://github.com/user/repo/blob/main/src/ComponentD.js" ``` * **Source Code Links**: Click nodes to 
jump to corresponding code
* **Documentation Links**: Navigate to related documentation sections
* **External Resources**: Link to external APIs, services, or documentation

## Diagram Customization

### Theme Configuration

DeepWiki supports multiple built-in themes and custom styling:

```json
{
  "theme": "dark",
  "themeVariables": {
    "primaryColor": "#ff6b6b",
    "primaryTextColor": "#ffffff",
    "primaryBorderColor": "#ff4757",
    "lineColor": "#70a1ff",
    "sectionBkgColor": "#2f3542",
    "altSectionBkgColor": "#57606f",
    "gridColor": "#3d4454",
    "secondaryColor": "#7bed9f",
    "tertiaryColor": "#5352ed"
  }
}
```

### Style Customization

Apply custom CSS classes to diagram elements:

```mermaid
graph TD
    A[Normal Node]
    B[Highlighted Node]:::highlight
    C[Warning Node]:::warning
    D[Success Node]:::success

    classDef highlight fill:#ff6b6b,stroke:#ff4757,color:#fff
    classDef warning fill:#ffa502,stroke:#ff8c00,color:#fff
    classDef success fill:#2ed573,stroke:#26de81,color:#fff
```

### Advanced Styling Options

* **Custom CSS Classes**: Define reusable styles
* **Node Shapes**: Choose from various shapes (rectangles, circles, diamonds)
* **Edge Styles**: Customize line types (solid, dashed, dotted)
* **Color Schemes**: Brand-specific color palettes
* **Font Customization**: Typography and sizing options

## Troubleshooting Diagram Rendering

### Common Issues and Solutions

#### 1. Diagram Not Rendering

**Symptoms**: Blank space where diagram should appear

**Solutions**:
* Check Mermaid syntax for errors
* Verify JavaScript is enabled in browser
* Clear browser cache and refresh
* Check browser console for error messages

#### 2. Syntax Errors

**Symptoms**: Error messages in diagram area

**Common fixes** (note that Mermaid comments use `%%`, not `//`):

```mermaid
%% ❌ Incorrect: A -> B (wrong arrow syntax)
%% ✅ Correct:
graph TD
    A --> B
```

#### 3.
Performance Issues with Large Diagrams

**Symptoms**: Slow loading or browser freezing

**Solutions**:
* Break large diagrams into smaller sections
* Use subgraphs to organize complex diagrams
* Implement lazy loading for diagram-heavy pages
* Consider static image generation for very complex diagrams

#### 4. Mobile Rendering Issues

**Symptoms**: Diagrams too small or not responsive

**Solutions**:
* Use responsive container sizing
* Implement touch-friendly zoom controls
* Consider simplified mobile versions
* Test on various device sizes

### Debug Mode

Enable debug mode for detailed error information:

```javascript
// Enable Mermaid debug mode
mermaid.initialize({
  startOnLoad: true,
  logLevel: 'debug',
  securityLevel: 'loose'
});
```

## Manual Diagram Editing

### Inline Diagram Creation

Add custom diagrams directly in your documentation:

````markdown
## System Architecture

```mermaid
graph TB
    subgraph "Frontend"
        UI[React UI]
        Store[Redux Store]
    end
    subgraph "Backend"
        API[Express API]
        Auth[Auth Service]
        DB[(Database)]
    end
    UI --> Store
    Store --> API
    API --> Auth
    API --> DB
```
````

### Dynamic Diagram Generation

Use DeepWiki's API to generate diagrams programmatically:

```javascript
// Generate diagram from code analysis
const diagram = await deepwiki.generateDiagram({
  type: 'flowchart',
  source: 'src/components/',
  options: {
    theme: 'dark',
    direction: 'TD'
  }
});
```

### Diagram Templates

Create reusable diagram templates:

```yaml
# diagram-templates.yaml
api_flow:
  type: sequence
  participants:
    - Client
    - API Gateway
    - Microservice
    - Database
  interactions:
    - Client->API Gateway: Request
    - API Gateway->Microservice: Forward
    - Microservice->Database: Query
    - Database-->Microservice: Result
    - Microservice-->API Gateway: Response
    - API Gateway-->Client: Response
```

## Exporting Diagrams

### SVG Export

Export diagrams as scalable SVG files:

```javascript
// Export current diagram as SVG
const { svg: svgData } = await mermaid.render('diagram-id', diagramDefinition);
const svgBlob = new
Blob([svgData], {type: 'image/svg+xml'}); const downloadUrl = URL.createObjectURL(svgBlob); ``` ### PNG/JPEG Export Convert to raster formats for presentations: ```javascript // Convert SVG to PNG using canvas const canvas = document.createElement('canvas'); const ctx = canvas.getContext('2d'); const img = new Image(); img.onload = () => { canvas.width = img.width; canvas.height = img.height; ctx.drawImage(img, 0, 0); const pngUrl = canvas.toDataURL('image/png'); }; img.src = 'data:image/svg+xml;base64,' + btoa(svgData); ``` ### PDF Integration Include diagrams in PDF documentation: ```javascript // Generate PDF with embedded diagrams const pdf = new jsPDF(); const diagramSvg = await mermaid.render('diagram', definition); pdf.addSVG(diagramSvg, 10, 10, 190, 100); pdf.save('documentation.pdf'); ``` ## Best Practices for Diagram-Heavy Documentation ### 1. Strategic Diagram Placement * **Overview First**: Start with high-level architecture diagrams * **Progressive Detail**: Follow with detailed component diagrams * **Context Switching**: Use diagrams to transition between topics * **Reference Points**: Include diagrams as visual anchors ### 2. Diagram Organization ```markdown ## Architecture Overview ### Frontend Architecture ### Backend Services ### Data Flow ### Database Schema ``` ### 3. Accessibility Considerations * **Alt Text**: Provide descriptive text alternatives * **High Contrast**: Ensure sufficient color contrast * **Keyboard Navigation**: Support keyboard-only users * **Screen Reader Support**: Include semantic markup ```mermaid graph TD A[Start Process] B{Decision Point} C[Path A] D[Path B] E[End Process] A --> B B -->|Condition 1| C B -->|Condition 2| D C --> E D --> E %% Accessibility improvements A -.- AA["`**Start**: Initialize the process with user input`"] B -.- BB["`**Decision**: Evaluate user permissions and route accordingly`"] E -.- EE["`**End**: Complete process and notify user`"] ``` ### 4. 
Performance Optimization * **Lazy Loading**: Load diagrams on demand * **Caching**: Cache generated diagrams * **Compression**: Optimize SVG output * **Progressive Enhancement**: Provide fallback text descriptions ### 5. Version Control Best Practices * **Atomic Commits**: Include diagram updates with related code changes * **Meaningful Names**: Use descriptive diagram IDs and titles * **Documentation**: Comment complex diagram definitions * **Review Process**: Include diagrams in code review workflows ## Advanced Examples ### Complex System Architecture ```mermaid graph TB subgraph "Client Layer" Web[Web App] Mobile[Mobile App] API_GW[API Gateway] end subgraph "Service Layer" Auth[Auth Service] User[User Service] Content[Content Service] Analytics[Analytics Service] end subgraph "Data Layer" UserDB[(User DB)] ContentDB[(Content DB)] Cache[(Redis Cache)] Queue[(Message Queue)] end subgraph "External Services" CDN[CDN] Email[Email Service] Payment[Payment Gateway] end Web --> API_GW Mobile --> API_GW API_GW --> Auth API_GW --> User API_GW --> Content API_GW --> Analytics Auth --> UserDB User --> UserDB Content --> ContentDB Analytics --> Queue User --> Cache Content --> Cache Content --> CDN User --> Email User --> Payment style Auth fill:#ff9999 style User fill:#99ff99 style Content fill:#9999ff style Analytics fill:#ffff99 ``` ### Database Relationship Diagram ```mermaid erDiagram ORGANIZATION { uuid id PK string name string slug UK text description timestamp created_at timestamp updated_at } USERS { uuid id PK uuid organization_id FK string email UK string name enum role timestamp last_login timestamp created_at timestamp updated_at } PROJECTS { uuid id PK uuid organization_id FK uuid owner_id FK string name string repository_url enum status json config timestamp last_processed timestamp created_at timestamp updated_at } WIKI_PAGES { uuid id PK uuid project_id FK string path string title text content text summary json metadata timestamp generated_at timestamp 
updated_at } DIAGRAMS { uuid id PK uuid page_id FK string type text definition json config string svg_data timestamp generated_at } ORGANIZATION ||--o{ USERS : contains ORGANIZATION ||--o{ PROJECTS : owns USERS ||--o{ PROJECTS : manages PROJECTS ||--o{ WIKI_PAGES : contains WIKI_PAGES ||--o{ DIAGRAMS : includes ``` ### API Interaction Flow ```mermaid sequenceDiagram participant Client participant Gateway participant Auth participant Wiki participant AI participant Storage Note over Client,Storage: Wiki Generation Request Client->>Gateway: POST /api/wiki/generate Gateway->>Auth: Validate token Auth-->>Gateway: User info Gateway->>Wiki: Process repository Wiki->>Storage: Fetch repository data Storage-->>Wiki: Repository files Wiki->>AI: Analyze codebase AI-->>Wiki: Structure analysis Wiki->>AI: Generate content AI-->>Wiki: Wiki pages Wiki->>Storage: Save pages Wiki->>Wiki: Generate diagrams Wiki-->>Gateway: Generation complete Gateway-->>Client: Success response Note over Client,Storage: Real-time Updates loop Real-time updates Wiki->>Client: WebSocket update Client->>Client: Update UI end ``` ## Conclusion Mermaid diagrams in DeepWiki provide a powerful way to visualize and communicate complex software architectures, processes, and relationships. By leveraging automatic generation, interactive features, and extensive customization options, you can create documentation that is both informative and engaging. Remember to: * Use appropriate diagram types for different contexts * Optimize for performance with large documentation sites * Consider accessibility in diagram design * Maintain diagrams as living documentation alongside your code * Leverage automation while retaining the flexibility for manual customization With these tools and techniques, you can create documentation that truly brings your codebase to life through visual storytelling. 
# Private Repository Access Source: https://asyncfunc.mintlify.app/guides/private-repositories Complete guide to accessing private repositories on GitHub, GitLab, and BitBucket with personal access tokens, security best practices, and troubleshooting DeepWiki-Open can access private repositories across multiple platforms using personal access tokens. This comprehensive guide covers token creation, management, security best practices, and troubleshooting for GitHub, GitLab, and BitBucket. ## Overview Private repository access requires authentication tokens that prove your authorization to access the repository content. DeepWiki supports multiple platforms and token types with different permission models. Classic and fine-grained personal access tokens with repository scope Personal access tokens with read\_repository scope App passwords with repository read permissions Organization and enterprise-specific considerations ## GitHub Access Tokens GitHub offers two types of personal access tokens with different scopes and capabilities. ### Classic Personal Access Tokens 1. Go to GitHub.com and sign in 2. Click your profile picture → **Settings** 3. In the left sidebar, click **Developer settings** 4. Click **Personal access tokens** → **Tokens (classic)** Classic tokens provide broad access but are easier to set up for multiple repositories. 1. Click **Generate new token** → **Generate new token (classic)** 2. Enter a descriptive note (e.g., "DeepWiki Documentation Access") 3. Set expiration (recommended: 90 days for security) 4. 
Select scopes based on your needs: **Required scope:** `repo` * Full control of private repositories * Includes read/write access to code, issues, pull requests * **Use when:** Accessing private repositories you own or have been granted access to **Required scope:** `public_repo` * Access to public repositories only * More limited permissions * **Use when:** Only accessing public repositories (DeepWiki works without tokens for public repos) **Additional scope:** `read:org` * Read organization membership * Required for some organization private repositories * **Use when:** Accessing private repositories owned by organizations 1. Click **Generate token** 2. **Important:** Copy the token immediately - you won't see it again 3. Store securely (see security best practices below) GitHub classic tokens start with `ghp_` and are 40 characters long. Never share or commit tokens to code repositories. ### Fine-Grained Personal Access Tokens (Beta) For more granular control over repository access: 1. In **Developer settings** → **Personal access tokens** 2. Click **Fine-grained tokens** → **Generate new token** 3. 
Configure token details: * **Token name:** Descriptive name * **Expiration:** 90 days recommended * **Resource owner:** Select your account or organization Choose repository access level: **Best for:** Specific repositories * Click "Selected repositories" * Choose specific repositories from dropdown * More secure, limited scope **Permissions needed:** * **Repository permissions:** Contents (Read) * **Metadata:** Read **Best for:** Multiple repositories * Select "All repositories" under resource owner * Grants access to all current and future repositories * Less secure but more convenient **Permissions needed:** * **Repository permissions:** Contents (Read) * **Metadata:** Read Set minimum required permissions: ```json { "Contents": "Read", "Metadata": "Read", "Pull requests": "Read", "Issues": "Read" } ``` Fine-grained tokens provide better security through specific repository and permission selection. ### GitHub Enterprise For GitHub Enterprise Server instances: **Token creation process:** 1. Navigate to your GitHub Enterprise instance 2. Follow same steps as GitHub.com 3. Ensure your DeepWiki instance can reach enterprise server **Additional considerations:** * Network connectivity requirements * Certificate trust for self-signed certificates * Custom API endpoints configuration ```javascript { "repo_url": "https://github.enterprise.com/company/private-repo", "access_token": "ghp_xxxxxxxxxxxxxxxxxxxx", "enterprise_base_url": "https://github.enterprise.com" } ``` **With SAML SSO enabled:** 1. Create personal access token normally 2. Authorize token for SSO: * Go to token settings * Click "Configure SSO" next to your organization * Click "Authorize" **Enterprise considerations:** * Organization policies may restrict token creation * Admin approval may be required * Token expiration policies may be enforced ## GitLab Access Tokens GitLab uses personal access tokens with specific scope-based permissions. ### Personal Access Token Creation 1. 
Sign in to GitLab.com or your GitLab instance 2. Click your avatar → **Edit profile** 3. In the left sidebar, click **Access Tokens** GitLab tokens are more granular than GitHub classic tokens, allowing precise permission control. 1. Click **Add new token** 2. Configure token settings: * **Token name:** Descriptive name (e.g., "DeepWiki Access") * **Expiration date:** Set appropriate expiration * **Select scopes:** Choose required permissions **For private repository access:** * ✅ `read_repository` - Clone and pull from repositories * ⚠️ `read_user` - Read user information (optional) * ⚠️ `read_api` - Read API access (if using API features) **Avoid unnecessary scopes:** * ❌ `write_repository` - Not needed for documentation * ❌ `api` - Full API access (too broad) **For group projects:** * Add `read_repository` scope * Ensure you're a group member * Check group access policies **For specific projects:** * Token automatically inherits project access * No additional configuration needed * Respects project visibility settings 1. Click **Create personal access token** 2. Copy the generated token immediately 3. Store securely with appropriate labels GitLab tokens start with `glpat-` followed by 20 characters. They cannot be viewed again after creation. 
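Once created, a token can be sanity-checked before handing it to DeepWiki. The sketch below is illustrative (helper names are not part of DeepWiki), but the `/api/v4/projects/:id` route, the URL-encoded `namespace/path` form of the project ID, and the `PRIVATE-TOKEN` header are GitLab's documented API conventions:

```python
import re
from urllib.parse import quote

# Loose format check: "glpat-" prefix followed by the random suffix
GITLAB_TOKEN_RE = re.compile(r"^glpat-[A-Za-z0-9_\-]{20,}$")

def gitlab_project_api_url(repo_url, base_url="https://gitlab.com"):
    """Build the GitLab v4 API URL for a project, URL-encoding namespace/path."""
    path = repo_url.rstrip("/").split(base_url + "/", 1)[-1]
    return f"{base_url}/api/v4/projects/{quote(path, safe='')}"

def gitlab_auth_headers(token):
    """Headers for token-authenticated GitLab API calls."""
    if not GITLAB_TOKEN_RE.match(token):
        raise ValueError("Does not look like a GitLab personal access token")
    return {"PRIVATE-TOKEN": token}
```

A `GET` to the resulting URL with these headers returns 200 when the token can read the project, and 401/404 when it cannot.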
### GitLab Self-Managed For self-hosted GitLab instances: **Configuration requirements:** * Same token creation process * Verify network connectivity to your GitLab instance * Check SSL certificate configuration ```javascript { "repo_url": "https://gitlab.company.com/team/private-project", "access_token": "glpat-xxxxxxxxxxxxxxxxxxxx", "gitlab_base_url": "https://gitlab.company.com" } ``` **Instance-level settings:** * Personal access token policies * Maximum token lifetime * Required scopes restrictions * API rate limiting **Security policies:** * LDAP/SAML integration impacts * IP allowlists for API access * Audit logging for token usage ## BitBucket Access BitBucket uses app passwords instead of traditional tokens, with different permission models. ### App Password Creation 1. Sign in to BitBucket.org 2. Click your avatar → **Personal BitBucket settings** 3. In the left menu, click **App passwords** BitBucket app passwords are repository-specific credentials with granular permissions. 1. Click **Create app password** 2. Configure password settings: * **Label:** Descriptive name (e.g., "DeepWiki Documentation") * **Permissions:** Select required access levels **Required permissions:** * ✅ **Repositories: Read** - Access repository content * ✅ **Pull requests: Read** - Access PR information (optional) * ⚠️ **Issues: Read** - Access issues (optional) **Avoid unnecessary permissions:** * ❌ **Repositories: Write** - Not needed for documentation * ❌ **Repositories: Admin** - Administrative access * ❌ **Account** - Account-level access **Team repositories:** * Ensure you have repository access * App password inherits team permissions * Verify with team administrators **Personal repositories:** * Full access to your own repositories * No additional configuration needed 1. Click **Create** 2. Copy the generated app password 3. Store with username for authentication BitBucket app passwords are unique strings (not prefixed). 
You'll need both your username and app password for authentication. ### BitBucket Server/Data Center For on-premises BitBucket instances: **BitBucket Server uses personal access tokens:** 1. Go to your BitBucket Server instance 2. Click your avatar → **Manage account** 3. Click **Personal access tokens** 4. Create token with **Repository read** permission ```javascript { "repo_url": "https://bitbucket.company.com/projects/TEAM/repos/private-repo", "access_token": "your-personal-access-token", "bitbucket_base_url": "https://bitbucket.company.com" } ``` **Alternative authentication method:** * Use username and app password * Base64 encode credentials * Less secure than tokens ```javascript { "repo_url": "https://bitbucket.company.com/projects/TEAM/repos/private-repo", "username": "your-username", "password": "your-app-password" } ``` ## Token Security Best Practices ### Secure Token Storage **Local development:** ```bash # Use environment files with restricted permissions echo "GITHUB_TOKEN=ghp_xxxxxxxxxxxxxxxxxxxx" > .env.local chmod 600 .env.local # Add to gitignore echo ".env*" >> .gitignore echo "*.token" >> .gitignore ``` **Configuration management:** ```javascript // Use environment variables const config = { github_token: process.env.GITHUB_TOKEN, gitlab_token: process.env.GITLAB_TOKEN, bitbucket_token: process.env.BITBUCKET_TOKEN }; // Never hardcode tokens // ❌ Bad const token = "ghp_actual_token_here"; // ✅ Good const token = process.env.GITHUB_TOKEN; ``` **Container secrets:** ```docker # Docker secrets docker secret create github_token /path/to/token/file # Docker Compose version: '3.8' services: deepwiki: image: deepwiki-open secrets: - github_token environment: - GITHUB_TOKEN_FILE=/run/secrets/github_token ``` **Kubernetes secrets:** ```yaml apiVersion: v1 kind: Secret metadata: name: repository-tokens type: Opaque data: github-token: gitlab-token: --- apiVersion: apps/v1 kind: Deployment metadata: name: deepwiki spec: template: spec: containers: - 
name: deepwiki
        env:
        - name: GITHUB_TOKEN
          valueFrom:
            secretKeyRef:
              name: repository-tokens
              key: github-token
```

**Automated rotation strategy:**

1. **Generate new tokens** before current ones expire
2. **Test new tokens** in staging environment
3. **Update production** with new tokens
4. **Revoke old tokens** after successful deployment
5. **Monitor for failures** and rollback if needed

**Rotation schedule:**

* **High-security environments:** 30-60 days
* **Standard environments:** 90 days
* **Development environments:** 180 days

```bash
#!/bin/bash
# Token rotation script example
NEW_TOKEN="ghp_new_token_here"
OLD_TOKEN="ghp_old_token_here"

# Test new token (-f makes curl exit non-zero on HTTP errors such as 401)
if curl -sf -H "Authorization: token $NEW_TOKEN" https://api.github.com/user; then
  # Update production
  kubectl patch secret repository-tokens -p='{"data":{"github-token":"'$(echo -n $NEW_TOKEN | base64)'"}}'
  # Revoke old token (manual step)
  echo "Update successful. Revoke old token: $OLD_TOKEN"
else
  echo "New token validation failed. Aborting rotation."
  exit 1
fi
```

### Access Control

**Minimize token permissions:**

* Use read-only scopes when possible
* Avoid admin or write permissions
* Prefer fine-grained tokens over classic tokens
* Regular audit of token permissions

Review token permissions quarterly to ensure they match current needs.
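For GitHub classic tokens, part of that quarterly audit can be automated: the scopes a token actually carries are echoed back in the `X-OAuth-Scopes` response header of any authenticated API call (for example `curl -sI -H "Authorization: token $TOKEN" https://api.github.com/user`). A small helper for comparing granted scopes against what a documentation-only workload needs (the `("repo",)` baseline is an assumption for private-repo access):

```python
def parse_oauth_scopes(header_value):
    """Split GitHub's comma-separated X-OAuth-Scopes header into a scope list."""
    return [s.strip() for s in header_value.split(",") if s.strip()]

def excess_scopes(granted, needed=("repo",)):
    """Scopes granted beyond what read-only documentation access requires."""
    return [s for s in granted if s not in set(needed)]
```

Any non-empty result from `excess_scopes` is a candidate for regenerating the token with a narrower grant.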
**Restrict token usage:** ```nginx # Nginx configuration for IP restrictions location /api/wiki/generate { allow 192.168.1.0/24; # Internal network allow 10.0.0.0/8; # Private network deny all; # Block external access proxy_pass http://deepwiki-backend; } ``` **API endpoint protection:** ```python # Rate limiting by token from functools import lru_cache @lru_cache(maxsize=1000) def get_rate_limit(token: str): return RateLimiter(requests_per_minute=10) def validate_token_request(token: str): rate_limiter = get_rate_limit(token) if not rate_limiter.allow_request(): raise RateLimitExceeded("Token rate limit exceeded") ``` **Token usage monitoring:** ```javascript { "token_monitoring": { "track_usage": true, "alert_on_failures": true, "log_access_patterns": true, "detect_anomalies": true }, "alerts": { "token_expiry_warning": "7d", "unusual_usage_pattern": true, "failed_authentication": { "threshold": 5, "window": "1h" } } } ``` **Security event logging:** ```python import logging security_logger = logging.getLogger('security') def log_token_usage(token_hash: str, repo_url: str, success: bool): security_logger.info({ 'event': 'token_usage', 'token_hash': token_hash, 'repository': repo_url, 'success': success, 'timestamp': datetime.now().isoformat(), 'ip_address': get_client_ip() }) ``` ## Using Tokens with DeepWiki ### API Integration **Direct API calls:** ```bash # GitHub private repository curl -X POST "http://localhost:8001/wiki/generate" \ -H "Content-Type: application/json" \ -d '{ "repo_url": "https://github.com/company/private-repo", "access_token": "ghp_xxxxxxxxxxxxxxxxxxxx", "model_provider": "google" }' # GitLab private repository curl -X POST "http://localhost:8001/wiki/generate" \ -H "Content-Type: application/json" \ -d '{ "repo_url": "https://gitlab.com/company/private-project", "access_token": "glpat-xxxxxxxxxxxxxxxxxxxx", "model_provider": "google" }' # BitBucket private repository curl -X POST "http://localhost:8001/wiki/generate" \ -H "Content-Type: 
application/json" \ -d '{ "repo_url": "https://bitbucket.org/company/private-repo", "access_token": "your-app-password", "username": "your-username", "model_provider": "google" }' ``` **Python client example:** ```python import requests import os class DeepWikiClient: def __init__(self, base_url: str): self.base_url = base_url def generate_wiki(self, repo_url: str, platform: str = "github"): token_env_map = { "github": "GITHUB_TOKEN", "gitlab": "GITLAB_TOKEN", "bitbucket": "BITBUCKET_TOKEN" } token = os.getenv(token_env_map[platform]) if not token: raise ValueError(f"No token found for {platform}") payload = { "repo_url": repo_url, "access_token": token, "model_provider": "google", "force_regenerate": False } # Add username for BitBucket if platform == "bitbucket": payload["username"] = os.getenv("BITBUCKET_USERNAME") response = requests.post( f"{self.base_url}/wiki/generate", json=payload ) return response.json() # Usage client = DeepWikiClient("http://localhost:8001") result = client.generate_wiki( "https://github.com/company/private-repo", platform="github" ) ``` **JavaScript client example:** ```javascript class DeepWikiClient { constructor(baseUrl) { this.baseUrl = baseUrl; } async generateWiki(repoUrl, options = {}) { const { platform = 'github', accessToken, username, modelProvider = 'google', forceRegenerate = false } = options; const payload = { repo_url: repoUrl, access_token: accessToken, model_provider: modelProvider, force_regenerate: forceRegenerate }; // Add username for BitBucket if (platform === 'bitbucket' && username) { payload.username = username; } const response = await fetch(`${this.baseUrl}/wiki/generate`, { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify(payload) }); return response.json(); } } // Usage with environment variables const client = new DeepWikiClient('http://localhost:8001'); const result = await client.generateWiki( 'https://github.com/company/private-repo', { platform: 'github', 
accessToken: process.env.GITHUB_TOKEN }
);
```

### Web Interface Integration

**Secure token input:**

```javascript
import { useState } from 'react';

function TokenInput({ onTokenChange, platform }) {
  const [token, setToken] = useState('');
  const [showToken, setShowToken] = useState(false);

  const handleTokenChange = (value) => {
    setToken(value);
    onTokenChange(value);
  };

  const tokenPatterns = {
    github: /^(ghp_[a-zA-Z0-9]{36}|github_pat_[a-zA-Z0-9]+)$/,
    gitlab: /^glpat-[a-zA-Z0-9]{20}$/,
    bitbucket: /^[a-zA-Z0-9]+$/
  };

  const isValidToken = tokenPatterns[platform]?.test(token);

  return (
    <div className="token-input">
      <input
        type={showToken ? 'text' : 'password'}
        value={token}
        onChange={(e) => handleTokenChange(e.target.value)}
        placeholder={`Enter ${platform} access token`}
        className={isValidToken ? 'valid' : 'invalid'}
      />
      <button type="button" onClick={() => setShowToken(!showToken)}>
        {showToken ? 'Hide' : 'Show'}
      </button>
      {!isValidToken && token.length > 0 && (
        <span className="error">Invalid {platform} token format</span>
      )}
    </div>
  );
}
```
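The same format checks used in the component above can be mirrored server-side, so a malformed token is rejected before spending an API call on full validation. A minimal sketch (the regexes follow the client-side patterns shown above; `looks_like_valid_token` is an illustrative helper name, not part of the DeepWiki API):

```python
import re

# Token format patterns, mirroring the client-side checks.
# BitBucket app passwords have no fixed prefix, so only a loose
# alphanumeric check is possible there.
TOKEN_PATTERNS = {
    "github": re.compile(r"^(ghp_[a-zA-Z0-9]{36}|github_pat_[a-zA-Z0-9_]+)$"),
    "gitlab": re.compile(r"^glpat-[a-zA-Z0-9]{20}$"),
    "bitbucket": re.compile(r"^[a-zA-Z0-9]+$"),
}

def looks_like_valid_token(platform: str, token: str) -> bool:
    """Cheap format check before spending an API call on validation."""
    pattern = TOKEN_PATTERNS.get(platform)
    return bool(pattern and pattern.match(token))
```

A format check like this is only a first gate; the server-side validation endpoint described below still confirms the token actually works against the provider's API.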
**Client-side validation:** ```javascript async function validateRepositoryAccess(repoUrl, accessToken, platform) { try { const response = await fetch('/api/validate-token', { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify({ repo_url: repoUrl, access_token: accessToken, platform: platform }) }); const result = await response.json(); return { valid: result.valid, permissions: result.permissions, error: result.error }; } catch (error) { return { valid: false, error: 'Network error during validation' }; } } // Usage in form submission const handleSubmit = async (formData) => { const validation = await validateRepositoryAccess( formData.repoUrl, formData.accessToken, formData.platform ); if (!validation.valid) { setError(`Token validation failed: ${validation.error}`); return; } // Proceed with wiki generation generateWiki(formData); }; ```
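Several logging snippets in this guide record a `token_hash` rather than the raw credential. One way to derive such a fingerprint is a truncated SHA-256 digest, which lets usage be correlated across log events without ever persisting the token itself (a sketch; `token_fingerprint` is a hypothetical helper, not part of the DeepWiki API):

```python
import hashlib

def token_fingerprint(token: str) -> str:
    """Stable, non-reversible identifier for a token, safe to log.

    The digest is truncated to 12 hex characters: enough to
    correlate events for one token, useless for recovering it.
    """
    return hashlib.sha256(token.encode("utf-8")).hexdigest()[:12]
```

The same fingerprint can be passed wherever the monitoring examples below expect a `token_hash` argument.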
## Organization and Enterprise Considerations ### GitHub Organizations **Common organizational restrictions:** * Personal access token policies * Required two-factor authentication * IP allowlists for API access * Audit logging requirements **Working with restrictions:** 1. **Check organization settings** before creating tokens 2. **Request necessary permissions** from administrators 3. **Use fine-grained tokens** when possible for better compliance 4. **Document token usage** for audit purposes ```javascript { "organization_requirements": { "two_factor_required": true, "saml_sso_required": true, "ip_allowlist_enabled": true, "allowed_ips": ["203.0.113.0/24", "198.51.100.0/24"], "audit_logging": true } } ``` **Enterprise GitHub features:** * Advanced security features * Custom SAML/OIDC integration * Advanced audit logging * Custom policies and restrictions **Token considerations:** * Shorter token lifetimes * Required approval workflows * Enhanced monitoring requirements * Integration with enterprise identity systems **Best practices:** 1. **Coordinate with security teams** on token policies 2. **Use service accounts** for automated access 3. **Implement token rotation** procedures 4. 
**Monitor token usage** closely

### GitLab Groups and Projects

**GitLab group considerations:**

* Group membership requirements
* Project-level permissions inheritance
* Shared runner restrictions
* Group-level tokens (GitLab Premium+)

```javascript
{
  "group_access": {
    "group_name": "company-dev-team",
    "member_role": "developer",
    "project_access_level": "read",
    "shared_runners_enabled": false
  }
}
```

**Project-specific settings:**

* Project visibility levels
* Repository access controls
* CI/CD variable access
* Protected branch rules

**Access verification:**

```bash
# Test project access
curl -H "Private-Token: glpat-xxxxxxxxxxxxxxxxxxxx" \
  "https://gitlab.com/api/v4/projects/project-id"

# Check specific permissions
curl -H "Private-Token: glpat-xxxxxxxxxxxxxxxxxxxx" \
  "https://gitlab.com/api/v4/projects/project-id/repository/files/README.md?ref=main"
```

## Token Management at Scale

### Multi-Repository Management

**Token management system:**

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List
import keyring

@dataclass
class RepositoryToken:
    platform: str
    token: str
    repositories: List[str]
    expires_at: datetime
    permissions: List[str]

class TokenManager:
    def __init__(self):
        self.tokens: Dict[str, RepositoryToken] = {}

    def add_token(self, name: str, token: RepositoryToken):
        # Store in secure keyring under the token's name
        keyring.set_password("deepwiki", name, token.token)
        self.tokens[name] = token

    def get_token_for_repo(self, repo_url: str) -> str:
        for name, token_data in self.tokens.items():
            if any(repo in repo_url for repo in token_data.repositories):
                # Look up by the name the token was stored under,
                # never by the token value itself
                return keyring.get_password("deepwiki", name)
        return None

    def check_expiring_tokens(self, days: int = 7) -> List[str]:
        expiring = []
        threshold = datetime.now() + timedelta(days=days)
        for name, token in self.tokens.items():
            if token.expires_at < threshold:
                expiring.append(name)
        return expiring
```

**Rotation workflow:**

```python
class TokenRotationService:
    def 
__init__(self, token_manager: TokenManager): self.token_manager = token_manager async def rotate_github_token(self, old_token_name: str): # This would integrate with GitHub API to create new tokens # Note: GitHub doesn't provide token creation API # This is a conceptual example old_token = self.token_manager.tokens[old_token_name] # Generate new token (manual process for GitHub) new_token_value = await self._prompt_for_new_token() # Test new token if await self._test_token_access(new_token_value, old_token.repositories): # Update stored token new_token = RepositoryToken( platform=old_token.platform, token=new_token_value, repositories=old_token.repositories, expires_at=datetime.now() + timedelta(days=90), permissions=old_token.permissions ) self.token_manager.add_token(old_token_name, new_token) # Schedule old token revocation await self._schedule_token_revocation(old_token.token) return True return False ``` ### Monitoring and Analytics **Track token performance:** ```python import logging from collections import defaultdict from datetime import datetime class TokenAnalytics: def __init__(self): self.usage_stats = defaultdict(list) self.error_stats = defaultdict(int) def log_token_usage(self, token_hash: str, repo_url: str, success: bool, response_time: float): event = { 'timestamp': datetime.now(), 'repository': repo_url, 'success': success, 'response_time': response_time } self.usage_stats[token_hash].append(event) if not success: self.error_stats[token_hash] += 1 def generate_usage_report(self, token_hash: str) -> dict: events = self.usage_stats[token_hash] if not events: return {'error': 'No usage data found'} successful_requests = sum(1 for e in events if e['success']) total_requests = len(events) avg_response_time = sum(e['response_time'] for e in events) / total_requests return { 'total_requests': total_requests, 'successful_requests': successful_requests, 'success_rate': successful_requests / total_requests * 100, 'average_response_time': 
avg_response_time,
            'error_count': self.error_stats[token_hash],
            'last_used': max(e['timestamp'] for e in events)
        }
```

**Token health checks:**

```bash
#!/bin/bash
# Token health check script

check_github_token() {
    local token=$1
    local response=$(curl -s -w "%{http_code}" -o /dev/null \
        -H "Authorization: token $token" \
        https://api.github.com/user)

    if [ "$response" = "200" ]; then
        echo "GitHub token: HEALTHY"
        return 0
    else
        echo "GitHub token: FAILED (HTTP $response)"
        return 1
    fi
}

check_gitlab_token() {
    local token=$1
    local response=$(curl -s -w "%{http_code}" -o /dev/null \
        -H "Private-Token: $token" \
        https://gitlab.com/api/v4/user)

    if [ "$response" = "200" ]; then
        echo "GitLab token: HEALTHY"
        return 0
    else
        echo "GitLab token: FAILED (HTTP $response)"
        return 1
    fi
}

# Run health checks, tracking failures from both
# ($? alone would only reflect the last check)
status=0
check_github_token "$GITHUB_TOKEN" || status=1
check_gitlab_token "$GITLAB_TOKEN" || status=1

# Alert on failures
if [ $status -ne 0 ]; then
    # Send alert (email, Slack, etc.)
    echo "Token health check failed - alerting administrators"
fi
```

## Troubleshooting

### Common Issues

**Symptom:** "Bad credentials" or "Invalid token" errors

**Diagnosis checklist:**

1. **Token format validation:**

   ```bash
   # GitHub Classic: ghp_xxxxxxxxxxxxxxxxxxxx (40 chars)
   # GitHub Fine-grained: github_pat_xxxx (variable length)
   # GitLab: glpat-xxxxxxxxxxxxxxxxxxxx (26 chars)
   # BitBucket: Variable format
   ```

2. **Token expiration check:**

   ```bash
   # Check GitHub token
   curl -H "Authorization: token $GITHUB_TOKEN" https://api.github.com/user

   # Check GitLab token
   curl -H "Private-Token: $GITLAB_TOKEN" https://gitlab.com/api/v4/user
   ```

3. 
**Permission verification:** ```bash # Test repository access curl -H "Authorization: token $GITHUB_TOKEN" \ https://api.github.com/repos/owner/private-repo ``` **Solutions:** * Generate new token with correct permissions * Verify token hasn't expired * Check organization SSO requirements * Confirm repository access permissions **Symptom:** "Not Found" errors for existing repositories **Common causes:** * Token lacks repository scope * Repository is private but token has public-only permissions * Organization requires SSO authorization for tokens * User doesn't have repository access **Resolution steps:** 1. **Verify repository permissions:** ```python import requests def check_repo_access(token, repo_url): # Extract owner/repo from URL parts = repo_url.split('/') owner, repo = parts[-2], parts[-1] response = requests.get( f"https://api.github.com/repos/{owner}/{repo}", headers={"Authorization": f"token {token}"} ) return { 'status_code': response.status_code, 'accessible': response.status_code == 200, 'error': response.json() if response.status_code != 200 else None } ``` 2. **Update token permissions:** * For GitHub: Add `repo` scope for private repositories * For GitLab: Add `read_repository` scope * For BitBucket: Ensure "Repositories: Read" permission 3. 
**Handle organization restrictions:**

* Authorize token for SSO if required
* Request repository access from administrators
* Use organization-approved tokens

**Symptom:** "API rate limit exceeded" errors

**Understanding rate limits:**

* **GitHub:** 5,000 requests/hour for authenticated requests
* **GitLab:** 2,000 requests/minute for personal tokens
* **BitBucket:** 1,000 requests/hour for authenticated requests

**Mitigation strategies:**

```python
import time
import requests
from functools import wraps

class RateLimitError(Exception):
    """Raised when the upstream API reports rate limiting."""
    pass

def rate_limited_request(max_retries=3, delay=60):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except RateLimitError:
                    if attempt < max_retries - 1:
                        time.sleep(delay * (attempt + 1))
                        continue
                    raise
        return wrapper
    return decorator

@rate_limited_request()
def make_api_request(url, headers):
    response = requests.get(url, headers=headers)
    if response.status_code == 429:  # Too Many Requests
        raise RateLimitError("Rate limit exceeded")
    return response
```

**Best practices:**

* Implement exponential backoff
* Cache API responses when possible
* Use webhooks instead of polling
* Monitor rate limit headers

### Advanced Debugging

**API server debugging:**

```python
import logging
from datetime import datetime

# Configure detailed logging
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('deepwiki-debug.log'),
        logging.StreamHandler()
    ]
)

# Log repository access attempts
repo_logger = logging.getLogger('repository_access')

def log_repository_access(repo_url, token_hash, success, error=None):
    repo_logger.debug({
        'repository': repo_url,
        'token_hash': token_hash,  # Never log actual token
        'success': success,
        'error': str(error) if error else None,
        'timestamp': datetime.now().isoformat()
    })
```

**Connection testing:**

```bash
#!/bin/bash
# Network connectivity test

test_github_connectivity() {
    echo "Testing GitHub API connectivity..."
    # Test DNS resolution
    nslookup api.github.com

    # Test HTTPS connectivity
    curl -I https://api.github.com/

    # Test authentication
    curl -H "Authorization: token $GITHUB_TOKEN" https://api.github.com/user
}

test_enterprise_connectivity() {
    local enterprise_url=$1
    echo "Testing enterprise connectivity: $enterprise_url"

    # Test SSL certificate
    openssl s_client -connect ${enterprise_url}:443 -servername $enterprise_url

    # Test API endpoint
    curl -I ${enterprise_url}/api/v3/
}

# Run tests
test_github_connectivity
test_enterprise_connectivity "github.enterprise.com"
```

**Token inspection utility:**

```python
import json
import re
import requests

class TokenAnalyzer:
    def __init__(self):
        self.platform_patterns = {
            'github_classic': r'^ghp_[a-zA-Z0-9]{36}$',
            'github_fine_grained': r'^github_pat_[a-zA-Z0-9_]+$',
            'gitlab': r'^glpat-[a-zA-Z0-9]{20}$',
            'bitbucket': r'^[a-zA-Z0-9]+$'
        }

    def identify_platform(self, token: str) -> str:
        for platform, pattern in self.platform_patterns.items():
            if re.match(pattern, token):
                return platform
        return 'unknown'

    def analyze_github_token(self, token: str) -> dict:
        headers = {'Authorization': f'token {token}'}

        # Note: GitHub doesn't provide a token introspection API;
        # this infers token health from the /user response headers
        user_response = requests.get('https://api.github.com/user', headers=headers)

        if user_response.status_code != 200:
            return {'error': 'Invalid token or insufficient permissions'}

        user_data = user_response.json()

        # Test repository access
        repos_response = requests.get('https://api.github.com/user/repos', headers=headers)

        return {
            'platform': 'github',
            'user': user_data.get('login'),
            'scopes': user_response.headers.get('X-OAuth-Scopes', '').split(', '),
            'rate_limit': {
                'limit': user_response.headers.get('X-RateLimit-Limit'),
                'remaining': user_response.headers.get('X-RateLimit-Remaining'),
                'reset': user_response.headers.get('X-RateLimit-Reset')
            },
            'repository_access': repos_response.status_code == 200
        }

# Usage
analyzer = 
TokenAnalyzer()
result = analyzer.analyze_github_token("ghp_xxxxxxxxxxxxxxxxxxxx")
print(json.dumps(result, indent=2))
```

## Next Steps

* Learn about API-level authentication and security
* Implement comprehensive security measures
* Deploy with proper token management in production
* Resolve common issues and get support

# Production Setup Guide

Source: https://asyncfunc.mintlify.app/guides/production-setup

This guide covers the complete production deployment of DeepWikiOpen, including architecture, security, monitoring, and scaling strategies.

## Table of Contents

1. [Production Architecture](#production-architecture)
2. [Load Balancing & High Availability](#load-balancing--high-availability)
3. [SSL/TLS Certificate Setup](#ssltls-certificate-setup)
4. [Reverse Proxy Configuration](#reverse-proxy-configuration)
5. [Security Hardening](#security-hardening)
6. [Database Optimization](#database-optimization)
7. [Monitoring & Alerting](#monitoring--alerting)
8. [Log Management](#log-management)
9. [Performance Tuning](#performance-tuning)
10. [Scaling Strategies](#scaling-strategies)
11. [Disaster Recovery](#disaster-recovery)
12. [CI/CD Pipeline](#cicd-pipeline)
13. [Cloud Deployments](#cloud-deployments)
14. 
[On-Premise Setup](#on-premise-setup) ## Production Architecture ### Recommended Infrastructure Layout ```mermaid graph TB CDN[CDN/CloudFlare] --> LB[Load Balancer] LB --> WAF[Web Application Firewall] WAF --> RP1[Reverse Proxy 1] WAF --> RP2[Reverse Proxy 2] RP1 --> APP1[App Server 1] RP1 --> APP2[App Server 2] RP2 --> APP3[App Server 3] RP2 --> APP4[App Server 4] APP1 --> CACHE[Redis Cluster] APP2 --> CACHE APP3 --> CACHE APP4 --> CACHE APP1 --> DB_PRIMARY[Primary DB] APP2 --> DB_PRIMARY APP3 --> DB_REPLICA[Read Replica] APP4 --> DB_REPLICA DB_PRIMARY --> DB_REPLICA CACHE --> BACKUP_CACHE[Backup Redis] DB_PRIMARY --> BACKUP_DB[Database Backups] ``` ### Component Specifications **Application Servers:** * Minimum 3 instances for high availability * CPU: 4+ cores per instance * RAM: 8GB+ per instance * Storage: SSD with 100GB+ per instance **Database:** * Primary-replica configuration * Connection pooling enabled * Automated backups every 6 hours * Point-in-time recovery capability **Cache Layer:** * Redis Cluster with 3+ nodes * Sentinel for automatic failover * Memory: 4GB+ per node **Load Balancer:** * Layer 7 (HTTP/HTTPS) load balancing * Health check endpoints * SSL termination capability ## Load Balancing & High Availability ### HAProxy Configuration ```bash # /etc/haproxy/haproxy.cfg global daemon maxconn 4096 log stdout local0 ssl-default-bind-ciphers PROFILE=SYSTEM ssl-default-bind-options ssl-min-ver TLSv1.2 no-tls-tickets defaults mode http timeout connect 5000ms timeout client 50000ms timeout server 50000ms option httplog option dontlognull option redispatch retries 3 frontend deepwikiopen_frontend bind *:80 bind *:443 ssl crt /etc/ssl/certs/deepwikiopen.pem redirect scheme https if !{ ssl_fc } # Security headers http-response set-header Strict-Transport-Security "max-age=31536000; includeSubDomains; preload" http-response set-header X-Frame-Options "SAMEORIGIN" http-response set-header X-Content-Type-Options "nosniff" http-response set-header 
X-XSS-Protection "1; mode=block" # Rate limiting stick-table type ip size 100k expire 30s store http_req_rate(10s) http-request track-sc0 src http-request reject if { sc_http_req_rate(0) gt 20 } default_backend deepwikiopen_backend backend deepwikiopen_backend balance roundrobin option httpchk GET /health http-check expect status 200 server app1 10.0.1.10:3000 check inter 30s fall 3 rise 2 server app2 10.0.1.11:3000 check inter 30s fall 3 rise 2 server app3 10.0.1.12:3000 check inter 30s fall 3 rise 2 server app4 10.0.1.13:3000 check inter 30s fall 3 rise 2 listen stats bind *:8080 stats enable stats uri /stats stats refresh 30s stats admin if TRUE ``` ### Nginx Load Balancer Alternative ```nginx upstream deepwikiopen_backend { least_conn; server 10.0.1.10:3000 max_fails=3 fail_timeout=30s; server 10.0.1.11:3000 max_fails=3 fail_timeout=30s; server 10.0.1.12:3000 max_fails=3 fail_timeout=30s; server 10.0.1.13:3000 max_fails=3 fail_timeout=30s; } server { listen 443 ssl http2; server_name deepwikiopen.com; ssl_certificate /etc/ssl/certs/deepwikiopen.crt; ssl_certificate_key /etc/ssl/private/deepwikiopen.key; location / { proxy_pass http://deepwikiopen_backend; proxy_set_header Host $host; proxy_set_header X-Real-IP $remote_addr; proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; proxy_set_header X-Forwarded-Proto $scheme; proxy_connect_timeout 30s; proxy_send_timeout 30s; proxy_read_timeout 30s; } location /health { access_log off; proxy_pass http://deepwikiopen_backend; proxy_connect_timeout 5s; proxy_read_timeout 5s; } } ``` ## SSL/TLS Certificate Setup ### Let's Encrypt with Certbot ```bash # Install certbot sudo apt-get update sudo apt-get install certbot python3-certbot-nginx # Obtain certificate sudo certbot --nginx -d deepwikiopen.com -d www.deepwikiopen.com # Auto-renewal setup sudo crontab -e # Add: 0 12 * * * /usr/bin/certbot renew --quiet ``` ### Custom Certificate Setup ```bash # Generate private key openssl genrsa -out deepwikiopen.key 2048 # 
Generate certificate signing request
openssl req -new -key deepwikiopen.key -out deepwikiopen.csr

# After receiving certificate from CA
sudo mkdir -p /etc/ssl/certs /etc/ssl/private
sudo cp deepwikiopen.crt /etc/ssl/certs/
sudo cp deepwikiopen.key /etc/ssl/private/
sudo chmod 644 /etc/ssl/certs/deepwikiopen.crt
sudo chmod 600 /etc/ssl/private/deepwikiopen.key
```

### SSL Configuration Best Practices

```nginx
# Strong SSL configuration
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-CHACHA20-POLY1305;
ssl_prefer_server_ciphers off;
ssl_session_cache shared:SSL:10m;
ssl_session_tickets off;
ssl_stapling on;
ssl_stapling_verify on;

# HSTS
add_header Strict-Transport-Security "max-age=63072000; includeSubDomains; preload" always;
```

## Reverse Proxy Configuration

### Nginx Reverse Proxy

```nginx
# /etc/nginx/sites-available/deepwikiopen
server {
    listen 80;
    server_name deepwikiopen.com www.deepwikiopen.com;
    return 301 https://$server_name$request_uri;
}

server {
    listen 443 ssl http2;
    server_name deepwikiopen.com www.deepwikiopen.com;

    # SSL Configuration
    ssl_certificate /etc/ssl/certs/deepwikiopen.crt;
    ssl_certificate_key /etc/ssl/private/deepwikiopen.key;
    include /etc/nginx/snippets/ssl-params.conf;

    # Security headers
    add_header X-Frame-Options "SAMEORIGIN" always;
    add_header X-Content-Type-Options "nosniff" always;
    add_header X-XSS-Protection "1; mode=block" always;
    add_header Referrer-Policy "no-referrer-when-downgrade" always;
    add_header Content-Security-Policy "default-src 'self' http: https: data: blob: 'unsafe-inline'" always;

    # Gzip compression
    gzip on;
    gzip_vary on;
    gzip_min_length 1024;
    gzip_types text/plain text/css text/xml text/javascript application/javascript application/xml+rss application/json;

    # Static files caching
    location ~* \.(jpg|jpeg|gif|png|css|js|ico|xml)$ {
        expires 1y;
        add_header Cache-Control "public, immutable";
    }

    # API endpoints
    location /api/ {
        proxy_pass 
http://localhost:3000/api/;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_cache_bypass $http_upgrade;

        # Rate limiting
        limit_req zone=api burst=20 nodelay;
    }

    # WebSocket support
    location /ws {
        proxy_pass http://localhost:3000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }

    # Main application
    location / {
        proxy_pass http://localhost:3000;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Timeouts
        proxy_connect_timeout 60s;
        proxy_send_timeout 60s;
        proxy_read_timeout 60s;
    }

    # Health check endpoint
    location /health {
        access_log off;
        proxy_pass http://localhost:3000/health;
    }
}

# Rate limiting zones (must live in the http context)
http {
    limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;
    limit_req_zone $binary_remote_addr zone=general:10m rate=50r/s;
}
```

### Apache Reverse Proxy

```bash
# Enable required modules
sudo a2enmod proxy
sudo a2enmod proxy_http
sudo a2enmod proxy_balancer
sudo a2enmod lbmethod_byrequests
sudo a2enmod headers
sudo a2enmod ssl
```

```apache
# Virtual host configuration
<VirtualHost *:443>
    ServerName deepwikiopen.com
    ServerAlias www.deepwikiopen.com

    # SSL Configuration
    SSLEngine on
    SSLCertificateFile /etc/ssl/certs/deepwikiopen.crt
    SSLCertificateKeyFile /etc/ssl/private/deepwikiopen.key
    SSLProtocol all -SSLv3 -TLSv1 -TLSv1.1
    SSLCipherSuite ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384

    # Security headers
    Header always set Strict-Transport-Security "max-age=31536000; includeSubDomains"
    Header always set X-Frame-Options SAMEORIGIN
    Header always set X-Content-Type-Options nosniff

    # Proxy configuration
    ProxyPreserveHost On
    ProxyRequests Off

    <Proxy balancer://deepwikiopen>
        BalancerMember http://10.0.1.10:3000
        BalancerMember http://10.0.1.11:3000
        BalancerMember http://10.0.1.12:3000
        ProxySet lbmethod=byrequests
    </Proxy>

    ProxyPass /balancer-manager !
    ProxyPass / balancer://deepwikiopen/
    ProxyPassReverse / balancer://deepwikiopen/

    <Location /balancer-manager>
        SetHandler balancer-manager
        Require ip 10.0.1.0/24
    </Location>
</VirtualHost>
```

## Security Hardening

### Environment Security Configuration

```bash
# Production environment variables
export NODE_ENV=production
export DATABASE_URL="postgresql://user:password@db-host:5432/deepwikiopen"
export REDIS_URL="redis://redis-host:6379"
export JWT_SECRET="your-super-secure-jwt-secret-here"
export ENCRYPTION_KEY="your-32-char-encryption-key-here"
export SESSION_SECRET="your-session-secret-here"
export RATE_LIMIT_MAX=100
export RATE_LIMIT_WINDOW=900000
export LOG_LEVEL=info
export TRUST_PROXY=true
export CORS_ORIGIN="https://deepwikiopen.com"
```

### Application Security Middleware

```javascript
// security.js
const helmet = require('helmet');
const rateLimit = require('express-rate-limit');
const cors = require('cors');

const securityMiddleware = (app) => {
  // Helmet for security headers
  app.use(helmet({
    contentSecurityPolicy: {
      directives: {
        defaultSrc: ["'self'"],
        scriptSrc: ["'self'", "'unsafe-inline'", "https://cdn.jsdelivr.net"],
        styleSrc: ["'self'", "'unsafe-inline'", "https://fonts.googleapis.com"],
        imgSrc: ["'self'", "data:", "https:"],
        fontSrc: ["'self'", "https://fonts.gstatic.com"],
        connectSrc: ["'self'", "wss:", "https:"],
      },
    },
    hsts: {
      maxAge: 31536000,
      includeSubDomains: true,
      preload: true
    }
  }));

  // Rate limiting
  const limiter = rateLimit({
    windowMs: 15 * 60 * 1000, // 15 minutes
    max: 100, // Limit each IP to 100 requests per windowMs
    message: 'Too many requests from this IP',
    standardHeaders: true,
    legacyHeaders: false,
  });

  const apiLimiter = rateLimit({
    windowMs: 15 * 60 * 1000,
    max: 50,
    message: 'Too many API requests from this IP',
  });

  app.use('/api/', apiLimiter);
app.use(limiter); // CORS configuration app.use(cors({ origin: process.env.CORS_ORIGIN || 'https://deepwikiopen.com', credentials: true, optionsSuccessStatus: 200 })); // Trust proxy for accurate IP addresses app.set('trust proxy', process.env.TRUST_PROXY === 'true'); }; module.exports = securityMiddleware; ``` ### Firewall Configuration (UFW) ```bash # Reset firewall sudo ufw --force reset # Default policies sudo ufw default deny incoming sudo ufw default allow outgoing # SSH access (change port from default 22) sudo ufw allow 2222/tcp # HTTP/HTTPS sudo ufw allow 80/tcp sudo ufw allow 443/tcp # Database (only from app servers) sudo ufw allow from 10.0.1.0/24 to any port 5432 # Redis (only from app servers) sudo ufw allow from 10.0.1.0/24 to any port 6379 # Monitoring sudo ufw allow from 10.0.1.0/24 to any port 9090 sudo ufw allow from 10.0.1.0/24 to any port 3000 # Enable firewall sudo ufw enable ``` ### Fail2Ban Configuration ```ini # /etc/fail2ban/jail.local [DEFAULT] bantime = 3600 findtime = 600 maxretry = 3 backend = systemd [sshd] enabled = true port = 2222 maxretry = 3 [nginx-http-auth] enabled = true filter = nginx-http-auth logpath = /var/log/nginx/error.log maxretry = 3 [nginx-req-limit] enabled = true filter = nginx-req-limit logpath = /var/log/nginx/error.log maxretry = 10 findtime = 600 bantime = 7200 ``` ## Database Optimization ### PostgreSQL Production Configuration ```sql -- postgresql.conf optimizations shared_buffers = 2GB effective_cache_size = 6GB maintenance_work_mem = 512MB work_mem = 32MB max_connections = 200 checkpoint_completion_target = 0.9 wal_buffers = 16MB default_statistics_target = 100 -- Enable connection pooling max_pool_size = 25 min_pool_size = 5 acquire_increment = 1 max_statements = 50 max_idle_time = 1800 ``` ### Database Backup Strategy ```bash #!/bin/bash # backup-database.sh BACKUP_DIR="/backup/postgresql" DB_NAME="deepwikiopen" DB_USER="postgres" TIMESTAMP=$(date +"%Y%m%d_%H%M%S") 
BACKUP_FILE="${BACKUP_DIR}/${DB_NAME}_${TIMESTAMP}.sql" # Create backup directory mkdir -p $BACKUP_DIR # Create database backup pg_dump -h localhost -U $DB_USER -d $DB_NAME > $BACKUP_FILE # Compress backup gzip $BACKUP_FILE # Upload to S3 (optional) if [ "$AWS_BACKUP" = "true" ]; then aws s3 cp "${BACKUP_FILE}.gz" s3://deepwikiopen-backups/database/ fi # Clean old backups (keep 30 days) find $BACKUP_DIR -name "*.sql.gz" -mtime +30 -delete # Verify backup integrity if [ $? -eq 0 ]; then echo "Backup completed successfully: ${BACKUP_FILE}.gz" else echo "Backup failed!" >&2 exit 1 fi ``` ### Database Monitoring ```sql -- Monitor connection usage SELECT count(*) as total_connections, count(*) filter (where state = 'active') as active_connections, count(*) filter (where state = 'idle') as idle_connections FROM pg_stat_activity; -- Monitor query performance SELECT query, calls, total_time, mean_time, rows FROM pg_stat_statements ORDER BY mean_time DESC LIMIT 10; -- Monitor database size SELECT pg_database.datname, pg_database_size(pg_database.datname) as size_bytes, pg_size_pretty(pg_database_size(pg_database.datname)) as size FROM pg_database ORDER BY size_bytes DESC; ``` ## Monitoring & Alerting ### Prometheus Configuration ```yaml # prometheus.yml global: scrape_interval: 15s evaluation_interval: 15s rule_files: - "rules/*.yml" alerting: alertmanagers: - static_configs: - targets: - alertmanager:9093 scrape_configs: - job_name: 'deepwikiopen' static_configs: - targets: ['localhost:3000'] metrics_path: /metrics scrape_interval: 30s - job_name: 'node' static_configs: - targets: - 'app1:9100' - 'app2:9100' - 'app3:9100' - 'db1:9100' - job_name: 'nginx' static_configs: - targets: ['nginx:9113'] - job_name: 'postgresql' static_configs: - targets: ['db1:9187'] - job_name: 'redis' static_configs: - targets: ['redis:9121'] ``` ### Grafana Dashboard Configuration ```json { "dashboard": { "title": "DeepWikiOpen Production Monitoring", "panels": [ { "title": "Response Time", 
"type": "graph", "targets": [ { "expr": "rate(http_request_duration_seconds_sum[5m]) / rate(http_request_duration_seconds_count[5m])", "legendFormat": "Average Response Time" } ] }, { "title": "Request Rate", "type": "graph", "targets": [ { "expr": "rate(http_requests_total[5m])", "legendFormat": "Requests/sec" } ] }, { "title": "Error Rate", "type": "graph", "targets": [ { "expr": "rate(http_requests_total{status=~\"5..\"}[5m])", "legendFormat": "5xx Errors/sec" } ] } ] } } ``` ### AlertManager Rules ```yaml # rules/alerts.yml groups: - name: deepwikiopen rules: - alert: HighResponseTime expr: avg(rate(http_request_duration_seconds_sum[5m]) / rate(http_request_duration_seconds_count[5m])) > 2 for: 5m labels: severity: warning annotations: summary: "High response time detected" - alert: HighErrorRate expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.1 for: 2m labels: severity: critical annotations: summary: "High error rate detected" - alert: DatabaseDown expr: up{job="postgresql"} == 0 for: 1m labels: severity: critical annotations: summary: "Database is down" - alert: HighMemoryUsage expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes > 0.9 for: 5m labels: severity: warning annotations: summary: "High memory usage detected" ``` ### Health Check Implementation ```javascript // health.js const express = require('express'); const { Pool } = require('pg'); const redis = require('redis'); const router = express.Router(); const dbPool = new Pool({ connectionString: process.env.DATABASE_URL, }); const redisClient = redis.createClient({ url: process.env.REDIS_URL, }); router.get('/health', async (req, res) => { const health = { status: 'healthy', timestamp: new Date().toISOString(), services: {} }; try { // Database health check const dbStart = Date.now(); await dbPool.query('SELECT 1'); health.services.database = { status: 'healthy', responseTime: Date.now() - dbStart }; } catch (error) { health.status = 'unhealthy'; 
health.services.database = { status: 'unhealthy', error: error.message }; } try { // Redis health check const redisStart = Date.now(); await redisClient.ping(); health.services.cache = { status: 'healthy', responseTime: Date.now() - redisStart }; } catch (error) { health.status = 'unhealthy'; health.services.cache = { status: 'unhealthy', error: error.message }; } // Memory check const memUsage = process.memoryUsage(); health.services.memory = { status: memUsage.heapUsed / memUsage.heapTotal < 0.9 ? 'healthy' : 'warning', heapUsed: memUsage.heapUsed, heapTotal: memUsage.heapTotal, external: memUsage.external }; const statusCode = health.status === 'healthy' ? 200 : 503; res.status(statusCode).json(health); }); module.exports = router; ``` ## Log Management ### Structured Logging Configuration ```javascript // logger.js const winston = require('winston'); const { ElasticsearchTransport } = require('winston-elasticsearch'); const logger = winston.createLogger({ level: process.env.LOG_LEVEL || 'info', format: winston.format.combine( winston.format.timestamp(), winston.format.errors({ stack: true }), winston.format.json() ), defaultMeta: { service: 'deepwikiopen', environment: process.env.NODE_ENV, version: process.env.APP_VERSION }, transports: [ new winston.transports.File({ filename: '/var/log/deepwikiopen/error.log', level: 'error', maxsize: 10485760, // 10MB maxFiles: 5 }), new winston.transports.File({ filename: '/var/log/deepwikiopen/combined.log', maxsize: 10485760, maxFiles: 10 }) ] }); if (process.env.NODE_ENV !== 'production') { logger.add(new winston.transports.Console({ format: winston.format.simple() })); } // Elasticsearch transport for production if (process.env.ELASTICSEARCH_URL) { logger.add(new ElasticsearchTransport({ clientOpts: { node: process.env.ELASTICSEARCH_URL }, index: 'deepwikiopen-logs' })); } module.exports = logger; ``` ### Logrotate Configuration ```bash # /etc/logrotate.d/deepwikiopen /var/log/deepwikiopen/*.log { daily rotate 30 
compress delaycompress missingok notifempty create 644 deepwikiopen deepwikiopen postrotate systemctl reload deepwikiopen endscript } ``` ### ELK Stack Setup ```yaml # docker-compose.yml for ELK version: '3.7' services: elasticsearch: image: docker.elastic.co/elasticsearch/elasticsearch:7.15.0 environment: - discovery.type=single-node - "ES_JAVA_OPTS=-Xms1g -Xmx1g" volumes: - elasticsearch_data:/usr/share/elasticsearch/data ports: - "9200:9200" logstash: image: docker.elastic.co/logstash/logstash:7.15.0 volumes: - ./logstash.conf:/usr/share/logstash/pipeline/logstash.conf depends_on: - elasticsearch kibana: image: docker.elastic.co/kibana/kibana:7.15.0 environment: - ELASTICSEARCH_HOSTS=http://elasticsearch:9200 ports: - "5601:5601" depends_on: - elasticsearch volumes: elasticsearch_data: ``` ## Performance Tuning ### Application Performance Optimization ```javascript // performance.js const compression = require('compression'); const responseTime = require('response-time'); const cluster = require('cluster'); const numCPUs = require('os').cpus().length; // Cluster setup for production if (cluster.isMaster && process.env.NODE_ENV === 'production') { console.log(`Master ${process.pid} is running`); // Fork workers for (let i = 0; i < numCPUs; i++) { cluster.fork(); } cluster.on('exit', (worker, code, signal) => { console.log(`Worker ${worker.process.pid} died`); cluster.fork(); }); } else { const express = require('express'); const app = express(); // Performance middleware app.use(compression({ filter: (req, res) => { if (req.headers['x-no-compression']) { return false; } return compression.filter(req, res); }, level: 6, threshold: 1024 })); app.use(responseTime((req, res, time) => { console.log(`${req.method} ${req.url} - ${time}ms`); })); // Connection pooling const { Pool } = require('pg'); const pool = new Pool({ connectionString: process.env.DATABASE_URL, max: 20, idleTimeoutMillis: 30000, connectionTimeoutMillis: 2000, }); // Cache configuration const 
NodeCache = require('node-cache');
  const cache = new NodeCache({
    stdTTL: 600, // 10 minutes
    checkperiod: 120,
    useClones: false
  });

  // Caching middleware
  const cacheMiddleware = (duration) => {
    return (req, res, next) => {
      const key = req.originalUrl || req.url;
      const cached = cache.get(key);
      if (cached) {
        return res.json(cached);
      }
      res.sendResponse = res.json;
      res.json = (body) => {
        cache.set(key, body, duration);
        res.sendResponse(body);
      };
      next();
    };
  };

  app.use('/api/search', cacheMiddleware(300)); // 5 minutes cache

  console.log(`Worker ${process.pid} started`);
}
```

### Database Query Optimization

```sql
-- Create appropriate indexes
CREATE INDEX CONCURRENTLY idx_articles_title_search ON articles USING gin(to_tsvector('english', title));
CREATE INDEX CONCURRENTLY idx_articles_content_search ON articles USING gin(to_tsvector('english', content));
CREATE INDEX CONCURRENTLY idx_articles_created_at ON articles(created_at DESC);
CREATE INDEX CONCURRENTLY idx_articles_category ON articles(category_id);

-- Partitioning for large tables
CREATE TABLE articles_2023 PARTITION OF articles
  FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');

-- Query optimization examples
EXPLAIN ANALYZE
SELECT * FROM articles
WHERE to_tsvector('english', title || ' ' || content) @@ to_tsquery('search_term')
ORDER BY created_at DESC
LIMIT 20;

-- Connection pooling settings
-- Note: max_connections and shared_buffers require a server restart;
-- pg_reload_conf() only applies reloadable settings such as effective_cache_size.
ALTER SYSTEM SET max_connections = 200;
ALTER SYSTEM SET shared_buffers = '2GB';
ALTER SYSTEM SET effective_cache_size = '6GB';
SELECT pg_reload_conf();
```

### Redis Cache Optimization

```javascript
// cache.js
const redis = require('redis');
const { promisify } = require('util');

class CacheManager {
  constructor() {
    this.client = redis.createClient({
      url: process.env.REDIS_URL,
      retry_strategy: (options) => {
        if (options.error && options.error.code === 'ECONNREFUSED') {
          return new Error('Redis server connection refused');
        }
        if (options.total_retry_time > 1000 * 60 * 60) {
          return new Error('Retry time exhausted');
        }
        if
(options.attempt > 10) {
          return undefined;
        }
        return Math.min(options.attempt * 100, 3000);
      }
    });
    this.getAsync = promisify(this.client.get).bind(this.client);
    this.setAsync = promisify(this.client.set).bind(this.client);
    this.delAsync = promisify(this.client.del).bind(this.client);
    this.keysAsync = promisify(this.client.keys).bind(this.client);
  }

  async get(key) {
    try {
      const result = await this.getAsync(key);
      return result ? JSON.parse(result) : null;
    } catch (error) {
      console.error('Cache get error:', error);
      return null;
    }
  }

  async set(key, value, ttl = 3600) {
    try {
      await this.setAsync(key, JSON.stringify(value), 'EX', ttl);
      return true;
    } catch (error) {
      console.error('Cache set error:', error);
      return false;
    }
  }

  async invalidate(pattern) {
    try {
      // keys() must be promisified like the other commands,
      // otherwise it returns before the reply arrives
      const keys = await this.keysAsync(pattern);
      if (keys.length > 0) {
        await this.delAsync(keys);
      }
      return true;
    } catch (error) {
      console.error('Cache invalidate error:', error);
      return false;
    }
  }
}

module.exports = new CacheManager();
```

## Scaling Strategies

### Horizontal Scaling Setup

```yaml
# kubernetes deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deepwikiopen
spec:
  replicas: 4
  selector:
    matchLabels:
      app: deepwikiopen
  template:
    metadata:
      labels:
        app: deepwikiopen
    spec:
      containers:
        - name: deepwikiopen
          image: deepwikiopen:latest
          ports:
            - containerPort: 3000
          env:
            - name: NODE_ENV
              value: "production"
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: db-secret
                  key: url
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
            limits:
              memory: "1Gi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: deepwikiopen-service
spec:
  selector:
    app: deepwikiopen
  ports:
    - port: 80
      targetPort: 3000
  type: LoadBalancer
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: deepwikiopen-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: deepwikiopen
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```

### Auto-scaling Scripts

```bash
#!/bin/bash
# auto-scale.sh

CURRENT_LOAD=$(uptime | awk -F'load average:' '{ print $2 }' | cut -d, -f1 | tr -d ' ')
CPU_CORES=$(nproc)
THRESHOLD=$(echo "$CPU_CORES * 0.8" | bc)

if (( $(echo "$CURRENT_LOAD > $THRESHOLD" | bc -l) )); then
  echo "High load detected: $CURRENT_LOAD"
  # Scale up
  docker-compose up -d --scale deepwikiopen=6
  # Update load balancer
  sudo systemctl reload nginx
  echo "Scaled up to 6 instances"
elif (( $(echo "$CURRENT_LOAD < $(echo "$THRESHOLD * 0.5" | bc)" | bc -l) )); then
  echo "Low load detected: $CURRENT_LOAD"
  # Scale down
  docker-compose up -d --scale deepwikiopen=2
  echo "Scaled down to 2 instances"
fi
```

### Database Read Replica Setup

```sql
-- On primary database
-- Note: wal_level requires a server restart; pg_reload_conf() only applies reloadable settings
ALTER SYSTEM SET wal_level = replica;
ALTER SYSTEM SET max_wal_senders = 3;
ALTER SYSTEM SET max_replication_slots = 3;
SELECT pg_reload_conf();

-- Create replication user
CREATE USER replicator REPLICATION LOGIN CONNECTION LIMIT 1 ENCRYPTED PASSWORD 'secure_password';

-- On replica server (run in a shell, not SQL):
-- pg_basebackup -h primary-server -D /var/lib/postgresql/12/main -U replicator -v -P -W -R
-- PostgreSQL 12+ no longer uses recovery.conf: the -R flag above writes
-- primary_conninfo into postgresql.auto.conf and creates standby.signal.
-- To promote the replica later, run: pg_ctl promote
```

## Disaster Recovery

### Backup Strategy

```bash
#!/bin/bash
# disaster-recovery-backup.sh

BACKUP_ROOT="/backup"
S3_BUCKET="deepwikiopen-disaster-recovery"
DATE=$(date +%Y%m%d_%H%M%S)

# Database backup
pg_dump -h localhost -U postgres deepwikiopen | gzip > "$BACKUP_ROOT/database_$DATE.sql.gz"

# Application files backup
tar -czf "$BACKUP_ROOT/application_$DATE.tar.gz" /var/www/deepwikiopen

# Configuration backup
tar -czf "$BACKUP_ROOT/config_$DATE.tar.gz" /etc/nginx /etc/ssl

# Upload to S3
aws s3 sync \
$BACKUP_ROOT s3://$S3_BUCKET/backups/$DATE/ # Create disaster recovery snapshot aws ec2 create-snapshot --volume-id vol-1234567890abcdef0 --description "DR backup $DATE" # Test backup integrity gunzip -t "$BACKUP_ROOT/database_$DATE.sql.gz" if [ $? -eq 0 ]; then echo "Database backup verified successfully" else echo "Database backup verification failed!" >&2 exit 1 fi # Clean old backups (keep 30 days) find $BACKUP_ROOT -name "*.gz" -mtime +30 -delete aws s3api list-objects-v2 --bucket $S3_BUCKET --query "Contents[?LastModified<='$(date -d '30 days ago' -I)'].Key" --output text | xargs -I {} aws s3 rm s3://$S3_BUCKET/{} ``` ### Recovery Procedures ```bash #!/bin/bash # disaster-recovery-restore.sh BACKUP_DATE=$1 S3_BUCKET="deepwikiopen-disaster-recovery" if [ -z "$BACKUP_DATE" ]; then echo "Usage: $0 YYYYMMDD_HHMMSS" exit 1 fi echo "Starting disaster recovery for backup: $BACKUP_DATE" # Download backups from S3 aws s3 sync s3://$S3_BUCKET/backups/$BACKUP_DATE/ /tmp/restore/ # Stop application sudo systemctl stop deepwikiopen sudo systemctl stop nginx # Restore database sudo -u postgres dropdb deepwikiopen sudo -u postgres createdb deepwikiopen gunzip -c /tmp/restore/database_$BACKUP_DATE.sql.gz | sudo -u postgres psql deepwikiopen # Restore application files tar -xzf /tmp/restore/application_$BACKUP_DATE.tar.gz -C / # Restore configuration tar -xzf /tmp/restore/config_$BACKUP_DATE.tar.gz -C / # Verify restoration sudo -u postgres psql deepwikiopen -c "SELECT COUNT(*) FROM articles;" # Start services sudo systemctl start deepwikiopen sudo systemctl start nginx # Verify application is working curl -f http://localhost/health || { echo "Health check failed after restoration!" 
exit 1 } echo "Disaster recovery completed successfully" ``` ### RTO/RPO Documentation ```yaml # disaster-recovery-plan.yml recovery_objectives: rpo: 1_hour # Recovery Point Objective rto: 4_hours # Recovery Time Objective backup_schedule: database: full: "0 2 * * 0" # Weekly full backup incremental: "0 */6 * * *" # Every 6 hours application: frequency: "0 1 * * *" # Daily configuration: frequency: "0 3 * * 1" # Weekly recovery_procedures: - verify_backup_integrity - provision_infrastructure - restore_database - restore_application - restore_configuration - verify_functionality - update_dns - notify_stakeholders contacts: primary: "ops-team@company.com" backup: "cto@company.com" escalation: "ceo@company.com" ``` ## CI/CD Pipeline ### GitHub Actions Workflow ```yaml # .github/workflows/production-deploy.yml name: Production Deploy on: push: branches: [main] pull_request: branches: [main] env: NODE_VERSION: '18.x' REGISTRY: ghcr.io IMAGE_NAME: deepwikiopen jobs: test: runs-on: ubuntu-latest services: postgres: image: postgres:13 env: POSTGRES_PASSWORD: postgres POSTGRES_DB: deepwikiopen_test options: >- --health-cmd pg_isready --health-interval 10s --health-timeout 5s --health-retries 5 ports: - 5432:5432 redis: image: redis:6 options: >- --health-cmd "redis-cli ping" --health-interval 10s --health-timeout 5s --health-retries 5 ports: - 6379:6379 steps: - uses: actions/checkout@v3 - name: Setup Node.js uses: actions/setup-node@v3 with: node-version: ${{ env.NODE_VERSION }} cache: 'npm' - name: Install dependencies run: npm ci - name: Run linting run: npm run lint - name: Run type checking run: npm run typecheck - name: Run tests run: npm test env: DATABASE_URL: postgresql://postgres:postgres@localhost:5432/deepwikiopen_test REDIS_URL: redis://localhost:6379 - name: Run integration tests run: npm run test:integration env: DATABASE_URL: postgresql://postgres:postgres@localhost:5432/deepwikiopen_test REDIS_URL: redis://localhost:6379 security-scan: runs-on: 
ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Run security audit
        run: npm audit --audit-level high
      - name: Run Snyk security scan
        uses: snyk/actions/node@master
        env:
          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}

  build:
    needs: [test, security-scan]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Setup Docker Buildx
        uses: docker/setup-buildx-action@v2
      - name: Login to Container Registry
        uses: docker/login-action@v2
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v4
        with:
          images: ${{ env.REGISTRY }}/${{ github.repository }}/${{ env.IMAGE_NAME }}
          tags: |
            type=ref,event=branch
            type=ref,event=pr
            type=sha,prefix=sha-
      - name: Build and push Docker image
        uses: docker/build-push-action@v4
        with:
          context: .
          file: ./Dockerfile.production
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

  deploy-staging:
    needs: build
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    environment: staging
    steps:
      - name: Deploy to staging
        run: |
          # Deploy to staging environment
          echo "Deploying to staging..."
          # Add your staging deployment commands here

  deploy-production:
    needs: [build, deploy-staging]
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    environment: production
    steps:
      - name: Deploy to production
        run: |
          # Deploy to production environment
          echo "Deploying to production..."
          # Add your production deployment commands here
      - name: Run smoke tests
        run: |
          # Run post-deployment smoke tests
          curl -f https://deepwikiopen.com/health
          curl -f https://deepwikiopen.com/api/health
```

### Production Dockerfile

```dockerfile
# Dockerfile.production
FROM node:18-alpine AS builder

WORKDIR /app

COPY package*.json ./
# Install all dependencies here: build tooling usually lives in devDependencies,
# so `npm ci --only=production` would make the build step below fail.
# Run `npm prune --omit=dev` after building if image size matters, since the
# production stage copies node_modules from this stage.
RUN npm ci && npm cache clean --force

COPY . .
RUN npm run build FROM node:18-alpine AS production RUN addgroup -g 1001 -S nodejs RUN adduser -S deepwikiopen -u 1001 WORKDIR /app COPY --from=builder --chown=deepwikiopen:nodejs /app/dist ./dist COPY --from=builder --chown=deepwikiopen:nodejs /app/node_modules ./node_modules COPY --from=builder --chown=deepwikiopen:nodejs /app/package.json ./package.json USER deepwikiopen EXPOSE 3000 ENV NODE_ENV=production ENV PORT=3000 HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \ CMD node healthcheck.js CMD ["node", "dist/server.js"] ``` ## Cloud Deployments ### AWS Deployment #### ECS with Fargate ```yaml # aws-ecs-task-definition.json { "family": "deepwikiopen-production", "networkMode": "awsvpc", "requiresCompatibilities": ["FARGATE"], "cpu": "1024", "memory": "2048", "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole", "taskRoleArn": "arn:aws:iam::123456789012:role/deepwikiopenTaskRole", "containerDefinitions": [ { "name": "deepwikiopen", "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/deepwikiopen:latest", "portMappings": [ { "containerPort": 3000, "protocol": "tcp" } ], "environment": [ { "name": "NODE_ENV", "value": "production" } ], "secrets": [ { "name": "DATABASE_URL", "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789012:secret:deepwikiopen/database-url" } ], "logConfiguration": { "logDriver": "awslogs", "options": { "awslogs-group": "/ecs/deepwikiopen", "awslogs-region": "us-east-1", "awslogs-stream-prefix": "ecs" } }, "healthCheck": { "command": ["CMD-SHELL", "curl -f http://localhost:3000/health || exit 1"], "interval": 30, "timeout": 5, "retries": 3, "startPeriod": 60 } } ] } ``` #### Terraform Infrastructure ```hcl # aws-infrastructure.tf provider "aws" { region = "us-east-1" } # VPC resource "aws_vpc" "main" { cidr_block = "10.0.0.0/16" enable_dns_hostnames = true enable_dns_support = true tags = { Name = "deepwikiopen-vpc" } } # Internet Gateway resource "aws_internet_gateway" "main" { vpc_id = 
aws_vpc.main.id tags = { Name = "deepwikiopen-igw" } } # Subnets resource "aws_subnet" "public" { count = 2 vpc_id = aws_vpc.main.id cidr_block = "10.0.${count.index + 1}.0/24" availability_zone = data.aws_availability_zones.available.names[count.index] map_public_ip_on_launch = true tags = { Name = "deepwikiopen-public-${count.index + 1}" } } resource "aws_subnet" "private" { count = 2 vpc_id = aws_vpc.main.id cidr_block = "10.0.${count.index + 10}.0/24" availability_zone = data.aws_availability_zones.available.names[count.index] tags = { Name = "deepwikiopen-private-${count.index + 1}" } } # RDS Database resource "aws_db_instance" "main" { identifier = "deepwikiopen-db" engine = "postgres" engine_version = "13.7" instance_class = "db.t3.medium" allocated_storage = 100 max_allocated_storage = 1000 db_name = "deepwikiopen" username = var.db_username password = var.db_password vpc_security_group_ids = [aws_security_group.rds.id] db_subnet_group_name = aws_db_subnet_group.main.name backup_retention_period = 30 backup_window = "03:00-04:00" maintenance_window = "sun:04:00-sun:05:00" skip_final_snapshot = false final_snapshot_identifier = "deepwikiopen-final-snapshot" tags = { Name = "deepwikiopen-database" } } # ElastiCache Redis resource "aws_elasticache_subnet_group" "main" { name = "deepwikiopen-cache-subnet" subnet_ids = aws_subnet.private[*].id } resource "aws_elasticache_replication_group" "main" { replication_group_id = "deepwikiopen-redis" description = "Redis cluster for DeepWikiOpen" node_type = "cache.t3.micro" port = 6379 parameter_group_name = "default.redis6.x" num_cache_clusters = 2 automatic_failover_enabled = true multi_az_enabled = true subnet_group_name = aws_elasticache_subnet_group.main.name security_group_ids = [aws_security_group.redis.id] at_rest_encryption_enabled = true transit_encryption_enabled = true auth_token = var.redis_auth_token tags = { Name = "deepwikiopen-redis" } } # Application Load Balancer resource "aws_lb" "main" { name = 
"deepwikiopen-alb" internal = false load_balancer_type = "application" security_groups = [aws_security_group.alb.id] subnets = aws_subnet.public[*].id enable_deletion_protection = true tags = { Name = "deepwikiopen-alb" } } # ECS Cluster resource "aws_ecs_cluster" "main" { name = "deepwikiopen" capacity_providers = ["FARGATE"] default_capacity_provider_strategy { capacity_provider = "FARGATE" weight = 1 } setting { name = "containerInsights" value = "enabled" } tags = { Name = "deepwikiopen-cluster" } } ``` ### Google Cloud Platform Deployment ```yaml # gcp-deployment.yaml apiVersion: apps/v1 kind: Deployment metadata: name: deepwikiopen namespace: production spec: replicas: 3 strategy: type: RollingUpdate rollingUpdate: maxSurge: 1 maxUnavailable: 1 selector: matchLabels: app: deepwikiopen template: metadata: labels: app: deepwikiopen spec: containers: - name: deepwikiopen image: gcr.io/your-project/deepwikiopen:latest ports: - containerPort: 3000 env: - name: NODE_ENV value: "production" - name: DATABASE_URL valueFrom: secretKeyRef: name: app-secrets key: database-url - name: REDIS_URL valueFrom: secretKeyRef: name: app-secrets key: redis-url resources: requests: memory: "512Mi" cpu: "250m" limits: memory: "1Gi" cpu: "500m" livenessProbe: httpGet: path: /health port: 3000 initialDelaySeconds: 30 periodSeconds: 10 readinessProbe: httpGet: path: /ready port: 3000 initialDelaySeconds: 5 periodSeconds: 5 --- apiVersion: v1 kind: Service metadata: name: deepwikiopen-service namespace: production spec: selector: app: deepwikiopen ports: - port: 80 targetPort: 3000 type: LoadBalancer --- apiVersion: networking.gke.io/v1 kind: ManagedCertificate metadata: name: deepwikiopen-ssl-cert namespace: production spec: domains: - deepwikiopen.com - www.deepwikiopen.com --- apiVersion: networking.k8s.io/v1 kind: Ingress metadata: name: deepwikiopen-ingress namespace: production annotations: kubernetes.io/ingress.global-static-ip-name: deepwikiopen-ip 
networking.gke.io/managed-certificates: deepwikiopen-ssl-cert kubernetes.io/ingress.class: "gce" spec: rules: - host: deepwikiopen.com http: paths: - path: /* pathType: ImplementationSpecific backend: service: name: deepwikiopen-service port: number: 80 ``` ### Azure Deployment ```yaml # azure-container-instances.yaml apiVersion: 2019-12-01 location: eastus name: deepwikiopen-container-group properties: containers: - name: deepwikiopen properties: image: your-registry.azurecr.io/deepwikiopen:latest resources: requests: cpu: 2 memoryInGB: 4 ports: - port: 3000 protocol: TCP environmentVariables: - name: NODE_ENV value: production - name: DATABASE_URL secureValue: your-database-connection-string - name: REDIS_URL secureValue: your-redis-connection-string osType: Linux restartPolicy: Always ipAddress: type: Public ports: - protocol: tcp port: 3000 dnsNameLabel: deepwikiopen sku: Standard tags: environment: production application: deepwikiopen ``` ## On-Premise Setup ### Server Requirements ```bash # System requirements CPU: 8+ cores RAM: 16GB+ Storage: 500GB+ SSD Network: 1Gbps+ OS: Ubuntu 20.04 LTS or CentOS 8 # Install prerequisites sudo apt update sudo apt install -y nodejs npm nginx postgresql redis-server # Install Docker curl -fsSL https://get.docker.com -o get-docker.sh sh get-docker.sh sudo usermod -aG docker $USER # Install Docker Compose sudo curl -L "https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose sudo chmod +x /usr/local/bin/docker-compose ``` ### Docker Compose Production Setup ```yaml # docker-compose.production.yml version: '3.8' services: app: build: context: . 
dockerfile: Dockerfile.production deploy: replicas: 4 resources: limits: cpus: '1.0' memory: 1G reservations: cpus: '0.5' memory: 512M environment: - NODE_ENV=production - DATABASE_URL=postgresql://postgres:${POSTGRES_PASSWORD}@db:5432/deepwikiopen - REDIS_URL=redis://redis:6379 depends_on: - db - redis networks: - deepwikiopen-network nginx: image: nginx:alpine ports: - "80:80" - "443:443" volumes: - ./nginx.conf:/etc/nginx/nginx.conf:ro - ./ssl:/etc/nginx/ssl:ro depends_on: - app networks: - deepwikiopen-network db: image: postgres:13 environment: - POSTGRES_DB=deepwikiopen - POSTGRES_USER=postgres - POSTGRES_PASSWORD=${POSTGRES_PASSWORD} volumes: - postgres_data:/var/lib/postgresql/data - ./backups:/backups networks: - deepwikiopen-network deploy: resources: limits: cpus: '2.0' memory: 4G redis: image: redis:6-alpine command: redis-server --appendonly yes --requirepass ${REDIS_PASSWORD} volumes: - redis_data:/data networks: - deepwikiopen-network prometheus: image: prom/prometheus ports: - "9090:9090" volumes: - ./prometheus.yml:/etc/prometheus/prometheus.yml - prometheus_data:/prometheus networks: - deepwikiopen-network grafana: image: grafana/grafana ports: - "3000:3000" environment: - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD} volumes: - grafana_data:/var/lib/grafana networks: - deepwikiopen-network volumes: postgres_data: redis_data: prometheus_data: grafana_data: networks: deepwikiopen-network: driver: bridge ``` ### Production Environment Configuration ```bash # .env.production NODE_ENV=production PORT=3000 # Database DATABASE_URL=postgresql://postgres:secure_password@localhost:5432/deepwikiopen DB_POOL_SIZE=25 # Redis REDIS_URL=redis://localhost:6379 REDIS_PASSWORD=secure_redis_password # Security JWT_SECRET=your-super-secure-jwt-secret-minimum-32-characters SESSION_SECRET=your-session-secret-minimum-32-characters ENCRYPTION_KEY=your-32-character-encryption-key # Logging LOG_LEVEL=info LOG_FILE_PATH=/var/log/deepwikiopen/app.log # Performance 
CLUSTER_ENABLED=true CACHE_TTL=3600 COMPRESSION_ENABLED=true # Monitoring PROMETHEUS_ENABLED=true HEALTH_CHECK_INTERVAL=30 # Email (for notifications) SMTP_HOST=smtp.example.com SMTP_PORT=587 SMTP_USER=notifications@deepwikiopen.com SMTP_PASS=smtp_password # External Services CDN_URL=https://cdn.deepwikiopen.com ANALYTICS_ID=your-analytics-id ``` This comprehensive production setup guide covers all aspects of deploying DeepWikiOpen in a production environment, from basic architecture to advanced scaling and disaster recovery strategies. Choose the deployment method that best fits your infrastructure requirements and scale accordingly. # Wiki Generation Guide Source: https://asyncfunc.mintlify.app/guides/wiki-generation Complete guide to generating comprehensive documentation wikis with AI-powered analysis and visualization DeepWiki-Open's core feature is intelligent wiki generation that transforms any repository into comprehensive, navigable documentation. This guide covers all aspects of wiki generation, customization, and optimization. ## How Wiki Generation Works DeepWiki clones and analyzes the repository structure, identifying: * File types and programming languages * Directory organization and architecture patterns * Dependencies and configuration files * Documentation and README files * Test structures and examples Repository successfully analyzed and indexed Creates vector embeddings of code content for intelligent retrieval: * Function and class definitions * Comments and documentation * Configuration settings * API endpoints and interfaces * Database schemas and models Embeddings enable semantic search and context-aware documentation generation. Uses your selected AI model to generate: * Project overview and purpose * Installation and setup instructions * Architecture explanations * Component relationships * Usage examples and best practices Different AI models produce varying documentation styles. Experiment to find your preference. 
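The embedding step described above boils down to ranking code chunks by vector similarity to a query. The sketch below illustrates the idea with plain cosine similarity over number arrays; it is a simplified illustration, not DeepWiki's actual retrieval pipeline, and the `topKChunks` helper and chunk shape are assumptions for the example.

```javascript
// Illustration only: rank code chunks by cosine similarity between their
// embedding vectors and a query embedding. In DeepWiki the vectors would
// come from the configured embedding provider.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper: return the k most relevant chunks for a query embedding
function topKChunks(queryEmbedding, chunks, k = 3) {
  return chunks
    .map((chunk) => ({ ...chunk, score: cosineSimilarity(queryEmbedding, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Semantic search like this is why well-commented code produces better documentation: comments end up in the embedded chunks and pull the right context into each generated page.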
Automatically creates Mermaid diagrams showing: * System architecture * Data flow and processing * Component relationships * Database schemas * API endpoint structures Interactive diagrams generated and embedded in documentation ## Generation Options ### Model Selection Choose the best AI model for your documentation needs: **Best for:** Fast, reliable documentation generation * **`gemini-2.0-flash`** - Latest model with excellent speed/quality balance * **`gemini-1.5-flash`** - Previous generation, very stable * **`gemini-1.0-pro`** - More detailed but slower **Strengths:** * Excellent code understanding * Fast generation times * Good diagram creation * Strong multilingual support **Use when:** You want reliable, fast documentation for most projects **Best for:** Detailed, high-quality documentation * **`gpt-4o`** - Latest GPT-4 with multimodal capabilities * **`o1`** - Reasoning-focused model for complex analysis * **`o3`** - Advanced reasoning and analysis * **`o4-mini`** - Lightweight but capable option **Strengths:** * Exceptional writing quality * Deep technical analysis * Excellent code explanations * Strong architectural insights **Use when:** You need detailed, publication-quality documentation **Best for:** Model comparison and specialized needs * **`anthropic/claude-3.5-sonnet`** - Excellent code analysis * **`deepseek/deepseek-r1`** - Strong reasoning capabilities * **`openai/gpt-4o`** - GPT-4 through OpenRouter * **100+ other models** available **Strengths:** * Access to multiple providers * Easy model comparison * Specialized models for specific tasks * Cost optimization options **Use when:** You want to experiment with different models or need specific capabilities **Best for:** Privacy, cost control, offline generation * **`llama3:8b`** - Good balance of quality and speed * **`qwen3:8b`** - Excellent for code understanding * **`qwen3:1.7b`** - Lightweight for smaller projects **Strengths:** * Complete data privacy * No API costs * Offline capability 
* Customizable models **Use when:** Privacy is critical or you want to avoid API costs ### Generation Parameters **When to use:** * Repository has been significantly updated * You want to try a different AI model * Previous generation had errors * You want fresh documentation with latest model improvements ```javascript { "force_regenerate": true, "reason": "Updated to use new architecture patterns" } ``` Force regeneration will overwrite existing cached documentation. Consider backing up important custom modifications. **Required for:** * Private GitHub repositories * Private GitLab repositories * Private BitBucket repositories * Organizations with restricted access **Token permissions needed:** * GitHub: `repo` scope (full repository access) * GitLab: `read_repository` scope * BitBucket: `Repositories: Read` permission ```javascript { "repo_url": "https://github.com/company/private-repo", "access_token": "ghp_xxxxxxxxxxxxxxxxxxxx" } ``` Tokens are used only for repository access and are not stored permanently. 
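The request-body fields shown above (`repo_url`, `access_token`, `force_regenerate`) can be assembled client-side before calling the API. The helper below is a minimal sketch; the function name and the host-validation rule are assumptions for the example, and the exact endpoint path is deployment-specific, so consult your API reference before sending the payload.

```javascript
// Sketch: build a wiki-generation request body from the fields documented above.
function buildGenerationRequest({ repoUrl, accessToken, forceRegenerate = false }) {
  // Assumed rule for the example: only the three hosts mentioned in this guide
  if (!/^https:\/\/(github\.com|gitlab\.com|bitbucket\.org)\//.test(repoUrl)) {
    throw new Error(`Unsupported repository host: ${repoUrl}`);
  }
  const body = { repo_url: repoUrl, force_regenerate: forceRegenerate };
  if (accessToken) {
    body.access_token = accessToken; // required for private repositories
  }
  return body;
}

// Example: private GitHub repository, token read from the environment
const request = buildGenerationRequest({
  repoUrl: 'https://github.com/company/private-repo',
  accessToken: process.env.GITHUB_TOKEN,
});
```

Keeping the token in an environment variable rather than in source mirrors the guidance above: the token is sent only for repository access and never persisted.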
Advanced users can adjust model parameters: ```javascript { "model_config": { "temperature": 0.7, // Creativity vs consistency (0.0-1.0) "top_p": 0.8, // Response diversity (0.0-1.0) "max_tokens": 4000, // Maximum response length "top_k": 20 // Token selection diversity (Gemini only) } } ``` **Parameter effects:** * **Lower temperature** (0.1-0.3): More consistent, factual documentation * **Higher temperature** (0.7-0.9): More creative, varied explanations * **Lower top\_p** (0.3-0.5): More focused responses * **Higher top\_p** (0.8-1.0): More diverse vocabulary and examples ## Repository Types & Optimization ### Programming Languages DeepWiki optimizes documentation generation for different languages: **JavaScript/TypeScript:** * React, Vue, Angular component analysis * Node.js server architecture * API endpoint documentation * Package.json and dependency analysis **Python:** * Django/Flask application structure * FastAPI endpoint documentation * Class and function analysis * Requirements and virtual environment setup **Examples:** * Express.js servers → API endpoint documentation * React apps → Component hierarchy and props * Django projects → Model, view, template analysis **React Native:** * Screen navigation flow * Component reusability analysis * Platform-specific implementations * Build and deployment processes **Flutter:** * Widget tree analysis * State management patterns * Platform channel documentation * Build configuration explanation **iOS/Android:** * Native architecture patterns * Framework usage analysis * Build system documentation * Platform-specific features **Go:** * Package structure analysis * HTTP handler documentation * Goroutine usage patterns * Build and deployment guides **Java/Kotlin:** * Spring Boot application analysis * Maven/Gradle build documentation * Package structure explanation * Configuration analysis **C#/.NET:** * Solution and project analysis * ASP.NET Core documentation * NuGet package analysis * Configuration and 
appsettings **Python (Data Science):** * Jupyter notebook analysis * Data pipeline documentation * Model architecture explanation * Requirements and environment setup **R:** * Package and function analysis * Statistical analysis documentation * Data workflow explanation * Visualization code explanation **SQL:** * Database schema analysis * Query optimization documentation * Stored procedure explanation * Migration and setup guides ### Repository Size Optimization **Characteristics:** * Fast generation (30 seconds - 2 minutes) * Comprehensive coverage of all files * Detailed analysis of each component **Optimization tips:** * Use any model (all will perform well) * Enable detailed analysis * Include all file types * Generate comprehensive diagrams **Example repositories:** * Personal projects * Small libraries * Configuration repositories * Simple applications **Characteristics:** * Moderate generation time (2-10 minutes) * Focus on important files and patterns * Good balance of detail and overview **Optimization tips:** * Use fast models like `gemini-2.0-flash` * Focus on core directories * Skip generated/compiled files * Prioritize documented code **Example repositories:** * Open source libraries * Medium-sized applications * Framework implementations * Multi-component projects **Characteristics:** * Longer generation time (10-30 minutes) * High-level architecture focus * Selective detailed analysis * Emphasis on main components **Optimization tips:** * Use efficient models (Gemini Flash series) * Configure file filters * Focus on main source directories * Skip test files for initial generation * Use incremental regeneration **Example repositories:** * Large frameworks (React, Vue, Angular) * Enterprise applications * Monorepos with multiple projects * Complex distributed systems ## Customizing Generated Documentation ### Content Customization DeepWiki automatically adapts to repository types, but you can customize the focus: ```javascript { "focus_areas": [ 
"architecture_patterns", "api_documentation", "deployment_setup", "security_implementation" ], "exclude_areas": [ "test_files", "generated_code", "vendor_dependencies" ] } ``` Control the level of detail in generated documentation: * **High Detail**: Complete analysis of all components * **Medium Detail**: Focus on public APIs and main components * **Overview**: High-level architecture and key features only ```javascript { "detail_level": "medium", "include_private_methods": false, "focus_on_public_api": true } ``` Specify which types of diagrams to generate: * **Architecture diagrams**: System components and relationships * **Data flow diagrams**: Information processing flow * **Database diagrams**: Schema and relationships * **API diagrams**: Endpoint structure and data flow * **Process diagrams**: Workflow and business logic ```javascript { "diagram_types": [ "architecture", "data_flow", "api_structure" ] } ``` ### Output Format Options **Format:** Hierarchical pages with cross-references **Best for:** * General documentation browsing * Team onboarding * Project understanding * Code exploration **Features:** * Navigation tree * Search functionality * Cross-page linking * Embedded diagrams **Format:** OpenAPI-style documentation **Best for:** * REST API projects * GraphQL APIs * Library documentation * Integration guides **Features:** * Endpoint documentation * Request/response examples * Authentication details * Error code explanations **Format:** System design documentation **Best for:** * Complex applications * Microservice architectures * Infrastructure projects * Technical decision records **Features:** * System diagrams * Component interactions * Data flow analysis * Technology stack overview ## Quality Optimization ### Improving Documentation Quality **Before generation, optimize your repository:** 1. **Update README.md** with current project information 2. **Add code comments** for complex logic 3. 
**Update package.json/requirements.txt** with current dependencies 4. **Add configuration examples** (.env.example, config samples) 5. **Include API documentation** (OpenAPI specs, GraphQL schemas) **Impact:** 40-60% improvement in documentation accuracy and completeness **Match models to repository characteristics:** * **Simple projects**: Use Gemini Flash for speed * **Complex architectures**: Use GPT-4o for depth * **API-heavy projects**: Use Claude 3.5 Sonnet via OpenRouter * **Data projects**: Use models with strong analytical capabilities **A/B Testing approach:** 1. Generate with the fast model first (Gemini Flash) 2. If quality is insufficient, regenerate with a premium model (GPT-4o) 3. Compare results and choose the best approach for similar projects **Use the Ask feature to improve documentation:** 1. Generate the initial wiki 2. Ask specific questions about unclear sections 3. Use Deep Research for complex topics 4. Incorporate the answers into your understanding of the codebase 5. Regenerate sections with better context **Example workflow:**

```
1. Generate wiki → Review architecture section
2. Ask: "Explain the database connection pooling implementation"
3. Get detailed answer → Understand missing context
4. Regenerate with better understanding
```

### Troubleshooting Generation Issues **Symptoms:** * Missing pages or sections * Truncated content * Error messages during generation **Solutions:** 1. **Check API limits**: Verify your AI provider has sufficient quota 2. **Reduce scope**: Start with smaller directories or file sets 3. **Try a different model**: Some models handle large contexts better 4. **Check logs**: Look for specific error messages in the API logs

```bash
# Check generation logs
tail -f ./api/logs/application.log

# Look for specific errors like:
# - "Token limit exceeded"
# - "Repository access denied"
# - "Model timeout"
```

**Symptoms:** * Generic or inaccurate descriptions * Missing technical details * Incorrect architecture analysis **Solutions:** 1.
**Improve repository documentation**: Add README, comments, examples 2. **Use higher-quality models**: Switch from Flash to GPT-4o 3. **Enable Deep Research**: For complex analysis tasks 4. **Provide more context**: Add configuration files, API specs **Before/After example:**

```
Before: "This is a web application built with JavaScript"

After: "Express.js REST API with MongoDB integration, featuring JWT authentication, rate limiting, and comprehensive error handling middleware"
```

**Symptoms:** * Process hangs or takes extremely long * Browser timeout errors * Partial results only **Solutions:** 1. **Break into smaller chunks**: Process subdirectories separately 2. **Use faster models**: Gemini Flash series for speed 3. **Increase timeout limits**: In API configuration 4. **Optimize repository**: Remove large binary files, generated code

```javascript
// Increase timeout in API calls
{
  "timeout": 600000,          // 10 minutes
  "chunk_size": "small",
  "skip_large_files": true
}
```

## Advanced Features ### Multi-Language Support DeepWiki automatically detects and optimizes for repository languages: **Automatic detection of:** * Primary language (most files) * Secondary languages * Framework combinations * Build system integration **Smart documentation:** * Language-specific setup instructions * Cross-language integration points * Build pipeline explanation * Dependency management per language **Generated documentation adapts to:** * Code comment language * README language detection * Variable naming conventions * Documentation style preferences **Supported languages:** * English (default) * Chinese (Simplified/Traditional) * Japanese * Spanish * Korean * And more via AI model capabilities ### Integration Workflows Automate wiki generation in your development pipeline:

```yaml
# GitHub Actions example
name: Generate Documentation

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  docs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Generate Wiki
        run: |
          curl -X POST "${{ secrets.DEEPWIKI_URL }}/wiki/generate" \
            -H "Content-Type: application/json" \
            -d '{
              "repo_url": "${{ github.server_url }}/${{ github.repository }}",
              "model_provider": "google",
              "force_regenerate": true
            }'
```

Automatically update documentation on repository changes:

```javascript
// Webhook handler example
app.post('/webhook/repository', (req, res) => {
  const { repository, commits } = req.body;

  // Check if significant changes occurred
  const significantFiles = commits.some(commit =>
    commit.modified.some(file =>
      file.includes('src/') ||
      file.includes('README') ||
      file.includes('package.json')
    )
  );

  if (significantFiles) {
    // Trigger documentation regeneration
    generateWiki({
      repo_url: repository.html_url,
      force_regenerate: true
    });
  }
});
```

## Next Steps

* Learn to use the interactive Q\&A system with your generated wikis
* Conduct multi-turn AI research for complex code analysis
* Understand and customize the generated visual diagrams
* Deploy DeepWiki for team and enterprise use

# Architecture Overview Source: https://asyncfunc.mintlify.app/reference/architecture Deep dive into DeepWiki's system architecture, components, and design decisions # Architecture Overview DeepWiki is a modern AI-powered documentation generation system built with a scalable microservices architecture. This document provides a comprehensive overview of the system's architecture, components, and design decisions. ## High-Level System Architecture
```mermaid graph TB %% Client Layer subgraph "Client Layer" Browser[Web Browser] CLI[CLI Tools] end %% Application Layer subgraph "Frontend (Next.js)" UI[React Components] SSR[Server-Side Rendering] API_Routes[API Routes] WS_Client[WebSocket Client] end %% API Gateway subgraph "API Layer" FastAPI[FastAPI Server] WS_Handler[WebSocket Handler] REST[REST Endpoints] end %% Core Services subgraph "Core Services" RAG[RAG Pipeline] Embedder[Embedding Service] Generator[Content Generator] Cache[Wiki Cache Manager] end %% AI Providers subgraph "AI Providers" OpenAI[OpenAI API] Google[Google Gemini] Bedrock[AWS Bedrock] Azure[Azure AI] Ollama[Ollama Local] end %% Storage Layer subgraph "Storage" FileCache[File System Cache] VectorDB[FAISS Vector DB] Memory[Conversation Memory] end %% External Services subgraph "External" GitHub[GitHub API] GitLab[GitLab API] end %% Connections Browser --> UI CLI --> REST UI --> SSR UI --> API_Routes UI --> WS_Client SSR --> FastAPI API_Routes --> FastAPI WS_Client --> WS_Handler FastAPI --> RAG WS_Handler --> RAG FastAPI --> Cache RAG --> Embedder RAG --> Generator RAG --> Memory Embedder --> VectorDB Generator --> OpenAI Generator --> Google Generator --> Bedrock Generator --> Azure Generator --> Ollama Cache --> FileCache FastAPI --> GitHub FastAPI --> GitLab ```
## Frontend Architecture The frontend is built with Next.js 14, leveraging modern React patterns and TypeScript for type safety. ### Component Structure
```mermaid graph TB subgraph "Page Components" HomePage[Home Page] WikiPage[Wiki Page] ProjectsPage[Projects Page] SlidesPage[Slides Page] WorkshopPage[Workshop Page] end subgraph "Feature Components" Ask[Ask Component] WikiTree[Wiki Tree View] ProcessedProjects[Projects List] ModelSelection[Model Selector] ConfigModal[Configuration] end subgraph "UI Components" Markdown[Markdown Renderer] Mermaid[Mermaid Diagrams] ThemeToggle[Theme Toggle] TokenInput[Token Input] UserSelector[User Selector] end subgraph "Context Providers" LanguageCtx[Language Context] ThemeProvider[Theme Provider] end subgraph "Utilities" WebSocketClient[WebSocket Client] URLDecoder[URL Decoder] RepoUtils[Repository Utils] end HomePage --> Ask HomePage --> ModelSelection WikiPage --> WikiTree WikiPage --> Ask ProjectsPage --> ProcessedProjects Ask --> WebSocketClient Ask --> Markdown Markdown --> Mermaid HomePage --> LanguageCtx WikiPage --> LanguageCtx ```
### Key Frontend Features 1. **Server-Side Rendering (SSR)** * Pre-renders pages for better SEO and initial load performance * Dynamic route generation for repository pages 2. **Real-time Communication** * WebSocket connection for streaming chat responses * Progressive content rendering 3. **Internationalization (i18n)** * Support for 10+ languages * Context-based language switching 4. **Responsive Design** * Mobile-first approach * Adaptive layouts for different screen sizes ## Backend Architecture The backend is built with FastAPI, providing high-performance async capabilities and automatic API documentation. ### API Structure
```mermaid graph LR subgraph "API Endpoints" Auth["/auth/*"] Chat["/chat/*"] Wiki["/api/wiki_cache"] Models["/models/config"] Export["/export/wiki"] Health["/health"] end subgraph "Core Modules" Main[main.py] API[api.py] RAG_Module[rag.py] Pipeline[data_pipeline.py] end subgraph "Client Adapters" OpenAI_Client[openai_client.py] Google_Client[gemini_client.py] Bedrock_Client[bedrock_client.py] Azure_Client[azureai_client.py] Ollama_Client[ollama_patch.py] end API --> Auth API --> Chat API --> Wiki API --> Models API --> Export Chat --> RAG_Module Wiki --> Pipeline RAG_Module --> OpenAI_Client RAG_Module --> Google_Client RAG_Module --> Bedrock_Client ```
### RAG Pipeline Architecture The Retrieval-Augmented Generation (RAG) pipeline is the core of DeepWiki's intelligence.
```mermaid graph TB subgraph "RAG Pipeline" Input[User Query] subgraph "Retrieval Phase" Embedder[Text Embedder] VectorSearch[FAISS Vector Search] ContextRetrieval[Context Retrieval] end subgraph "Augmentation Phase" ContextProcessor[Context Processor] MemoryIntegration[Memory Integration] PromptBuilder[Prompt Builder] end subgraph "Generation Phase" ModelSelector[Model Selector] Generator[Content Generator] StreamProcessor[Stream Processor] end Output[Generated Response] end Input --> Embedder Embedder --> VectorSearch VectorSearch --> ContextRetrieval ContextRetrieval --> ContextProcessor ContextProcessor --> MemoryIntegration MemoryIntegration --> PromptBuilder PromptBuilder --> ModelSelector ModelSelector --> Generator Generator --> StreamProcessor StreamProcessor --> Output ```
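The three phases in the diagram can be traced end to end in a few lines. The sketch below is purely illustrative: it uses a toy bag-of-words "embedder" and a pluggable `generate` callback in place of the real OpenAI/Gemini embeddings, FAISS index, and model clients.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding'; stands in for the real embedding service."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rag_answer(query, documents, history, generate, top_k=2):
    """Retrieval -> augmentation -> generation, mirroring the pipeline above."""
    # Retrieval phase: rank documents against the query embedding
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    context = "\n".join(ranked[:top_k])
    # Augmentation phase: fold retrieved context and conversation memory into one prompt
    prompt = f"History:\n{history}\n\nContext:\n{context}\n\nQuestion: {query}"
    # Generation phase: delegate to whichever model client was selected
    return generate(prompt)
```

In the real pipeline the retrieval step is a FAISS vector search and `generate` streams tokens from the selected provider; the control flow is the same.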
## Data Flow ### Wiki Generation Flow
```mermaid sequenceDiagram participant User participant Frontend participant API participant Cache participant RAG participant VCS participant AI User->>Frontend: Request Wiki for Repo Frontend->>API: Check Wiki Cache API->>Cache: Query Cache alt Cache Hit Cache-->>API: Return Cached Wiki API-->>Frontend: Send Wiki Data else Cache Miss API->>VCS: Fetch Repository VCS-->>API: Repository Data API->>RAG: Process Repository RAG->>AI: Generate Wiki Structure AI-->>RAG: Wiki Structure loop For Each Page RAG->>AI: Generate Page Content AI-->>RAG: Page Content end RAG-->>API: Complete Wiki API->>Cache: Store Wiki API-->>Frontend: Send Wiki Data end Frontend-->>User: Display Wiki ```
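The cache-hit/cache-miss branch in the sequence diagram reduces to a classic cache-aside pattern. A minimal sketch, with the VCS fetch and RAG generation passed in as callbacks (the function and parameter names are illustrative, not the project's actual API):

```python
def get_or_generate_wiki(repo_url, cache, fetch_repo, generate_wiki):
    """Cache-aside flow from the diagram: a hit returns cached data; a miss
    fetches the repository, generates the wiki, and stores the result."""
    if repo_url in cache:                 # cache hit
        return cache[repo_url], True
    repo_data = fetch_repo(repo_url)      # cache miss: fetch from GitHub/GitLab
    wiki = generate_wiki(repo_data)       # RAG pipeline + AI generation
    cache[repo_url] = wiki                # persist for the next request
    return wiki, False
```

The second return value distinguishes a hit from a miss, which is useful for logging and for surfacing generation progress to the frontend.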
### Real-time Chat Flow
```mermaid sequenceDiagram participant User participant Frontend participant WebSocket participant RAG participant VectorDB participant AI User->>Frontend: Ask Question Frontend->>WebSocket: Open Connection WebSocket->>RAG: Process Query RAG->>VectorDB: Search Context VectorDB-->>RAG: Relevant Documents RAG->>AI: Generate Response AI-->>RAG: Stream Tokens loop Streaming RAG-->>WebSocket: Send Token WebSocket-->>Frontend: Update UI Frontend-->>User: Display Token end WebSocket->>WebSocket: Close Connection ```
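The streaming loop in the diagram forwards each token to the client as it arrives, while also accumulating the full response (for example, to store in conversation memory once the stream closes). A simplified sketch with a stand-in `send` callback in place of the WebSocket connection:

```python
def stream_response(token_stream, send):
    """Forward each generated token to the client as it arrives, mirroring
    the streaming loop above, and return the assembled final text."""
    full = []
    for token in token_stream:
        send(token)          # progressive rendering on the client
        full.append(token)
    return "".join(full)     # complete response, e.g. for conversation memory
```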
## Component Interactions ### Core Service Dependencies
```mermaid graph TB subgraph "Service Layer" API[API Service] RAG[RAG Service] Cache[Cache Service] Embed[Embedding Service] end subgraph "Data Layer" FileSystem[File System] VectorStore[Vector Store] Memory[Memory Store] end subgraph "External" AIProviders[AI Providers] VCSProviders[VCS Providers] end API --> RAG API --> Cache RAG --> Embed Cache --> FileSystem Embed --> VectorStore RAG --> Memory RAG --> AIProviders API --> VCSProviders style API fill:#f9f,stroke:#333,stroke-width:4px style RAG fill:#bbf,stroke:#333,stroke-width:4px ```
## Database and Storage Design ### Storage Architecture
```mermaid graph TB subgraph "Storage Types" subgraph "File System Storage" WikiCache["Wiki Cache<br/>~/.adalflow/wikicache/"] Embeddings["Embeddings<br/>~/.adalflow/embeddings/"] Logs["Application Logs<br/>./api/logs/"] end subgraph "Vector Storage" FAISS[FAISS Index] EmbeddingVectors[Embedding Vectors] end subgraph "Memory Storage" ConversationHistory[Conversation History] DialogTurns[Dialog Turns] end end subgraph "Data Models" WikiStructure[Wiki Structure Model] WikiPage[Wiki Page Model] ProcessedProject[Processed Project Model] CacheMetadata[Cache Metadata] end WikiStructure --> WikiCache WikiPage --> WikiCache ProcessedProject --> WikiCache CacheMetadata --> WikiCache EmbeddingVectors --> FAISS ConversationHistory --> DialogTurns ```
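The conversation memory shown in the diagram keeps recent dialog turns in a bounded buffer so long sessions don't grow without limit. A sketch using `collections.deque` with `maxlen` (the class name and turn shape are illustrative):

```python
from collections import deque

class ConversationMemory:
    """Keep only the most recent dialog turns: a fixed-size circular buffer."""

    def __init__(self, max_turns=10):
        # deque with maxlen evicts the oldest turn automatically on overflow
        self.turns = deque(maxlen=max_turns)

    def add_turn(self, question, answer):
        self.turns.append({"question": question, "answer": answer})

    def as_history(self):
        """Return the retained turns, oldest first, for prompt construction."""
        return list(self.turns)
```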
### Cache Key Structure ``` deepwiki_cache_{repo_type}_{owner}_{repo}_{language}.json Example: deepwiki_cache_github_microsoft_vscode_en.json ``` ## Caching Strategies ### Multi-Level Caching 1. **Wiki Cache (Persistent)** * Full wiki structures stored as JSON * Language-specific caching * File-based for durability 2. **Vector Cache (In-Memory)** * FAISS indices for fast retrieval * Rebuilt on startup from embeddings 3. **Conversation Cache (Session)** * Dialog history per session * Memory-efficient circular buffer ### Cache Invalidation
```mermaid flowchart LR subgraph "Invalidation Triggers" RepoUpdate[Repository Update] ModelChange[Model Change] UserRequest[User Request] TTL[Time-To-Live] end subgraph "Cache Actions" Invalidate[Invalidate Entry] Regenerate[Regenerate Content] Update[Update Metadata] end RepoUpdate --> Invalidate ModelChange --> Invalidate UserRequest --> Invalidate TTL --> Invalidate Invalidate --> Regenerate Regenerate --> Update ```
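The cache key scheme described above composes repository type, owner, name, and language into one filename, and the invalidation triggers in the flowchart then need to drop every language variant of a repository at once. A small sketch of both halves (the helper names are illustrative, not the project's exact implementation):

```python
def wiki_cache_key(repo_type, owner, repo, language):
    """Build a cache filename, e.g. deepwiki_cache_github_microsoft_vscode_en.json"""
    return f"deepwiki_cache_{repo_type}_{owner}_{repo}_{language}.json"

def invalidate(cache_files, repo_type, owner, repo):
    """Return the cache listing with every language variant of one repository
    removed, as triggered by the invalidation events in the flowchart above."""
    prefix = f"deepwiki_cache_{repo_type}_{owner}_{repo}_"
    return [f for f in cache_files if not f.startswith(prefix)]
```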
## Security Architecture ### Authentication & Authorization
```mermaid graph TB subgraph "Auth Flow" Client[Client Request] AuthCheck[Auth Check] CodeValidation[Code Validation] TokenValidation[Token Validation] Access[Grant Access] end subgraph "Security Layers" CORS[CORS Middleware] RateLimit[Rate Limiting] InputValidation[Input Validation] APIKeys[API Key Management] end Client --> CORS CORS --> AuthCheck AuthCheck --> CodeValidation AuthCheck --> TokenValidation CodeValidation --> Access TokenValidation --> Access CORS --> RateLimit RateLimit --> InputValidation InputValidation --> APIKeys ```
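The two validation branches in the diagram (secret code vs. repository token) can be sketched as one gate function. This is a simplified illustration: the header names and the open-when-unconfigured behavior are assumptions, and real token validation is delegated to the VCS provider.

```python
def authorize_request(headers, wiki_auth_code=None):
    """Grant access if the shared secret code matches or a bearer token is
    present, mirroring the auth-flow diagram above (illustrative only)."""
    # Code validation branch: compare against the configured secret
    if wiki_auth_code is not None and headers.get("X-Auth-Code") == wiki_auth_code:
        return True
    # Token validation branch: a bearer token is checked against the VCS API
    token = headers.get("Authorization", "")
    if token.startswith("Bearer ") and len(token) > len("Bearer "):
        return True
    # No credentials: allow only when no secret code is configured (open mode)
    return wiki_auth_code is None
```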
### Security Features 1. **API Security** * CORS configuration for cross-origin requests * Environment-based API key management * Input validation and sanitization 2. **Data Protection** * Token-based repository access * Secure storage of credentials * Encrypted communication channels 3. **Rate Limiting** * Request throttling per IP * WebSocket connection limits * Resource usage monitoring ## Scalability Considerations ### Horizontal Scaling
```mermaid graph TB subgraph "Load Balancer" LB[Nginx/HAProxy] end subgraph "Application Instances" App1[DeepWiki Instance 1] App2[DeepWiki Instance 2] App3[DeepWiki Instance 3] end subgraph "Shared Storage" SharedCache[Shared File System] SharedVector[Distributed Vector DB] end LB --> App1 LB --> App2 LB --> App3 App1 --> SharedCache App2 --> SharedCache App3 --> SharedCache App1 --> SharedVector App2 --> SharedVector App3 --> SharedVector ```
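With shared storage behind several instances, sticky routing helps each instance reuse its warm in-memory FAISS indices. The document does not prescribe a routing policy; the sketch below shows one common option, rendezvous (highest-random-weight) hashing, which keeps a repository pinned to the same instance even as the pool changes.

```python
import hashlib

def pick_instance(repo_url, instances):
    """Route a repository to a stable instance via rendezvous hashing:
    the same repo_url always wins on the same instance (illustrative sketch)."""
    def weight(instance):
        # Deterministic pseudo-random weight per (instance, repo) pair
        return hashlib.sha256(f"{instance}|{repo_url}".encode()).hexdigest()
    return max(instances, key=weight)
```

Removing one instance only remaps the repositories that were pinned to it, which is gentler on caches than modulo-based hashing.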
### Performance Optimizations 1. **Async Processing** * Non-blocking I/O operations * Concurrent request handling * Stream processing for large responses 2. **Resource Management** * Memory limits (6GB max, 2GB reserved) * Connection pooling * Garbage collection optimization 3. **Caching Strategy** * Pre-computed wiki structures * Embedding reuse * Response streaming ## Centralized Prompt Management As of commit 36777d3, DeepWiki implements a centralized prompt management system for better maintainability and consistency across the application. ### Prompt Architecture
```mermaid graph TB subgraph "Prompt Management System" PromptModule[prompts.py] subgraph "System Prompts" RAGSystem[RAG_SYSTEM_PROMPT] ResearchFirst[DEEP_RESEARCH_FIRST_ITERATION_PROMPT] ResearchInter[DEEP_RESEARCH_INTERMEDIATE_ITERATION_PROMPT] ResearchFinal[DEEP_RESEARCH_FINAL_ITERATION_PROMPT] SimpleChat[SIMPLE_CHAT_SYSTEM_PROMPT] end subgraph "Templates" RAGTemplate[RAG_TEMPLATE] end subgraph "Consumers" RAGModule[rag.py] SimpleChatModule[simple_chat.py] end end PromptModule --> RAGSystem PromptModule --> ResearchFirst PromptModule --> ResearchInter PromptModule --> ResearchFinal PromptModule --> SimpleChat PromptModule --> RAGTemplate RAGModule --> PromptModule SimpleChatModule --> PromptModule ```
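The three Deep Research prompts in the diagram are selected by iteration position. The prompt names below match those in `prompts.py`; the selection function itself is an illustrative sketch, not the project's actual dispatch code.

```python
def select_research_prompt(iteration, total_iterations):
    """Pick the Deep Research prompt for the current iteration, following the
    three-stage scheme: first / intermediate / final."""
    if iteration == 1:
        return "DEEP_RESEARCH_FIRST_ITERATION_PROMPT"
    if iteration >= total_iterations:
        return "DEEP_RESEARCH_FINAL_ITERATION_PROMPT"
    return "DEEP_RESEARCH_INTERMEDIATE_ITERATION_PROMPT"
```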
### Prompt Types and Usage #### 1. RAG System Prompt * **Purpose**: Core prompt for RAG-based question answering * **Features**: * Language detection and response matching * Markdown formatting guidelines * Context integration with past conversations * Code syntax highlighting support #### 2. Deep Research Prompts Three-stage research prompt system for comprehensive analysis: * **First Iteration**: Research planning and initial investigation * **Intermediate Iterations**: Building on previous findings * **Final Iteration**: Synthesis and comprehensive conclusion #### 3. Simple Chat System Prompt * **Purpose**: Direct, concise responses without preambles * **Features**: * No markdown headers in responses * Direct answers without filler * Optimized for quick interactions ### Benefits of Centralized Management 1. **Consistency**: All prompts follow the same standards and formatting 2. **Maintainability**: Single source of truth for all prompt templates 3. **Versioning**: Easy to track prompt changes and iterations 4. **Testability**: Prompts can be unit tested and validated 5. **Flexibility**: Easy to add new prompts or modify existing ones ### Implementation Example

```python
# api/prompts.py
"""Module containing all prompts used in the DeepWiki project."""

# Centralized prompt definitions
RAG_SYSTEM_PROMPT = r"""
You are a code assistant which answers user questions on a Github Repo.
You will receive user query, relevant context, and past conversation history.

LANGUAGE DETECTION AND RESPONSE:
- Detect the language of the user's query
- Respond in the SAME language as the user's query
...
"""

# Usage in rag.py
from api.prompts import RAG_SYSTEM_PROMPT, RAG_TEMPLATE

def create_rag_prompt(query, contexts, history):
    return RAG_TEMPLATE.format(
        system_prompt=RAG_SYSTEM_PROMPT,
        input_str=query,
        contexts=contexts,
        conversation_history=history
    )
```

## Technology Stack ### Frontend Technologies

| Category   | Technology       | Purpose                  |
| ---------- | ---------------- | ------------------------ |
| Framework  | Next.js 14       | React framework with SSR |
| Language   | TypeScript       | Type safety              |
| Styling    | Tailwind CSS     | Utility-first CSS        |
| UI Library | Radix UI         | Accessible components    |
| State      | React Context    | Global state management  |
| HTTP       | Fetch API        | REST communication       |
| WebSocket  | Native WebSocket | Real-time communication  |
| Markdown   | react-markdown   | Content rendering        |
| Diagrams   | Mermaid          | Flowchart rendering      |

### Backend Technologies

| Category     | Technology     | Purpose              |
| ------------ | -------------- | -------------------- |
| Framework    | FastAPI        | High-performance API |
| Language     | Python 3.11+   | Backend development  |
| Server       | Uvicorn        | ASGI server          |
| Vector DB    | FAISS          | Similarity search    |
| ML Framework | AdalFlow       | RAG implementation   |
| Logging      | Python logging | Application logs     |
| Environment  | python-dotenv  | Configuration        |

### AI/ML Stack

| Category      | Technology    | Purpose                |
| ------------- | ------------- | ---------------------- |
| Embeddings    | OpenAI/Gemini | Text embeddings        |
| Generation    | Multiple LLMs | Content generation     |
| RAG           | AdalFlow      | Retrieval augmentation |
| Vector Search | FAISS         | Semantic search        |

### Infrastructure

| Category      | Technology     | Purpose               |
| ------------- | -------------- | --------------------- |
| Container     | Docker         | Application packaging |
| Orchestration | Docker Compose | Service management    |
| Storage       | File System    | Persistent storage    |
| Process Mgmt  | Supervisor     | Process control       |

## Deployment Architecture
```mermaid graph TB subgraph "Production Environment" subgraph "Container" Supervisor[Supervisor] NextJS[Next.js Server :3000] FastAPI[FastAPI Server :8001] end subgraph "Volumes" AppData[~/.adalflow/] Logs[./api/logs/] end subgraph "Network" Port3000[Port 3000] Port8001[Port 8001] end end Supervisor --> NextJS Supervisor --> FastAPI NextJS --> AppData FastAPI --> AppData FastAPI --> Logs NextJS --> Port3000 FastAPI --> Port8001 Port3000 --> |External Access| Users Port8001 --> |API Access| Users ```
## Best Practices & Design Principles ### Architecture Principles 1. **Separation of Concerns** * Clear boundaries between frontend and backend * Modular service design * Independent scaling capabilities 2. **Resilience** * Graceful error handling * Fallback mechanisms * Health monitoring 3. **Performance** * Async-first design * Efficient caching * Resource optimization 4. **Maintainability** * Clean code architecture * Comprehensive logging * Type safety throughout ### Development Guidelines 1. **Code Organization** * Feature-based module structure * Consistent naming conventions * Separation of business logic 2. **Testing Strategy** * Unit tests for core logic * Integration tests for APIs * End-to-end testing for critical flows 3. **Documentation** * Inline code documentation * API documentation (OpenAPI) * Architecture decision records ## Future Enhancements ### Planned Improvements 1. **Scalability** * Kubernetes deployment support * Distributed caching with Redis * Message queue integration 2. **Features** * Real-time collaboration * Advanced analytics * Plugin system 3. **Performance** * GraphQL API option * Edge caching * Optimized vector search ### Architecture Evolution The architecture is designed to evolve with: * Microservices decomposition * Event-driven architecture * Serverless function support * Multi-region deployment ## Conclusion DeepWiki's architecture balances performance, scalability, and maintainability while providing a robust platform for AI-powered documentation generation. The modular design allows for easy extension and adaptation to changing requirements. # Configuration Files Source: https://asyncfunc.mintlify.app/reference/configuration-files Complete reference for all DeepWikiOpen configuration files and options # Configuration Files Reference DeepWikiOpen uses multiple configuration files to control various aspects of documentation generation, embedding, and deployment. 
This guide covers all configuration options with complete examples and best practices. ## Configuration Overview DeepWikiOpen configuration follows a hierarchical structure: ``` project-root/ ├── generator.json # Main generation config ├── embedder.json # Embedding settings ├── repo.json # Repository filters ├── .env # Environment variables ├── docker-compose.yml # Docker configuration └── config/ # Custom config directory ├── overrides.json └── templates/ ``` ## 1. generator.json Structure The main configuration file controlling documentation generation and AI model selection. ### Complete Schema ```json { "$schema": "https://deepwikiopen.com/schemas/generator.json", "version": "2.0", "name": "project-name", "description": "Project description", "ai": { "openai": { "default_model": "gpt-5", "supportsCustomModel": true, "models": { "gpt-5": { "temperature": 1.0 }, "gpt-4o": { "temperature": 0.7, "top_p": 0.8 }, "gpt-4o-mini": { "temperature": 0.8 } } }, "google": { "default_model": "gemini-2.0-flash-thinking-exp-1219", "models": { "gemini-2.0-flash-thinking-exp-1219": {}, "gemini-1.5-flash": {}, "gemini-1.5-pro": {} } } }, "output": { "directory": "./docs", "format": "mdx", "clean": true, "preserveStructure": true }, "source": { "directories": ["src", "lib"], "extensions": [".ts", ".tsx", ".js", ".jsx"], "exclude": ["**/*.test.*", "**/node_modules/**"], "followSymlinks": false }, "parsing": { "parser": "typescript", "parserOptions": { "ecmaVersion": 2022, "sourceType": "module", "ecmaFeatures": { "jsx": true } }, "plugins": ["jsdoc", "typescript"] }, "generation": { "includePrivate": false, "includeInternal": false, "groupBy": "category", "sortBy": "alphabetical", "template": "default", "customTemplates": "./templates" }, "features": { "searchIndex": true, "apiReference": true, "changelog": true, "examples": true, "playground": false }, "metadata": { "author": "Your Name", "license": "MIT", "repository": "https://github.com/user/repo", "homepage": 
"https://project.com" } } ``` ### Key Options Explained #### Output Configuration

```json
{
  "output": {
    "directory": "./docs",        // Output directory path
    "format": "mdx",              // Output format: mdx, markdown, html
    "clean": true,                // Clean output dir before generation
    "preserveStructure": true,    // Maintain source directory structure
    "assets": "./assets",         // Static assets directory
    "publicPath": "/docs"         // Base path for deployment
  }
}
```

#### Source Configuration

```json
{
  "source": {
    "directories": ["src", "lib", "packages/**/src"],
    "extensions": [".ts", ".tsx", ".js", ".jsx", ".vue", ".svelte"],
    "exclude": [
      "**/*.test.*",
      "**/*.spec.*",
      "**/node_modules/**",
      "**/dist/**",
      "**/.next/**"
    ],
    "include": ["README.md", "CHANGELOG.md"],
    "followSymlinks": false,
    "maxDepth": 10
  }
}
```

#### Advanced Parsing Options

```json
{
  "parsing": {
    "parser": "typescript",
    "parserOptions": {
      "ecmaVersion": 2022,
      "sourceType": "module",
      "ecmaFeatures": {
        "jsx": true,
        "decorators": true,
        "dynamicImport": true
      }
    },
    "tsconfig": "./tsconfig.json",
    "plugins": ["jsdoc", "typescript", "flow"],
    "customParsers": {
      ".vue": "vue-parser",
      ".svelte": "svelte-parser"
    }
  }
}
```

### AI Model Configuration (Updated) As of commit 05693d5, DeepWiki now supports GPT-5 as the default OpenAI model.
Configure AI providers in `generator.json`:

```json
{
  "ai": {
    "openai": {
      "default_model": "gpt-5",   // Now the default model
      "supportsCustomModel": true,
      "models": {
        "gpt-5": {
          "temperature": 1.0,
          "description": "Latest generation model with advanced reasoning"
        },
        "gpt-4o": {
          "temperature": 0.7,
          "top_p": 0.8,
          "description": "Previous default, still available as fallback"
        }
      }
    },
    "google": {
      "default_model": "gemini-2.0-flash-thinking-exp-1219",
      "models": {
        "gemini-2.0-flash-thinking-exp-1219": {
          "description": "Latest Gemini model with enhanced capabilities"
        }
      }
    },
    "azure": {
      "default_model": "gpt-5",
      "deployment_name": "gpt-5-deployment",
      "api_version": "2025-01-01-preview"
    }
  }
}
```

**Important GPT-5 Requirements:** * Ensure your OpenAI account has GPT-5 API access * GPT-5 may have regional availability limitations * Pricing may differ from GPT-4o models * Consider using GPT-4o as a fallback when GPT-5 is unavailable ## 2. embedder.json Configuration Controls vector embedding generation for semantic search.
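The `chunkSize` and `chunkOverlap` settings in the embedder schema below control how source text is split before embedding. A simplified character-based splitter illustrates their effect; the configured `recursive` splitter additionally respects sentence and code boundaries, so this is a sketch of the idea rather than the real implementation.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    """Split text into overlapping chunks, mirroring chunkSize/chunkOverlap.
    Overlap preserves context that would otherwise be cut at chunk edges."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunkOverlap must be smaller than chunkSize")
    step = chunk_size - chunk_overlap   # stride between chunk starts
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break                       # last chunk reached the end of the text
    return chunks
```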
### Complete Schema ```json { "$schema": "https://deepwikiopen.com/schemas/embedder.json", "version": "1.0", "model": { "provider": "openai", "name": "text-embedding-3-small", "dimensions": 1536, "maxTokens": 8192 }, "processing": { "chunkSize": 1000, "chunkOverlap": 200, "minChunkSize": 100, "splitter": "recursive", "batchSize": 100 }, "storage": { "type": "chroma", "path": "./embeddings", "collection": "docs", "persistence": true }, "indexing": { "includeCode": true, "includeComments": true, "includeDocstrings": true, "languages": ["typescript", "javascript", "python"], "metadata": { "extractTags": true, "extractAuthors": true, "extractDates": true } }, "optimization": { "cache": true, "cacheDir": "./.embedding-cache", "parallel": true, "workers": 4 } } ``` ### Provider-Specific Configurations #### OpenAI Configuration ```json { "model": { "provider": "openai", "name": "text-embedding-3-large", "dimensions": 3072, "apiKey": "${OPENAI_API_KEY}", "baseUrl": "https://api.openai.com/v1" } } ``` #### Ollama Configuration ```json { "model": { "provider": "ollama", "name": "nomic-embed-text", "dimensions": 768, "baseUrl": "http://localhost:11434" } } ``` #### Hugging Face Configuration ```json { "model": { "provider": "huggingface", "name": "sentence-transformers/all-MiniLM-L6-v2", "dimensions": 384, "device": "cuda", "apiKey": "${HF_TOKEN}" } } ``` ## 3. repo.json File Filters Controls file processing and repository limits. 
### Complete Schema ```json { "$schema": "https://deepwikiopen.com/schemas/repo.json", "version": "1.0", "filters": { "include": { "patterns": [ "src/**/*.{ts,tsx,js,jsx}", "lib/**/*.{ts,tsx,js,jsx}", "docs/**/*.{md,mdx}" ], "minSize": 0, "maxSize": 1048576 }, "exclude": { "patterns": [ "**/node_modules/**", "**/.git/**", "**/dist/**", "**/coverage/**", "**/*.min.js" ], "dotfiles": true, "vendored": true, "generated": true } }, "limits": { "maxFiles": 10000, "maxFileSize": 5242880, "maxTotalSize": 104857600, "maxDepth": 20 }, "processing": { "detectLanguage": true, "extractMetadata": true, "parseImports": true, "analyzeDependencies": true }, "languages": { "typescript": { "extensions": [".ts", ".tsx"], "parser": "typescript" }, "javascript": { "extensions": [".js", ".jsx", ".mjs"], "parser": "babel" }, "python": { "extensions": [".py"], "parser": "python" } } } ``` ### Advanced Filtering Examples ```json { "filters": { "include": { "patterns": [ "packages/*/src/**/*.ts", "!packages/*/src/**/*.test.ts" ], "rules": [ { "pattern": "*.config.js", "maxSize": 102400 } ] }, "exclude": { "patterns": ["**/temp/**"], "keywords": ["GENERATED", "AUTO-GENERATED"], "licenses": ["proprietary", "confidential"] } } } ``` ## 4. Environment Variables Reference ### Core Variables ```bash # API Keys OPENAI_API_KEY=sk-... ANTHROPIC_API_KEY=sk-ant-... HF_TOKEN=hf_... # Database DATABASE_URL=postgresql://user:pass@localhost:5432/deepwiki REDIS_URL=redis://localhost:6379 # Storage S3_BUCKET=deepwiki-docs S3_REGION=us-east-1 S3_ACCESS_KEY_ID=... S3_SECRET_ACCESS_KEY=... 
# Server Configuration PORT=3000 HOST=0.0.0.0 NODE_ENV=production LOG_LEVEL=info # Feature Flags ENABLE_EMBEDDINGS=true ENABLE_SEARCH=true ENABLE_ANALYTICS=false ENABLE_CACHE=true # Security JWT_SECRET=your-secret-key ENCRYPTION_KEY=your-encryption-key ALLOWED_ORIGINS=https://example.com,https://app.example.com # Limits MAX_FILE_SIZE=5MB MAX_REQUEST_SIZE=10MB RATE_LIMIT_WINDOW=15m RATE_LIMIT_MAX=100 ``` ### Development Variables ```bash # Development DEV_MODE=true HOT_RELOAD=true VERBOSE_LOGGING=true SKIP_AUTH=true # Testing TEST_DATABASE_URL=postgresql://test:test@localhost:5432/test MOCK_EMBEDDINGS=true ``` ## 5. Docker Configuration ### docker-compose.yml ```yaml version: '3.8' services: app: build: context: . dockerfile: Dockerfile args: NODE_ENV: production environment: - NODE_ENV=production - DATABASE_URL=${DATABASE_URL} - REDIS_URL=redis://redis:6379 ports: - "3000:3000" volumes: - ./docs:/app/docs - ./config:/app/config:ro depends_on: - postgres - redis restart: unless-stopped healthcheck: test: ["CMD", "curl", "-f", "http://localhost:3000/health"] interval: 30s timeout: 10s retries: 3 postgres: image: postgres:15 environment: POSTGRES_DB: deepwiki POSTGRES_USER: ${DB_USER} POSTGRES_PASSWORD: ${DB_PASSWORD} volumes: - postgres_data:/var/lib/postgresql/data ports: - "5432:5432" redis: image: redis:7-alpine volumes: - redis_data:/data ports: - "6379:6379" command: redis-server --appendonly yes nginx: image: nginx:alpine volumes: - ./nginx.conf:/etc/nginx/nginx.conf:ro - ./docs:/usr/share/nginx/html:ro ports: - "80:80" - "443:443" depends_on: - app volumes: postgres_data: redis_data: ``` ### Dockerfile Configuration ```dockerfile FROM node:20-alpine AS builder WORKDIR /app # Copy configuration files COPY package*.json ./ COPY generator.json embedder.json repo.json ./ # Install dependencies RUN npm ci --only=production # Copy source COPY . . 
# Build RUN npm run build # Production stage FROM node:20-alpine WORKDIR /app # Copy built application COPY --from=builder /app/dist ./dist COPY --from=builder /app/node_modules ./node_modules COPY --from=builder /app/*.json ./ # Drop dev dependencies from the runtime image RUN npm prune --omit=dev # Create non-root user RUN addgroup -g 1001 -S nodejs RUN adduser -S nodejs -u 1001 # Set ownership RUN chown -R nodejs:nodejs /app USER nodejs EXPOSE 3000 CMD ["node", "dist/index.js"] ``` ## 6. Custom Configuration Directory ### Directory Structure ``` config/ ├── environments/ │ ├── development.json │ ├── staging.json │ └── production.json ├── features/ │ ├── search.json │ ├── embeddings.json │ └── analytics.json ├── overrides.json └── templates/ ├── page.mdx └── component.mdx ``` ### Environment-Specific Config ```json // config/environments/production.json { "extends": "../generator.json", "output": { "directory": "/var/www/docs", "publicPath": "https://docs.example.com" }, "features": { "playground": false, "analytics": true }, "optimization": { "minify": true, "compress": true, "cache": true } } ``` ### Feature Toggles ```json // config/features/search.json { "search": { "enabled": true, "provider": "algolia", "config": { "appId": "${ALGOLIA_APP_ID}", "apiKey": "${ALGOLIA_API_KEY}", "indexName": "docs" }, "options": { "facets": ["category", "tags", "language"], "snippetLength": 200, "highlightPreTag": "<em>", "highlightPostTag": "</em>" } } } ``` ## 7. Configuration Inheritance and Overrides ### Inheritance Chain ```javascript // Configuration precedence (highest to lowest): 1. Environment variables 2. CLI arguments 3. Override files 4. Environment-specific configs 5. Base configuration files 6. 
Default values ``` ### Override Example ```json // config/overrides.json { "extends": ["./generator.json", "./environments/production.json"], "merge": { "output": { "directory": "/custom/path" } }, "replace": { "features": { "playground": true } } } ``` ### Programmatic Configuration ```javascript // config.js module.exports = { generator: { async beforeGenerate(config) { // Dynamic configuration if (process.env.ENABLE_BETA_FEATURES) { config.features.beta = true; } return config; }, async afterGenerate(result, config) { // Post-processing console.log(`Generated ${result.files.length} files`); } } }; ``` ## 8. Best Practices ### 1. Use Environment Variables for Secrets ```json { "api": { "key": "${API_KEY}", "secret": "${API_SECRET}" } } ``` ### 2. Separate Concerns ``` config/ ├── generation.json # Documentation generation ├── deployment.json # Deployment settings ├── security.json # Security policies └── performance.json # Performance tuning ``` ### 3. Version Control Strategy ```gitignore # .gitignore .env .env.local config/secrets.json config/production.json ``` ### 4. Validation ```javascript // validate-config.js const { validateConfig } = require('deepwikiopen'); async function validate() { try { await validateConfig('./generator.json'); console.log('Configuration is valid'); } catch (error) { console.error('Configuration error:', error); process.exit(1); } } ``` ### 5. Configuration Templates ```json // config/templates/microservice.json { "name": "${SERVICE_NAME}", "source": { "directories": ["./src"], "extensions": [".ts"] }, "output": { "directory": "./docs/${SERVICE_NAME}" } } ``` ## 9. 
Security Considerations ### Sensitive Data Protection ```json { "security": { "encryption": { "enabled": true, "algorithm": "aes-256-gcm", "keySource": "env:ENCRYPTION_KEY" }, "secrets": { "provider": "vault", "endpoint": "${VAULT_ENDPOINT}", "namespace": "deepwiki" } } } ``` ### Access Control ```json { "access": { "authentication": { "required": true, "providers": ["oauth", "saml"] }, "authorization": { "roles": { "admin": ["*"], "developer": ["read", "generate"], "viewer": ["read"] } } } } ``` ### Audit Configuration ```json { "audit": { "enabled": true, "events": ["config.changed", "generation.started", "generation.completed"], "retention": "30d", "storage": "s3://audit-logs" } } ``` ## 10. Migration Between Versions ### Version 1.x to 2.x Migration ```javascript // migrate-config.js const fs = require('fs/promises'); const { migrateConfig } = require('deepwikiopen/migrate'); async function migrate() { const oldConfig = require('./generator.v1.json'); const newConfig = await migrateConfig(oldConfig, { from: '1.x', to: '2.x', transforms: { // Custom transformations 'output.path': 'output.directory', 'parse.include': 'source.extensions' } }); await fs.writeFile('./generator.json', JSON.stringify(newConfig, null, 2)); } migrate().catch(console.error); ``` ### Breaking Changes Checklist ```markdown ## Migration Checklist - [ ] Backup existing configuration - [ ] Update schema version - [ ] Rename deprecated fields - [ ] Update environment variables - [ ] Test with dry-run - [ ] Update CI/CD pipelines - [ ] Update documentation ``` ### Compatibility Mode ```json { "version": "2.0", "compatibility": { "mode": "1.x", "warnings": true, "strict": false } } ``` ## Configuration Debugging ### Debug Mode ```bash # Enable debug logging DEBUG=deepwiki:config npm run generate # Validate configuration npm run validate-config # Show resolved configuration npm run show-config ``` ### Common Issues 1. **Path Resolution** ```json { "paths": { "base": "${__dirname}", "resolve": ["node_modules", "../shared"] } } ``` 2. 
**Environment Variable Expansion** ```json { "database": { "url": "${DATABASE_URL:-postgresql://localhost:5432/dev}" } } ``` 3. **Circular Dependencies** ```json { "extends": "./base.json", "validate": { "circular": false } } ``` ## Next Steps * [Environment Setup Guide](/deployment/environment-setup) * [Security Best Practices](/deployment/security-best-practices) * [Performance Tuning](/deployment/performance-tuning) * [Troubleshooting Guide](/troubleshooting) Source: https://asyncfunc.mintlify.app/reference/contributing # Contributing to DeepWikiOpen Thank you for your interest in contributing to DeepWikiOpen! We're excited to have you join our community. This guide will help you get started with contributing to our project. ## 🤝 Code of Conduct We are committed to providing a welcoming and inclusive environment for all contributors. By participating in this project, you agree to abide by our Code of Conduct: ### Our Pledge We pledge to make participation in our project a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation. ### Expected Behavior * Use welcoming and inclusive language * Be respectful of differing viewpoints and experiences * Gracefully accept constructive criticism * Focus on what is best for the community * Show empathy towards other community members ### Unacceptable Behavior * Harassment, discrimination, or hate speech of any kind * Personal attacks or insults * Publishing others' private information without consent * Any conduct that could reasonably be considered inappropriate ### Enforcement Violations of the Code of Conduct may result in temporary or permanent exclusion from the project. Please report any incidents to [conduct@deepwikiopen.org](mailto:conduct@deepwikiopen.org). 
## 🐛 How to Report Issues We use GitHub Issues to track bugs, feature requests, and other project-related discussions. ### Before Reporting an Issue 1. **Search existing issues** to avoid duplicates 2. **Check the documentation** to ensure it's not a known limitation 3. **Try the latest version** to see if the issue has been fixed ### Creating a Bug Report When reporting a bug, please include: ```markdown **Describe the bug** A clear and concise description of what the bug is. **To Reproduce** Steps to reproduce the behavior: 1. Go to '...' 2. Click on '....' 3. Scroll down to '....' 4. See error **Expected behavior** What you expected to happen. **Screenshots** If applicable, add screenshots to help explain your problem. **Environment:** - OS: [e.g. macOS, Windows, Linux] - Browser [e.g. Chrome, Safari] - Version [e.g. 22] **Additional context** Any other context about the problem. ``` ## 💡 Feature Request Process We love hearing your ideas for improving DeepWikiOpen! ### Submitting a Feature Request 1. **Check existing requests** in Issues and Discussions 2. **Create a new issue** with the "feature request" label 3. **Provide detailed information:** * Clear description of the feature * Use cases and benefits * Potential implementation approach * Any relevant examples or mockups ### Feature Request Template ```markdown **Is your feature request related to a problem?** A clear description of what the problem is. **Describe the solution you'd like** A clear description of what you want to happen. **Describe alternatives you've considered** Any alternative solutions or features you've considered. **Additional context** Add any other context or screenshots about the feature request. 
``` ## 🚀 Development Setup Follow these steps to set up your development environment: ### Prerequisites * Node.js 18.0 or higher * npm or yarn package manager * Git * A code editor (we recommend VS Code) ### Setup Instructions ```bash # Clone the repository git clone https://github.com/deepwikiopen/deepwikiopen-docs.git cd deepwikiopen-docs # Install dependencies npm install # or yarn install # Create a development branch git checkout -b feature/your-feature-name # Start the development server npm run dev # or yarn dev ``` ### Environment Configuration Create a `.env.local` file in the root directory: ```env # Add any required environment variables API_KEY=your_api_key_here DATABASE_URL=your_database_url ``` ## 📝 Code Style Guidelines We maintain consistent code style to ensure readability and maintainability. ### General Guidelines * Use **TypeScript** for all new code * Follow **ESLint** and **Prettier** configurations * Write **self-documenting code** with clear variable and function names * Keep functions **small and focused** (single responsibility) * Use **meaningful commit messages** following conventional commits ### TypeScript Style ```typescript // Good example interface UserData { id: string; name: string; email: string; createdAt: Date; } export async function getUserById(id: string): Promise<UserData | null> { try { const user = await db.users.findById(id); return user; } catch (error) { console.error('Error fetching user:', error); return null; } } ``` ### React/Component Style ```tsx // Good component example interface ButtonProps { label: string; onClick: () => void; variant?: 'primary' | 'secondary'; disabled?: boolean; } export function Button({ label, onClick, variant = 'primary', disabled = false }: ButtonProps) { return ( <button className={`btn btn-${variant}`} onClick={onClick} disabled={disabled}> {label} </button> ); } ``` ### Running Code Quality Checks ```bash # Lint your code npm run lint # Format with Prettier npm run format # Type check npm run typecheck ``` ## 🧪 Testing Requirements All contributions must include appropriate tests. 
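As a runnable illustration of the catch-and-return-`null` convention from the style guide above, here is a plain-JavaScript sketch; the in-memory `db` object is an assumption standing in for the real database client:

```javascript
// Sketch of the style-guide error-handling pattern.
// `db` is a hypothetical in-memory stand-in for the real database client.
const db = {
  users: new Map([
    ['test-123', { id: 'test-123', name: 'Ada', email: 'ada@example.com' }],
  ]),
  async findById(id) {
    const user = this.users.get(id);
    if (!user) throw new Error(`user ${id} not found`);
    return user;
  },
};

async function getUserById(id) {
  try {
    return await db.findById(id);
  } catch (error) {
    // Log the failure, but keep the caller's contract simple:
    console.error('Error fetching user:', error.message);
    return null; // callers treat null as "not found"
  }
}

// Usage
getUserById('test-123').then((user) => console.log(user?.name)); // logs "Ada"
getUserById('missing').then((user) => console.log(user)); // logs null
```

Keeping the throw inside the data layer and the `null` at the service boundary is what makes the two unit-test cases in the next section (found vs. not found) straightforward to write.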
### Test Categories 1. **Unit Tests** - Test individual functions and components 2. **Integration Tests** - Test component interactions 3. **End-to-End Tests** - Test complete user workflows ### Writing Tests ```typescript // Example unit test describe('getUserById', () => { it('should return user data when user exists', async () => { const userId = 'test-123'; const result = await getUserById(userId); expect(result).toBeDefined(); expect(result?.id).toBe(userId); }); it('should return null when user does not exist', async () => { const result = await getUserById('non-existent'); expect(result).toBeNull(); }); }); ``` ### Running Tests ```bash # Run all tests npm test # Run tests in watch mode npm run test:watch # Run tests with coverage npm run test:coverage ``` ### Test Coverage Requirements * Maintain **minimum 80% code coverage** * All new features must include tests * Bug fixes should include regression tests ## 📚 Documentation Standards Good documentation is crucial for project success. ### Documentation Types 1. **Code Comments** - Explain complex logic 2. **API Documentation** - Document all public APIs 3. **User Guides** - Help users understand features 4. **Developer Guides** - Help contributors understand the codebase ### Documentation Style ````typescript /** * Fetches user data by ID from the database. * * @param id - The unique identifier of the user * @returns Promise resolving to UserData or null if not found * @throws {DatabaseError} When database connection fails * * @example * ```typescript * const user = await getUserById('user-123'); * if (user) { * console.log(user.name); * } * ``` */ export async function getUserById(id: string): Promise<UserData | null> { // Implementation } ```` ### MDX Documentation When writing MDX documentation: * Use clear, concise language * Include code examples * Add visual aids when helpful * Structure content with proper headings * Include a table of contents for long documents ## 🔄 Pull Request Process ### Before Creating a PR 1. 
**Fork the repository** and create a feature branch 2. **Make your changes** following our guidelines 3. **Write/update tests** for your changes 4. **Update documentation** as needed 5. **Run all checks** locally: ```bash npm run lint npm run test npm run build ``` ### Creating a Pull Request 1. **Push your branch** to your fork 2. **Create a PR** from your fork to our main branch 3. **Fill out the PR template** completely: ```markdown ## Description Brief description of changes ## Type of Change - [ ] Bug fix - [ ] New feature - [ ] Breaking change - [ ] Documentation update ## Testing - [ ] Tests pass locally - [ ] Added new tests - [ ] Updated existing tests ## Checklist - [ ] Code follows style guidelines - [ ] Self-reviewed my code - [ ] Commented complex code - [ ] Updated documentation - [ ] No breaking changes ``` ### PR Review Process 1. **Automated checks** run on all PRs 2. **Code review** by maintainers 3. **Address feedback** promptly 4. **Approval and merge** by maintainers ### Tips for Getting Your PR Merged * Keep PRs **focused and small** * Write **clear commit messages** * **Respond to feedback** constructively * Be **patient and respectful** * **Update your PR** if merge conflicts arise ## 💬 Community Channels Join our community to get help, share ideas, and connect with other contributors: ### Official Channels * **GitHub Discussions** - General discussions and Q\&A * **Discord Server** - [Join our Discord](https://discord.gg/deepwikiopen) * **Twitter/X** - [@deepwikiopen](https://twitter.com/deepwikiopen) * **Blog** - [blog.deepwikiopen.org](https://blog.deepwikiopen.org) ### Community Guidelines * Be helpful and supportive * Share knowledge freely * Respect different skill levels * Keep discussions on-topic * Report inappropriate behavior ### Getting Help * **Documentation** - Start with our docs * **FAQ** - Check frequently asked questions * **Discord #help** - Ask the community * **GitHub Discussions** - For deeper technical discussions ## 🏆 
Recognition and Credits We believe in recognizing all contributions to our project! ### Types of Contributions We Recognize * **Code contributions** - Features, bug fixes, improvements * **Documentation** - Guides, API docs, examples * **Design** - UI/UX improvements, graphics * **Testing** - Test cases, bug reports * **Community** - Helping others, organizing events * **Translations** - Localizing content ### How We Recognize Contributors 1. **Contributors File** - All contributors listed in CONTRIBUTORS.md 2. **Release Notes** - Contributors mentioned in releases 3. **Website Credits** - Featured on our website 4. **Contributor Badge** - Special Discord role 5. **Swag** - Top contributors receive DeepWikiOpen swag ### Becoming a Core Contributor Consistent, high-quality contributions may lead to: * Write access to the repository * Invitation to maintainer meetings * Decision-making in project direction * Mentoring new contributors ### All Contributors We use the [All Contributors](https://allcontributors.org/) specification to recognize all types of contributions. Your avatar will appear in our README! *** ## 🎉 Thank You! Every contribution, no matter how small, helps make DeepWikiOpen better. We're grateful for your time and effort in improving our project. If you have any questions not covered in this guide, please don't hesitate to ask in our community channels. We're here to help! Happy contributing! 🚀 Source: https://asyncfunc.mintlify.app/reference/faq # Frequently Asked Questions ## General Questions ### What is DeepWiki? DeepWiki is an AI-powered documentation generator that transforms your codebase into comprehensive, searchable documentation. It analyzes your code structure, understands relationships between components, and generates clear, contextual documentation that stays in sync with your code. ### How is DeepWiki different from traditional documentation tools? 
Unlike traditional tools that require manual documentation writing, DeepWiki: * Automatically generates documentation from your code * Understands code context and relationships * Updates documentation as your code changes * Provides AI-powered search and explanations * Works with multiple programming languages ### What programming languages does DeepWiki support? DeepWiki currently supports: * JavaScript/TypeScript * Python * Java * Go * Rust * C/C++ * Ruby * PHP * Swift * And more coming soon! ### Can I use DeepWiki for commercial projects? Yes! DeepWiki is licensed for both personal and commercial use. Check our licensing section below for specific terms. ## Installation and Setup ### How do I install DeepWiki? Installation is simple with npm: ```bash npm install -g deepwiki ``` Or using yarn: ```bash yarn global add deepwiki ``` ### What are the system requirements? * Node.js 16.0 or higher * 4GB RAM minimum (8GB recommended for large projects) * 500MB free disk space * Internet connection for AI features ### How do I set up DeepWiki for my project? 1. Navigate to your project root 2. Run `deepwiki init` 3. Follow the interactive setup wizard 4. Configure your preferences in `.deepwiki/config.json` ### Can I use DeepWiki in a CI/CD pipeline? Absolutely! DeepWiki supports CI/CD integration: ```yaml # Example GitHub Actions workflow - name: Generate Documentation run: | npm install -g deepwiki deepwiki generate --ci ``` ## API Keys and Authentication ### Do I need an API key to use DeepWiki? DeepWiki offers both free and premium features: * **Free tier**: Basic documentation generation, no API key required * **Premium features**: AI-powered search, advanced analysis, requires API key ### How do I get an API key? 1. Visit [deepwiki.ai/dashboard](https://deepwiki.ai/dashboard) 2. Sign up for an account 3. Navigate to API Keys section 4. Generate your personal API key ### Where do I put my API key? 
You have several options: ```bash # Environment variable (recommended) export DEEPWIKI_API_KEY=your-api-key-here # Or in .env file DEEPWIKI_API_KEY=your-api-key-here # Or during initialization deepwiki init --api-key your-api-key-here ``` ### Is my API key secure? Yes! DeepWiki: * Never stores API keys in generated documentation * Uses HTTPS for all API communications * Supports environment variable configuration * Excludes `.env` files from documentation by default ## Model Selection and Costs ### What AI models does DeepWiki use? DeepWiki supports multiple AI models: * **GPT-4**: Best quality, higher cost * **GPT-3.5**: Good balance of quality and cost * **Claude**: Excellent for technical documentation * **Local models**: Privacy-focused, no API costs ### How much does it cost? Pricing depends on your usage: * **Free tier**: 10,000 tokens/month * **Starter**: \$19/month for 100,000 tokens * **Professional**: \$49/month for 500,000 tokens * **Enterprise**: Custom pricing ### Can I control costs? Yes! DeepWiki provides cost controls: ```json { "ai": { "monthlyBudget": 50, "warningThreshold": 0.8, "model": "gpt-3.5-turbo" } } ``` ### How do I estimate token usage? Rule of thumb: * Small project (\< 10k LOC): \~50k tokens * Medium project (10k-100k LOC): \~200k tokens * Large project (> 100k LOC): \~500k+ tokens ## Private Repository Support ### Can DeepWiki work with private repositories? Yes! DeepWiki works perfectly with private repositories: * Runs locally on your machine * Never uploads code to external servers * Supports all git providers (GitHub, GitLab, Bitbucket) ### Does my code leave my machine? By default, no. DeepWiki: * Processes code locally * Only sends anonymized prompts to AI providers * Can run entirely offline with local models ### Can I self-host DeepWiki? 
Enterprise customers can self-host DeepWiki: * Full on-premise deployment * Air-gapped environment support * Custom model integration * Contact [sales@deepwiki.ai](mailto:sales@deepwiki.ai) for details ## Performance and Scaling ### How fast is documentation generation? Generation speed depends on: * Project size: \~1000 files/minute * AI model: GPT-4 slower but higher quality * Hardware: Faster with more CPU cores ### Can DeepWiki handle large codebases? Yes! DeepWiki scales well: * Tested on codebases with 1M+ lines * Incremental updates for efficiency * Parallel processing support * Smart caching to reduce regeneration ### How can I improve performance? Tips for better performance: ```json { "performance": { "parallel": true, "maxWorkers": 8, "incremental": true, "cache": true } } ``` ### Does DeepWiki support monorepos? Absolutely! DeepWiki has excellent monorepo support: * Configure multiple documentation sites * Cross-package linking * Shared configuration * Workspace-aware processing ## Troubleshooting Common Issues ### DeepWiki command not found If you get "command not found": ```bash # Check installation npm list -g deepwiki # Reinstall if needed npm install -g deepwiki # Check PATH echo $PATH ``` ### Documentation not updating Try these steps: 1. Clear cache: `deepwiki clean` 2. Force regenerate: `deepwiki generate --force` 3. Check file patterns in config 4. Verify git status for changes ### AI features not working Common fixes: * Verify API key is set correctly * Check internet connection * Ensure API quota isn't exceeded * Try different AI model ### Build errors For build issues: * Update Node.js to latest LTS * Clear npm cache: `npm cache clean --force` * Delete node\_modules and reinstall * Check for conflicting global packages ## Feature Requests and Roadmap ### How can I request a feature? We love feedback! 
Submit feature requests: * GitHub Issues: [github.com/deepwiki/deepwiki/issues](https://github.com/deepwiki/deepwiki/issues) * Feature board: [deepwiki.ai/features](https://deepwiki.ai/features) * Community Discord: [discord.gg/deepwiki](https://discord.gg/deepwiki) ### What's on the roadmap? Upcoming features: * **Q1 2024**: Real-time collaboration * **Q2 2024**: Video tutorials generation * **Q3 2024**: API playground integration * **Q4 2024**: Mobile app for documentation ### Can I contribute to DeepWiki? Yes! We're open source: * Star us on GitHub * Submit pull requests * Improve documentation * Share templates and themes ## Community and Support ### Where can I get help? Multiple support channels: * 📚 Documentation: [docs.deepwiki.ai](https://docs.deepwiki.ai) * 💬 Discord: [discord.gg/deepwiki](https://discord.gg/deepwiki) * 🐛 Issues: [GitHub Issues](https://github.com/deepwiki/deepwiki/issues) * 📧 Email: [support@deepwiki.ai](mailto:support@deepwiki.ai) ### Is there a community? Join our growing community: * 5,000+ developers on Discord * Weekly office hours * Community showcases * Template marketplace ### Do you offer training? Yes! We provide: * Free video tutorials * Live workshops (monthly) * Enterprise training sessions * Certification program (coming soon) ## Licensing and Usage Rights ### What license is DeepWiki under? DeepWiki uses a dual license: * **Community Edition**: MIT License * **Enterprise Edition**: Commercial license ### Who owns the generated documentation? You do! Generated documentation is: * 100% owned by you * No licensing restrictions * Can be used commercially * Can be modified freely ### Can I white-label DeepWiki? Enterprise customers can: * Remove DeepWiki branding * Custom themes and domains * Private npm registry * Contact sales for details ### Are there usage limits? 
Limits by plan: * **Free**: 10k tokens/month, 3 projects * **Starter**: 100k tokens/month, 10 projects * **Pro**: 500k tokens/month, unlimited projects * **Enterprise**: No limits *** ## Still have questions? Can't find what you're looking for? We're here to help: * 📧 Email: [support@deepwiki.ai](mailto:support@deepwiki.ai) * 💬 Live chat: Available on our website * 📞 Enterprise support: 1-800-DEEPWIKI Happy documenting! 🚀 Source: https://asyncfunc.mintlify.app/reference/logging # Logging DeepWiki implements a comprehensive logging system designed for production environments, providing detailed insights into system behavior while maintaining performance and security. ## Configuration ### Environment Variables Configure DeepWiki's logging behavior through environment variables: ```bash # Log level configuration LOG_LEVEL=info # Options: debug, info, warn, error NODE_ENV=production # Affects log verbosity and format # Log output configuration LOG_FORMAT=json # Options: json, pretty LOG_TIMESTAMP=iso # Options: iso, unix, relative LOG_COLOR=false # Enable/disable colored output # File logging LOG_TO_FILE=true # Enable file logging LOG_FILE_PATH=/var/log/deepwiki # Log directory path LOG_FILE_NAME=deepwiki.log # Primary log file name LOG_MAX_SIZE=10M # Max size before rotation LOG_MAX_FILES=5 # Number of rotated files to keep LOG_COMPRESS=true # Compress rotated files # Performance logging LOG_SLOW_QUERIES=true # Log slow database queries LOG_SLOW_THRESHOLD=1000 # Slow query threshold (ms) LOG_REQUEST_DETAILS=true # Log HTTP request details LOG_RESPONSE_TIME=true # Log API response times # Security logging LOG_AUTH_EVENTS=true # Log authentication events LOG_SENSITIVE_DATA=false # Never log sensitive data LOG_IP_ADDRESSES=true # Log client IP addresses ``` ### Docker Compose Configuration ```yaml version: '3.8' services: deepwiki: image: deepwiki/deepwiki:latest environment: - LOG_LEVEL=info - LOG_FORMAT=json - LOG_TO_FILE=true - LOG_FILE_PATH=/app/logs volumes: 
- ./logs:/app/logs logging: driver: "json-file" options: max-size: "10m" max-file: "3" labels: "service=deepwiki" ``` ## Log Levels DeepWiki uses standard log levels with specific meanings: ### Debug Detailed information for diagnosing problems: ```javascript logger.debug('Processing search query', { query: searchQuery, filters: activeFilters, userId: user.id, timestamp: Date.now() }); ``` ### Info General informational messages: ```javascript logger.info('User logged in successfully', { userId: user.id, method: 'oauth', provider: 'github' }); ``` ### Warn Warning messages for potentially harmful situations: ```javascript logger.warn('Rate limit approaching', { userId: user.id, currentRate: 95, limit: 100, window: '1h' }); ``` ### Error Error events that might still allow the application to continue: ```javascript logger.error('Failed to send email', { error: err.message, stack: err.stack, recipient: user.email, template: 'welcome' }); ``` ## File Management ### Log File Location Default log file structure: ``` /var/log/deepwiki/ ├── deepwiki.log # Current log file ├── deepwiki.1.log.gz # Rotated and compressed ├── deepwiki.2.log.gz ├── error.log # Error-only log ├── access.log # HTTP access log └── slow-query.log # Database performance log ``` ### Rotation Configuration Configure log rotation with environment variables or logrotate: ```bash # /etc/logrotate.d/deepwiki /var/log/deepwiki/*.log { daily rotate 7 compress delaycompress missingok notifempty create 0644 deepwiki deepwiki sharedscripts postrotate docker exec deepwiki-app kill -USR2 1 endscript } ``` ### Programmatic Rotation ```javascript const winston = require('winston'); require('winston-daily-rotate-file'); const fileRotateTransport = new winston.transports.DailyRotateFile({ filename: 'deepwiki-%DATE%.log', dirname: process.env.LOG_FILE_PATH, datePattern: 'YYYY-MM-DD', maxSize: '10m', maxFiles: '14d', auditFile: 'log-audit.json', zippedArchive: true }); ``` ## Docker Logging ### Container Logging 
Strategy ```yaml # docker-compose.yml services: deepwiki: logging: driver: "json-file" options: max-size: "10m" max-file: "5" labels: "app=deepwiki,env=production" env: "LOG_LEVEL,NODE_ENV" ``` ### Centralized Logging with Fluentd ```yaml # fluentd configuration version: '3.8' services: deepwiki: logging: driver: "fluentd" options: fluentd-address: "localhost:24224" tag: "deepwiki.{{.Name}}" fluentd-async-connect: "true" fluentd-retry-wait: "1s" fluentd-max-retries: "30" fluentd: image: fluent/fluentd:latest volumes: - ./fluent.conf:/fluentd/etc/fluent.conf - ./logs:/fluentd/log ports: - "24224:24224" ``` ### Docker Logs Commands ```bash # View logs docker logs deepwiki-app # Follow logs docker logs -f deepwiki-app # View last 100 lines docker logs --tail 100 deepwiki-app # Filter by timestamp docker logs --since 2024-01-01T00:00:00 deepwiki-app # Save logs to file docker logs deepwiki-app > deepwiki.log 2>&1 ``` ## Structured Logging ### JSON Log Format Standard JSON log structure: ```json { "timestamp": "2024-01-15T10:30:45.123Z", "level": "info", "service": "deepwiki", "environment": "production", "version": "1.2.3", "message": "API request completed", "context": { "requestId": "abc-123-def", "userId": "user_456", "method": "GET", "path": "/api/v1/search", "statusCode": 200, "duration": 145, "ip": "192.168.1.1", "userAgent": "Mozilla/5.0..." 
}, "metadata": { "searchQuery": "kubernetes deployment", "resultsCount": 25, "cacheHit": true } } ``` ### Custom Log Formatter ```javascript const logFormat = winston.format.combine( winston.format.timestamp({ format: 'YYYY-MM-DD HH:mm:ss.SSS' }), winston.format.errors({ stack: true }), winston.format.metadata({ fillWith: ['requestId', 'userId', 'sessionId'] }), winston.format.json() ); const logger = winston.createLogger({ format: logFormat, defaultMeta: { service: 'deepwiki', version: process.env.APP_VERSION, environment: process.env.NODE_ENV } }); ``` ## Common Debug Patterns ### Request Tracking ```javascript // Middleware for request tracking app.use((req, res, next) => { req.id = generateRequestId(); req.startTime = Date.now(); logger.info('Request started', { requestId: req.id, method: req.method, path: req.path, query: req.query, headers: sanitizeHeaders(req.headers) }); res.on('finish', () => { logger.info('Request completed', { requestId: req.id, statusCode: res.statusCode, duration: Date.now() - req.startTime }); }); next(); }); ``` ### Database Query Logging ```javascript // Log slow queries db.on('query', (query) => { if (query.duration > process.env.LOG_SLOW_THRESHOLD) { logger.warn('Slow query detected', { query: query.sql, bindings: query.bindings, duration: query.duration, rows: query.rowCount }); } }); ``` ### Error Context Logging ```javascript // Enhanced error logging function logError(error, context = {}) { logger.error('Application error', { message: error.message, stack: error.stack, code: error.code, ...context, timestamp: new Date().toISOString(), pid: process.pid, memory: process.memoryUsage() }); } ``` ## Performance Considerations ### Asynchronous Logging ```javascript // Use async transports to prevent blocking const asyncTransport = new winston.transports.File({ filename: 'app.log', maxsize: 10485760, // 10MB maxFiles: 5, tailable: true, zippedArchive: true, // Async writing flags: 'a', encoding: 'utf8', mode: 0o666 }); // Buffer logs 
for batch writing const batchTransport = new BatchTransport({ batchSize: 100, flushInterval: 5000, transport: asyncTransport }); ``` ### Sampling Strategies ```javascript // Sample verbose logs in production function shouldLog(level, samplingRate = 0.1) { if (level === 'error' || level === 'warn') return true; if (process.env.NODE_ENV !== 'production') return true; return Math.random() < samplingRate; } // Usage if (shouldLog('debug', 0.05)) { logger.debug('Detailed trace information', heavyObject); } ``` ### Performance Metrics ```javascript // Log performance metrics periodically setInterval(() => { const metrics = { memory: process.memoryUsage(), cpu: process.cpuUsage(), uptime: process.uptime(), activeRequests: getActiveRequestCount(), dbConnections: getDbConnectionCount() }; logger.info('Performance metrics', metrics); }, 60000); // Every minute ``` ## Security Best Practices ### Data Sanitization ```javascript // Sanitize sensitive data function sanitizeLogData(data) { const sensitiveFields = [ 'password', 'token', 'secret', 'apiKey', 'creditCard', 'ssn', 'email' ]; const sanitized = { ...data }; sensitiveFields.forEach(field => { if (sanitized[field]) { sanitized[field] = '[REDACTED]'; } }); return sanitized; } // Usage logger.info('User registration', sanitizeLogData({ username: user.username, email: user.email, password: user.password })); ``` ### Compliance Logging ```javascript // GDPR-compliant logging const gdprLogger = { logUserAction(userId, action, details) { logger.info('User action', { userId: hashUserId(userId), // Pseudonymize action, timestamp: Date.now(), details: sanitizePersonalData(details) }); }, logDataAccess(accessor, resource, purpose) { logger.info('Data access', { accessor: hashUserId(accessor), resource, purpose, timestamp: Date.now(), authorized: true }); } }; ``` ## Log Aggregation ### ELK Stack Integration ```yaml # Logstash configuration input { file { path => "/var/log/deepwiki/*.log" start_position => "beginning" codec => 
"json" type => "deepwiki" } } filter { if [type] == "deepwiki" { date { match => [ "timestamp", "ISO8601" ] } geoip { source => "ip" target => "geoip" } mutate { remove_field => [ "host", "@version" ] } } } output { elasticsearch { hosts => ["elasticsearch:9200"] index => "deepwiki-%{+YYYY.MM.dd}" } } ``` ### Prometheus Integration ```javascript // Export metrics for Prometheus const promClient = require('prom-client'); const httpDuration = new promClient.Histogram({ name: 'http_request_duration_seconds', help: 'Duration of HTTP requests in seconds', labelNames: ['method', 'route', 'status'] }); // Log and export metrics app.use((req, res, next) => { const end = httpDuration.startTimer(); res.on('finish', () => { end({ method: req.method, route: req.route?.path || 'unknown', status: res.statusCode }); logger.info('Request metric', { method: req.method, path: req.path, status: res.statusCode, duration: end() }); }); next(); }); ``` ## Monitoring & Alerting ### Alert Rules ```yaml # Prometheus alert rules groups: - name: deepwiki_alerts rules: - alert: HighErrorRate expr: rate(log_entries_total{level="error"}[5m]) > 0.05 for: 5m labels: severity: warning annotations: summary: "High error rate detected" description: "Error rate is {{ $value }} errors/sec" - alert: SlowQueries expr: rate(slow_queries_total[5m]) > 10 for: 5m labels: severity: warning annotations: summary: "High number of slow queries" ``` ### Health Check Logging ```javascript // Health check endpoint with logging app.get('/health', async (req, res) => { const health = { status: 'healthy', timestamp: new Date().toISOString(), uptime: process.uptime(), checks: {} }; try { // Database check const dbStart = Date.now(); await db.raw('SELECT 1'); health.checks.database = { status: 'healthy', responseTime: Date.now() - dbStart }; // Redis check const redisStart = Date.now(); await redis.ping(); health.checks.redis = { status: 'healthy', responseTime: Date.now() - redisStart }; logger.info('Health check 
passed', health); res.json(health); } catch (error) { health.status = 'unhealthy'; health.error = error.message; logger.error('Health check failed', health); res.status(503).json(health); } }); ``` ### Dashboard Integration ```javascript // Export logs to monitoring dashboard class LogExporter { constructor(dashboardUrl, apiKey) { this.dashboardUrl = dashboardUrl; this.apiKey = apiKey; this.buffer = []; this.flushInterval = 5000; setInterval(() => this.flush(), this.flushInterval); } export(logEntry) { this.buffer.push({ ...logEntry, hostname: os.hostname(), pid: process.pid }); if (this.buffer.length >= 100) { this.flush(); } } async flush() { if (this.buffer.length === 0) return; const logs = [...this.buffer]; this.buffer = []; try { await fetch(`${this.dashboardUrl}/api/logs`, { method: 'POST', headers: { 'Authorization': `Bearer ${this.apiKey}`, 'Content-Type': 'application/json' }, body: JSON.stringify({ logs }) }); } catch (error) { logger.error('Failed to export logs', { error: error.message }); // Re-add logs to buffer for retry this.buffer.unshift(...logs); } } } ``` ## Best Practices 1. **Use Structured Logging**: Always log in JSON format for easy parsing 2. **Include Context**: Add request IDs, user IDs, and session IDs to all logs 3. **Avoid Blocking**: Use asynchronous logging in production 4. **Sanitize Data**: Never log passwords, tokens, or sensitive personal data 5. **Set Appropriate Levels**: Use debug in development, info in production 6. **Monitor Log Volume**: Set up alerts for unusual log volumes 7. **Implement Retention**: Define log retention policies based on compliance needs 8. **Use Correlation IDs**: Track requests across services with unique IDs 9. **Log Metrics**: Include performance data in logs for analysis 10. 
**Test Log Output**: Verify logs work correctly in all environments ## Troubleshooting ### Common Issues ```bash # Check if logs are being written tail -f /var/log/deepwiki/deepwiki.log # Verify log permissions ls -la /var/log/deepwiki/ # Check disk space df -h /var/log # Monitor log growth watch -n 1 'du -sh /var/log/deepwiki/*' # Search for errors grep -i error /var/log/deepwiki/deepwiki.log | tail -20 # Check log configuration docker exec deepwiki-app env | grep LOG_ ``` ### Debug Mode Enable verbose logging for troubleshooting: ```bash # Temporarily enable debug logging docker exec deepwiki-app npm run debug # Or set environment variable docker exec -e LOG_LEVEL=debug deepwiki-app node app.js ``` # Performance Optimization Source: https://asyncfunc.mintlify.app/reference/performance Comprehensive guide to optimizing deepwikiopen performance # Performance Optimization This guide covers performance benchmarks, optimization strategies, and best practices for running deepwikiopen efficiently at scale. 
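Before tuning anything, establish a baseline to compare the benchmark tables below against. A minimal sketch using only the standard library, assuming the backend from the troubleshooting section is reachable at `http://localhost:8001/health` (adjust the URL and sample count for your deployment):

```python
# Measure baseline request latency against a local endpoint
import time
import urllib.request

def percentile(samples, p):
    """Return the p-th percentile (0-100) of a list of numbers."""
    ordered = sorted(samples)
    index = min(len(ordered) - 1, int(round(p / 100 * (len(ordered) - 1))))
    return ordered[index]

def measure(url, n=20):
    """Time n sequential GET requests; return latencies in milliseconds."""
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        urllib.request.urlopen(url, timeout=5).read()
        latencies.append((time.perf_counter() - start) * 1000)
    return latencies

if __name__ == "__main__":
    samples = measure("http://localhost:8001/health")
    print(f"p50: {percentile(samples, 50):.1f}ms  p95: {percentile(samples, 95):.1f}ms")
```

Re-run the same measurement after each optimization step so improvements are attributable to a specific change rather than to noise.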
## Performance Benchmarks ### Baseline Performance Metrics | Repository Size | Initial Index Time | Query Response Time | Memory Usage | CPU Usage | | --------------- | ------------------ | ------------------- | ------------ | --------- | | Small (\<1GB) | 5-10 minutes | \<100ms | 2-4GB | 20-40% | | Medium (1-10GB) | 20-60 minutes | 100-300ms | 4-8GB | 40-60% | | Large (10-50GB) | 1-4 hours | 300-500ms | 8-16GB | 60-80% | | XLarge (>50GB) | 4-12 hours | 500-1000ms | 16-32GB | 80-100% | ### Query Performance Benchmarks ```yaml # Average query response times by complexity simple_keyword_search: 50-100ms semantic_search: 100-300ms code_understanding: 200-500ms multi_file_analysis: 500-2000ms repository_wide_search: 1000-5000ms ``` ### Throughput Benchmarks ```yaml # Requests per second (RPS) by hardware configuration minimal_setup: cpu: 2 cores ram: 4GB rps: 10-20 recommended_setup: cpu: 4 cores ram: 8GB rps: 50-100 production_setup: cpu: 8 cores ram: 16GB rps: 200-500 high_performance: cpu: 16 cores ram: 32GB rps: 1000+ ``` ## Resource Requirements by Repository Size ### Minimum Requirements ```yaml small_repositories: cpu: 2 cores memory: 4GB storage: 10GB network: 100Mbps medium_repositories: cpu: 4 cores memory: 8GB storage: 50GB network: 1Gbps large_repositories: cpu: 8 cores memory: 16GB storage: 200GB network: 1Gbps enterprise_scale: cpu: 16+ cores memory: 32GB+ storage: 1TB+ network: 10Gbps ``` ### Recommended Configurations ```yaml development: cpu: 4 cores memory: 8GB storage: 100GB SSD gpu: Optional (NVIDIA GTX 1060+) production: cpu: 8-16 cores memory: 16-32GB storage: 500GB NVMe SSD gpu: Recommended (NVIDIA RTX 3060+) high_performance: cpu: 32+ cores memory: 64GB+ storage: 2TB NVMe RAID gpu: Required (NVIDIA A100/H100) ``` ## Caching Strategies and Configuration ### Multi-Level Caching Architecture ```python # config/cache.py CACHE_CONFIG = { "embedding_cache": { "type": "redis", "ttl": 86400, # 24 hours "max_size": "10GB", "eviction": "lru" }, 
"query_cache": { "type": "memory", "ttl": 3600, # 1 hour "max_size": "2GB", "eviction": "lfu" }, "file_cache": { "type": "disk", "ttl": 604800, # 7 days "max_size": "50GB", "path": "/var/cache/deepwikiopen" } } ``` ### Redis Configuration ```redis # redis.conf optimizations maxmemory 8gb maxmemory-policy allkeys-lru save "" # Disable persistence for cache tcp-keepalive 60 tcp-backlog 511 databases 16 # Performance tuning hz 100 dynamic-hz yes rdb-compression yes rdb-checksum no ``` ### Application-Level Caching ```python # Implement intelligent caching from functools import lru_cache from typing import Dict, List import hashlib class PerformanceCache: def __init__(self): self.embedding_cache = {} self.query_cache = {} @lru_cache(maxsize=10000) def get_cached_embedding(self, content_hash: str): """Cache embeddings by content hash""" return self.embedding_cache.get(content_hash) def cache_query_result(self, query: str, results: List[Dict]): """Cache query results with TTL""" query_hash = hashlib.md5(query.encode()).hexdigest() self.query_cache[query_hash] = { "results": results, "timestamp": time.time(), "ttl": 3600 } ``` ## Database Optimization ### PostgreSQL Configuration ```sql -- postgresql.conf optimizations shared_buffers = 25% of RAM effective_cache_size = 75% of RAM work_mem = 256MB maintenance_work_mem = 1GB wal_buffers = 16MB checkpoint_completion_target = 0.9 random_page_cost = 1.1 -- For SSD -- Connection pooling max_connections = 200 shared_preload_libraries = 'pg_stat_statements' ``` ### Index Optimization ```sql -- Create optimized indexes CREATE INDEX idx_embeddings_vector ON embeddings USING ivfflat (vector vector_l2_ops) WITH (lists = 1000); CREATE INDEX idx_code_files_path ON code_files USING btree (file_path); CREATE INDEX idx_code_files_language ON code_files USING btree (language); CREATE INDEX idx_code_files_updated ON code_files USING btree (last_updated DESC); -- Partial indexes for common queries CREATE INDEX idx_active_repositories ON 
repositories (id) WHERE is_active = true;

-- Composite indexes
CREATE INDEX idx_search_composite ON code_files (repository_id, language, file_path);
```

### Query Optimization

```sql
-- Use prepared statements
PREPARE search_embeddings AS
SELECT file_id, vector <-> $1 AS distance
FROM embeddings
ORDER BY vector <-> $1
LIMIT $2;

-- Batch operations (upsert)
INSERT INTO embeddings (file_id, vector, metadata)
VALUES ($1, $2, $3)
ON CONFLICT (file_id) DO UPDATE
SET vector = EXCLUDED.vector, updated_at = NOW();
```

## Model Selection for Performance

### Model Performance Comparison

| Model Type | Speed | Accuracy | Memory | Use Case |
| ---------- | ----- | -------- | ------ | -------------------- |
| TinyBERT | ⚡⚡⚡⚡⚡ | ⭐⭐⭐ | 500MB | Real-time search |
| DistilBERT | ⚡⚡⚡⚡ | ⭐⭐⭐⭐ | 1GB | Balanced performance |
| BERT-base | ⚡⚡⚡ | ⭐⭐⭐⭐ | 2GB | Standard search |
| CodeBERT | ⚡⚡ | ⭐⭐⭐⭐⭐ | 4GB | Code understanding |
| GPT-2 | ⚡ | ⭐⭐⭐⭐⭐ | 8GB | Advanced analysis |

### Dynamic Model Selection

```python
def select_optimal_model(query_type: str, resource_constraints: dict):
    """Select model based on query type and resources"""
    if resource_constraints["memory"] < 2000:  # 2GB, expressed in MB
        return "sentence-transformers/all-MiniLM-L6-v2"

    if query_type == "simple_search":
        return "sentence-transformers/all-mpnet-base-v2"
    elif query_type == "code_search":
        return "microsoft/codebert-base"
    elif query_type == "semantic_analysis":
        return "sentence-transformers/all-roberta-large-v1"

    return "sentence-transformers/all-mpnet-base-v2"  # default
```

## Concurrent Request Handling

### Async Request Processing

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

from aiohttp import web

class ConcurrentHandler:
    def __init__(self, max_workers=10):
        self.executor = ThreadPoolExecutor(max_workers=max_workers)
        self.semaphore = asyncio.Semaphore(100)  # Limit concurrent requests

    async def handle_request(self, request):
        async with self.semaphore:
            # Process request asynchronously
            result = await
self.process_query(request) return result async def process_batch(self, requests): """Process multiple requests concurrently""" tasks = [self.handle_request(req) for req in requests] return await asyncio.gather(*tasks) ``` ### Load Balancing Configuration ```nginx # nginx.conf for load balancing upstream deepwikiopen_backend { least_conn; server backend1:8000 weight=3; server backend2:8000 weight=2; server backend3:8000 weight=1; keepalive 32; } server { location /api { proxy_pass http://deepwikiopen_backend; proxy_http_version 1.1; proxy_set_header Connection ""; proxy_buffering off; proxy_request_buffering off; } } ``` ## Memory Management ### Memory Optimization Strategies ```python import gc import resource from memory_profiler import profile class MemoryManager: def __init__(self, max_memory_mb=8192): self.max_memory = max_memory_mb * 1024 * 1024 self.configure_limits() def configure_limits(self): """Set memory limits""" resource.setrlimit( resource.RLIMIT_AS, (self.max_memory, self.max_memory) ) @profile def process_large_dataset(self, data_iterator): """Process data in chunks to manage memory""" chunk_size = 1000 processed = 0 for chunk in self.chunked(data_iterator, chunk_size): # Process chunk results = self.process_chunk(chunk) # Yield results to avoid memory accumulation yield results # Explicit garbage collection if processed % 10000 == 0: gc.collect() processed += chunk_size def monitor_memory(self): """Monitor current memory usage""" import psutil process = psutil.Process() return { "rss": process.memory_info().rss / 1024 / 1024, # MB "vms": process.memory_info().vms / 1024 / 1024, # MB "percent": process.memory_percent() } ``` ### Memory Pool Configuration ```python # Configure memory pools for embeddings MEMORY_POOLS = { "embeddings": { "size": "4GB", "prealloc": True, "cleanup_interval": 300 # 5 minutes }, "cache": { "size": "2GB", "prealloc": False, "cleanup_interval": 600 # 10 minutes }, "temporary": { "size": "1GB", "prealloc": False, 
"cleanup_interval": 60 # 1 minute } } ``` ## Docker Resource Limits ### Docker Compose Configuration ```yaml version: '3.8' services: deepwikiopen: image: deepwikiopen:latest deploy: resources: limits: cpus: '4.0' memory: 8G reservations: cpus: '2.0' memory: 4G environment: - PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512 - OMP_NUM_THREADS=4 ulimits: memlock: soft: -1 hard: -1 nofile: soft: 65536 hard: 65536 ``` ### Container Optimization ```dockerfile # Optimized Dockerfile FROM python:3.11-slim # Install only required dependencies RUN apt-get update && apt-get install -y \ --no-install-recommends \ build-essential \ && rm -rf /var/lib/apt/lists/* # Use multi-stage builds FROM python:3.11-slim as runtime COPY --from=builder /app /app # Set resource limits ENV PYTHONUNBUFFERED=1 ENV MALLOC_ARENA_MAX=2 ENV MALLOC_MMAP_THRESHOLD_=131072 ENV MALLOC_TRIM_THRESHOLD_=131072 ``` ## Monitoring Performance Metrics ### Prometheus Configuration ```yaml # prometheus.yml global: scrape_interval: 15s evaluation_interval: 15s scrape_configs: - job_name: 'deepwikiopen' static_configs: - targets: ['localhost:8000'] metrics_path: '/metrics' ``` ### Custom Metrics ```python from prometheus_client import Counter, Histogram, Gauge import time # Define metrics query_counter = Counter('deepwikiopen_queries_total', 'Total queries processed') query_duration = Histogram('deepwikiopen_query_duration_seconds', 'Query duration') active_connections = Gauge('deepwikiopen_active_connections', 'Active connections') class MetricsCollector: @query_duration.time() def process_query(self, query): """Track query processing time""" query_counter.inc() start_time = time.time() try: result = self.execute_query(query) return result finally: duration = time.time() - start_time self.record_metric('query_duration', duration) ``` ### Grafana Dashboard ```json { "dashboard": { "title": "DeepWikiOpen Performance", "panels": [ { "title": "Query Rate", "targets": [{ "expr": "rate(deepwikiopen_queries_total[5m])" }] 
}, { "title": "Response Time", "targets": [{ "expr": "histogram_quantile(0.95, deepwikiopen_query_duration_seconds)" }] }, { "title": "Memory Usage", "targets": [{ "expr": "process_resident_memory_bytes / 1024 / 1024" }] } ] } } ``` ## Optimization Techniques ### Repository Cloning Optimization ```python # Optimized repository cloning with shallow clone # As of commit f79554f - Significantly reduces clone time and bandwidth usage def download_repo(repo_url: str, local_path: str, type: str = "github", access_token: str = None): """ Clone repository with optimized shallow clone for faster performance. Using --depth=1 and --single-branch flags reduces clone time by up to 90% for large repositories. """ clone_url = repo_url if not access_token else repo_url.replace("https://", f"https://{access_token}@") # Optimized clone command with shallow clone flags result = subprocess.run( ["git", "clone", "--depth=1", "--single-branch", clone_url, local_path], check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE, ) # Benefits: # - Reduces download size by only fetching latest commit # - Decreases clone time from minutes to seconds for large repos # - Lowers bandwidth usage and storage requirements # - Perfect for documentation generation where history isn't needed ``` ### Clone Performance Benchmarks | Repository Size | Standard Clone | Shallow Clone | Time Saved | Size Reduction | | ------------------ | -------------- | ------------- | ---------- | -------------- | | Small (\<100MB) | 5-10s | 1-2s | 80% | 70% | | Medium (100MB-1GB) | 30-60s | 3-5s | 90% | 85% | | Large (1-10GB) | 5-15min | 10-30s | 95% | 90% | | Massive (>10GB) | 30-60min | 30-60s | 98% | 95% | ### Code-Level Optimizations ```python # Use vectorized operations import numpy as np from numba import jit @jit(nopython=True) def fast_similarity(vector1, vector2): """Optimized cosine similarity calculation""" dot_product = np.dot(vector1, vector2) norm1 = np.linalg.norm(vector1) norm2 = np.linalg.norm(vector2) 
return dot_product / (norm1 * norm2)

# Batch processing
def process_embeddings_batch(embeddings, batch_size=1000):
    """Process embeddings in optimized batches"""
    for i in range(0, len(embeddings), batch_size):
        batch = embeddings[i:i + batch_size]
        # Process batch using vectorized operations
        yield np.array(batch)
```

### Network Optimization

```python
# Connection pooling
import aiohttp
from aiohttp import TCPConnector

async def create_session():
    """Create optimized HTTP session"""
    connector = TCPConnector(
        limit=100,
        limit_per_host=30,
        ttl_dns_cache=300,
        enable_cleanup_closed=True
    )
    timeout = aiohttp.ClientTimeout(total=30, connect=5)
    return aiohttp.ClientSession(
        connector=connector,
        timeout=timeout,
        headers={'Connection': 'keep-alive'}
    )
```

### Storage Optimization

```python
# Use efficient storage formats
import pyarrow as pa
import pyarrow.parquet as pq

def save_embeddings_optimized(embeddings, path):
    """Save embeddings in optimized format"""
    # Convert to Arrow table
    table = pa.table({
        'id': embeddings['id'],
        'vector': embeddings['vector'],
        'metadata': embeddings['metadata']
    })

    # Write with compression
    pq.write_table(
        table,
        path,
        compression='snappy',
        use_dictionary=True,
        row_group_size=50000
    )
```

## Scaling Strategies

### Horizontal Scaling

```yaml
# kubernetes-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deepwikiopen
spec:
  replicas: 3
  selector:
    matchLabels:
      app: deepwikiopen
  template:
    metadata:
      labels:
        app: deepwikiopen
    spec:
      containers:
        - name: deepwikiopen
          image: deepwikiopen:latest
          resources:
            requests:
              memory: "4Gi"
              cpu: "2"
            limits:
              memory: "8Gi"
              cpu: "4"
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: deepwikiopen-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: deepwikiopen
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```

### Vertical Scaling
```python
# Dynamic resource allocation
class ResourceScaler:
    def __init__(self):
        self.current_resources = self.get_current_resources()

    def scale_up(self, factor=1.5):
        """Increase resources dynamically"""
        new_workers = int(self.current_resources['workers'] * factor)
        new_memory = int(self.current_resources['memory'] * factor)

        # Update worker pool
        self.update_worker_pool(new_workers)

        # Adjust memory limits
        self.adjust_memory_limits(new_memory)

    def auto_scale(self, metrics):
        """Auto-scale based on metrics"""
        if metrics['cpu_usage'] > 80:
            self.scale_up(1.5)
        elif metrics['cpu_usage'] < 20:
            self.scale_down(0.7)
```

### Distributed Processing

```python
# Distributed embedding generation
import ray

@ray.remote
class EmbeddingWorker:
    def __init__(self, model_name):
        self.model = self.load_model(model_name)

    def generate_embedding(self, text):
        return self.model.encode(text)

# Connect to the Ray cluster
ray.init(address='ray://head-node:10001')

# Create worker pool
workers = [EmbeddingWorker.remote("model-name") for _ in range(10)]

# Distribute work round-robin across the workers
futures = []
for batch in data_batches:
    worker = workers[len(futures) % len(workers)]
    futures.append(worker.generate_embedding.remote(batch))

# Collect results
results = ray.get(futures)
```

## Performance Tuning Checklist

### Pre-Deployment

* [ ] Profile application for bottlenecks
* [ ] Optimize database queries and indexes
* [ ] Configure connection pooling
* [ ] Set up caching layers
* [ ] Choose appropriate models
* [ ] Configure resource limits
* [ ] Set up monitoring

### Runtime Optimization

* [ ] Monitor query patterns
* [ ] Adjust cache TTLs
* [ ] Tune garbage collection
* [ ] Optimize batch sizes
* [ ] Balance load across instances
* [ ] Update model selection
* [ ] Clean up unused resources

### Continuous Improvement

* [ ] Analyze performance metrics
* [ ] Identify slow queries
* [ ] Review resource utilization
* [ ] Update optimization strategies
* [ ] Test scaling policies
* [ ] Benchmark
improvements * [ ] Document best practices ## Performance Best Practices 1. **Start Small, Scale Gradually**: Begin with minimal resources and scale based on actual usage 2. **Monitor Everything**: Use comprehensive monitoring to identify bottlenecks 3. **Cache Aggressively**: Implement multi-level caching for frequently accessed data 4. **Optimize Hot Paths**: Focus optimization efforts on the most frequently used code paths 5. **Use Async Operations**: Leverage async/await for I/O-bound operations 6. **Batch Processing**: Process data in batches to reduce overhead 7. **Profile Regularly**: Regular profiling helps identify new bottlenecks 8. **Document Changes**: Keep track of performance improvements and their impact ## Troubleshooting Performance Issues ### Common Issues and Solutions | Issue | Symptoms | Solution | | ----------------- | --------------------- | --------------------------------------- | | Slow Queries | Response time >1s | Add indexes, optimize query patterns | | High Memory Usage | OOM errors | Implement streaming, reduce batch sizes | | CPU Bottlenecks | 100% CPU usage | Scale horizontally, optimize algorithms | | Cache Misses | Repeated computations | Increase cache size, adjust TTL | | Network Latency | Slow API responses | Use connection pooling, CDN | | Disk I/O | Slow file operations | Use SSD, implement caching | ### Performance Debugging Commands ```bash # Monitor system resources htop iotop nethogs # Database performance pg_stat_activity EXPLAIN ANALYZE # Application profiling py-spy top --pid memory_profiler run script.py # Network analysis tcpdump -i any -w trace.pcap wireshark trace.pcap ``` ## Conclusion Optimizing deepwikiopen performance requires a holistic approach covering infrastructure, application code, and operational practices. Regular monitoring, profiling, and iterative improvements are key to maintaining optimal performance as your codebase and user base grow. 
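The "Profile Regularly" practice above needs no external tooling; a minimal sketch using only the standard library (`profile_call` is an illustrative helper, not part of the deepwikiopen codebase):

```python
# Profile any hot function with cProfile and capture a readable report
import cProfile
import io
import pstats

def profile_call(func, *args, **kwargs):
    """Run func under cProfile; return (result, top-10 cumulative-time report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()

if __name__ == "__main__":
    result, report = profile_call(sum, range(1_000_000))
    print(report)
```

Wrapping a suspect code path this way during a load test quickly shows which calls dominate cumulative time, which is usually enough to decide where the optimization techniques above should be applied first.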
# Security Guidelines Source: https://asyncfunc.mintlify.app/reference/security Comprehensive security guidelines for deploying and operating DeepWiki-Open safely # Security Guidelines This document provides comprehensive security guidelines for deploying and operating DeepWiki-Open in production environments. Follow these best practices to ensure your deployment is secure and protects sensitive data. ## Overview DeepWiki-Open processes source code repositories and requires access to various APIs and services. This guide covers all security aspects from API key management to network security and vulnerability handling. ## API Key and Token Management ### Secure Storage Never hardcode API keys in your source code. Always use environment variables: ```bash # .env file (never commit this to version control) GOOGLE_API_KEY=your_google_api_key OPENAI_API_KEY=your_openai_api_key OPENROUTER_API_KEY=your_openrouter_api_key AZURE_OPENAI_API_KEY=your_azure_api_key ``` Add `.env` to your `.gitignore` file immediately after creation For production deployments, use dedicated secret management: ```bash # AWS Secrets Manager aws secretsmanager create-secret \ --name deepwiki/api-keys \ --secret-string file://.env # Kubernetes Secrets kubectl create secret generic deepwiki-secrets \ --from-env-file=.env # HashiCorp Vault vault kv put secret/deepwiki/api-keys @.env ``` Regularly rotate API keys and implement automated rotation: ```python # Example rotation script import os import time from datetime import datetime, timedelta def rotate_api_keys(): # Check key age key_age = datetime.now() - datetime.fromtimestamp( os.path.getmtime('.env') ) if key_age > timedelta(days=90): # Trigger rotation process print("API keys need rotation") ``` ### Access Token Security ```bash # Minimum required permissions for GitHub tokens: - repo (for private repositories) - read:org (for organization repositories) - read:user (for user information) # Create fine-grained personal access tokens when 
possible # Set expiration dates (maximum 90 days recommended) ``` ```bash # Required scopes for GitLab tokens: - read_api (for API access) - read_repository (for repository access) - read_user (for user information) # Use project or group tokens instead of personal tokens # Enable token expiration ``` ```bash # Required permissions for Bitbucket: - repository:read - account:read - team:read (for team repositories) # Use app passwords with minimal scopes # Regularly audit token usage ``` ## Authentication and Authorization ### Wiki Authentication DeepWiki-Open supports optional authentication for the wiki interface: ```python # Enable authentication in environment variables DEEPWIKI_AUTH_MODE=true DEEPWIKI_AUTH_CODE=your_secure_auth_code # The auth code should be: # - At least 20 characters long # - Randomly generated # - Changed regularly ``` ```python title="Generate Secure Auth Code" import secrets import string def generate_auth_code(length=32): alphabet = string.ascii_letters + string.digits + string.punctuation return ''.join(secrets.choice(alphabet) for _ in range(length)) # Generate a secure code auth_code = generate_auth_code() print(f"DEEPWIKI_AUTH_CODE={auth_code}") ``` ```bash title="Using OpenSSL" # Generate with OpenSSL openssl rand -base64 32 # Generate with /dev/urandom head -c 32 /dev/urandom | base64 ``` ### Role-Based Access Control (RBAC) Implement RBAC for multi-user deployments: ```yaml # Example RBAC configuration roles: admin: permissions: - wiki:create - wiki:read - wiki:update - wiki:delete - config:modify developer: permissions: - wiki:create - wiki:read - wiki:update viewer: permissions: - wiki:read ``` ## Data Privacy and Protection ### Repository Data Handling DeepWiki-Open collects the following data from repositories: * Source code files (filtered by configuration) * Directory structure * File metadata (size, last modified) * Git history (if enabled) **Never collected:** * Credentials or secrets * Binary files * Files matching 
exclusion patterns Repository data is stored in: * Local cache directory (`./cache/`) * Embedded vector database * Session memory (cleared on restart) **Security measures:** * Cache files are created with restricted permissions (0600) * Temporary files are securely deleted after processing * No permanent storage of sensitive data All data transmission should be encrypted: * Use HTTPS for all API communications * Enable TLS for database connections * Encrypt WebSocket connections ```nginx # Nginx SSL configuration server { listen 443 ssl http2; ssl_certificate /path/to/cert.pem; ssl_certificate_key /path/to/key.pem; ssl_protocols TLSv1.2 TLSv1.3; ssl_ciphers HIGH:!aNULL:!MD5; } ``` ### GDPR Compliance For GDPR compliance, implement: 1. **Data Minimization**: Only process necessary files 2. **Right to Erasure**: Provide cache clearing endpoints 3. **Data Portability**: Export processed wiki data 4. **Privacy by Design**: Default to secure configurations ```python # Example GDPR compliance endpoints @app.delete("/api/user-data/{user_id}") async def delete_user_data(user_id: str): """Implement right to erasure""" # Clear user's cached data # Remove from vector database # Delete processing history @app.get("/api/user-data/{user_id}/export") async def export_user_data(user_id: str): """Implement data portability""" # Export all user-related data ``` ## Network Security ### Firewall Configuration ```bash title="UFW (Ubuntu)" # Allow only necessary ports sudo ufw default deny incoming sudo ufw default allow outgoing # Frontend sudo ufw allow 3000/tcp # Backend API sudo ufw allow 8001/tcp # SSH (if needed) sudo ufw allow 22/tcp # Enable firewall sudo ufw enable ``` ```bash title="iptables" # Basic iptables rules iptables -A INPUT -i lo -j ACCEPT iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT iptables -A INPUT -p tcp --dport 3000 -j ACCEPT iptables -A INPUT -p tcp --dport 8001 -j ACCEPT iptables -A INPUT -j DROP ``` ```bash title="Docker Compose" # Restrict 
container ports
services:
  frontend:
    ports:
      - "127.0.0.1:3000:3000"  # Local only
  backend:
    ports:
      - "127.0.0.1:8001:8001"  # Local only
```

### Reverse Proxy Security

Use a reverse proxy for additional security:

```nginx
# Nginx security headers and rate limiting
# limit_req_zone must be declared in the http context
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

server {
    # Security headers
    add_header X-Frame-Options "SAMEORIGIN" always;
    add_header X-Content-Type-Options "nosniff" always;
    add_header X-XSS-Protection "1; mode=block" always;
    add_header Referrer-Policy "strict-origin-when-cross-origin" always;
    add_header Content-Security-Policy "default-src 'self'; script-src 'self' 'unsafe-inline' 'unsafe-eval'; style-src 'self' 'unsafe-inline';" always;

    # Rate limiting
    limit_req zone=api burst=20 nodelay;

    # Proxy settings
    location / {
        proxy_pass http://localhost:3000;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $host;
    }
}
```

## Private Repository Security

### Access Control

Private repository access requires careful security consideration.

Create tokens with only the necessary permissions:

* Read-only access
* Specific repository scope
* Time-limited tokens

```python
import requests

def validate_repository_access(token: str, repo: str) -> bool:
    """Validate that the token has access to the repository"""
    try:
        # Attempt to access the repository
        response = requests.get(
            f"https://api.github.com/repos/{repo}",
            headers={"Authorization": f"token {token}"}
        )
        return response.status_code == 200
    except requests.RequestException:
        return False
```

Maintain detailed logs of private repository access:

```python
import logging
from datetime import datetime

def log_repository_access(user: str, repo: str, action: str):
    # request is the framework's request object (e.g. Flask's)
    logging.info({
        "timestamp": datetime.utcnow().isoformat(),
        "user": user,
        "repository": repo,
        "action": action,
        "ip_address": request.remote_addr
    })
```

### Data Isolation

Ensure private repository data is isolated:

```python
# Example data isolation
import hashlib
from pathlib import Path

class RepositoryCache:
    def get_cache_path(self, repo: str, user:
str) -> Path: """Generate isolated cache path per user/repo""" # Hash user ID to prevent directory traversal user_hash = hashlib.sha256(user.encode()).hexdigest()[:12] repo_hash = hashlib.sha256(repo.encode()).hexdigest()[:12] return Path(f"./cache/{user_hash}/{repo_hash}/") ``` ## Vulnerability Management ### Security Scanning ```bash # Python dependencies pip install safety safety check # JavaScript dependencies npm audit npm audit fix # Docker images docker scan deepwiki-open:latest ``` ```bash # Python security analysis pip install bandit bandit -r api/ # JavaScript security analysis npm install -g eslint-plugin-security eslint --ext .js,.jsx,.ts,.tsx src/ # SAST scanning semgrep --config=auto . ``` ```dockerfile # Use specific versions, not latest FROM python:3.11-slim # Run as non-root user RUN useradd -m -u 1000 deepwiki USER deepwiki # Use multi-stage builds FROM node:18-alpine AS builder # Build stage... FROM node:18-alpine # Copy only necessary files ``` ### Vulnerability Reporting Found a security vulnerability? Please report it responsibly. #### Reporting Process 1. **Do NOT** create public GitHub issues for security vulnerabilities 2. Email security details to: [security@deepwiki-open.org](mailto:security@deepwiki-open.org) 3. 
Include:

* Vulnerability description
* Steps to reproduce
* Potential impact
* Suggested fixes (if any)

#### Response Timeline

* **24 hours**: Initial acknowledgment
* **72 hours**: Vulnerability assessment
* **7 days**: Fix development and testing
* **14 days**: Patch release and disclosure

### Security Headers Checklist

Ensure all security headers are properly configured:

* `Strict-Transport-Security: max-age=31536000; includeSubDomains`
* `X-Content-Type-Options: nosniff`
* `X-Frame-Options: SAMEORIGIN`
* `X-XSS-Protection: 1; mode=block`
* `Content-Security-Policy: default-src 'self'`
* `Referrer-Policy: strict-origin-when-cross-origin`

## Security Updates and Maintenance

### Update Schedule

* Monitor security advisories
* Review access logs
* Check for anomalies
* Run dependency updates
* Perform security scans
* Review authentication logs
* Rotate API keys
* Update base images
* Security audit
* Penetration testing
* Full security review
* Update security policies

### Automated Security Updates

```yaml
# GitHub Actions for automated updates
name: Security Updates
on:
  schedule:
    - cron: '0 0 * * 1'  # Weekly on Monday

jobs:
  update-dependencies:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Update Python dependencies
        run: |
          pip install pip-audit
          pip-audit --fix
      - name: Update Node dependencies
        run: |
          npm audit fix
          npm update
      - name: Create Pull Request
        uses: peter-evans/create-pull-request@v5
        with:
          title: Automated Security Updates
          body: Automated dependency updates
          branch: automated-security-updates
```

## Security Checklist

Before deploying to production, ensure:

* [ ] All API keys are stored in environment variables or secret management systems
* [ ] HTTPS is enabled for all endpoints
* [ ] Authentication is configured for sensitive operations
* [ ] Rate limiting is implemented
* [ ] Security headers are properly configured
* [ ] Logging and monitoring are enabled
* [ ] Regular backups are configured
* [ ] Incident response plan is documented
* [ ] Security scanning is automated
* [ ] Access controls are properly
configured ## Additional Resources * [OWASP Security Guidelines](https://owasp.org/) * [CIS Security Benchmarks](https://www.cisecurity.org/) * [NIST Cybersecurity Framework](https://www.nist.gov/cyberframework) * [DeepWiki Security Updates](https://github.com/AsyncFuncAI/deepwiki-open/security) # Troubleshooting Guide Source: https://asyncfunc.mintlify.app/reference/troubleshooting Comprehensive troubleshooting guide for common DeepWiki-Open issues, errors, and performance problems This comprehensive guide covers common issues, their causes, and step-by-step solutions for DeepWiki-Open deployment and usage. ## Quick Diagnostic Checklist Before diving into specific issues, run through this quick diagnostic checklist: ```bash # 1. Backend API health curl http://localhost:8001/health # 2. Frontend accessibility curl http://localhost:3000 # 3. Environment variables loaded python -c "import os; print('Google:', bool(os.getenv('GOOGLE_API_KEY'))); print('OpenAI:', bool(os.getenv('OPENAI_API_KEY')))" # 4. 
Process status ps aux | grep -E "(python.*api|node.*next)" ``` All services running and responding correctly ```bash # Backend logs tail -50 ./api/logs/application.log # Docker logs (if using Docker) docker-compose logs --tail=50 # System logs journalctl -u deepwiki --since "1 hour ago" ``` Look for error patterns, failed requests, or configuration warnings ```bash # Test simple wiki generation curl -X POST "http://localhost:8001/wiki/generate" \ -H "Content-Type: application/json" \ -d '{ "repo_url": "https://github.com/octocat/Hello-World", "model_provider": "google" }' ``` Basic functionality working as expected ## Installation Issues ### Python Installation Problems **Symptoms:** * `python: command not found` * `ImportError: No module named 'fastapi'` * Version mismatch errors **Diagnosis:** ```bash # Check Python version and location python --version python3 --version which python which python3 # Check virtual environment echo $VIRTUAL_ENV pip list | head -10 ``` **Solutions:** ```bash # Create alias (temporary) alias python=python3 alias pip=pip3 # Or use python3 directly python3 -m venv venv source venv/bin/activate python3 -m pip install -r api/requirements.txt ``` ```bash # Remove existing virtual environment rm -rf venv # Create new virtual environment with explicit Python version python3.10 -m venv venv # Or virtualenv -p python3.10 venv # Activate and install source venv/bin/activate pip install --upgrade pip pip install -r api/requirements.txt ``` ```bash # Install pyenv (macOS) brew install pyenv # Install specific Python version pyenv install 3.10.9 pyenv local 3.10.9 # Create virtual environment python -m venv venv source venv/bin/activate pip install -r api/requirements.txt ``` **Symptoms:** * `error: Microsoft Visual C++ 14.0 is required` (Windows) * `Failed building wheel for numpy` * Permission denied errors **Solutions:** ```powershell # Install Visual Studio Build Tools # Download from: 
https://visualstudio.microsoft.com/visual-cpp-build-tools/ # Or use conda instead of pip conda create -n deepwiki python=3.10 conda activate deepwiki conda install fastapi uvicorn numpy # Install remaining packages with pip pip install -r api/requirements.txt ``` ```bash # Install Xcode command line tools xcode-select --install # Install system dependencies via Homebrew brew install python@3.10 # Use system Python with virtual environment python3 -m venv venv source venv/bin/activate pip install --upgrade pip setuptools wheel pip install -r api/requirements.txt ``` ```bash # Install system dependencies (Ubuntu/Debian) sudo apt update sudo apt install python3-dev python3-pip python3-venv sudo apt install build-essential libssl-dev libffi-dev # Create virtual environment python3 -m venv venv source venv/bin/activate pip install --upgrade pip pip install -r api/requirements.txt ``` ### Node.js and Frontend Issues **Symptoms:** * `npm ERR! unsupported engine` * React/Next.js compatibility errors * Build failures **Diagnosis:** ```bash # Check Node.js and npm versions node --version npm --version # Check package.json requirements grep -A 5 -B 5 "engines" package.json # Check installed packages npm list --depth=0 ``` **Solutions:** ```bash # Using Node Version Manager (recommended) curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.0/install.sh | bash source ~/.bashrc nvm install 18 nvm use 18 # Verify version node --version # Should show v18.x.x # Reinstall dependencies rm -rf node_modules package-lock.json npm install ``` ```bash # Install Yarn npm install -g yarn # Clear npm cache and use Yarn rm -rf node_modules package-lock.json yarn install yarn dev ``` ```bash # Use Node.js in Docker to avoid version conflicts docker run -it --rm \ -v $(pwd):/app \ -w /app \ -p 3000:3000 \ node:18-alpine \ sh -c "npm install && npm run dev" ``` **Symptoms:** * `npm ERR! 
EACCES: permission denied` * `cannot run in wd` errors * Global package installation failures **Solutions:** ```bash # Set npm global directory to user folder mkdir ~/.npm-global npm config set prefix '~/.npm-global' # Add to PATH (add to ~/.bashrc or ~/.zshrc) export PATH=~/.npm-global/bin:$PATH source ~/.bashrc # Reinstall packages npm install -g npm@latest ``` ```bash # Install nvm (handles permissions automatically) curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.0/install.sh | bash source ~/.bashrc # Install and use Node.js nvm install 18 nvm use 18 nvm alias default 18 # Now npm will work without sudo npm install -g npm@latest ``` ## API and Service Issues ### API Connection Problems **Symptoms:** * `Connection refused` errors * No response from `localhost:8001` * FastAPI startup failures **Diagnosis:** ```bash # Check if process is running ps aux | grep python | grep api # Check port availability netstat -tulpn | grep 8001 # Or lsof -i :8001 # Check API logs tail -f ./api/logs/application.log # Test direct Python execution cd api && python main.py ``` **Solutions:** ```bash # Kill process using port 8001 sudo lsof -ti:8001 | xargs sudo kill -9 # Or use different port export PORT=8002 python -m api.main # Update frontend configuration # In .env or next.config.js: # SERVER_BASE_URL=http://localhost:8002 ``` ```bash # Ensure virtual environment is activated source venv/bin/activate # Check if all dependencies are installed pip check # Reinstall requirements pip install --force-reinstall -r api/requirements.txt # Start with verbose logging LOG_LEVEL=DEBUG python -m api.main ``` ```bash # Check environment variables python -c " import os from dotenv import load_dotenv load_dotenv() print('GOOGLE_API_KEY:', 'SET' if os.getenv('GOOGLE_API_KEY') else 'NOT SET') print('OPENAI_API_KEY:', 'SET' if os.getenv('OPENAI_API_KEY') else 'NOT SET') print('PORT:', os.getenv('PORT', '8001')) " # Verify .env file exists and is readable ls -la .env cat .env | grep -v 
"API_KEY" # Don't show actual keys ``` **Symptoms:** * Frontend loads but can't connect to API * CORS errors in browser console * Network request failures **Diagnosis:** ```bash # Check if frontend is running curl -I http://localhost:3000 # Check browser console for errors # Open Developer Tools → Console # Test API from command line curl -v http://localhost:8001/health # Check frontend configuration grep -r "localhost:8001" src/ grep -r "SERVER_BASE_URL" . ``` **Solutions:** ```python # In api/api.py, ensure CORS is configured: app.add_middleware( CORSMiddleware, allow_origins=["*"], # In production, specify exact origins allow_credentials=True, allow_methods=["*"], allow_headers=["*"], ) ``` ```javascript // In next.config.js or environment const nextConfig = { env: { API_BASE_URL: process.env.SERVER_BASE_URL || 'http://localhost:8001' } } // Or in .env.local SERVER_BASE_URL=http://localhost:8001 ``` ```bash # Test network connectivity curl -v http://localhost:8001/health # Check firewall settings sudo ufw status # Linux # Or check Windows Firewall/macOS Firewall # Test from different terminal/shell telnet localhost 8001 ``` ### AI Provider Issues **Symptoms:** * "Invalid API key" errors * Authentication failures * Provider-specific error messages **Diagnosis:** ```bash # Test API keys directly curl -H "Authorization: Bearer $OPENAI_API_KEY" \ https://api.openai.com/v1/models curl -H "Authorization: Bearer $GOOGLE_API_KEY" \ "https://generativelanguage.googleapis.com/v1/models" # Check API key format echo $GOOGLE_API_KEY | grep -E "^AIza[0-9A-Za-z-_]{35}$" echo $OPENAI_API_KEY | grep -E "^sk-[0-9A-Za-z]{48}$" ``` **Solutions:** ```bash # Verify key is active curl "https://generativelanguage.googleapis.com/v1/models?key=$GOOGLE_API_KEY" # Common issues: # 1. Key not enabled for Generative AI API # 2. Billing not enabled # 3. 
Key restrictions (IP/referrer) # Fix: Go to Google Cloud Console # Enable Generative AI API # Check API key restrictions ``` ```bash # Test key validity curl https://api.openai.com/v1/models \ -H "Authorization: Bearer $OPENAI_API_KEY" # Common issues: # 1. Account needs billing setup # 2. Key expired or revoked # 3. Rate limits exceeded # Fix: Check OpenAI dashboard # Add payment method # Generate new key if needed ``` ```bash # Ensure .env is being loaded python -c " from dotenv import load_dotenv import os load_dotenv() print('.env file loaded') print('Keys found:', [k for k in os.environ if 'API_KEY' in k]) " # Check file permissions ls -la .env # Verify no BOM or special characters file .env hexdump -C .env | head -5 ``` **Symptoms:** * "Too many requests" errors * Slow or hanging generation * HTTP 429 responses **Diagnosis:** ```bash # Check current usage curl -s http://localhost:8001/auth/status | jq '.providers' # Monitor API logs for rate limit messages tail -f ./api/logs/application.log | grep -i "rate" # Test with small repository first curl -X POST "http://localhost:8001/wiki/generate" \ -d '{"repo_url": "https://github.com/octocat/Hello-World"}' ``` **Solutions:** ```python # In api configuration RATE_LIMIT_CONFIG = { "google": { "requests_per_minute": 12, # Under 15 limit "retry_delay": 5, "backoff_factor": 2 }, "openai": { "requests_per_minute": 45, # Under 50 limit "retry_delay": 10 } } ``` ```javascript { "provider_fallback": { "primary": "google", "fallback_on_rate_limit": "openai", "fallback_delay": 60 } } ``` ```python # Reduce token usage MAX_INPUT_TOKENS = 50000 # Reduce from default BATCH_SIZE = 5 # Process fewer files at once RETRY_ATTEMPTS = 3 # Reduce retry attempts ``` ## Wiki Generation Issues ### Generation Failures **Symptoms:** * "Repository not found" errors * Private repository access denied * Authentication failures for repositories **Diagnosis:** ```bash # Test repository access manually git clone https://github.com/user/repo.git 
test-clone rm -rf test-clone # For private repos, test with token git clone https://token@github.com/user/private-repo.git # Check repository URL format echo "https://github.com/microsoft/vscode" | \ grep -E "^https://(github|gitlab|bitbucket)\.(com|org)/.+/.+$" ``` **Solutions:** ```bash # Verify repository exists and is public curl -s https://api.github.com/repos/microsoft/vscode # Check for typos in URL # Correct: https://github.com/microsoft/vscode # Wrong: https://github.com/Microsoft/VSCode ``` ```bash # GitHub personal access token # 1. Go to Settings → Developer settings → Personal access tokens # 2. Generate token with 'repo' scope # 3. Use in request: curl -X POST "http://localhost:8001/wiki/generate" \ -d '{ "repo_url": "https://github.com/company/private-repo", "access_token": "ghp_xxxxxxxxxxxxxxxxxxxx", "model_provider": "google" }' ``` ```bash # GitLab personal access token # Scope: read_repository # BitBucket app password # Permission: Repositories: Read # Test access curl -H "Authorization: Bearer $ACCESS_TOKEN" \ https://gitlab.com/api/v4/projects/owner%2Frepo ``` **Symptoms:** * Generation hangs indefinitely * Timeout errors after long wait * Partial generation results **Diagnosis:** ```bash # Check generation progress curl -s http://localhost:8001/wiki/projects | jq '.[].status' # Monitor resource usage during generation top -p $(pgrep -f "python.*api") free -h df -h # Check for specific timeout errors grep -i timeout ./api/logs/application.log ``` **Solutions:** ```python # In API configuration GENERATION_TIMEOUT = 1800 # 30 minutes REQUEST_TIMEOUT = 300 # 5 minutes per API call CLONE_TIMEOUT = 600 # 10 minutes for repository clone ``` ```javascript { "optimization": { "skip_large_files": true, // Skip files > 1MB "ignore_patterns": [ "*.min.js", "node_modules/**", "vendor/**", "build/**", "dist/**" ], "max_files_per_batch": 10, // Process fewer files "enable_caching": true // Cache intermediate results } } ``` ```bash # Monitor during
generation watch -n 5 'ps aux | grep python | grep -v grep' watch -n 5 'free -h && df -h' # Set system limits ulimit -v 8388608 # Limit to 8GB RAM ulimit -t 1800 # 30 minute CPU limit ``` ### Quality Issues **Symptoms:** * Generic, unhelpful descriptions * Missing technical details * Incorrect architecture analysis * Nonsensical content **Diagnosis:** ```bash # Check repository quality indicators cd /path/to/cloned/repo find . -name "README*" -o -name "*.md" | head -10 grep -r "class\|function\|def\|interface" . | wc -l ls -la package.json setup.py requirements.txt 2>/dev/null ``` **Solutions:** ```bash # Enhance repository before generation # 1. Update README.md with current info # 2. Add code comments to complex functions # 3. Include configuration examples # 4. Add API documentation files # Example improvements: echo "# Updated Project Description" >> README.md echo "## Architecture Overview" >> README.md echo "## API Endpoints" >> README.md ``` ```javascript { "model_upgrade": { "from": "gemini-1.5-flash", "to": "gpt-4o", "reason": "Need higher quality analysis for complex codebase" } } ``` ```javascript { "generation_config": { "deep_research": true, "analysis_depth": "detailed", "include_examples": true, "cross_reference": true } } ``` **Symptoms:** * Missing pages or sections * Truncated content * Empty or broken diagrams **Diagnosis:** ```bash # Check generation logs for truncation grep -i "truncated\|incomplete\|error" ./api/logs/application.log # Verify file processing curl -s http://localhost:8001/wiki/projects | \ jq '.[] | select(.repo_url | contains("your-repo")) | .metadata' # Check token limits grep -i "token.*limit" ./api/logs/application.log ``` **Solutions:** ```python # Adjust token limits MAX_INPUT_TOKENS = 100000 # Increase context MAX_OUTPUT_TOKENS = 8192 # Increase response length CHUNK_OVERLAP = 0.2 # More overlap between chunks ``` ```bash # Generate in stages # 1. 
Basic structure first curl -X POST "http://localhost:8001/wiki/generate" \ -d '{"repo_url": "...", "focus": "structure"}' # 2. Detailed content second curl -X POST "http://localhost:8001/wiki/generate" \ -d '{"repo_url": "...", "focus": "content", "force_regenerate": false}' ``` ```bash # Use Ask feature to complete missing sections curl -X POST "http://localhost:8001/chat/stream" \ -d '{ "message": "Generate detailed documentation for the authentication module", "repo_url": "https://github.com/user/repo", "deep_research": true }' ``` ## Performance Issues ### Slow Generation Speed **Diagnosis:** ```bash # Profile generation time time curl -X POST "http://localhost:8001/wiki/generate" \ -d '{"repo_url": "https://github.com/small/repo"}' # Check bottlenecks grep -E "took|duration|time" ./api/logs/application.log | tail -10 # Monitor resource usage iostat -x 1 5 # Disk I/O top -p $(pgrep python) # CPU/Memory ``` **Solutions:** ```javascript { "speed_optimization": { "fast_model": "gemini-2.0-flash", // Fastest "balanced": "gpt-4o-mini", // Fast + good quality "quality": "gpt-4o" // Slow but best quality } } ``` ```python # Optimize file processing CONCURRENT_REQUESTS = 3 # Parallel API calls BATCH_SIZE = 5 # Files per batch CACHE_EMBEDDINGS = True # Reuse embeddings SKIP_BINARY_FILES = True # Skip non-text files ``` ```bash # Use faster storage sudo mount -t tmpfs -o size=2G tmpfs /tmp/deepwiki-cache # Increase worker processes export WORKERS=4 gunicorn -w 4 -k uvicorn.workers.UvicornWorker api.api:app # Use faster DNS (the redirect needs root, so use tee) echo "nameserver 1.1.1.1" | sudo tee -a /etc/resolv.conf ``` **Symptoms:** * Out of memory errors * System becoming unresponsive * Process killed by OS **Solutions:** ```bash # Monitor memory usage watch -n 2 'free -h && echo "---" && ps aux --sort=-%mem | head -10' # Set memory limits (ulimit -m is ignored on modern Linux; limit virtual memory instead) ulimit -v 8388608 # 8 GiB limit export MALLOC_ARENA_MAX=2 # Configure Python garbage collection export PYTHONOPTIMIZE=1 export PYTHONUNBUFFERED=1 # Use memory-efficient models #
Prefer: gemini-flash, gpt-4o-mini # Avoid: large local models if RAM limited ``` ## Docker Issues ### Container Problems **Symptoms:** * Containers fail to start * Service communication failures * Volume mount issues **Diagnosis:** ```bash # Check container status docker-compose ps # View logs docker-compose logs api docker-compose logs frontend # Check network connectivity docker-compose exec api curl http://localhost:8001/health docker-compose exec frontend curl http://api:8001/health ``` **Solutions:** ```yaml # In docker-compose.yml services: api: environment: - PORT=8001 frontend: environment: - SERVER_BASE_URL=http://api:8001 # Use service name ``` ```bash # Check volume permissions docker-compose exec api ls -la /app docker-compose exec api ls -la /app/.env # Fix permissions sudo chown -R 1000:1000 ./api/logs chmod 644 .env ``` ```yaml # In docker-compose.yml services: api: deploy: resources: limits: memory: 4G cpus: '2' reservations: memory: 2G ``` **Solutions:** ```bash # Clear build cache docker system prune -f docker-compose build --no-cache # Check Dockerfile for problems without building (BuildKit build checks) docker build --check -f Dockerfile . # Build with verbose output docker-compose build --progress=plain # Use multi-stage build for debugging docker build --target=development -t deepwiki-debug . docker run -it deepwiki-debug /bin/bash ``` ## Advanced Troubleshooting ### Debug Mode ```env LOG_LEVEL=DEBUG LOG_FILE_PATH=./api/logs/debug.log ``` ```bash # Start API with debug output DEBUG=1 PYTHONPATH=.
python -m api.main # Or with uvicorn directly uvicorn api.api:app --host 0.0.0.0 --port 8001 --log-level debug ``` ```bash # Next.js debug mode DEBUG=* npm run dev # Or with detailed logging NODE_OPTIONS='--inspect' npm run dev ``` ### Performance Profiling ```python Python Profiling import cProfile import pstats # Profile wiki generation pr = cProfile.Profile() pr.enable() # Your generation code here generate_wiki(repo_url, model_provider) pr.disable() stats = pstats.Stats(pr) stats.sort_stats('cumulative').print_stats(20) ``` ```bash System Monitoring # Continuous monitoring script #!/bin/bash while true; do echo "=== $(date) ===" echo "Memory:" free -h echo "CPU:" top -bn1 | grep "Cpu(s)" | awk '{print $2}' | cut -d'%' -f1 echo "Disk I/O:" iostat -x 1 1 | grep -E "Device|sda" echo "Network:" ss -tuln | grep -E ":8001|:3000" echo "---" sleep 30 done ``` ## Getting Help ### Collect Debug Information Before seeking help, collect this information: ```bash # Create debug report cat > debug_report.txt << EOF === System Information === OS: $(uname -a) Python: $(python --version) Node: $(node --version) Docker: $(docker --version) === Environment === $(env | grep -E "DEEPWIKI|GOOGLE|OPENAI" | sed 's/=.*$/=***/') === Process Status === $(ps aux | grep -E "python.*api|node.*next") === Port Status === $(netstat -tulpn | grep -E ":8001|:3000") === Recent Logs === $(tail -50 ./api/logs/application.log) === Error Messages === $(grep -i error ./api/logs/application.log | tail -20) EOF ``` Create a minimal example that reproduces the issue: ```bash # Test with small, public repository curl -X POST "http://localhost:8001/wiki/generate" \ -H "Content-Type: application/json" \ -d '{ "repo_url": "https://github.com/octocat/Hello-World", "model_provider": "google" }' ``` ### Community Resources Report bugs and request features Get community support and discuss issues Review comprehensive setup and usage guides Check frequently asked questions When reporting issues, please include: * 
Your debug report * Steps to reproduce the issue * Expected vs actual behavior * Screenshots or log excerpts * Your DeepWiki version and environment ## Next Steps Optimize DeepWiki for better performance Secure your DeepWiki deployment Deploy DeepWiki in production environments Set up monitoring and alerting
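The monitoring step above can start as small as a script that polls the health endpoints from the quick diagnostic checklist and retries with exponential backoff. A minimal Python sketch — the endpoint URLs, retry parameters, and helper names here are illustrative assumptions, not part of the DeepWiki API:

```python
import time
import urllib.request
import urllib.error

def backoff_delays(attempts, base=1.0, factor=2.0):
    """Yield exponentially growing retry delays: base, base*factor, ..."""
    delay = base
    for _ in range(attempts):
        yield delay
        delay *= factor

def check_health(url, attempts=4, fetch=None, sleep=time.sleep):
    """Return True once `url` answers HTTP 200, retrying with backoff."""
    if fetch is None:
        def fetch(u):
            with urllib.request.urlopen(u, timeout=5) as resp:
                return resp.status
    for delay in backoff_delays(attempts):
        try:
            if fetch(url) == 200:
                return True
        except (urllib.error.URLError, OSError):
            pass  # Service not up yet; wait and retry
        sleep(delay)
    return False

# Example against a running deployment:
#   check_health("http://localhost:8001/health")  # backend API
#   check_health("http://localhost:3000")         # frontend
```

Injecting `fetch` and `sleep` keeps the retry logic testable without a live server; wire the result into whatever alerting channel you already use.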