Using Custom Models with DeepWiki
DeepWiki supports a wide range of AI models through various providers. This guide covers how to configure and use custom models for optimal performance and cost efficiency.
Overview
DeepWiki’s flexible architecture allows you to use models from:
- OpenRouter (access to 100+ models)
- Ollama (local models)
- Azure OpenAI
- Any OpenAI-compatible endpoint
- Custom API endpoints
OpenRouter Integration
OpenRouter provides access to multiple model providers through a single API.
Configuration
// generator.json
{
"provider": "openrouter",
"apiKey": "YOUR_OPENROUTER_API_KEY",
"model": "anthropic/claude-3-opus",
"baseURL": "https://openrouter.ai/api/v1",
"headers": {
"HTTP-Referer": "https://yourapp.com",
"X-Title": "DeepWiki"
}
}
Available Models
Popular models on OpenRouter:
- anthropic/claude-3-opus - Best for complex reasoning
- anthropic/claude-3-sonnet - Balanced performance and cost
- openai/gpt-4-turbo - Latest GPT-4 variant
- google/gemini-pro - Google's Gemini model
- meta-llama/llama-3-70b - Open-source alternative
Usage Example
// app/lib/ai/generator.ts
// OpenRouter exposes an OpenAI-compatible API, so the standard OpenAI SDK
// works with a custom baseURL.
import OpenAI from 'openai';
const client = new OpenAI({
baseURL: 'https://openrouter.ai/api/v1',
apiKey: process.env.OPENROUTER_API_KEY,
defaultHeaders: {
'HTTP-Referer': process.env.APP_URL,
'X-Title': 'DeepWiki'
}
});
export async function generateContent(prompt: string) {
const response = await client.chat.completions.create({
model: 'anthropic/claude-3-opus',
messages: [{ role: 'user', content: prompt }],
temperature: 0.7,
max_tokens: 4000
});
return response.choices[0].message.content;
}
Ollama for Local Models
Run models locally for privacy and zero API costs.
Installation
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Pull models
ollama pull llama3
ollama pull mistral
ollama pull codellama
Configuration
// generator.json
{
"provider": "ollama",
"baseURL": "http://localhost:11434",
"model": "llama3:70b",
"options": {
"temperature": 0.7,
"num_predict": 4096
}
}
Integration
// app/lib/ai/ollama-provider.ts
export class OllamaProvider {
private baseURL: string;
constructor(baseURL = 'http://localhost:11434') {
this.baseURL = baseURL;
}
async generate(prompt: string, model = 'llama3') {
const response = await fetch(`${this.baseURL}/api/generate`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
model,
prompt,
stream: false,
options: {
temperature: 0.7,
num_predict: 4096
}
})
});
const data = await response.json();
return data.response;
}
}
Azure OpenAI Configuration
Use Azure’s enterprise-grade OpenAI deployment.
Setup
// generator.json
{
"provider": "azure-openai",
"apiKey": "YOUR_AZURE_API_KEY",
"baseURL": "https://YOUR_RESOURCE.openai.azure.com",
"apiVersion": "2024-02-15-preview",
"deployment": "gpt-4-turbo",
"model": "gpt-4-turbo"
}
Environment Variables
# .env.local
AZURE_OPENAI_API_KEY=your_key_here
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com
AZURE_OPENAI_DEPLOYMENT=gpt-4-turbo
AZURE_OPENAI_API_VERSION=2024-02-15-preview
Implementation
// app/lib/ai/azure-provider.ts
// The AzureOpenAI client ships with the official `openai` package.
import { AzureOpenAI } from 'openai';
const client = new AzureOpenAI({
apiKey: process.env.AZURE_OPENAI_API_KEY,
endpoint: process.env.AZURE_OPENAI_ENDPOINT,
apiVersion: process.env.AZURE_OPENAI_API_VERSION,
deployment: process.env.AZURE_OPENAI_DEPLOYMENT
});
export async function generateWithAzure(prompt: string) {
const result = await client.chat.completions.create({
model: process.env.AZURE_OPENAI_DEPLOYMENT!,
messages: [{ role: 'user', content: prompt }],
temperature: 0.7,
max_tokens: 4000
});
return result.choices[0].message?.content;
}
Custom Model Selection UI
Implement a model selector in your DeepWiki interface.
Model Selector Component
// app/components/model-selector.tsx
import { useState } from 'react';
import { Select, SelectContent, SelectItem, SelectTrigger, SelectValue } from '@/components/ui/select';
const AVAILABLE_MODELS = [
{ id: 'gpt-4-turbo', name: 'GPT-4 Turbo', provider: 'openai' },
{ id: 'claude-3-opus', name: 'Claude 3 Opus', provider: 'anthropic' },
{ id: 'llama3:70b', name: 'Llama 3 70B', provider: 'ollama' },
{ id: 'mistral-large', name: 'Mistral Large', provider: 'mistral' }
];
export function ModelSelector({ onModelChange }: { onModelChange: (model: string) => void }) {
const [selectedModel, setSelectedModel] = useState('gpt-4-turbo');
const handleChange = (value: string) => {
setSelectedModel(value);
onModelChange(value);
};
return (
<Select value={selectedModel} onValueChange={handleChange}>
<SelectTrigger className="w-[200px]">
<SelectValue placeholder="Select a model" />
</SelectTrigger>
<SelectContent>
{AVAILABLE_MODELS.map((model) => (
<SelectItem key={model.id} value={model.id}>
<div className="flex flex-col">
<span>{model.name}</span>
<span className="text-xs text-muted-foreground">{model.provider}</span>
</div>
</SelectItem>
))}
</SelectContent>
</Select>
);
}
Dynamic Model Configuration
// app/lib/ai/model-config.ts
export interface ModelConfig {
provider: string;
model: string;
apiKey?: string;
baseURL?: string;
temperature?: number;
maxTokens?: number;
}
export const MODEL_CONFIGS: Record<string, ModelConfig> = {
'gpt-4-turbo': {
provider: 'openai',
model: 'gpt-4-turbo-preview',
temperature: 0.7,
maxTokens: 4000
},
'claude-3-opus': {
provider: 'openrouter',
model: 'anthropic/claude-3-opus',
baseURL: 'https://openrouter.ai/api/v1',
temperature: 0.7,
maxTokens: 4000
},
'llama3:70b': {
provider: 'ollama',
model: 'llama3:70b',
baseURL: 'http://localhost:11434',
temperature: 0.8,
maxTokens: 4096
}
};
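A small helper can resolve one of these entries at request time and merge in per-request overrides. `resolveModelConfig` and `DEFAULT_MODEL` are illustrative names, not part of DeepWiki's API; the registry below is abridged from the one above to keep the sketch self-contained.

```typescript
// Hypothetical lookup helper over the MODEL_CONFIGS registry shown above.
interface ModelConfig {
  provider: string;
  model: string;
  baseURL?: string;
  temperature?: number;
  maxTokens?: number;
}

const MODEL_CONFIGS: Record<string, ModelConfig> = {
  'gpt-4-turbo': { provider: 'openai', model: 'gpt-4-turbo-preview', temperature: 0.7, maxTokens: 4000 },
  'llama3:70b': { provider: 'ollama', model: 'llama3:70b', baseURL: 'http://localhost:11434', temperature: 0.8, maxTokens: 4096 },
};

const DEFAULT_MODEL = 'gpt-4-turbo';

function resolveModelConfig(id: string, overrides: Partial<ModelConfig> = {}): ModelConfig {
  // Fall back to the default entry when the id is not registered.
  const base = MODEL_CONFIGS[id] ?? MODEL_CONFIGS[DEFAULT_MODEL];
  // Per-request overrides (e.g. a custom temperature) win over the registry.
  return { ...base, ...overrides };
}

const cfg = resolveModelConfig('llama3:70b', { temperature: 0.5 });
console.log(cfg.provider, cfg.temperature); // ollama 0.5
```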
Modifying generator.json
The generator.json file controls model configuration.
Basic Structure
{
"provider": "openai",
"model": "gpt-4-turbo",
"apiKey": "${OPENAI_API_KEY}",
"temperature": 0.7,
"maxTokens": 4000,
"systemPrompt": "You are a helpful wiki content generator...",
"retryAttempts": 3,
"retryDelay": 1000
}
Multi-Provider Configuration
{
"providers": {
"primary": {
"provider": "openai",
"model": "gpt-4-turbo",
"apiKey": "${OPENAI_API_KEY}"
},
"fallback": {
"provider": "openrouter",
"model": "meta-llama/llama-3-70b",
"apiKey": "${OPENROUTER_API_KEY}",
"baseURL": "https://openrouter.ai/api/v1"
},
"local": {
"provider": "ollama",
"model": "llama3",
"baseURL": "http://localhost:11434"
}
},
"strategy": "fallback",
"timeout": 30000
}
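The `"strategy": "fallback"` setting can be implemented roughly as follows. The `Generator` signature and the idea of passing providers in priority order (primary, fallback, local) are assumptions for illustration, not DeepWiki internals:

```typescript
// Sketch of a fallback strategy: try each provider in order, return the first
// success, and surface the last error only if every provider fails.
type Generator = (prompt: string) => Promise<string>;

async function generateWithFallback(
  providers: Generator[],
  prompt: string
): Promise<string> {
  let lastError: unknown;
  for (const generate of providers) {
    try {
      return await generate(prompt);
    } catch (error) {
      lastError = error; // remember the failure and move on to the next provider
    }
  }
  throw lastError;
}
```

A per-attempt timeout (the `"timeout": 30000` field above) could be layered on by racing each call against an `AbortController`-backed timer.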
OpenAI-Compatible Endpoints
Many providers offer OpenAI-compatible APIs.
Generic Configuration
// app/lib/ai/openai-compatible.ts
export class OpenAICompatibleProvider {
private apiKey: string;
private baseURL: string;
constructor(config: { apiKey: string; baseURL: string }) {
this.apiKey = config.apiKey;
this.baseURL = config.baseURL;
}
async chat(messages: any[], options: any = {}) {
// baseURL is expected to already include any version prefix (e.g. /v1)
const response = await fetch(`${this.baseURL}/chat/completions`, {
method: 'POST',
headers: {
'Authorization': `Bearer ${this.apiKey}`,
'Content-Type': 'application/json'
},
body: JSON.stringify({
messages,
...options
})
});
return response.json();
}
}
Supported Providers
- Perplexity AI: https://api.perplexity.ai
- Together AI: https://api.together.xyz/v1
- Anyscale: https://api.endpoints.anyscale.com/v1
- Groq: https://api.groq.com/openai/v1
Benchmark Results
| Model | Tokens/Second | Quality Score | Cost/1M Tokens |
|---|---|---|---|
| GPT-4 Turbo | 50 | 9.5/10 | $10.00 |
| Claude 3 Opus | 40 | 9.3/10 | $15.00 |
| Llama 3 70B (Local) | 30 | 8.5/10 | $0.00 |
| Mistral Large | 60 | 8.8/10 | $8.00 |
| GPT-3.5 Turbo | 80 | 7.5/10 | $0.50 |
// scripts/benchmark-models.ts
async function benchmarkModel(provider: any, prompt: string) {
const startTime = Date.now();
let tokens = 0;
try {
const response = await provider.generate(prompt);
tokens = response.usage?.total_tokens || 0;
const duration = Date.now() - startTime;
return {
duration,
tokens,
tokensPerSecond: tokens / (duration / 1000),
cost: calculateCost(provider.model, tokens)
};
} catch (error) {
return { error: error.message };
}
}
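The `calculateCost` helper the benchmark script calls is not shown in this guide. A minimal version, using the per-million-token prices from the table above and pricing input and output tokens identically for simplicity, might look like:

```typescript
// Hypothetical cost calculator; rates come from the benchmark table above.
const COST_PER_1M: Record<string, number> = {
  'gpt-4-turbo': 10.0,
  'claude-3-opus': 15.0,
  'llama3:70b': 0.0, // local via Ollama
  'mistral-large': 8.0,
  'gpt-3.5-turbo': 0.5,
};

function calculateCost(model: string, tokens: number): number {
  // Unknown models are treated as free; a real implementation might throw.
  const rate = COST_PER_1M[model] ?? 0;
  return (tokens / 1_000_000) * rate;
}

console.log(calculateCost('gpt-4-turbo', 500_000)); // 5
```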
Cost Optimization Strategies
1. Model Cascading
Use cheaper models first, escalate to expensive ones only when needed.
// app/lib/ai/cascade-strategy.ts
export async function generateWithCascade(prompt: string, complexity: 'low' | 'medium' | 'high') {
const models = {
low: 'gpt-3.5-turbo',
medium: 'claude-3-sonnet',
high: 'gpt-4-turbo'
};
const model = models[complexity];
return await generate(prompt, { model });
}
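How `complexity` gets determined is left open above. One simple (admittedly crude) heuristic is prompt length; the thresholds below are illustrative, not tuned values from DeepWiki:

```typescript
// Crude complexity heuristic based on word count; thresholds are made up.
type Complexity = 'low' | 'medium' | 'high';

function estimateComplexity(prompt: string): Complexity {
  const words = prompt.trim().split(/\s+/).length;
  if (words < 50) return 'low';       // short lookups and rewrites
  if (words < 300) return 'medium';   // typical section generation
  return 'high';                      // long, multi-part synthesis
}

console.log(estimateComplexity('Summarize this page')); // low
```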
2. Caching Responses
// app/lib/ai/cache-manager.ts
import { createHash } from 'crypto';
import { Redis } from '@upstash/redis';
const redis = new Redis({
url: process.env.UPSTASH_REDIS_URL,
token: process.env.UPSTASH_REDIS_TOKEN
});
export async function getCachedOrGenerate(
prompt: string,
generator: () => Promise<string>
) {
const cacheKey = `ai:${createHash('sha256').update(prompt).digest('hex')}`;
// Check cache
const cached = await redis.get(cacheKey);
if (cached) return cached;
// Generate and cache
const result = await generator();
await redis.set(cacheKey, result, { ex: 3600 }); // 1 hour TTL
return result;
}
3. Batch Processing
// app/lib/ai/batch-processor.ts
export async function processBatch(prompts: string[], model: string) {
const batchSize = 10;
const results = [];
for (let i = 0; i < prompts.length; i += batchSize) {
const batch = prompts.slice(i, i + batchSize);
const batchResults = await Promise.all(
batch.map(prompt => generate(prompt, { model }))
);
results.push(...batchResults);
}
return results;
}
4. Token Optimization
// app/lib/ai/token-optimizer.ts
import GPT3Tokenizer from 'gpt3-tokenizer';
export function optimizePrompt(prompt: string, maxTokens: number = 2000) {
// Collapse runs of whitespace (note: this can mangle code or Markdown in prompts)
let optimized = prompt.replace(/\s+/g, ' ').trim();
// Truncate if too long; gpt3-tokenizer's encode() returns { bpe, text }
const encoder = new GPT3Tokenizer({ type: 'gpt3' });
const { bpe } = encoder.encode(optimized);
if (bpe.length > maxTokens) {
optimized = encoder.decode(bpe.slice(0, maxTokens));
}
return optimized;
}
Best Practices
1. Error Handling
export async function generateWithRetry(
prompt: string,
options: any,
maxRetries = 3
) {
for (let i = 0; i < maxRetries; i++) {
try {
return await generate(prompt, options);
} catch (error) {
if (i === maxRetries - 1) throw error;
await new Promise(resolve => setTimeout(resolve, 1000 * (i + 1)));
}
}
}
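A common refinement over the linear delay above is exponential backoff with jitter, which avoids retry stampedes when many requests fail at the same moment. This sketch keeps the same shape as `generateWithRetry`; the base and cap values are illustrative:

```typescript
// "Full jitter" backoff: the ceiling doubles each attempt (capped at
// maxDelayMs), and a uniformly random delay below that ceiling is used.
function backoffDelay(attempt: number, baseMs = 1000, maxDelayMs = 30000): number {
  const ceiling = Math.min(maxDelayMs, baseMs * 2 ** attempt);
  return Math.random() * ceiling;
}

async function withBackoff<T>(fn: () => Promise<T>, maxRetries = 3): Promise<T> {
  for (let i = 0; ; i++) {
    try {
      return await fn();
    } catch (error) {
      if (i >= maxRetries - 1) throw error;
      await new Promise(resolve => setTimeout(resolve, backoffDelay(i)));
    }
  }
}
```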
2. Model Selection Logic
export function selectOptimalModel(requirements: {
maxCost?: number;
minQuality?: number;
maxLatency?: number;
}) {
const models = getAvailableModels();
return models
.filter(m => m.costPer1M <= (requirements.maxCost || Infinity))
.filter(m => m.qualityScore >= (requirements.minQuality || 0))
.filter(m => m.avgLatency <= (requirements.maxLatency || Infinity))
.sort((a, b) => b.qualityScore - a.qualityScore)[0];
}
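`getAvailableModels` is assumed above. A static version backed by the benchmark numbers from earlier might look like this; the `avgLatency` figures are made up for illustration, and `selectOptimalModel` is repeated here only so the sketch runs on its own:

```typescript
// Hypothetical model metadata feeding selectOptimalModel; quality and cost
// come from the benchmark table, avgLatency (ms per request) is invented.
interface ModelMeta {
  id: string;
  costPer1M: number;
  qualityScore: number;
  avgLatency: number;
}

function getAvailableModels(): ModelMeta[] {
  return [
    { id: 'gpt-4-turbo', costPer1M: 10.0, qualityScore: 9.5, avgLatency: 4000 },
    { id: 'claude-3-opus', costPer1M: 15.0, qualityScore: 9.3, avgLatency: 5000 },
    { id: 'llama3:70b', costPer1M: 0.0, qualityScore: 8.5, avgLatency: 7000 },
    { id: 'gpt-3.5-turbo', costPer1M: 0.5, qualityScore: 7.5, avgLatency: 1500 },
  ];
}

function selectOptimalModel(req: { maxCost?: number; minQuality?: number; maxLatency?: number }) {
  return getAvailableModels()
    .filter(m => m.costPer1M <= (req.maxCost ?? Infinity))
    .filter(m => m.qualityScore >= (req.minQuality ?? 0))
    .filter(m => m.avgLatency <= (req.maxLatency ?? Infinity))
    .sort((a, b) => b.qualityScore - a.qualityScore)[0];
}

console.log(selectOptimalModel({ maxCost: 1 })?.id); // llama3:70b
```

With a budget cap of $1 per 1M tokens, only the local Llama 3 and GPT-3.5 Turbo qualify, and the quality sort picks Llama 3.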
3. Monitoring and Logging
export async function trackModelUsage(
model: string,
tokens: number,
duration: number
) {
await db.modelUsage.create({
data: {
model,
tokens,
duration,
cost: calculateCost(model, tokens),
timestamp: new Date()
}
});
}
Conclusion
DeepWiki’s flexible model system allows you to optimize for your specific needs:
- Use OpenRouter for access to multiple models
- Deploy Ollama for privacy and zero API costs
- Choose Azure OpenAI for enterprise requirements
- Implement cascading strategies for cost optimization
- Monitor usage and performance to make informed decisions
Remember to regularly review your model usage and costs to ensure you’re using the most appropriate models for your use case.