Using Custom Models with DeepWiki

DeepWiki supports a wide range of AI models through various providers. This guide covers how to configure and use custom models for optimal performance and cost efficiency.

Overview

DeepWiki’s flexible architecture allows you to use models from:
  • OpenRouter (access to 100+ models)
  • Ollama (local models)
  • Azure OpenAI
  • Any OpenAI-compatible endpoint
  • Custom API endpoints

OpenRouter Integration

OpenRouter provides access to multiple model providers through a single API.

Configuration

// generator.json
{
  "provider": "openrouter",
  "apiKey": "YOUR_OPENROUTER_API_KEY",
  "model": "anthropic/claude-3-opus",
  "baseURL": "https://openrouter.ai/api/v1",
  "headers": {
    "HTTP-Referer": "https://yourapp.com",
    "X-Title": "DeepWiki"
  }
}

Available Models

Popular models on OpenRouter:
  • anthropic/claude-3-opus - Best for complex reasoning
  • anthropic/claude-3-sonnet - Balanced performance/cost
  • openai/gpt-4-turbo - OpenAI's GPT-4 Turbo
  • google/gemini-pro - Google's Gemini Pro
  • meta-llama/llama-3-70b - Open-weight alternative

Usage Example

// app/lib/ai/generator.ts
import OpenAI from 'openai';

// OpenRouter exposes an OpenAI-compatible API, so the standard OpenAI SDK
// works when pointed at the OpenRouter base URL with attribution headers.
const client = new OpenAI({
  apiKey: process.env.OPENROUTER_API_KEY,
  baseURL: 'https://openrouter.ai/api/v1',
  defaultHeaders: {
    'HTTP-Referer': process.env.APP_URL,
    'X-Title': 'DeepWiki'
  }
});

export async function generateContent(prompt: string) {
  const response = await client.chat.completions.create({
    model: 'anthropic/claude-3-opus',
    messages: [{ role: 'user', content: prompt }],
    temperature: 0.7,
    max_tokens: 4000
  });
  
  return response.choices[0].message.content;
}

Ollama for Local Models

Run models locally for privacy and zero API costs.

Installation

# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull models
ollama pull llama3
ollama pull mistral
ollama pull codellama

Configuration

// generator.json
{
  "provider": "ollama",
  "baseURL": "http://localhost:11434",
  "model": "llama3:70b",
  "options": {
    "temperature": 0.7,
    "num_predict": 4096
  }
}

Integration

// app/lib/ai/ollama-provider.ts
export class OllamaProvider {
  private baseURL: string;
  
  constructor(baseURL = 'http://localhost:11434') {
    this.baseURL = baseURL;
  }
  
  async generate(prompt: string, model = 'llama3') {
    const response = await fetch(`${this.baseURL}/api/generate`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model,
        prompt,
        stream: false,
        options: {
          temperature: 0.7,
          num_predict: 4096
        }
      })
    });
    
    const data = await response.json();
    return data.response;
  }
}
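
A quick usage sketch, called from any async context (the prompt and model name here are only examples):

// Example usage from an async route handler or script
const ollama = new OllamaProvider();
const summary = await ollama.generate('Summarize this page in two sentences.', 'llama3');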

Azure OpenAI Configuration

Use Azure’s enterprise-grade OpenAI deployment.

Setup

// generator.json
{
  "provider": "azure-openai",
  "apiKey": "YOUR_AZURE_API_KEY",
  "baseURL": "https://YOUR_RESOURCE.openai.azure.com",
  "apiVersion": "2024-02-15-preview",
  "deployment": "gpt-4-turbo",
  "model": "gpt-4-turbo"
}

Environment Variables

# .env.local
AZURE_OPENAI_API_KEY=your_key_here
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com
AZURE_OPENAI_DEPLOYMENT=gpt-4-turbo
AZURE_OPENAI_API_VERSION=2024-02-15-preview

Implementation

// app/lib/ai/azure-provider.ts
import { AzureOpenAI } from 'openai';

// The openai package's AzureOpenAI client reads the Azure endpoint,
// key, and API version from the configuration below.
const client = new AzureOpenAI({
  apiKey: process.env.AZURE_OPENAI_API_KEY,
  endpoint: process.env.AZURE_OPENAI_ENDPOINT,
  apiVersion: process.env.AZURE_OPENAI_API_VERSION
});

export async function generateWithAzure(prompt: string) {
  // With Azure OpenAI, the "model" field is the deployment name.
  const result = await client.chat.completions.create({
    model: process.env.AZURE_OPENAI_DEPLOYMENT!,
    messages: [{ role: 'user', content: prompt }],
    temperature: 0.7,
    max_tokens: 4000
  });
  
  return result.choices[0].message?.content;
}

Custom Model Selection UI

Implement a model selector in your DeepWiki interface.

Model Selector Component

// app/components/model-selector.tsx
import { useState } from 'react';
import { Select, SelectContent, SelectItem, SelectTrigger, SelectValue } from '@/components/ui/select';

const AVAILABLE_MODELS = [
  { id: 'gpt-4-turbo', name: 'GPT-4 Turbo', provider: 'openai' },
  { id: 'claude-3-opus', name: 'Claude 3 Opus', provider: 'anthropic' },
  { id: 'llama3:70b', name: 'Llama 3 70B', provider: 'ollama' },
  { id: 'mistral-large', name: 'Mistral Large', provider: 'mistral' }
];

export function ModelSelector({ onModelChange }: { onModelChange: (model: string) => void }) {
  const [selectedModel, setSelectedModel] = useState('gpt-4-turbo');
  
  const handleChange = (value: string) => {
    setSelectedModel(value);
    onModelChange(value);
  };
  
  return (
    <Select value={selectedModel} onValueChange={handleChange}>
      <SelectTrigger className="w-[200px]">
        <SelectValue placeholder="Select a model" />
      </SelectTrigger>
      <SelectContent>
        {AVAILABLE_MODELS.map((model) => (
          <SelectItem key={model.id} value={model.id}>
            <div className="flex flex-col">
              <span>{model.name}</span>
              <span className="text-xs text-muted-foreground">{model.provider}</span>
            </div>
          </SelectItem>
        ))}
      </SelectContent>
    </Select>
  );
}

Dynamic Model Configuration

// app/lib/ai/model-config.ts
export interface ModelConfig {
  provider: string;
  model: string;
  apiKey?: string;
  baseURL?: string;
  temperature?: number;
  maxTokens?: number;
}

export const MODEL_CONFIGS: Record<string, ModelConfig> = {
  'gpt-4-turbo': {
    provider: 'openai',
    model: 'gpt-4-turbo-preview',
    temperature: 0.7,
    maxTokens: 4000
  },
  'claude-3-opus': {
    provider: 'openrouter',
    model: 'anthropic/claude-3-opus',
    baseURL: 'https://openrouter.ai/api/v1',
    temperature: 0.7,
    maxTokens: 4000
  },
  'llama3:70b': {
    provider: 'ollama',
    model: 'llama3:70b',
    baseURL: 'http://localhost:11434',
    temperature: 0.8,
    maxTokens: 4096
  }
};
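
The id emitted by the model selector can be resolved against this map and handed to whichever provider wrapper matches config.provider. A minimal sketch (the resolveConfig helper is illustrative, not part of DeepWiki):

// app/lib/ai/resolve-config.ts (illustrative helper)
import { MODEL_CONFIGS, ModelConfig } from './model-config';

// Map a model id chosen in the UI to its full configuration,
// falling back to a default for unknown ids.
export function resolveConfig(modelId: string): ModelConfig {
  return MODEL_CONFIGS[modelId] ?? MODEL_CONFIGS['gpt-4-turbo'];
}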

Modifying generator.json

The generator.json file controls model configuration.

Basic Structure

{
  "provider": "openai",
  "model": "gpt-4-turbo",
  "apiKey": "${OPENAI_API_KEY}",
  "temperature": 0.7,
  "maxTokens": 4000,
  "systemPrompt": "You are a helpful wiki content generator...",
  "retryAttempts": 3,
  "retryDelay": 1000
}

Multi-Provider Configuration

{
  "providers": {
    "primary": {
      "provider": "openai",
      "model": "gpt-4-turbo",
      "apiKey": "${OPENAI_API_KEY}"
    },
    "fallback": {
      "provider": "openrouter",
      "model": "meta-llama/llama-3-70b",
      "apiKey": "${OPENROUTER_API_KEY}",
      "baseURL": "https://openrouter.ai/api/v1"
    },
    "local": {
      "provider": "ollama",
      "model": "llama3",
      "baseURL": "http://localhost:11434"
    }
  },
  "strategy": "fallback",
  "timeout": 30000
}
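
With "strategy": "fallback", requests go to the primary provider first and fall through to the next entry on error or timeout. A minimal sketch of that behavior (the Provider interface and function names are assumptions for illustration, not DeepWiki internals):

// app/lib/ai/fallback-strategy.ts (illustrative sketch)
interface Provider {
  name: string;
  generate: (prompt: string) => Promise<string>;
}

// Try providers in order; on failure or timeout, move to the next one.
export async function generateWithFallback(
  providers: Provider[],
  prompt: string,
  timeoutMs = 30000
) {
  for (const provider of providers) {
    try {
      return await Promise.race([
        provider.generate(prompt),
        new Promise<never>((_, reject) =>
          setTimeout(() => reject(new Error(`${provider.name} timed out`)), timeoutMs)
        )
      ]);
    } catch (error) {
      console.warn(`Provider ${provider.name} failed, trying next`, error);
    }
  }
  throw new Error('All configured providers failed');
}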

OpenAI-Compatible Endpoints

Many providers offer OpenAI-compatible APIs.

Generic Configuration

// app/lib/ai/openai-compatible.ts
export class OpenAICompatibleProvider {
  private apiKey: string;
  private baseURL: string;
  
  constructor(config: { apiKey: string; baseURL: string }) {
    this.apiKey = config.apiKey;
    this.baseURL = config.baseURL;
  }
  
  async chat(messages: any[], options: any = {}) {
    // The base URLs listed below already include any /v1 prefix
    const response = await fetch(`${this.baseURL}/chat/completions`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${this.apiKey}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        messages,
        ...options
      })
    });
    
    return response.json();
  }
}

Supported Providers

  • Perplexity AI: https://api.perplexity.ai
  • Together AI: https://api.together.xyz/v1
  • Anyscale: https://api.endpoints.anyscale.com/v1
  • Groq: https://api.groq.com/openai/v1
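
For example, the generic client above can be pointed at Groq (the environment variable and model id are illustrative; check the provider's documentation for current model names):

// Example: Groq via the OpenAI-compatible client
const groq = new OpenAICompatibleProvider({
  apiKey: process.env.GROQ_API_KEY!,
  baseURL: 'https://api.groq.com/openai/v1'
});

export async function summarizeWithGroq(text: string) {
  const result = await groq.chat(
    [{ role: 'user', content: `Summarize:\n${text}` }],
    { model: 'llama3-70b-8192', temperature: 0.7 }  // model id is an assumption
  );
  return result.choices[0].message.content;
}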

Performance Comparisons

Benchmark Results

Model                 | Tokens/Second | Quality Score | Cost/1M Tokens
GPT-4 Turbo           | 50            | 9.5/10        | $10.00
Claude 3 Opus         | 40            | 9.3/10        | $15.00
Llama 3 70B (Local)   | 30            | 8.5/10        | $0.00
Mistral Large         | 60            | 8.8/10        | $8.00
GPT-3.5 Turbo         | 80            | 7.5/10        | $0.50

Performance Testing Script

// scripts/benchmark-models.ts
async function benchmarkModel(provider: any, prompt: string) {
  const startTime = Date.now();
  let tokens = 0;
  
  try {
    const response = await provider.generate(prompt);
    tokens = response.usage?.total_tokens || 0;
    const duration = Date.now() - startTime;
    
    return {
      duration,
      tokens,
      tokensPerSecond: tokens / (duration / 1000),
      cost: calculateCost(provider.model, tokens)
    };
  } catch (error: any) {
    return { error: error.message };
  }
}
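
Both this script and the usage tracker in the Best Practices section call a calculateCost helper that is not shown in this guide; a minimal sketch using the per-1M-token prices from the table above (a flat blended rate, purely illustrative):

// app/lib/ai/cost.ts (illustrative; rates mirror the table above)
const COST_PER_1M_TOKENS: Record<string, number> = {
  'gpt-4-turbo': 10.0,
  'claude-3-opus': 15.0,
  'llama3:70b': 0.0, // local Ollama, no API cost
  'mistral-large': 8.0,
  'gpt-3.5-turbo': 0.5
};

// Approximate dollar cost for a model and total token count.
export function calculateCost(model: string, tokens: number): number {
  const rate = COST_PER_1M_TOKENS[model] ?? 0;
  return (tokens / 1_000_000) * rate;
}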

Cost Optimization Strategies

1. Model Cascading

Match the model to the complexity of the task: send routine requests to cheaper models and reserve the most capable, expensive models for complex work.

// app/lib/ai/cascade-strategy.ts
export async function generateWithCascade(prompt: string, complexity: 'low' | 'medium' | 'high') {
  const models = {
    low: 'gpt-3.5-turbo',
    medium: 'claude-3-sonnet',
    high: 'gpt-4-turbo'
  };
  
  const model = models[complexity];
  return await generate(prompt, { model });
}

2. Caching Responses

// app/lib/ai/cache-manager.ts
import { createHash } from 'crypto';
import { Redis } from '@upstash/redis';

const redis = new Redis({
  url: process.env.UPSTASH_REDIS_URL,
  token: process.env.UPSTASH_REDIS_TOKEN
});

export async function getCachedOrGenerate(
  prompt: string,
  generator: () => Promise<string>
) {
  const cacheKey = `ai:${createHash('sha256').update(prompt).digest('hex')}`;
  
  // Check cache
  const cached = await redis.get(cacheKey);
  if (cached) return cached;
  
  // Generate and cache
  const result = await generator();
  await redis.set(cacheKey, result, { ex: 3600 }); // 1 hour TTL
  
  return result;
}

3. Batch Processing

// app/lib/ai/batch-processor.ts
export async function processBatch(prompts: string[], model: string) {
  const batchSize = 10;
  const results = [];
  
  for (let i = 0; i < prompts.length; i += batchSize) {
    const batch = prompts.slice(i, i + batchSize);
    const batchResults = await Promise.all(
      batch.map(prompt => generate(prompt, { model }))
    );
    results.push(...batchResults);
  }
  
  return results;
}

4. Token Optimization

// app/lib/ai/token-optimizer.ts
import GPT3Tokenizer from 'gpt3-tokenizer';

export function optimizePrompt(prompt: string, maxTokens: number = 2000) {
  // Collapse whitespace so it doesn't waste tokens
  let optimized = prompt.replace(/\s+/g, ' ').trim();
  
  // Count tokens and truncate if the prompt exceeds the budget
  const encoder = new GPT3Tokenizer({ type: 'gpt3' });
  const encoded = encoder.encode(optimized);
  
  if (encoded.bpe.length > maxTokens) {
    optimized = encoder.decode(encoded.bpe.slice(0, maxTokens));
  }
  
  return optimized;
}

Best Practices

1. Error Handling

export async function generateWithRetry(
  prompt: string,
  options: any,
  maxRetries = 3
) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await generate(prompt, options);
    } catch (error) {
      if (i === maxRetries - 1) throw error;
      await new Promise(resolve => setTimeout(resolve, 1000 * (i + 1)));
    }
  }
}

2. Model Selection Logic

export function selectOptimalModel(requirements: {
  maxCost?: number;
  minQuality?: number;
  maxLatency?: number;
}) {
  const models = getAvailableModels();
  
  return models
    .filter(m => m.costPer1M <= (requirements.maxCost || Infinity))
    .filter(m => m.qualityScore >= (requirements.minQuality || 0))
    .filter(m => m.avgLatency <= (requirements.maxLatency || Infinity))
    .sort((a, b) => b.qualityScore - a.qualityScore)[0];
}

3. Monitoring and Logging

export async function trackModelUsage(
  model: string,
  tokens: number,
  duration: number
) {
  await db.modelUsage.create({
    data: {
      model,
      tokens,
      duration,
      cost: calculateCost(model, tokens),
      timestamp: new Date()
    }
  });
}

Conclusion

DeepWiki’s flexible model system allows you to optimize for your specific needs:
  • Use OpenRouter for access to multiple models
  • Deploy Ollama for privacy and zero API costs
  • Choose Azure OpenAI for enterprise requirements
  • Implement cascading strategies for cost optimization
  • Monitor usage and performance to make informed decisions

Remember to regularly review your model usage and costs to ensure you’re using the most appropriate models for your use case.