Using Custom Models with DeepWiki

DeepWiki supports a wide range of AI models through various providers. This guide covers how to configure and use custom models for optimal performance and cost efficiency.

Overview

DeepWiki’s flexible architecture allows you to use models from:
  • OpenRouter (access to 100+ models)
  • Ollama (local models)
  • Azure OpenAI
  • Any OpenAI-compatible endpoint
  • Custom API endpoints

OpenRouter Integration

OpenRouter provides access to multiple model providers through a single API.

Configuration

// generator.json
{
  "provider": "openrouter",
  "apiKey": "YOUR_OPENROUTER_API_KEY",
  "model": "anthropic/claude-3-opus",
  "baseURL": "https://openrouter.ai/api/v1",
  "headers": {
    "HTTP-Referer": "https://yourapp.com",
    "X-Title": "DeepWiki"
  }
}

Available Models

Popular models on OpenRouter:
  • anthropic/claude-3-opus - Best for complex reasoning
  • anthropic/claude-3-sonnet - Balanced performance/cost
  • openai/gpt-4-turbo - OpenAI's GPT-4 Turbo
  • google/gemini-pro - Google's Gemini Pro
  • meta-llama/llama-3-70b - Open-weight alternative

Usage Example

// app/lib/ai/generator.ts
import OpenAI from 'openai';

// OpenRouter exposes an OpenAI-compatible API, so the standard OpenAI SDK
// works when pointed at the OpenRouter base URL with attribution headers.
const client = new OpenAI({
  apiKey: process.env.OPENROUTER_API_KEY,
  baseURL: 'https://openrouter.ai/api/v1',
  defaultHeaders: {
    'HTTP-Referer': process.env.APP_URL,
    'X-Title': 'DeepWiki'
  }
});

export async function generateContent(prompt: string) {
  const response = await client.chat.completions.create({
    model: 'anthropic/claude-3-opus',
    messages: [{ role: 'user', content: prompt }],
    temperature: 0.7,
    max_tokens: 4000
  });
  
  return response.choices[0].message.content;
}

Ollama for Local Models

Run models locally for privacy and zero API costs.

Installation

# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull models
ollama pull llama3
ollama pull mistral
ollama pull codellama

Configuration

// generator.json
{
  "provider": "ollama",
  "baseURL": "http://localhost:11434",
  "model": "llama3:70b",
  "options": {
    "temperature": 0.7,
    "num_predict": 4096
  }
}

Integration

// app/lib/ai/ollama-provider.ts
export class OllamaProvider {
  private baseURL: string;
  
  constructor(baseURL = 'http://localhost:11434') {
    this.baseURL = baseURL;
  }
  
  async generate(prompt: string, model = 'llama3') {
    const response = await fetch(`${this.baseURL}/api/generate`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model,
        prompt,
        stream: false,
        options: {
          temperature: 0.7,
          num_predict: 4096
        }
      })
    });
    
    const data = await response.json();
    return data.response;
  }
}
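
A quick usage sketch, called from any async context (the prompt and model name here are only examples):

// Example usage from an async route handler or script
const ollama = new OllamaProvider();
const summary = await ollama.generate('Summarize this page in two sentences.', 'llama3');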

Azure OpenAI Configuration

Use Azure’s enterprise-grade OpenAI deployment.

Setup

// generator.json
{
  "provider": "azure-openai",
  "apiKey": "YOUR_AZURE_API_KEY",
  "baseURL": "https://YOUR_RESOURCE.openai.azure.com",
  "apiVersion": "2024-02-15-preview",
  "deployment": "gpt-4-turbo",
  "model": "gpt-4-turbo"
}

Environment Variables

# .env.local
AZURE_OPENAI_API_KEY=your_key_here
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com
AZURE_OPENAI_DEPLOYMENT=gpt-4-turbo
AZURE_OPENAI_API_VERSION=2024-02-15-preview

Implementation

// app/lib/ai/azure-provider.ts
import { AzureOpenAI } from 'openai';

// The openai package's AzureOpenAI client reads the Azure endpoint,
// key, and API version from the configuration below.
const client = new AzureOpenAI({
  apiKey: process.env.AZURE_OPENAI_API_KEY,
  endpoint: process.env.AZURE_OPENAI_ENDPOINT,
  apiVersion: process.env.AZURE_OPENAI_API_VERSION
});

export async function generateWithAzure(prompt: string) {
  // With Azure OpenAI, the "model" field is the deployment name.
  const result = await client.chat.completions.create({
    model: process.env.AZURE_OPENAI_DEPLOYMENT!,
    messages: [{ role: 'user', content: prompt }],
    temperature: 0.7,
    max_tokens: 4000
  });
  
  return result.choices[0].message?.content;
}

Custom Model Selection UI

Implement a model selector in your DeepWiki interface.

Model Selector Component

// app/components/model-selector.tsx
import { useState } from 'react';
import { Select, SelectContent, SelectItem, SelectTrigger, SelectValue } from '@/components/ui/select';

const AVAILABLE_MODELS = [
  { id: 'gpt-4-turbo', name: 'GPT-4 Turbo', provider: 'openai' },
  { id: 'claude-3-opus', name: 'Claude 3 Opus', provider: 'anthropic' },
  { id: 'llama3:70b', name: 'Llama 3 70B', provider: 'ollama' },
  { id: 'mistral-large', name: 'Mistral Large', provider: 'mistral' }
];

export function ModelSelector({ onModelChange }: { onModelChange: (model: string) => void }) {
  const [selectedModel, setSelectedModel] = useState('gpt-4-turbo');
  
  const handleChange = (value: string) => {
    setSelectedModel(value);
    onModelChange(value);
  };
  
  return (
    <Select value={selectedModel} onValueChange={handleChange}>
      <SelectTrigger className="w-[200px]">
        <SelectValue placeholder="Select a model" />
      </SelectTrigger>
      <SelectContent>
        {AVAILABLE_MODELS.map((model) => (
          <SelectItem key={model.id} value={model.id}>
            <div className="flex flex-col">
              <span>{model.name}</span>
              <span className="text-xs text-muted-foreground">{model.provider}</span>
            </div>
          </SelectItem>
        ))}
      </SelectContent>
    </Select>
  );
}

Dynamic Model Configuration

// app/lib/ai/model-config.ts
export interface ModelConfig {
  provider: string;
  model: string;
  apiKey?: string;
  baseURL?: string;
  temperature?: number;
  maxTokens?: number;
}

export const MODEL_CONFIGS: Record<string, ModelConfig> = {
  'gpt-4-turbo': {
    provider: 'openai',
    model: 'gpt-4-turbo-preview',
    temperature: 0.7,
    maxTokens: 4000
  },
  'claude-3-opus': {
    provider: 'openrouter',
    model: 'anthropic/claude-3-opus',
    baseURL: 'https://openrouter.ai/api/v1',
    temperature: 0.7,
    maxTokens: 4000
  },
  'llama3:70b': {
    provider: 'ollama',
    model: 'llama3:70b',
    baseURL: 'http://localhost:11434',
    temperature: 0.8,
    maxTokens: 4096
  }
};
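
The id emitted by the model selector can be resolved against this map and handed to whichever provider wrapper matches config.provider. A minimal sketch (the resolveConfig helper is illustrative, not part of DeepWiki):

// app/lib/ai/resolve-config.ts (illustrative helper)
import { MODEL_CONFIGS, ModelConfig } from './model-config';

// Map a model id chosen in the UI to its full configuration,
// falling back to a default for unknown ids.
export function resolveConfig(modelId: string): ModelConfig {
  return MODEL_CONFIGS[modelId] ?? MODEL_CONFIGS['gpt-4-turbo'];
}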

Modifying generator.json

The generator.json file controls model configuration.

Basic Structure

{
  "provider": "openai",
  "model": "gpt-4-turbo",
  "apiKey": "${OPENAI_API_KEY}",
  "temperature": 0.7,
  "maxTokens": 4000,
  "systemPrompt": "You are a helpful wiki content generator...",
  "retryAttempts": 3,
  "retryDelay": 1000
}

Multi-Provider Configuration

{
  "providers": {
    "primary": {
      "provider": "openai",
      "model": "gpt-4-turbo",
      "apiKey": "${OPENAI_API_KEY}"
    },
    "fallback": {
      "provider": "openrouter",
      "model": "meta-llama/llama-3-70b",
      "apiKey": "${OPENROUTER_API_KEY}",
      "baseURL": "https://openrouter.ai/api/v1"
    },
    "local": {
      "provider": "ollama",
      "model": "llama3",
      "baseURL": "http://localhost:11434"
    }
  },
  "strategy": "fallback",
  "timeout": 30000
}
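
With "strategy": "fallback", requests go to the primary provider first and fall through to the next entry on error or timeout. A minimal sketch of that behavior (the Provider interface and function names are assumptions for illustration, not DeepWiki internals):

// app/lib/ai/fallback-strategy.ts (illustrative sketch)
interface Provider {
  name: string;
  generate: (prompt: string) => Promise<string>;
}

// Try providers in order; on failure or timeout, move to the next one.
export async function generateWithFallback(
  providers: Provider[],
  prompt: string,
  timeoutMs = 30000
) {
  for (const provider of providers) {
    try {
      return await Promise.race([
        provider.generate(prompt),
        new Promise<never>((_, reject) =>
          setTimeout(() => reject(new Error(`${provider.name} timed out`)), timeoutMs)
        )
      ]);
    } catch (error) {
      console.warn(`Provider ${provider.name} failed, trying next`, error);
    }
  }
  throw new Error('All configured providers failed');
}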

OpenAI-Compatible Endpoints

Many providers offer OpenAI-compatible APIs.

Generic Configuration

// app/lib/ai/openai-compatible.ts
export class OpenAICompatibleProvider {
  private apiKey: string;
  private baseURL: string;
  
  constructor(config: { apiKey: string; baseURL: string }) {
    this.apiKey = config.apiKey;
    this.baseURL = config.baseURL;
  }
  
  async chat(messages: any[], options: any = {}) {
    // The base URLs listed below already include any /v1 prefix
    const response = await fetch(`${this.baseURL}/chat/completions`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${this.apiKey}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        messages,
        ...options
      })
    });
    
    return response.json();
  }
}

Supported Providers

  • Perplexity AI: https://api.perplexity.ai
  • Together AI: https://api.together.xyz/v1
  • Anyscale: https://api.endpoints.anyscale.com/v1
  • Groq: https://api.groq.com/openai/v1
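
For example, the generic client above can be pointed at Groq (the environment variable and model id are illustrative; check the provider's documentation for current model names):

// Example: Groq via the OpenAI-compatible client
const groq = new OpenAICompatibleProvider({
  apiKey: process.env.GROQ_API_KEY!,
  baseURL: 'https://api.groq.com/openai/v1'
});

export async function summarizeWithGroq(text: string) {
  const result = await groq.chat(
    [{ role: 'user', content: `Summarize:\n${text}` }],
    { model: 'llama3-70b-8192', temperature: 0.7 }  // model id is an assumption
  );
  return result.choices[0].message.content;
}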

Performance Comparisons

Benchmark Results

Model                 | Tokens/Second | Quality Score | Cost/1M Tokens
GPT-4 Turbo           | 50            | 9.5/10        | $10.00
Claude 3 Opus         | 40            | 9.3/10        | $15.00
Llama 3 70B (Local)   | 30            | 8.5/10        | $0.00
Mistral Large         | 60            | 8.8/10        | $8.00
GPT-3.5 Turbo         | 80            | 7.5/10        | $0.50

Performance Testing Script

// scripts/benchmark-models.ts
async function benchmarkModel(provider: any, prompt: string) {
  const startTime = Date.now();
  let tokens = 0;
  
  try {
    const response = await provider.generate(prompt);
    tokens = response.usage?.total_tokens || 0;
    const duration = Date.now() - startTime;
    
    return {
      duration,
      tokens,
      tokensPerSecond: tokens / (duration / 1000),
      cost: calculateCost(provider.model, tokens)
    };
  } catch (error: any) {
    return { error: error.message };
  }
}
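
Both this script and the usage tracker in the Best Practices section call a calculateCost helper that is not shown in this guide; a minimal sketch using the per-1M-token prices from the table above (a flat blended rate, purely illustrative):

// app/lib/ai/cost.ts (illustrative; rates mirror the table above)
const COST_PER_1M_TOKENS: Record<string, number> = {
  'gpt-4-turbo': 10.0,
  'claude-3-opus': 15.0,
  'llama3:70b': 0.0, // local Ollama, no API cost
  'mistral-large': 8.0,
  'gpt-3.5-turbo': 0.5
};

// Approximate dollar cost for a model and total token count.
export function calculateCost(model: string, tokens: number): number {
  const rate = COST_PER_1M_TOKENS[model] ?? 0;
  return (tokens / 1_000_000) * rate;
}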

Cost Optimization Strategies

1. Model Cascading

Match the model to the complexity of the task: send routine requests to cheaper models and reserve the most capable, expensive models for complex work.

// app/lib/ai/cascade-strategy.ts
export async function generateWithCascade(prompt: string, complexity: 'low' | 'medium' | 'high') {
  const models = {
    low: 'gpt-3.5-turbo',
    medium: 'claude-3-sonnet',
    high: 'gpt-4-turbo'
  };
  
  const model = models[complexity];
  return await generate(prompt, { model });
}

2. Caching Responses

// app/lib/ai/cache-manager.ts
import { createHash } from 'crypto';
import { Redis } from '@upstash/redis';

const redis = new Redis({
  url: process.env.UPSTASH_REDIS_URL,
  token: process.env.UPSTASH_REDIS_TOKEN
});

export async function getCachedOrGenerate(
  prompt: string,
  generator: () => Promise<string>
) {
  const cacheKey = `ai:${createHash('sha256').update(prompt).digest('hex')}`;
  
  // Check cache
  const cached = await redis.get(cacheKey);
  if (cached) return cached;
  
  // Generate and cache
  const result = await generator();
  await redis.set(cacheKey, result, { ex: 3600 }); // 1 hour TTL
  
  return result;
}

3. Batch Processing

// app/lib/ai/batch-processor.ts
export async function processBatch(prompts: string[], model: string) {
  const batchSize = 10;
  const results = [];
  
  for (let i = 0; i < prompts.length; i += batchSize) {
    const batch = prompts.slice(i, i + batchSize);
    const batchResults = await Promise.all(
      batch.map(prompt => generate(prompt, { model }))
    );
    results.push(...batchResults);
  }
  
  return results;
}

4. Token Optimization

// app/lib/ai/token-optimizer.ts
import GPT3Tokenizer from 'gpt3-tokenizer';

export function optimizePrompt(prompt: string, maxTokens: number = 2000) {
  // Collapse whitespace so it doesn't waste tokens
  let optimized = prompt.replace(/\s+/g, ' ').trim();
  
  // Count tokens and truncate if the prompt exceeds the budget
  const encoder = new GPT3Tokenizer({ type: 'gpt3' });
  const encoded = encoder.encode(optimized);
  
  if (encoded.bpe.length > maxTokens) {
    optimized = encoder.decode(encoded.bpe.slice(0, maxTokens));
  }
  
  return optimized;
}

Best Practices

1. Error Handling

export async function generateWithRetry(
  prompt: string,
  options: any,
  maxRetries = 3
) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await generate(prompt, options);
    } catch (error) {
      if (i === maxRetries - 1) throw error;
      await new Promise(resolve => setTimeout(resolve, 1000 * (i + 1)));
    }
  }
}

2. Model Selection Logic

export function selectOptimalModel(requirements: {
  maxCost?: number;
  minQuality?: number;
  maxLatency?: number;
}) {
  const models = getAvailableModels();
  
  return models
    .filter(m => m.costPer1M <= (requirements.maxCost || Infinity))
    .filter(m => m.qualityScore >= (requirements.minQuality || 0))
    .filter(m => m.avgLatency <= (requirements.maxLatency || Infinity))
    .sort((a, b) => b.qualityScore - a.qualityScore)[0];
}

3. Monitoring and Logging

export async function trackModelUsage(
  model: string,
  tokens: number,
  duration: number
) {
  await db.modelUsage.create({
    data: {
      model,
      tokens,
      duration,
      cost: calculateCost(model, tokens),
      timestamp: new Date()
    }
  });
}

Conclusion

DeepWiki’s flexible model system allows you to optimize for your specific needs:
  • Use OpenRouter for access to multiple models
  • Deploy Ollama for privacy and zero API costs
  • Choose Azure OpenAI for enterprise requirements
  • Implement cascading strategies for cost optimization
  • Monitor usage and performance to make informed decisions

Remember to regularly review your model usage and costs to ensure you’re using the most appropriate models for your use case.