# Custom Models

> Configure and use custom AI models with DeepWiki

# Using Custom Models with DeepWiki

DeepWiki supports a wide range of AI models through various providers. This guide covers how to configure and use custom models for optimal performance and cost efficiency.

## Overview

DeepWiki's flexible architecture allows you to use models from:

* OpenRouter (access to 100+ models)
* Ollama (local models)
* Azure OpenAI
* Any OpenAI-compatible endpoint
* Custom API endpoints

## OpenRouter Integration

OpenRouter provides access to multiple model providers through a single API.

### Configuration

```json theme={null}
// generator.json
{
  "provider": "openrouter",
  "apiKey": "YOUR_OPENROUTER_API_KEY",
  "model": "anthropic/claude-3-opus",
  "baseURL": "https://openrouter.ai/api/v1",
  "headers": {
    "HTTP-Referer": "https://yourapp.com",
    "X-Title": "DeepWiki"
  }
}
```

### Available Models

Popular models on OpenRouter:

* `anthropic/claude-3-opus` - Largest Claude 3 model; suited to complex reasoning
* `anthropic/claude-3-sonnet` - Balanced performance/cost
* `openai/gpt-4-turbo` - GPT-4 Turbo, a faster and cheaper GPT-4 variant
* `google/gemini-pro` - Google's Gemini Pro model
* `meta-llama/llama-3-70b` - Open-weight alternative

### Usage Example

```typescript theme={null}
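// app/lib/ai/generator.ts
// OpenRouter exposes an OpenAI-compatible API, so the `openai` SDK works
// when it is pointed at the OpenRouter base URL.
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://openrouter.ai/api/v1',
  apiKey: process.env.OPENROUTER_API_KEY,
  defaultHeaders: {
    // Optional headers that identify your app to OpenRouter
    'HTTP-Referer': process.env.APP_URL,
    'X-Title': 'DeepWiki'
  }
});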

export async function generateContent(prompt: string) {
  const response = await client.chat.completions.create({
    model: 'anthropic/claude-3-opus',
    messages: [{ role: 'user', content: prompt }],
    temperature: 0.7,
    max_tokens: 4000
  });
  
  return response.choices[0].message.content;
}
```

## Ollama for Local Models

Run models locally for privacy and zero API costs.

### Installation

```bash theme={null}
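# Install Ollama (Linux; macOS and Windows use the installers from ollama.com)
curl -fsSL https://ollama.com/install.sh | sh

# Pull models
ollama pull llama3
ollama pull llama3:70b   # tag used in the configuration below; a large download
ollama pull mistral
ollama pull codellama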
```

### Configuration

```json theme={null}
// generator.json
{
  "provider": "ollama",
  "baseURL": "http://localhost:11434",
  "model": "llama3:70b",
  "options": {
    "temperature": 0.7,
    "num_predict": 4096
  }
}
```

### Integration

```typescript theme={null}
// app/lib/ai/ollama-provider.ts
export class OllamaProvider {
  private baseURL: string;
  
  constructor(baseURL = 'http://localhost:11434') {
    this.baseURL = baseURL;
  }
  
  async generate(prompt: string, model = 'llama3') {
    const response = await fetch(`${this.baseURL}/api/generate`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model,
        prompt,
        stream: false,
        options: {
          temperature: 0.7,
          num_predict: 4096
        }
      })
    });
    
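    // Surface HTTP errors (for example, a model that has not been pulled)
    if (!response.ok) {
      throw new Error(`Ollama request failed: ${response.status} ${response.statusText}`);
    }

    // With stream: false, the generated text is returned in the `response` field
    const data = await response.json();
    return data.response as string;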
  }
}
```
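
When `stream` is set to `true` instead, Ollama's `/api/generate` endpoint returns newline-delimited JSON: each object carries a text fragment in `response`, and the last one has `done: true`. The following is a minimal sketch of reassembling such a body once it has been read as text. `collectOllamaStream` is a hypothetical helper name, not part of DeepWiki or Ollama.

```typescript
// Hypothetical helper: joins the text fragments of an Ollama NDJSON stream body.
interface OllamaChunk {
  response?: string;
  done?: boolean;
}

export function collectOllamaStream(ndjson: string): string {
  let text = '';
  for (const line of ndjson.split('\n')) {
    const trimmed = line.trim();
    if (trimmed === '') continue; // skip blank lines between chunks
    const chunk = JSON.parse(trimmed) as OllamaChunk;
    text += chunk.response ?? '';
    if (chunk.done) break; // the final chunk marks the end of the stream
  }
  return text;
}
```

In a real integration you would usually process chunks as they arrive rather than buffering the whole body; the buffering here only keeps the sketch short.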

## Azure OpenAI Configuration

Use Azure's enterprise-grade OpenAI deployment.

### Setup

```json theme={null}
// generator.json
{
  "provider": "azure-openai",
  "apiKey": "YOUR_AZURE_API_KEY",
  "baseURL": "https://YOUR_RESOURCE.openai.azure.com",
  "apiVersion": "2024-02-15-preview",
  "deployment": "gpt-4-turbo",
  "model": "gpt-4-turbo"
}
```

### Environment Variables

```bash theme={null}
# .env.local
AZURE_OPENAI_API_KEY=your_key_here
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com
AZURE_OPENAI_DEPLOYMENT=gpt-4-turbo
AZURE_OPENAI_API_VERSION=2024-02-15-preview
```

### Implementation

```typescript theme={null}
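// app/lib/ai/azure-provider.ts
// The AzureOpenAI client is exported by the `openai` package.
import { AzureOpenAI } from 'openai';

// Fail fast when a required environment variable is missing
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) throw new Error(`${name} is not set`);
  return value;
}

const deployment = requireEnv('AZURE_OPENAI_DEPLOYMENT');

const client = new AzureOpenAI({
  apiKey: requireEnv('AZURE_OPENAI_API_KEY'),
  endpoint: requireEnv('AZURE_OPENAI_ENDPOINT'),
  apiVersion: requireEnv('AZURE_OPENAI_API_VERSION'),
  deployment
});

export async function generateWithAzure(prompt: string) {
  const result = await client.chat.completions.create({
    model: deployment, // Azure routes requests by deployment name
    messages: [{ role: 'user', content: prompt }],
    temperature: 0.7,
    max_tokens: 4000
  });

  return result.choices[0].message.content;
}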
```

## Custom Model Selection UI

Implement a model selector in your DeepWiki interface.

### Model Selector Component

```tsx theme={null}
// app/components/model-selector.tsx
import { useState } from 'react';
import { Select, SelectContent, SelectItem, SelectTrigger, SelectValue } from '@/components/ui/select';

const AVAILABLE_MODELS = [
  { id: 'gpt-4-turbo', name: 'GPT-4 Turbo', provider: 'openai' },
  { id: 'claude-3-opus', name: 'Claude 3 Opus', provider: 'anthropic' },
  { id: 'llama3:70b', name: 'Llama 3 70B', provider: 'ollama' },
  { id: 'mistral-large', name: 'Mistral Large', provider: 'mistral' }
];

export function ModelSelector({ onModelChange }: { onModelChange: (model: string) => void }) {
  const [selectedModel, setSelectedModel] = useState('gpt-4-turbo');
  
  const handleChange = (value: string) => {
    setSelectedModel(value);
    onModelChange(value);
  };
  
  return (
    <Select value={selectedModel} onValueChange={handleChange}>
      <SelectTrigger className="w-[200px]">
        <SelectValue placeholder="Select a model" />
      </SelectTrigger>
      <SelectContent>
        {AVAILABLE_MODELS.map((model) => (
          <SelectItem key={model.id} value={model.id}>
            <div className="flex flex-col">
              <span>{model.name}</span>
              <span className="text-xs text-muted-foreground">{model.provider}</span>
            </div>
          </SelectItem>
        ))}
      </SelectContent>
    </Select>
  );
}
```

### Dynamic Model Configuration

```typescript theme={null}
// app/lib/ai/model-config.ts
export interface ModelConfig {
  provider: string;
  model: string;
  apiKey?: string;
  baseURL?: string;
  temperature?: number;
  maxTokens?: number;
}

export const MODEL_CONFIGS: Record<string, ModelConfig> = {
  'gpt-4-turbo': {
    provider: 'openai',
    model: 'gpt-4-turbo',
    temperature: 0.7,
    maxTokens: 4000
  },
  'claude-3-opus': {
    provider: 'openrouter',
    model: 'anthropic/claude-3-opus',
    baseURL: 'https://openrouter.ai/api/v1',
    temperature: 0.7,
    maxTokens: 4000
  },
  'llama3:70b': {
    provider: 'ollama',
    model: 'llama3:70b',
    baseURL: 'http://localhost:11434',
    temperature: 0.8,
    maxTokens: 4096
  }
};
```
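
The selector passes a model id to `onModelChange`, and that id then has to be mapped to a configuration. The sketch below shows one way to do the lookup, assuming ids without an entry (such as `mistral-large`, which appears in the selector but not in `MODEL_CONFIGS`) should fail loudly. `resolveModelConfig` and the trimmed-down table are illustrative, not DeepWiki code.

```typescript
// Minimal sketch: look up the configuration for a selected model id.
interface ModelConfig {
  provider: string;
  model: string;
  baseURL?: string;
  temperature?: number;
  maxTokens?: number;
}

// Trimmed-down copy of the table above, so the example is self-contained
const MODEL_CONFIGS: Record<string, ModelConfig> = {
  'gpt-4-turbo': { provider: 'openai', model: 'gpt-4-turbo' },
  'llama3:70b': {
    provider: 'ollama',
    model: 'llama3:70b',
    baseURL: 'http://localhost:11434'
  }
};

export function resolveModelConfig(
  id: string,
  overrides: Partial<ModelConfig> = {}
): ModelConfig {
  const base = MODEL_CONFIGS[id];
  if (base === undefined) {
    // Fail loudly rather than silently falling back to a default model
    throw new Error(`No configuration for model "${id}"`);
  }
  // Defaults first, then the stored config, then per-request overrides
  return { temperature: 0.7, maxTokens: 4000, ...base, ...overrides };
}
```

A caller could then write `resolveModelConfig(selectedModel, { temperature: 0.2 })` inside the `onModelChange` handler.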

## Modifying generator.json

The `generator.json` file controls model configuration.

### Basic Structure

```json theme={null}
{
  "provider": "openai",
  "model": "gpt-4-turbo",
  "apiKey": "${OPENAI_API_KEY}",
  "temperature": 0.7,
  "maxTokens": 4000,
  "systemPrompt": "You are a helpful wiki content generator...",
  "retryAttempts": 3,
  "retryDelay": 1000
}
```
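
The `"${OPENAI_API_KEY}"` value implies that placeholders are replaced with environment variables when the file is loaded. This guide does not show how DeepWiki performs that substitution, so the following is only a sketch under the assumption that placeholders have the form `${NAME}`. `resolvePlaceholders` is a hypothetical helper, and the environment is passed in explicitly.

```typescript
// Sketch: replace ${NAME} placeholders in string values with entries from `env`.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

export function resolvePlaceholders(
  value: Json,
  env: Record<string, string | undefined>
): Json {
  if (typeof value === 'string') {
    return value.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (_match, name: string) => {
      const resolved = env[name];
      if (resolved === undefined) {
        // A missing key is a configuration error, not an empty string
        throw new Error(`Missing environment variable: ${name}`);
      }
      return resolved;
    });
  }
  if (Array.isArray(value)) {
    return value.map((item) => resolvePlaceholders(item, env));
  }
  if (value !== null && typeof value === 'object') {
    const out: { [key: string]: Json } = {};
    for (const [key, item] of Object.entries(value)) {
      out[key] = resolvePlaceholders(item, env);
    }
    return out;
  }
  return value; // numbers, booleans, and null pass through unchanged
}
```

In a Node.js process this could be called as `resolvePlaceholders(JSON.parse(rawFile), process.env)`.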

### Multi-Provider Configuration

```json theme={null}
{
  "providers": {
    "primary": {
      "provider": "openai",
      "model": "gpt-4-turbo",
      "apiKey": "${OPENAI_API_KEY}"
    },
    "fallback": {
      "provider": "openrouter",
      "model": "meta-llama/llama-3-70b",
      "apiKey": "${OPENROUTER_API_KEY}",
      "baseURL": "https://openrouter.ai/api/v1"
    },
    "local": {
      "provider": "ollama",
      "model": "llama3",
      "baseURL": "http://localhost:11434"
    }
  },
  "strategy": "fallback",
  "timeout": 30000
}
```
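
The `"strategy": "fallback"` and `"timeout": 30000` settings suggest that providers are tried in order, moving on when one fails or takes too long. The source does not show that logic, so the sketch below is an assumption about how it could work. `generateWithFallback`, `withTimeout`, and `ProviderFn` are hypothetical names.

```typescript
// Sketch of a fallback strategy: try providers in order until one succeeds.
type ProviderFn = (prompt: string) => Promise<string>;

// Rejects if `promise` does not settle within `ms` milliseconds
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); }
    );
  });
}

export async function generateWithFallback(
  prompt: string,
  providers: Array<{ name: string; generate: ProviderFn }>,
  timeoutMs = 30000
): Promise<{ provider: string; text: string }> {
  const failures: string[] = [];
  for (const { name, generate } of providers) {
    try {
      const text = await withTimeout(generate(prompt), timeoutMs);
      return { provider: name, text };
    } catch (error) {
      // Record the failure and move on to the next provider
      failures.push(`${name}: ${error instanceof Error ? error.message : String(error)}`);
    }
  }
  throw new Error(`All providers failed (${failures.join('; ')})`);
}
```

Note that the timeout only stops waiting; it does not cancel the underlying request. Cancelling would require passing an `AbortSignal` through to each provider.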

## OpenAI-Compatible Endpoints

Many providers offer OpenAI-compatible APIs.

### Generic Configuration

```typescript theme={null}
// app/lib/ai/openai-compatible.ts
export class OpenAICompatibleProvider {
  private apiKey: string;
  private baseURL: string;
  
  constructor(config: { apiKey: string; baseURL: string }) {
    this.apiKey = config.apiKey;
    this.baseURL = config.baseURL;
  }
  
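  // `baseURL` must already include any version prefix (for example `/v1`),
  // and `options` must include the `model` to call.
  async chat(messages: any[], options: any = {}) {
    const response = await fetch(`${this.baseURL}/chat/completions`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${this.apiKey}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        messages,
        ...options
      })
    });

    if (!response.ok) {
      throw new Error(`Request failed: ${response.status} ${response.statusText}`);
    }

    return response.json();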
  }
}
```

### Supported Providers

* **Perplexity AI**: `https://api.perplexity.ai`
* **Together AI**: `https://api.together.xyz/v1`
* **Anyscale**: `https://api.endpoints.anyscale.com/v1`
* **Groq**: `https://api.groq.com/openai/v1`

## Performance Comparisons

### Benchmark Results

| Model               | Tokens/Second | Quality Score | Cost/1M Tokens |
| ------------------- | ------------- | ------------- | -------------- |
| GPT-4 Turbo         | 50            | 9.5/10        | \$10.00        |
| Claude 3 Opus       | 40            | 9.3/10        | \$15.00        |
| Llama 3 70B (Local) | 30            | 8.5/10        | \$0.00         |
| Mistral Large       | 60            | 8.8/10        | \$8.00         |
| GPT-3.5 Turbo       | 80            | 7.5/10        | \$0.50         |

### Performance Testing Script

```typescript theme={null}
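// scripts/benchmark-models.ts
// Assumes `provider.generate` resolves to an OpenAI-style response with a
// `usage.total_tokens` field, and that `calculateCost` is defined elsewhere.
async function benchmarkModel(provider: any, prompt: string) {
  const startTime = Date.now();

  try {
    const response = await provider.generate(prompt);
    const tokens: number = response.usage?.total_tokens ?? 0;
    const duration = Date.now() - startTime;

    return {
      duration,
      tokens,
      // Guard against division by zero on sub-millisecond responses
      tokensPerSecond: duration > 0 ? tokens / (duration / 1000) : 0,
      cost: calculateCost(provider.model, tokens)
    };
  } catch (error) {
    // `error` is `unknown` in strict TypeScript, so narrow it before reading `.message`
    return { error: error instanceof Error ? error.message : String(error) };
  }
}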
```
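
The benchmark script calls `calculateCost`, which this guide does not define. A minimal sketch follows, using the per-million-token prices from the table above. The map keys are assumed model ids, and a single blended price per model is a simplification, since most providers bill input and output tokens at different rates.

```typescript
// Sketch of a `calculateCost` helper.
// Prices are USD per 1M tokens, copied from the benchmark table; keys are assumed ids.
const PRICE_PER_MILLION_USD: Record<string, number> = {
  'gpt-4-turbo': 10.0,
  'claude-3-opus': 15.0,
  'llama3:70b': 0.0,
  'mistral-large': 8.0,
  'gpt-3.5-turbo': 0.5
};

export function calculateCost(model: string, tokens: number): number {
  const price = PRICE_PER_MILLION_USD[model];
  if (price === undefined) {
    // An unknown model should not be silently reported as free
    throw new Error(`No price configured for model "${model}"`);
  }
  return (tokens / 1_000_000) * price;
}
```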

## Cost Optimization Strategies

### 1. Model Cascading

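Send simple requests to cheaper models and reserve expensive ones for prompts that need them. The example below picks a model from a caller-supplied complexity level; it assumes a `generate(prompt, options)` helper defined elsewhere in your application.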

```typescript theme={null}
// app/lib/ai/cascade-strategy.ts
export async function generateWithCascade(prompt: string, complexity: 'low' | 'medium' | 'high') {
  const models = {
    low: 'gpt-3.5-turbo',
    medium: 'claude-3-sonnet',
    high: 'gpt-4-turbo'
  };
  
  const model = models[complexity];
  return await generate(prompt, { model });
}
```
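
The function above picks a single model up front. A cascade in the stricter sense tries the cheapest model first and escalates only when the answer is not good enough. The sketch below illustrates that idea. `generateWithEscalation` is a hypothetical name, and both the `generate` function and the `isAcceptable` check are supplied by the caller, because this guide does not define a quality test.

```typescript
// Sketch of a cascade: start with the cheapest model and escalate on weak answers.
type Generate = (prompt: string, model: string) => Promise<string>;

export async function generateWithEscalation(
  prompt: string,
  generate: Generate,
  isAcceptable: (answer: string) => boolean,
  models: string[] = ['gpt-3.5-turbo', 'claude-3-sonnet', 'gpt-4-turbo'] // cheapest first
): Promise<{ model: string; answer: string }> {
  let last: { model: string; answer: string } | undefined;
  for (const model of models) {
    const answer = await generate(prompt, model);
    last = { model, answer };
    if (isAcceptable(answer)) break; // stop at the first model that passes the check
  }
  if (last === undefined) throw new Error('At least one model is required');
  return last; // if no answer passed, this is the last model's answer
}
```

Escalation pays for every failed attempt, so it saves money only when the cheap model succeeds often enough to offset the retries.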

### 2. Caching Responses

```typescript theme={null}
// app/lib/ai/cache-manager.ts
import { createHash } from 'node:crypto';
import { Redis } from '@upstash/redis';

const redis = new Redis({
  url: process.env.UPSTASH_REDIS_URL!,
  token: process.env.UPSTASH_REDIS_TOKEN!
});

export async function getCachedOrGenerate(
  prompt: string,
  generator: () => Promise<string>
) {
  // Hash the prompt so keys stay short; add the model name to the key
  // if the same prompt may be sent to different models
  const cacheKey = `ai:${createHash('sha256').update(prompt).digest('hex')}`;

  // Check cache
  const cached = await redis.get<string>(cacheKey);
  if (cached !== null) return cached;

  // Generate and cache
  const result = await generator();
  await redis.set(cacheKey, result, { ex: 3600 }); // 1 hour TTL

  return result;
}
```
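
For local development or tests where Redis is not available, the same get/set-with-expiry pattern can be backed by an in-memory map. This is a sketch only: `MemoryCache` is a hypothetical class, and its contents are lost when the process restarts.

```typescript
// Sketch: an in-memory TTL cache with the same get/set-with-expiry shape.
export class MemoryCache {
  private entries = new Map<string, { value: string; expiresAt: number }>();
  private readonly now: () => number;

  // `now` is injectable so expiry can be tested without waiting
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  get(key: string): string | null {
    const entry = this.entries.get(key);
    if (entry === undefined) return null;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // drop expired entries lazily on read
      return null;
    }
    return entry.value;
  }

  set(key: string, value: string, ttlSeconds: number): void {
    this.entries.set(key, { value, expiresAt: this.now() + ttlSeconds * 1000 });
  }
}
```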

### 3. Batch Processing

```typescript theme={null}
// app/lib/ai/batch-processor.ts
export async function processBatch(prompts: string[], model: string) {
  const batchSize = 10;
  const results = [];
  
  for (let i = 0; i < prompts.length; i += batchSize) {
    const batch = prompts.slice(i, i + batchSize);
    const batchResults = await Promise.all(
      batch.map(prompt => generate(prompt, { model }))
    );
    results.push(...batchResults);
  }
  
  return results;
}
```

### 4. Token Optimization

```typescript theme={null}
// app/lib/ai/token-optimizer.ts
import GPT3Tokenizer from 'gpt3-tokenizer';

export function optimizePrompt(prompt: string, maxTokens: number = 2000) {
  // Collapse runs of whitespace (this also flattens newlines, so skip it
  // for prompts that contain code or other layout-sensitive text)
  let optimized = prompt.replace(/\s+/g, ' ').trim();

  // Truncate if too long; `encode` returns the token ids in `bpe`
  const tokenizer = new GPT3Tokenizer({ type: 'gpt3' });
  const { bpe } = tokenizer.encode(optimized);

  if (bpe.length > maxTokens) {
    optimized = tokenizer.decode(bpe.slice(0, maxTokens));
  }

  return optimized;
}
```

## Best Practices

### 1. Error Handling

```typescript theme={null}
export async function generateWithRetry(
  prompt: string,
  options: any,
  maxRetries = 3
) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await generate(prompt, options);
    } catch (error) {
      if (i === maxRetries - 1) throw error;
      await new Promise(resolve => setTimeout(resolve, 1000 * (i + 1)));
    }
  }
}
```
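
The retry loop above waits `1000 * (i + 1)` milliseconds, which is linear backoff. When many requests fail at once, such as during a rate-limit window, exponential backoff with jitter spreads the retries out more evenly. The following sketch computes such a delay. `backoffDelayMs` is a hypothetical helper, and the base and cap values are assumptions.

```typescript
// Sketch: exponential backoff with "full jitter" and an upper bound.
export function backoffDelayMs(
  attempt: number, // 0 for the first retry, 1 for the second, ...
  baseMs = 1000,
  maxMs = 30000,
  random: () => number = Math.random // injectable for deterministic tests
): number {
  // The ceiling doubles with each attempt until it reaches the cap
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  // Pick a delay uniformly in [0, ceiling) so concurrent clients spread out
  return Math.floor(random() * ceiling);
}
```

To use it, replace the `1000 * (i + 1)` expression in the `setTimeout` call with `backoffDelayMs(i)`.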

### 2. Model Selection Logic

```typescript theme={null}
export function selectOptimalModel(requirements: {
  maxCost?: number;
  minQuality?: number;
  maxLatency?: number;
}) {
  const models = getAvailableModels();
  
  return models
    // `??` keeps explicit zeros (e.g. maxCost: 0 for free models only); `||` would discard them
    .filter(m => m.costPer1M <= (requirements.maxCost ?? Infinity))
    .filter(m => m.qualityScore >= (requirements.minQuality ?? 0))
    .filter(m => m.avgLatency <= (requirements.maxLatency ?? Infinity))
    .sort((a, b) => b.qualityScore - a.qualityScore)[0];
}
```
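
As a worked example, the sketch below applies the same filter-then-sort idea to the cost and quality figures from the benchmark table earlier in this guide. Latency is left out because the table does not list it. `pickModel`, `ModelInfo`, and the model ids are illustrative names. Defaults use `??` rather than `||`, so an explicit `maxCost: 0` still means "free models only".

```typescript
// Sketch: model selection over the benchmark figures quoted earlier in this guide.
interface ModelInfo {
  id: string;
  costPer1M: number;    // USD per 1M tokens
  qualityScore: number; // 0-10 scale
}

const MODELS: ModelInfo[] = [
  { id: 'gpt-4-turbo', costPer1M: 10.0, qualityScore: 9.5 },
  { id: 'claude-3-opus', costPer1M: 15.0, qualityScore: 9.3 },
  { id: 'llama3:70b', costPer1M: 0.0, qualityScore: 8.5 },
  { id: 'mistral-large', costPer1M: 8.0, qualityScore: 8.8 },
  { id: 'gpt-3.5-turbo', costPer1M: 0.5, qualityScore: 7.5 }
];

export function pickModel(
  requirements: { maxCost?: number; minQuality?: number },
  models: ModelInfo[] = MODELS
): ModelInfo | undefined {
  const maxCost = requirements.maxCost ?? Infinity; // ?? keeps an explicit 0
  const minQuality = requirements.minQuality ?? 0;
  return models
    .filter((m) => m.costPer1M <= maxCost && m.qualityScore >= minQuality)
    .sort((a, b) => b.qualityScore - a.qualityScore)[0]; // highest quality that fits
}
```

The function returns `undefined` when no model satisfies the constraints, so callers should handle that case explicitly.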

### 3. Monitoring and Logging

```typescript theme={null}
export async function trackModelUsage(
  model: string,
  tokens: number,
  duration: number
) {
  await db.modelUsage.create({
    data: {
      model,
      tokens,
      duration,
      cost: calculateCost(model, tokens),
      timestamp: new Date()
    }
  });
}
```
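
Logged records become useful once they are aggregated. The sketch below rolls records of the shape written above into per-model totals. `summarizeUsage` and the summary field names are hypothetical, and in production the same aggregation would more likely be a database query.

```typescript
// Sketch: roll up usage records into per-model totals for a cost report.
interface UsageRecord {
  model: string;
  tokens: number;
  duration: number; // milliseconds
  cost: number;     // USD
}

interface UsageSummary {
  requests: number;
  tokens: number;
  costUsd: number;
  totalDurationMs: number;
}

export function summarizeUsage(records: UsageRecord[]): Map<string, UsageSummary> {
  const summary = new Map<string, UsageSummary>();
  for (const record of records) {
    // Start from zeroed totals the first time a model is seen
    const current = summary.get(record.model) ??
      { requests: 0, tokens: 0, costUsd: 0, totalDurationMs: 0 };
    summary.set(record.model, {
      requests: current.requests + 1,
      tokens: current.tokens + record.tokens,
      costUsd: current.costUsd + record.cost,
      totalDurationMs: current.totalDurationMs + record.duration
    });
  }
  return summary;
}
```

Average latency per model is then `totalDurationMs / requests`.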

## Conclusion

DeepWiki's flexible model system allows you to optimize for your specific needs:

* Use **OpenRouter** for access to multiple models
* Deploy **Ollama** for privacy and zero API costs
* Choose **Azure OpenAI** for enterprise requirements
* Implement **cascading strategies** for cost optimization
* Monitor usage and performance to make informed decisions

Remember to regularly review your model usage and costs to ensure you're using the most appropriate models for your use case.
