Using Custom Models with DeepWiki
DeepWiki supports a wide range of AI models through various providers. This guide covers how to configure and use custom models for optimal performance and cost efficiency.

Overview
DeepWiki’s flexible architecture allows you to use models from:
- OpenRouter (access to 100+ models)
- Ollama (local models)
- Azure OpenAI
- Any OpenAI-compatible endpoint
- Custom API endpoints
OpenRouter Integration
OpenRouter provides access to multiple model providers through a single API.

Configuration
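Configuration is typically done through environment variables. The exact variable names depend on your DeepWiki deployment; the names below are assumptions, so check your deployment's documentation:

```shell
# OpenRouter API key (assumed variable name; check your deployment's docs)
export OPENROUTER_API_KEY="YOUR_KEY"
# Optional: pick a default model by its OpenRouter identifier (assumed variable name)
export OPENROUTER_DEFAULT_MODEL="anthropic/claude-3-sonnet"
```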
Available Models
Popular models on OpenRouter:
- anthropic/claude-3-opus: best for complex reasoning
- anthropic/claude-3-sonnet: balanced performance and cost
- openai/gpt-4-turbo: latest GPT-4 variant
- google/gemini-pro: Google’s latest model
- meta-llama/llama-3-70b: open-source alternative
Usage Example
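OpenRouter exposes an OpenAI-compatible chat-completions endpoint. A minimal sketch using only the standard library (the payload-building function name is our own):

```python
import json
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenRouter chat-completions request (OpenAI-compatible schema)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# Example (requires a valid key, so the network call is commented out):
req = build_request("YOUR_KEY", "anthropic/claude-3-sonnet", "Summarize this repo.")
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Switching models is just a matter of changing the model identifier string; the request format stays the same.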
Ollama for Local Models
Run models locally for privacy and zero API costs.

Installation
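On Linux and macOS, Ollama can be installed with its official script, after which you pull a model and start the local server:

```shell
# Install Ollama (Linux/macOS)
curl -fsSL https://ollama.com/install.sh | sh
# Pull a model and start the local server (listens on port 11434 by default)
ollama pull llama3
ollama serve
```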
Configuration
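Point DeepWiki at the local Ollama server. `OLLAMA_HOST` is the variable the Ollama tooling itself uses; whether your DeepWiki deployment reads the same name is an assumption to verify:

```shell
# Local Ollama server address (default port 11434)
export OLLAMA_HOST="http://localhost:11434"
```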
Integration
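Ollama exposes a simple generation endpoint at `/api/generate`. A standard-library sketch (the helper function name is our own):

```python
import json
import urllib.request

def build_ollama_request(prompt: str, model: str = "llama3",
                         host: str = "http://localhost:11434") -> urllib.request.Request:
    """Build a request against Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = build_ollama_request("Explain this function.")
# Requires a running Ollama server:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["response"])
```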
Azure OpenAI Configuration
Use Azure’s enterprise-grade OpenAI deployment.

Setup
Environment Variables
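The variable names below follow common Azure OpenAI conventions but are assumptions; match them to what your deployment actually reads:

```shell
export AZURE_OPENAI_ENDPOINT="https://YOUR-RESOURCE.openai.azure.com"
export AZURE_OPENAI_API_KEY="YOUR_KEY"
export AZURE_OPENAI_API_VERSION="2024-02-01"
export AZURE_OPENAI_DEPLOYMENT="gpt-4"
```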
Implementation
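Azure OpenAI routes requests by deployment name and requires an `api-version` query parameter and an `api-key` header. A minimal request-building sketch (function name is our own; verify the API version against your resource):

```python
import json
import urllib.request

def build_azure_request(endpoint, deployment, api_key, prompt,
                        api_version="2024-02-01"):
    """Build a chat-completions request against an Azure OpenAI deployment."""
    url = (f"{endpoint}/openai/deployments/{deployment}"
           f"/chat/completions?api-version={api_version}")
    payload = {"messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"api-key": api_key, "Content-Type": "application/json"},
    )

req = build_azure_request("https://example.openai.azure.com", "gpt-4",
                          "YOUR_KEY", "Hello")
```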
Custom Model Selection UI
Implement a model selector in your DeepWiki interface.

Model Selector Component
Dynamic Model Configuration
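On the backend, model settings can be resolved at runtime from a configuration file so the UI selection takes effect without a restart. The loader below assumes a `default_provider` / `providers` / `default_model` schema, which is illustrative; adapt the keys to your actual configuration file:

```python
import json
from pathlib import Path

def load_model_config(path, provider=None):
    """Load a model-config JSON file and resolve the active provider's settings.

    The schema (default_provider / providers / default_model / options) is an
    assumption; adapt the keys to your generator.json.
    """
    config = json.loads(Path(path).read_text())
    provider = provider or config["default_provider"]
    entry = config["providers"][provider]
    # Merge provider-level generation options (temperature etc.) into the result.
    return {"provider": provider,
            "model": entry["default_model"],
            **entry.get("options", {})}
```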
Modifying generator.json
The generator.json file controls model configuration.
Basic Structure
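A sketch of the kind of structure such a file holds; the key names shown here are illustrative assumptions, not the guaranteed generator.json schema:

```json
{
  "default_provider": "openrouter",
  "providers": {
    "openrouter": {
      "default_model": "anthropic/claude-3-sonnet",
      "options": { "temperature": 0.7, "top_p": 0.9 }
    }
  }
}
```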
Multi-Provider Configuration
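Multiple providers can coexist in the same file, with one marked as default. Again, the schema is an illustrative assumption:

```json
{
  "default_provider": "openrouter",
  "providers": {
    "openrouter": {
      "default_model": "anthropic/claude-3-sonnet",
      "options": { "temperature": 0.7 }
    },
    "ollama": {
      "base_url": "http://localhost:11434",
      "default_model": "llama3"
    },
    "azure": {
      "deployment": "gpt-4",
      "api_version": "2024-02-01"
    }
  }
}
```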
OpenAI-Compatible Endpoints
Many providers offer OpenAI-compatible APIs.

Generic Configuration
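Because these providers share the OpenAI request schema, one request builder parameterized by base URL covers all of them (the function name is our own; the model identifier in the example is an assumption to check against the provider's model list):

```python
import json
import urllib.request

def build_chat_request(base_url, api_key, model, prompt):
    """Build a chat-completions request for any OpenAI-compatible endpoint."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )

# Works with any provider from the list below, e.g. Groq:
req = build_chat_request("https://api.groq.com/openai/v1", "YOUR_KEY",
                         "llama3-70b-8192", "Hello")
```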
Supported Providers
- Perplexity AI: https://api.perplexity.ai
- Together AI: https://api.together.xyz/v1
- Anyscale: https://api.endpoints.anyscale.com/v1
- Groq: https://api.groq.com/openai/v1
Performance Comparisons
Benchmark Results
| Model | Tokens/Second | Quality Score | Cost/1M Tokens |
|---|---|---|---|
| GPT-4 Turbo | 50 | 9.5/10 | $10.00 |
| Claude 3 Opus | 40 | 9.3/10 | $15.00 |
| Llama 3 70B (Local) | 30 | 8.5/10 | $0.00 |
| Mistral Large | 60 | 8.8/10 | $8.00 |
| GPT-3.5 Turbo | 80 | 7.5/10 | $0.50 |
Performance Testing Script
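A simple harness for producing numbers like those above. It accepts any prompt-to-text callable, so you can plug in whichever provider you are testing; token counts are approximated by whitespace splitting, which is a rough stand-in for a real tokenizer:

```python
import time

def benchmark(generate, prompt, runs=3):
    """Measure average latency and rough tokens/second for a model callable.

    `generate` is any function prompt -> completion text; token count is
    approximated by whitespace splitting.
    """
    latencies, tokens = [], 0
    for _ in range(runs):
        start = time.perf_counter()
        output = generate(prompt)
        latencies.append(time.perf_counter() - start)
        tokens += len(output.split())
    return {"avg_latency_s": sum(latencies) / runs,
            "tokens_per_s": tokens / sum(latencies)}

# Try it with a stand-in "model" before wiring up a real provider:
stats = benchmark(lambda p: "word " * 100, "test prompt")
```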
Cost Optimization Strategies
1. Model Cascading
Use cheaper models first, escalate to expensive ones only when needed.

2. Caching Responses
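Identical prompts should never pay for the same completion twice. A minimal in-memory sketch (swap the dict for Redis or similar in production; all names here are our own):

```python
import hashlib

class ResponseCache:
    """In-memory cache keyed by (model, prompt)."""
    def __init__(self):
        self._store = {}
        self.hits = 0

    def _key(self, model, prompt):
        return hashlib.sha256(f"{model}:{prompt}".encode()).hexdigest()

    def get_or_call(self, model, prompt, generate):
        key = self._key(model, prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        result = generate(prompt)   # only pay for the API call on a miss
        self._store[key] = result
        return result

cache = ResponseCache()
first = cache.get_or_call("gpt-3.5-turbo", "What is DeepWiki?", lambda p: "answer")
second = cache.get_or_call("gpt-3.5-turbo", "What is DeepWiki?", lambda p: "answer")
```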
3. Batch Processing
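Grouping many small jobs into fewer requests amortizes per-request overhead and helps with rate limits. A sketch with an injected batch-summarizer callable (the helper names are our own):

```python
def batched(items, size):
    """Yield fixed-size slices of a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def process_in_batches(documents, summarize_batch, batch_size=5):
    """summarize_batch takes a list of documents and returns a list of results;
    batching them amortizes per-request overhead."""
    results = []
    for batch in batched(documents, batch_size):
        results.extend(summarize_batch(batch))
    return results

# Stand-in "summarizer" that just uppercases each document:
summaries = process_in_batches([f"doc{i}" for i in range(12)],
                               lambda b: [d.upper() for d in b], batch_size=5)
```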
4. Token Optimization
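Trimming context to a token budget before sending it cuts cost directly. The sketch below uses the rough heuristic of about four characters per token; use a real tokenizer (e.g. tiktoken) when you need exact counts:

```python
def trim_to_budget(text, max_tokens, chars_per_token=4):
    """Trim context to an approximate token budget (~4 chars per token)."""
    max_chars = max_tokens * chars_per_token
    if len(text) <= max_chars:
        return text
    # Keep the head and tail, dropping the middle, so both the opening
    # context and the most recent content survive.
    half = max_chars // 2
    return text[:half] + "\n...[truncated]...\n" + text[-half:]

short = trim_to_budget("x" * 100, max_tokens=50)    # within budget, unchanged
long = trim_to_budget("x" * 1000, max_tokens=50)    # trimmed with a marker
```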
Best Practices
1. Error Handling
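Transient API failures are routine, so retries with exponential backoff plus a fallback model keep the system responsive. A sketch with injected callables (names are our own):

```python
import time

def call_with_retry(generate, prompt, fallback=None, retries=3, base_delay=1.0):
    """Retry with exponential backoff, then fall back to a cheaper model.

    `generate` and `fallback` are any prompt -> text callables.
    """
    for attempt in range(retries):
        try:
            return generate(prompt)
        except Exception:
            if attempt == retries - 1:
                break                          # out of retries
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    if fallback is not None:
        return fallback(prompt)
    raise RuntimeError("all attempts failed and no fallback configured")
```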
2. Model Selection Logic
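Routing each request to the cheapest model that can handle it is where most savings come from. The thresholds and model identifiers below are illustrative, not recommendations; tune them to your workload:

```python
def select_model(prompt, needs_reasoning=False):
    """Route a request to a model tier by rough difficulty (illustrative)."""
    if needs_reasoning:
        return "anthropic/claude-3-opus"   # strongest, most expensive
    if len(prompt) > 2000:
        return "openai/gpt-4-turbo"        # long context, mid cost
    return "openai/gpt-3.5-turbo"          # cheap default
```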
3. Monitoring and Logging
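Recording latency and rough token counts per call is what makes the cost and performance comparisons above possible for your own traffic. A sketch using the standard logging module (the wrapper name is our own):

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("deepwiki.models")

def logged_call(model, generate, prompt):
    """Wrap a model call with latency and approximate token logging."""
    start = time.perf_counter()
    try:
        output = generate(prompt)
    except Exception:
        log.exception("model=%s failed", model)
        raise
    elapsed = time.perf_counter() - start
    log.info("model=%s latency=%.2fs approx_tokens=%d",
             model, elapsed, len(output.split()))
    return output

result = logged_call("gpt-3.5-turbo", lambda p: "hello world", "ping")
```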
Conclusion
DeepWiki’s flexible model system allows you to optimize for your specific needs:
- Use OpenRouter for access to multiple models
- Deploy Ollama for privacy and zero API costs
- Choose Azure OpenAI for enterprise requirements
- Implement cascading strategies for cost optimization
- Monitor usage and performance to make informed decisions