Model Endpoints API

DeepWiki provides a flexible provider-based model selection system that supports multiple LLM providers. This documentation covers the model-related API endpoints and how to work with different model providers.

Overview

DeepWiki’s model provider system allows you to choose from various AI model providers including:
  • Google - Gemini models
  • OpenAI - GPT models
  • OpenRouter - Access to multiple model providers through a unified API
  • Azure OpenAI - Azure-hosted OpenAI models
  • Ollama - Locally running open-source models
  • AWS Bedrock - Amazon’s managed AI models
  • DashScope - Alibaba’s AI models
Each provider offers different models with specific capabilities and pricing. The system is designed to be extensible, allowing service providers to add custom models as needed.

Authentication

Before using any model provider, you need to configure the appropriate API keys as environment variables:
# Google Gemini
GOOGLE_API_KEY=your_google_api_key

# OpenAI
OPENAI_API_KEY=your_openai_api_key

# OpenRouter
OPENROUTER_API_KEY=your_openrouter_api_key

# Azure OpenAI
AZURE_OPENAI_API_KEY=your_azure_openai_api_key
AZURE_OPENAI_ENDPOINT=your_azure_openai_endpoint
AZURE_OPENAI_VERSION=your_azure_openai_version

# AWS Bedrock
AWS_ACCESS_KEY_ID=your_aws_access_key
AWS_SECRET_ACCESS_KEY=your_aws_secret_key
AWS_REGION=your_aws_region

# Ollama (if not local)
OLLAMA_HOST=http://your-ollama-host:11434

# DashScope
DASHSCOPE_API_KEY=your_dashscope_api_key
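
As an illustrative sanity check (not part of DeepWiki itself), you can verify that the relevant variables are set before starting the server; the provider-to-variable mapping below simply mirrors the list above:
import os

# Check which provider credentials are present in the environment.
provider_keys = {
    "google": ["GOOGLE_API_KEY"],
    "openai": ["OPENAI_API_KEY"],
    "openrouter": ["OPENROUTER_API_KEY"],
    "azure": ["AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_VERSION"],
    "bedrock": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION"],
    "dashscope": ["DASHSCOPE_API_KEY"],
}

for provider, keys in provider_keys.items():
    missing = [k for k in keys if not os.environ.get(k)]
    status = "ready" if not missing else f"missing {', '.join(missing)}"
    print(f"{provider}: {status}")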

Endpoints

Get Model Configuration

Retrieves the available model providers and their supported models.
GET /models/config

Response

{
  "providers": [
    {
      "id": "google",
      "name": "Google",
      "supportsCustomModel": true,
      "models": [
        {
          "id": "gemini-2.0-flash",
          "name": "gemini-2.0-flash"
        },
        {
          "id": "gemini-2.5-flash-preview-05-20",
          "name": "gemini-2.5-flash-preview-05-20"
        },
        {
          "id": "gemini-2.5-pro-preview-03-25",
          "name": "gemini-2.5-pro-preview-03-25"
        }
      ]
    },
    {
      "id": "openai",
      "name": "Openai",
      "supportsCustomModel": true,
      "models": [
        {
          "id": "gpt-4o",
          "name": "gpt-4o"
        },
        {
          "id": "gpt-4.1",
          "name": "gpt-4.1"
        },
        {
          "id": "o1",
          "name": "o1"
        },
        {
          "id": "o3",
          "name": "o3"
        },
        {
          "id": "o4-mini",
          "name": "o4-mini"
        }
      ]
    }
  ],
  "defaultProvider": "google"
}

Example Requests

cURL:
curl -X GET "http://localhost:8001/models/config" \
  -H "Accept: application/json"
Python:
import requests

response = requests.get("http://localhost:8001/models/config")
config = response.json()

# List all providers
for provider in config["providers"]:
    print(f"Provider: {provider['name']}")
    for model in provider["models"]:
        print(f"  - {model['id']}")
JavaScript:
const response = await fetch('http://localhost:8001/models/config');
const config = await response.json();

// Get available models for a specific provider
const googleModels = config.providers
  .find(p => p.id === 'google')
  ?.models || [];

Using Models in Chat Completions

The model selection is integrated into the chat completions endpoint. You specify the provider and model when making requests.
POST /chat/completions/stream

Request Body

{
  "repo_url": "https://github.com/user/repo",
  "messages": [
    {
      "role": "user",
      "content": "Explain the main functionality of this repository"
    }
  ],
  "provider": "google",
  "model": "gemini-2.0-flash",
  "language": "en",
  "token": "optional_github_token_for_private_repos"
}

Parameters

  • repo_url (string, required) - URL of the repository to analyze
  • messages (array, required) - Array of chat messages
  • provider (string, optional) - Model provider ID (default: "google")
  • model (string, optional) - Model ID for the specified provider (uses the provider's default if not specified)
  • language (string, optional) - Language for content generation (default: "en")
  • token (string, optional) - Personal access token for private repositories
  • type (string, optional) - Repository type: "github", "gitlab", or "bitbucket" (default: "github")
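
For example, the optional parameters can be combined to analyze a private GitLab repository; the repository URL, token, and language code below are placeholders:
import requests

# All fields are documented parameters; replace the placeholders with your own values.
data = {
    "repo_url": "https://gitlab.com/user/repo",
    "messages": [{"role": "user", "content": "Summarize this repository"}],
    "provider": "google",
    "model": "gemini-2.0-flash",
    "language": "en",  # replace with your preferred language code
    "token": "your_gitlab_personal_access_token",
    "type": "gitlab",
}

response = requests.post("http://localhost:8001/chat/completions/stream", json=data, stream=True)
for line in response.iter_lines():
    if line:
        print(line.decode("utf-8"))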

Example Requests

cURL with Google Gemini:
curl -X POST "http://localhost:8001/chat/completions/stream" \
  -H "Content-Type: application/json" \
  -d '{
    "repo_url": "https://github.com/asyncfuncai/deepwiki-open",
    "messages": [
      {
        "role": "user",
        "content": "What is the main purpose of this project?"
      }
    ],
    "provider": "google",
    "model": "gemini-2.0-flash"
  }'
Python with OpenAI:
import requests
import json

url = "http://localhost:8001/chat/completions/stream"
data = {
    "repo_url": "https://github.com/asyncfuncai/deepwiki-open",
    "messages": [
        {
            "role": "user",
            "content": "Explain the architecture of this application"
        }
    ],
    "provider": "openai",
    "model": "gpt-4o"
}

response = requests.post(url, json=data, stream=True)
for line in response.iter_lines():
    if line:
        print(line.decode('utf-8'))
JavaScript with OpenRouter:
const response = await fetch('http://localhost:8001/chat/completions/stream', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    repo_url: 'https://github.com/asyncfuncai/deepwiki-open',
    messages: [
      {
        role: 'user',
        content: 'What are the key features of this repository?'
      }
    ],
    provider: 'openrouter',
    model: 'anthropic/claude-3.5-sonnet'
  })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  console.log(decoder.decode(value));
}

Model Provider Details

Google (Gemini)

Default provider with fast and capable models.
Available Models:
  • gemini-2.0-flash - Fast, efficient model (default)
  • gemini-2.5-flash-preview-05-20 - Preview of upcoming flash model
  • gemini-2.5-pro-preview-03-25 - Preview of pro model
Configuration:
{
  "provider": "google",
  "model": "gemini-2.0-flash"
}

OpenAI

Industry-standard GPT models.
Available Models:
  • gpt-4o - Latest GPT-4 model (default)
  • gpt-4.1 - Updated GPT-4 version
  • o1 - Reasoning model
  • o3 - Advanced model
  • o4-mini - Smaller, faster model
Configuration:
{
  "provider": "openai",
  "model": "gpt-4o"
}

OpenRouter

Access multiple model providers through a unified API.
Available Models:
  • openai/gpt-4o - OpenAI GPT-4 (default)
  • deepseek/deepseek-r1 - DeepSeek reasoning model
  • anthropic/claude-3.7-sonnet - Claude 3.7 Sonnet
  • anthropic/claude-3.5-sonnet - Claude 3.5 Sonnet
  • And many more…
Configuration:
{
  "provider": "openrouter",
  "model": "anthropic/claude-3.5-sonnet"
}

Azure OpenAI

Azure-hosted OpenAI models with enterprise features.
Available Models:
  • gpt-4o - GPT-4 on Azure (default)
  • gpt-4 - Standard GPT-4
  • gpt-35-turbo - GPT-3.5 Turbo
  • gpt-4-turbo - GPT-4 Turbo
Configuration:
{
  "provider": "azure",
  "model": "gpt-4o"
}
Note: Requires Azure OpenAI endpoint and API version configuration.

Ollama

Run models locally for privacy and cost efficiency.
Available Models:
  • qwen3:1.7b - Small, fast model (default)
  • llama3:8b - Llama 3 8B model
  • qwen3:8b - Qwen 3 8B model
Configuration:
{
  "provider": "ollama",
  "model": "llama3:8b"
}
Note: Requires Ollama to be running locally or accessible via OLLAMA_HOST.
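
As an illustrative pre-flight check before selecting the ollama provider, you can confirm the server is reachable and list the models it has pulled; this queries Ollama's own /api/tags endpoint, not a DeepWiki endpoint:
import os
import requests

# Query the Ollama server directly to list locally available models.
ollama_host = os.environ.get("OLLAMA_HOST", "http://localhost:11434")
tags = requests.get(f"{ollama_host}/api/tags").json()
print([model["name"] for model in tags.get("models", [])])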

AWS Bedrock

Amazon's managed AI service.
Available Models:
  • anthropic.claude-3-sonnet-20240229-v1:0 - Claude 3 Sonnet (default)
  • anthropic.claude-3-haiku-20240307-v1:0 - Claude 3 Haiku
  • anthropic.claude-3-opus-20240229-v1:0 - Claude 3 Opus
  • amazon.titan-text-express-v1 - Amazon Titan
  • cohere.command-r-v1:0 - Cohere Command R
  • ai21.j2-ultra-v1 - AI21 Jurassic
Configuration:
{
  "provider": "bedrock",
  "model": "anthropic.claude-3-sonnet-20240229-v1:0"
}

DashScope

Alibaba's AI models.
Available Models:
  • qwen-plus - Qwen Plus (default)
  • qwen-turbo - Qwen Turbo
  • deepseek-r1 - DeepSeek R1
Configuration:
{
  "provider": "dashscope",
  "model": "qwen-plus"
}

Custom Models

Providers that support custom models (where supportsCustomModel: true) allow you to specify model IDs not listed in the predefined options. This is useful for:
  • Newly released models
  • Fine-tuned models
  • Private or custom deployments
Example with custom model:
{
  "provider": "openai",
  "model": "ft:gpt-3.5-turbo-0125:custom:model:id"
}
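
For example, the same streaming chat request shown earlier can pass such an ID directly; the fine-tuned model ID below is a placeholder:
import requests

# The fine-tuned model ID is a placeholder; supply any ID your provider accepts.
payload = {
    "repo_url": "https://github.com/asyncfuncai/deepwiki-open",
    "messages": [{"role": "user", "content": "Summarize the test suite"}],
    "provider": "openai",
    "model": "ft:gpt-3.5-turbo-0125:custom:model:id",
}
response = requests.post("http://localhost:8001/chat/completions/stream", json=payload, stream=True)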

Error Handling

The API returns standard HTTP status codes and error messages.

Common Errors

400 Bad Request:
{
  "detail": "No messages provided"
}
401 Unauthorized:
{
  "detail": "Invalid API key for provider"
}
404 Not Found:
{
  "detail": "Model not found for provider"
}
500 Internal Server Error:
{
  "detail": "Error preparing retriever: No valid document embeddings found"
}

Error Handling Examples

Python:
try:
    response = requests.post(url, json=data)
    response.raise_for_status()
    result = response.json()
except requests.exceptions.HTTPError as e:
    if e.response.status_code == 400:
        print(f"Bad request: {e.response.json()['detail']}")
    elif e.response.status_code == 500:
        print(f"Server error: {e.response.json()['detail']}")
JavaScript:
try {
  const response = await fetch(url, options);
  if (!response.ok) {
    const error = await response.json();
    throw new Error(error.detail);
  }
  const data = await response.json();
} catch (error) {
  console.error('API Error:', error.message);
}

Rate Limiting

Rate limiting depends on the model provider being used:
  • Google Gemini: Subject to Google AI Studio quotas
  • OpenAI: Based on your OpenAI tier and usage
  • OpenRouter: Depends on the specific model and your OpenRouter credits
  • Azure OpenAI: Based on your Azure deployment quotas
  • Ollama: Limited by local hardware resources
  • AWS Bedrock: Subject to AWS service quotas
  • DashScope: Based on Alibaba Cloud quotas
It’s recommended to implement retry logic with exponential backoff for production applications.
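
A minimal sketch of such retry logic in Python, treating 429 and transient 5xx responses as retryable; tune the status codes and backoff schedule for your provider:
import time
import requests

def post_with_backoff(url, payload, max_retries=5):
    """Retry sketch: back off exponentially on rate-limit and transient server errors."""
    for attempt in range(max_retries):
        response = requests.post(url, json=payload, stream=True)
        if response.status_code not in (429, 500, 502, 503):
            return response
        time.sleep(2 ** attempt)  # 1s, 2s, 4s, 8s, ...
    raise RuntimeError(f"Request failed after {max_retries} attempts")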

Best Practices

  1. Model Selection: Choose models based on your specific needs:
    • Use faster models (e.g., gemini-2.0-flash, o4-mini) for simple queries
    • Use more capable models (e.g., gpt-4o, claude-3.5-sonnet) for complex analysis
  2. Error Handling: Always implement proper error handling for API calls
  3. Streaming: The chat endpoint supports streaming responses for better user experience
  4. Caching: DeepWiki automatically caches wiki generation results to improve performance
  5. Security: Never expose API keys in client-side code; use environment variables
  6. Cost Optimization: Monitor usage and costs, especially with premium models

Configuration Files

DeepWiki uses JSON configuration files to manage model settings:
  • api/config/generator.json - Model provider configurations
  • api/config/embedder.json - Embedding model settings
  • api/config/repo.json - Repository processing settings
You can customize these files or use the DEEPWIKI_CONFIG_DIR environment variable to specify a custom configuration directory.
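
As a minimal sketch, assuming the default api/config location and that the files are plain JSON (the exact keys depend on your DeepWiki version), you can inspect the active generator configuration like this:
import json
import os
from pathlib import Path

# DEEPWIKI_CONFIG_DIR, when set, overrides the default api/config directory.
config_dir = Path(os.environ.get("DEEPWIKI_CONFIG_DIR", "api/config"))
generator_config = json.loads((config_dir / "generator.json").read_text())
print(list(generator_config.keys()))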