Model Endpoints API

DeepWiki provides a flexible provider-based model selection system that supports multiple LLM providers. This documentation covers the model-related API endpoints and how to work with different model providers.

Overview

DeepWiki’s model provider system allows you to choose from various AI model providers including:
  • Google - Gemini models
  • OpenAI - GPT models
  • OpenRouter - Access to multiple model providers through a unified API
  • Azure OpenAI - Azure-hosted OpenAI models
  • Ollama - Locally running open-source models
  • AWS Bedrock - Amazon’s managed AI models
  • DashScope - Alibaba’s AI models
Each provider offers different models with specific capabilities and pricing. The system is designed to be extensible, allowing service providers to add custom models as needed.

Authentication

Before using any model provider, you need to configure the appropriate API keys as environment variables:
# Google Gemini
GOOGLE_API_KEY=your_google_api_key

# OpenAI
OPENAI_API_KEY=your_openai_api_key

# OpenRouter
OPENROUTER_API_KEY=your_openrouter_api_key

# Azure OpenAI
AZURE_OPENAI_API_KEY=your_azure_openai_api_key
AZURE_OPENAI_ENDPOINT=your_azure_openai_endpoint
AZURE_OPENAI_VERSION=your_azure_openai_version

# AWS Bedrock
AWS_ACCESS_KEY_ID=your_aws_access_key
AWS_SECRET_ACCESS_KEY=your_aws_secret_key
AWS_REGION=your_aws_region

# Ollama (if not local)
OLLAMA_HOST=http://your-ollama-host:11434

# DashScope
DASHSCOPE_API_KEY=your_dashscope_api_key
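
As an illustrative sanity check (not part of DeepWiki itself), you can verify that the relevant variables are set before starting the server; the provider-to-variable mapping below simply mirrors the list above:
import os

# Check which provider credentials are present in the environment.
provider_keys = {
    "google": ["GOOGLE_API_KEY"],
    "openai": ["OPENAI_API_KEY"],
    "openrouter": ["OPENROUTER_API_KEY"],
    "azure": ["AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_VERSION"],
    "bedrock": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION"],
    "dashscope": ["DASHSCOPE_API_KEY"],
}

for provider, keys in provider_keys.items():
    missing = [k for k in keys if not os.environ.get(k)]
    status = "ready" if not missing else f"missing {', '.join(missing)}"
    print(f"{provider}: {status}")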

Endpoints

Get Model Configuration

Retrieves the available model providers and their supported models.
GET /models/config

Response

{
  "providers": [
    {
      "id": "google",
      "name": "Google",
      "supportsCustomModel": true,
      "models": [
        {
          "id": "gemini-2.0-flash",
          "name": "gemini-2.0-flash"
        },
        {
          "id": "gemini-2.5-flash-preview-05-20",
          "name": "gemini-2.5-flash-preview-05-20"
        },
        {
          "id": "gemini-2.5-pro-preview-03-25",
          "name": "gemini-2.5-pro-preview-03-25"
        }
      ]
    },
    {
      "id": "openai",
      "name": "Openai",
      "supportsCustomModel": true,
      "models": [
        {
          "id": "gpt-4o",
          "name": "gpt-4o"
        },
        {
          "id": "gpt-4.1",
          "name": "gpt-4.1"
        },
        {
          "id": "o1",
          "name": "o1"
        },
        {
          "id": "o3",
          "name": "o3"
        },
        {
          "id": "o4-mini",
          "name": "o4-mini"
        }
      ]
    }
  ],
  "defaultProvider": "google"
}

Example Requests

cURL:
curl -X GET "http://localhost:8001/models/config" \
  -H "Accept: application/json"
Python:
import requests

response = requests.get("http://localhost:8001/models/config")
config = response.json()

# List all providers
for provider in config["providers"]:
    print(f"Provider: {provider['name']}")
    for model in provider["models"]:
        print(f"  - {model['id']}")
JavaScript:
const response = await fetch('http://localhost:8001/models/config');
const config = await response.json();

// Get available models for a specific provider
const googleModels = config.providers
  .find(p => p.id === 'google')
  ?.models || [];

Using Models in Chat Completions

The model selection is integrated into the chat completions endpoint. You specify the provider and model when making requests.
POST /chat/completions/stream

Request Body

{
  "repo_url": "https://github.com/user/repo",
  "messages": [
    {
      "role": "user",
      "content": "Explain the main functionality of this repository"
    }
  ],
  "provider": "google",
  "model": "gemini-2.0-flash",
  "language": "en",
  "token": "optional_github_token_for_private_repos"
}

Parameters

  • repo_url (string, required) - URL of the repository to analyze
  • messages (array, required) - Array of chat messages
  • provider (string, optional) - Model provider ID (default: "google")
  • model (string, optional) - Model ID for the specified provider (uses the provider's default if not specified)
  • language (string, optional) - Language for content generation (default: "en")
  • token (string, optional) - Personal access token for private repositories
  • type (string, optional) - Repository type: "github", "gitlab", or "bitbucket" (default: "github")
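
For example, the optional parameters can be combined to analyze a private GitLab repository; the repository URL, token, and language code below are placeholders:
import requests

# All fields are documented parameters; replace the placeholders with your own values.
data = {
    "repo_url": "https://gitlab.com/user/repo",
    "messages": [{"role": "user", "content": "Summarize this repository"}],
    "provider": "google",
    "model": "gemini-2.0-flash",
    "language": "en",  # replace with your preferred language code
    "token": "your_gitlab_personal_access_token",
    "type": "gitlab",
}

response = requests.post("http://localhost:8001/chat/completions/stream", json=data, stream=True)
for line in response.iter_lines():
    if line:
        print(line.decode("utf-8"))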

Example Requests

cURL with Google Gemini:
curl -X POST "http://localhost:8001/chat/completions/stream" \
  -H "Content-Type: application/json" \
  -d '{
    "repo_url": "https://github.com/asyncfuncai/deepwiki-open",
    "messages": [
      {
        "role": "user",
        "content": "What is the main purpose of this project?"
      }
    ],
    "provider": "google",
    "model": "gemini-2.0-flash"
  }'
Python with OpenAI:
import requests
import json

url = "http://localhost:8001/chat/completions/stream"
data = {
    "repo_url": "https://github.com/asyncfuncai/deepwiki-open",
    "messages": [
        {
            "role": "user",
            "content": "Explain the architecture of this application"
        }
    ],
    "provider": "openai",
    "model": "gpt-4o"
}

response = requests.post(url, json=data, stream=True)
for line in response.iter_lines():
    if line:
        print(line.decode('utf-8'))
JavaScript with OpenRouter:
const response = await fetch('http://localhost:8001/chat/completions/stream', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    repo_url: 'https://github.com/asyncfuncai/deepwiki-open',
    messages: [
      {
        role: 'user',
        content: 'What are the key features of this repository?'
      }
    ],
    provider: 'openrouter',
    model: 'anthropic/claude-3.5-sonnet'
  })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  console.log(decoder.decode(value));
}

Model Provider Details

Google (Gemini)

Default provider with fast and capable models.
Available Models:
  • gemini-2.0-flash - Fast, efficient model (default)
  • gemini-2.5-flash-preview-05-20 - Preview of upcoming flash model
  • gemini-2.5-pro-preview-03-25 - Preview of pro model
Configuration:
{
  "provider": "google",
  "model": "gemini-2.0-flash"
}

OpenAI

Industry-standard GPT models.
Available Models:
  • gpt-4o - Latest GPT-4 model (default)
  • gpt-4.1 - Updated GPT-4 version
  • o1 - Reasoning model
  • o3 - Advanced model
  • o4-mini - Smaller, faster model
Configuration:
{
  "provider": "openai",
  "model": "gpt-4o"
}

OpenRouter

Access multiple model providers through a unified API.
Available Models:
  • openai/gpt-4o - OpenAI GPT-4 (default)
  • deepseek/deepseek-r1 - DeepSeek reasoning model
  • anthropic/claude-3.7-sonnet - Claude 3.7 Sonnet
  • anthropic/claude-3.5-sonnet - Claude 3.5 Sonnet
  • And many more…
Configuration:
{
  "provider": "openrouter",
  "model": "anthropic/claude-3.5-sonnet"
}

Azure OpenAI

Azure-hosted OpenAI models with enterprise features.
Available Models:
  • gpt-4o - GPT-4 on Azure (default)
  • gpt-4 - Standard GPT-4
  • gpt-35-turbo - GPT-3.5 Turbo
  • gpt-4-turbo - GPT-4 Turbo
Configuration:
{
  "provider": "azure",
  "model": "gpt-4o"
}
Note: Requires Azure OpenAI endpoint and API version configuration.

Ollama

Run models locally for privacy and cost efficiency.
Available Models:
  • qwen3:1.7b - Small, fast model (default)
  • llama3:8b - Llama 3 8B model
  • qwen3:8b - Qwen 3 8B model
Configuration:
{
  "provider": "ollama",
  "model": "llama3:8b"
}
Note: Requires Ollama to be running locally or accessible via OLLAMA_HOST.
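
As an illustrative pre-flight check before selecting the ollama provider, you can confirm the server is reachable and list the models it has pulled; this queries Ollama's own /api/tags endpoint, not a DeepWiki endpoint:
import os
import requests

# Query the Ollama server directly to list locally available models.
ollama_host = os.environ.get("OLLAMA_HOST", "http://localhost:11434")
tags = requests.get(f"{ollama_host}/api/tags").json()
print([model["name"] for model in tags.get("models", [])])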

AWS Bedrock

Amazon's managed AI service.
Available Models:
  • anthropic.claude-3-sonnet-20240229-v1:0 - Claude 3 Sonnet (default)
  • anthropic.claude-3-haiku-20240307-v1:0 - Claude 3 Haiku
  • anthropic.claude-3-opus-20240229-v1:0 - Claude 3 Opus
  • amazon.titan-text-express-v1 - Amazon Titan
  • cohere.command-r-v1:0 - Cohere Command R
  • ai21.j2-ultra-v1 - AI21 Jurassic
Configuration:
{
  "provider": "bedrock",
  "model": "anthropic.claude-3-sonnet-20240229-v1:0"
}

DashScope

Alibaba's AI models.
Available Models:
  • qwen-plus - Qwen Plus (default)
  • qwen-turbo - Qwen Turbo
  • deepseek-r1 - DeepSeek R1
Configuration:
{
  "provider": "dashscope",
  "model": "qwen-plus"
}

Custom Models

Providers that support custom models (where supportsCustomModel: true) allow you to specify model IDs not listed in the predefined options. This is useful for:
  • Newly released models
  • Fine-tuned models
  • Private or custom deployments
Example with custom model:
{
  "provider": "openai",
  "model": "ft:gpt-3.5-turbo-0125:custom:model:id"
}
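
For example, the same streaming chat request shown earlier can pass such an ID directly; the fine-tuned model ID below is a placeholder:
import requests

# The fine-tuned model ID is a placeholder; supply any ID your provider accepts.
payload = {
    "repo_url": "https://github.com/asyncfuncai/deepwiki-open",
    "messages": [{"role": "user", "content": "Summarize the test suite"}],
    "provider": "openai",
    "model": "ft:gpt-3.5-turbo-0125:custom:model:id",
}
response = requests.post("http://localhost:8001/chat/completions/stream", json=payload, stream=True)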

Error Handling

The API returns standard HTTP status codes and error messages.

Common Errors

400 Bad Request:
{
  "detail": "No messages provided"
}
401 Unauthorized:
{
  "detail": "Invalid API key for provider"
}
404 Not Found:
{
  "detail": "Model not found for provider"
}
500 Internal Server Error:
{
  "detail": "Error preparing retriever: No valid document embeddings found"
}

Error Handling Examples

Python:
try:
    response = requests.post(url, json=data)
    response.raise_for_status()
    result = response.json()
except requests.exceptions.HTTPError as e:
    if e.response.status_code == 400:
        print(f"Bad request: {e.response.json()['detail']}")
    elif e.response.status_code == 500:
        print(f"Server error: {e.response.json()['detail']}")
JavaScript:
try {
  const response = await fetch(url, options);
  if (!response.ok) {
    const error = await response.json();
    throw new Error(error.detail);
  }
  const data = await response.json();
} catch (error) {
  console.error('API Error:', error.message);
}

Rate Limiting

Rate limiting depends on the model provider being used:
  • Google Gemini: Subject to Google AI Studio quotas
  • OpenAI: Based on your OpenAI tier and usage
  • OpenRouter: Depends on the specific model and your OpenRouter credits
  • Azure OpenAI: Based on your Azure deployment quotas
  • Ollama: Limited by local hardware resources
  • AWS Bedrock: Subject to AWS service quotas
  • DashScope: Based on Alibaba Cloud quotas
It’s recommended to implement retry logic with exponential backoff for production applications.
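
A minimal sketch of such retry logic in Python, treating 429 and transient 5xx responses as retryable; tune the status codes and backoff schedule for your provider:
import time
import requests

def post_with_backoff(url, payload, max_retries=5):
    """Retry sketch: back off exponentially on rate-limit and transient server errors."""
    for attempt in range(max_retries):
        response = requests.post(url, json=payload, stream=True)
        if response.status_code not in (429, 500, 502, 503):
            return response
        time.sleep(2 ** attempt)  # 1s, 2s, 4s, 8s, ...
    raise RuntimeError(f"Request failed after {max_retries} attempts")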

Best Practices

  1. Model Selection: Choose models based on your specific needs:
    • Use faster models (e.g., gemini-2.0-flash, o4-mini) for simple queries
    • Use more capable models (e.g., gpt-4o, claude-3.5-sonnet) for complex analysis
  2. Error Handling: Always implement proper error handling for API calls
  3. Streaming: The chat endpoint supports streaming responses for better user experience
  4. Caching: DeepWiki automatically caches wiki generation results to improve performance
  5. Security: Never expose API keys in client-side code; use environment variables
  6. Cost Optimization: Monitor usage and costs, especially with premium models

Configuration Files

DeepWiki uses JSON configuration files to manage model settings:
  • api/config/generator.json - Model provider configurations
  • api/config/embedder.json - Embedding model settings
  • api/config/repo.json - Repository processing settings
You can customize these files or use the DEEPWIKI_CONFIG_DIR environment variable to specify a custom configuration directory.
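
As a minimal sketch, assuming the default api/config location and that the files are plain JSON (the exact keys depend on your DeepWiki version), you can inspect the active generator configuration like this:
import json
import os
from pathlib import Path

# DEEPWIKI_CONFIG_DIR, when set, overrides the default api/config directory.
config_dir = Path(os.environ.get("DEEPWIKI_CONFIG_DIR", "api/config"))
generator_config = json.loads((config_dir / "generator.json").read_text())
print(list(generator_config.keys()))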