TokenLx Developer Docs

TokenLx is a unified AI gateway with OpenAI-style API compatibility. It supports text, image, video, and embedding capabilities across multiple models.

Overview

  • Base URL: https://api.tokenlx.ai/v1
  • Protocol: HTTPS + JSON (OpenAI-compatible)
  • Auth: Bearer API Key (create in console)
  • Response format: OpenAI-compatible

Quick Start

1) Create an API Key

Create and manage API keys on the "API Keys" page of the console. A key is shown only once, at creation, so save it securely.

2) Make Your First Request

curl https://api.tokenlx.ai/v1/chat/completions \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-plus",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": false
  }'

Authentication

All requests require an API Key in the header:

Authorization: Bearer <YOUR_API_KEY>
Content-Type: application/json

Store your key in server-side environment variables — never expose it in frontend code.
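As a minimal sketch of the advice above, the following Python helper builds the two required headers from a key held in an environment variable. The variable name TOKENLX_API_KEY follows the curl examples in these docs; the helper names build_headers and headers_from_env are illustrative and not part of any TokenLx SDK.

```python
import os


def build_headers(api_key: str) -> dict:
    """Return the two headers every TokenLx request needs."""
    if not api_key:
        raise ValueError("API key is empty; set TOKENLX_API_KEY on the server")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


def headers_from_env(var_name: str = "TOKENLX_API_KEY") -> dict:
    """Read the key from the environment so it never appears in source code."""
    return build_headers(os.environ.get(var_name, ""))
```

Failing fast on an empty key tends to surface a missing environment variable earlier than waiting for a 401 from the API.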

Model Names

Each model can be called using two name formats:

| Type | Example | Description |
| --- | --- | --- |
| Short name (modelName) | qwen-plus | TokenLx registered name |
| Alias | alibaba/qwen-plus | With vendor prefix for clarity |

Both formats are valid in the model field. See the pricing page for a full model list.

Optional Parameters

TokenLx normalizes requests across multiple models and providers. Optional parameters follow these rules:

  • Used if supported: If the model supports the parameter, it takes effect normally
  • Ignored if unsupported: If the model doesn't support a parameter, it is silently ignored — no error is returned
  • Defaults applied: Omitted optional parameters use model or system defaults

This means you can use the same request structure across different models without worrying about parameter compatibility.
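A small Python sketch of what this looks like in practice: one payload builder reused for two models with identical options. The function name build_chat_payload and the option values are illustrative assumptions; the model names are taken from the examples in these docs.

```python
def build_chat_payload(model: str, prompt: str, **options) -> dict:
    """Build a chat completions body; options are passed through unchanged."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    # Drop options set to None so model or system defaults apply.
    payload.update({k: v for k, v in options.items() if v is not None})
    return payload


# The same options are sent to every model; per the rules above, a model that
# does not support top_k simply ignores it.
options = {"temperature": 0.7, "top_k": 40, "seed": None}
payloads = [build_chat_payload(m, "Hello", **options) for m in ("qwen-plus", "gpt-4o")]
```

Only the model field differs between the two payloads, so switching models should not require branching on parameter support.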


Chat Completions

  • Endpoint: POST /v1/chat/completions

Chat Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| model | string | Yes | - | Model name |
| messages | array | Yes | - | Conversation message list |
| stream | boolean | No | false | Enable streaming output |
| temperature | float | No | 1.0 | Sampling temperature [0, 2]. Lower = more deterministic |
| top_p | float | No | 1.0 | Nucleus sampling (0, 1]. Alternative to temperature |
| top_k | integer | No | 0 | Limit candidate tokens (supported by some models) |
| max_tokens | integer | No | Model default | Maximum tokens to generate |
| max_completion_tokens | integer | No | Model default | Same as max_tokens (OpenAI new parameter name) |
| frequency_penalty | float | No | 0 | Frequency penalty [-2, 2]. Positive reduces repetition |
| presence_penalty | float | No | 0 | Presence penalty [-2, 2]. Positive encourages new topics |
| repetition_penalty | float | No | 1.0 | Repetition penalty (0, 2] (supported by some models) |
| stop | string/array | No | null | Stop sequences |
| seed | integer | No | null | Random seed for deterministic output |
| n | integer | No | 1 | Number of completions to generate |
| response_format | object | No | null | Output format, e.g. {"type": "json_object"} |
| tools | array | No | null | Tool/function definitions |
| tool_choice | string/object | No | "auto" | Tool calling strategy: "none" / "auto" / "required" |
| reasoning_effort | string | No | null | Thinking depth: "high" / "medium" / "low" / "none" |
| thinking | object | No | null | Thinking config: {"type": "enabled", "budget_tokens": 8000} |

Non-streaming Request

curl https://api.tokenlx.ai/v1/chat/completions \
  -H "Authorization: Bearer $TOKENLX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Write a haiku"}
    ],
    "temperature": 0.7,
    "max_tokens": 500
  }'

Streaming

Set "stream": true to receive incremental results via SSE:

curl https://api.tokenlx.ai/v1/chat/completions \
  -H "Authorization: Bearer $TOKENLX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-plus",
    "messages": [{"role": "user", "content": "Write a haiku"}],
    "stream": true
  }'

Reasoning Mode

Models with reasoning support can be controlled via reasoning_effort:

{
  "model": "qwen3-235b",
  "messages": [{"role": "user", "content": "Prove that sqrt(2) is irrational"}],
  "reasoning_effort": "high",
  "stream": true
}

Models supporting reasoning: qwen3-235b, qwen3-32b, qwen-max, qwen-plus, deepseek-r1, o1, o3-mini, o4-mini


Multimodal Input

Image Input

{
  "model": "qwen-vl-max",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this image"},
        {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
      ]
    }
  ]
}

Video Input

{
  "model": "qwen-vl-max",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Summarize this video"},
        {"type": "video_url", "video_url": {"url": "https://example.com/video.mp4"}}
      ]
    }
  ]
}
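The two request bodies above differ only in the type of the media part, so a single helper can build either. This is a sketch; the function name media_message is an assumption for illustration, while the field names come from the examples above.

```python
def media_message(text: str, url: str, kind: str = "image") -> dict:
    """Build a user message that pairs a text prompt with one image or video URL."""
    if kind not in ("image", "video"):
        raise ValueError("kind must be 'image' or 'video'")
    part_type = f"{kind}_url"  # "image_url" or "video_url", as in the examples above
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": part_type, part_type: {"url": url}},
        ],
    }


request_body = {
    "model": "qwen-vl-max",
    "messages": [
        media_message("Summarize this video", "https://example.com/video.mp4", kind="video")
    ],
}
```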

Anthropic Messages API

  • Endpoint: POST /v1/messages

This endpoint is compatible with the Anthropic Messages protocol, so you can use the Claude Agent SDK or the Anthropic Python/TypeScript SDKs directly. It supports both streaming and non-streaming requests, and it also works with non-Claude models (protocol translation is handled internally).

Integration

Python SDK:

from anthropic import Anthropic

client = Anthropic(
    base_url="https://api.tokenlx.ai",
    api_key="<YOUR_API_KEY>"
)

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}]
)
print(message.content[0].text)

Claude Agent SDK:

export ANTHROPIC_BASE_URL=https://api.tokenlx.ai
export ANTHROPIC_API_KEY=<YOUR_API_KEY>

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Model name |
| messages | array | Yes | Anthropic format message list |
| max_tokens | integer | Yes | Max output tokens |
| system | string/array | No | System prompt (supports cache_control) |
| stream | boolean | No | Enable streaming |
| temperature | float | No | Sampling temperature |
| top_p | float | No | Nucleus sampling |
| top_k | integer | No | Top-K |
| tools | array | No | Anthropic format tool definitions |
| tool_choice | object | No | Tool choice strategy |
| thinking | object | No | Extended thinking, e.g. {"type": "enabled", "budget_tokens": 10000} |
| stop_sequences | array | No | Stop sequences |

Anthropic Authentication

Two header formats are supported; either one works:

  • x-api-key: <YOUR_API_KEY>
  • Authorization: Bearer <YOUR_API_KEY>
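For callers not using an SDK, the sketch below prepares a raw request to /v1/messages with either header, using only the Python standard library. The function name build_messages_request and its use_x_api_key flag are illustrative assumptions. Whether TokenLx expects any additional Anthropic-specific headers is not stated in these docs, so none are added here.

```python
import json
import urllib.request

MESSAGES_ENDPOINT = "https://api.tokenlx.ai/v1/messages"


def build_messages_request(api_key: str, model: str, prompt: str,
                           max_tokens: int = 1024,
                           use_x_api_key: bool = True) -> urllib.request.Request:
    """Prepare (but do not send) a POST to /v1/messages with either auth header."""
    if use_x_api_key:
        auth = {"x-api-key": api_key}
    else:
        auth = {"Authorization": f"Bearer {api_key}"}
    body = {
        "model": model,
        "max_tokens": max_tokens,  # required by this endpoint
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        MESSAGES_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={**auth, "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen sends the request.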

Function Calling (Tools)

{
  "model": "gpt-4o",
  "messages": [{"role": "user", "content": "What's the weather in Beijing?"}],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get weather information for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {"type": "string", "description": "City name"}
          },
          "required": ["city"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}
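The request above only declares the tool; executing it is left to the caller. The sketch below shows one way to dispatch tool calls, assuming the response follows the OpenAI tool-call shape (a tool_calls list on the assistant message, with function.arguments as a JSON string). These docs do not show that response, so treat the shape as an assumption. The get_weather stub and the sample message are hypothetical.

```python
import json


def get_weather(city: str) -> dict:
    """Hypothetical local implementation of the declared tool."""
    return {"city": city, "forecast": "sunny"}


TOOLS = {"get_weather": get_weather}


def run_tool_calls(message: dict) -> list:
    """Execute each tool call in an assistant message and build "tool" replies."""
    replies = []
    for call in message.get("tool_calls") or []:
        func = TOOLS[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])  # arguments arrive as a JSON string
        replies.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(func(**args)),
        })
    return replies


# Hand-written sample in the assumed OpenAI-compatible shape, not real API output.
sample_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"city\": \"Beijing\"}"},
    }],
}
tool_replies = run_tool_calls(sample_message)
```

In a full exchange, the assistant message and the tool replies would be appended to messages and sent in a follow-up request so the model can produce its final answer.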

Image Generation

  • Endpoint: POST /v1/aigc/image/generations
  • Response format: {"data": ["url1", "url2", ...]}

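Image generation is synchronous: the response returns the generated image URLs directly. The exception is kling-v3-omni, which the model table below marks as asynchronous and requiring polling.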

Image Generation Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Model name |
| prompt | string | Yes | Generation prompt |
| images | array | No | Input images (for editing) |
| images[].file_uri | string | Yes | Image URL or base64 data URI |
| images[].mime_type | string | No | MIME type, default image/png |
| size | string | No | Output size, e.g. "1024x1024" / "2K" |
| n | integer | No | Number of images, default 1 |
| quality | string | No | Quality: "standard" / "hd" (OpenAI models) |
| aspectRatio | string | No | Aspect ratio, e.g. "16:9" (Gemini/Kling) |
| resolution | string | No | Resolution, e.g. "1K" / "2K" / "4K" (Gemini/Kling) |
| seed | integer | No | Random seed (Doubao models) |
| guidanceScale | float | No | Guidance scale (Doubao 3.0/4.5) |
| promptExtend | boolean | No | Extend prompt (Alibaba models) |
| temperature | float | No | Temperature (Gemini models) |
| topP | float | No | Top-P (Gemini models) |
| maxOutTokens | integer | No | Max output tokens (Gemini models) |

Supported Image Models

| Model | Vendor | Notes |
| --- | --- | --- |
| wanx2.7-image-pro | Alibaba | Latest, multi-image editing |
| doubao-seedream-5.0 | Volcengine | High-res (2K/3K) |
| doubao-seedream-4.0 | Volcengine | |
| gemini-3-pro-image | Google | aspectRatio + resolution |
| gpt-image-2 | OpenAI | |
| gpt-image-1 | OpenAI | |
| dall-e-3 | OpenAI | |
| tc-gpt-image-2 | Tencent | |
| kling-v3-omni | Kling | Async, requires polling |

Text-to-Image Example

{
  "model": "doubao-seedream-5.0",
  "prompt": "An orange cat napping on a windowsill in warm sunlight",
  "size": "2K",
  "seed": 42
}

Image Editing Example

{
  "model": "wanx2.7-image-pro",
  "prompt": "Repaint the first image in the oil painting style of the second",
  "images": [
    {"file_uri": "https://example.com/content.jpg", "mime_type": "image/jpeg"},
    {"file_uri": "https://example.com/style.jpg", "mime_type": "image/jpeg"}
  ],
  "size": "2K"
}
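The bodies above can be sent with any HTTP client. The following standard-library sketch prepares the request and reads the documented {"data": [...]} response. The helper names and the 120-second timeout are illustrative assumptions, not TokenLx requirements.

```python
import json
import urllib.request

IMAGE_ENDPOINT = "https://api.tokenlx.ai/v1/aigc/image/generations"


def build_image_request(api_key: str, model: str, prompt: str, **options) -> urllib.request.Request:
    """Prepare (but do not send) a POST request for image generation."""
    body = {"model": model, "prompt": prompt, **options}
    return urllib.request.Request(
        IMAGE_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def extract_image_urls(response_body: str) -> list:
    """Pull the URL list out of a {"data": [...]} response body."""
    return list(json.loads(response_body).get("data", []))


def generate_image(api_key: str, model: str, prompt: str, **options) -> list:
    """Send the request and return the generated image URLs."""
    req = build_image_request(api_key, model, prompt, **options)
    with urllib.request.urlopen(req, timeout=120) as resp:
        return extract_image_urls(resp.read().decode("utf-8"))
```

For example, generate_image(key, "doubao-seedream-5.0", "An orange cat napping on a windowsill", size="2K", seed=42) mirrors the text-to-image body shown above.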

Video Generation

Video generation is asynchronous: submit a task → get task_id → poll for result → get video URL.

Create Video Task

  • Endpoint: POST /v1/aigc/video/tasks

Video Task Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Model name |
| prompt | string | Yes | Video description |
| duration | integer | No | Duration in seconds |
| aspectRatio | string | No | Aspect ratio, e.g. "16:9" / "9:16" |
| resolution | string | No | Resolution, e.g. "720p" / "1080p" |
| size | string | No | Pixel size, e.g. "1280x720" |
| seed | integer | No | Random seed |
| width | integer | No | Explicit width (Sora) |
| height | integer | No | Explicit height (Sora) |
| generateAudio | boolean | No | Generate audio track |
| referenceImageUrls | array | No | Reference image URLs (image-to-video) |
| referenceVideoUrls | array | No | Reference video URLs |
| referenceAudioUrls | array | No | Reference audio URLs (audio-driven) |
| videoInputMode | string | No | Input mode (see below) |
| cameraFixed | boolean | No | Fix camera (Seedance 2.0) |
| template | string | No | Scene template |
| tools | array | No | Special tools, e.g. [{"type": "lip_sync"}] |
| viduExtraJson | string | No | Vidu extended params JSON (subjects, callback_url, etc.) |

videoInputMode Values

| Value | Description | Images |
| --- | --- | --- |
| first_frame | First frame driven | 1 |
| first_last_frame | First + last frame interpolation | 2 |
| reference | Reference-guided generation | 1+ |
| Not set | Auto-infer (1→first_frame, 2→first_last_frame, 3+→reference) | - |
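The auto-inference rule for an unset videoInputMode can be mirrored on the client, for example to log which mode a request will use. This is a sketch of the documented mapping only; the function name is illustrative, and returning None for zero images (plain text-to-video) is an assumption, since the docs do not describe that case.

```python
from typing import Optional


def infer_video_input_mode(image_count: int) -> Optional[str]:
    """Mirror the documented auto-inference for an unset videoInputMode."""
    if image_count < 0:
        raise ValueError("image_count must be non-negative")
    if image_count == 0:
        return None  # assumption: no reference images means plain text-to-video
    if image_count == 1:
        return "first_frame"
    if image_count == 2:
        return "first_last_frame"
    return "reference"
```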

Query Video Task Result

  • Endpoint: GET /v1/aigc/video/tasks/{task_id}?model=xxx

Poll every 10-30 seconds. Generation typically takes from 30 seconds to several minutes.
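A polling loop for this endpoint might look like the sketch below. These docs do not show the task response schema, so the loop takes the fetch and completion check as callables rather than assuming field names. The helper names, the 15-second default interval, and the 10-minute timeout are illustrative choices.

```python
import time
from typing import Callable, Optional
from urllib.parse import quote, urlencode


def task_status_url(task_id: str, model: str) -> str:
    """Build the query URL for a video task."""
    query = urlencode({"model": model})
    return f"https://api.tokenlx.ai/v1/aigc/video/tasks/{quote(task_id, safe='')}?{query}"


def poll_until_done(
    fetch: Callable[[], dict],
    is_done: Callable[[dict], bool],
    interval: float = 15.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Call fetch() every `interval` seconds until is_done(result) or timeout.

    Returns the final result, or None if the timeout would be exceeded first.
    """
    waited = 0.0
    while True:
        result = fetch()
        if is_done(result):
            return result
        if waited + interval > timeout:
            return None
        sleep(interval)
        waited += interval
```

Here fetch would issue the GET against task_status_url(...) with the usual Authorization header, and is_done would inspect whatever completion field the task response actually carries.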

Supported Video Models

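| Model | Vendor | Max Duration | Image-to-Video | Audio |
| --- | --- | --- | --- | --- |
| wan2.7 | Alibaba | 15s | reference | |
| wan2.6 | Alibaba | 15s | first_frame/reference | |
| doubao-seedance-2-0 | Volcengine | 10s | first/last/reference | |
| kling-v3-omni-video | Kling | 15s | first/last/reference | |
| viduq3-pro | Vidu | 16s | first_frame/reference | |
| veo3.1 | Google | 8s | first_frame/reference | |
| veo3 | Google | 8s | | |
| sora-2.0 | OpenAI | 20s | | |
| MiniMax-Hailuo-2.3 | MiniMax | 10s | first_frame | |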

Text-to-Video Example

{
  "model": "veo3.1",
  "prompt": "Aerial drone shot over a tropical island at golden sunset",
  "aspectRatio": "16:9",
  "duration": 8,
  "resolution": "1080p"
}

Image-to-Video Example

{
  "model": "wan2.6",
  "prompt": "The person in the image slowly turns and smiles",
  "videoInputMode": "first_frame",
  "referenceImageUrls": ["https://example.com/first-frame.jpg"],
  "duration": 5
}

Embeddings

  • Endpoint: POST /v1/embeddings

curl https://api.tokenlx.ai/v1/embeddings \
  -H "Authorization: Bearer $TOKENLX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-text-embedding-v4",
    "input": ["TokenLx makes model integration easier"]
  }'
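A common next step is comparing the returned vectors. The sketch below assumes the response follows the OpenAI embeddings shape ({"data": [{"embedding": [...]}]}), which these docs imply through OpenAI compatibility but do not show. Both function names are illustrative.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def embeddings_from_response(response: dict) -> list:
    """Extract vectors, assuming the OpenAI-style {"data": [{"embedding": [...]}]} shape."""
    return [item["embedding"] for item in response["data"]]
```

Sending several strings in the input array in one request and comparing the resulting vectors pairwise is usually cheaper than one request per string.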

SDK Examples

Python (OpenAI SDK)

from openai import OpenAI

client = OpenAI(
    api_key="<YOUR_API_KEY>",
    base_url="https://api.tokenlx.ai/v1"
)

response = client.chat.completions.create(
    model="qwen-plus",
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)

Node.js (OpenAI SDK)

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: '<YOUR_API_KEY>',
  baseURL: 'https://api.tokenlx.ai/v1',
});

const response = await client.chat.completions.create({
  model: 'qwen-plus',
  messages: [{ role: 'user', content: 'Hello' }],
});
console.log(response.choices[0].message.content);

Response Format

Non-streaming Response

{
  "id": "chatcmpl-xxx",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "qwen-plus",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 15,
    "total_tokens": 25
  }
}

Streaming Response (SSE)

data: {"id":"chatcmpl-xxx","choices":[{"delta":{"content":"Hello"},"index":0}]}

data: {"id":"chatcmpl-xxx","choices":[{"delta":{"content":"!"},"index":0}]}

data: {"id":"chatcmpl-xxx","choices":[{"delta":{},"finish_reason":"stop","index":0}]}

data: [DONE]
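When reading the stream without an SDK, each data: line carries one JSON chunk and the literal [DONE] marks the end. The sketch below reassembles the text from lines like the ones above. The function name is illustrative, and the sample input is copied from this section.

```python
import json


def collect_stream_text(lines) -> str:
    """Concatenate delta.content from an iterable of SSE lines."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)


sample = [
    'data: {"id":"chatcmpl-xxx","choices":[{"delta":{"content":"Hello"},"index":0}]}',
    "",
    'data: {"id":"chatcmpl-xxx","choices":[{"delta":{"content":"!"},"index":0}]}',
    "",
    'data: {"id":"chatcmpl-xxx","choices":[{"delta":{},"finish_reason":"stop","index":0}]}',
    "",
    "data: [DONE]",
]
text = collect_stream_text(sample)  # "Hello!"
```

The final chunk has an empty delta and only a finish_reason, which is why the parser tolerates a missing content key.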

Error Codes

| HTTP Status | Error Code | Meaning |
| --- | --- | --- |
| 400 | invalid_request_error | Invalid parameters or missing required fields |
| 401 | invalid_api_key | Missing or invalid API key |
| 402 | insufficient_balance | Insufficient credit balance |
| 429 | rate_limit_exceeded | Request rate too high (default 60 RPM) |
| 500 | upstream_error | Upstream model service failure |
| 503 | model_unavailable | Model is temporarily unavailable |

Billing

  • 1 credit = 1 USD
  • Billed by actual token usage, no minimum spend
  • Text models: per million tokens (input/output)
  • Image models: per request or per resolution
  • Video models: per request or per second
  • Some models have discounts — see pricing page
  • Requests return 402 when credit balance is insufficient
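For text models, the per-million-token rule translates into a short calculation over the usage block of a response. The prices in this sketch are hypothetical placeholders, not TokenLx prices; real figures are on the pricing page. The function name is illustrative.

```python
def estimate_text_cost(prompt_tokens: int, completion_tokens: int,
                       input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost in credits (1 credit = 1 USD) for one text request."""
    return (prompt_tokens * input_price_per_m
            + completion_tokens * output_price_per_m) / 1_000_000


# Hypothetical prices: 0.40 credits per million input tokens, 1.20 per million output.
usage = {"prompt_tokens": 10, "completion_tokens": 15}
cost = estimate_text_cost(usage["prompt_tokens"], usage["completion_tokens"], 0.40, 1.20)
# (10 * 0.40 + 15 * 1.20) / 1,000,000 = 22 / 1,000,000, about 0.000022 credits
```

Input and output tokens are priced separately, so long completions can dominate the cost even when prompts are short.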

Limits

| Item | Default |
| --- | --- |
| API Keys | Max 5 per user |
| Request rate (RPM) | 60 requests/minute (adjustable in Key settings) |
| Token rate (TPM) | 100,000 tokens/minute |
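To stay under the request-rate limit, a client can space its calls evenly instead of waiting for 429 responses. The sketch below is one simple approach (a minimum interval between calls), not a TokenLx-provided mechanism. The class name is illustrative, and it does not account for the token-rate (TPM) limit.

```python
import time


class MinIntervalLimiter:
    """Space calls evenly so that at most `rpm` requests start per minute."""

    def __init__(self, rpm: int = 60, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / rpm
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # no call made yet

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self.interval
        return delay
```

Calling limiter.wait() immediately before each API request keeps a single process at or below the configured rate; several processes sharing one key would need a shared limiter.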

Best Practices

Error Retry

Implement exponential backoff for 429 (rate limit) and 5xx (server errors):

import os
import time
from openai import OpenAI, APIStatusError

# Read the key from the environment rather than hard-coding it.
client = OpenAI(api_key=os.environ["TOKENLX_API_KEY"], base_url="https://api.tokenlx.ai/v1")

def chat_with_retry(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(model="qwen-plus", messages=messages)
        except APIStatusError as e:
            # APIStatusError carries status_code; RateLimitError (429) is a subclass of it.
            if e.status_code != 429 and e.status_code < 500:
                raise  # other 4xx errors will not succeed on retry
        # Exponential backoff: 1s, 2s, 4s, ...
        time.sleep(2 ** attempt)
    raise RuntimeError("Max retries exceeded")

Security Tips

  • Never expose API Keys in frontend code
  • Store keys in environment variables
  • Rotate API Keys regularly
  • Monitor for unusual usage

API Reference

| API | Method | Endpoint | Description |
| --- | --- | --- | --- |
| Chat Completions | POST | /v1/chat/completions | Text generation, supports streaming |
| Anthropic Messages API | POST | /v1/messages | Claude SDK compatible, supports Agent SDK |
| Image Generation | POST | /v1/aigc/image/generations | Sync image generation, returns URL list |
| Video Generation (Create) | POST | /v1/aigc/video/tasks | Async task, returns task_id |
| Video Generation (Query) | GET | /v1/aigc/video/tasks/{task_id}?model=xxx | Poll task status and result |
| Embeddings | POST | /v1/embeddings | Text embedding |
| Rerank | POST | /v1/rerank/text | Rerank models |
| Speech | POST | /v1/generate/speech | TTS text-to-speech |

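Base URL: https://api.tokenlx.ai/v1. The endpoint paths listed above already include the /v1 prefix, so append them to https://api.tokenlx.ai.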