TokenLx Developer Docs
TokenLx is a unified AI gateway with OpenAI-style API compatibility. It supports text, image, video, and embedding capabilities across multiple models.
Overview
- Base URL: https://api.tokenlx.ai/v1
- Protocol: HTTPS + JSON (OpenAI-compatible)
- Auth: Bearer API Key (create in console)
- Response format: OpenAI-compatible
Quick Start
1) Create an API Key
Create and manage API keys on the console's "API Keys" page. A key is shown only once, at creation, so save it securely.
2) Make Your First Request
curl https://api.tokenlx.ai/v1/chat/completions \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-plus",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": false
  }'
Authentication
All requests require an API Key in the header:
Authorization: Bearer <YOUR_API_KEY>
Content-Type: application/json
Store your key in server-side environment variables — never expose it in frontend code.
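As a minimal sketch of that advice, the helper below reads the key from an environment variable and builds the two headers shown above. The function name `build_headers` is illustrative, not part of any TokenLx SDK; the variable name `TOKENLX_API_KEY` simply follows the curl examples in this document.

```python
import os


def build_headers(env_var: str = "TOKENLX_API_KEY") -> dict:
    """Build request headers from a key stored in an environment variable."""
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail early rather than sending an unauthenticated request.
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Keeping the lookup in one place means the key never appears in source code or version control.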
Model Names
Each model can be called using two name formats:
| Type | Example | Description |
|---|---|---|
| Short name (modelName) | qwen-plus | TokenLx registered name |
| Alias | alibaba/qwen-plus | With vendor prefix for clarity |
Both formats are valid in the model field. See the pricing page for a full model list.
Optional Parameters
TokenLx normalizes requests across multiple models and providers. Optional parameters follow these rules:
- Used if supported: If the model supports the parameter, it takes effect normally
- Ignored if unsupported: If the model doesn't support a parameter, it is silently ignored — no error is returned
- Defaults applied: Omitted optional parameters use model or system defaults
This means you can use the same request structure across different models without worrying about parameter compatibility.
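One way to take advantage of this is a single payload builder shared by every model. The sketch below is illustrative only: `build_chat_payload` is a hypothetical helper, and the parameter values are arbitrary.

```python
def build_chat_payload(model: str, prompt: str, **optional) -> dict:
    """Build a chat request body, leaving out optional parameters set to None."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    # Omitted parameters fall back to model or system defaults.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


# The same optional parameters are sent to both models; per the rules above,
# a model that does not support top_k would ignore it rather than fail.
shared = {"temperature": 0.7, "top_k": 40, "max_tokens": None}
payloads = [build_chat_payload(m, "Hello", **shared) for m in ("qwen-plus", "gpt-4o")]
```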
Chat Completions
- Endpoint: POST /v1/chat/completions
Chat Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | - | Model name |
| messages | array | Yes | - | Conversation message list |
| stream | boolean | No | false | Enable streaming output |
| temperature | float | No | 1.0 | Sampling temperature [0, 2]. Lower = more deterministic |
| top_p | float | No | 1.0 | Nucleus sampling (0, 1]. Alternative to temperature |
| top_k | integer | No | 0 | Limit candidate tokens (supported by some models) |
| max_tokens | integer | No | Model default | Maximum tokens to generate |
| max_completion_tokens | integer | No | Model default | Same as max_tokens (OpenAI new parameter name) |
| frequency_penalty | float | No | 0 | Frequency penalty [-2, 2]. Positive reduces repetition |
| presence_penalty | float | No | 0 | Presence penalty [-2, 2]. Positive encourages new topics |
| repetition_penalty | float | No | 1.0 | Repetition penalty (0, 2] (supported by some models) |
| stop | string/array | No | null | Stop sequences |
| seed | integer | No | null | Random seed for deterministic output |
| n | integer | No | 1 | Number of completions to generate |
| response_format | object | No | null | Output format, e.g. {"type": "json_object"} |
| tools | array | No | null | Tool/function definitions |
| tool_choice | string/object | No | "auto" | Tool calling strategy: "none" / "auto" / "required" |
| reasoning_effort | string | No | null | Thinking depth: "high" / "medium" / "low" / "none" |
| thinking | object | No | null | Thinking config: {"type": "enabled", "budget_tokens": 8000} |
Non-streaming Request
curl https://api.tokenlx.ai/v1/chat/completions \
  -H "Authorization: Bearer $TOKENLX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Write a haiku"}
    ],
    "temperature": 0.7,
    "max_tokens": 500
  }'
Streaming
Set "stream": true to receive incremental results via SSE:
curl https://api.tokenlx.ai/v1/chat/completions \
  -H "Authorization: Bearer $TOKENLX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-plus",
    "messages": [{"role": "user", "content": "Write a haiku"}],
    "stream": true
  }'
Reasoning Mode
Models with reasoning support can be controlled via reasoning_effort:
{
  "model": "qwen3-235b",
  "messages": [{"role": "user", "content": "Prove that sqrt(2) is irrational"}],
  "reasoning_effort": "high",
  "stream": true
}
Models supporting reasoning: qwen3-235b, qwen3-32b, qwen-max, qwen-plus, deepseek-r1, o1, o3-mini, o4-mini
Multimodal Input
Image Input
{
  "model": "qwen-vl-max",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this image"},
        {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
      ]
    }
  ]
}
Video Input
{
  "model": "qwen-vl-max",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Summarize this video"},
        {"type": "video_url", "video_url": {"url": "https://example.com/video.mp4"}}
      ]
    }
  ]
}
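The image and video messages differ only in the type of the second content part, so both can be produced by one small helper. This is a sketch; `media_message` is a hypothetical name, not a TokenLx SDK function.

```python
def media_message(text: str, url: str, kind: str = "image") -> dict:
    """Build a user message with one text part and one image or video part."""
    if kind not in ("image", "video"):
        raise ValueError("kind must be 'image' or 'video'")
    part_type = f"{kind}_url"  # "image_url" or "video_url", as in the examples
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": part_type, part_type: {"url": url}},
        ],
    }


request_body = {
    "model": "qwen-vl-max",
    "messages": [
        media_message("Summarize this video", "https://example.com/video.mp4", kind="video")
    ],
}
```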
Anthropic Messages API
- Endpoint: POST /v1/messages
Compatible with the Anthropic Messages protocol, so you can use the Claude Agent SDK or the Anthropic Python/TypeScript SDKs directly. Both streaming and non-streaming requests are supported, and the endpoint also works with non-Claude models (protocol translation is handled internally).
Integration
Python SDK:
from anthropic import Anthropic
client = Anthropic(
    base_url="https://api.tokenlx.ai",
    api_key="<YOUR_API_KEY>"
)
message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}]
)
print(message.content[0].text)
Claude Agent SDK:
export ANTHROPIC_BASE_URL="https://api.tokenlx.ai"
export ANTHROPIC_API_KEY="<YOUR_API_KEY>"
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model name |
| messages | array | Yes | Anthropic format message list |
| max_tokens | integer | Yes | Max output tokens |
| system | string/array | No | System prompt (supports cache_control) |
| stream | boolean | No | Enable streaming |
| temperature | float | No | Sampling temperature |
| top_p | float | No | Nucleus sampling |
| top_k | integer | No | Top-K |
| tools | array | No | Anthropic format tool definitions |
| tool_choice | object | No | Tool choice strategy |
| thinking | object | No | Extended thinking, e.g. {"type": "enabled", "budget_tokens": 10000} |
| stop_sequences | array | No | Stop sequences |
Anthropic Authentication
Two methods supported (either works):
x-api-key: <YOUR_API_KEY>
Authorization: Bearer <YOUR_API_KEY>
Function Calling (Tools)
{
  "model": "gpt-4o",
  "messages": [{"role": "user", "content": "What's the weather in Beijing?"}],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get weather information for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {"type": "string", "description": "City name"}
          },
          "required": ["city"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}
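When the model decides to call a tool, the client has to run it and send the result back. The sketch below assumes the reply follows the standard OpenAI tool-call shape (`tool_calls` entries with an `id` and a `function` object whose `arguments` is a JSON string); this document does not show that response, so treat the shape as an assumption. `get_weather` here is a stand-in with canned data, and `run_tool_calls` is a hypothetical helper.

```python
import json


def get_weather(city: str) -> dict:
    """Stand-in for a real weather lookup (hypothetical, returns canned data)."""
    return {"city": city, "forecast": "sunny"}


LOCAL_TOOLS = {"get_weather": get_weather}


def run_tool_calls(assistant_message: dict) -> list:
    """Execute each requested tool and return the follow-up 'tool' messages."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        fn = LOCAL_TOOLS[call["function"]["name"]]
        # In the OpenAI format, arguments arrive as a JSON-encoded string.
        args = json.loads(call["function"]["arguments"])
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(fn(**args)),
        })
    return results


# Hand-written sample in the OpenAI tool-call shape, not captured API output.
sample_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"city\": \"Beijing\"}"},
    }],
}
tool_messages = run_tool_calls(sample_message)
```

The returned messages would be appended to the conversation, after the assistant message, before the next request.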
Image Generation
- Endpoint: POST /v1/aigc/image/generations
- Response format: {"data": ["url1", "url2", ...]}
Image generation is synchronous: the response returns the generated image URLs directly, except for models marked as async under Supported Image Models.
Image Generation Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model name |
| prompt | string | Yes | Generation prompt |
| images | array | No | Input images (for editing) |
| images[].file_uri | string | Yes | Image URL or base64 data URI |
| images[].mime_type | string | No | MIME type, default image/png |
| size | string | No | Output size, e.g. "1024x1024" / "2K" |
| n | integer | No | Number of images, default 1 |
| quality | string | No | Quality: "standard" / "hd" (OpenAI models) |
| aspectRatio | string | No | Aspect ratio, e.g. "16:9" (Gemini/Kling) |
| resolution | string | No | Resolution, e.g. "1K" / "2K" / "4K" (Gemini/Kling) |
| seed | integer | No | Random seed (Doubao models) |
| guidanceScale | float | No | Guidance scale (Doubao 3.0/4.5) |
| promptExtend | boolean | No | Extend prompt (Alibaba models) |
| temperature | float | No | Temperature (Gemini models) |
| topP | float | No | Top-P (Gemini models) |
| maxOutTokens | integer | No | Max output tokens (Gemini models) |
Supported Image Models
| Model | Vendor | Notes |
|---|---|---|
| wanx2.7-image-pro | Alibaba | Latest, multi-image editing |
| doubao-seedream-5.0 | Volcengine | High-res (2K/3K) |
| doubao-seedream-4.0 | Volcengine | |
| gemini-3-pro-image | Google | aspectRatio + resolution |
| gpt-image-2 | OpenAI | |
| gpt-image-1 | OpenAI | |
| dall-e-3 | OpenAI | |
| tc-gpt-image-2 | Tencent | |
| kling-v3-omni | Kling | Async, requires polling |
Text-to-Image Example
{
  "model": "doubao-seedream-5.0",
  "prompt": "An orange cat napping on a windowsill in warm sunlight",
  "size": "2K",
  "seed": 42
}
Image Editing Example
{
  "model": "wanx2.7-image-pro",
  "prompt": "Repaint the first image in the oil painting style of the second",
  "images": [
    {"file_uri": "https://example.com/content.jpg", "mime_type": "image/jpeg"},
    {"file_uri": "https://example.com/style.jpg", "mime_type": "image/jpeg"}
  ],
  "size": "2K"
}
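For reference, a request body like the ones above could be sent with only the Python standard library. The sketch below prepares the request without sending it and parses the documented {"data": [...]} response; `build_image_request` and `extract_image_urls` are hypothetical helper names.

```python
import json
import urllib.request

IMAGE_URL = "https://api.tokenlx.ai/v1/aigc/image/generations"


def build_image_request(api_key: str, body: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST request for the image endpoint."""
    return urllib.request.Request(
        IMAGE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_image_urls(response_body: str) -> list:
    """Return the URL list from a {"data": [...]} response body."""
    return list(json.loads(response_body).get("data", []))


req = build_image_request("<YOUR_API_KEY>", {
    "model": "doubao-seedream-5.0",
    "prompt": "An orange cat napping on a windowsill in warm sunlight",
    "size": "2K",
})
# To send it: urllib.request.urlopen(req).read().decode("utf-8")
```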
Video Generation
Video generation is asynchronous: submit a task → get task_id → poll for result → get video URL.
Create Video Task
- Endpoint: POST /v1/aigc/video/tasks
Video Task Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model name |
| prompt | string | Yes | Video description |
| duration | integer | No | Duration in seconds |
| aspectRatio | string | No | Aspect ratio, e.g. "16:9" / "9:16" |
| resolution | string | No | Resolution, e.g. "720p" / "1080p" |
| size | string | No | Pixel size, e.g. "1280x720" |
| seed | integer | No | Random seed |
| width | integer | No | Explicit width (Sora) |
| height | integer | No | Explicit height (Sora) |
| generateAudio | boolean | No | Generate audio track |
| referenceImageUrls | array | No | Reference image URLs (image-to-video) |
| referenceVideoUrls | array | No | Reference video URLs |
| referenceAudioUrls | array | No | Reference audio URLs (audio-driven) |
| videoInputMode | string | No | Input mode (see below) |
| cameraFixed | boolean | No | Fix camera (Seedance 2.0) |
| template | string | No | Scene template |
| tools | array | No | Special tools, e.g. [{"type": "lip_sync"}] |
| viduExtraJson | string | No | Vidu extended params JSON (subjects, callback_url, etc.) |
videoInputMode Values
| Value | Description | Images |
|---|---|---|
| first_frame | First frame driven | 1 |
| first_last_frame | First + last frame interpolation | 2 |
| reference | Reference-guided generation | 1+ |
| Not set | Auto-infer (1→first_frame, 2→first_last_frame, 3+→reference) | - |
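The auto-inference rule in the last row can be mirrored on the client, for example to log which mode a request is expected to use. This is a sketch of the documented mapping only; the function name is hypothetical, and returning None when there are no images is an assumption, since the table does not cover that case.

```python
from typing import Optional


def infer_video_input_mode(image_count: int) -> Optional[str]:
    """Mirror the documented auto-inference from the number of reference images."""
    if image_count <= 0:
        return None  # assumption: no reference images means nothing to infer
    if image_count == 1:
        return "first_frame"
    if image_count == 2:
        return "first_last_frame"
    return "reference"
```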
Query Video Task Result
- Endpoint: GET /v1/aigc/video/tasks/{taskId}?model=xxx
Recommended polling interval: 10-30 seconds. Generation typically takes from 30 seconds to several minutes.
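A polling loop along these lines might look as follows. Because this document does not specify the shape of the task-query response, the sketch takes the fetch and completion checks as callables; the "status" field and its values in the demo are placeholders, not the real schema, and `poll_task` is a hypothetical helper.

```python
import time
from typing import Callable


def poll_task(
    fetch: Callable[[], dict],
    is_finished: Callable[[dict], bool],
    interval: float = 15.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() every `interval` seconds until is_finished(result) is true."""
    waited = 0.0
    while True:
        result = fetch()
        if is_finished(result):
            return result
        if waited + interval > timeout:
            raise TimeoutError(f"task not finished after {waited:.0f}s")
        sleep(interval)
        waited += interval


# Simulated task: the "status" field and its values are placeholders,
# not the documented response schema.
responses = iter([{"status": "running"}, {"status": "running"}, {"status": "done"}])
final = poll_task(
    fetch=lambda: next(responses),
    is_finished=lambda r: r["status"] == "done",
    sleep=lambda seconds: None,  # skip real waiting in this demo
)
```

In real use, `fetch` would issue the GET request for the task and `is_finished` would inspect whatever completion field the response actually carries.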
Supported Video Models
| Model | Vendor | Max Duration | Image-to-Video | Audio |
|---|---|---|---|---|
| wan2.7 | Alibaba | 15s | reference | ✗ |
| wan2.6 | Alibaba | 15s | first_frame/reference | ✓ |
| doubao-seedance-2-0 | Volcengine | 10s | first/last/reference | ✓ |
| kling-v3-omni-video | Kling | 15s | first/last/reference | ✓ |
| viduq3-pro | Vidu | 16s | first_frame/reference | ✓ |
| veo3.1 | Google | 8s | first_frame/reference | ✗ |
| veo3 | Google | 8s | ✗ | ✗ |
| sora-2.0 | OpenAI | 20s | ✗ | ✗ |
| MiniMax-Hailuo-2.3 | MiniMax | 10s | first_frame | ✗ |
Text-to-Video Example
{
  "model": "veo3.1",
  "prompt": "Aerial drone shot over a tropical island at golden sunset",
  "aspectRatio": "16:9",
  "duration": 8,
  "resolution": "1080p"
}
Image-to-Video Example
{
  "model": "wan2.6",
  "prompt": "The person in the image slowly turns and smiles",
  "videoInputMode": "first_frame",
  "referenceImageUrls": ["https://example.com/first-frame.jpg"],
  "duration": 5
}
Embeddings
- Endpoint: POST /v1/embeddings
curl https://api.tokenlx.ai/v1/embeddings \
  -H "Authorization: Bearer $TOKENLX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-text-embedding-v4",
    "input": ["TokenLx makes model integration easier"]
  }'
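A common next step is comparing two embedding vectors by cosine similarity. The sketch below assumes the response uses the OpenAI embeddings shape (a "data" list whose items carry "index" and "embedding"), which this document does not show explicitly; the sample vectors are made up and the helper names are hypothetical.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vectors_from_response(response: dict) -> list:
    """Pull vectors out of an OpenAI-style embeddings response, in input order."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


# Tiny made-up vectors for illustration; real embeddings are much longer.
sample = {"data": [
    {"index": 1, "embedding": [0.0, 1.0]},
    {"index": 0, "embedding": [1.0, 0.0]},
]}
first, second = vectors_from_response(sample)
similarity = cosine_similarity(first, second)
```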
SDK Examples
Python (OpenAI SDK)
from openai import OpenAI
client = OpenAI(
    api_key="<YOUR_API_KEY>",
    base_url="https://api.tokenlx.ai/v1"
)
response = client.chat.completions.create(
    model="qwen-plus",
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)
Node.js (OpenAI SDK)
import OpenAI from 'openai';
const client = new OpenAI({
  apiKey: '<YOUR_API_KEY>',
  baseURL: 'https://api.tokenlx.ai/v1',
});
const response = await client.chat.completions.create({
  model: 'qwen-plus',
  messages: [{ role: 'user', content: 'Hello' }],
});
console.log(response.choices[0].message.content);
Response Format
Non-streaming Response
{
  "id": "chatcmpl-xxx",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "qwen-plus",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 15,
    "total_tokens": 25
  }
}
Streaming Response (SSE)
data: {"id":"chatcmpl-xxx","choices":[{"delta":{"content":"Hello"},"index":0}]}
data: {"id":"chatcmpl-xxx","choices":[{"delta":{"content":"!"},"index":0}]}
data: {"id":"chatcmpl-xxx","choices":[{"delta":{},"finish_reason":"stop","index":0}]}
data: [DONE]
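If you are not using an SDK, the stream above can be reassembled by reading each `data:` line, stopping at `[DONE]`, and concatenating the `delta.content` pieces. The following is a minimal sketch of that idea with a hypothetical helper name; it assumes a single choice (n = 1) and operates on lines already read from the response.

```python
import json


def collect_stream_text(lines) -> str:
    """Join the delta.content pieces from an iterable of SSE lines."""
    pieces = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            pieces.append(choice.get("delta", {}).get("content") or "")
    return "".join(pieces)


stream = [
    'data: {"id":"chatcmpl-xxx","choices":[{"delta":{"content":"Hello"},"index":0}]}',
    'data: {"id":"chatcmpl-xxx","choices":[{"delta":{"content":"!"},"index":0}]}',
    'data: {"id":"chatcmpl-xxx","choices":[{"delta":{},"finish_reason":"stop","index":0}]}',
    "data: [DONE]",
]
print(collect_stream_text(stream))  # prints: Hello!
```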
Error Codes
| HTTP Status | Error Code | Meaning |
|---|---|---|
| 400 | invalid_request_error | Invalid parameters or missing required fields |
| 401 | invalid_api_key | Missing or invalid API key |
| 402 | insufficient_balance | Insufficient credit balance |
| 429 | rate_limit_exceeded | Request rate too high (default 60 RPM) |
| 500 | upstream_error | Upstream model service failure |
| 503 | model_unavailable | Model is temporarily unavailable |
Billing
- 1 credit = 1 USD
- Billed by actual token usage, no minimum spend
- Text models: per million tokens (input/output)
- Image models: per request or per resolution
- Video models: per request or per second
- Some models have discounts — see pricing page
- Requests return 402 when credit balance is insufficient
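Since text models are priced per million tokens, the cost of a single response can be estimated from its `usage` block. The sketch below uses placeholder prices, not real TokenLx rates, and `estimate_text_cost` is a hypothetical helper; actual prices and discounts are on the pricing page.

```python
def estimate_text_cost(usage: dict, input_price: float, output_price: float) -> float:
    """Credits (USD) for one response, given prices per million tokens."""
    input_cost = usage["prompt_tokens"] / 1_000_000 * input_price
    output_cost = usage["completion_tokens"] / 1_000_000 * output_price
    return input_cost + output_cost


# The prices are placeholders, not real TokenLx rates; check the pricing page.
usage = {"prompt_tokens": 10, "completion_tokens": 15, "total_tokens": 25}
cost = estimate_text_cost(usage, input_price=1.0, output_price=2.0)
```

With these placeholder prices, 10 input tokens and 15 output tokens come to 0.00001 + 0.00003 = 0.00004 credits.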
Limits
| Item | Default |
|---|---|
| API Keys | Max 5 per user |
| Request rate (RPM) | 60 requests/minute (adjustable in Key settings) |
| Token rate (TPM) | 100,000 tokens/minute |
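To stay under the request-rate limit without relying on 429 responses, a client can throttle itself. The class below is a sketch of a simple sliding-window limiter, not something TokenLx provides; the class name is hypothetical, and the clock and sleep functions are injectable so the logic can be exercised without waiting.

```python
import time
from collections import deque


class RpmLimiter:
    """Client-side sliding-window limiter: at most `rpm` calls per 60 seconds."""

    def __init__(self, rpm: int = 60, clock=time.monotonic, sleep=time.sleep):
        self.rpm = rpm
        self.clock = clock
        self.sleep = sleep
        self.sent = deque()  # timestamps of recent calls, oldest first

    def wait(self) -> None:
        """Block until another request may be sent, then record it."""
        now = self.clock()
        # Forget calls that have left the 60-second window.
        while self.sent and now - self.sent[0] >= 60:
            self.sent.popleft()
        if len(self.sent) >= self.rpm:
            # Sleep until the oldest call in the window expires.
            self.sleep(60 - (now - self.sent[0]))
            now = self.clock()
            self.sent.popleft()
        self.sent.append(now)
```

Calling `limiter.wait()` before each request keeps a single process within the limit; it does not account for other processes sharing the same key, or for the token-rate limit.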
Best Practices
Error Retry
Implement exponential backoff for 429 (rate limit) and 5xx (server errors):
import time
from openai import OpenAI, APIStatusError, RateLimitError
client = OpenAI(api_key="<YOUR_API_KEY>", base_url="https://api.tokenlx.ai/v1")
def chat_with_retry(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(model="qwen-plus", messages=messages)
        except RateLimitError:
            # 429: back off exponentially (1s, 2s, 4s, ...)
            time.sleep(2 ** attempt)
        except APIStatusError as e:
            # APIStatusError carries status_code; retry only server-side (5xx) failures
            if e.status_code >= 500:
                time.sleep(2 ** attempt)
            else:
                raise
    raise RuntimeError("Max retries exceeded")
Security Tips
- Never expose API Keys in frontend code
- Store keys in environment variables
- Rotate API Keys regularly
- Monitor for unusual usage
API Reference
| API | Method | Endpoint | Description |
|---|---|---|---|
| Chat Completions | POST | /v1/chat/completions | Text generation, supports streaming |
| Anthropic Messages API | POST | /v1/messages | Claude SDK compatible, supports Agent SDK |
| Image Generation | POST | /v1/aigc/image/generations | Sync image generation, returns URL list |
| Video Generation (Create) | POST | /v1/aigc/video/tasks | Async task, returns task_id |
| Video Generation (Query) | GET | /v1/aigc/video/tasks/{task_id}?model=xxx | Poll task status and result |
| Embeddings | POST | /v1/embeddings | Text embedding |
| Rerank | POST | /v1/rerank/text | Rerank models |
| Speech | POST | /v1/generate/speech | TTS text-to-speech |
Base URL: https://api.tokenlx.ai/v1