alibaba/qwen-gte-rerank-v2

128K context · $0.12/M input tokens · 4.76B tokens served · Rerank

The Alibaba Qwen series comprises Chinese-first LLMs with strong bilingual support, spanning a wide range of tiers from turbo to max.

Key strengths

  • Strong Chinese
  • Good English
  • Multiple size tiers
  • Tool calling

Use cases

  • Chinese assistants
  • Bilingual content
  • Enterprise chat
  • Cross-border apps

Tags: chinese, bilingual

Alibaba's alibaba/qwen-gte-rerank-v2 is a cross-encoder reranking model. Designed for the second stage of retrieval pipelines, it scores query-document pairs to improve relevance ranking from initial vector or BM25 retrieval.

It is particularly effective in hybrid search systems and RAG applications, where surfacing the most relevant documents at the top of the result set materially improves downstream generation quality.
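
The second-stage flow described above can be sketched in a few lines. This is a minimal illustration, not the model's actual scoring: `overlap_score` is a toy stand-in for the cross-encoder, and the function names, sample documents, and `top_k` parameter are all assumptions made for the example.

```python
from typing import Callable, List, Tuple


def overlap_score(query: str, document: str) -> float:
    """Toy stand-in scorer: fraction of query terms that appear in the document.

    A real deployment would call the reranking model here instead.
    """
    query_terms = set(query.lower().split())
    doc_terms = set(document.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & doc_terms) / len(query_terms)


def rerank(
    query: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float] = overlap_score,
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every (query, candidate) pair and return the top_k by descending score."""
    scored = [(doc, score_fn(query, doc)) for doc in candidates]
    # sort() is stable, so candidates with equal scores keep their first-stage order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Candidates as they might come back from a first-stage BM25 or vector search.
first_stage = [
    "Shipping times for international orders",
    "How to reset your account password",
    "Password reset link expired: what to do",
]
results = rerank("reset password", first_stage, top_k=2)
```

Passing the scorer as `score_fn` keeps the ranking logic independent of how scores are produced, so the toy function can be swapped for a call to the hosted model without touching `rerank`.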

alibaba/qwen-gte-rerank-v2 is fully OpenAI-compatible — drop in your existing OpenAI Python or Node SDK and switch `baseURL` to `https://api.tokenlx.ai`. TokenLX transparently routes your requests to the optimal provider endpoint while preserving streaming, function-calling, and structured-output semantics.

Performance

Compare different providers across TokenLX · All locations.

  • Throughput: 44 tok/s
  • Latency: 128 ms
  • E2E Latency: 200 ms
  • Tool Call Errors: 0.07%
  • Output Errors: 0.36%
  • Time to First Token: 105 ms

Effective Pricing

Actual cost per million tokens across providers over the past 7 days.

  • Input: $0.12 per 1M tokens
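
As a rough worked example of the listed input price, the sketch below estimates the cost of one rerank call. The billing model is an assumption: it supposes each candidate is charged as query tokens plus document tokens, which this page does not confirm. The function names are hypothetical.

```python
from typing import List

INPUT_PRICE_PER_MILLION_USD = 0.12  # input price listed on this page


def rerank_request_tokens(query_tokens: int, doc_tokens: List[int]) -> int:
    """ASSUMPTION: every candidate is billed as query tokens + document tokens."""
    return sum(query_tokens + tokens for tokens in doc_tokens)


def estimate_input_cost(
    total_tokens: int,
    price_per_million: float = INPUT_PRICE_PER_MILLION_USD,
) -> float:
    """Estimated input cost in USD for a given number of tokens."""
    return total_tokens / 1_000_000 * price_per_million


# One query of 20 tokens scored against 50 candidates of 400 tokens each.
tokens = rerank_request_tokens(20, [400] * 50)
cost = estimate_input_cost(tokens)
```

Under these assumptions the call covers 21,000 tokens and costs roughly a quarter of a cent.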

Recent activity

Total usage per day on TokenLX (last 30 days).

  • Prompt: 1.38B tokens
  • Completion: 3.38B tokens

Sample code & API

TokenLX normalizes requests and responses across providers. Use any OpenAI SDK or our native SDK.

# Python — use HTTP client directly
# Endpoint: POST https://api.tokenlx.ai/v1/videos/generations
# Headers:  Authorization: Bearer $TOKENLX_API_KEY
# Body:     { "model": "qwen-gte-rerank-v2", "prompt": "...", "duration": 5 }

Replace $TOKENLX_API_KEY (sk-aihubrouter-…) with your key from the dashboard.
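
The snippet above targets a video-generation path with `prompt` and `duration` fields, which does not match a reranking model. A rerank call would more plausibly send a query and a list of documents. The sketch below only builds such a request and does not send it. The `/v1/rerank` path and the `query`, `documents`, and `top_n` fields are assumptions borrowed from common rerank API conventions, not confirmed by this page; check the TokenLX API reference for the real schema.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.tokenlx.ai"
RERANK_PATH = "/v1/rerank"  # ASSUMPTION: hypothetical path, not confirmed by this page


def build_rerank_request(query, documents, top_n=3,
                         model="qwen-gte-rerank-v2", api_key=None):
    """Build (but do not send) a POST request for a hypothetical rerank endpoint."""
    # Fall back to the environment variable named in the snippet above.
    api_key = api_key or os.environ.get("TOKENLX_API_KEY", "")
    payload = {
        "model": model,          # model id as written in the snippet above
        "query": query,          # ASSUMPTION: field name
        "documents": documents,  # ASSUMPTION: field name
        "top_n": top_n,          # ASSUMPTION: field name
    }
    return urllib.request.Request(
        BASE_URL + RERANK_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_rerank_request(
    "reset password",
    ["How to reset your account password", "Shipping times for international orders"],
    api_key="test-key",
)
# To send it: urllib.request.urlopen(request), then json.load() on the response.
```

Building the request separately from sending it makes the payload easy to inspect and adjust once the actual endpoint and field names are known.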