
Rate Limits

The Balchemy API enforces rate limits at three levels: per-IP for global HTTP traffic, per-user daily quotas for AI requests made through platform keys, and per-user monthly quotas for own-key AI requests. All limits use Redis INCR + EXPIRE counters with a fail-open policy — if Redis is unavailable, traffic is allowed through rather than blocked.


Mechanism

Rate limiting is implemented as a NestJS guard applied globally to all API routes. The algorithm is a fixed window counter:

  1. On each request, the guard increments a Redis key scoped to IP + route.
  2. On the first increment, a TTL is set for the window duration.
  3. If the counter exceeds the limit, the guard throws 429 Too Many Requests.
  4. If Redis is unreachable, the guard fails open (request is allowed).

Rate limit state persists in Redis only. Restarting the backend does not reset counters.
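The steps above can be sketched as a small helper. This is an illustrative sketch, not the actual guard code; the `RedisLike` interface, `checkFixedWindow` name, and return shape are assumptions:

```typescript
// Minimal interface covering the two Redis commands the counter needs.
// In production this would be an ioredis or node-redis client.
interface RedisLike {
  incr(key: string): Promise<number>;
  expire(key: string, seconds: number): Promise<number>;
}

interface WindowResult {
  allowed: boolean;
  remaining: number;
}

// Fixed-window counter: INCR the key, set the TTL on the first hit,
// reject once the count exceeds the limit. Fails open on Redis errors.
async function checkFixedWindow(
  redis: RedisLike,
  key: string,
  max: number,
  windowSeconds: number,
): Promise<WindowResult> {
  try {
    const count = await redis.incr(key);
    if (count === 1) {
      // First request in this window: start the TTL clock.
      await redis.expire(key, windowSeconds);
    }
    return { allowed: count <= max, remaining: Math.max(0, max - count) };
  } catch {
    // Fail open: if Redis is unreachable, let the request through.
    return { allowed: true, remaining: max };
  }
}
```

Because the TTL is only set on the first increment, the window is anchored at the first request and all state lives in Redis, which is why backend restarts do not reset counters.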


Global API rate limit (per IP)

Applied to all HTTP API routes except health endpoints (/api/nest/health/*, /health).

Window:        15 minutes
Max requests:  100 per window
Key:           IP address + route scope
Configurable:  Yes — via GlobalSettings.rateLimitConfig in MongoDB

The limit policy can be updated at runtime by an admin without a server restart. The guard polls MongoDB every 30 seconds and caches the active policy.

Override per endpoint

Individual endpoints can override the global limit using the @RateLimit() decorator:

@RateLimit({ windowMs: 60_000, max: 10, message: "Slow down." })
@Post("/some-route")
async sensitiveEndpoint() { ... }

Endpoints annotated with @SkipRateLimit() bypass the guard entirely.


AI usage quota (per user)

Applied to AI message processing — ask_bot, trade_command, and the web chat endpoint.

Platform API keys (daily quota)

When users route AI requests through Balchemy's API keys:

Window:     24 hours (the window starts at the user's first request)
Redis key:  rl:daily:<userId>
Limit:      Configured per user via calculateUserLimits()
Unlimited:  A limit value of -1 means no cap
Fail-open:  Yes — a Redis outage allows all traffic

Own API keys (monthly quota)

When users bring their own LLM API keys:

Window:     30 days (the window starts at the user's first request)
Redis key:  rl:monthly:<userId>
Limit:      Configured per user via calculateUserLimits()
Disabled:   A limit value of 0 means the feature is not provisioned
Unlimited:  A limit value of -1 means no cap
Fail-open:  Yes — a Redis outage allows all traffic

AI quotas are not AI-model-specific — they count request events, not token usage.
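The limit semantics above (-1 unlimited, 0 disabled, otherwise a cap on request events) can be expressed as a small pure function. The `checkQuota` name and `QuotaDecision` type are illustrative, not the actual implementation:

```typescript
type QuotaDecision = "allowed" | "quota_exceeded" | "feature_disabled";

// Interpret a per-user limit as described above:
// -1 = unlimited, 0 = feature not provisioned (own-key quota),
// otherwise compare the count of request events against the limit.
function checkQuota(limit: number, used: number): QuotaDecision {
  if (limit === -1) return "allowed";
  if (limit === 0) return "feature_disabled";
  return used < limit ? "allowed" : "quota_exceeded";
}
```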


MCP rate limit

MCP tool calls go through the global per-IP guard and additionally respect the user's AI quota when ask_bot or trade_command is called.

There is no separate per-bot MCP rate limit distinct from the above. High-throughput integrations should distribute load across time rather than burst.


Rate limit response headers

When the global IP guard is active, every response includes:

X-RateLimit-Limit:      Maximum requests allowed in the window
X-RateLimit-Remaining:  Requests remaining in the current window
X-RateLimit-Reset:      Unix timestamp (seconds) when the window resets
Retry-After:            Seconds to wait before retrying (only on 429 responses)

Example response headers when approaching the limit:

X-RateLimit-Limit: 100
X-RateLimit-Remaining: 3
X-RateLimit-Reset: 1742389200

Example response headers on a 429:

HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1742389200
Retry-After: 47
Content-Type: application/json
 
{
  "statusCode": 429,
  "message": "Too many requests from this IP, please try again later."
}
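A client can derive a safe wait time from these headers. This is a sketch; `retryDelaySeconds` is an illustrative helper, and it assumes header names have been lowercased (as Node's fetch API does):

```typescript
// Decide how long to wait before retrying: prefer Retry-After,
// fall back to X-RateLimit-Reset, otherwise retry immediately.
function retryDelaySeconds(
  headers: Record<string, string>,
  nowEpochSeconds: number,
): number {
  const retryAfter = Number(headers["retry-after"]);
  if (Number.isFinite(retryAfter) && retryAfter >= 0) return retryAfter;
  const reset = Number(headers["x-ratelimit-reset"]);
  if (Number.isFinite(reset)) return Math.max(0, reset - nowEpochSeconds);
  return 0;
}
```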

Rate limits by endpoint category

Category                       Window            Limit               Notes
All API routes (default)       15 min            100 / IP            Configurable via GlobalSettings
Health endpoints               Exempt            Exempt              /api/nest/health/*, /health
AI requests (platform keys)    24 h              Per-user limit      Fails open if Redis unavailable
AI requests (own keys)         30 days           Per-user limit      0 = feature disabled
MCP tool calls                 Inherits global   100 / 15 min / IP   Plus AI quota on LLM-backed tools

Quota exceeded behavior

When a user's AI daily quota is exhausted, the API does not return a 429. Instead, the message handler returns a 200 response with a user-friendly message:

You have reached your daily request limit. Your quota resets in 4 hours.

When the IP rate limit is exhausted, the API returns a strict 429 Too Many Requests with Retry-After.


Best practices

Respect Retry-After. When you receive a 429, always read the Retry-After header and wait at least that many seconds before retrying. The SDK does this automatically.

Use exponential backoff. For programmatic retries, use exponential backoff with jitter rather than fixed delays. The balchemy-agent-sdk ships a retry utility:

import { withRetry } from "@balchemy/agent-sdk";
 
const result = await withRetry(() => mcp.agentExecute({ instruction: "..." }), {
  maxAttempts: 3,
  baseDelayMs: 200,
  maxDelayMs: 5000,
  jitter: true,
});

Spread load. If your agent processes many tokens in parallel, introduce a delay between calls to avoid IP rate limit bursts. Batching reads into single multi-token calls (e.g. trading_market_dexscreener_tokens) is more efficient than calling single-token tools in a loop.

Monitor X-RateLimit-Remaining. For production agents, log the remaining header and emit an alert before you reach 0, so you can throttle proactively rather than receive 429s.

Cache read results. Market data tools (dexscreener_pairs, geckoterminal_pool_details, etc.) return data that is valid for seconds to minutes. Cache the response locally rather than re-calling within the same processing cycle.


WebSocket rate limit

WebSocket connections (Telegram/Discord bot gateway) go through a separate WS guard with a per-connection message rate limit. The limit is configurable and enforced independently of the HTTP guard.

