Capacity Management

The sync tier runs on dedicated capacity. When that capacity is exhausted, requests to the sync tier return HTTP 503 with a Retry-After header.

Understanding 503

HTTP/1.1 503 Service Unavailable
Retry-After: 30

{
  "error": "capacity_exhausted",
  "message": "sync tier at capacity, retry in 30s or use async"
}
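
A client can separate this response from other failures by checking the status code and the "error" field, then reading the wait time from Retry-After. The following is a minimal sketch, assuming the header carries a number of seconds as in the response above. The name `capacity_wait_seconds` is a hypothetical helper, not part of any API, and `headers` is treated as a plain mapping.

```python
import json

def capacity_wait_seconds(status_code, headers, body, default=30):
    """Return the seconds to wait if this is a capacity 503, else None."""
    if status_code != 503:
        return None
    try:
        payload = json.loads(body)
    except (TypeError, ValueError):
        # Body is missing or not JSON, so it is not the documented error shape
        return None
    if not isinstance(payload, dict) or payload.get("error") != "capacity_exhausted":
        return None
    try:
        # Assumption: Retry-After is an integer number of seconds
        return int(headers.get("Retry-After", default))
    except (TypeError, ValueError):
        return default


body = '{"error": "capacity_exhausted", "message": "sync tier at capacity, retry in 30s or use async"}'
print(capacity_wait_seconds(503, {"Retry-After": "30"}, body))  # 30
```

Falling back to `default` when the header is absent or unparsable keeps the caller from crashing on an unexpected value.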

Response Strategy

1. Retry with Backoff

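import time
import requests

def generate_with_backoff(url, data, headers, max_attempts=5):
    """POST to the sync tier, retrying on 503 with exponential backoff."""
    for attempt in range(max_attempts):
        resp = requests.post(url, json=data, headers=headers)
        if resp.status_code == 200:
            return resp.content
        if resp.status_code == 503:
            if attempt == max_attempts - 1:
                break  # out of attempts; do not sleep before giving up
            # Assumes Retry-After is a number of seconds, as in the example above
            retry_after = int(resp.headers.get('Retry-After', 30))
            # Double the server-suggested wait on each successive attempt
            backoff = retry_after * (2 ** attempt)
            time.sleep(backoff)
            continue
        # Any other error status is not a capacity problem, so raise it
        resp.raise_for_status()
    # Every attempt was used without getting a 200
    raise RuntimeError(f"no successful response after {max_attempts} attempts")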
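
For the default Retry-After of 30 seconds, the expression `retry_after * (2 ** attempt)` evaluates to 30, 60, 120, 240, and 480 seconds for attempts 0 through 4. The sketch below computes that schedule and shows one way to bound it. The `backoff_schedule` name and the `cap` parameter are assumptions added for illustration and are not part of the function above.

```python
def backoff_schedule(retry_after=30, max_attempts=5, cap=None):
    """List the backoff value, in seconds, computed for each attempt."""
    delays = []
    for attempt in range(max_attempts):
        delay = retry_after * (2 ** attempt)
        if cap is not None:
            # Hypothetical upper bound so late attempts do not wait for minutes
            delay = min(delay, cap)
        delays.append(delay)
    return delays


print(backoff_schedule())         # [30, 60, 120, 240, 480]
print(backoff_schedule(cap=120))  # [30, 60, 120, 120, 120]
```

Whether a cap is appropriate depends on how long the caller can afford to block. If long waits are not acceptable, the async fallback is likely the better fit.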

2. Fallback to Async

Switch to the async tier when the sync tier is exhausted.

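# Try sync first (fragment from inside a request function)
try:
    resp = requests.post(sync_url, ...)
    if resp.status_code == 200:
        return resp.content
except requests.RequestException:
    # Network failure (connection error, timeout): fall through to async
    pass
# Fall back to async
resp = requests.post(async_url, ...)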
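
The fallback can also be written as one function that is easy to test without a network. This is a sketch under stated assumptions: `generate_with_fallback` and its `post` parameter are hypothetical names, the async tier is assumed to accept the same payload and headers as the sync tier, and the shape of the async response is not specified here, so the raw response object is returned to the caller. Unlike the fragment above, it falls back only on a 503, so other errors are not masked.

```python
def generate_with_fallback(post, sync_url, async_url, data, headers):
    """Call the sync tier once; if it answers 503, call the async tier instead.

    `post` is any callable with the same keyword arguments as requests.post.
    """
    resp = post(sync_url, json=data, headers=headers)
    if resp.status_code != 503:
        # Success or a non-capacity error: let the caller inspect it
        return resp
    # Sync tier is at capacity, so hand the same request to the async tier
    return post(async_url, json=data, headers=headers)
```

With the `requests` library, pass `requests.post` as the first argument. Injecting the callable rather than importing it keeps the function usable with a session object or a test double.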