API Design Philosophy: Why We Chose gRPC + OpenAPI
When building Weyl’s API, we faced a crucial decision: REST, gRPC, GraphQL, or something else? Here’s why we chose a hybrid approach.
The Requirements
Our API needed to satisfy multiple, sometimes conflicting constraints:
Performance Requirements
- Low latency: Less than 50ms P99 for synchronous endpoints
- High throughput: 10K+ requests/second per instance
- Streaming: Bidirectional real-time communication for video
Developer Experience
- Easy to explore: curl-friendly, no complex tooling required
- Type-safe: Generate clients for all major languages
- Self-documenting: Interactive docs for quick onboarding
Operational Requirements
- Versionable: Support multiple API versions simultaneously
- Observable: Rich telemetry and tracing
- Testable: Easy to mock and test locally
The Hybrid Solution
We expose three API surfaces:
1. REST/JSON (OpenAPI 3.1)
For exploration and simple integrations:
```bash
# Works in any terminal
curl -X POST https://api.weyl.ai/v1/generate \
  -H "Authorization: Bearer $API_KEY" \
  -d '{"prompt": "a sunset", "model": "sdxl"}'
```
Benefits:
- Ubiquitous tooling (curl, Postman, httpie)
- Firewall-friendly (HTTPS only)
- Easy to debug (readable JSON)
Trade-offs:
- ~2ms serialization overhead per request
- Limited streaming capabilities
- No bidirectional communication
2. gRPC
For production workloads:
```protobuf
service Weyl {
  rpc Generate(GenerateRequest) returns (GenerateResponse);
  rpc GenerateStream(GenerateRequest) returns (stream Frame);
}
```
Benefits:
- Binary protocol (smaller payloads)
- HTTP/2 multiplexing (reuse connections)
- Native streaming support
- Generated clients with types
Trade-offs:
- Requires protobuf toolchain
- Harder to debug (binary format)
- Some proxies don’t handle it well
3. WebSocket
For interactive applications:
```javascript
const ws = new WebSocket('wss://api.weyl.ai/v1/ws');
ws.send(JSON.stringify({ type: 'generate', ... }));
```
Benefits:
- Bidirectional (server can push updates)
- Long-lived connections (no reconnection overhead)
- Browser-native (no CORS issues)
Trade-offs:
- Harder to load balance
- Connection management complexity
- Not HTTP-cacheable
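To make the bidirectional flow concrete, here is a minimal browser-side sketch of consuming server-pushed updates over this socket. The message shapes ("frame", "done") and the renderFrame hook are assumptions for illustration, not the documented protocol.

```typescript
// Minimal sketch: consume server-pushed updates over the WebSocket surface.
// The message types ("frame", "done") are hypothetical, for illustration only.
const ws = new WebSocket("wss://api.weyl.ai/v1/ws");

ws.addEventListener("open", () => {
  ws.send(JSON.stringify({ type: "generate", prompt: "a sunset", model: "sdxl" }));
});

ws.addEventListener("message", (event) => {
  const msg = JSON.parse(event.data);
  if (msg.type === "frame") {
    renderFrame(msg.data); // hypothetical rendering hook
  } else if (msg.type === "done") {
    ws.close();
  }
});

function renderFrame(data: unknown) {
  console.log("received frame", data);
}
```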
Design Principles
Principle 1: Consistent Errors
All errors follow RFC 7807 (Problem Details):
{ "type": "https://api.weyl.ai/errors/rate-limit", "title": "Rate Limit Exceeded", "status": 429, "detail": "Tier allows 100 req/min, you sent 150", "instance": "/v1/generate", "retry_after": 30}This works identically across REST, gRPC (via status details), and WebSocket.
Principle 2: Idempotency Keys
All mutation endpoints accept an Idempotency-Key header:
```bash
curl -X POST https://api.weyl.ai/v1/generate \
  -H "Idempotency-Key: unique-id-12345" \
  ...
```
Why it matters:
- Network issues? Retry safely
- No duplicate charges
- Deterministic behavior
Implementation: We cache responses for 24 hours keyed by (user_id, idempotency_key).
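A minimal sketch of that server-side cache, assuming an in-memory Map stands in for whatever shared store production uses; handleGenerate and run are hypothetical names.

```typescript
// Sketch: server-side idempotency cache keyed by (user_id, idempotency_key),
// expiring after 24 hours. An in-memory Map stands in for a real shared store.
const TTL_MS = 24 * 60 * 60 * 1000;

interface CachedResponse {
  body: unknown;
  expiresAt: number;
}

const cache = new Map<string, CachedResponse>();

async function handleGenerate(
  userId: string,
  idempotencyKey: string,
  run: () => Promise<unknown>,
): Promise<unknown> {
  const key = `${userId}:${idempotencyKey}`;
  const hit = cache.get(key);
  if (hit && hit.expiresAt > Date.now()) {
    return hit.body; // Replay the original response instead of re-executing.
  }
  const body = await run();
  cache.set(key, { body, expiresAt: Date.now() + TTL_MS });
  return body;
}
```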
Principle 3: Versioning via Accept Header
```bash
# Request v1
curl -H "Accept: application/vnd.weyl.v1+json" ...

# Request v2 (future)
curl -H "Accept: application/vnd.weyl.v2+json" ...
```
Why not /v1/ in path?
- Versions are content negotiation, not resources
- Easier to support multiple versions per endpoint
- Cleaner URLs
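For illustration, a small sketch of how a server might negotiate the version from that media type; defaulting to v1 when no vendor type is sent is an assumption, not documented behavior.

```typescript
// Sketch: pick an API version from the Accept header.
// Falling back to v1 when no vendor media type is present is an assumption.
function negotiateVersion(acceptHeader: string | undefined): number {
  const match = acceptHeader?.match(/application\/vnd\.weyl\.v(\d+)\+json/);
  return match ? Number(match[1]) : 1;
}

console.log(negotiateVersion("application/vnd.weyl.v2+json")); // 2
console.log(negotiateVersion("application/json"));             // 1
```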
Principle 4: Pagination as Streams
Instead of page and limit, we use cursor-based pagination:
{ "data": [...], "cursor": "eyJpZCI6MTIzLCJ0cyI6MTYzODM2MDAwMH0"}Next page:
curl "https://api.weyl.ai/v1/jobs?cursor=eyJ..."Benefits:
- Consistent results even with mutations
- No “page drift” issues
- Efficient database queries (index scans)
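To show how a client would follow the cursor, here is a hedged sketch that walks every page of /v1/jobs. The exact response shape (data plus an optional cursor, omitted on the last page) is assumed from the example above.

```typescript
// Sketch: iterate every page of /v1/jobs by following the cursor.
// Assumes a response shape of { data: [...], cursor?: string }, with the
// cursor omitted on the final page.
async function* listJobs(apiKey: string): AsyncGenerator<unknown> {
  let cursor: string | undefined;
  do {
    const url = new URL("https://api.weyl.ai/v1/jobs");
    if (cursor) url.searchParams.set("cursor", cursor);

    const res = await fetch(url, { headers: { Authorization: `Bearer ${apiKey}` } });
    const page = (await res.json()) as { data: unknown[]; cursor?: string };

    yield* page.data;
    cursor = page.cursor;
  } while (cursor);
}
```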
Client Libraries
We auto-generate clients from our specs:
REST (OpenAPI)
```bash
# TypeScript
bun add @weyl/client

# Python
pip install weyl

# Go
go get github.com/weyl-ai/weyl-go
```
All generated from openapi.yaml using openapi-generator.
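As a rough illustration of the generated TypeScript client in use; the WeylClient class and generate method names are hypothetical, since the real surface is whatever openapi-generator emits from openapi.yaml.

```typescript
// Hypothetical usage of the generated TypeScript client. The class and method
// names here are illustrative only.
import { WeylClient } from "@weyl/client";

const client = new WeylClient({ apiKey: "YOUR_API_KEY" });
const result = await client.generate({ prompt: "a sunset", model: "sdxl" });
console.log(result);
```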
gRPC (Protobuf)
```bash
# Install buf
buf generate
```
This generates clients in 10+ languages from our .proto files.
OpenAPI Specification Highlights
Our openapi.yaml is ~2000 lines and includes:
- Schemas: 50+ shared schemas (DRY principle)
- Examples: Every endpoint has real-world examples
- Security: OAuth2, API keys, and mTLS documented
- Extensions: Custom x-weyl-* fields for our codegen
View it: api.weyl.ai/openapi.yaml
Performance Results
Comparing REST vs gRPC for the same workload (1000 generate calls):
| Metric | REST/JSON | gRPC | Improvement |
|---|---|---|---|
| Total time | 127s | 94s | 1.35x faster |
| Avg latency | 127ms | 94ms | 26% reduction |
| P99 latency | 310ms | 198ms | 36% reduction |
| Bandwidth | 2.1 MB | 0.8 MB | 2.6x smaller |
gRPC wins on performance, but REST is good enough for most use cases.
Lessons Learned
What Worked Well
- Hybrid approach: Use the right tool for the job
- OpenAPI-first: Generate docs, tests, and mocks from spec
- Conservative versioning: We haven’t needed v2 yet
What We’d Do Differently
- Avoid nested resources: /users/{id}/jobs/{id} is verbose and rigid
- More webhooks: Would reduce polling load
- JSON Schema: More validation = fewer runtime errors
Try It
Explore our API interactively:
- REST Playground
- OpenAPI Spec
- gRPC Reflection (use grpcurl)
Explore our API: api.weyl.ai