API Design Philosophy: Why We Chose gRPC + OpenAPI
When building Weyl’s API, we faced a crucial decision: REST, gRPC, GraphQL, or something else? Here’s why we chose a hybrid approach.
The Requirements
Our API needed to satisfy multiple, sometimes conflicting constraints:
Performance Requirements
- Low latency: Less than 50ms P99 for synchronous endpoints
- High throughput: 10K+ requests/second per instance
- Streaming: Bidirectional real-time communication for video
Developer Experience
- Easy to explore: curl-friendly, no complex tooling required
- Type-safe: Generate clients for all major languages
- Self-documenting: Interactive docs for quick onboarding
Operational Requirements
- Versionable: Support multiple API versions simultaneously
- Observable: Rich telemetry and tracing
- Testable: Easy to mock and test locally
The Hybrid Solution
We expose three API surfaces:
1. REST/JSON (OpenAPI 3.1)
For exploration and simple integrations:
```bash
# Works in any terminal
curl -X POST https://api.weyl.ai/v1/generate \
  -H "Authorization: Bearer $API_KEY" \
  -d '{"prompt": "a sunset", "model": "sdxl"}'
```
Benefits:
- Ubiquitous tooling (curl, Postman, httpie)
- Firewall-friendly (HTTPS only)
- Easy to debug (readable JSON)
Trade-offs:
- ~2ms serialization overhead per request
- Limited streaming capabilities
- No bidirectional communication
2. gRPC
For production workloads:
```protobuf
service Weyl {
  rpc Generate(GenerateRequest) returns (GenerateResponse);
  rpc GenerateStream(GenerateRequest) returns (stream Frame);
}
```
Benefits:
- Binary protocol (smaller payloads)
- HTTP/2 multiplexing (reuse connections)
- Native streaming support
- Generated clients with types
Trade-offs:
- Requires protobuf toolchain
- Harder to debug (binary format)
- Some proxies don’t handle it well
3. WebSocket
For interactive applications:
```javascript
const ws = new WebSocket('wss://api.weyl.ai/v1/ws');
ws.send(JSON.stringify({ type: 'generate', ... }));
```
Benefits:
- Bidirectional (server can push updates)
- Long-lived connections (no reconnection overhead)
- Browser-native (no CORS issues)
Trade-offs:
- Harder to load balance
- Connection management complexity
- Not HTTP-cacheable
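To make the bidirectional flow concrete, here is a minimal browser-side sketch of consuming server-pushed updates over this socket. The message shapes ("frame", "done") and the renderFrame hook are assumptions for illustration, not the documented protocol.

```typescript
// Minimal sketch: consume server-pushed updates over the WebSocket surface.
// The message types ("frame", "done") are hypothetical, for illustration only.
const ws = new WebSocket("wss://api.weyl.ai/v1/ws");

ws.addEventListener("open", () => {
  ws.send(JSON.stringify({ type: "generate", prompt: "a sunset", model: "sdxl" }));
});

ws.addEventListener("message", (event) => {
  const msg = JSON.parse(event.data);
  if (msg.type === "frame") {
    renderFrame(msg.data); // hypothetical rendering hook
  } else if (msg.type === "done") {
    ws.close();
  }
});

function renderFrame(data: unknown) {
  console.log("received frame", data);
}
```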
Design Principles
Principle 1: Consistent Errors
All errors follow RFC 7807 (Problem Details):
{ "type": "https://api.weyl.ai/errors/rate-limit", "title": "Rate Limit Exceeded", "status": 429, "detail": "Tier allows 100 req/min, you sent 150", "instance": "/v1/generate", "retry_after": 30}This works identically across REST, gRPC (via status details), and WebSocket.
Principle 2: Idempotency Keys
All mutation endpoints accept an Idempotency-Key header:
```bash
curl -X POST https://api.weyl.ai/v1/generate \
  -H "Idempotency-Key: unique-id-12345" \
  ...
```
Why it matters:
- Network issues? Retry safely
- No duplicate charges
- Deterministic behavior
Implementation: We cache responses for 24 hours keyed by (user_id, idempotency_key).
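A minimal sketch of that server-side cache, assuming an in-memory Map stands in for whatever shared store production uses; handleGenerate and run are hypothetical names.

```typescript
// Sketch: server-side idempotency cache keyed by (user_id, idempotency_key),
// expiring after 24 hours. An in-memory Map stands in for a real shared store.
const TTL_MS = 24 * 60 * 60 * 1000;

interface CachedResponse {
  body: unknown;
  expiresAt: number;
}

const cache = new Map<string, CachedResponse>();

async function handleGenerate(
  userId: string,
  idempotencyKey: string,
  run: () => Promise<unknown>,
): Promise<unknown> {
  const key = `${userId}:${idempotencyKey}`;
  const hit = cache.get(key);
  if (hit && hit.expiresAt > Date.now()) {
    return hit.body; // Replay the original response instead of re-executing.
  }
  const body = await run();
  cache.set(key, { body, expiresAt: Date.now() + TTL_MS });
  return body;
}
```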
Principle 3: Versioning via Accept Header
```bash
# Request v1
curl -H "Accept: application/vnd.weyl.v1+json" ...

# Request v2 (future)
curl -H "Accept: application/vnd.weyl.v2+json" ...
```
Why not /v1/ in path?
- Versions are content negotiation, not resources
- Easier to support multiple versions per endpoint
- Cleaner URLs
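For illustration, a small sketch of how a server might negotiate the version from that media type; defaulting to v1 when no vendor type is sent is an assumption, not documented behavior.

```typescript
// Sketch: pick an API version from the Accept header.
// Falling back to v1 when no vendor media type is present is an assumption.
function negotiateVersion(acceptHeader: string | undefined): number {
  const match = acceptHeader?.match(/application\/vnd\.weyl\.v(\d+)\+json/);
  return match ? Number(match[1]) : 1;
}

console.log(negotiateVersion("application/vnd.weyl.v2+json")); // 2
console.log(negotiateVersion("application/json"));             // 1
```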
Principle 4: Pagination as Streams
Instead of page and limit, we use cursor-based pagination:
{ "data": [...], "cursor": "eyJpZCI6MTIzLCJ0cyI6MTYzODM2MDAwMH0"}Next page:
curl "https://api.weyl.ai/v1/jobs?cursor=eyJ..."Benefits:
- Consistent results even with mutations
- No “page drift” issues
- Efficient database queries (index scans)
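To show how a client would follow the cursor, here is a hedged sketch that walks every page of /v1/jobs. The exact response shape (data plus an optional cursor, omitted on the last page) is assumed from the example above.

```typescript
// Sketch: iterate every page of /v1/jobs by following the cursor.
// Assumes a response shape of { data: [...], cursor?: string }, with the
// cursor omitted on the final page.
async function* listJobs(apiKey: string): AsyncGenerator<unknown> {
  let cursor: string | undefined;
  do {
    const url = new URL("https://api.weyl.ai/v1/jobs");
    if (cursor) url.searchParams.set("cursor", cursor);

    const res = await fetch(url, { headers: { Authorization: `Bearer ${apiKey}` } });
    const page = (await res.json()) as { data: unknown[]; cursor?: string };

    yield* page.data;
    cursor = page.cursor;
  } while (cursor);
}
```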
Client Libraries
We auto-generate clients from our specs:
REST (OpenAPI)
```bash
# TypeScript
bun add @weyl/client

# Python
pip install weyl

# Go
go get github.com/weyl-ai/weyl-go
```
All generated from openapi.yaml using openapi-generator.
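As a rough illustration of the generated TypeScript client in use; the WeylClient class and generate method names are hypothetical, since the real surface is whatever openapi-generator emits from openapi.yaml.

```typescript
// Hypothetical usage of the generated TypeScript client. The class and method
// names here are illustrative only.
import { WeylClient } from "@weyl/client";

const client = new WeylClient({ apiKey: "YOUR_API_KEY" });
const result = await client.generate({ prompt: "a sunset", model: "sdxl" });
console.log(result);
```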
gRPC (Protobuf)
```bash
# Install buf
buf generate
```
This generates clients in 10+ languages from our .proto files.
OpenAPI Specification Highlights
Our openapi.yaml is ~2000 lines and includes:
- Schemas: 50+ shared schemas (DRY principle)
- Examples: Every endpoint has real-world examples
- Security: OAuth2, API keys, and mTLS documented
- Extensions: Custom x-weyl-* fields for our codegen
View it: api.weyl.ai/openapi.yaml
Performance Results
Comparing REST vs gRPC for the same workload (1000 generate calls):
| Metric | REST/JSON | gRPC | Improvement |
|---|---|---|---|
| Total time | 127s | 94s | 1.35x faster |
| Avg latency | 127ms | 94ms | 26% reduction |
| P99 latency | 310ms | 198ms | 36% reduction |
| Bandwidth | 2.1 MB | 0.8 MB | 2.6x smaller |
gRPC wins on performance, but REST is good enough for most use cases.
Lessons Learned
What Worked Well
- Hybrid approach: Use the right tool for the job
- OpenAPI-first: Generate docs, tests, and mocks from spec
- Conservative versioning: We haven’t needed v2 yet
What We’d Do Differently
- Avoid nested resources: /users/{id}/jobs/{id} is verbose and rigid
- More webhooks: Would reduce polling load
- JSON Schema: More validation = fewer runtime errors
Try It
Explore our API interactively:
- REST Playground
- OpenAPI Spec
- gRPC Reflection (use grpcurl)
Explore our API: api.weyl.ai