Introduction
Weyl is purpose-built inference infrastructure for generative media. We serve diffusion models on NVIDIA Blackwell hardware at FP4 precision, with sub-100ms p99 latency.
Why Weyl?
- Low Latency: Sub-100ms p99 latency with optimized CUDA kernels
- Cost Optimized: FP4 quantization delivers 4x throughput improvement
- Reliable: Multi-region redundancy with 99.99% uptime SLA
- Scalable: From prototype to production with automatic scaling
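To make the latency and uptime figures above concrete, here is a minimal sketch of how a p99 value and a downtime budget can be computed. The latency samples are made up for illustration, and the 30-day measurement window is an assumption; the actual SLA terms may define the window differently.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Downtime allowed by an uptime percentage over a period, in minutes.
function downtimeBudgetMinutes(uptimePercent: number, periodDays: number): number {
  const totalMinutes = periodDays * 24 * 60;
  return totalMinutes * (1 - uptimePercent / 100);
}

// Hypothetical latency samples in milliseconds (illustrative, not measured).
const latencySamplesMs = Array.from({ length: 100 }, (_, i) => 40 + i * 0.5);
const p99Ms = percentile(latencySamplesMs, 99);

// Assumed 30-day window; 99.99% uptime leaves roughly 4.32 minutes.
const budget30Days = downtimeBudgetMinutes(99.99, 30);

console.log(`p99 = ${p99Ms} ms, under 100 ms: ${p99Ms < 100}`);
console.log(`Downtime budget over 30 days: ${budget30Days.toFixed(2)} minutes`);
```

A p99 target is stricter than an average: 99 out of every 100 requests must finish within the bound, so a small number of slow requests can still break it.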
Key Features
Hardware Acceleration
Weyl runs on NVIDIA GB200 (Blackwell) hardware, with custom kernels written for its FP4 Tensor Cores. Direct access to the NVLink fabric allows zero-copy memory transfers.
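For readers new to 4-bit floating point, the sketch below shows what rounding to FP4 looks like. It assumes the E2M1 layout (1 sign bit, 2 exponent bits, 1 mantissa bit) and a single shared scale per block of values. It is an illustration of the number format only; it is not a description of how Weyl's kernels quantize or scale weights.

```typescript
// Non-negative magnitudes representable in E2M1.
const E2M1_MAGNITUDES = [0, 0.5, 1, 1.5, 2, 3, 4, 6];

// Round one value to the nearest representable E2M1 value, saturating at +/-6.
// Simplification: exact ties go to the smaller magnitude.
function roundToE2M1(x: number): number {
  const mag = Math.abs(x);
  let best = E2M1_MAGNITUDES[0];
  for (const m of E2M1_MAGNITUDES) {
    if (Math.abs(mag - m) < Math.abs(mag - best)) best = m;
  }
  return x < 0 && best !== 0 ? -best : best;
}

interface Fp4Block {
  scale: number;       // shared scale, kept as a plain float for simplicity
  quantized: number[]; // each entry is one of the 15 distinct E2M1 values
}

// Scale the block so its largest magnitude maps to 6, then round each value.
function quantizeBlock(values: number[]): Fp4Block {
  const maxAbs = Math.max(...values.map((v) => Math.abs(v)), 0);
  const scale = maxAbs === 0 ? 1 : maxAbs / 6;
  return { scale, quantized: values.map((v) => roundToE2M1(v / scale)) };
}

// Recover approximate original values from a quantized block.
function dequantizeBlock(block: Fp4Block): number[] {
  return block.quantized.map((q) => q * block.scale);
}

const fp4Example = quantizeBlock([3, -2, 0.4, 12]);
console.log(fp4Example.quantized);        // values on the E2M1 grid
console.log(dequantizeBlock(fp4Example)); // note that 0.4 is lost to rounding
```

The example shows the trade-off behind FP4: each value needs only 4 bits, but small values in a block with a large outlier can round to zero, which is why scale granularity matters.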
Model Support
- Stable Diffusion XL
- FLUX.1
- Custom fine-tuned models
- Bring your own weights
API Design
Weyl exposes REST and gRPC endpoints, with WebSocket streaming for real-time generation. The API is described by an OpenAPI 3.1 specification and comes with full TypeScript types.
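As a rough illustration of what a typed REST call might look like, the sketch below builds a JSON request for an image generation endpoint. Every name in it is a hypothetical placeholder: the base URL, the `/v1/generate` path, the request fields, the model identifier, and the Bearer header scheme are assumptions for the example. See the API Reference for the real endpoints and types.

```typescript
// All names below (base URL, path, fields, header scheme) are hypothetical
// placeholders, not taken from the Weyl API reference.
interface GenerateRequest {
  model: string;
  prompt: string;
  width: number;
  height: number;
}

interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build the request without sending it, so it can be inspected or tested.
function buildGenerateRequest(
  baseUrl: string,
  apiKey: string,
  req: GenerateRequest
): HttpRequest {
  if (req.prompt.trim() === "") throw new Error("prompt must not be empty");
  return {
    // Strip trailing slashes so the path joins cleanly.
    url: `${baseUrl.replace(/\/+$/, "")}/v1/generate`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(req),
  };
}

const exampleRequest = buildGenerateRequest("https://api.example.com/", "YOUR_API_KEY", {
  model: "sdxl", // hypothetical model identifier
  prompt: "a lighthouse at dusk",
  width: 1024,
  height: 1024,
});
console.log(exampleRequest.url);
```

The resulting object can be sent with any HTTP client, for example `fetch(exampleRequest.url, { method: exampleRequest.method, headers: exampleRequest.headers, body: exampleRequest.body })`. Keeping request construction separate from sending makes it straightforward to unit test.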
Next Steps
- Quick Start - Get up and running in 5 minutes
- Authentication - Set up your API keys
- API Reference - Complete API documentation