Backend Comparison

Weyl supports three inference backends with different precision, speed, and model-coverage trade-offs.

Overview

| Backend  | Precision | Speed | Models           |
|----------|-----------|-------|------------------|
| nunchaku | FP4       | ⚡⚡⚡   | FLUX, Z-Image    |
| torch    | FP16      | ⚡⚡    | FLUX, WAN        |
| tensorrt | Mixed     | ⚡⚡⚡   | FLUX Dev/Schnell |

Nunchaku

NVIDIA FP4 quantization on Blackwell (GB200) GPUs

  • Precision: FP4 (4-bit floating point)
  • Speed: Fastest (3-4× faster than FP16)
  • Quality: Minimal loss vs FP16

Supported Models:

  • FLUX Dev2 ✓
  • FLUX Dev ✓
  • FLUX Schnell ✓
  • Z-Image Turbo ✓
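As an illustration of the FP4 path, here is a minimal sketch using the nunchaku library's diffusers integration. The quantized checkpoint id, prompt, and sampling settings are illustrative, not Weyl-specific configuration:

```python
import torch
from diffusers import FluxPipeline
from nunchaku import NunchakuFluxTransformer2dModel

# Load an FP4 (SVDQuant) FLUX transformer; the repo id below is illustrative.
transformer = NunchakuFluxTransformer2dModel.from_pretrained(
    "mit-han-lab/svdq-fp4-flux.1-schnell"
)

# Plug the quantized transformer into the standard diffusers FLUX pipeline.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    "a watercolor fox in a forest",
    num_inference_steps=4,   # Schnell is distilled for few-step sampling
    guidance_scale=0.0,
).images[0]
image.save("flux_schnell_fp4.png")
```

Only the transformer is swapped out; the text encoders and VAE run through the usual diffusers components.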

Torch

PyTorch diffusers with CUDA

  • Precision: FP16 (half precision)
  • Framework: diffusers + transformers
  • Flexibility: Most flexible of the three backends

Supported Models:

  • FLUX Dev2 ✓
  • FLUX Dev ✓
  • FLUX Schnell ✓
  • WAN ✓
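For reference, a typical FP16 load through diffusers looks like the following. This is a minimal sketch of standard FluxPipeline usage rather than Weyl's internal wiring; the prompt and step count are illustrative:

```python
import torch
from diffusers import FluxPipeline

# Standard diffusers path: FLUX weights cast to half precision on CUDA.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.float16,   # FP16 per this backend; bfloat16 also works
).to("cuda")

image = pipe(
    "a watercolor fox in a forest",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("flux_dev_fp16.png")
```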

TensorRT

NVIDIA TensorRT-LLM with ModelOpt

  • Precision: Mixed (INT8 + FP16)
  • Optimization: Ahead-of-time compilation

Supported Models:

  • FLUX Dev ✓
  • FLUX Schnell ✓
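The ahead-of-time step boils down to compiling an exported graph into a serialized engine. Below is a minimal sketch of that idea using the plain TensorRT Python API (assuming TensorRT 10.x); in the actual backend the INT8 scales come from ModelOpt quantization, and the ONNX and engine paths here are illustrative:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network()
parser = trt.OnnxParser(network, logger)

# Parse an ONNX export of the FLUX transformer (path is illustrative).
with open("flux_transformer.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parse failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # keep sensitive layers in FP16
config.set_flag(trt.BuilderFlag.INT8)  # allow INT8 kernels where scales exist

# Ahead-of-time compilation: serialize the optimized engine to disk.
engine = builder.build_serialized_network(network, config)
with open("flux_transformer.plan", "wb") as f:
    f.write(engine)
```

The compiled engine is then loaded at serve time, which is why this backend trades startup-time compilation for the lowest steady-state latency.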

Performance

FLUX @ 1024×1024:

| Model   | Backend  | Latency |
|---------|----------|---------|
| schnell | nunchaku | 450 ms  |
| schnell | tensorrt | 380 ms  |
| dev     | nunchaku | 1.8 s   |
| dev     | tensorrt | 1.5 s   |
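Numbers like these are typically measured with CUDA events around the full pipeline call. A minimal measurement sketch, where `pipe` is any of the pipelines loaded above:

```python
import torch

def median_latency_ms(pipe, prompt, steps, warmup=3, iters=10):
    """Median end-to-end latency of a text-to-image call, in milliseconds."""
    for _ in range(warmup):                       # warm-up runs exclude one-time costs
        pipe(prompt, num_inference_steps=steps)

    times = []
    for _ in range(iters):
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        pipe(prompt, num_inference_steps=steps)
        end.record()
        torch.cuda.synchronize()                  # wait for the GPU to finish
        times.append(start.elapsed_time(end))     # elapsed time in ms
    return sorted(times)[len(times) // 2]
```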