Backend Comparison

Weyl supports three inference backends with different precision, speed, and model-coverage trade-offs.

Overview

| Backend  | Precision | Speed | Models           |
|----------|-----------|-------|------------------|
| nunchaku | FP4       | ⚡⚡⚡   | FLUX, Z-Image    |
| torch    | FP16      | ⚡⚡    | FLUX, WAN        |
| tensorrt | Mixed     | ⚡⚡⚡   | FLUX Dev/Schnell |

Nunchaku

NVIDIA FP4 quantization on Blackwell (GB200) GPUs

  • Precision: FP4 (4-bit floating point)
  • Speed: Fastest (3-4× faster than FP16)
  • Quality: Minimal loss vs FP16

Supported Models:

  • FLUX Dev2 ✓
  • FLUX Dev ✓
  • FLUX Schnell ✓
  • Z-Image Turbo ✓
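As an illustration of the FP4 path, here is a minimal sketch using the nunchaku library's diffusers integration. The quantized checkpoint id, prompt, and sampling settings are illustrative, not Weyl-specific configuration:

```python
import torch
from diffusers import FluxPipeline
from nunchaku import NunchakuFluxTransformer2dModel

# Load an FP4 (SVDQuant) FLUX transformer; the repo id below is illustrative.
transformer = NunchakuFluxTransformer2dModel.from_pretrained(
    "mit-han-lab/svdq-fp4-flux.1-schnell"
)

# Plug the quantized transformer into the standard diffusers FLUX pipeline.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    "a watercolor fox in a forest",
    num_inference_steps=4,   # Schnell is distilled for few-step sampling
    guidance_scale=0.0,
).images[0]
image.save("flux_schnell_fp4.png")
```

Only the transformer is swapped out; the text encoders and VAE run through the usual diffusers components.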

Torch

PyTorch diffusers with CUDA

  • Precision: FP16 (half precision)
  • Framework: diffusers + transformers
  • Flexibility: Most flexible of the three backends

Supported Models:

  • FLUX Dev2 ✓
  • FLUX Dev ✓
  • FLUX Schnell ✓
  • WAN ✓
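For reference, a typical FP16 load through diffusers looks like the following. This is a minimal sketch of standard FluxPipeline usage rather than Weyl's internal wiring; the prompt and step count are illustrative:

```python
import torch
from diffusers import FluxPipeline

# Standard diffusers path: FLUX weights cast to half precision on CUDA.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.float16,   # FP16 per this backend; bfloat16 also works
).to("cuda")

image = pipe(
    "a watercolor fox in a forest",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("flux_dev_fp16.png")
```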

TensorRT

NVIDIA TensorRT-LLM with ModelOpt

  • Precision: Mixed (INT8 + FP16)
  • Optimization: Ahead-of-time compilation

Supported Models:

  • FLUX Dev ✓
  • FLUX Schnell ✓
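The ahead-of-time step boils down to compiling an exported graph into a serialized engine. Below is a minimal sketch of that idea using the plain TensorRT Python API (assuming TensorRT 10.x); in the actual backend the INT8 scales come from ModelOpt quantization, and the ONNX and engine paths here are illustrative:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network()
parser = trt.OnnxParser(network, logger)

# Parse an ONNX export of the FLUX transformer (path is illustrative).
with open("flux_transformer.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parse failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # keep sensitive layers in FP16
config.set_flag(trt.BuilderFlag.INT8)  # allow INT8 kernels where scales exist

# Ahead-of-time compilation: serialize the optimized engine to disk.
engine = builder.build_serialized_network(network, config)
with open("flux_transformer.plan", "wb") as f:
    f.write(engine)
```

The compiled engine is then loaded at serve time, which is why this backend trades startup-time compilation for the lowest steady-state latency.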

Performance

FLUX @ 1024×1024:

| Model   | Backend  | Latency |
|---------|----------|---------|
| schnell | nunchaku | 450 ms  |
| schnell | tensorrt | 380 ms  |
| dev     | nunchaku | 1.8 s   |
| dev     | tensorrt | 1.5 s   |
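Numbers like these are typically measured with CUDA events around the full pipeline call. A minimal measurement sketch, where `pipe` is any of the pipelines loaded above:

```python
import torch

def median_latency_ms(pipe, prompt, steps, warmup=3, iters=10):
    """Median end-to-end latency of a text-to-image call, in milliseconds."""
    for _ in range(warmup):                       # warm-up runs exclude one-time costs
        pipe(prompt, num_inference_steps=steps)

    times = []
    for _ in range(iters):
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        pipe(prompt, num_inference_steps=steps)
        end.record()
        torch.cuda.synchronize()                  # wait for the GPU to finish
        times.append(start.elapsed_time(end))     # elapsed time in ms
    return sorted(times)[len(times) // 2]
```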