Villa Straylight Papers - Part III: Built Him up From Nothing
Part III: Built Him up From Nothing in France
Complementation, the FTTC, and the Holes in Your Iteration Space
Complementation answers: “Given layout A in region [0,M), what layout B completes it without overlap?”
This is the operation that lets you:
- Tile a tensor between thread blocks
- Split work between warps
- Ensure tensor core loads don’t stomp on each other
The fundamental formula (rank-1 case)
For a layout A = (N) : (d) within a memory region [0, M):
Complement: B = (d, M/(N·d)) : (1, N·d)
Together they form:
Tiled layout: C = (N, d, M/(N·d)) : (d, 1, N·d)
This works because:
- A’s coordinates
[0,N)with stridedhit{0, d, 2d, ..., (N-1)d} - B’s first coordinate fills the gaps:
{0, 1, ..., d-1} - B’s second coordinate tiles across the full region
Connection to the FTTC
The Fundamental Theorem of TMA Correctness requires:
When scheduling tensor core operations with TMA loads, the box size must divide the tensor size, and the element stride must divide the box size.
This is exactly the divisibility constraint for complementation!
NVIDIA’s theorem tells you when you can safely split your iteration space. The complementation formula tells you how to construct that split.
Lean 4: complementation with proof obligations
def complement1 (N d M : Nat) (h : N * d ∣ M) : Layout := [ { extent := d, stride := 1 }, { extent := M / (N * d), stride := N * d } ]
theorem complement1_tiles (A : Layout) (N d M : Nat) (h : N * d ∣ M) (hA : A = [(N, d)]) : let B := complement1 N d M h let C := A.append B C.cosize = M ∧ (∀ c₁ c₂, A.eval c₁ = B.eval c₂ → c₁ = c₂) := -- disjoint sorry -- proof that tiling is complete and non-overlappingThe key: The divisibility proof h : N * d ∣ M is part of the type signature.
You cannot call complement1 without providing a proof that the constraint holds.
Error messages from failed complementation
When you try to complement with invalid parameters:
def badTile : Layout := complement1 128 16 2040 ?proof
-- Error: failed to synthesize proof that 128 * 16 ∣ 2040-- Note: 2048 ∣ 2040 is false-- Suggestion: use M = 2048 or M = 2032 with N = 127The type system rejects invalid tilings at compile time.