Villa Straylight Papers - Part I: The Rectilinear Chamber
Part I: The Rectilinear Chamber
Layouts, Coordinate Spaces, and the CuTe Contract
1. Layouts as coordinate → offset maps
A layout has two parts:
- a shape
S = (M₀, …, M₍ₙ₋₁₎)of positive integers, - a stride
D = (d₀, …, d₍ₙ₋₁₎)of positive integers,
with the same “profile” (same rank / structure).
The coordinate space is the finite product:
Coord(S) = [0,M₀) × … × [0,M₍ₙ₋₁₎)The semantics of the layout is the dot-product map:
eval_L(x₀,…,x₍ₙ₋₁₎) = Σᵢ xᵢ · dᵢThat’s the thing CuTe means by “a layout maps coordinate space(s) defined by Shape into an index space defined by Stride.”
Row-major / column-major are just special cases
For a 4×8 matrix:
- column-major:
S=(4,8),D=(1,4) - row-major:
S=(4,8),D=(8,1)
2. Size and cosize (tight definition)
Two numbers matter constantly:
- size: how many logical coordinates exist:
size(L) = ∏ᵢ Mᵢ - cosize: how far the layout’s image reaches in memory:
cosize(L) = 1 + max{ eval_L(c) | c ∈ Coord(S) }
3. Compact vs contiguous
Two properties you want to name cleanly:
- Compact (injective): no two coordinates collide in memory:
eval_L(c)=eval_L(c') ⇒ c=c' - Contiguous (a permutation of a block): compact and it fills exactly
[0,size)(no gaps, no overshoot). A convenient sufficient characterization: compact, andcosize(L) = size(L).
4. Coordinate isomorphism (why 1D indices still show up)
CuTe “thinks in coordinates”. Humans (and many APIs) still “think in linear indices”.
Given just a shape S, there is a standard mixed-radix bijection between:
- linear index
x ∈ [0, ∏ Mᵢ) - coordinate tuple
(x₀,…,x₍ₙ₋₁₎) ∈ Coord(S)
with:
x₀ = x mod M₀x₁ = ⌊x / M₀⌋ mod M₁…xᵢ = ⌊x / (∏₍ⱼ<ᵢ₎ Mⱼ)⌋ mod MᵢThis is not the layout yet; it’s the coordinate system induced by the shape.
Once you have a coordinate, the layout semantics is just the dot product with strides.
5. Lean 4: make the semantics the center
The biggest Lean cleanup is: don’t define the meaning of a layout as Nat → Nat first. Define it as a function on bounded coordinates (like CuTe does), then optionally add a linearization layer.
Lean: core data types
structure Mode where extent : Nat stride : Nat h_extent_pos : 0 < extent
def Layout : Type := List Mode
def Layout.eval (L : Layout) (c : List Nat) : Option Nat := if h : c.length = L.length ∧ (∀ i, c[i]? < L[i]?.extent) then some (List.sum (List.zipWith (· * ·) c (L.map Mode.stride))) else none
def Layout.cosize (L : Layout) : Nat := 1 + (List.finRange (Layout.size L)).maximum? (Layout.eval_from_lin L)Lean: linearization layer (kept explicit)
def linToCoord (shape : List Nat) (x : Nat) : List Nat := shape.scanl (· * ·) 1 |>.zip shape |>.map (fun (prod, m) => (x / prod) % m)
theorem linToCoord_bijective (shape : List Nat) (h : shape.all (0 < ·)) : Function.Bijective (linToCoord shape ∘ fun x => x < shape.prod) := sorry -- proof by mixed-radix isomorphism6. What’s next
- Part II: Coalescence — a terminating normalization that removes “fake rank” (unit modes) and merges compatible adjacent modes.
- Part III: Complementation — constructing the “rest of the space” so two layouts tile a memory region without overlap.
- Part IV: Composition — the operation that turns layouts into a real algebra of access patterns.