Split One Matmul Into Smaller Matmuls

This page describes how to split a matrix product C = A x B into smaller matmuls. Choose the dimensions of A (M x K) and B (K x N); the output C is then M x N.

Select Output Chunks And See Dependencies

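Click two cells in C to select a rectangular region of the output. The regions of the inputs it depends on will be highlighted: the matching rows of A (across all of K) and the matching columns of B (across all of K).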
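The dependency rule above can be checked numerically. The following is a minimal sketch in plain Python lists (so it runs without dependencies); the helper `matmul`, the example matrices, and the selected region `r0, r1, c0, c1` are all illustrative assumptions, not part of the original page.

```python
def matmul(A, B):
    """Reference matmul on lists of rows: (M x K) times (K x N) -> (M x N)."""
    K, N = len(B), len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(K)) for j in range(N)]
            for row in A]

# Small integer example: A is 4 x 3, B is 3 x 5, so C is 4 x 5.
A = [[i + k for k in range(3)] for i in range(4)]
B = [[k * j + 1 for j in range(5)] for k in range(3)]
C = matmul(A, B)

# Hypothetical selection in C: rows r0..r1-1 and columns c0..c1-1.
r0, r1, c0, c1 = 1, 3, 2, 5

A_rows = A[r0:r1]                    # selected rows of A, all of K
B_cols = [row[c0:c1] for row in B]   # all of K, selected columns of B
region = matmul(A_rows, B_cols)      # a smaller matmul

# The smaller matmul reproduces exactly the selected region of C.
expected = [row[c0:c1] for row in C[r0:r1]]
assert region == expected
```

Only the selected rows of A and the selected columns of B are read; nothing outside the highlighted regions contributes to the selected cells.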

Splitting Non-Contracting Axes

Split Rows (M)

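M/p K, K N -> M/p N
Here p is the number of row blocks: each block of M/p rows of A, multiplied by all of B, gives the matching M/p rows of C.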
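As a rough illustration of the row split, the sketch below cuts A into p row blocks, multiplies each block by the full B, and stacks the results. The function names and the assumption that p divides M evenly are choices made for this example only.

```python
def matmul(A, B):
    """Reference matmul on lists of rows: (M x K) times (K x N) -> (M x N)."""
    K, N = len(B), len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(K)) for j in range(N)]
            for row in A]

def matmul_split_rows(A, B, p):
    """Split the rows of A into p equal blocks; each block yields M/p rows of C."""
    M = len(A)
    assert M % p == 0, "this sketch assumes p divides M"
    step = M // p
    C = []
    for s in range(p):
        block = A[s * step:(s + 1) * step]   # (M/p x K)
        C.extend(matmul(block, B))           # (M/p x N), stacked below earlier rows
    return C

A = [[i + k for k in range(3)] for i in range(4)]       # 4 x 3
B = [[k * j + 1 for j in range(5)] for k in range(3)]   # 3 x 5
assert matmul_split_rows(A, B, 2) == matmul(A, B)
```

The pieces are independent of each other: no piece reads another piece's rows of A or writes another piece's rows of C.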

Split Columns (N)

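M K, K N/p -> M N/p
Here p is the number of column blocks: all of A, multiplied by a block of N/p columns of B, gives the matching N/p columns of C.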
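The column split can be sketched the same way, this time cutting B into p column blocks and concatenating the partial outputs side by side. As before, the function names and the requirement that p divides N are assumptions made for the example.

```python
def matmul(A, B):
    """Reference matmul on lists of rows: (M x K) times (K x N) -> (M x N)."""
    K, N = len(B), len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(K)) for j in range(N)]
            for row in A]

def matmul_split_cols(A, B, p):
    """Split the columns of B into p equal blocks; each block yields N/p columns of C."""
    N = len(B[0])
    assert N % p == 0, "this sketch assumes p divides N"
    step = N // p
    C = [[] for _ in A]
    for s in range(p):
        block = [row[s * step:(s + 1) * step] for row in B]   # (K x N/p)
        part = matmul(A, block)                               # (M x N/p)
        for out_row, part_row in zip(C, part):
            out_row.extend(part_row)                          # append columns
    return C

A = [[i + k for k in range(3)] for i in range(4)]       # 4 x 3
B = [[k * j + 1 for j in range(6)] for k in range(3)]   # 3 x 6
assert matmul_split_cols(A, B, 3) == matmul(A, B)
```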

Splitting Contraction Axis

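Here we split only the contracting axis K. Each piece then yields a partial result of the full output size, and the partial results are added together.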

Splitting K in a Dot Product

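A dot product over K terms can be cut into p chunks of K/p terms each. Each chunk gives a partial sum, and the p partial sums add up to the original dot product.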
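A small worked example may help here. The sketch below splits one dot product into p partial sums; the vectors and the helper names `dot` and `dot_split_k` are made up for illustration, and p is assumed to divide K.

```python
def dot(x, y):
    """Plain dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(x, y))

def dot_split_k(x, y, p):
    """Split the K terms into p equal chunks and return (total, partial sums)."""
    K = len(x)
    assert K % p == 0, "this sketch assumes p divides K"
    step = K // p
    partials = [dot(x[s * step:(s + 1) * step], y[s * step:(s + 1) * step])
                for s in range(p)]
    return sum(partials), partials

x = [1, 2, 3, 4, 5, 6]
y = [6, 5, 4, 3, 2, 1]
total, partials = dot_split_k(x, y, 3)
print(partials)  # [16, 24, 16]
print(total)     # 56, the same as dot(x, y)
```

Unlike the row and column splits, the pieces here do not produce separate parts of the output: they all contribute to the same number, so a final addition is needed.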

Splitting K in a Matmul

We apply the same idea to all the dot products in a matmul.

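Applied to a whole matmul, the same idea looks roughly like the sketch below: A is cut along its columns and B along its rows, each pair of pieces gives a full-size M x N partial product, and the partial products are summed. The function names and the assumption that p divides K are illustrative only.

```python
def matmul(A, B):
    """Reference matmul on lists of rows: (M x K) times (K x N) -> (M x N)."""
    K, N = len(B), len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(K)) for j in range(N)]
            for row in A]

def matmul_split_k(A, B, p):
    """Split K into p equal chunks and sum the p full-size partial products."""
    M, K, N = len(A), len(B), len(B[0])
    assert K % p == 0, "this sketch assumes p divides K"
    step = K // p
    C = [[0] * N for _ in range(M)]
    for s in range(p):
        A_part = [row[s * step:(s + 1) * step] for row in A]   # (M x K/p)
        B_part = B[s * step:(s + 1) * step]                    # (K/p x N)
        partial = matmul(A_part, B_part)                       # (M x N) partial result
        for i in range(M):
            for j in range(N):
                C[i][j] += partial[i][j]
    return C

A = [[i + k for k in range(6)] for i in range(2)]       # 2 x 6
B = [[k * j + 1 for j in range(3)] for k in range(6)]   # 6 x 3
assert matmul_split_k(A, B, 3) == matmul(A, B)
```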

Split-K Kernels

GPU matmul kernels combine both ideas: they split the work across output tiles (the non-contracting axes), and for each tile they loop over chunks of K, adding each partial product into an accumulator.

Click a tile in C to see its input dependencies, then step through accumulator updates across splits of K.

Interactive panels: the accumulator update at the current step, and the inputs accumulated so far, for the chosen number of splits of K.
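The combined scheme can be sketched as a sequential loop nest. This is only a plain-Python model of the idea, not an actual GPU kernel: the tile sizes `tile_m`, `tile_n`, `tile_k` and the function names are assumptions for this example, and the sizes are assumed to divide M, N, and K evenly. On a GPU the two outer loops over output tiles would typically run in parallel rather than one after another.

```python
def matmul(A, B):
    """Reference matmul on lists of rows: (M x K) times (K x N) -> (M x N)."""
    K, N = len(B), len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(K)) for j in range(N)]
            for row in A]

def tiled_matmul(A, B, tile_m, tile_n, tile_k):
    """Compute C tile by tile, accumulating over chunks of K within each tile."""
    M, K, N = len(A), len(B), len(B[0])
    assert M % tile_m == 0 and N % tile_n == 0 and K % tile_k == 0
    C = [[0] * N for _ in range(M)]
    for i0 in range(0, M, tile_m):           # output tile: rows i0..i0+tile_m-1
        for j0 in range(0, N, tile_n):       # output tile: cols j0..j0+tile_n-1
            acc = [[0] * tile_n for _ in range(tile_m)]   # per-tile accumulator
            for k0 in range(0, K, tile_k):   # step through chunks of K
                for i in range(tile_m):
                    for j in range(tile_n):
                        # Accumulator update: add this chunk's partial dot product.
                        acc[i][j] += sum(A[i0 + i][k0 + k] * B[k0 + k][j0 + j]
                                         for k in range(tile_k))
            for i in range(tile_m):          # write the finished tile into C
                C[i0 + i][j0:j0 + tile_n] = acc[i]
    return C

A = [[i + k for k in range(6)] for i in range(4)]       # 4 x 6
B = [[k * j + 1 for j in range(4)] for k in range(6)]   # 6 x 4
assert tiled_matmul(A, B, tile_m=2, tile_n=2, tile_k=3) == matmul(A, B)
```

Each accumulator only ever reads a tile_m x K strip of A and a K x tile_n strip of B, one chunk of K at a time, which matches the input dependencies of the selected tile.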

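These are all exact rewrites of the same matmul, organized into smaller pieces. In floating-point arithmetic, splitting K changes the order of the additions, so results can differ from the unsplit matmul by rounding error.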