Mirrored from Sudopedia, the Free Sudoku Reference Guide

# Canonical Form

The best way to compare two Sudokus is to canonicalize them both. When the puzzles are mathematically equivalent, their canonicalized forms should be identical.

All sudokus that are mathematically equivalent belong to a single equivalence class. Each member of this class can be transformed into each other member of that class by permutating rows within a floor, columns within a tower, the floors or towers themselves, swapping rows and columns and relabeling the 9 digits. The algorithm that transforms each puzzle in an equivalence class to the same member is what we call the canonicalization algorithm. The resulting puzzle is the canonical form, which represents the entire class.

The opposite operation is to scramble a puzzle, which turns it into a random member of the same equivalence class.

## Algorithms

Either the original puzzle with just the givens, or the completed solution grid can be canonicalized.

### Original Puzzle

When the givens alone are canonicalized, unless each puzzle is absolutely mathematically equivalent, completely different results will be obtained for otherwise very similar puzzles.

There are math programs which can canonicalize graphs, but the Sudoku grid must be converted first, and the canonicalized graph is then converted back into Sudoku format.

Alternatively, one can permutate the rows and columns into a minimal position, by assigning a different weight to each cell containing a given. The next step would be to relabel the digits in such a way that the first rows or box has digits 1 through 9 in fixed order. When the resulting pattern has symmetry, they all need to be tested after relabeling. This method has the advantage that puzzles with the same distribution pattern for the givens can be compared.

### Solutions

Canonicalizing the solution is more straightforward, and has the advantage that similar puzzles with the same solution are easy to compare. A minor problem arises however when the solution shows automorphism, meaning that it can be transformed into itself.

```Canonical Map
. . . | . . . | 5 . .
. . . | 5 . . | . . .
5 . . | . . . | . . .
------+-------+------
. . . | 1 2 3 | . 5 .
. . . | 4 5 6 | . . .
. 5 . | 7 8 9 | . . .
------+-------+------
. . . | . . . | . . 5
. . . | . . 5 | . . .
. . 5 | . . . | . . .

```

This canonical map, with a rotationally symmetrical pattern for digit 5, is one of any number of convenient arrangements possible for use as a standard. Digit 5 is being used as an "anchor" digit and the rows and columns in the floors and towers are swapped to bring any selected digit into the mapped positions for it. The candidates are then renumbered into ascending order in box 5, and the resultant number given along the first row is used to give an identity number for the variation produced. When every permutation of the boxes and the digits in the anchor positions has been enumerated, the one giving the lowest identity number is used as the canonical form of the puzzle. For automorphic puzzles (estimated at ~0.1% of the total) the same identities will repeat, when the version to use must be decided by a ranking system based on the clue positions if they are different.

Enumerating the options:

Number of possible anchor digits = 9
Number of boxes to set as box 5 = 9
Number of boxes to set as box 2 = 4 (this is 2 in the original orientation and 2 with a 90 degree rotation)
Number of boxes to set as box 1 = 2 (effectively this swaps the left hand and right hand towers)
Row positions in each tier = 1 (each is forced by the 5 pattern)
Column positions in each stack = 1 (each is forced by the 5 pattern)
Renumbering remaining digits = 1 (forced by box 5 cell numbering)

This mapping therefore produces 9 x 9 x 4 x 2 = 648 different identities to enumerate.

#### Reducing the Count using Traveling Pairs

Some extra work using the pattern of traveling pairs in the puzzle provides a means (usually) to reduce the number of options:

```Example listing the products for the number of times a
digit is in a traveling pair in a floor and in a tower.
9 4 7 | 3 6 1 | 5 2 8      1: 3 x 3 = 9
6 5 8 | 9 2 7 | 3 1 4      2: 1 x 3 = 3
1 3 2 | 4 5 8 | 9 6 7      3: 3 x 3 = 9
------+-------+------
4 8 9 | 1 7 2 | 6 3 5      4: 2 x 3 = 6
7 6 1 | 8 3 5 | 4 9 2      5: 3 x 1 = 3
3 2 5 | 6 4 9 | 7 8 1      6: 1 x 0 = 0
------+-------+------
2 9 3 | 7 1 4 | 8 5 6      7: 2 x 1 = 2*
5 7 6 | 2 8 3 | 1 4 9      8: 1 x 2 = 2*
8 1 4 | 5 9 6 | 2 7 3      9: 2 x 2 = 4
```

Revised Enumeration:

Anchor digit restricted to those with the smallest non-zero product. This reduces the choice to digits 7 or 8 (x2)
Box 5: restricted to boxes with the anchor in a traveling pair in both directions. For both digits there are only two such boxes (x2)
Box 2: priority given to boxes with the anchor in a traveling pair across. This forces the selection of box 2 in both cases (x1).
Box 1: priority given to boxes with the anchor in a traveling pair down. This is not possible, so there are still 2 options for box 1 (x2)

In this case, the final number of contenders is reduced to 2 x 2 x 1 x 2 = 8.

#### The Worst Case

The so-called canonical grid has 648 automorphisms and cannot be canonicalized more quickly by "reducing the count", since all 648 options from the basic canonicalization method yield the same solution grid.