Warm-up Coding Interview: Combinatorial Search and Heuristic Methods

In this section, we introduce backtracking as a technique for listing all possible solutions for a combinatorial algorithm problem. We illustrate the power of clever pruning techniques to speed up real search applications. For problems that are too large to contemplate using brute-force combinatorial search, we introduce heuristic methods such as simulated annealing. Such heuristic methods are important weapons in any practical algorist’s arsenal.

Backtracking

Backtracking is a systematic way to iterate through all the possible configurations of a search space. These configurations may represent all possible arrangements of objects (permutations) or all possible ways of building a collection of them (subsets). Other situations may demand enumerating all spanning trees of a graph, all paths between two vertices, or all possible ways to partition vertices into collar classes.

What these problems have in common is that we must generate each one possible configuration exactly once. Avoiding both repetitions and missing configurations means that we must define a systematic generation order. We will model our combinatorial search solution as a vector a = (a_1, a_2, \cdots, a_n), where each element a_i is selected from a finite ordered set S_i. The vector can even represent a sequence of moves in a game or a path in a graph, where a_i contains the 9th event in the sequence.

At each step in the backtracking algorithm, we try to extend a given partial solution a = (a_1, a_2, \dots, a_k) by adding another element at the end. After extending it, we must test whether what we now have is a solution: if so, we should print it or count it. If not, we must check whether the partial solution is still potentially extendible to some complete solution.

Backtracking constructs a tree of partial solutions, where each vertex represents a partial solution. There is an edge from x to y if node y was created by advancing from x. This tree of partial solutions provides an alternative way to think about backtracking, for the process of constructing the solutions corresponds exactly to doing a depth-first traversal of the backtrack tree. Viewing backtracking as a depth-first search on an implicit graph yields a natural recursive implementation of the basic algorithm.

Although a breadth-first search could also be used to enumerate solutions, a depth-first search is greatly preferred because it uses less space. The current state of a search is completely represented by the path from the root to the current search depth-first node. This requires space proportional to the height of the tree. In the breadth-first search, the queue stores all the nodes at the current level, which is proportional to the width of the search tree. For most interesting problems, the width of the tree grows exponentially in its height.

Implementation

The honest working backtrack code is given below:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
bool finished = FALSE; /* found all solutions yet? */

backtrack(int a[], int k, data input) {
    int c[MAXCANDIDATES];
    int ncandidates;
    int i;

    if (is_a_solution(a, k, input))
         process_solution(a, k, input);
    else {
        k = k+1;
        construct_candidate(a, k, input, c, & candidates);
        for (i=0; i<ncandidates; i++) {
            a[k] = c[i];
            make_move(a, k, input);
            backtrack(a, k, input);
            unmake_move(a, k, input);
            if (finished) return; /* terminate early */
        }
    }
}

Backtracking ensures correctness by enumerating all possibilities. It ensures efficiency by never visiting a state more than once.

Study how recursion yields an elegant and easy implementation of the backtracking algorithm. Because a new candidates array c is allocated with each recursive procedure call, the subsets of not-yet-considered extension candidates at each position will not interfere with each other.

The application-specific parts of this algorithm consists of five subroutines:

  • is_a_solution(a, k, input) – This Boolean function tests whether the first k elements of vector a from a compolete soution for the given problem.
  • construct_candidates(a, k, input, c, ncandidates) – This routine fills an array c with the complete set of possible candidates for the kth position of a, given the contents of the first k-1 positions. The number of candidates returned in this array is denoted by ncandidates.
  • process_solution(a, k, input) – This routine prints, counts, or however processes a complete solution once it is constructed.
  • make_move(a, k, input) and unmake_move(a, k, input) – These routines enable us to modify a data structure in response to the latest move, as well as clean up this data structure if we decide to take back the move.

These calls function as null stubs in all of this section’s examples, but will be employed in the Sudoku program.

We include a global finished flag to allow for premature termination, which could be set in any application-specific routine.

To really understand how backtracking works, you must see how such objects as permutations and subsets can be constructed by defining the right state spaces.

Constructing All Subsets

To construct all 2^n subsets, we set up an array/vector of n cells, where the value of a_i (true or false) signifies whether ith item is in the given subset. In the scheme of our general backtrack algorithm, S_k = (true, false) and a is a solution whenever k = n. We can now construct all subsets with simple implementations of is_a_solution(), construct_candidates(), process_solution().

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
is_a_solution(int a[], int k, int n) {
    return (k == n);
}

construct_candidates(int a[], int k, int n, int c[], int *ncandidates) {
    c[0] = TRUE;
    c[1] = FALSE;
    *ncandidates = 2;
}

process_solution(int a[], int k) {
    int i; /* counter */

    printf("{");
    for (i = 1; i<=k; i++)
        if (a[i] == TRUE) printf(" %d", i);

    printf(" }\n");
}

Printing each out subset after constructing it proves to be the most complicated of the three routines!

Finally, we must instantiate the call to backtrack with the right arguments. Specifically, this means giving a pointer to the empty solution vector, setting k = 0 to denote that it is empty, and specifying the number of elements in the universal set:

1
2
3
4
5
generate_subsets(int n) {
    int a[NMAX];

    backtrack(a, 0, n);
}

Constructing All Paths in a Graph

Enumerating all the simple s to t paths through a given path is a more complicated problem than listing permutations or subsets. There is no explicit formula that counts the number of solutions as a function of the number of edges or vertices, because the number of paths depends upon the structure of the graph.

The starting point of any path from s to t is always s. Thus, s is the only candidate for the first position and S_1 = \{s\}. The possible candidates for the second position are the vertices v such that (s,v) is an edge of the graph, for the path wanders from vertex using edges to define the legal steps. In general, S_{k+1} consists of the set of vertices adjacent to a_k that have not been used elsewhere in the partial solution A.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
construct_candidates(int a[], int k, int n, int c[], int *ncandidates) {
    int i;
    bool in_sol[NMAX];
    edgenode *p;
    int last;

    for (i=1; i<NMAX; i++) in_sol[i] = FALSE;
    for (i=1; i<k; i++) in_sol[ a[i] ] = TRUE;

    if (k==1) {
        c[0] = 1;
        *ncandidates = 1;
    } else {
        *ncandidates = 0;
        last = a[k-1];
        p = g.edges[last];
        while (p != NULL) {
            if (!in_sol[ p->y ]) {
                c[*ncandidates] = p->y;
                *ncandidates = *ncandidates + 1;
            }
            p = p->next;
        }
    }
}

We report a successful path whenever a_k = t.

1
2
3
4
5
6
7
is_a_solution(int a[], int k, int t) {
    return (a[k] == t);
}

process_solution(int a[], int k) {
    solution_count ++; /* count all s to t paths */
}

The solution vector A must have room for all n vertices, although most paths are likely shorter than this.

Searching Pruning

Backtracking ensures correctness by enumerating all possibilities. Enumerating all n! permutations of n vertices of the graph and selecting the best one yields the correct algorithm to find the optimal traveling salesman tour. For each permutation, we could see whether all edges implied by the tour really exists in the graph G, and if so, add the weights of these edges together.

Pruning is the technique of cutting off the search the instant we have established that a partial solution cannot be extended into a full solution.

Exploiting symmetry is another avenue for reducing combinatorial searches. Pruning away partial solutions identical to those previously considered requires recognizing underlying symmetries in the search space.

Sudoku

Backtracking lends itself nicely to the problem of solving Sudoku puzzles. We will use the puzzle here to better illustrate the algorithmic technique. Our state space will be the sequence of open squares, each of which must ultimately be filled in with a number. The candidates for open squares (i,j) are exactly the integers from 1 to 9 that have not yet appeared in row i, column j, or the 3 * 3 sector containing (i,j). We backtrack as soon as we are out of candidates for a square.

The solution vector a supported by backtrack only accepts a single integer per position. This is enough to store contents of a board square but not the coordinates of the board square. Thus, we keep a separate array of move positions as part of our board data type provided below. The basic data structures we need to support our solution are:

1
2
3
4
5
6
7
8
9
10
11
12
#define DIMENSION 9
#define NCELLS DIMENSION*DIMENSION

typedef struct {
    int x, y;
} point;

typedef struct {
    int m[DIMENSION+1][DIMENSION+1]; /* matrix of board contents */
    int freecount; /* how many open squares remain? */
    point move[NCELLS+1]; /* how did we fill the squares? */
} boardtype;

Constructing the candidates for the next solution position involves first pick the open square we want to fill next (next_square), and then identifying which numbers are candidates to fill that square (possible_values). These routines are basically bookkeeping, although the subtle details of how they work can have an enormous impact on performance.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
construct_candidates(int a[], int k, boardtype *board, int c[], int *ncandidates) {
    int x,y;
    int i;
    bool possible[DIMENSION+1];

    next_square(&x, &y, board);

    board->move[k].x = x;
    board->move[k].y = y;

    *ncandidates = 0;

    if ((x<0) && (y<0)) return; /* error condition, no moves possible */

    possible_values(x, y, board, possible);
    for (i=0; i<=DIMENSION; i++)
        if (possible[i] == TRUE) {
            c[*ncandidates] = i;
            *ncandidates = *ncandidates + 1;
        }
}

We must update our board data structure to reflect the effect of filling a candidate value into a square, as well as remove these changes should we backtrack away from this position. These updates are handled by make_move and unmake_move, both of which are called directly from backtrack:

1
2
3
4
5
6
7
make_move(int a[], int k, boardtype *board) {
    fill_square(board->move[k].x, board->move[k].y, a[k], board);
}

unmake_move(int a[], int k, boardtype *board) {
    free_square(board->move[k].x, board->move[k].y, board);
}

One important job for these board update routines is maintaining how many free squares remain on the board. A solution is found when there are no more free squares remaining to be filled:

1
2
3
4
5
6
is_a_solution(int a[], int k, boardtype *board) {
    if (board->freecount == 0)
        return (TRUE);
    else
        return (FALSE);
}

We print the configuration and turn of the backtrack search by setting off the global finished flag on finding a solution.

1
2
3
4
process_solution(int a[], int k, boardtype *board) {
    print_board(board);
    finished = TRUE;
}

Two reasonable ways to select the next square are:

  • Arbitrary Square Selection – Pick the first open square we encounter, possibly picking the first, the last, or a random open square.
  • Most Constrained Square Selection – Here, we check each of the open squares (i,j) to see how many number candidates remain for each – i.e., have not already been used in either row i, column j, or the sector containing (i,j). We pick the square with the fewest number of candidates.

Although both possibilities work correctly, the second option is much, much better. Often there will be open squares with only one remaining candidate.

Our final decision concerns the possible_values we all for each square. We have two possibles:

  • Local Count
  • Look ahead – But what if our current partial solution has some other open square where there are no candidates remaining under the local count criteria? There is no possible way to complete this partial solution into a full Sudoku grid. (?)

Successful pruning requires looking ahead to see when a solution is doomed to go nowhere, and backing off as soon as possible.

Heuristic Search Methods

Heuristic methods provide an alternate way to approach difficult combinatorial optimization problems. However, any algorithm searching all configurations is doomed to be impossible on large instances.

In particular, we will look at three different heuristic search methods: random sampling, gradient-descent search, and simulated annealing. The traveling salesman problem will be our ongoing example for comparing heuristics. All three methods have two common components:

  • Solution space representation – This is a complete yet concise description of the set of possible solutions for the problem. For traveling salesman, the solution space consists of (n-1)! elements — namely all possible circular permutations of the vertices. We need a data structure to represent each element of the solution space. For TSP, the candidate solutions can naturally be represented using an array S of n-1 vertices, where S_i defines the (i+1)st vertex on the tour starting from v_1.
  • Cost function – Search methods need a cost or evaluation function to access the quality of each element of the solution space. Our search heuristic identifies the element with the best possible score – either highest or lowest depending upon the nature of the problem. For TSP, the cost function for evaluating a given candidate solution S should just sum up the cost involved, namely the weight of all edges (S_i, S_{i+1}), where S_{n+1} denotes v_1.

Random Sampling

The simplest method to search in a solution space uses random sampling. It is also called the Monte Carlo method. We repeatedly construct random solutions and evaluate them, stopping as soon as we get a good enough solution, or (more likely) when we are tired of waiting. We report the best solution found over the course of our sampling.

True random sampling requires that we are able to select elements form the solution space uniformly at random. This means that each of the elements of the solution space must have an equal probability of being the next candidate selected. Such sampling can be a subtle problem.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
random_sampling(tsp_instance *t, int nsamples, tsp_solution *bestsol) {
    tsp_solution s;   /* current tsp solution */
    double best_cost; /* best cost so far */
    double cost_now;  /* current cost */
    int i; /* counter */

    initialize_solution(t->n, &s);
    best_cost = solution_cost(&s, t);
    copy_solution(&s, bestsol);

    for (i=1; i<=nsamples; i++) {
        random_solution(&s);
        cost_now = solution_cost(&s, t);
        if (cost_now < best_cost) {
            best_cost = cost_now;
            copy_solution(&s, bestsol);
        }
    }
}

When might random sampling do well?

  • When there are a high proportion of acceptable solutions. Finding prime numbers is a domain where a random search proves successful. Generating large random prime numbers for keys is an important aspect of cryptogrpahic systems such as RSA. Roughly one out of every \ln n integers are prime, so only a modest number of samples need to be taken to discover primes that are several hundred digits long.
  • When there is no coherence in the solution space – Random sampling is the right thing to do when there is no sense of when we are getting closer to a solution.

Consider again the problem of hunting for a large prime number. Prime are scattered quite arbitrarily among the integers. Random sampling is as good as anything else.

How does random sampling do on TSP? Pretty lousy.

Most problems we encounter, like TSP, have relatively few good solutions but a highly coherent solution space. More powerful heuristic search algorithms are required to deal effectively with such problems.

Stop and Think: Picking the Pair

Problem: We need an efficient and unbiased way to generate random pairs of vertices to perform random vertex swaps. Propose an efficient algorithm to generate elements from the n \choose k unordered pairs on \{1, \dots, n\} uniformly at random.

Local Search

A local search employs local neighborhood around every element in the solution space. Think of each element x in the solution space as a vertex, with a directed edge (x, y) to every candidate solution y that is a neighbor of x. Our search proceeds from x to the most promising candidate in x’s neighborhood.

We certainly do not want to explicitly construct this neighborhood graph for any sizable solution space. We are conducting a heuristic search precisely because we cannot hope to do these many operations in a reasonable amount of time.

Instead, we want a general transition mechanism that takes us to the next solution by slightly modifying the current one. Typical transition mechanisms include swapping a random pair of items of changing (inserting or deleting) a single item in the solution.

The most obvious transition mechanism for TSP would be to swap the current tour positions of a random pair of vertices S_i and S_j.

A local search heuristic starts from an arbitrary element of the solution space, and then scans the neighborhood looking for a favorable transition to take. For TSP, this would be transition, which lowers the cost of the tour. In a hill-climbing procedure, we try to find the top of a mountain (or alternately, the lowest point in a ditch) by starting at some arbitrary point and taking any step that leads in the direction we want to travel. We repeat until we have reached a point where all our neighbors lead us in the wrong direction.

Hill-climbing and closely related heuristics such as greedy search or gradient descent search are great at finding local optima quickly, but often fail to find the globally best solution.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
hill_climbing(tsp_instance *t, tsp_solution *s) {
    double cost;
    double delta;
    int i, j;
    bool stuck;
    double transition();

    initialize_solution(t->n, s);
    random_solution(s);
    cost = solution_cost(s, t);

    do {
        stuck = TRUE;
        for (i=1; i<t->n; i++)
            for (j=i+1; j<=t->n; j++) {
                delta = transition(s,t,i,j);
                if (delta < 0) {
                    stuck = FALSE;
                    cost = cost + delta;
                } else
                    transistion(s,t,j,i);
            }
    } while (!stuck);
}

When does local search do well?

  • When there is great coherence in the solution space – Hill climbing is at its best when the solution space is convex. In other words, it consists of exactly one hill.
  • Whenever the cost of incremental evaluation is much cheaper than global evaluation – It cost \Theta(n) to evaluate the cost of an arbitrary n-vertex candidate TSP solution, because we must total the cost of each edge in the circular permutation describing the tour. Once that is found, however, the cost of the tour after swapping a given pair of vertices can be determined in constant time.

If we are given a very large value of n and a very small budget of how much time we can spend searching, we are better off using it to do several incremental evaluations than a few random samples, even if we are looking for a needle in a haystack.

The primary drawback of a local search is that soon there isn’t anything left for us to do as we find the local optimum.

How does local search do on TSP? Much better than random sampling for a similar amount of time.

Simulated Annealing

Simulated annealing is a heuristic search procedure that allows occasional transitions leading to more expensive (and hence inferior) solutions. This may not sound like process, but it helps keep our search from getting stuck in local optima.

The inspiration for simulated annealing comes from the physical process of cooling molten materials down to the solid state. In the thermodynamic theory, the energy state of a system is described by the energy state of each particle constituting it. A particle’s energy state jumps about randomly, with such transitions governed by the temperature of the system. In particular, the transition probability P(e_i, e_j, T) from energy e_i to e_j at temperature T is given by

P(e_i, e_j, T) = e^{(e_i – e_j)/(k_BT)}

where k_B is a constant – called Boltzmann’s constant.

What does this formula mean? Consider the value of the exponent under different conditions. The probability of moving from a high-energy state to a lower-energy state is very high. But, there is still a nonzero probability of accepting a transition into a high-energy state, with such small jumps much more likely than big ones. The higher the temperature, the more likely energy jumps will occur.

Through random transitions generated according to the given probability distribution, we can mimic the physics to solve arbitrary combinatorial optimization problems.

Applications of Simulated Annealing

We provide several examples to demonstrate how these components can lead to elegant simulated annealing solutions for real combinatorial search problems.

Maximum Cut

Independent Set

An “independent set” of a graph G is a subset of vertices S such that there is no edge with both endpoints in S. Finding large independent sets arises in dispersion problems associated with facility location and coding theory.

The natural state space for a simulated annealing solution would be all 2^n subsets of the vertices, represented as a bit vector. As with maximum cut, a simple transition mechanism would add or delete one vertex from S.

One natural cost function for subset S might be 0 if S contains an edge, and |S| if it is indeed an independent set. This function ensures taht we work towards an independent set at all times.

Circuit Board Placement

Parallel Algorithms

Other Heuristic Search Methods

Popular methods include genetic algorithms, neural networks, and ant colony optimization.


Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.