A SEARCH PROBLEM consists of:
A search problem can be represented by a STATE-SPACE GRAPH with as nodes the states, with as arcs the actions, and specially marking the initial and goal states. The state-space graph usually is generated starting from the initial state adding gradually nodes and arcs.
A SEARCH STRATEGY is a prescription on how to make choices during a search. A Search Algorithm is the implementation of a search strategy.
As a Search Strategy progresses in examining a State-Space graph we say that a node is
We distinguish search strategies on the basis of their knowledge of nodes they have not yet explored. If they have no knowledge of such nodes we say that the search is Uninformed or Blind. If they have such knowledge we say they are Informed or Heuristic. Examples of Blind Search Algorithsm are:
An HEURISTIC EVALUATION FUNCTION is a function f that, applied to a node n, returns a non-negative finite number f(n). It is assumed that the smaller the number, the more likely it is that the node is on the optimal path from the start to a goal node. The value f(n) may be different at different times during a search, but the value can only decrement, and decrement by some non-infinitesimal amount.
Most search algorithms examine states gradually, in terms of their depth or their heuristic cost. Such search algorithms are called Chronological Search algorithms. Other search algorithms use information about the internal structure of each state to determine what state to consider next. Such algorithms are called Non-Chronological. They are used for instance in scheduling and use information about failed schedules to come up with new candidate schedules. Of course, non-chronological algorithms are more complex than their chronological counterparts. A related concept to non-chronological search algorithms is the concept of MEANS-END ANALYSIS, whereby by examine operators on the basis of their ability to bring us closer to the goal. Means-end analysis usually takes place when we have knowledge of the internal structure of states and of the effect of operators on such structure.
Most search algorithms use the GENERATE-AND-TEST paradigm, that is, we generate new states until either a goal state is found or we run out of states.
Most state space search algorithms use two lists, OPEN and CLOSED. Initially OPEN is a set consisting of the initial state, and CLOSED is {}. OPEN consists of generated nodes that we intend to consider during the search. We say a node is OPENED after it is taken out of OPEN.
After a node is opened, it is possibly expanded and placed into CLOSED. The search algorithm terminates when a goal state is extracted from OPEN (success), or when we try to extract a node from OPEN and OPEN is empty (failure), or, in some cases, when we generate a goal state (success).
BF is in a hurry to find a goal state. In this hurry it may miss an optimal goal state. For example, if n in step 5. has two successors, both goal states, one better than the other, there is no guaranty that BF will return with as solution the best successor.
The BF* algorithm is obtained from BF modified to delay termination as follows:
In what follows we consider BF* and some of its variations.
FACT: If the State Space Graph is finite then BF* will terminate.
Proof: Since the graph is finite, the only possible danger is in step 6.3. when n' is in CLOSED. The danger is that the node may move from CLOSED to OPEN and back again an infinite number of times. But this is not possible since each time the node moves from CLOSED to OPEN its value is decremented by some non infinitesimal amount thus its value would be become eventually negative.
FACT: If the State Space Graph has a solution (a finite path from s to a goal node t), then all nodes opened by BF* will have a value smaller or equal than some fixed value F.
Proof: Since there is a finite path from s to t, F = max{f(n), where n is a node on that path} is well defined and finite. All the nodes n' considered by BF* will all have f(n') <= F.
FACT: Even if the State Space Graph has a solution (a finite path from s to a goal node t), BF* may not terminate.
Proof: We have already seen that all nodes n' expanded by BF* will all have f(n') <= F for some finite number F. Unfortunately the set of such nodes can be infinite when the graph is infinite (by example).
The following definitions apply whenever we define the heuristic evaluation function f in terms of the cost of paths (Path Based Evaluation Function):
The A Search Algorithm is the BF* algorithm when using a path-based evaluation function.
FACT: If the State Space Graph has a solution (a finite path from s to a goal node t), then A will terminate (i.e. A is Complete), but the solution may not be optimal (i.e. A is not Admissible).
Proof: As in the case of the BF* algorithm, we recognise that there is a number F such that all nodes removed from OPEN by the algorithm will have value at most F. Thus all such nodes will be at a finite distance from s (since the cost of infinite paths would be greater than F). The problem is that F may be larger than the cost of the optimal path, thus from OPEN may be extracted a non optimal goal state.
An heuristic function h is ADMISSIBLE iff for all n, h(n) <= h*(n).
The A* Search algorithm is the A algorithm when using an admissible heuristic function.
FACT: In A*, for any node n, n is extracted from OPEN iff f(n) <= f*(s).
Proof: => (If part)
Consider any node m in the optimal path in the state search graph: f(m) = g(m)+h(m) = g*(m)+h(m) <= g*(m)+h*(m) = f*(m) = f*(s). Since some element of the optimal path will always be in OPEN, the result follows.
<= (Only if part)
Consider a node m such f(m)<=f*(s). Since f is computed for a node only if m has been or is going to be in OPEN, m is at some point in OPEN and will be extracted since it has a better value than f*(s).
FACT: A* is Admissible.
Proof: Since by the previous result for each opened m we have f(m) <= f*(s), it follows that if we open a goal state then it is an optimal goal state.
FACT: In A* a node may be moved from CLOSED back to OPEN.
Proof: By example. Consider the graph with the nodes A, B, C, D, where A is the start node and D is the goal node. It has the arcs (A, B), (A, C), (B, C), and (C, D) of length respectively 2, 5, 2, and 5. Assuming that h(B)=7 and h(C)=3 (the value of h for A and D is immaterial), then the node C will be opened twice.
The following table shows the content of the OPEN and CLOSED list during the algorithm. Next to each node we keep its score.
============================================================== OPEN | {A} | {C(8),B(9)} | {B(9),D(10)} | {C(7),D(10)} ------------------------------------------------------------- CLOSED | {} | {A} | {A,C} | {A,B} ==============================================================
An heuristic function h1 is MORE INFORMED than heuristic function h2 iff for all n, h1(n) >= h2(n).
FACT: If A1 and A2 are two versions of the A* algorithm using respectively heuristic functions h1 and h2, where h1 is more informed than h2, then A1 dominates A2 (i.e. every node opened by A1 is also opened by A2).
Proof: Consider a node n opened by A1. We have f*(s) >= f1(n) = g(n)+h1(n) >= g(n)+h2(n) = f2(n). We have already seen that A* opens all nodes where f(n) < f*(s). Thus n will be opened by A2.
An heuristic function h is MONOTONE (or satisfies the TRIANGULAR INEQUALITY) if for any arc (n, n') we have h(n) <= h(n')+c(n, n')
An heuristic function h is CONSISTENT if for any two node n and n', we have: h(n) <= k(n, n')+h(n')
FACT: h is monotone iff h is consistent
Proof: Clearly consistency implies monotonicity, thus we only need to worry about monotonicity implying consistency. We need to consider only pairs n and n' where there is a path from n to n' (for other pairs the consistency inequality is trivially true). Let n1 n2 n3 .. nm , with n=n1, n'=nm, be the optimal path from n to n'. We have,
h(n) = h(n1) <= c(n1,n2)+h(n2) <= c(n1,n2)+c(n2,n3)+h(n3) <= <= c(n1,n2)+c(n2,n3)+..+c(nm-1,nm)+h(nm) = K(n,n')+h(nm).
FACT: Any consistent heuristic h is admissible.
Proof: We need to show that for all n, h(n) <= h*(n). Consider the goal optimal state t reacheable from n. Then, because of consistency, h(n) <= K(n,t)+h(t) = K(n,t) = h*(n).
FACT: Not every admissible heuristic h is consistent.
Proof: By example, using the same example introduced earlier to show that a node may move back from CLOSED to OPEN.
A*-M is A* when using a monotone heuristic.
FACT: For any node n that is opened by A*-M we have g(n) = g*(n).
Proof: We proceed by contradiction. Assume that for some opened node n, g(n) > g*(n), and n is the first opened node where this condition holds true. Thus there is an optimal path some of whose nodes have not been opened (otherwise g(n) would have been computed along that path). Say n' is one such node. Then f(n') = g(n')+h(n') = g*(n')+h(n') <= g*(n')+K(n',n)+h(n) Since we are considering an optimal path to n, g*(n')+K(n',n) is g*(n). Thus f(n') <= g*(n')+K(n',n)+h(n) = g*(n)+h(n) < g(n)+h(n) = f(n). But then n' is not open and with lower f than n, thus n' would be chosen before n. Contradiction.
FACT: In the sequence of nodes n1 n2 .. opened by the A*-M algorithm, we have f(n1) <= f(n2) <= ...
Proof: Suppose for all the nodes expanded up to i-1 we have f(s) = f(n1) <= f(n2) <= f(n3) <= .. <= f(n(i-1)) The next node opened, ni, is the successor of one of those nodes, say nj. Thus f(ni) = g(ni)+h(ni) = g(nj)+c(nj,ni)+h(ni) >= g(nj)+h(nj) = f(nj).
FACT: A*-M will never move a node from CLOSED to OPEN.
Proof: After the first time, any successive visit to a node n will have the same h value and an increasing g value. Thus the value of f at n will not decrease, thus n will not move from CLOSED back to OPEN.
BF ^ | Delayed Termination | BF* ^ | f*(n)=g*(n)+h*(n) and g*(n) is the cost of best path from s to n | | Path-Based Evaluation Function A ^ | h(n) <= h*(n) Admissible Heuristic | A* ^ | h(n') <= h(n)+c(n,n') Monotone Heuristic | A*-M
Nilsson,N.J.: Principles of Artificial Intelligence Tioga Publishing Co. 1980 Pages 72-84 Pearl,J.: Heuristics: Intelligent search strategies for computer problem solving Addison-Wesley, 1984 Pages 73-85
ingargiola@cis.temple.edu