CIS587: LOGIC, PART 3

Proof as Search
Some Proof Strategies
SLD-Resolution
Pure Prolog
Prolog
Forward-Chaining and Backward-Chaining
Some Review Thoughts

Proof as Search

Up to now we have exhibited proofs as if they were found miraculously. We gave formulae and showed proofs of the intended results. We did not exhibit how the proofs were derived.
We now examine how proofs can actually be found. In so doing we stress the close ties between theorem proving and search.

A General Proof Procedure

We use binary resolution [we represent clauses as sets] and we represent the proof as a tree, the Proof Tree. In this tree nodes represent clauses. We start with two disjoint sets of clauses INITIAL and OTHERS.
  1. We create a node START and introduce a hyperarc from START to new nodes, each representing a distinct element of INITIAL. We put all these new nodes in a set OPEN. These nodes are called AND Nodes.
  2. If OPEN is empty, we terminate. Otherwise we remove an element N from OPEN using a selection function SELECT.
  3. If N is an AND node, we connect N with a single hyperarc to new nodes N1 ... Nm, one for each literal in the clause C represented at N. These nodes are also labeled with C and are called OR Nodes. All of these nodes are placed into OPEN.
    [NOTE 1: In the case that C consists of a single literal, we can just say that N is now an OR node.]
    [NOTE 2: One may decide not to create all the nodes N1 .. Nm in a single action and instead to choose one of the literals of C and to create a node Ni to represent C with this choice of literal. Ni is inserted into OPEN. N also goes back into OPEN if not all of its literals have been considered. The rule used to choose the literal in C is called a Selection Rule]
    Repeat from 2.
  4. If N is an OR node, say, corresponding to the ith literal of a clause C, we consider all the possible ways of applying binary resolution between C and clauses from the set OTHERS, resolving on the ith literal of C.
    Let D1 .. Dp be the resulting clauses. We represent each of these clauses Dj by an AND node N(Dj) as in 1. above. We put an arc from N to N(Dj). We set OPEN to NEXT-OPEN(OPEN, C, {D1, .., Dp}). We set OTHERS to NEXT-OTHERS(OTHERS, C, {D1, .., Dp}). If in the proof tree we have, starting at START, a hyperpath (i.e. a path that may include hyperarcs) whose leaves have all label {}, we terminate with success.
    [NOTE 3: One may decide, instead of resolving C on its ith literal with all possible elements of OTHERS, to select one compatible element of OTHERS and to resolve C with it, putting this resolvent and C back into OPEN. We would call a rule for selecting an element of OTHERS a Search Rule.]
    Repeat from 2.
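The core operation of steps 3 and 4, binary resolution between clauses represented as sets of literals, can be sketched as follows (a propositional sketch; the function names are mine, and full first-order resolution would also need unification):

```python
def negate(lit):
    """Complement a literal: P <-> -P."""
    return lit[1:] if lit.startswith("-") else "-" + lit

def resolvents(c1, c2):
    """All binary resolvents of two clauses (frozensets of literals)."""
    out = []
    for lit in c1:
        if negate(lit) in c2:
            out.append((c1 - {lit}) | (c2 - {negate(lit)}))
    return out

# Resolving NOT P OR NOT R with P yields NOT R:
print(resolvents(frozenset({"-P", "-R"}), frozenset({"P"})))
```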

In this proof procedure we have left indetermined: the choice of the sets INITIAL and OTHERS, the selection function SELECT, the functions NEXT-OPEN and NEXT-OTHERS, and the Selection and Search Rules.

There is no guarantee that for every choice of INITIAL, OTHERS, SELECT, NEXT-OPEN and NEXT-OTHERS the resulting theorem prover will be "complete", i.e. that everything that is provable will be provable with this theorem prover.

Example 1

Suppose that we want to prove (we already encountered this problem in the Logic 2 notes) that
  1. NOT P(x) OR NOT R(x)
    is inconsistent with the set of clauses:
  2. NOT S(x) OR H(x)
  3. NOT S(x) OR R(x)
  4. S(b)
  5. P(b)

The following are possible selections for the indeterminates:

INITIAL: {1.}, that is, it consists of the clauses representing the negation of the goal.
OTHERS: {2. 3. 4. 5.}, that is, it consists of the non-logical axioms.
SELECT: We use OPEN as a FIFO queue, i.e. we do breadth-first search.
NEXT-OPEN: It sets NEXT-OPEN(OPEN, C, {D1, .., Dp}) to the union of OPEN, {C}, and {D1, .., Dp}.
NEXT-OTHERS: It leaves OTHERS unchanged
The Proof Tree is then (we underline AND nodes; all outgoing arcs of an AND node are assumed to form a single hyperarc)
			START
			  |
			  |
		    NOT P(x) OR NOT R(x)
		     /---------------\
		   /                  \
	  NOT P(x) OR NOT R(x)   NOT P(x) OR NOT R(x)
	         |		      |
		 |		      |
	   NOT R(b)		 NOT P(x) OR NOT S(x)
		 |                /----------- \
		 |	         /              \
	   NOT S(b)	     NOT S(b)	     NOT P(b)
		 |              |		 |
		 |              |                |
		{}             {}               {}
At an OR node we apply resolution between the current clause at its selected literal and all the compatible elements of OTHERS.
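Since b is the only constant involved, the left branch of this tree can be replayed with ground clauses. A minimal sketch (clauses as frozensets of ground literals; the helper names are mine):

```python
def negate(lit):
    """Complement a literal: P(b) <-> -P(b)."""
    return lit[1:] if lit.startswith("-") else "-" + lit

def resolve(c1, c2, lit):
    """Resolve clauses c1 and c2 (frozensets) on literal lit of c1."""
    assert lit in c1 and negate(lit) in c2
    return (c1 - {lit}) | (c2 - {negate(lit)})

goal = frozenset({"-P(b)", "-R(b)"})   # clause 1, grounded at b
c3   = frozenset({"-S(b)", "R(b)"})    # clause 3, grounded at b
c4   = frozenset({"S(b)"})             # clause 4
c5   = frozenset({"P(b)"})             # clause 5

step1 = resolve(goal, c5, "-P(b)")     # NOT R(b)
step2 = resolve(step1, c3, "-R(b)")    # NOT S(b)
step3 = resolve(step2, c4, "-S(b)")    # {} : refutation found
print(step1, step2, step3)
```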

Example 2

The following example uses Horn clauses in propositional logic. I use the notation that is common in Prolog: we represent the implication:

A1 AND A2 AND .. AND An IMPLIES A

as

A <= A1, A2, .. , An

Our problem is:
  1. A, this is what we want to prove
  2. A <= B, C
  3. A <= D
  4. B <= D, E
  5. B <= F
  6. C <=
  7. C <= D, F
  8. D <=
  9. F <=
We now partially represent the proof tree. We do not apply breadth-first search here because we want to see how great the impact of the choice of SELECT can be.
				  A
			     /         \
                           /             \
                         /                 \
                     B, C                   D
                  /------\                  |
                /          \                |
              /              \              {}
            B,C              B,C
           /   \             |   \
         /      \            |     \
      C,D,E      C,F         B      B,D,F
     ............................................

You can keep on expanding the tree and see how depth-first search would generate a large tree, while breadth-first search rapidly derives D from A, and {} from D.
In other circumstances other strategies would be appropriate, as we see below.
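To see the breadth-first behavior concretely, here is a minimal sketch of breadth-first proof search for the propositional Horn program of Example 2 (goals are tuples of atoms; the encoding and names are mine):

```python
from collections import deque

# The program of Example 2: (head, body) pairs, e.g. ("A", ("B", "C")) is A <= B, C.
program = [
    ("A", ("B", "C")), ("A", ("D",)),
    ("B", ("D", "E")), ("B", ("F",)),
    ("C", ()),         ("C", ("D", "F")),
    ("D", ()),         ("F", ()),
]

def bfs_prove(goal):
    """Breadth-first SLD search; returns True iff the goal is provable."""
    frontier, seen = deque([goal]), {goal}
    while frontier:
        g = frontier.popleft()
        if not g:                      # empty goal {} reached: success
            return True
        head, rest = g[0], g[1:]       # select the leftmost atom
        for h, body in program:
            if h == head:
                new = body + rest      # replace the atom by the clause body
                if new not in seen:
                    seen.add(new)
                    frontier.append(new)
    return False

print(bfs_prove(("A",)))   # finds A via A <= D and D <= in two steps
```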

Some Proof Strategies

Since the early sixties people have looked for strategies that simplify the search problem in theorem proving. Here are some of the strategies that have been suggested.

Unit Preference

When we apply resolution, if one of the premises is a unit clause (a clause with a single literal), the resolvent has one literal less than the larger of the premises, thus getting closer to the desired goal {}. It therefore appears desirable to use resolution between clauses one of which is a unit clause. This unit preference is applied both when selecting from the OPEN set (i.e. at the leaves of the tree) and when, at an OR node, we select an element of OTHERS to resolve with.

Set of Support Strategy

When we use the Set of Support Strategy we have:
NEXT-OPEN(OPEN, C, {D1, .., Dp}) is the union of OPEN, {C}, and {D1,..,Dp}
NEXT-OTHERS(OTHERS, C, {D1,..,Dp}) is OTHERS.

In simple words, the set of support strategy uses the OPEN set as its set of support. Each application of resolution has as a premise an element of the set of support and adds that premise and the resolvent to the set of support.

Usually the INITIAL set (i.e. the initial set of support) consists of all the clauses obtained by negating the intended "theorem".
The set of support strategy is complete if the set OTHERS is satisfiable, which is the usual case when OTHERS consists of the non-logical axioms and the set of support consists of the clauses for the negation of the intended theorem.

Input Resolution

In Input Resolution each resolution step has as one of its premises a clause from the original (input) set of clauses. In other words, in each resolution one of the premises is one of the original clauses.

In general Input Resolution is incomplete, as can be seen with the following unsatisfiable set of clauses (from Nilsson), from which we are unable to derive FALSE using Input Resolution:

  P OR Q,   P OR NOT Q,   NOT P OR Q,   NOT P OR NOT Q

[You can use this set of clauses as both OPEN and OTHERS.]

Input resolution is complete in the case that all the clauses are Horn clauses.
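This incompleteness can be checked mechanically. The sketch below (my encoding) closes the four-clause set {P OR Q, P OR NOT Q, NOT P OR Q, NOT P OR NOT Q} under input resolution and under unrestricted resolution: the first closure never contains {}, the second does.

```python
def negate(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit

def resolvents(c1, c2):
    """All binary resolvents of two clauses (frozensets of literals)."""
    return [(c1 - {l}) | (c2 - {negate(l)}) for l in c1 if negate(l) in c2]

inputs = [frozenset(s) for s in ({"P", "Q"}, {"P", "-Q"}, {"-P", "Q"}, {"-P", "-Q"})]

def closure(input_only):
    """All clauses derivable by resolution; if input_only, one premise
    of every step must be an input clause.  Terminates: there are only
    16 possible clauses over the literals P, -P, Q, -Q."""
    derived = set(inputs)
    while True:
        new = set()
        for c in derived:
            for d in (inputs if input_only else derived):
                new.update(resolvents(c, d))
        if new <= derived:
            return derived
        derived |= new

print(frozenset() in closure(input_only=True))    # input resolution never refutes
print(frozenset() in closure(input_only=False))   # unrestricted resolution does
```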

Linear Resolution

Linear Resolution, also known as the Ancestry-Filtered Form Strategy, is a generalization of Input Resolution. The generalization is that now, in each resolution, one of the premises is either one of the original clauses or an ancestor in the proof tree of the other premise.

Linear Resolution is complete.

SLD-Resolution

Selected Linear Resolution for Definite Clauses, or simply SLD-Resolution, is the basis for a proof procedure for theorems that are in the form of conjunctions of positive literals (this conjunction is called a Goal) given non-logical axioms that are definite clauses.

More precisely: given a goal G (a conjunction of positive literals) and a program P of definite clauses, an SLD-derivation is a sequence of goals G0 = G, G1, .., Gn where Gi+1 is obtained from Gi by selecting a literal of Gi, choosing a (renamed) clause of P whose head unifies with the selected literal via an mgu si+1, and replacing the literal with the body of the clause, applying si+1 to the resulting goal. The derivation is an SLD-refutation if Gn is the empty goal {}.

Theorem: SLD-Resolution is Sound and Complete for conjunctive positive goals and definite clauses.

An important consequence of this theorem is that it remains true no matter which selection rule we use to select literals in goals. Thus we can select literals as we please, for instance left-to-right. Another important aspect is that the substitution s = s1.s2...sn gives us a method for finding the individuals that satisfy the goal in the structures that satisfy the clauses in P.

Nothing has been said in SLD-Resolution about what rule should be used to select the clause of P to resolve with the current literal of the current goal (such a rule is called a Search rule).

Example (from Van Le)

Suppose that we are given the following clauses: and we want to find out a woman that likes's someone's husband.
The goal can be stated as: The SLD-Derivation is:
  ((WOMAN(x),LIKES(x,HUSBAND(y))), WOMAN(MOTHER(v)), [MOTHER(v)/x])
  ((LIKES(MOTHER(v),HUSBAND(y))), GOOD(z) < = LIKES(MOTHER(z), z), 
                                 [HUSBAND(y)/v, HUSBAND(y)/z])
  ((GOOD(HUSBAND(y))), GOOD(HUSBAND(ANN)), [ANN/y])
  {}
and the substitution is:

[MOTHER(HUSBAND(ANN))/x, ANN/y]
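The composed answer substitution can be checked mechanically. A minimal sketch (terms as nested tuples, variables as lowercase strings; the encoding is mine):

```python
def apply_sub(t, s):
    """Apply substitution s (a dict var -> term) to term t."""
    if isinstance(t, tuple):                      # compound term (functor, args...)
        return (t[0],) + tuple(apply_sub(a, s) for a in t[1:])
    return s.get(t, t)                            # variable or constant

s1 = {"x": ("MOTHER", "v")}                       # from the first step
s2 = {"v": ("HUSBAND", "y"), "z": ("HUSBAND", "y")}  # second step
s3 = {"y": "ANN"}                                 # third step

t = "x"
for s in (s1, s2, s3):        # composing s1.s2.s3 = applying them in sequence
    t = apply_sub(t, s)
print(t)  # ('MOTHER', ('HUSBAND', 'ANN'))
```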

Pure Prolog

We are now ready to deal with (pure) Prolog, the major Logic Programming Language. It is obtained from a variation of SLD-Resolution on Horn clauses with the following rules and conventions: the literals of a goal are selected left-to-right; the clauses of the program are tried in the order in which they are written; and the search is depth-first with backtracking. These rules make for very rapid processing. Unfortunately:

The Pure Prolog Inference Procedure is Sound but not Complete

This can be seen by example. Given, for instance, a program of the form (the clauses here are illustrative):
  1. P(x,z) <= P(x,y), P(y,z)
  2. P(x,y) <= Q(x,y)
  3. Q(A,B) <=
  4. Q(B,C) <=
we are unable to derive in Prolog that P(A,C), because clause 1 is always tried first and we get caught in an ever deepening depth-first search.

The derivation mechanism of Pure Prolog has a very simple form that can be described by the following flow chart due to Colmerauer:

Interpreter for Pure Prolog

Notational conventions:
  • i: indexes the goals Gi in the derivation (the current depth of the search)
  • Ki: indexes the clauses in the given program (i.e. set of clauses) P
  • Max: the number of clauses in P
  • h(G): the first literal of the goal G
  • t(G): the rest of goal G, i.e. G without its first literal
  • clause(Ki): the Ki-th clause of the program
  • 				START
    				  |
    				  v
    			i := 0; Gi := G; Ki := 0;
    				  |
    				  v	 No
          +------------------------>Gi = {}? ------------------>+
          ^				  |			    |
          |				  |Yes			    |
          |				  v			    |
     Gi+1 := (t(C),t(Gi)).Si;   OUTPUT ANSWER		    |
        i := i+1;			  |			    |
       Ki := 0;			  v      No		    |
          ^			 +----->i = 0?--------> i := i-1    |
          |			 ^	  |               |	    |
          |			 |	  | Yes           |	    |
          |			 |	  v               |	    |
          |			 |	END               |	    |
          |                  | Yes			  v	    |
          |			 +<-----Ki=Max? <---------+	    |
          |                         ^    |			    |
          |				|    v No		    v
         Si = FAIL?---------------->+    +------------------>Ki := Ki + 1;
            ^						    |
            |						    v
    	+<--------------- C := COPY(Clause(Ki)); <----------+
    			  Si := MGU(h(Gi), h(C));
    
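The same strategy, written recursively rather than with the explicit counters i and Ki of the flow chart, can be sketched for the propositional case, using the Horn program of Example 2 (the encoding is mine):

```python
# The Horn program of Example 2, as (head, body) pairs in textual order.
program = [
    ("A", ("B", "C")), ("A", ("D",)),
    ("B", ("D", "E")), ("B", ("F",)),
    ("C", ()),         ("C", ("D", "F")),
    ("D", ()),         ("F", ()),
]

def solve(goal):
    """Depth-first, left-to-right, clause-order search with backtracking.
    Like pure Prolog, it may loop forever on left-recursive programs."""
    if not goal:                         # Gi = {}: success
        return True
    head, rest = goal[0], goal[1:]       # select the leftmost literal h(Gi)
    for h, body in program:              # try clauses in order (the Ki loop)
        if h == head and solve(body + rest):
            return True
    return False                         # all clauses failed: backtrack

print(solve(("A",)))  # True
```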

Prolog

Real Prolog systems differ from pure Prolog for a number of reasons, many of which have to do with the ability in Prolog to modify the control (search) strategy so as to achieve efficient programs. In fact there is a dictum due to Kowalski:

Logic + Control = Algorithm

But the reason that is important to us now is that Prolog uses a unification procedure which does not enforce the Occur Check. This has the unfortunate consequence that, while Prolog may give origin to efficient programs,

Prolog is not Sound
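The point can be made concrete with a small unification sketch (my encoding: variables are lowercase strings, compound terms are tuples). Without the occur check, x unifies with F(x), producing a "solution" that no finite term satisfies.

```python
def is_var(t):
    return isinstance(t, str) and t[:1].islower()

def walk(t, s):
    """Follow variable bindings in substitution s."""
    while is_var(t) and t in s:
        t = s[t]
    return t

def occurs(v, t, s):
    """Does variable v occur in term t under substitution s?"""
    t = walk(t, s)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, s) for a in t[1:])

def unify(a, b, s, occur_check=True):
    """Return an mgu extending s, or None if unification fails."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        if occur_check and occurs(a, b, s):
            return None                    # the occur check rejects x = F(x)
        return {**s, a: b}
    if is_var(b):
        return unify(b, a, s, occur_check)
    if isinstance(a, tuple) and isinstance(b, tuple) \
            and a[0] == b[0] and len(a) == len(b):
        for x, y in zip(a[1:], b[1:]):
            s = unify(x, y, s, occur_check)
            if s is None:
                return None
        return s
    return None

print(unify("x", ("F", "x"), {}, occur_check=True))    # None: correctly fails
print(unify("x", ("F", "x"), {}, occur_check=False))   # {'x': ('F', 'x')}: unsound
```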

Forward-Chaining and Backward-Chaining

We have managed to talk for a long time about theorem proving without ever introducing two important concepts, Forward Chaining and Backward Chaining. In forward chaining we reason from the known facts and axioms towards the intended goal; in backward chaining we start from the goal and reduce it to subgoals until we reach known facts, as SLD-Resolution does.

Some Review Thoughts

Logic has provided a good avenue for examining how we can represent knowledge and how we can reason with it. We first established a representation language, First-Order Logic, and understood the fundamental semantic concepts of Satisfiability, Truth, and Logical Consequence. Then we showed how we can use Deductions, Theorems, and Consistency to express through symbol manipulation many of the semantic concepts, and to express Reasoning. We understood fundamental concepts like Soundness and Completeness for a logical system and showed that Resolution was the only inference rule we needed to achieve both soundness and completeness.

Once soundness and completeness are established, it becomes a scramble to come up with proof procedures that are efficient. Restrictions are introduced on the form of clauses that can be used (Horn clauses are particularly appealing). Restrictions are introduced in the way that proofs can be built. In the search for efficiency at times Completeness is given up, as in pure Prolog, and at times even Soundness is abandoned, as in Prolog.

Now that we have a "standard" way to represent knowledge and to reason in First Order Logic, one thing we have not said about first order logic deserves mention: it is monotonic. This means that if we can derive a formula A from a set of formulae F, then for any set of formulae G that contains F we can also derive A from G.

Do you think that is how it happens in life?

ingargiola@cis.temple.edu