CIS587: LOGIC, PART 3

Proof as Search
Some Proof Strategies
SLD-Resolution
Pure Prolog
Prolog
Forward-Chaining and Backward-Chaining
Some Review Thoughts

Proof as Search

Up to now we have exhibited proofs as if they were found miraculously. We gave formulae and showed proofs of the intended results. We did not exhibit how the proofs were derived.
We now examine how proofs can actually be found. In so doing we stress the close ties between theorem proving and search.

A General Proof Procedure

We use binary resolution [we represent clauses as sets] and we represent the proof as a tree, the Proof Tree. In this tree nodes represent clauses. We start with two disjoint sets of clauses INITIAL and OTHERS.
  1. We create a node START and introduce a hyperarc from START to new nodes, each representing a distinct element of INITIAL. We put all these new nodes in a set OPEN. These nodes are called AND Nodes.
  2. If OPEN is empty, we terminate. Otherwise we remove an element N from OPEN using a selection function SELECT.
  3. If N is an AND node, we connect N with a single hyperarc to new nodes N1 ... Nm, one for each literal in the clause C represented at N. These nodes are also labeled with C and are called OR Nodes. All of these nodes are placed into OPEN.
    [NOTE 1: In the case that C consists of a single literal, we can just say that N is now an OR node.]
    [NOTE 2: One may decide not to create all the nodes N1 .. Nm in a single action and instead to choose one of the literals of C and to create a node Ni to represent C with this choice of literal. Ni is inserted into OPEN. N also goes back into OPEN if not all of its literals have been considered. The rule used to choose the literal in C is called a Selection Rule]
    Repeat from 2.
  4. If N is an OR node, say, corresponding to the ith literal of a clause C, we consider all the possible ways of applying binary resolution between C and clauses from the set OTHERS, resolving on the ith literal of C.
    Let D1 .. Dp be the resulting clauses. We represent each of these clauses Dj by an AND node N(Dj) as in 1. above. We put an arc from N to N(Dj). We set OPEN to NEXT-OPEN(OPEN, C, {D1, .., Dp}). We set OTHERS to NEXT-OTHERS(OTHERS, C, {D1, .., Dp}). If in the proof tree we have, starting at START, a hyperpath (i.e. a path that may include hyperarcs) whose leaves have all label {}, we terminate with success.
    [NOTE 3: One may decide, instead of resolving C on its ith literal with all possible elements of OTHERS, to select one compatible element of OTHERS and to resolve C with it, putting this resolvent and C back into OPEN. We would call a rule for selecting an element of OTHERS a Search Rule.]
    Repeat from 2.
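The core operation of steps 3 and 4, binary resolution between clauses represented as sets of literals, can be sketched as follows (a propositional sketch; the function names are mine, and full first-order resolution would also need unification):

```python
def negate(lit):
    """Complement a literal: P <-> -P."""
    return lit[1:] if lit.startswith("-") else "-" + lit

def resolvents(c1, c2):
    """All binary resolvents of two clauses (frozensets of literals)."""
    out = []
    for lit in c1:
        if negate(lit) in c2:
            out.append((c1 - {lit}) | (c2 - {negate(lit)}))
    return out

# Resolving NOT P OR NOT R with P yields NOT R:
print(resolvents(frozenset({"-P", "-R"}), frozenset({"P"})))
```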

In this proof procedure we have left indetermined: the choice of the sets INITIAL and OTHERS, the selection function SELECT, the functions NEXT-OPEN and NEXT-OTHERS, and the Selection and Search Rules.

There is no guarantee that for every choice of INITIAL, OTHERS, SELECT, NEXT-OPEN and NEXT-OTHERS the resulting theorem prover will be "complete", i.e. that everything that is provable will be provable with this theorem prover.

Example 1

Suppose that we want to prove (we already encountered this problem in the Logic 2 notes) that
  1. NOT P(x) OR NOT R(x)
    is inconsistent with the set of clauses:
  2. NOT S(x) OR H(x)
  3. NOT S(x) OR R(x)
  4. S(b)
  5. P(b)

The following are possible selections for the indeterminates:

INITIAL: {1.}, that is, it consists of the clauses representing the negation of the goal.
OTHERS: {2. 3. 4. 5.}, that is, it consists of the non-logical axioms.
SELECT: We use OPEN as a FIFO queue, i.e. we do breadth-first search.
NEXT-OPEN: It sets NEXT-OPEN(OPEN, C, {D1, .., Dp}) to the union of OPEN, {C}, and {D1, .., Dp}.
NEXT-OTHERS: It leaves OTHERS unchanged
The Proof Tree is then (we underline AND nodes; all outgoing arcs of an AND node are assumed to form a single hyperarc)
			START
			  |
			  |
		    NOT P(x) OR NOT R(x)
		     /---------------\
		   /                  \
	  NOT P(x) OR NOT R(x)   NOT P(x) OR NOT R(x)
	         |		      |
		 |		      |
	   NOT R(b)		 NOT P(x) OR NOT S(x)
		 |                /----------- \
		 |	         /              \
	   NOT S(b)	     NOT S(b)	     NOT P(b)
		 |              |		 |
		 |              |                |
		{}             {}               {}
At an OR node we apply resolution between the current clause at its selected literal and all the compatible elements of OTHERS.
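Since b is the only constant involved, the left branch of this tree can be replayed with ground clauses. A minimal sketch (clauses as frozensets of ground literals; the helper names are mine):

```python
def negate(lit):
    """Complement a literal: P(b) <-> -P(b)."""
    return lit[1:] if lit.startswith("-") else "-" + lit

def resolve(c1, c2, lit):
    """Resolve clauses c1 and c2 (frozensets) on literal lit of c1."""
    assert lit in c1 and negate(lit) in c2
    return (c1 - {lit}) | (c2 - {negate(lit)})

goal = frozenset({"-P(b)", "-R(b)"})   # clause 1, grounded at b
c3   = frozenset({"-S(b)", "R(b)"})    # clause 3, grounded at b
c4   = frozenset({"S(b)"})             # clause 4
c5   = frozenset({"P(b)"})             # clause 5

step1 = resolve(goal, c5, "-P(b)")     # NOT R(b)
step2 = resolve(step1, c3, "-R(b)")    # NOT S(b)
step3 = resolve(step2, c4, "-S(b)")    # {} : refutation found
print(step1, step2, step3)
```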

Example 2

The following example uses Horn clauses in propositional logic. I use the notation that is common in Prolog: we represent the implication:

A1 AND A2 AND .. AND An IMPLIES A

as

A <= A1, A2, .. , An

Our problem is:
  1. A, this is what we want to prove
  2. A <= B, C
  3. A <= D
  4. B <= D, E
  5. B <= F
  6. C <=
  7. C <= D, F
  8. D <=
  9. F <=
We now partially represent the proof tree. We do not apply breadth-first search here because we want to see how great the impact of the choice of SELECT can be.
				  A
			     /         \
                           /             \
                         /                 \
                     B, C                   D
                  /------\                  |
                /          \                |
              /              \              {}
            B,C              B,C
           /   \             |   \
         /      \            |     \
      C,D,E      C,F         B      B,D,F
     ............................................

You can keep on expanding the tree and see how depth-first search would generate a large tree, while breadth-first search rapidly derives D from A, and {} from D.
In other circumstances other strategies would be appropriate, as we see below.
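To see the breadth-first behavior concretely, here is a minimal sketch of breadth-first proof search for the propositional Horn program of Example 2 (goals are tuples of atoms; the encoding and names are mine):

```python
from collections import deque

# The program of Example 2: (head, body) pairs, e.g. ("A", ("B", "C")) is A <= B, C.
program = [
    ("A", ("B", "C")), ("A", ("D",)),
    ("B", ("D", "E")), ("B", ("F",)),
    ("C", ()),         ("C", ("D", "F")),
    ("D", ()),         ("F", ()),
]

def bfs_prove(goal):
    """Breadth-first SLD search; returns True iff the goal is provable."""
    frontier, seen = deque([goal]), {goal}
    while frontier:
        g = frontier.popleft()
        if not g:                      # empty goal {} reached: success
            return True
        head, rest = g[0], g[1:]       # select the leftmost atom
        for h, body in program:
            if h == head:
                new = body + rest      # replace the atom by the clause body
                if new not in seen:
                    seen.add(new)
                    frontier.append(new)
    return False

print(bfs_prove(("A",)))   # finds A via A <= D and D <= in two steps
```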

Some Proof Strategies

Since the early sixties people have looked for strategies that simplify the search problem in theorem proving. Here are some of the strategies that have been suggested.

Unit Preference

When we apply resolution, if one of the premises is a unit clause (a clause with a single literal), the resolvent has one literal less than the larger of the premises, thus getting closer to the desired goal {}. It therefore appears desirable to use resolution between clauses one of which is a unit clause. This unit preference is applied both when selecting from the OPEN set (i.e. at the leaves of the tree) and when, at an OR node, we select an element of OTHERS to resolve with.

Set of Support Strategy

When we use the Set of Support Strategy we have:
NEXT-OPEN(OPEN, C, {D1, .., Dp}) is the union of OPEN, {C}, and {D1,..,Dp}
NEXT-OTHERS(OTHERS, C, {D1,..,Dp}) is OTHERS.

In simple words, the set of support strategy uses the OPEN set as its set of support. Each application of resolution has as a premise an element of the set of support and adds that premise and the resolvent to the set of support.

Usually the INITIAL set (i.e. the initial set of support) consists of all the clauses obtained by negating the intended "theorem".
The set of support strategy is complete if the set OTHERS is satisfiable, which is the usual case when OTHERS consists of the non-logical axioms and the set of support consists of the clauses for the negation of the intended theorem.

Input Resolution

In Input Resolution each resolution step has as one of its premises a clause from the original (input) set of clauses. In other words, in each resolution one of the premises is one of the original clauses.

In general Input Resolution is incomplete, as can be seen with the following unsatisfiable set of clauses (from Nilsson), from which we are unable to derive FALSE using Input Resolution:

  P OR Q,   P OR NOT Q,   NOT P OR Q,   NOT P OR NOT Q

[You can use this set of clauses as both OPEN and OTHERS.]

Input resolution is complete in the case that all the clauses are Horn clauses.
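This incompleteness can be checked mechanically. The sketch below (my encoding) closes the four-clause set {P OR Q, P OR NOT Q, NOT P OR Q, NOT P OR NOT Q} under input resolution and under unrestricted resolution: the first closure never contains {}, the second does.

```python
def negate(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit

def resolvents(c1, c2):
    """All binary resolvents of two clauses (frozensets of literals)."""
    return [(c1 - {l}) | (c2 - {negate(l)}) for l in c1 if negate(l) in c2]

inputs = [frozenset(s) for s in ({"P", "Q"}, {"P", "-Q"}, {"-P", "Q"}, {"-P", "-Q"})]

def closure(input_only):
    """All clauses derivable by resolution; if input_only, one premise
    of every step must be an input clause.  Terminates: there are only
    16 possible clauses over the literals P, -P, Q, -Q."""
    derived = set(inputs)
    while True:
        new = set()
        for c in derived:
            for d in (inputs if input_only else derived):
                new.update(resolvents(c, d))
        if new <= derived:
            return derived
        derived |= new

print(frozenset() in closure(input_only=True))    # input resolution never refutes
print(frozenset() in closure(input_only=False))   # unrestricted resolution does
```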

Linear Resolution

Linear Resolution, also known as the Ancestry-Filtered Form Strategy, is a generalization of Input Resolution. The generalization is that now, in each resolution, one of the premises is either one of the original clauses or an ancestor in the proof tree of the other premise.

Linear Resolution is complete.

SLD-Resolution

Selected Linear Resolution for Definite Clauses, or simply SLD-Resolution, is the basis for a proof procedure for theorems that are in the form of conjunctions of positive literals (this conjunction is called a Goal) given non-logical axioms that are definite clauses.

More precisely: given a goal G (a conjunction of positive literals) and a program P of definite clauses, an SLD-derivation is a sequence of goals G0 = G, G1, .., Gn where Gi+1 is obtained from Gi by selecting a literal of Gi, choosing a (renamed) clause of P whose head unifies with the selected literal via an mgu si+1, and replacing the literal with the body of the clause, applying si+1 to the resulting goal. The derivation is an SLD-refutation if Gn is the empty goal {}.

Theorem: SLD-Resolution is Sound and Complete for conjunctive positive goals and definite clauses.

An important consequence of this theorem is that it remains true no matter which selection rule we use to select literals in goals. Thus we can select literals as we please, for instance left-to-right. Another important aspect is that the substitution s = s1.s2...sn gives us a method for finding the individuals that satisfy the goal in the structures that satisfy the clauses in P.

Nothing has been said in SLD-Resolution about what rule should be used to select the clause of P to resolve with the current literal of the current goal (such a rule is called a Search rule).

Example (from Van Le)

Suppose that we are given the following clauses: and we want to find out a woman that likes's someone's husband.
The goal can be stated as: The SLD-Derivation is:
  ((WOMAN(x),LIKES(x,HUSBAND(y))), WOMAN(MOTHER(v)), [MOTHER(v)/x])
  ((LIKES(MOTHER(v),HUSBAND(y))), GOOD(z) < = LIKES(MOTHER(z), z), 
                                 [HUSBAND(y)/v, HUSBAND(y)/z])
  ((GOOD(HUSBAND(y))), GOOD(HUSBAND(ANN)), [ANN/y])
  {}
and the substitution is:

[MOTHER(HUSBAND(ANN))/x, ANN/y]
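The composed answer substitution can be checked mechanically. A minimal sketch (terms as nested tuples, variables as lowercase strings; the encoding is mine):

```python
def apply_sub(t, s):
    """Apply substitution s (a dict var -> term) to term t."""
    if isinstance(t, tuple):                      # compound term (functor, args...)
        return (t[0],) + tuple(apply_sub(a, s) for a in t[1:])
    return s.get(t, t)                            # variable or constant

s1 = {"x": ("MOTHER", "v")}                       # from the first step
s2 = {"v": ("HUSBAND", "y"), "z": ("HUSBAND", "y")}  # second step
s3 = {"y": "ANN"}                                 # third step

t = "x"
for s in (s1, s2, s3):        # composing s1.s2.s3 = applying them in sequence
    t = apply_sub(t, s)
print(t)  # ('MOTHER', ('HUSBAND', 'ANN'))
```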

Pure Prolog

We are now ready to deal with (pure) Prolog, the major Logic Programming Language. It is obtained from a variation of SLD-Resolution on Horn clauses with the following rules and conventions: the literals of a goal are selected left-to-right; the clauses of the program are tried in the order in which they are written; and the search is depth-first with backtracking. These rules make for very rapid processing. Unfortunately:

The Pure Prolog Inference Procedure is Sound but not Complete

This can be seen by example. Given, for instance, a program of the form (the clauses here are illustrative):
  1. P(x,z) <= P(x,y), P(y,z)
  2. P(x,y) <= Q(x,y)
  3. Q(A,B) <=
  4. Q(B,C) <=
we are unable to derive in Prolog that P(A,C), because clause 1 is always tried first and we get caught in an ever deepening depth-first search.

The derivation mechanism of Pure Prolog has a very simple form that can be described by the following flow chart due to Colmerauer:

Interpreter for Pure Prolog

Notational conventions:
  • i: indexes the goals Gi in the derivation (the current depth of the search)
  • Ki: indexes the clauses in the given program (i.e. set of clauses) P
  • Max: the number of clauses in P
  • h(G): the first literal of the goal G
  • t(G): the rest of goal G, i.e. G without its first literal
  • clause(Ki): the Ki-th clause of the program
  • 				START
    				  |
    				  v
    			i := 0; Gi := G; Ki := 0;
    				  |
    				  v	 No
          +------------------------>Gi = {}? ------------------>+
          ^				  |			    |
          |				  |Yes			    |
          |				  v			    |
     Gi+1 := (t(C),t(Gi)).Si;   OUTPUT ANSWER		    |
        i := i+1;			  |			    |
       Ki := 0;			  v      No		    |
          ^			 +----->i = 0?--------> i := i-1    |
          |			 ^	  |               |	    |
          |			 |	  | Yes           |	    |
          |			 |	  v               |	    |
          |			 |	END               |	    |
          |                  | Yes			  v	    |
          |			 +<-----Ki=Max? <---------+	    |
          |                         ^    |			    |
          |				|    v No		    v
         Si = FAIL?---------------->+    +------------------>Ki := Ki + 1;
            ^						    |
            |						    v
    	+<--------------- C := COPY(Clause(Ki)); <----------+
    			  Si := MGU(h(Gi), h(C));
    
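The same strategy, written recursively rather than with the explicit counters i and Ki of the flow chart, can be sketched for the propositional case, using the Horn program of Example 2 (the encoding is mine):

```python
# The Horn program of Example 2, as (head, body) pairs in textual order.
program = [
    ("A", ("B", "C")), ("A", ("D",)),
    ("B", ("D", "E")), ("B", ("F",)),
    ("C", ()),         ("C", ("D", "F")),
    ("D", ()),         ("F", ()),
]

def solve(goal):
    """Depth-first, left-to-right, clause-order search with backtracking.
    Like pure Prolog, it may loop forever on left-recursive programs."""
    if not goal:                         # Gi = {}: success
        return True
    head, rest = goal[0], goal[1:]       # select the leftmost literal h(Gi)
    for h, body in program:              # try clauses in order (the Ki loop)
        if h == head and solve(body + rest):
            return True
    return False                         # all clauses failed: backtrack

print(solve(("A",)))  # True
```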

Prolog

Real Prolog systems differ from pure Prolog for a number of reasons, many of which have to do with the ability in Prolog to modify the control (search) strategy so as to achieve efficient programs. In fact there is a dictum due to Kowalski:

Logic + Control = Algorithm

But the reason that is important to us now is that Prolog uses a unification procedure which does not enforce the Occur Check. This has the unfortunate consequence that, while Prolog may give origin to efficient programs,

Prolog is not Sound
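The point can be made concrete with a small unification sketch (my encoding: variables are lowercase strings, compound terms are tuples). Without the occur check, x unifies with F(x), producing a "solution" that no finite term satisfies.

```python
def is_var(t):
    return isinstance(t, str) and t[:1].islower()

def walk(t, s):
    """Follow variable bindings in substitution s."""
    while is_var(t) and t in s:
        t = s[t]
    return t

def occurs(v, t, s):
    """Does variable v occur in term t under substitution s?"""
    t = walk(t, s)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, s) for a in t[1:])

def unify(a, b, s, occur_check=True):
    """Return an mgu extending s, or None if unification fails."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        if occur_check and occurs(a, b, s):
            return None                    # the occur check rejects x = F(x)
        return {**s, a: b}
    if is_var(b):
        return unify(b, a, s, occur_check)
    if isinstance(a, tuple) and isinstance(b, tuple) \
            and a[0] == b[0] and len(a) == len(b):
        for x, y in zip(a[1:], b[1:]):
            s = unify(x, y, s, occur_check)
            if s is None:
                return None
        return s
    return None

print(unify("x", ("F", "x"), {}, occur_check=True))    # None: correctly fails
print(unify("x", ("F", "x"), {}, occur_check=False))   # {'x': ('F', 'x')}: unsound
```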

Forward-Chaining and Backward-Chaining

We have managed to talk for a long time about theorem proving without ever introducing two important concepts, Forward Chaining and Backward Chaining. In forward chaining we reason from the known facts and axioms towards the intended goal; in backward chaining we start from the goal and reduce it to subgoals until we reach known facts, as SLD-Resolution does.

Some Review Thoughts

Logic has provided a good avenue for examining how we can represent knowledge and how we can reason with it. We first established a representation language, First-Order Logic, and understood the fundamental semantic concepts of Satisfiability, Truth, and Logical Consequence. Then we showed how we can use Deductions, Theorems, and Consistency to express through symbol manipulation many of the semantic concepts, and to express Reasoning. We understood fundamental concepts like Soundness and Completeness for a logical system and showed that Resolution was the only inference rule we needed to achieve both soundness and completeness.

Once soundness and completeness are established, it becomes a scramble to come up with proof procedures that are efficient. Restrictions are introduced on the form of clauses that can be used (Horn clauses are particularly appealing). Restrictions are introduced in the way that proofs can be built. In the search for efficiency at times Completeness is given up, as in pure Prolog, and at times even Soundness is abandoned, as in Prolog.

Now that we have a "standard" way to represent knowledge and to reason in First Order Logic, one thing we have not said about first order logic deserves mention: it is monotonic. This means that if we can derive a formula A from a set of formulae F, then for any set of formulae G that contains F we can also derive A from G.

Do you think that is how it happens in life?

ingargiola@cis.temple.edu