Chapter 6
Adversarial Search  

powered by FreeFind

Modified: 

Outline

6.1 GAMES

Games versus search problems

 

Types of games

 

6.2 OPTIMAL DECISIONS IN GAMES

Game definition

Game tree

Initial state and legal moves define game tree (see below figure for tic-tac-toe)

initial state:                board empty, MAX turn

successor function:      For tic-tac-toe returns list of 9 (move, state)
                                  from initial state.
                                  [(0,[X,1,2,3,4,5,6,7,8]), (1,[0,X,2,3,4,5,6,7,8]), ...
 

Game tree (partial) of tic-tac-toe (2-player, deterministic, turns)

MAX plays X.
  • Are all the Terminal nodes generated at equal depth?
  • What is the minimum depth of a Terminal node?
  • Explain the values of Utility at Terminal nodes.

Minimax(root) = max(min(3,12,8),min(2,4,6),min(14,5,2)

In the above game tree, MAX makes the first choice, then MIN.
  • MIN picked A11, why?
  • MAX picked A1, why?
  • For the game trees at right, give the Minimax values.


def MINIMAX_DECISION(state):

   infinity = 1.e400

   def MAX_VALUE(state):
      if TERMINAL_TEST(state): return UTILITY(state)
      v = -infinity
      for (a, s) in SUCCESSORS(state):
         v = max(v, MIN_VALUE(s))
      return v

   def MIN_VALUE(state):
      if TERMINAL_TEST(state): return UTILITY(state)
      v = infinity
      for (a, s) in SUCCESSORS(state):
         v = min(v, MAX_VALUE(s))
      return v

   action, state = argmax(SUCCESSORS(state), lambda ((a, s)): MIN_VALUE(s))
   return action
 

The following is the tic-tac-toe board layout:

0 1 2
3 4 5
6 7 8
  • The initial state of the board with no moves is: [0,1,2,3,4,5,6,7,8].
     
  • After an X is moved to the center square, the state is: [0,1,2,3,X,5,6,7,8]
     
  • SUCCESSORS( [0,1,2,3,4,5,6,7,8] ) returns
          [(0,[X,1,2,3,4,5,6,7,8]), (1,[0,X,2,3,4,5,6,7,8]), ... or



    as an (action, state) tuple.
     
  • UTILITY(state) returns for X (MAX) win +1, O (MIN) win -1 and tie 0.
     
  • TERMINAL-STATE(state) is true when the state is a win or tie (board full)
  1. What is the branching factor at depth 0?
  2. At depth 1?
  3. What is the maximum depth?
  4. A MIN move will attempt to minimize or maximize the UTILITY result?
  5. Are states following a TERMINAL state (i.e. ) explored?
  6. Are all possible states explored to a TERMINAL state?
  7. Is this a depth or breadth first search? How do you know? (See the Python program below).
  8. Run the Minimax tic-tac-toe program; you'll play O. (Download)

Minimax (Download)

# Figure 6.3 p166 to implement tic-tac-toe

def MINIMAX_DECISION(state):
   """Given a state , calculate the best move by searching
       forward all the way to the terminal states. [Fig. 6.4]"""

   infinity = 1.e400

   def MAX_VALUE(state):
      if TERMINAL_TEST(state): return UTILITY(state)
      v = -infinity
      for (a, s) in SUCCESSORS(state):
         v = max(v, MIN_VALUE(s))
      return v

   def MIN_VALUE(state):
      if TERMINAL_TEST(state): return UTILITY(state)
      v = infinity
      for (a, s) in SUCCESSORS(state):
         v = min(v, MAX_VALUE(s))
      return v

   action, state = argmax(SUCCESSORS(state), lambda ((a, s)): MIN_VALUE(s))
   return action

def TERMINAL_TEST(state) :                   # win(state) = 'X', 'O' or None
   return win(state)!=None or tie(state)

def UTILITY(state) :
   if win(state) == 'X' : return +1
   if win(state) == 'O' : return -1
   return 0

def SUCCESSORS(state) :
   successors=[]
   open=0
   for move in range(9) :
      if state[move]==move : open = open + 1
      if open%2 == 1 : player = 'X'       # X makes odd numbered moves
      else : player = 'O'
   for move in range(9) :
      if state[move]==move :                # Its a 0, 1, 2, etc.
         successor = state[:]                  
         successor[move]=player             # Place the player
         successors.append((move,successor))
   return successors

def tie(state) :                                     # When no more moves
   for i in range(9) :
      if state[i]==i : return False
   return True

def win(state) :                                   
   for c in [0,3,6] :
      if state[c+0]==state[c+1] and state[c+0]==state[c+2] : return state[c+0]
   for c in [0,1,2] :
      if state[c+0]==state[c+3] and state[c+0]==state[c+6] : return state[c+0]
   if state[0]==state[4] and state[0]==state[8] : return state[4]
   if state[2]==state[4] and state[2]==state[6] : return state[4]
   return None

def display(state) :
   print
   for c in [0,3,6] :  print state[c+0],state[c+1],state[c+2]

def argmax(seq, fn) :     # Return seq that evaluate to maximum fn(seq[0..n])
   max = 0
   maxvalue = fn(seq[0])
   for i in range(len(seq)) :
      testvalue = fn(seq[i])
      if testvalue > maxvalue :
         max = i
         maxvalue = testvalue
   return seq[max]

def run() :
   board=[0,1,2,3,4,5,6,7,8]
   while not win(board) and not tie(board) :
      board[MINIMAX_DECISION(board)]='X'
      if not win(board) and not tie(board) :
         display(board)
         board[input('Your move? ')]='O'
   display(board)

Properties of Minimax

Explores all possible state outcomes to TERMINAL state of win or tie.

Resource limits

Evaluation functions

Exact value don't matter

a-b search

Idea: At a MAX level establish initial cutoff value alpha and prune any branches that are less, explore branches that are greater. At a MIN level establish initial cutoff beta and prune any branches that are greater, explore branches that are less.

a-b(root) = max(min(3,12,8),min(2,x,y),min(14,5,2)

                 = max(3,min(2,x,y),2)

                 = max(3,z,2)     where z <= 2

                 = 3

The maximum is independent of x and y; can prune without exploring. When examining 2 know that MIN will return 2 or less, MAX 3 or more.

a=highest-value so far for MAX

b=lowest-value so far for MIN

(a) MIN B node initially a=-,b=+; b=min(3,+); a=-,b=3.

(b) MIN B node, a=-,b=3; b=min(12,3); a=-,b=3

(c) MIN B node, a=-,b=3; b=min(8,3); a=-, b=3; return 3
     MAX A node, a=-,b=+; a=max(-,3); a=3, b=+

(d) MIN C node, a=3,b=+; b=min(2,+); 2<=3 can prune; return 2;
     MAX A node, a=3,b=+; a=max(3,2); a=3, b=+

(e) MIN D node, a=3,b=+; b=min(14,+); 14>3 cannot prune

(f) MIN D node, a=3,b=+; b=min(14,5); 5>3 cannot prune
    MIN D node, a=3,b=+; b=min(5,2); 2<=3 nothing to prune, return 2
    MAX A node, a=3,b=+; a=max(3,2); a=3, b=+; return 3

a-b pruning example

MAX A, MIN B

 

a-b pruning example

 

 

 

For the game trees at right, give the Minimax values.

For the game trees at right, give the a-b values.

Properties of a-b

Why called a-b

 

a-b search algorithm (Download Tic-Tac-Toe (Download)

# Figure 6.7 p170 to implement Figure 6.5 p168

def ALPHA_BETA_SEARCH(state):
   """Given a state in a game, calculate the best move by searching
       forward all the way to the terminal states. [Fig. 6.4]"""

   infinity = 1.e400

   def MAX_VALUE(state, alpha, beta):
      if TERMINAL_TEST(state): return UTILITY(state)
      v = -infinity

      for (a, s) in SUCCESSORS(state):
         v = max(v, MIN_VALUE(s, alpha, beta))
         if v >= beta : return v
         alpha = min(alpha,v)
      return v

   def MIN_VALUE(state, alpha, beta):
      if TERMINAL_TEST(state): return UTILITY(state)
      v = infinity

      for (a, s) in SUCCESSORS(state):
         v = min(v, MAX_VALUE(s, alpha, beta))
         if v <= alpha : return v
         beta = min(beta,v)
      return v

   action, state = argmax(SUCCESSORS(state), lambda ((a, s)): MIN_VALUE(s))
   return action

def TERMINAL_TEST(state) :  return isinstance(state, int)

def UTILITY(state) : return state

def SUCCESSORS(state) :  return STATE_SPACE[state]

def argmax(seq, fn) : # Return seq that evaluate to maximum fn(seq[0..n])
   max = 0
   maxvalue = fn(seq[0])
   for i in range(len(seq)) :
      testvalue = fn(seq[i])
      if testvalue > maxvalue :
         max = i
         maxvalue = testvalue
   return seq[max]

A,B,C,D='A','B','C','D'
STATE_SPACE={A:[B,C,D], B:[3,12,8], C:[2,4,1], D:[14,5,2]}

def run() :
   print ALPHA_BETA_SEARCH('A')
Note that the above algorithm returns the MAX value not the action by:

   return MAX_VALUE(state,-infinity, infinity)

To return the action use:

   return argmax(SUCCESSORS(state), lambda (s): MIN_VALUE(s, -infinity, infinity))
 

The following is the tic-tac-toe board layout:

0 1 2
3 4 5
6 7 8
  • Run the a-b tic-tac-toe program; you'll play O. (Download)

Deterministic games in practice

6.4 IMPERFECT, REAL-TIME DECISIONS

6.5 GAMES THAT INCLUDE AN ELEMENT OF CHANCE

Nondeterministic games: Backgammon

 

Nondeterministic games in general

 

 

Pruning in nondeterministic game trees

 

More pruning occurs if we can bound the leaf values