2011 R.SHANTHI VICTORIA ., B.E., M.E TAMIL NADU COLLEGE OF ENGINEERING COIMBATORE. Designed according to Anna University Coimbatore syllabus FOR IV YEAR CSE STUDENTS. 11/1/2011 ARTIFICIAL INTELLIGENCE 2 ARTIFICIAL INTELLIGENCE L T P M C 3 1 0 100 4 UNIT I Introduction and Problem Solving I 9 Artificial Intelligence: Definition-Turing Test-Relation with other Disciplines-History of AI Applications-Agent: Intelligent Agent-Rational Agent -Nature of Environments-Structure of Agent.-Problem Solving Agent -Problems: Toy Problems and Real-world Problems-Uninformed Search Strategies: BFS, DFS, DLS, IDS, Bidirectional Search -Comparison of uninformed search strategies. UNIT II Problem Solving II: 9 Informed Search Strategies-Greedy best-first search-A* search-Heuristic functions-Local search Algorithms and Optimization problems -Online Search Agent-Constraint Satisfaction Problems-Backtracking Search for CSP’s –Local Search for Constraint Satisfaction Problems-Structure of Problems -Adversarial Search-Optimal Decision in Games-Alpha-Beta Pruning-Imperfect Real Time Decisions-Games that Include an Element of Chance. UNIT III Knowledge Representation 9 First-Order Logic-Syntax and Semantics of First-Order-Logic-Using First-Order-Logic-Knowledge Engineering in First-Order-Logic.-Inference in First-Order-Logic-Inference rules-Unification and Lifting-Forward Chaining-Backward Chaining-Resolution. UNIT IV Learning 9 Learning from Observations-Forms of Learning-Learning Decision –Ensemble Learning -A Logical Formulation of Learning-Knowledge in Learning-Explanation Based Learning-Learning using Relevance Information-Inductive Logic Programming. UNIT V Applications 9 Communication –Communication as action -A formal grammar for a fragment of English – Syntactic Analysis – Augmented Grammars – Semantic Interpretation – Ambiguity and Disambiguation – Discourse Understanding – Grammar Induction. Perception –Image Formation –Early Image Processing Operations – Extracting Three Dimensional Information – Object Recognition – Using Vision for Manipulation and Navigation. Total:45 3 TEXT BOOKS: 1. Stuart Russell, Peter Norvig, “Artificial Intelligence – A Modern Approach”, 3rd Edition, Pearson Education /Prentice Hall of India 2010(yet to be published). 2. Nils J. Nilsson, “Artificial Intelligence: A new Synthesis”, Harcourt Asia Pvt. Ltd,2003. REFERENCES: 1. Elaine Rich and Kevin Knight, “Artificial Intelligence”, 2nd Edition, Tata McGraw-Hill, 2003. 2. Patrick Henry Winston, “Artificial Intelligence”, Pearson Education /PHI, 2004. 4 UNIT-1 INTRODUCTION AND PROBLEM SOLVING I DEFINITION: Artificial Intelligence is the study of how to make computers do things at which, at the moment, people are better. SOME DEFINITIONS OF AI Building systems that think like humans “The exciting new effort to make computers think … machines with minds, in the full and literal sense” --Haugeland, 1985 “The automation of activities that we associate with human thinking, … such as decision-making, problem solving, learning, …” --Bellman, 1978 Building systems that act like humans “The art of creating machines that perform functions that require intelligence when performed by people” --Kurzweil, 1990 “The study of how to make computers do things at which, at the moment, people are better” --Rich and Knight, 1991 Building systems that think rationally “The study of mental faculties through the use of computational models” --Charniak and McDermott, 1985 “The study of the computations that make it possible to perceive, reason, and act” --Winston, 1992 Building systems that act rationally “A field of study that seeks to explain and emulate intelligent behavior in terms of computational processes” --Schalkoff, 1990 “The branch of computer science that is concerned with the automation of intelligent behavior” --Luger and Stubblefield, 1993 TURING TEST It is proposed by Alan Turing 1950 .According to this test, a computer could be considered to be thinking only when a human interviewer, conversing with both an unseen human being and an unseen computer, could not determine which is which. Description: 2 human being,1 computer The computer would need to posses the following capabilities: The computer processing: to enable it to communicate successfully in English 5 Knowledge representation: to store what it knows or hears Automated reasoning: to use the stored information to answer questions and to draw new conclusions Machine learning: to adapt to new circumstances and to detect and extrapolate patterns. To pass the total Turing test, the computer will need, Computer vision: to perceive objects Robotics: to manipulate objects and move about Thinking and Acting Humanly Acting humanly "If it looks, walks, and quacks like a duck, then it is a duck” The Turing Test Interrogator communicates by typing at a terminal with TWO other agents. The human can say and ask whatever s/he likes, in natural language. If the human cannot decide which of the two agents is a human and which is a computer, then the computer has achieved AI this is an OPERATIONAL definition of intelligence, i.e., one that gives an algorithm for testing objectively whether the definition is satisfied Thinking humanly: cognitive modeling Develop a precise theory of mind, through experimentation and introspection, then write a computer program that implements it Example: GPS -General Problem Solver (Newell and Simon, 1961) trying to model the human process of problem solving in general Thinking Rationally-The laws of thought approach Capture ``correct'' reasoning processes” A loose definition of rational thinking: Irrefutable reasoning process How do we do this Develop a formal model of reasoning (formal logic) that “always” leads to the “right” answer Implement this model How do we know when we've got it right? when we can prove that the results of the programmed reasoning are correct soundness and completeness of first-order logic Example: 6 Ram is a student of III year CSE. All students are good in III year CSE. Ram is a good student. Acting Rationally Act so that desired goals are achieved The rational agent approach (this is what we’ll focus on in this course) Figure out how to make correct decisions, which sometimes means thinking rationally and other times means having rational reflexes correct inference versus rationality reasoning versus acting; limited rationality RELATION WITH OTHER DISCIPLINES: -Expert Systems -Natural Language Processor -Speech Recognition -Robotics -Computer Vision -Intelligent Computer-Aided Instruction -Data Mining -Genetic Algorithms • Philosophy Logic, methods of reasoning, mind as physical system foundations of learning, language, rationality • Mathematics Formal representation and proof algorithms, computation, (un)decidability, (in)tractability, probability • Economics utility, decision theory • Neuroscience physical substrate for mental activity • Psychology phenomena of perception and motor control, experimental techniques • Computer building fast computers engineering • Control theory design systems that maximize an objective function over time 7 • Linguistics knowledge representation, grammar HISTORY OF AI: • 1943 McCulloch & Pitts: Boolean circuit model of brain • 1950 Turing's "Computing Machinery and Intelligence" • 1956 Dartmouth meeting: "Artificial Intelligence" adopted • 1952—69 Look, Ma, no hands! • 1950s Early AI programs, including Samuel's checkers program, Newell & Simon's Logic Theorist, Gelernter's Geometry Engine • 1965 Robinson's complete algorithm for logical reasoning • 1966—73 AI discovers computational complexity Neural network research almost disappears • 1969—79 Early development of knowledge-based systems • 1980--AI becomes an industry • 1986--Neural networks return to popularity • 1987--AI becomes a science • 1995--The emergence of intelligent agents INTELLIGENT AGENT: Agent = perceive + act Thinking Reasoning Planning 8 Agent: entity in a program or environment capable of generating action. An agent uses perception of the environment to make decisions about actions to take. The perception capability is usually called a sensor. The actions can depend on the most recent perception or on the entire history (percept sequence). An agent is anything that can be viewed as perceiving its environment through sensors and acting upon the environment through actuators. Ex: Robotic agent Human agent Agent Sensors ACTUATORS ? E N V I R O N M ENT PERCEPTS ACTION 9 Agents interact with environment through sensors and actuators. Percept sequence action [A, clean] right [A, dirt] suck [B, clean] left [B, dirty] suck [A, clean], [A, clean] right [A, clean], [A, dirty] suck Fig: practical tabulation of a simple agent function for the vacuum cleaner world Agent Function The agent function is a mathematical function that maps a sequence of perceptions into action. The function is implemented as the agent program. The part of the agent taking an action is called an actuator. environment sensors agent function actuators environment A B 10 RATIONAL AGENT: A rational agent is one that can take the right decision in every situation. Performance measure: a set of criteria/test bed for the success of the agent's behavior. The performance measures should be based on the desired effect of the agent on the environment. Rationality: The agent's rational behavior depends on: the performance measure that defines success the agent's knowledge of the environment the action that it is capable of performing The current sequence of perceptions. Definition: for every possible percept sequence, the agent is expected to take an action that will maximize its performance measure. Agent Autonomy: An agent is omniscient if it knows the actual outcome of its actions. Not possible in practice. An environment can sometimes be completely known in advance. Exploration: sometimes an agent must perform an action to gather information (to increase perception). 11 Autonomy: the capacity to compensate for partial or incorrect prior knowledge (usually by learning). NATURE OF ENVIRONMENTS: Task environment – the problem that the agent is a solution to. Includes Performance measure Environment Actuator Sensors Agent Type Performance Measures Environment Actuators Sensors Taxi Driver Safe, Fast, Legal, Comfort, Maximize Profits Roads, other traffic, pedestrians, customers Steering, accelerators, brake, signal, horn Camera, sonar, GPS, Speedometer, keyboard, etc Medical diagnosis system Healthy patient, minimize costs, lawsuits Patient, hospital, staff Screen display (questions, tests, diagnoses, treatments, referrals) Keyboard (entry of symptoms, findings, patient's answers) Properties of Task Environment: • Fully Observable (vs. Partly Observable) – Agent sensors give complete state of the environment at each point in time – Sensors detect all the aspect that is relevant to the choice of action. – An environment might be partially observable because of noisy and inaccurate sensors or apart of the state are simply missing from the sensor data. • Deterministic (vs. Stochastic) – Next state of the environment is completely determined by the current state and the action executed by the agent 12 – Strategic environment (if the environment is deterministic except for the actions of other agent.) • Episodic (vs. Sequential) – Agent’s experience can be divided into episodes, each episode with what an agent perceive and what is the action • Next episode does not depend on the previous episode – Current decision will affect all future sates in sequential environment • Static (vs. Dynamic) – Environment doesn’t change as the agent is deliberating – Semi dynamic • Discrete (vs. Continuous) – Depends the way time is handled in describing state, percept, actions • Chess game : discrete • Taxi driving : continuous • Single Agent (vs. Multi Agent) – Competitive, cooperative multi-agent environments – Communication is a key issue in multi agent environments. Partially Observable: Ex: Automated taxi cannot see what other devices are thinking. Stochastic: Ex: taxi driving is clearly stochastic in this sense, because one can never predict the behavior of the traffic exactly. Semi dynamic: If the environment does not change for some time, then it changes due to agent’s performance is called semi dynamic environment. Single Agent Vs multi agent: An agent solving a cross word puzzle by itself is clearly in a single agent environment. An agent playing chess is in a two agent environment. Example of Task Environments and Their Classes 13 STRUCTURE OF AGENT: Simple Agents: 14 Table-driven agents: the function consists in a lookup table of actions to be taken for every possible state of the environment. If the environment has n variables, each with t possible states, then the table size is tn. Only works for a small number of possible states for the environment. Simple reflex agents: deciding on the action to take based only on the current perception and not on the history of perceptions. Based on the condition-action rule: (if (condition) action) Works if the environment is fully observable Four types of agents: 1. Simple reflex agent 2. Model based reflex agent 3. goal-based agent 4. utility-based agent Simple reflex agent Definition: SRA works only if the correct decision can be made on the basis of only the current percept that is only if the environment is fully observable. Characteristics – no plan, no goal 15 – do not know what they want to achieve – do not know what they are doing Condition-action rule – If condition then action Ex: medical diagnosis system. 16 Algorithm Explanation: Interpret – Input: Function generates an abstracted description of the current state from the percept. RULE-MATCH: Function returns the first rule in the set of rules that matches the given state description. RULE -ACTION: The selected rule is executed as action of the given percept. Model-Based Reflex Agents: Definition: An agent which combines the current percept with the old internal state to generate updated description of the current state. If the world is not fully observable, the agent must remember observations about the parts of the environment it cannot currently observe. This usually requires an internal representation of the world (or internal state). Since this representation is a model of the world, we call this model-based agent. Ex: Braking problem characteristics Reflex agent with internal state Sensor does not provide the complete state of the world. must keep its internal state Updating the internal world requires two kinds of knowledge How world evolves How agent’s action affect the world 17 Algorithm Explanation: UPDATE-INPUT: This is responsible for creating the new internal stated description. Goal-based agents: The agent has a purpose and the action to be taken depends on the current state and on what it tries to accomplish (the goal). In some cases the goal is easy to achieve. In others it involves planning, sifting through a search space for possible solutions, developing a strategy. Characteristics – Action depends on the goal. (consideration of future) – e.g. path finding 18 – Fundamentally different from the condition-action rule. – Search and Planning – Solving “car-braking” problem? – Yes, possible … but not likely natural. • Appears less efficient. Utility-based agents If one state is preferred over the other, then it has higher utility for the agent Utility-Function (state) = real number (degree of happiness) The agent is aware of a utility function that estimates how close the current state is to the agent's goal. • Characteristics – to generate high-quality behavior – Map the internal states to real numbers. (e.g., game playing) • Looking for higher utility value utility function 19 Learning Agents Agents capable of acquiring new competence through observations and actions. Learning agent has the following components Learning element Suggests modification to the existing rule to the critic Performance element Collection of knowledge and procedures for selecting the driving actions Choice depends on Learning element Critic Observes the world and passes information to the learning element Problem generator Identifies certain areas of behavior needs improvement and suggest experiments 20 Agent Example A file manager agent. Sensors: commands like ls, du, pwd. Actuators: commands like tar, gzip, cd, rm, cp, etc. Purpose: compress and archive files that have not been used in a while. Environment: fully observable (but partially observed), deterministic (strategic), episodic, dynamic, discrete. Agent vs. Program Size – an agent is usually smaller than a program. Purpose – an agent has a specific purpose while programs are multi-functional. Persistence – an agent's life span is not entirely dependent on a user launching and quitting it. Autonomy – an agent doesn't need the user's input to function. Problem Solving Agents • Problem solving agent – A kind of “goal based” agent – Finds sequences of actions that lead to desirable states. – 21 Formulate Goal, Formulate Problem Search Execute PROBLEMS Four components of problem definition – Initial state – that the agent starts in – Possible Actions • Uses a Successor Function – Returns pair • State Space – the state space forms a graph in which the nodes are states and arcs between nodes are actions. • Path – Goal Test – which determine whether a given state is goal state – Path cost – function that assigns a numeric cost to each path. • Step cost Problem formulation is the process of deciding what actions and states to consider, given a goal Path: A path in the state space is a sequence of states connected by a sequence of actions. The sequence of steps done by intelligent agent to maximize the performance measure: Goal Formulation: based on the current situation and the agent’s performance measure, it is the first step in problem solving. Problem Formulation: it is the process of deciding what actions and states to consider, given a goal. Search: the process of looking for different sequence. Solution: A search algorithm takes a problem as input and returns a solution in the form of an action sequence. Execution: Once a solution is found, the actions it recommends can be carried out called execution phase. Solutions 22 • A Solution to the problem is the path from the initial state to the final state • Quality of solution is measured by path cost function – Optimal Solution has the lowest path cost among other solutions – An Agent with several immediate options of unknown value can decide what to do by first examining different possible sequences of actions that lead to a state of known value, and then choosing the best sequence Searching Process – Input to Search : Problem – Output from Search : Solution in the form of Action Sequence A Problem solving Agent, Assuming the environment is • Static • Observable • Discrete • Deterministic Example 23 A Simplified Road Map of Part of Romania Explanation: • On holiday in Romania; currently in Arad • Flight leaves tomorrow from Bucharest • Formulate goal: – be in Bucharest • Formulate problem: – states: various cities – actions: drive between cities • Find solution: – sequence of cities, e.g., Arad, Sibiu, Fagaras, Bucharest TOY PROBLEM Example-1 : Vacuum World Problem Formulation • States – 2 x 22 = 8 states – Formula n2n states • Initial State 24 – Any one of 8 states • Successor Function – Legal states that result from three actions (Left, Right, Suck) • Goal Test – All squares are clean • Path Cost – Number of steps (each step costs a value of 1) 25 State Space for the Vacuum World. Labels on Arcs denote L: Left, R: Right, S: Suck Example-2 : The 8-Puzzle • States : Location of Tiles • Initial State : One of States • Successor Function : Move blank left, Right, Up, down • Goal Test : Shown in Fig. Above • Path Cost : 1 for each step Eight puzzle is from a family of “sliding –block puzzles” 26 • NP Complete • 8 puzzle has 9!/2 = 181440 states • 15 puzzle has approx. 1.3*1012 states • 24 puzzle has approx. 1*1025 states • Place eight queens on a chess board such that no queen can attack another queen • No path cost because only the final state counts! • Incremental formulations • Complete state formulations States : Any arrangement of 0 to 8 queens on the board. Arrangements of n queens, one per column in the leftmost n columns, with no queen attacking another are states Initial state : No queens on the board Successor function: Add a queen to an empty square. Add a queen to any square in the leftmost empty column such that it is not attacked by any other queen. 2057 sequences to investigate 27 Goal Test: 8 queens on the board and none are attacked. 64*63*…*57 = 1.8*1014 possible sequences. SOME MORE REAL-WORLD PROBLEMS • Route finding • Touring (traveling salesman) • Logistics • VLSI layout • Robot navigation • Learning Robotic assembly: States: real-valued co ordinates of robot joint angels part of the object to be assembled. Actions: continuous motion of robot joint. Goal test: complete assembly Path cost: time to execute. Route-finding Find the best route between two cities given the type & condition of existing roads & the driver’s preferences • Used in – computer networks – automated travel advisory systems – airline travel planning systems • path cost • money • seat quality • time of day • type of airplane Traveling Salesman Problem (TSP) • A salesman must visit N cities. • Each city is visited exactly once and finishing the city started from. • There is usually an integer cost c (a, b) to travel from city a to city b. 28 • However, the total tour cost must be minimum, where the total cost is the sum of the individual cost of each city visited in the tour. • Given a road map of n cities, find the shortest tour which visits every city on the map exactly once and then return to the original city (Hamiltonian circuit) • (Geometric version): – A complete graph of n vertices (on an unit square) – Distance between any two vertices: Euclidean distance – n!/2n legal tours – Find one legal tour that is shortest It’s an NP Complete problem no one has found any really efficient way of solving them for large n. Closely related to the Hamiltonian-cycle problem. VLSI layout • The decision of placement of silicon chips on breadboards is very complex. (or standard gates on a chip). • This includes – cell layout – channel routing • The goal is to place the chips without overlap. • Finding the best way to route the wires between the chips becomes a search problem. Searching for Solutions to VLSI Layout 29 • Generating action sequences • Data structures for search trees Generating action sequences • What do we know? – define a problem and recognize a solution • Finding a solution is done by a search in the state space • Maintain and extend a partial solution sequence UNINFORMED SEARCH STRATEGIES • Uninformed strategies use only the information available in the problem definition – Also known as blind searching – Uninformed search methods: • Breadth-first search • Uniform-cost search • Depth-first search • Depth-limited search • Iterative deepening search BREADTH-FIRST SEARCH Definition: 30 The root node is expanded first, and then all the nodes generated by the node are expanded. • Expand the shallowest unexpanded node • Place all new successors at the end of a FIFO queue Implementation: • • • • Properties of Breadth-First Search 31 • Complete – Yes if b (max branching factor) is finite • Time – 1 + b + b2 + … + bd + b(bd-1) = O(bd+1) – exponential in d • Space – O(bd+1) – Keeps every node in memory – This is the big problem; an agent that generates nodes at 10 MB/sec will produce 860 MB in 24 hours • Optimal – Yes (if cost is 1 per step); not optimal in general Lessons from Breadth First Search • The memory requirements are a bigger problem for breadth-first search than is execution time • Exponential-complexity search problems cannot be solved by uniformed methods for any but the smallest instances Ex: Route finding problem Given: 32 Task: Find the route from S to G using BFS. Step1: Step 2: Step3: 33 Step4: Answer : The path in the 2nd depth level that is SBG (or ) SCG. Time complexity 1+b+b²+……………………..+ O) DEPTH-FIRST SEARCH OR BACK TRACKING SEARCH: Definition: Expand one node to the depth of the tree. If dead end occurs, backtracking is done to the next immediate previous node for the nodes to be expanded • Expand the deepest unexpanded node • Unexplored successors are placed on a stack until fully explored • Enqueue nodes on nodes in LIFO (last-in, first-out) order. That is, nodes used as a stack data structure to order nodes. • It has modest memory requirement. • It needs to store only a single path from the root to a leaf node, along with remaining unexpanded sibling nodes for each node on a path • Back track uses less memory. Implementation: 34 35 36 Properties of Depth-First Search • Complete – No: fails in infinite-depth spaces, spaces with loops • Modify to avoid repeated spaces along path – Yes: in finite spaces 37 • Time – O(bm) – Not great if m is much larger than d – But if the solutions are dense, this may be faster than breadth-first search • Space – O(bm)…linear space • Optimal – No • When search hits a dead-end, can only back up one level at a time even if the “problem” occurs because of a bad operator choice near the top of the tree. Hence, only does “chronological backtracking” Advantage: • If more than one solution exists or no of levels is high then dfs is best because exploration is done only a small portion of the white space. Disadvantage: • No guaranteed to find solution. Example: Route finding problem Given problem: Task: Find a route between A to B Step 1: 38 Step 2: Step 3: Step 4: S B C A D G S B C A D 39 Answer: Path in 3rd level is SADG DEPTH-LIMITED SEARCH Definition: A cut off (Maximum level of the depth) is introduced in this search technique to overcome the disadvantage of Depth First Search. The cut off value depends on the number of states. DLS can be implemented as a simple modification to the general tree search algorithm or the recursive DFS algorithm. DLS imposes a fixed depth limit on a dfs. A variation of depth-first search that uses a depth limit – Alleviates the problem of unbounded trees – Search to a predetermined depth l (“ell”) – Nodes at depth l have no successors • Same as depth-first search if l = ∞ • Can terminate for failure and cutoff • Two kinds of failure Standard failure: indicates no solution Cut off: indicates no solution within the depth limit Properties of Depth-Limited Search • Complete – Yes if l < d 40 • Time – N(IDS)=(d)b+(d-1)b²+……………………..+(1) – O(bl) • Space – O(bl) • Optimal – No if l > d Advantage: • Cut off level is introduced in DFS Technique. Disadvantage: • No guarantee to find the optimal solution. E.g.: Route finding problem Given: The number of states in the given map is five. So it is possible to get the goal state at the maximum depth of four. Therefore the cut off value is four. Task: find a path from A to E. 1. 2. 3. 4. A B C D E A A B C A B C D A B C D 41 Answer: Path = ABDE Depth=3 ITERATIVE DEEPENING SEARCH (OR) DEPTH-FIRST ITERATIVE DEEPENING (DFID): Definition: • Iterative deepening depth-first search It is a strategy that steps the issue of choosing the best path depth limit by trying all possible depth limit Uses depth-first search Finds the best depth limit Gradually increases the depth limit; 0, 1, 2, … until a goal is found Iterative Lengthening Search: The idea is to use increasing path-cost limit instead of increasing depth limits. The resulting algorithm called iterative lengthening search. Implementation: 42 Properties of Iterative Deepening Search: • Complete – Yes • Time : N(IDS)=(d)b+(d-1)b2+…………+(1)bd – O(bd) • Space – O(bd) • Optimal – Yes if step cost = 1 – Can be modified to explore uniform cost tree Advantages: • This method is preferred for large state space and when the depth of the search is not known. • Memory requirements are modest. • Like BFS it is complete 43 Disadvantages: Many states are expanded multiple times. Lessons from Iterative Deepening Search • If branching factor is b and solution is at depth d, then nodes at depth d are generated once, nodes at depth d-1 are generated twice, etc. – Hence bd + 2b(d-1) + ... + db <= bd /(1 -1/b)2 = O(bd). – If b=4, then worst case is 1.78 * 4d, i.e., 78% more nodes searched than exist at depth d (in the worst case). • Faster than BFS even though IDS generates repeated states – BFS generates nodes up to level d+1 – IDS only generates nodes up to level d • In general, iterative deepening search is the preferred uninformed search method when there is a large search space and the depth of the solution is not known Example: Route finding problem Given: Task: Find a path from A to G. Limit=0 Limit=1 Limit=2 1. A B C D E F G A A B C F A B C F D 44 2. 3. A-B-D-E-G Limit 4 A B D-E-G A-C-E-G Limit 3 A-F-G-Limit 2 Answer: Since it is a IDS tree the lowest depth limit (i.e.) A-F-G is selected as the solution path. BI-DIRECTIONAL SEARCH Definition: It is a strategy that simultaneously searches both the directions (i.e) forward from the initial state and backward from the goal state and stops when the two searches meet in the middle. F A B C F G G 45 • Alternate searching from the start state toward the goal and from the goal state toward the start. • Stop when the frontiers intersect. • Works well only when there are unique start and goal states. • Requires the ability to generate “predecessor” states. • Can (sometimes) lead to finding a solution more quickly. Properties of Bidirectional Search: 1. Time Complexity: O(b d/2) 2. Space Complexity: O(b d/2) 3. Complete: Yes 4. Optimal: Yes Advantages: Reduce time complexity and space complexity Disadvantages: The space requirement is the most significant weakness of bi-directional search. If two searches do not meet at all, complexity arises in the search technique. In backward search calculating predecessor is difficult task. If more than one goal state exists then explicitly, multiple state searches are required. Ex: Route Finding Problem Given: A B C D E 46 Task: Find a path from A to E Search from forward (A): Search from backward (E): Answer: Solution path is A-C-E. COMPARING UNINFORMED SEARCH STRATEGIES • Completeness – Will a solution always be found if one exists? • Time – How long does it take to find the solution? A B C A E E D C 47 – Often represented as the number of nodes searched • Space – How much memory is needed to perform the search? – Often represented as the maximum number of nodes stored at once • Optimal – Will the optimal (least cost) solution be found? • Time and space complexity are measured in – b – maximum branching factor of the search tree – m – maximum depth of the state space – d – depth of the least cost solution 48 UNIT-2 PROBLEM SOLVING-II INFORMED SEARCH STRATEGIES: Heuristic /Informed It uses additional information about nodes (heuristics) that have not yet been explored to decide which nodes to examine next Use problem specific knowledge Can find solutions more efficiently than search strategies that do not use domain specific knowledge. find solutions even when there is limited time available General approach of informed search: Best-first search: node is selected for expansion based on an evaluation function f(n) Idea: evaluation function measures distance to the goal. Choose node which appears best • Best First Search algorithms differs in the evaluation function – Evaluation function incorporate the problem specific knowledge in the form of h(n) – h(n) heuristic function , a component of f(n) Estimated cost of cheapest path to the goal node • h(n) = 0, if n is the goal node Implementation: fringe is queue sorted in decreasing order of desirability. Special cases: greedy search, A* search GREEDY BEST-FIRST SEARCH • Expands the node that is closest to the goal • Consider route finding problem in Romania 49 – Use of hSLD, Straight Line Distance Heuristic – Evaluation function f(n) = h(n) (heuristic), estimate of cost from n to goal Definition: A best first search that uses to select next node to expand is called greedy search. Ex: Given, 75 99 211 140 118 101 80 97 f(n)=h(n) Straight line distance to B from A A – 366 R – 193 B -0 S – 253 F – 178 T – 329 P – 98 Z – 374 Solution: From the given graph and estimated cost the goal state is estimated as B from A. Apply the evaluation function h(n) to find a path from A to B. 1. 2. h=374 h=253 S A Z T R B P F A S A Z T 50 h=329 3. S is selected for next level of expansion since h(n) is minimum from S, when comparing T & Z. h=374 h=178 h=253 h=366 h=329 h =193 4. F is selected for next level of expansion since h(n) is minimum for F. h=374 h=253 3 h=178 h=253 h=366 h=0 h=329 h=193 From F goal state B is reached. Therefore the path from A to B using greedy search is A-S-F-B = 450 (i.e.) (140+99+211). For the problem of finding route from Arad to Burcharest… S A Z T R F A S A Z T R B S F A 51 Greedy search example: Assume that we want to use greedy search to solve the problem of travelling from Arad to Bucharest. The initial state=Arad The first expansion step produces: Sibiu, Timisoara and Zerind Greedy best-first will select Sibiu. Arad Sibiu(253) Timisoara(329) Zerind(374) 52 If Sibiu is expanded we get: Arad, Fagaras, Oradea and Rimnicu Vilcea Greedy best-first search will select: Fagaras If Fagaras is expanded we get: Sibiu and Bucharest Goal reached !! Yet not optimal (see Arad, Sibiu, Rimnicu Vilcea, Pitesti) GREEDY SEARCH, EVALUATION: Completeness: NO (cfr. DF-search) Check on repeated states Minimizing h(n) can result in false starts, e.g. Iasi to Fagaras. 53 Properties of greedy best-first search: • Complete? No – can get stuck in loops, e.g., Iasi Neamt Iasi Neamt • Time? O(bm), but a good heuristic can give dramatic improvement • Space? O(bm) --keeps all nodes in memory • Optimal? No A* SEARCH • A better form of best-first search • f(n)=g(n) + h(n) – g(n) the cost to reach the node – h(n) the estimated cost of the cheapest path from n to the goal – f(n) the estimated cost of the cheapest solution through the node n • h(n) Admissible Heuristic /Consistent (in case of graph search) Admissible heuristics • An admissible heuristic never overestimates the cost to reach the goal from a given state, i.e., it is optimistic 54 • Example: hSLD(n) (never overestimates the actual road distance) • Theorem: If h(n) is admissible, A* using TREE-SEARCH is optimal A* Search Definition: A best first search using f as an evaluation function and an admissible h function is known as A* search. f(n)=g(n)+h(n) Ex: Given, 75 99 211 140 118 101 80 97 Straight line distance to B from A A – 366 R – 193 B -0 S – 253 F – 178 T – 329 P – 98 Z – 374 Solution: From the given graph and estimated cost the goal state is estimated as B from A. Apply the evaluation function f(n)=g(n)+h(n) to find a path from A to B. 1. f=0+366=366 2. f=75+374=449 f=140+253=393 S A Z T R B P F A S A Z T 55 f=118+329=447 3. S is selected for next level of expansion since f(S) is minimum from S, when comparing T & Z. f=449 f=293+178=417 f=393 f=280+366=646 f=447 f=220+193=413 How to calculate g(n) Ex: A-S-A 140+140=280 A-S-F 140+99=239 4. R is selected for next level of expansion since f(R) is minimum when comparing to A and F. f=449 f=417 f=393 f=646 f=553 f=447 f=413 f=415 S A Z T R F A S A Z T R P S F A 56 5. P is selected for next level of expansion since f(P) is minimum. f=449 f=417 f=646 f=393 f=553 f=447 f=413 f=317+193=510 f=415 f=418+0=418 From P, goal state B is reached. Therefore the path from A to B using A* search is A-S-R-P-B = 418 (i.e.) (40+80+97+101). A* search example: Find Bucharest starting at Arad f(Arad) = c(??,Arad)+h(Arad)=0+366=366 Expand Arrad and determine f(n) for each node f(Sibiu)=c(Arad,Sibiu)+h(Sibiu)=140+253=393 S A Z T R P S F A B R 57 f(Timisoara)=c(Arad,Timisoara)+h(Timisoara)=118+329=447 f(Zerind)=c(Arad,Zerind)+h(Zerind)=75+374=449 Best choice is Sibiu Expand Sibiu and determine f(n) for each node f(Arad)=c(Sibiu,Arad)+h(Arad)=280+366=646 f(Fagaras)=c(Sibiu,Fagaras)+h(Fagaras)=239+179=415 f(Oradea)=c(Sibiu,Oradea)+h(Oradea)=291+380=671 f(Rimnicu Vilcea)=c(Sibiu,Rimnicu Vilcea)+ h(Rimnicu Vilcea)=220+192=413 Best choice is Rimnicu Vilcea Expand Rimnicu Vilcea and determine f(n) for each node f(Craiova)=c(Rimnicu Vilcea, Craiova)+h(Craiova)=360+160=526 f(Pitesti)=c(Rimnicu Vilcea, Pitesti)+h(Pitesti)=317+100=417 f(Sibiu)=c(Rimnicu Vilcea,Sibiu)+h(Sibiu)=300+253=553 Best choice is Fagaras 58 Expand Fagaras and determine f(n) for each node f(Sibiu)=c(Fagaras, Sibiu)+h(Sibiu)=338+253=591 f(Bucharest)=c(Fagaras,Bucharest)+h(Bucharest)=450+0=450 Best choice is Pitesti !!! Expand Pitesti and determine f(n) for each node f(Bucharest)=c(Pitesti,Bucharest)+h(Bucharest)=418+0=418 Best choice is Bucharest !!! Optimal solution (only if h(n) is admissable) Note values along optimal path !! Optimality of A* (proof) • Suppose some suboptimal goal G2 has been generated and is in the fringe. Let n be an unexpanded node in the fringe such that n is on a shortest path to an optimal goal G. 59 Suppose suboptimal goal G2 in the queue. Let n be an unexpanded node on a shortest to optimal goal G. f(G2 ) = g(G2 ) since h(G2 )=0 > g(G) since G2 is suboptimal >= f(n) since h is admissible Since f(G2) > f(n), A* will never select G2 for expansion Discards new paths to repeated state. Previous proof breaks down Solution: Add extra bookkeeping i.e. remove more expensive of two paths. Ensure that optimal path to any repeated state is always first followed. Extra requirement on h(n): consistency (monotonicity) Consistency: 60 A heuristic is consistent if )'()',,()(nhnancnh If h is consistent, we have )()()()'()',,()()'()'()'(nfnhngnhnancngnhngnf i.e. f(n) is nondecreasing along any path. • Theorem: If h(n) is consistent, A* using GRAPH-SEARCH is optimal Optimality of A*(more usefull) A* expands nodes in order of increasing f value Contours can be drawn in state space Uniform-cost search adds circles. F-contours are gradually Added: 1) nodes with f(n)C* – These nodes are said to be pruned – This pruning still guarantees optimality 62 • Expands nodes in the increasing order of costs • A* is optimally efficient – For a given heuristic, A* finds optimal solution with the fewest number of nodes expansion – Any algorithm that doesn’t expand nodes with f(n)C* Memory Bounded Heuristic Search: 2 types of Memory Bounded algorithm IDA* MA* Memory bounded heuristic search: The memory requirement of A* is reduced by combining the heuristic fn with iterative depending resulting on IDA* algorithm. IDA* search: Space capacity: bd Time complexity: difficult to characterize depends on the number of different values that the heurestic function can take on Optimality:yes Completeness:Yes 63 Disadvantages: It will require more storage space in complex domain (i.e)Each contour will include only one state with the previous contour.To avoid this, We increase the f-cost limit by a fixed amount E on each iteration.So that the total no.of.iteration is proportional to 1/E.Such an algorihm is called E admissible. Function IDA*(problem) returns a solution sequence. Inputs:problem,a problem Local Variable:F-limit, the current f-cost limit root, a node Root ←MAKE-NODE (INITIAL –STATE[problem]) f-limit←f-cost(root) loop do solutions,f-limit←DFS-CONTOUR (root,f-limit) If solution is non null then return solution If f-limit =∞Then return failure End Properties of SMA* search: 1.Complete:Yes If the available memory is sufficient to store the deepest solution path 2.Time&space complexity: Depends on the available no.of.nodes 3.Optimal:If enough memory is available to store the deepest solution path, otherwise it returns the best solution that can be reached with available memory Advantage : SMA* uses only the available memory Disadvantage:if enough memory is not available it leads to unoptional solutions. • Iterative-deepening A* (IDA*) – uses f-value (g + h) as the cutoff – Keep increase the cutoff for each iteration by the smallest amount, in excess of the current cutoff 64 Recursive best-first search RECURSIVE BEST FIRST SEARCH ALGORITHM: 1.After expanding A,S and R the current best leaf(p) has a value that is worse then the best alernative path(f). 393 449 447 646 415 413 417 553 2.f-limit value of each recursive call is shown on the top of each current node. 3.After expanding R,the condition f[best]>f-limit(417>415) is true and returns f[best] to that node. 4.After unwinding back to & expanding F A S T Z A F R S P infinity 447 A S 447 infinity 65 393 449 447 646 415 417 591 450 Hints: F[best]>f[limit] True F[best] Here the f[best] is 450 and which is greater than the f-limit of 417.Therefore it returns and unwind with f[best] value to that node. After switching back to R and expanding P. 393 449 447 T Z A F R B S A S T Z 447 infinity 66 646 450 417 417 553 418 607 The best alternative path through T costs atleast 447,therefore the path through R and P is considered as the best one. • Keeps track of the f-value of the best-alternative path available. – If current f-values exceeds this alternative f-value than backtrack to alternative path. – Upon backtracking change f-value to best f-value of its children. – Re-expansion of this result is thus still possible. Example: A F R S P 447 B R 67 RBFS evaluation • RBFS is a bit more efficient than IDA* – Still excessive node generation (mind changes) • Like A*, optimal if h(n) is admissible • Space complexity is O(bd). – IDA* retains only one single number (the current f-cost limit) • Time complexity difficult to characterize – Depends on accuracy if h(n) and how often best path changes. • IDA* en RBFS suffer from too little memory. A search technique(algorithm) which uses all available memory are: 1.MA*(memory bounded A*) 2.SMA*(simplified MA*) 68 (simplified) memory-bounded A*: • Use all available memory. – I.e. expand best leafs until available memory is full – When full, SMA* drops worst leaf node (highest f-value) – Like RFBS backup forgotten node to its parent • What if all leafs have the same f-value? – Same node could be selected for expansion and deletion. – SMA* solves this by expanding newest best leaf and deleting oldest worst leaf. – SMA* is complete if solution is reachable, optimal if optimal solution is reachable. Properties of SMA* search: 1.Complete:Yes If the available memory is sufficient to store the deepest solution path 2.Time&space complexity: Depends on the available no.of.nodes 3.Optimal:If enough memory is available to store the deepest solution path, otherwise it returns the best solution that can be reached with available memory Advantage : SMA* uses only the available memory Disadvantage:if enough memory is not available it leads to unoptional solutions. 0+12=12 8 8+5=13 A B G 69 RBFS (mimics depth-first search, but backtracks if the current path is not promising and a better path exist. advantage: limited size of open list disadvantage: excessive node regeneration) Learning to Search: The idea is to search at the meta-level space.ach state here is a search tree. The goal is to learn from different search strategies to avoid exploring useless parts of the tree. Several fixed strategies – BFS, Greedy Best First Search Could an agent learn how to search better? C D D 70 Answer: Yes. Reason: Metalevel state space Each state in metalevel state space capture the internal state of a program that is searching in an object level state space HEURISTIC FUNCTIONS: Heuristic function A key compound of Best First Search algorithm is heuristic function denoted h(n) h(n)= Estimated cost of the cheapest path from node n to a goal node Heuristic functions are the most common form in which additional knowledge of the problem imparted to the search algorithm. 71 A good heuristic function can reduce the search process. h(n) = estimated cost of the cheapest path from node n to goal node. If n is goal then h(n)=0 Effective branching factor: Ex: Depth=5 N=52 Effective branching factor=1.91 Assume the branching factor=2 depth=5 N= 1+2+(2)^2+(2)^3+(2)^4+(2)^5 = 1+2+4+8+16+32= 63 From the above example, the number of nodes in effective branching factor is reduced, when comparing to branching factor. That is, Difference =63-52 = 11 E.g for the 8-puzzle: – Avg. solution cost is about 22 steps (branching factor +/-3) – Exhaustive search to depth 22: 3.1 x 1010 states. 72 Common candidates: • h1 = the number of misplaced tiles – h1(s)=8 h2 = the sum of the distances of the tiles from their goal positions (manhattan distance). (or) city block distance h2(s)=3+1+2+2+2+3+3+2=18 The effect of heuristic accuracy on performance: Effective branching factor b* – Is the branching factor that a uniform tree of depth d would have in order to contain N+1 nodes. dbbbN*)(...*)(*112 – Measure is fairly constant for sufficiently hard problems. • Can thus provide a good guide to the heuristic’s overall usefulness. • A good value of b* is 1. Heuristic quality and dominance Q: Whether h2 is always better than h1? A: Yes. Reason: h2 (n)>h1 (n) • 1200 random problems with solution lengths from 2 to 24. • If h2(n) >= h1(n) for all n (both admissible) then h2 dominates h1 and is better for search Inventing admissible heuristics 73 Admissible heuristics can be derived from the exact solution cost of a relaxed version of the problem: Relaxed problem: Relaxed problem: A problem with fewer restrictions on the action is called Relaxed Problem. Ex: 8-puzzle problem Relaxed 8-puzzle for h1 : a tile can move anywhere As a result, h1(n) gives the shortest solution Relaxed 8-puzzle for h2 : a tile can move to any adjacent square. As a result, h2(n) gives the shortest solution. The optimal solution cost of a relaxed problem is no greater than the optimal solution cost of the real problem. Admissible heuristics can also be derived from the solution cost of a subproblem of a given problem. This cost is a lower bound on the cost of the real problem. Pattern databases store the exact solution to for every possible subproblem instance. Disjoint pattern databases: The problem can be divided up in such a way that each move affects only one subproblem. The complete heuristic is constructed using the patterns in the DB Another way to find an admissible heuristic is through learning from experience: Experience = solving lots of 8-puzzles 74 An inductive learning algorithm can be used to predict costs for other states that arise during search. LOCAL SEARCH ALGORITHM AND OPTIMIZATION PROBLEMS: Local search algorithm and optimization problem: Definition: Local search algorithm operates using a single current state and generally moves only to neighbours of that state. A landscape has Location – defined by the state Elevation – defined by the value of the heuristic cost function (or) objective function. Global minimum -finds the lowest valley. Global maximum – finds the highest peak. A complete local search always finds a goal if one exists. An optimal algorithm always finds a global minimum/maximum Applications: 1. Integrated – circuit design 2. Factory – floor layout 3. Job – shop scheduling 4. Automatic programming 5. Vehicle routing 6. Telecommunication network optimization Local Search Algorithm Types: Hill climbing search Simulated annealing Local beam search Genetic algorithm • They are useful for solving optimization problems – Aim is to find a best state according to an objective function e.g. survival of the fittest as a metaphor for optimization 75 – Many optimization problems do not fit the standard search model outlined in chapter 3 – E.g. There is no goal test or path cost in Darwinian evolution • State space landscape • Local search= use single current state and move to neighboring states. Advantages: – Use very little memory – Find often reasonable solutions in large or infinite state spaces. Hill-climbing search “is a loop that continuously moves in the direction of increasing value” It terminates when a peak is reached. Hill climbing does not look ahead of the immediate neighbors of the current state. Hill-climbing chooses randomly among the set of best successors, if there is more than one. Hill-climbing a.k.a. greedy local search Algorithm: function HILL-CLIMBING( problem) return a state that is a local maximum 76 input: problem, a problem local variables: current, a node. neighbor, a node. current MAKE-NODE(INITIAL-STATE[problem]) loop do neighbor a highest valued successor of current if VALUE [neighbor] ≤ VALUE[current] then return STATE[current] current neighbor Example: 8-queens problem (complete-state formulation). Successor function: move a single queen to another square in the same column. Heuristic function h(n): the number of pairs of queens that are attacking each other (directly or indirectly). A) b) 77 a) shows a state of h=17 and the h-value for each possible successor. b) A local minimum in the 8-queens state space (h=1). Drawbacks Ridge = sequence of local maxima difficult for greedy algorithms to navigate Plateaux = an area of the state space where the evaluation function is flat. Gets stuck 86% of the time. Hill-climbing variations: Stochastic hill-climbing Random selection among the uphill moves. 78 The selection probability can vary with the steepness of the uphill move. First-choice hill-climbing cfr. stochastic hill climbing by generating successors randomly until a better one is found. Random-restart hill-climbing-it conducts a series of hill climbing searches from randomly generated initial state,stopping when a goal is found. Tries to avoid getting stuck in local maxima. Simulated annealing Search: An algorithm which combines with random walk to yield both the efficiency and completeness. Escape local maxima by allowing “bad” moves. Idea: but gradually decrease their size and frequency. Origin; metallurgical annealing-the process of gradually cooling a liquid until it freezes. Ex: Bouncing ball analogy: Shaking hard (= high temperature). Shaking less (= lower the temperature). Properties of simulated annealing search If T decreases slowly enough, best state is reached. Applied for VLSI layout, airline scheduling, etc.(applications) Algorithm: function SIMULATED-ANNEALING( problem, schedule) return a solution state input: problem, a problem schedule, a mapping from time to temperature local variables: current, a node. next, a node. T, a “temperature” controlling the probability of downward steps current MAKE-NODE(INITIAL-STATE[problem]) 79 for t 1 to ∞ do T schedule[t] if T = 0 then return current next a randomly selected successor of current ∆E VALUE[next] -VALUE[current] if ∆E > 0 then current next else current next only with probability e∆E /T Algorithm Expanation: ∆E -∆E variable is introduced to calculate the probability of worse end T – it is introduced to determine the probability, which measures temperatures. VALUE-it corresponds to the total energy of the atoms in the material. Schedule-it determines the rate at which the temperature is lowered. The K-Means Algorithm 1. Choose a value for K, the total number of clusters. 2. Randomly choose K points as cluster centers. 3. Assign the remaining instances to their closest cluster center. 4. Calculate a new cluster center for each cluster. 5. Repeat steps 3-5 until the cluster centers do not change. Table 3.6 • K-Means Input Values Instance X Y 1 1.0 1.5 2 1.0 4.5 3 2.0 1.5 4 2.0 3.5 5 3.0 2.5 6 5.0 6.0 Table 3.7 • Several Applications of the K-Means Algorithm (K = 2) Outcome Cluster Centers Cluster Points Squared Error 1 (2.67,4.67) 2, 4, 6 14.50 (2.00,1.83) 1, 3, 5 2 (1.5,1.5) 1, 3 15.94 (2.75,4.125) 2, 4, 5, 6 3 (1.8,2.7) 1, 2, 3, 4, 5 9.60 80 General Considerations Requires real-valued data. We must select the number of clusters present in the data. Works best when the clusters in the data are of approximately equal size. Attribute significance cannot be determined. Lacks explanation capabilities. Local beam search Keep track of k states instead of one 012345670123456xf(x)81 Initially: k random states Next: determine all successors of k states If any of successors is goal finished Else select k best from successors and repeat. Major difference with random-restart search Information is shared among k search threads. Can suffer from lack of diversity. Stochastic variant: choose k successors at proportional to state success. Genetic algorithms A successor state is generated by combining two parent states • Operations – Crossover (2 parents -> 2 children) – Mutation (one bit) Basic structure – Create population – Perform crossover & mutation (on fittest) – Keep only fittest children 82 • Children carry parts of their parents’ data • Only “good” parents can reproduce – Children are at least as “good” as parents? • No, but “worse” children don’t last long • Large population allows many “current points” in search – Can consider several regions (watersheds) at once Fitness Function: Value of expression assignment Hard to judge the “quality” Number of clauses that pattern satisfies 83 • Representation – Children (after crossover) should be similar to parent, not random – Binary representation of numbers isn’t good -what happens when you crossover in the middle of a number? – Need “reasonable” breakpoints for crossover (e.g. between R, xcenter and ycenter but not within them) • “Cover” – Population should be large enough to “cover” the range of possibilities – Information shouldn’t be lost too soon – Mutation helps with this issue General Considerations Global optimization is not a guarantee. The fitness function determines the complexity of the algorithm. Explain their results provided the fitness function is understandable. Transforming the data to a form suitable for genetic learning can be a challenge. 84 The General Form: 85 An Example of Crossover The CNF-satisfaction Problem: 86 ONLINE SEARCH AGENT: Offline search vs. online search • Offline search agents – Compute a solution before setting foot in the real world • Online search agents – Interleave computation and action • E.g. takes an action and then observes environments and then computes the next action – Necessary for an exploration problem • States and actions are unknown • E.g. robot in a new building, or labyrinth online search problems • Agents have access to: – ACTIONS(s)-returns a list of actions allowed in states – c(s,a,s’)-this step-cost cannot be used until the agent knows that s’ is the outcome – GOAL-TEST(s) • The agent cannot access the successors of a state except by actually trying all the actions in that state • Assumptions – The agent can recognize a state that it has visited before – Actions are deterministic – Optionally, an admissible heuristic function • Objective: reach goal while minimizing cost • Safely Explorable space – goal can be reached from any reachable state (no dead ends) Performance of Online Search • Competitive Ratio: total path cost actually traverse vs. cost agent would traverse if it knew state space in advance. 87 – best achievable CR is often unbounded (very high, possibly infinite) – adversary argument • Better to evaluate performance relative to size of state space, rather than depth of shallowest goal Robot Maze A simple maze problem , the agent start at S and must reach G ,but knows nothing of the environment. Adversary Arguments: • Imagine an adversary that constructs state space as agent is moving through it and can adjust unexplored space to create worst case behavior. • The space that would be created by this adversary is a space that would yield worst-case behavior. Competitive Ratio: This defines the comparison between the total pathcost with shortest complete exploration pathcost. This ratio should be as small as possible. Adversaries 88 Fig(a): Two state space that might lead on online search agent into a dead end.Any given agent will fail in atleast one of these spaces. Fig(b): A two dimensional environment that can cause an online search agent to follow on arbitrarily inefficient route to the goal. Fig: An environment in which a random will take exponentially many steps to find the goal. 89 online search agents: • Online algorithm can expand only a node that it physically occupies – Offline algorithms can expand any node in fringe • Same principle as DFS Online DFS function ONLINE_DFS-AGENT(s’) return an action input: s’, a percept identifying current state static: result, a table of the next state, indexed by action and state, initially empty unexplored, a stack that lists, for each visited state, the action not yet tried unbacktracked, a stack that lists, for each visited state, the predecessor states to which the agent has not yet backtracked s, a, the previous state and action, initially null if GOAL-TEST(s’) then return stop 90 if s’ is a new state then unexplored[s’] ACTIONS(s’) if s is not null then do result[a,s] s’ add s to the front of unbackedtracked[s’] if unexplored[s’] is empty then if unbacktracked[s’] is empty then return stop else a an action b such that result[b, s’]=POP(unbacktracked[s’]) else a POP(unexplored[s’]) s s’ return a Online DFS algorithm Exploration: o Result[a,s]-records the state resulting from executing action in state S. o An action from the current state has not been explored , then it is explored and return action a. o If there is no action exists from the current and possible to backtracking is done with action b return action a. o If the agent has run out of states to which it can backtrack then its search is complete. Online DFS, example Task: From S to reach G. 91 • GOAL-TEST((1,1))? – s’ != G thus false • (1,1) a new state? – false • s is null? – false (s=(1,2)) – result*DOWN,(1,2)+ ← (1,1) – UB[(1,1)] = {(1,2)} • UX[(1,1)] empty? – False • a=RIGHT, s=(1,1) • return a 92 • GOAL-TEST((2,1))? – s’ != G thus false • (2,1) a new state? – True, UX[(2,1)]={RIGHT,UP,LEFT} • s is null? – false (s=(1,1)) – result*RIGHT,(1,1)+ ← (2,1) – UB[(2,1)]={(1,1)} • UX[(2,1)] empty? – False • a=LEFT, s=(2,1) • return a 93 Online local search • Hill-climbing is already online – One state is stored. • Bad performance due to local maxima – Random restarts impossible. • Solution1: Random walk introduces exploration – Selects one of actions at random, preference to not-yet-tried action – can produce exponentially many steps An environment in which a random walk will take exponentially many steps to find the goal Random Walks • Instead of random restart, we can consider random walks for online search • select a possible action at random, give precedence to untried actions • probability of success 1 for finite space Solution 2: Add memory to hill climber • Store current best estimate H(s) of cost to reach goal • H(s) is initially the heuristic estimate h(s) • Afterward updated with experience (see below) • Learning real-time A* (LRTA*) The current position of agent 94 C(S,a,S’)+H(S’) 1+9 1+2 10 3(min) So move right. Learning real-time A*(LRTA*) Build a map of the environment using result table. H(s) is initially empty,when the process start h(s) is initialized for new states and it is updated when the agent agains experience in the state space. Each state is updated with minimum H(s) out of all possible actions and the same action is returned. function LRTA*-COST(s,a,s’,H) return an cost estimate if s’ is undefined the return h(s) else return c(s,a,s’) + H[s’] function LRTA*-AGENT(s’) return an action input: s’, a percept identifying current state static: result, a table of next state, indexed by action and state, initially empty 95 H, a table of cost estimates indexed by state, initially empty s, a, the previous state and action, initially null if GOAL-TEST(s’) then return stop if s’ is a new state (not in H) then H[s’] h(s’) unless s is null result[a,s] s’ H[s] min LRTA*-COST(s,b,result[b,s],H) b ACTIONS(s) a an action b in ACTIONS(s’) that minimizes LRTA*-COST(s’,b,result[b,s’],H) s s’ return a CONSTRAINT SATISFACTION PROBLEMS (CSPS)(refer pg no 79) • Standard search problem: – state is a "black box“ – any data structure that supports successor function, heuristic function, and goal test • CSP: – state is defined by variables Xi with values from domain Di – goal test is a set of constraints specifying allowable combinations of values for subsets of variables – Simple example of a formal representation language • Allows useful general-purpose algorithms with more power than standard search algorithms Arc consistency: Arc refers to a directed arc in the constraint graph. Arc consistency checking can be applied either as a preprocessing. step before the beginning of the search process(or) propagation step. The process must be applied repeatedly until no more inconsistency remain. Refer idea of arc consistency in Xerox Explain AC-3 algorithm. 96 Path consistency: 3 consistency means that any pair of adjacent variables can always be extended to a third neighboring variable, this is also called path consistency K-consistency: Stronger forms of propagation can be defined using the notation called K-consistency. A CSP is K-consistency if for any set of K-1 variables and for any consistent assignment to those variables, a constant value can always be assigned to any variable. Example: Map-Coloring • Variables WA, NT, Q, NSW, V, SA, T • Domains Di = {red,green,blue} • Constraints: adjacent regions must have different colors • e.g., WA ≠ NT, or (WA,NT) in ,(red,green),(red,blue),(green,red), (green,blue),(blue,red),(blue,green)} 97 • Solutions are complete and consistent assignments, e.g., WA = red, NT = green,Q = red,NSW = green,V = red,SA = blue,T = green Constraint graph • Binary CSP: each constraint relates two variables • Constraint graph: nodes are variables, arcs are constraints Refer pg 80 for explanation 98 Varieties of CSPs • Discrete variables – finite domains: • n variables, domain size d O(dn) complete assignments • e.g., Boolean CSPs, incl.~Boolean satisfiability (NP-complete) – infinite domains: • integers, strings, etc. • e.g., job scheduling, variables are start/end days for each job • need a constraint language, e.g., StartJob1 + 5 ≤ StartJob3 • Continuous variables – e.g., start/end times for Hubble Space Telescope observations – linear constraints solvable in polynomial time by linear programming Varieties of constraints: • Unary constraints involve a single variable, – e.g., SA ≠ green • Binary constraints involve pairs of variables, – e.g., SA ≠ WA – Higher-order constraints involve 3 or more variables, – e.g., cryptarithmetic column constraints Example: Cryptarithmetic 99 • Variables: F T U W R O X1 X2 X3 • Domains: {0,1,2,3,4,5,6,7,8,9} • Constraints: Alldiff (F,T,U,W,R,O) – O + O = R + 10 · X1 – X1 + W + W = U + 10 · X2 – X2 + T + T = O + 10 · X3 – X3 = F, T ≠ 0, F ≠ 0 – Where X1, X2 , X3 are auxilliary variables representing the digit (0&1) carried over into the next column. Higher order constraint can be represented as constraint hypergraph. CSP: CSP are mathematical problem where one must find states or object that satisfy a number of constraint or criteria. A constraint is a restriction of the feasible solutions in an optimization problem Example N-queen problem Cross word puzzle Map coloring problem Real-world CSPs • Assignment problems – e.g., who teaches what class • Timetabling problems – e.g., which class is offered when and where? • Transportation scheduling • Factory scheduling • Notice that many real-world problems involve real-valued variables Standard search formulation (incremental) Let's start with the straightforward approach, then fix it 100 States are defined by the values assigned so far • Initial state: the empty assignment { } • Successor function: assign a value to an unassigned variable that does not conflict with current assignment fail if no legal assignments • Goal test: the current assignment is complete 1. This is the same for all CSPs 2. Every solution appears at depth n with n variables use depth-first search 3. Path is irrelevant, so can also use complete-state formulation 4. b = (n – l)d at depth l, hence n! · dn leaves The result in (4) is grossly pessimistic, because the order in which values are assigned to variables does not matter. There are only dn assignments. BACKTRACKING SEARCH FOR CSP’S Backtracking search: It is used for depth first search that chooses the values for one variable at a time and backtracks when a variable has no legal values left to assign. variable and value ordering propagating information through constraints forward checking constraint propagation handling special constraint intelligent backtracking: looking backward • Variable assignments are commutative}, i.e., [ WA = red then NT = green ] same as [ NT = green then WA = red ] • Only need to consider assignments to a single variable at each node 101 b = d and there are dn leaves • Depth-first search for CSPs with single-variable assignments is called backtracking search • Backtracking search is the basic uninformed algorithm for CSPs • Can solve n-queens for n ≈ 25 Algorithm explanation: VARSELECT-UNASSIGNED-VARIABLE (VARIABLES [CSP], assignment, CSP) SELECT-UNASSIGNED-VARIABLE-at simply select the next unassigned variable in the order given by the list VARIABLES [CSP] Intelligent backtracking: looking backward Chronological backtracking: Backup the preceding variables and try a different value for it called chronological backtracking. Ex: map coloring problem 102 Variable: WA, NT, SA, Q, NSW, V, T Domain d={red, green, blue} Generate the partial assignment {Q=red, NSW=green, V=blue, T=red} Try next variable SA We see that every values violates a constraint. So back up to T and change it color but this is solve the problem have to introduce the new approach here cause the failure set called conflict set set of variables that cause the failure. the set is called conflict set Constraint directed back jumping: When a branch of the search fails back track to one of the set of variables that caused the failure conflict set. the conflict set for the variable X is set of previously assigned variables that are 103 connected to X by constraints. A backtracking alg that was conflict sets defined in this way is called conflict directed backtracking. Constraint Learning: It actually modifies the CSP by adding a new constraint that is induced from these conflicts. Back jumping: this method back tracks to the most recent variable in the conflict set Ex: Back jumping would jump over Tasmania and try a new value for V Backtracking example 104 105 Improving backtracking efficiency • General-purpose methods can give huge gains in speed: – Which variable should be assigned next? – In what order should its values be tried? – Can we detect inevitable failure early? Variable and value ordering: Variable and value ordering MRV/most constraint value Degree heuristic Least constraining value Most constrained variable • Most constrained variable:/minimum remaining values/fail-firtheuristic called MRV heuristic choose the variable with the fewest legal values • a.k.a. minimum remaining values (MRV) heuristic 106 • Tie-breaker among most constrained variables • Most constraining variable: – choose the variable with the most constraints on remaining variables Degree heuristic: The MRV doesn’t help at all in choosing the first region to color in Australia. Because initially every region has three colors. In this case, the degree heuristic used. Used to reduce branching factor on future choices by selecting the variable that is involved in the largest number of constraint on other unassigned variables. Useful in tie-breaker Least constraining value • Given a variable, choose the least constraining value: – the one that rules out the fewest values in the remaining variables – Combining these heuristics makes 1000 queens feasible Propagating information through constraints: Forward checking 107 • Idea: – Keep track of remaining legal values for unassigned variables – Terminate search when any variable has no legal values 108 RGB RGB RGB RGB RGB RGB RGB R GB RGB RGB TGB GB RGB R B G R B RGB B RGB R B G B B RGB The progress of a map coloring search with forward checking. WA=red is assigned first; then forward checking deletes red from the domains of the neighboring variables NT and SA. After Q=green, green is deleted from the domain of NT, SA and NSW. After V=blue, blue is deleted from the domains is NSW and SA, leaving SA with no legal values. Constraint propagation • Forward checking propagates information from assigned to unassigned variables, but doesn't provide early detection for all failures: • NT and SA cannot both be blue! • Constraint propagation repeatedly enforces constraints locally 109 Constraint propagation: consistency techniques: Arc consistency(two consistency) Node consistency(one consistency) Path consistency (or) K-consistency Handling special constraint: Alldiff constraint Resource constraint Bounds propagation Intelligent backtracking: looking backward Chronological backtracking Conflict set Back jumping Conflict-directed back jumping Constraint learning Handling special constraint: Alldiff –all variables involved must have distinct value m-variable n-values so m>n then constraint cannot be satisfied Resource constraint/atmost constraint/higher order constraint In which consistency is achieved by deleting the maximum value of any domain if it is not consistent with minimum values of other domains. Ex: personal assigned task PA1……………P(A4) 110 Constraint no more 10 personal are assigned in total So atmost(10,PA1,PA2,PA3,PA4) Domain(3,4,5,6) Result: the almost constraint cannot be satisfied. Suppose the domain{2,3,4,5,6} The value 5 and 6 can be deleted from each domain. Bound propagation: Domains are represented by lower bound and upper bound and are managed by bound propagation Used practical constraint problem EX: Two flights=271,272 Planes have capacity=165,385 Initial domain Flight271€[0,165] Flight272€[0,385] Additional constraint: Two flights must carry 420 people so flight271+flight272€[420,420] Reduce the domain to Flight271€[35,165] Flight272€[255,385] Finally we say that a CSP is bound constraint if for every variable X, and both the lower bound and upper bound values of X, there exists some value of Y, that satisfies the constraint between X and Y for every variable Y. Arc consistency • Simplest form of propagation makes each arc consistent 111 • X Y is consistent iff for every value x of X there is some allowed y • Simplest form of propagation makes each arc consistent • X Y is consistent iff for every value x of X there is some allowed y • If X loses a value, neighbors of X need to be rechecked 112 • If X loses a value, neighbors of X need to be rechecked • Arc consistency detects failure earlier than forward checking • Can be run as a preprocessor or after each assignment Arc consistency algorithm AC-3 113 • Time complexity: O(n2d3), where n is the number of variables and d is the maximum variable domain size, because: – At most O(n2) arcs – Each arc can be inserted into the agenda (TDA set) at most d times – Checking consistency of each arc can be done in O(d2) time Generalized Arc Consistency Algorithm • Three possible outcomes: 1. One domain is empty => no solution 2. Each domain has a single value => unique solution 3. Some domains have more than one value => there may or may not be a solution • If the problem has a unique solution, GAC may end in state (2) or (3); otherwise, we would have a polynomial-time algorithm to solve UNIQUE-SAT • UNIQUE-SAT or USAT is the problem of determining whether a formula known to have either zero or one satisfying assignments has zero or has one. Although this problem seems easier than general SAT, if there is a practical algorithm to solve this problem, then all problems in NP can be solved just as easily [Wikipedia; L.G. Valiant and V.V. Vazirani, NP is as Easy as Detecting Unique Solutions. Theoretical Computer Science, 47(1986), 85-94.] LOCAL SEARCH FOR CSPS: Use complete state formulation. Initial state-assign a value to every variable Successor function-works by changing the value of each variable Hill-climbing, simulated annealing typically work with "complete" states, i.e., all variables assigned • To apply to CSPs: – allow states with unsatisfied constraints – operators reassign variable values • Variable selection: randomly select any conflicted variable • Value selection by min-conflicts heuristic: – choose value that violates the fewest constraints – i.e., hill-climb with h(n) = total number of violated constraints 114 Algorithm: function MIN-CONFLICTS(csp, max_steps) return solution or failure inputs: csp, a constraint satisfaction problem max_steps, the number of steps allowed before giving up current an initial complete assignment for csp for i = 1 to max_steps do if current is a solution for csp then return current var a randomly chosen, conflicted variable from VARIABLES[csp] value the value v for var that minimize CONFLICTS(var,v,current,csp) set var = value in current return failure Example: 4-Queens • States: 4 queens in 4 columns (44 = 256 states) • Actions: move queen in column • Goal test: no attacks • Evaluation: h(n) = number of attacks • Given random initial state, can solve n-queens in almost constant time for arbitrary n with high probability (e.g., n = 10,000,000) Min-conflicts example 2 115 • Use of min-conflicts heuristic in hill-climbing Min-conflicts example 3 • A two-step solution for an 8-queens problem using min-conflicts heuristic • At each stage a queen is chosen for reassignment in its column • The algorithm moves the queen to the min-conflict square breaking ties randomly Advantages of local search • The runtime of min-conflicts is roughly independent of problem size. – Solving the millions-queen problem in roughly 50 steps. • Local search can be used in an online setting. – Backtrack search requires more time Summary • CSPs are a special kind of problem: 116 – states defined by values of a fixed set of variables – goal test defined by constraints on variable values • Backtracking = depth-first search with one variable assigned per node • Variable ordering and value selection heuristics help significantly • Forward checking prevents assignments that guarantee later failure • Constraint propagation (e.g., arc consistency) does additional work to constrain values and detect inconsistencies • Iterative min-conflicts is usually effective in practice STRUCTURE OF PROBLEMS: Independent sub problems. Connected Components. Tree Algorithm. Constraint graph can be reduced in two ways. 1. Removing nodes. 2. Collapsing nodes together. Removing nodes. 1. Definition. 2. Cut set conditioning. 3. Cycle cut set. 4. Algorithm. Tree decomposition. 1. Definition. 2. It satisfy 3 requirement. 3. Example. 4. Solution. 5. Complexity. 6. Tree width. 117 • How can the problem structure help to find a solution quickly? • Subproblem identification is important: – Coloring Tasmania and mainland are independent subproblems – Identifiable as connected components of constrained graph. • Improves performance • Suppose each problem has c variables out of a total of n. • Worst case solution cost is O(n/c dc), i.e. linear in n – Instead of O(d n), exponential in n • E.g. n= 80, c= 20, d=2 – 280 = 4 billion years at 1 million nodes/sec. – 4 * 220= .4 second at 1 million nodes/sec The complete Algorithm runs in time 0(nd^2) time. Tree-structured CSPs Tree: Any two variables are connected by atmost one path. 118 • Theorem: if the constraint graph has no loops then CSP can be solved in O(nd 2) time • Compare difference with general CSP, where worst case is O(d n) • In most cases subproblems of a CSP are connected as a tree DEFINITION: • Any tree-structured CSP can be solved in time linear in the number of variables. ALGORITHM: – Choose a variable as root, order variables from root to leaves such that every node’s parent precedes it in the ordering. (label var from X1 to Xn) – For j from n down to 2, apply REMOVE-INCONSISTENT-VALUES(Parent(Xj),Xj) – For j from 1 to n assign Xj consistently with Parent(Xj ) Steps: 1. Csp is arc consistent so the assignment of values in step. 2. Requires no backtracking. The arc consistency is applied in reverse order to ensure the consistency of arcs that are processed already. The complete algorithm runs in 0(nd^2) time. 119 General constraint graph can be reduced to trees on two ways .They are 1. Removing nodes-cut set conditioning 2. Collapsing nodes together-Tree decomposition. Cut set conditioning: Finding the smallest cycle cut set is NP-hard but several efficient approximation algorithm are known for this task. The overall algorithmic approach is called cut set conditioning. Removing nodes: This is the first approach involves assigning values to some variables so that the remaining variables from a tree. Tree width: Tree decomposition techniques transform the csp into a tree of subproblem and are efficient if the tree width of the constraint graph is small. Nearly tree-structured CSPs Conditioning: Instantiate a variable, prune its neighbors’ domains Cutset conditioning: Instantiate (in all ways) a set of variables such that the remaining constraint graph is a tree 120 Tree decomposition: Three requirements that must be satisfied Solve each problem independently In any one has no solution the entire problem has no solution Build general solution CSP with constraint graphs of bounded tree width (w) are solvable in polynomial time: O(ndw+1) Finding the tree decomposition with minimal tree width is NP-hard 121 Summary CSPs are a special kind of problem: states defined by values of a fixed set of variables goal test defined by constraints on variable values Backtracking = depth-first search with one variable assigned per node Variable ordering and value selection heuristics help significantly Forward checking prevents assignments that guarantee later failure Constraint propagation (e.g., arc consistency) does additional work to constrain values and detect inconsistencies Tree-structured CSPs can be solved in linear time Iterative min-conflicts is usually effective in practice GAMES: multiagent environment. Game. 122 Game theory 1. call. 2. History of Games. pruning. Evaluation functions. Pruning: It allows us to ignore the portions of the search tree that make no difference to the final choice. Heuristic evaluation function: It allows us to approximate the true utility of a state without doing a complete search. Optimal decision in games: Game’s component. Optimal strategy 1. Strategy. 2. Minimax value. 3. Minimax decision. The minimax algorithm. 1. Minimax algorithm 2. Backed up. Optimal decision in multilayer games ADVERSARIAL SEARCH:/often known as games What are, why study games? • Games are a form of multi-agent environment – What do other agents do and how do they affect our success? – Competitive multi-agent environments give rise to adversarial search (games) • Problem solving agent is not alone any more – Multiagent, conflict • Default: deterministic, turn-taking, two-player, zero sum game of perfect information – Perfect info. vs. imperfect, or probability • Why study games? – Fun; historically entertaining – Interesting because they are hard 123 – Easy to represent,agents restricted to small number of actions Games vs. search problems • Search – no adversary – Solution is (heuristic) method for finding goal – Heuristics and CSP techniques can find optimal solution – Evaluation function: estimate of cost from start to goal through given node – Examples: path planning, scheduling activities • Games – adversary – Solution is a strategy: specifies move for every possible opponent reply – Time limits force an approximate solution – Evaluation function: “goodness” of game position – Examples: chess, checkers, Othello, backgammon Historical Contributions • computer considers possible lines of play – Babbage, 1846 • algorithm for perfect play – Zermelo, 1912, Von Neumann, 1944 • finite horizon, approximate evaluation – Zuse, 1945, Wiener, 1948, Shannon, 1950 • first chess program – Turing, 1951 • machine learning to improve evaluation accuracy – Samuel, 1952-57 • pruning to allow deeper search – McCarthy, 1956 124 Types of Games Optimal Decisions in Games: Game Tree search Or Game formalization • Formally define a two-person game as: • Two players, called MAX and MIN. – Alternate moves – At end of game winner is rewarded and loser penalized. • Game has – Initial State: board position and player to go first : e.g. board configuration of chess – Successor Function: returns (move, state) pairs • All legal moves from the current state • Resulting state – Terminal Test -: Is the game finished? – Utility function for terminal states. numerical value of terminal states. e.g. win (+1), loose (-1) and draw (0) in tic-tac-toe – MAX uses search tree to determine next move 125 Initial state plus legal moves define game tree. Tic-tac-toe: Game tree (2-player, deterministic, turns) Optimal Strategy: • Find the contingent strategy for MAX assuming an infallible MIN opponent. • Assumption: Both players play optimally !! • Given a game tree, the optimal strategy can be determined by using the minimax value of each node: MINIMAX-VALUE(n)= UTILITY(n) If n is a terminal maxs successors(n) MINIMAX-VALUE(s) If n is a max node mins successors(n) MINIMAX-VALUE(s) If n is a min node Two-Ply Game Tree 126 Minimax maximizes the worst-case outcome for max What if MIN does not play optimally? 127 • Definition of optimal play for MAX assumes MIN plays optimally: maximizes worst-case outcome for MAX. • But if MIN does not play optimally, MAX will do even better. [proven.] Minimax Algorithm • Perfect play for deterministic games • Idea: choose move to position with highest minimax value = best achievable payoff against best play • E.g., 2-ply game: Two-Ply Game Tree 128 Minimax maximizes the worst-case outcome for max Minimax-Decision: select all the possible moves, finds the utility values returns. Minimax-value: The moves with high utility value returns the utility value depends on the state. Terminal state. Max player move in the state. Min player move in the state. 129 Properties of minimax • Complete? Yes (if tree is finite) • Optimal? Yes (against an optimal opponent) • Time complexity? O(bm) • Space complexity? O(bm) (depth-first) • For chess, b ≈ 35, m ≈100 for "reasonable" games exact solution completely infeasible • Even tic-tac-toe is much too complex to diagram here, although it's small enough to implement. Optimal Decision in Multiplayer games: • Games allow more than two players • Single minimax values become vectors 130 ALPHA-BETA PRUNING: Alpha-Beta Pruning: Pruning. Alpha-Beta pruning. Example. Two parameters. Effectiveness of alpha beta. Alpha-Beta algorithm. Transposition. Transposition table. Example: Two ply game tree 131 The terminals nodes on the bottom level are already labeled with their utility value. The first Min node labeled B has three successor with values 3,12,8,so its minimax value 3. Similarily the other two min nodes have minimax value 2. The root node is MAX node, its successor have minimax value 3,2,3.so its minimax value is 3. Steps are explained by the following figure: (a)refer the textbook page 168. The outcome of is that we can identify the minimax decision without ever evaluating two of the leaf nodes. Another way: Simplification the formula for MINIMAX-VALUE. let the two unevaluated successors of node C. have values x and y and let z be the minimum of x and y. the value of root node is given by MINIMAX_VALUE(root)=max(min(3,12,8),min(2,x,y),min(14,5,2)) =max(3,min(2,x,y),2) 132 =max(3,2,3) where z<=2 =3 In other words the value of the root and hence the minimax decision are independent of the values of pruned leaves x and y. It can be applied to tree of any depth and possible to prune entire sub tree rather than just leaves. It updates the value of alpha and beta as it goes along and prunes the remaining nodes. The effecitiveness of alpha and beta pruning is highly dependent on the order in which the successors are examined. Transposition: Different permutations of more than end up in the same sequence. Ex.[a1,b1,a2,b2} &[a1,b2,a2,b1] both end up in the same position. To store the evaluation of this position in a hash table that first time is encountered so that we don’t have to recomputed it on subsequent occurrences. Transposition table: The hash table of previously seen position is traditionally called transposition table. Why is it called α-β? α is the value of the best (i.e., highestvalue)choice found sofar at any choice point along the path for max If v is worse than α,max will avoid it prune that branch Define β similarly for min Pruning • Minimax problem: number of games states is exponential in the number of moves. 133 • Alpha-beta pruning – Remove branches that don’t influence final decision The α-β algorithm: Example: 134 135 136 137 138 Properties of α-β • Pruning does not affect final result • Good move ordering improves effectiveness of pruning • With "perfect ordering," time complexity = O(bm/2) doubles depth of search which can be carried out for a given level of resources. • A simple example of the value of reasoning about which computations are relevant (a form of metareasoning) IMPERFECT REAL TIME DECISIONS: Imperfect Real time Decisions: Evaluation function Cut off test Evaluation functions 1. Features. 2. Expected value. 3. Material value. 4. Weighted linear function. Cutting off search 1. Quiescence. 139 2. Quiescence search. 3. Horizon effect. 4. Singular extensions. 5. Forward pruning. Features: Most evaluation function by calculating various features of the state. Ex: chess game Pawn have various features to win,loss,draw. Expected value or weighted average: It can be determined for each category, resulting in an evaluation function that works for any state. The evaluation function need not return actual expected values, as long as the ordering of the state is the same. Ex: experience suggest=72% 72% to win(+1) 20% to loss(-1) 8% to draw(0) (0.72*+1)+(0.20*-1)+(0.08*0)=0.52 Cutting of search: To perform a cut off test, an evaluation function should be applied to positions that are quiexent ,that is a position that will not swing in bad value for long time in the search tree is known as waiting for quiescence. Quiescence search: A search which is restricted to consider only certain type of moves such as capture moves, that will quickly resolves the uncertainties in the position. Suppose we have 100 secs, explore 104 nodes/sec 106 nodes per move Standard approach: 140 • cutoff test: e.g., depth limit (perhaps add quiescence search) • evaluation function = estimated desirability of position Evaluation function • Evaluation function or static evaluator is used to evaluate the “goodness” of a game position. – Contrast with heuristic search where the evaluation function was a non-negative estimate of the cost from the start node to a goal and passing through the given node • The zero-sum assumption allows us to use a single evaluation function to describe the goodness of a board with respect to both players. – f(n) >> 0: position n good for me and bad for you – f(n) << 0: position n bad for me and good for you – f(n) near 0: position n is a neutral position – f(n) = +infinity: win for me – f(n) = -infinity: win for you 141 Min-Max A typical evaluation function is a linear function in which some set of coefficients is used to weight a number of "features" of the board position. • For chess, typically linear weighted sum of features Eval(s) = w1 f1(s) + w2 f2(s) + … + wn fn(s) • e.g., w1 = 9 with f1(s) = (number of white queens) – (number of black queens), etc. 142 • "material", : some measure of which pieces one has on the board. • A typical weighting for each type of chess piece is shown • Other types of features try to encode something about the distribution of the pieces on the board. Cutting off search MinimaxCutoff is identical to MinimaxValue except 1. Terminal? is replaced by Cutoff? 2. Utility is replaced by Eval • Change: – if TERMINAL-TEST(state) then return UTILITY(state) into – if CUTOFF-TEST(state,depth) then return EVAL(state) • Introduces a fixed-depth limit depth – Is selected so that the amount of time will not exceed what the rules of the game allow. • When cutoff occurs, the evaluation is performed. Does it work in practice? bm = 106, b=35 m=4 4-ply lookahead is a hopeless chess player! – 4-ply ≈ human novice – 8-ply ≈ typical PC, human master 143 – 12-ply ≈ Deep Blue, Kasparov The key idea is that the more look ahead we can do, that is, the deeper in the tree we can look, the better our evaluation of a position will be, even with a simple evaluation function. In some sense, if we could look all the way to the end of the game, all we would need is an evaluation function that was 1 when we won and -1 when the opponent won. it seems to suggest that brute-force search is all that matters. • And Deep Blue is brute indeed... It had 256 specialized chess processors coupled into a 32 node supercomputer. It examined around 30 billion moves per minute. The typical search depth was 13ply, but in some dynamic situations it could go as deep as 30. Horizon problem: When the program is facing a move by the opponent that causes serious damage and ulitimately unavoidable. Ex: Beginning of the search-1player A B C D 144 M max -4 0 2 0 -4 This diagram shows the situation of horizon problem that is when one level is generated from B,it causes bad value for B. max 6 0 2 6 2 5 6 7 6 When one more successor level is generated from E and F and situation comes down and the value of B is retained as a good move. The time B is waited for this situation is called waiting for quiescence. Singular Extensions: Used to avoid the horizon effect without adding to much search cost. Forward pruning: Some moves at a given node are pruned immediately without further consideration. Imperfect ,Real-time Decisions The minimax algorithm generates the entire game search space,whereas the alpha-beta algorithm allows us to prune large parts of it. However,alpha-beta still has to search all the way to terminal states for atleast a portion of search space. Shannon’s 1950 paper,Programming a computer for playing chess,proposed that programs should cut off the search earlier and apply a A D C B F E A B D C I F E J H G 145 heuristic evaluation function to states in the search,effectively turning nonterminal nodes into terminal leaves. The basic idea is to alter minimax or alpha-beta in two ways : (1) The utility function is replaced by a heuristic evaluation function EVAL,which gives an estimate of the position’s utility,and (2) the terminal test is replaced by a cutoff test that decides when to apply EVAL. Realtime decisions What if you don’t have enough time to explore entire search tree? • We cannot search all the way down to terminal state for all decision sequences • Use a heuristic to approximate (guess) eventual terminal state Evaluation Function The heuristic that estimates expected utility • Cannot take too long (otherwise recurse to get answer) • It should preserve the ordering among terminal states – otherwise it can cause bad decision making • Define features of game state that assist in evaluation – what are features of chess? Truncating minimax search When do you recurse or use evaluation function? • Cutoff-Test (state, depth) returns 1 or 0 – When 1 is returned, use evaluation function • Cutoff beyond a certain depth • Cutoff if state is stable (more predictable) • Cutoff moves you know are bad (forward pruning) Benefits of truncation Comparing Chess • Using minimax 5 ply • Average Human 6-8 ply 146 • Using alpha-beta 10 ply • Intelligent pruning 14 ply Games that include an element of chance: Position evaluation in games with chance nodes Complexity of expect minimax Card games Practical issues: GAMES THAT INCLUDE AN ELEMENT OF CHANCE: • Backgammon is a two-player game with uncertainty. • Players roll dice to determine what moves to make. • White has just rolled 5 and 6 and has four legal moves: • 5-10, 5-11 • 5-11, 19-24 147 • 5-10, 10-16 • 5-11, 11-16 • Such games are good for exploring decision making in adversarial problems involving skill and luck. • Backgammon: move all one’s pieces off the board • Branches leading from each chance node denote the possible dice rolls – Labeled with roll and the probability 148 • [1,1], [6,6] chance 1/36, all other chance 1/18 • Possible moves (5-10,5-11), (5-11,19-24),(5-10,10-16) and (5-11,11-16) • Cannot calculate definite minimax value, only expected value Decision-Making in Non-Deterministic Games • Probable state tree will depend on chance as well as moves chosen • Add "chance" notes to the max and min nodes. • Compute expected values for chance nodes. Game Trees with Chance Nodes • Chance nodes (shown as circles) represent random events • For a random event with N outcomes, each chance node has N distinct children; a probability is associated with each • (For 2 dice, there are 21 distinct outcomes) • Use minimax to compute values for MAX and MIN nodes • Use expected values for chance nodes • For chance nodes over a max node, as in C: expectimax(C) = ∑i(P(di) * maxvalue(i)) • For chance nodes over a min node: expectimin(C) = ∑i(P(di) * minvalue(i)) 149 Meaning of the evaluation function • Dealing with probabilities and expected values means we have to be careful about the “meaning” of values returned by the static evaluator. • Note that a “relative-order preserving” change of the values would not change the decision of minimax, but could change the decision with chance nodes. • Linear transformations are OK 150 Expected minimax value EXPECTED-MINIMAX-VALUE(n) = UTILITY(n) If n is a terminal maxs successors(n) MINIMAX-VALUE(s) If n is a max node mins successors(n) MINIMAX-VALUE(s) If n is a max node s successors(n) P(s) * EXPECTEDMINIMAX(s) If n is a chance node These equations can be backed-up recursively all the way to the root of the game tree. Position evaluation with chance nodes: • Left, A1 is best • Right, A2 is best 151 • Outcome of evaluation function (hence the agent behavior) may change when values are scaled differently. • Behavior is preserved only by a positive linear transformation of EVAL. 152 153 154 155 Summary • Games illustrate several important points about AI – Perfection is unattainable approximation – Good idea what to think about what to think about – Uncertainty constrains the assignment of values to states • Games : AI as grand prix racing : automobile design Glossary of Terms A constraint satisfaction problem is a problem in which the goal is to choose a value for each of a set of variables, in such a way that the values all obey a set of constraints. A constraint is a restriction on the possible values of two or more variables. For example, a constraint might say that A=a is not allowed in conjunction with B=b. Backtracking search is a form of DFS in which there is a single representation of the state that gets updated for each successor, and then must be restored when a dead end is reached. A directed arc from variable A to variable B in a CSP is arc consistent if, for every value in the current domain of A, there is some consistent value of B. Backjumping is a way of making backtracking search more efficient by jumping back more than one level when a dead end is reached. Min-conflicts is a heuristic for use with local search on CSP problems. The heuristic says that, when given a variable to modify, choose the value that conflicts with the fewest number of other variables. 156 UNIT-3 KNOWLEDGE REPRESENTATION FIRST ORDER LOGIC: Pros and cons of propositional logic: Propositional logic is declarative Propositional logic allows partial/disjunctive/negated information – (unlike most data structures and databases) Propositional logic is compositional: – meaning of B1,1 P1,2 is derived from meaning of B1,1 and of P1,2 Meaning in propositional logic is context-independent – (unlike natural language, where meaning depends on context) Propositional logic has very limited expressive power – (unlike natural language) – E.g., cannot say "pits cause breezes in adjacent squares“ • except by writing one sentence for each square First-order logic • Whereas propositional logic assumes the world contains facts, • first-order logic (like natural language) assumes the world contains – Objects, which are things with individual identities – Properties of objects that distinguish them from other objects – Relations that hold among sets of objects – Functions, which are a subset of relations where there is only one “value” for any given “input” • Examples: – Objects: Students, lectures, companies, cars ... – Relations: Brother-of, bigger-than, outside, part-of, has-color, occurs-after, owns, visits, precedes, ... – Properties: blue, oval, even, large, ... 157 – Functions: father-of, best-friend, second-half, one-more-than ... Logics • Logics are characterized by what they commit to as "primitives". • Ontology • a “specification of a conceptualization” • A description of the objects and relationships that can exist • Propositional logic had only true/false relationships • First-order logic has many more relationships • The ontological commitment of languages is different • How much can you infer from what you know? • Temporal logic defines additional ontological commitments because of timing constraints Higher-order logic First-order logic is “first” because you relate objects (the first-order entities that actually exist in the world) • There are 10 chickens… chickens.number=10 • There are 10 ducks… ducks.number=10 You cannot build relationships between relations or functions • There are as many chickens as ducks… chickens.number = ducks.number • the number of objects belonging to a group must be a property of the group, and not the objects themselves • Cannot represent Leibniz’s law: If x and y share all properties, x is y Another characterization of a logic Epistemological commitments • The possible states of knowledge permitted with respect to each fact – In first-order logic, each sentence is a statement that is True, false, or unknown 158 Formal structure of first-order logic Models of first-order logic contain: • A set of objects (its domain) – Alice, Alice’s left arm, Bob, Bob’s hat • Relationships between objects – Represented as tuples Sibling (Alice, Bob), Sibling (Bob, Alice) On head (Bob, hat) Person (Bob), Person (Alice) – Some relationships are functions if a given object is related to exactly one object in a certain way Alice -> Alice’s left arm Names of things are abitrary • Knowledge base adds meaning Number of possible domain elements is unbounded • Number of models is unbounded – Checking enumeration by entailment is impossible 159 Logic What Exists in World Knowledge States Propositional Facts true/false/unknown First-Order facts, objects, relations true/false/unknown Temporal facts, objects, relations, times true/false/unknown Probability Theory Facts degree of belief 0..1 Fuzzy Facts degree of truth 0..1 SYNTAX AND SEMANTICS OF FOL: Syntax – Rules for constructing legal sentences in the logic – Which symbols we can use (English: letters, punctuation) – How we are allowed to combine symbols Semantics – How we interpret (read) sentences in the logic – Assigns a meaning to each sentence Example: “All lecturers are seven foot tall” – A valid sentence (syntax) – And we can understand the meaning (semantics) – This sentence happens to be false (there is a counterexample) 160 MODELS FOR FOL: Example: 161 • Model: – an interpretation of a set of sentences such that every sentence is True • Domain M: the set of all objects in the world (of interest) • Model contains objects (domain elements) and relations among them SYMBOLS AND INTERPRETATION: interpretations specify meaning of each symbol (intended interpretation) Interpretation specifies referents for constant symbols → objects predicate symbols → relations function symbols → functional relations • Constant symbols, which represent individuals in the world – Mary – 3 – Green • Function symbols, which map individuals to individuals – father-of(Mary) = John – color-of(Sky) = Blue • Predicate symbols, which map individuals to truth values – greater(5,3) – green(Grass) – color(Grass, Green) Syntax of FOL: Basic elements • Constants KingJohn, 2, NUS,... • Predicates Brother, >,... • Functions Sqrt, LeftLegOf,... • Variables x, y, a, b,... • Connectives , , , , 162 • Equality = • Quantifiers , Terms: • A term denotes an object in the world. – Constant: JohnPaxton, 2, Bozeman, … – Variable: x, y, a, b, c, … – Function(Term1, …, Termn): Sqrt(9), Distance(Bozeman,Missoula) • is a relation for which there is one answer • maps one or more objects to another single object • can refer to an unnamed object: e.g. Car(Neal) • represents a model/user defined functional relation • A ground term is a term with no variables. Atomic Sentences: • An atom/literal is smallest expression to which a truth value can be assigned. – Predicate(Term1, …, Termn): Teacher(John, Neal) • maps one or more objects to a truth value • represents a model/user defined truth relation – Term1 = Term2: Car(Neal) = GMC_Truck Car(Neal) = InGarage(Neal) • represents the equality relation when two terms refer to the same object • is a predicate in prefix form: =(Term1, Term2) 163 Complex Sentences: • A sentence represents a fact in the world that is assigned a truth value. • Ex: Brother(Richard, John) Brother(John,Richard) King(Richard) V King(John) King(Richard) => King(John) Quantifiers: Quantifiers qualify values of variables – True for all objects (Universal ): X. likes(X, apples) – Exists at least one object (Existential ): X. likes(X, apples) Universal Quantifiers: Universal quantification is rarely used to make blanket statements about every individual in the world: Universal quantifier: " Means the sentence holds true for all values of x in the domain of variable x. Main connective typically Þ forming if-then rules – All humans are mammals. Becomes what in FOL? 164 "x Human(x) Þ Mammal(x) for all x if x is a human then x is a mammal – Mammals must have fur. Becomes what in FOL? "x Mammal(x) Þ HasFur(x) for all x if x is a mammal then x has fur A common mistake to avoid: • Typically, is the main connective with • Common mistake: using as the main connective with : x At(x,NUS) Smart(x) means “Everyone is at NUS and everyone is smart” Existential Quantifiers: Existential quantifiers are usually used with “and” to specify a list of properties about an individual Existential quantifier: $ – Means the sentence holds true for some value of x in the domain of variable x. – Main connective typically Ù – Some humans are old. $x Human(x) Ù Old(x) there exist an x such that x is a human and x is old – Mammals may have arms. $x Mammal(x) Ù HasArms(x) there exist an x such that x is a mammal and x has arms Another common mistake to avoid: • Typically, is the main connective with • Common mistake: using as the main connective with : x At(x,NUS) Smart(x) is true if there is anyone who is not at NUS! 165 Connections between All and Exists: DeMorgan’s and Quantifiers We can relate sentences involving and using De Morgan’s laws: • Example: no one likes everyone – ~ (x)(y)likes(x,y) – (x)(y)~likes(x,y) Nested Quantifiers; Building more complex sentences – Everybody loves somebody – There is someone who is loved by everyone Use unique variable names and parentheses when appropriate Equality: • term1 = term2 is true under a given interpretation if and only if term1 and term2 refer to the same object • E.g., definition of Sibling in terms of Parent: x,y Sibling(x,y) [(x = y) m,f (m = f) Parent(m,x) Parent(f,x) Parent(m,y) Parent(f,y)] 166 Properties of quantifiers: • x y is the same as y x • x y is the same as y x • x y is not the same as y x • x y Loves(x,y) – “There is a person who loves everyone in the world” • y x Loves(x,y) – “Everyone in the world is loved by at least one person” • Quantifier duality: each can be expressed using the other • x Likes(x,IceCream) x Likes(x,IceCream) • x Likes(x,Broccoli) x Likes(x,Broccoli) USING FIRST ORDER LOGIC: Assertion s and queries in FOL; • Suppose a wumpus-world agent is using an FOL KB and perceives a smell and a breeze (but no glitter) at t=5: Tell(KB,Percept([Smell,Breeze,None],5)) Ask(KB,a BestAction(a,5)) • I.e., does the KB entail some best action at t=5? • Answer: Yes, {a/Shoot} ← substitution (binding list) • Given a sentence S and a substitution σ, • Sσ denotes the result of plugging σ into S; e.g., S = Smarter(x,y) σ = {x/Hillary,y/Bill} Sσ = Smarter(Hillary,Bill) • Ask(KB,S) returns some/all σ such that KB╞ σ 167 The kinship domain: Brothers are siblings x,y Brother(x,y) Sibling(x,y) One's mother is one's female parent m,c Mother(c) = m (Female(m) Parent(m,c)) “Sibling” is symmetric x,y Sibling(x,y) Sibling(y,x) • Axioms are facts and rules which are known (or assumed) to be true facts and concepts about a domain. – Mathematicians don't want any unnecessary (dependent) axioms --ones that can be derived from other axioms. – Dependent axioms can make reasoning faster, however. – Choosing a good set of axioms for a domain is a kind of design problem. • A definition of a predicate is of the form “P(x) <=> S(x)” (define P(x) by S(x)) and can be decomposed into two parts – Necessary description: “P(x) => S(x)” (only if) – Sufficient description “P(x) <= S(x)” (if) – Some concepts don’t have complete definitions (e.g. person(x)) • A theorem S is a sentence that logically follows the axiom set A, i.e. A |= S. The set domain: • s Set(s) (s = {} ) (x,s2 Set(s2) s = {x|s2}) • x,s {x|s} = {} • x,s x s s = {x|s} • x,s x s [ y,s2} (s = {y|s2} (x = y x s2))] • s1,s2 s1 s2 (x x s1 x s2) • s1,s2 (s1 = s2) (s1 s2 s2 s1) • x,s1,s2 x (s1 s2) (x s1 x s2) • x,s1,s2 x (s1 s2) (x s1 x s2) 168 KNOWLEDGE ENGINEERING IN FIRST ORDER LOGIC: 1. Identify the task 2. Assemble the relevant knowledge 3. Decide on a vocabulary of predicates, functions, and constants 4. Encode general knowledge about the domain 5. Encode a description of the specific problem instance 6. Pose queries to the inference procedure and get answers 7. Debug the knowledge base The electronic circuits domain: One-bit full adder 1. Identify the task – Does the circuit actually add properly? (circuit verification) 2. Assemble the relevant knowledge – Composed of wires and gates; Types of gates (AND, OR, XOR, NOT) – Irrelevant: size, shape, color, cost of gates 3. Decide on a vocabulary – Alternatives: Type(X1) = XOR Type(X1, XOR) 169 XOR(X1) 4. Encode general knowledge of the domain – t1,t2 Connected(t1, t2) Signal(t1) = Signal(t2) – t Signal(t) = 1 Signal(t) = 0 – 1 ≠ 0 – t1,t2 Connected(t1, t2) Connected(t2, t1) – g Type(g) = OR Signal(Out(1,g)) = 1 n Signal(In(n,g)) = 1 – g Type(g) = AND Signal(Out(1,g)) = 0 n Signal(In(n,g)) = 0 – g Type(g) = XOR Signal(Out(1,g)) = 1 Signal(In(1,g)) ≠ Signal(In(2,g)) – g Type(g) = NOT Signal(Out(1,g)) ≠ Signal(In(1,g)) 5. Encode the specific problem instance Type(X1) = XOR Type(X2) = XOR Type(A1) = AND Type(A2) = AND Type(O1) = OR Connected(Out(1,X1),In(1,X2)) Connected(In(1,C1),In(1,X1)) Connected(Out(1,X1),In(2,A2)) Connected(In(1,C1),In(1,A1)) Connected(Out(1,A2),In(1,O1)) Connected(In(2,C1),In(2,X1)) Connected(Out(1,A1),In(2,O1)) Connected(In(2,C1),In(2,A1)) Connected(Out(1,X2),Out(1,C1)) Connected(In(3,C1),In(2,X2)) Connected(Out(1,O1),Out(2,C1)) Connected(In(3,C1),In(1,A2)) 6. Pose queries to the inference procedure – What are the possible sets of values of all the terminals for the adder circuit? – i1,i2,i3,o1,o2 Signal(In(1,C_1)) = i1 Signal(In(2,C1)) = i2 Signal(In(3,C1)) = i3 Signal(Out(1,C1)) = o1 Signal(Out(2,C1)) = o2 7. Debug the knowledge base – May have omitted assertions like 1 ≠ 0 170 Inference in First-Order Logic: PROPOSITIONAL VS.FIRST-ORDER INFERENCE • FOL (like propositional logics) is declarative, compositional, and the meaning of its sentences is context-independent • Propositional logic assumes the world contains facts • FOL (like natural language) assumes the world contains (ontological commitment) – Objects: people, houses, numbers, colors, baseball games, wars, … – Relations: red, round, prime, brother of, bigger than, part of, comes between, … – Functions: father of, best friend, one more than, plus, … • FOL can express facts about some or all the objects in the universe • Every sentence in FOL is either true, false or unknown to an agent (epistemological commitment) Inference with Quantifiers: Two Basic Ideas for Inference in FOL 1. Grounding I. Treat first-order sentences as a template. II. Instantiating all variables with all possible constants gives a set of ground propositional clauses. III. Apply efficient propositional solvers, e.g. SAT. 2. Lifted Inference: I. Generalize propositional methods for 1st-order methods. II. Unification: ecognize instances of variables where necessary. Universal instantiation (UI): Notation: Subst({v/g}, α) means the result of substituting g for v in sentence α Every instantiation of a universally quantified sentence is entailed by it: 171 for any variable v and ground term g E.g., x King(x) Greedy(x) Evil(x) yields King(John) Greedy(John) Evil(John), {x/John} King(Richard) Greedy(Richard) Evil(Richard), {x/Richard} King(Father(John)) Greedy(Father(John)) Evil(Father(John)), {x/Father(John)} Existential instantiation (EI): For any sentence α, variable v, and constant symbol k (that does not appear elsewhere in the knowledge base): E.g., x Crown(x) OnHead(x,John) yields: Crown(C1) OnHead(C1,John) where C1 is a new constant symbol, called a Skolem constant Existential and universal instantiation allows to “propositionalize” any FOL sentence or KB EI produces one instantiation per EQ sentence UI produces a whole set of instantiated sentences per UQ sentence Reduction to propositional inference: • Replace each universally quantified sentence by all possible instantiations – All X man(X) mortal(X) replaced by – man(tom) mortal(tom) – man(chocolate) mortal(chocolate) • Problem: when the KB includes a function symbol, the set of term substitutions is infinite. father(father(father(tom))) … • Herbrand 1930: if a sentence is entailed by the original FO KB, then there is a proof using a finite subset of the propositionalized KB Idea: For n = 0 to ∞ do 172 create a propositional KB by instantiating with depth-$n$ terms see if α is entailed by this KB Example x King(x) Greedy(x) Evil(x) Father(x) King(John) Greedy(Richard) Brother(Richard,John) Query Evil(X)? • Since any subset has a maximum depth of nesting in terms, we can find the subset by generating all instantiations with constant symbols, then all with depth 1, and so on Ex: Suppose the KB contains the following: x King(x) Greedy(x) Evil(x) Father(x) King(John) Greedy(John) Brother(Richard,John) Instantiating the universal sentence in all possible ways, we have: King(John) Greedy(John) Evil(John) King(Richard) Greedy(Richard) Evil(Richard) King(John) Greedy(John) Brother(Richard,John) The new KB is propositionalized: propositional symbols are King(John), Greedy(John), Evil(John), King(Richard), etc We have an approach to FO inference via propositionalization that is complete: any entailed sentence can be proved 173 Entailment for FOPC is semi-decidable: algorithms exist that say yes to every entailed sentence, but no algorithm exists that also says no to every nonentailed sentence. Our proof procedure could go on and on, generating more and more deeply nested terms, but we will not know whether it is stuck in a loop, or whether the proof is just about to pop out UNIFICATION AND LIFTING: • Unification: The process of finding all legal substitutions that make logical expressions look identical • This is a recursive algorithm A first order inference rule: • We can infer this fact immediately if we can find a substitution θ such that King(x) and Greedy(x) match King(John) and Greedy(y) • A substitution provides a binding list θ = {x/John,y/John} works Generalized Modus Ponens (GMP) • This is a general inference rule for FOL that does not require instantiation • Given: p1’, p2’ … pn’ (p1 … pn) q Subst(θ, pi’) = subst(θ, pi) for all p • Conclude: Subst(θ, q) Lifting: • GMP “lifts” MP from propositional to first-order logic • Key advantage of lifted inference rules over propositionalization is that they make only substitutions which are required to allow particular inferences to proceed 174 Unification: • Process of finding all legal substitutions • Key component of all FO inference algorithms • Unify(p,q) = theta, where Subst(theta,p) == Subst(theta,q) Assuming all variables universally quantified • We can get the inference immediately if we can find a substitution θ such that King(x) and Greedy(x) match King(John) and Greedy(y) • θ = {x/John, y/John} works • Unify(a,b) = θ if a θ = b θ Standardizing apart: Standardizing apart eliminates overlap of variables, e.g., Knows(z17,OJ)……… • knows(john,X). • knows(X,elizabeth). • These ought to unify, since john knows everyone, and everyone knows elizabeth. • Rename variables to avoid such name clashes Note: all X p(X) == all Y p(Y) All X (p(X) ^ q(X)) == All X p(X) ^ All Y p(Y) Most general unifier (MGU) • There is a single most general unifier (MGU) that is unique up to renaming of variables. MGU = { y/John, x/z } • The Unify algorithm returns a MGU L1 = p(X,f(Y),b) L2 = p(X,f(b),b) Subst1 = {a/X, b/Y} Result1 = p(a,f(b),b) Subst2 = {b/Y} 175 Result2 = p(X,f(b),b) Subst1 is more restrictive than Subst2. In fact, Subst2 is a MGU of L1 and L2. The unification algorithm Storage and Retrieval: Storage(s): stores a sentence s into the knowledge base Fetch(q): returns all unifiers such that the query q unifies with some sentence. Simple naïve method. Keep all facts in knowledge base in one long list and then call unify(q,s) for all sentences to do fetch. Inefficient but works Unification is only attempted on sentence with chance of unification. (knows(john, x) , brother(richard,john)) 176 Predicate indexing If many instances of the same predicate exist (tax authorities employer(x,y)) Also index arguments Keep latice p280 FORWARD CHAINING: • Forward Chaining – Start with atomic sentences in the KB and apply Modus Ponens in the forward direction, adding new atomic sentences, until no further inferences can be made. First order definite clauses: • The law says that it is a crime for an American to sell weapons to hostile nations. The country Nono, an enemy America, has some missiles, and all of its missiles were sold to it by Col. West, who is an American. • Prove that Col. West is a criminal. • …it is a crime for an American to sell weapons to hostile nations American(x) Weapon(y) Sells(x,y,z) Hostile(z) Criminal(x) • Nono…has some missiles x Owns(Nono, x) Missiles(x) Owns(Nono, M1) and Missle(M1) • …all of its missiles were sold to it by Col. West x Missle(x) Owns(Nono, x) Sells( West, x, Nono) • Missiles are weapons Missle(x) Weapon(x) • An enemy of America counts as “hostile” Enemy( x, America ) Hostile(x) • Col. West who is an American American( Col. West ) • The country Nono, an enemy of America Enemy(Nono, America) 177 FC: Example Knowledge Base Properties of forward chaining Sound and complete for first-order definite clauses Datalog = first-order definite clauses + no functions FC terminates for Datalog in finite number of iterations May not terminate in general if α is not entailed 178 Efficient forward chaining: Incremental forward chaining: no need to match a rule on iteration k if a premise wasn't added on iteration k-1 match each rule whose premise contains a newly added positive literal Matching itself can be expensive: Database indexing allows O(1) retrieval of known facts e.g., query Missile(x) retrieves Missile(M1) Forward chaining is widely used in deductive databases • Order conjuncts appropriately – E.g. most constrained variable • Don’t generate redundant facts; each new fact should depend on at least one newly generated fact. – Production systems – RETE matching – CLIPS Diff(wa,nt) Diff(wa,sa) Diff(nt,q) Diff(nt,sa) Diff(q,nsw) Diff(q,sa) Diff(nsw,v) Diff(nsw,sa) Diff(v,sa) Colorable() 179 Diff(Red,Blue) Diff (Red,Green) Diff(Green,Red) Diff(Green,Blue) Diff(Blue,Red) Diff(Blue,Green) Colorable() is inferred iff the CSP has a solution CSPs include 3SAT as a special case, hence matching is NP-hard RETE Network: • Based only on Left Sides (conditions) of rules • Each condition (test) appears once in the network • Tests with “AND” are connected with “JOIN” – Join means all tests work with same bindings • Each time a fact comes in… – Update bindings for the relevant node (s) – Update join(s) below those bindings – Note new rules satisfied • Each processing cycle – Choose a satisfied rule Why RETE is Efficient: • Rules are “pre-compiled” • Facts are dealt with as they come in 180 – Only rules connected to a matching node are considered – Once a test fails, no nodes below are considered – Similar rules share structure • In a typical system, when rules “fire”, new facts are created /deleted incrementally – This incrementally adds /deletes rules (with bindings) to the conflict set CLIPS: • CLIPS is another forward-chaining production system • Important commands – (assert fact) (deffacts fact1 fact2 … ) – (defrule rule-name rule) – (reset) -eliminates all facts except “initial-fact” – (load file) (load-facts file) – (run) – (watch all) – (exit) Incremental forward chaining: no need to match a rule on iteration k if a premise wasn't added on iteration k-1 – match each rule whose premise contains a newly added positive literal – Magic Sets: rewriting the rule set, using information from the goal, so that only relevant bindings are considered during forward chaining Magic_Criminal(x) American(x) Weapon(y) Sells(x,y,z) Hostile(z) Criminal(x) add Magic_Criminal(West) to the KB Matching itself can be expensive – Database indexing allows O(1) retrieval of known facts: for a given fact it is possible to construct indices on all possible queries that unify with it – Subsumption lattice can get very large; the costs of storing and maintaining the indices must not outweigh the cost for facts retrieval. Backward chaining algorithm 181 SUBST(COMPOSE(θ1, θ2), p) = SUBST(θ2, SUBST(θ1, p)) Backward chaining example 182 183 184 Properties of backward chaining • Depth-first recursive proof search: space is linear in size of proof • Incomplete due to infinite loops – fix by checking current goal against every goal on stack • Inefficient due to repeated subgoals (both success and failure) – fix using caching of previous results (extra space) • Widely used for logic programming Logic programming: Prolog • Algorithm = Logic + Control • Basis: backward chaining with Horn clauses + bells & whistles Widely used in Europe, Japan (basis of 5th Generation project) Compilation techniques 60 million LIPS 185 • Program = set of clauses = head :-literal1, … literaln. criminal(X) :-american(X), weapon(Y), sells(X,Y,Z), hostile(Z). • Depth-first, left-to-right backward chaining • Built-in predicates for arithmetic etc., e.g., X is Y*Z+3 • Built-in predicates that have side effects (e.g., input and output • predicates, assert/retract predicates) • Closed-world assumption ("negation as failure") – e.g., given alive(X) :-not dead(X). – alive(joe) succeeds if dead(joe) fails Prolog • Appending two lists to produce a third: append([],Y,Y). append([X|L],Y,[X|Z]) :-append(L,Y,Z). • query: append(A,B,[1,2]) ? • answers: A=[] B=[1,2] A=[1] B=[2] A=[1,2] B=[] Resolution: brief summary • Full first-order version: l1 ··· lk, m1 ··· mn (l1 ··· li-1 li+1 ··· lk m1 ··· mj-1 mj+1 ··· mn)θ where Unify(li, mj) = θ. • The two clauses are assumed to be standardized apart so that they share no variables. • For example, Rich(x) Unhappy(x) Rich(Ken) Unhappy(Ken) with θ = {x/Ken} 186 • Apply resolution steps to CNF(KB α); complete for FOL Conversion to CNF • Everyone who loves all animals is loved by someone: x [y Animal(y) Loves(x,y)] [y Loves(y,x)] • 1. Eliminate biconditionals and implications x [y Animal(y) Loves(x,y)] [y Loves(y,x)] • 2. Move inwards: x p ≡ x p, x p ≡ x p x [y (Animal(y) Loves(x,y))] [y Loves(y,x)] x [y Animal(y) Loves(x,y)] [y Loves(y,x)] x [y Animal(y) Loves(x,y)] [y Loves(y,x)] 3. Standardize variables: each quantifier should use a different one x [y Animal(y) Loves(x,y)] [z Loves(z,x)] 3. Skolemize: a more general form of existential instantiation. Each existential variable is replaced by a Skolem function of the enclosing universally quantified variables: x [Animal(F(x)) Loves(x,F(x))] Loves(G(x),x) 5. Drop universal quantifiers: [Animal(F(x)) Loves(x,F(x))] Loves(G(x),x) 6. Distribute over : [Animal(F(x)) Loves(G(x),x)] [Loves(x,F(x)) Loves(G(x),x)] 187 Resolution proof: definite clauses 188 UNIT-4 LEARNING LEARNING FROM OBSERVATIONS: Introduction: What is learning? Learning denotes changes in the system that are adaptive in the sense that they enable the system to do the same task or tasks drawn from the same population more effectively the next time (Simon, 1983). Learning is making useful changes in our minds (Minsky, 1985). Learning is constructing or modifying representations of what is being experienced (Michalski, 1986). A computer program learns if it improves its performance at some task through experience (Mitchell, 1997). So what is learning? (1) acquire and organize knowledge (by building, modifying and organizing internal representations of some external reality); (2) discover new knowledge and theories (by creating hypotheses that explain some data or phenomena); (3) acquire skills (by gradually improving their motor or cognitive skills through repeated practice, sometimes involving little or no conscious thought). (4) Learning results in changes in the agent (or mind) that improve its competence and/or efficiency. (5) Learning is essential for unknown environments, (1) i.e., when designer lacks omniscience (1) Learning is useful as a system construction method, (2) i.e., expose the agent to reality rather than trying to write it down (1) Learning modifies the agent's decision mechanisms to improve performance 189 FORMS OF LEARNING: Learning agents: • Four Components 1. Performance Element: collection of knowledge and procedures to decide on the next action. E.g. walking, turning, drawing, etc. 2. Learning Element: takes in feedback from the critic and modifies the performance element accordingly. 3. Critic: provides the learning element with information on how well the agent is doing based on a fixed performance standard. E.g. the audience 4. Problem Generator: provides the performance element with suggestions on new actions to take. 190 Components of the Performance Element • A direct mapping from conditions on the current state to actions • Information about the way the world evolves • Information about the results of possible actions the agent can take • Utility information indicating the desirability of world states Learning element • Design of a learning element is affected by – Which components of the performance element are to be learned – What feedback is available to learn these components – What representation is used for the components Type of feedback: – Supervised learning: correct answers for each example – Unsupervised learning: correct answers not given – Reinforcement learning: occasional rewards LEARNING DECISION TREES: • Come up with a set of attributes to describe the object or situation. • Collect a complete set of examples (training set) from which the decision tree can derive a hypothesis to define (answer) the goal predicate. Decision Tree Example: Problem: decide whether to wait for a table at a restaurant, based on the following attributes: 1. Alternate: is there an alternative restaurant nearby? 2. Bar: is there a comfortable bar area to wait in? 3. Fri/Sat: is today Friday or Saturday? 4. Hungry: are we hungry? 5. Patrons: number of people in the restaurant (None, Some, Full) 6. Price: price range ($, $$, $$$) 7. Raining: is it raining outside? 191 8. Reservation: have we made a reservation? 9. Type: kind of restaurant (French, Italian, Thai, Burger) 10. WaitEstimate: estimated waiting time (0-10, 10-30, 30-60, >60) Logical Representation of a Path r [Patrons(r, full) Wait_Estimate(r, 10-30) Hungry(r, yes)] Will_Wait(r) 192 Expressiveness of Decision Trees • Any Boolean function can be written as a decision tree • E.g., for Boolean functions, truth table row → path to leaf: • Trivially, there is a consistent decision tree for any training set with one path to leaf for each example (unless f nondeterministic in x) but it probably won't generalize to new examples • Prefer to find more compact decision trees Limitations – Can only describe one object at a time. – Some functions require an exponentially large decision tree. • E.g. Parity function, majority function • Decision trees are good for some kinds of functions, and bad for others. • There is no one efficient representation for all kinds of functions. Principle Behind the Decision-Tree-Learning Algorithm • Uses a general principle of inductive learning often called Ockham’s razor: “The most likely hypothesis is the simplest one that is consistent with all observations.” • Decision trees can express any function of the input attributes. Decision tree learning Algorithm: • Aim: find a small tree consistent with the training examples • Idea: (recursively) choose "most significant" attribute as root of (sub)tree 193 Choosing an attribute tests: • Idea: a good attribute splits the examples into subsets that are (ideally) "all positive" or "all negative" • Patrons? is a better choice Attribute-based representations • Examples described by attribute values (Boolean, discrete, continuous) • E.g., situations where I will/won't wait for a table: • Classification of examples is positive (T) or negative (F) 194 Using information theory • To implement Choose-Attribute in the DTL algorithm • Information Content (Entropy): I(P(v1), … , P(vn)) = Σi=1 -P(vi) log2 P(vi) • For a training set containing p positive examples and n negative examples: Information gain • A chosen attribute A divides the training set E into subsets E1, … , Ev according to their values for A, where A has v distinct values. • Information Gain (IG) or reduction in entropy from the attribute test: • Choose the attribute with the largest IG npnnpnnppnppnpnnppI22loglog),(viiiiiiiiinpnnppInpnpAremainder1),()()(),()(AremaindernpnnppIAIG195 • For the training set, p = n = 6, I(6/12, 6/12) = 1 bit • Consider the attributes Patrons and Type (and others too): • Patrons has the highest IG of all attributes and so is chosen by the DTL algorithm as the root Assessing the performance of the learning algorithm: • A learning algorithm is good if it produces hypotheses that do a good job of predicating the classifications of unseen examples • Test the algorithm’s prediction performance on a set of new examples, called a test set. Methodology in Accessing Performance 1. Collect a large set of examples. 2. Divide it into 2 disjoint set: the training set and the test set. It is very important that these 2 sets are separate so that the algorithm doesn’t cheat. Usually this division of examples is done randomly. 3. Use the learning algorithm with the training set as examples to generate a hypothesis H. 4. Measure the percentage of examples in the test set that are correctly classified by H. 5. Repeat steps 1 to 4 for different sizes of training sets and different randomly selected training sets of each size. bits 0)]42,42(124)42,42(124)21,21(122)21,21(122[1)(bits 0541.)]64,62(126)0,1(124)1,0(122[1)(IIIITypeIGIIIPatronsIG196 Analyzing the Results • How do we know that h ≈ f ? 1. Use theorems of computational/statistical learning theory 2. Try h on a new test set of examples (use same distribution over example space as training set) Learning curve = % correct on test set as a function of training set size Overfitting: • Overfitting is what happens when a learning algorithm finds meaningless “regularity” in the data. • Caused by irrelevant attributes. • Solution: decision tree pruning. – Resulting decision tree is. • Smaller. • More tolerant to noise. • More accurate in its predictions. 197 Summary • Learning needed for unknown environments, lazy designers • Learning agent = performance element + learning element • For supervised learning, the aim is to find a simple hypothesis approximately consistent with training examples • Decision tree learning using information gain • Learning performance = prediction accuracy measured on test set Ensemble Learning So far – learning methods that learn a single hypothesis, chosen form a hypothesis space that is used to make predictions. Ensemble learning selects a collection (ensemble) of hypotheses and combine their predictions. Example 1 -generate 100 different decision trees from the same or different training set and have them vote on the best classification for a new example. Key motivation: reduce the error rate. Hope is that it will become much more unlikely that the ensemble of will misclassify an example. Learning Ensembles Learn multiple alternative definitions of a concept using different training data or different learning algorithms. Combine decisions of multiple definitions, e.g. using weighted voting. 198 Value of Ensembles “No Free Lunch” Theorem – No single algorithm wins all the time! When combing multiple independent and diverse decisions each of which is at least more accurate than random guessing, random errors cancel each other out, correct decisions are reinforced. Examples: Human ensembles are demonstrably better – How many jelly beans in the jar?: Individual estimates vs. group average. – Who Wants to be a Millionaire: Audience vote. Example: Weather Forecast Intuitions Majority vote Suppose we have 5 completely independent classifiers… – If accuracy is 70% for each • (.75)+5(.74)(.3)+ 10 (.73)(.32) • 83.7% majority vote accuracy – 101 such classifiers 199 • 99.9% majority vote accuracy Note: Binomial Distribution: The probability of observing x heads in a sample of n independent coin tosses, – where in each toss the probability of heads is p, is Another way of thinking about ensemble learning: way of enlarging the hypothesis space, i.e., the ensemble itself is a hypothesis and the new hypothesis space is the set of all possible ensembles constructible form hypotheses of the original space. Increasing power of ensemble learning: Three linear threshold hypothesis (positive examples on the non-shaded side); Ensemble classifies as positive any example classified positively be all three. The resulting triangular region hypothesis is not expressible in the original hypothesis space. Different Learners Different learning algorithms Algorithms with different choice for parameters Data set with different features Data set = different subsets 200 Homogenous Ensembles: Use a single, arbitrary learning algorithm but manipulate training data to make it learn multiple models. – Data1 Data2 … Data m – Learner1 = Learner2 = … = Learner m Different methods for changing training data: – Bagging: Resample training data – Boosting: Reweight training data In WEKA, these are called meta-learners, they take a learning algorithm as an argument (base learner) and create a new learning algorithm. Boosting: Strong and Weak Learners Strong Learner Objective of machine learning – Take labeled data for training – Produce a classifier which can be arbitrarily accurate Weak Learner – Take labeled data for training – Produce a classifier which is more accurate than random guessing Weak Learner: only needs to generate a hypothesis with a training accuracy greater than 0.5, i.e., < 50% error over any distribution Learners – Strong learners are very difficult to construct – Constructing weaker Learners is relatively easy Questions: Can a set of weak learners create a single strong learner ? YES Boost weak classifiers to a strong learner Originally developed by computational learning theorists to guarantee performance improvements on fitting training data for a weak learner that only needs to generate a hypothesis with a training accuracy greater than 0.5 (Schapire, 1990). 201 Revised to be a practical algorithm, AdaBoost, for building ensembles that empirically improves generalization performance (Freund & Shapire, 1996). Key Insights Instead of sampling (as in bagging) re-weigh examples! Examples are given weights. At each iteration, a new hypothesis is learned (weak learner) and the examples are reweighted to focus the system on examples that the most recently learned classifier got wrong. Final classification based on weighted vote of weak classifiers Adaptive Boosting: Each rectangle corresponds to an example, with weight proportional to its height. Crosses correspond to misclassified examples. Size of decision tree indicates the weight of that hypothesis in the final ensemble. Construct Weak Classifiers: Using Different Data Distribution – Start with uniform weighting – During each step of learning • Increase weights of the examples which are not correctly learned by the weak learner • Decrease weights of the examples which are correctly learned by the weak learner 202 Idea – Focus on difficult examples which are not correctly classified in the previous steps Combine Weak Classifiers: Weighted Voting – Construct strong classifier by weighted voting of the weak classifiers Idea – Better weak classifier gets a larger weight – Iteratively add weak classifiers • Increase accuracy of the combined classifier through minimization of a cost function High Level Description: C =0; /* counter*/M = m; /* number of hypotheses to generate*/1 Set same weight for all the examples (typically each example has weight = 1); 2 While (C < M) 2.1 Increase counter C by 1. 2.2 Generate hypothesis hC . 2.3 Increase the weight of the misclassified examples in hypothesis hC 3 Weighted majority combination of all M hypotheses (weights according to how well it performed on the training set). Many variants depending on how to set the weights and how to combine the hypotheses. ADABOOST quite popular!!!! Performance of Adaboost: Learner = Hypothesis = Classifier Weak Learner: < 50% error over any distribution 203 M number of hypothesis in the ensemble. If the input learning is a Weak Learner, then ADABOOST will return a hypothesis that classifies the training data perfectly for a large enough M, boosting the accuracy of the original learning algorithm on the training data. Strong Classifier: thresholded linear combination of weak learner outputs. Restaurant Data 204 A LOGICAL FORMULATION OF LEARNING: – Look at inductive learning generally – Find a logical description that is equivalent to the (unknown) evaluation function • Make our hypothesis more or less specific to match the evaluation function. Hypothesis Space • Consistency of hypotheses with examples • False negative, hypothesis says negative but actually is positive, need to generalize • False positive, hypothesis says positive but actually is negative, need to specialize 205 Current-best-hypothesis Search – Maintain a single hypothesis throughout. – Update the hypothesis to maintain consistency as a new example comes in. • Positive example: an instance of the hypothesis • Negative example: not an instance of the hypothesis • False negative example: the hypothesis predicts it should be a negative example but it is in fact positive • False positive example: should be positive but it is actually negative. Current-best-hypothesis Search Algorithm 1. Pick a random example to define the initial hypothesis 2. For each example, – In case of a false negative: • Generalize the hypothesis to include it 206 – In case of a false positive: • Specialize the hypothesis to exclude it 3. Return the hypothesis How to Generalize a) Replacing Constants with Variables: Object(Animal,Bird) Object (X,Bird) b) Dropping Conjuncts: Object(Animal,Bird) & Feature(Animal,Wings) Object(Animal,Bird) c) Adding Disjuncts: Feature(Animal,Feathers) Feature(Animal,Feathers) v Feature(Animal,Fly) d) Generalizing Terms: Feature(Bird,Wings) Feature(Bird,Primary-Feature) How to Specialize a) Replacing Variables with Constants: Object (X, Bird) Object(Animal, Bird) b) Adding Conjuncts: Object(Animal,Bird) Object(Animal,Bird) & Feature(Animal,Wings) c) Dropping Disjuncts: Feature(Animal,Feathers) v Feature(Animal,Fly) Feature(Animal,Fly) d) Specializing Terms: Feature(Bird,Primary-Feature) Feature(Bird,Wings) What do all these mean? Generalize and Specialize • Must be consistent with all other examples • Non-deterministic – At any point there may be several possible specializations or generalizations that can be applied. Potential Problem of Current-best-hypothesis Search • Extension made not necessarily lead to the simplest hypothesis. • May lead to an unrecoverable situation where no simple modification of the hypothesis is consistent with all of the examples. • The program must backtrack to a previous choice point. 207 Problem of Backtracking • Require large space to store all examples • Need to check all previous instances after each modification of the hypothesis. • Search and check all these previous instances over again after each modification is very expensive Least-commitment search • Incremental approach such that consistency is guaranteed without backtracking • Partial ordering on the hypothesis space – Generalization/specialization – G-set, most general boundary – S-set, most specific boundary Version Space Learning Algorithm • Least-Commitment Search • No backtracking • Key idea: – Maintain the most general and specific hypotheses at any point in learning. Update them as new examples come in. Definitions • Version space: a set of all hypotheses consistent with examples seen so far • Boundary sets: sets of hypotheses defining boundary on which hypotheses are consistent with examples – Most general (G-set) and most specific (S-set) boundary sets Requirement • Usually requires an enormous number of hypotheses to record. • An assumption: a partial ordering (more-specific-than ordering) exists on all of the hypotheses in the space – hierarchical • Boundary sets circumscribing the space of possible hypotheses. – G-set(the most general boundary) – S-set (the most specific boundary) 208 False positive for Si of S-set, Si too general so remove False negative for Si of S-set, Si too specific so replace with immediate generalizations False positive for Gi of G-set, Gi too general so replace with immediate specializations False negative for Gi of G-set, Gi too specific so remove Version Space Learning Algorithm 209 1. Initially, the G-set is True, and the S-set is False 2. For each new example, there are 6 possible cases: a) false positive for Si in S • Si is too general -no consistent specializations. • Throw it out. b) false negative for Si in S • Si is too specific. • Replace it with its generalizations. c) false positive for Gi in G • Gi is too general. • Replace it with its specializations. d) false negative for Gi in G • Gi is too specific -no consistent generalizations. • Throw it out. e) Si more general than some other hypothesis in S or G • Throw it out. f) Gi more specific than some other hypothesis in S or G • Throw it out. 3. Repeat the process until one of three things happens: a) Only one hypothesis left in the version space. • This is the answer we want. b) The version space collapses, i.e. either G or S becomes empty. • This means there are no consistent hypotheses. c) We run out of examples while the version space still has several hypotheses. • Use their collective evaluation (breaking disagreements with majority vote). Advantages of the Algorithm • Never favor one possible hypothesis over another; all remaining hypotheses are consistent • Never require backtracking Potential Problems Does not deal with noise – Not very practical in real-world learning problem – Noise, version space will collapse 210 – How to specify S-set and G-set, if unlimited then S-set is disjunction of examples and G-set is negations S-set and G-set can grow exponentially KNOWLEDGE IN LEARNING: 211 EXPLANATION-BASED LEARNING • Extract general rules from examples • Basic idea – Given an example, construct a proof for the goal predicate that applies using the background knowledge. – In parallel, construct a generalized proof with variabilized goal. – Construct a new rule, LHS with the leaves of the proof tree and RHS with the variabilized goal. – Drop any conditions that are always true regardless of value of variables in the goal. 212 213 Any partial subtree can be use for the extracted general rule, how to choose? Efficiency, Operationality, Generality – Too many rules slows down reasoning – Rules should provide speed increase by eliminating dead-ends and shortening the proof – As general as possible to cover the most cases • Tradeoffs, how to maximize the efficiency of the knowledge base? INDUCTIVE LOGIC PROGRAMMING • Combine inductive methods with FOL • Rigorous approach • Offers complete algorithm • Hypotheses are relatively easy to read • Because FOL, we are learning relationships(predicates) not just attributes of objects. • ILP can generate new predicates, constructive induction algorithm ILP Example • Know Father, Mother, Married, Male, Female, Parent • Might want to learn Grandparent or Ancestor – Grandparent(Mum, Charles) – ¬Grandparent(Mum, Harry) Top-down Inductive Learning: 214 ILP with Inverse Resolution • “run the proof backward” • Instead of C1 and C2 resolve to C • Take C and produce C1 and C2 such that they resolve together. • Alternatively take C and C1 and produce C2 Inverse Resolution Example 215 If Classifications follow from B^H^D, then we can prove this by resolution with refutation (completeness). If we run the proof backwards, we can find a H such that the proof goes through. C -> C1 and C2 C and C2 -> C1 Generating inverse proofs A family tree example (Fig 19.13) Inverse resolution involves search Each inverse resolution step is nondeterministic For any C and C1, there can be many C2 Discovering new knowledge with IR It’s not easy -a monkey and a typewriter Discovering new predicates with IR Fig 19.14 The ability to use background knowledge provides significant advantages Top-down learning (FOIL): A generalization of DT induction to the first-order case by the same author of C4.5 Starting with a general rule and specialize it to fit data 216 Now we use first-order literals instead of attributes, and H is a set of clauses instead of a decision tree. Example: =>grandfather(x,y) (page 701) positive and negative examples adding literals one at a time to the left-hand side e.g., Father (x,y) => Grandfather(x,y) How to choose literal? (Algorithm on page 702) the rule should agree with some + examples, none of – examples FOIL removes the covered + examples, repeats 217 UNIT-5 APPLICATION Communication and Language Communication Communication is the intentional exchange of information brought about by the production and perception of signs drawn from a shared system of conventional signs. Most animals use signs to represent important messages: food here, predator nearby etc. In a partially observable world, communication can help agents be successful because they can learn information that is observed or inferred by others. Communication as Action: Speech Act: The action available to an agent to produce language includes talking, typing, sign language, etc. Speaker -An agent that produces a speech act Hearer -An agent that receives a speech act Speaker, hearer, and utterance are generic terms referring to any mode of communication. Why? To change the actions of other agents 218 Speech acts The term word is used to refer to any kind of conventional communicative sign. The capabilities gained by an agent from speech act Why would agents choose to perform a speech act? To be able to: • Inform, Query, Answer, Request or Command, Promise, Acknowledge and Share Transferring Information to Hearer: Inform: each other about the part of the world each has explored, so other agent has less exploring to do. Ex. There’s a breeze in 3 4. Answer: questions. This is a kind of informing. Ex. Yes, I smelled the wumpus in 2 3. Acknowledge: requests and offers. Ex. Okay. Share: feelings and experiences with each other. Ex. You know, that wumpus sure needs deodorant. Make the Hearer take some action: Promise: to do things or offer deals. Ex. I’ll shoot the wumpus if you let me share the gold. 219 Query: other agents about particular aspects of the world. Ex. Have you smelled the wumpus anywhere? Request or Command: other agents to perform actions. It can be seen as impolite to make direct requests, so often an indirect speech act (a request in the form of a statement or question) is used instead. Ex. I could use some help carrying this or Could you please help me carry this? Difficulties with Communication Speaking: When is a speech act called for? Which speech act, out of all the possibilities is the right one? Nondeterminism Understanding: Given ambiguous inputs, what state of the world could have created these inputs? FUNDAMENTALS OF LANGUAGE A formal language is defined as a (possibly infinite) set of' strings. Each string is a concatenation of terminal symbols, sometimes called words. For example, Lisp, C++, first order logic, etc. Formal languages such as first-order logic and Java have strict mathematical definitions. Formal languages always have an official grammar, specified in manuals or books. 220 Natural Languages: Languages that humans use to talk to one another. Ex: Chinese, Danish, English, etc. Natural languages have no official grammar, but linguists strive to discover properties of the language by a process of scientific inquiry and then to codify their discoveries in a grammar. A grammar is a finite set of rules that specifies a language. The component steps of communication A typical communication episode, in which speaker S wants to inform hearer H about proposition P using words W, is composed of seven processes: 1) Intention. Somehow, speaker S decides that there is some proposition P that is worthsaying to hearer H. For our example, the speaker has the intention of having the hearer know that the wumpus is no longer alive. 2) Generation. The speaker plans how to turn the proposition P into an utterance that makes it likely that the hearer, upon perceiving the utterance in the current situation, can infer the meaning P (or something close to it). Assume that the speaker is able to come up with the words "The wumpus is dead," and call this W. 3) Synthesis. The speaker produces the physical realization W' of the words W. Thiscan be via ink on paper, vibrations in air, or some other medium. we show the agent synthesizing a string of sounds W' written in the phonetic alphabet "[thaxwahmpaxsihzdehd]." The words are run together; this is typical of quickly spoken speech. 4) Perception. H perceives the physical realization W' as Wi and decodes it as the wordsW2. When the medium is speech, the perception step is called speech recognition; when it is printing, it is called optical character recognition. 221 5) Analysis. H infers that W2 has possible meanings PI, . . . , P,. We divide analysis into three main parts: a) syntactic interpretation (or parsing), b) Semantic interpretation, and c)Pragmatic interpretation. Parsing is the process of building a parse tree for an input string, as shown in Figure . 222 The interior nodes of the parse tree represent phrases and the leaf nodes represent words. Semantic interpretation is the process of extracting the meaning of an utterance as an expression in some representation language. Figure 22.1 shows two possible semantic interpretations: that the wumpus is not alive and that it is tired (a colloquial meaning of dead). Utterances with several possible interpretations are said to be ambiguous. Pragmatic interpretation takes into account the fact that the same words can have different meanings in different situations. 6) Disambiguation. H infers that S intended to convey P, (where ideally P, = P). Most speakers are not intentionally ambiguous, but most utterances have several feasible interpretations.. Analysis generates possible interpretations; if more than one Interpretation is found, then disambiguation chooses the one that is best. 223 7) Incorporation. H decides to believe P, (or not). A totally naive agent might believe everything it hears, but a sophisticated agent treats the speech act as evidence for P,, not confirmation of it. Putting it all together, we get the agent program shown in Figure 22.2. Here the agent acts as a robot slave that can be commanded by a master. On each turn, the slave will answer a question or obey a command if the master has made one, and it will believe any statements made by the master. It will also comment (once) on the current situation if it has nothing more pressing to do, and it will plan its own action if left alone. Here is a typical dialog: ROBOT SLAVE MASTER I feel a breeze. Go to 12. Nothing is here. Go north. I feel a breeze and I smell a stench and I see a glitter. Grab the gold. Fig 22.1 shows the seven processes involved in communication,using the example sentence “The wumpus is dead”. A FORMAL GRAMMAR FOR A FRAGMENT OF ENGLISH:224 The next step is to combine the words into phrases. We will use five nonterminal symbols to define the different kinds of phrases: sentence (S) noun phrase (NP) verb phrase (VP) prepositional phrase (PP) relative clause.' 225 The following figure shows a grammar for ἐo , with an example for each rewrite rule. ἐo generates good English sentences such as the following: John is in the pit The wumpus that stinks is in 2 2 Mary is in Boston and john stinks. SYNTACTIC ANALYSING (PARSING) Parsing is defined as the process of finding a parse tree for a given input string. That is, a call to the parsing function PARSE, such as should return a parse tree with root S whose leaves are "the wumpus is dead" and whose internal nodes are nonterminal symbols from the grammar Eo. Parsing can be seen as a process of searching for a parse tree. 226 There are two extreme ways of specifying the search space (and many variants in between).or Parsing Algorithms: There are many algorithms for parsing Top-down parsing • Starting with an S and expanding accordingly Bottom-up parsing start with the words and search for a tree with root S Combination of top-down and bottom-up Dynamic programming techniques • Avoids inefficiencies of backtracking Top-down parsing: It can be precisely defined as a search problem as follows: The initial state is a parse tree consisting of the root S and unknown children: [S: ?I. In general, each state in the search space is a parse tree. The successor function selects the leftmost node in the tree with unknown children. o It then looks in the grammar for rules that have the root label of the node on the left-hand side. o For each such rule, it creates a successor state where the ? is replaced by a list corresponding to the right-hand side of the rule. The Goal test checks that the leaves of the parse tree correspond exactly to the input string, with no unknowns and no uncovered inputs. Bottom-up Parsing: The bottom-up parsing as a search problem. The formulation of bottom-up parsing as a search is as follows: The initial state is a list of the words in the input string, each viewed as a parse tree that is just a single leaf node-for example; [the, wumlpus, is, dead]. In general, each state in the search space is a list of parse trees. The successor function looks at every position i in the list of trees and at every right and side of a rule in the grammar. If the subsequence of the list of trees starting at i matches the right-hand side, then the subsequence is replaced by a new tree whose category is the left-227 hand side of the rule and whose children are the subsequence. By "matches," we mean that the category of the node is the same as the element in the right hand side. For example, the rule Article + the matches the subsequence consisting of the first node in the list [the, wumpus, is, dead], so a successor state would be [[Article:the], wumpus, is, dead]. The goal test checks for a state consisting of a single tree with root S. Bottom-up Parse (example) Bottom-Up parsing algorithm: function BOTTOM-UP-PARSE(words, grammar) returns a parse tree forest ¬ words loop do if LENGTH(forest) = 1 and CATEGORY(forest[1]) = START(grammar) then return forest[1] else i ¬ choose from {1…LENGTH(forest)} rule ¬ choose from RULES(grammar) n ¬ LENGTH(RULE-RHS(rule)) subsequence ¬ SUBSEQUENCE(forest, i, i+n-1) if MATCH(subsequence,RULE-RHS(rule)) then forest[i…i+n-1] ¬ [MAKE-NODE(RULE-LHS(rule) , subsequence)] else fail end EFFICIENT PARSING: Forward Chaining on graph search problem is an example of dynamic programming. Solutions to the sub problems are constructed incrementally from those of smaller sub problems and are cached to avoid recomputation. 228 Augmented Grammars: Definite Clause Grammar (DCG) Problems with Backus-Naur Form (BNF) Need meaning Context sensitive Introduction of First Order Logic BNF First Order Logic S ® NP VP NP(s1) /\ VP(s2) Þ S(Append(s1 ,s2)) Noun ® stench | … (s=“stench” \/…) Þ Noun(s) DCG Notation Positive: Easy to describe grammars Negative: More verbose than BNF 3 Rules: • The notation X ® Y Z … translate as Y(s1) /\ Z(s2)…Þ X(Append(s1, s2,…). • The notation X ® word translates as X(*“word”+). • The notation X ® Y | Z | … translates as Y’(s) \/Z’(s) \/…Þ X(s), where Y’ is the translation into logic of the DCG expression Y. Extending the Notation Non-terminal symbols can be augmented A variable can appear on RHS An arbitrary logical test can appear on RHS DCG First Order Logic 229 Digit(sem) ® sem { 0 £ sem £ 9 } Number(sem) ® Digit(sem) Number(sem) ® Number(sem1) Digit(sem2) {sem = 10 ´ sem1 + sem2} (s=[sem]) Þ Digit(sem , s) Digit(sem , s) Þ Number(sem , s) Number(sem , s1) /\ Digit(sem , s2) /\ sem = 10 ´ sem1 + sem2 Þ Number(sem , Append(s1 , s2) Verb Subcategorization Now have slight improvement Must create a sub-categorization list Verb Subcats Example give [NP , PP] [NP , NP] give the gold in 3,3 to me give me the gold smell [NP] [Adjective] [PP] smell a wumpus smell awful smell like a wumpus is [Adjective] [PP] [NP] Is smelly is in 2 2 is a pit believe [S] Believe the smelly wumpus in 2 2 is dead Generative Capacity of Augmented Grammars. Parse Tree 230 SEMANTIC INTERPRETATION Semantic Interpretation: Responsible for combining meanings compositionally to get a set of possible interpretations Formal Languages Compositional Semantics: The semantics of any phrase is a function of its subphrases • X + Y We can handle an infinite grammar with a finite set of rules Natural Languages Appears to have a noncompositional semantics • “The batter hit the ball” Semantic interpretation alone cannot be certain of the right interpretation of a phrase or sentence Semantics as DCG Augmentation The same idea used to specify the semantics of numbers and digits can applied to the complete language of mathematics 231 Exp(sem) –> Exp(sem1) Operator(op) Exp(sem2) {sem = Apply(op, sem1, sem2)} Exp(sem) –> ( Exp(sem) ) Exp(sem) –> Number(sem) Digit(sem) –> sem { 0 £ sem £ 9 } Number(sem) –> Digit(sem) Number(sem) –> Number(sem1) Digit(sem2) { sem = 10 × sem1 + sem2 } Operator(sem) –> sem { sem Î {+,–,×,÷}} The Semantics of E1 Semantic structure is very different from syntactic structure. We use an intermediate form called a quasi-logical form which uses a new construction which we will call a quantified term. “every agent” -> [" a Agent(a)] “Every agent smells a wumpus” $ e (e Î Perceive([" a Agent(a)], [$ w Wumpus(w)],Nose) Ù During(Now, e)) Pragmatic Interpretation Through semantic interpretation, an agent can perceive a string of words and use a grammar to derive a set of possible semantic interpretations. Now we address the problem of completing the interpretation by adding information about the current situation Information which is noncompositional and context-dependant Indexicals: Phrases that refer directly to the current situation “I am in Boston today.” Anaphora: Phrases referring to objects that have been mentioned previously “John was hungry. He entered a restaurant.” “After John proposed to Marsha, they found a preacher and got married. For the honeymoon, they went to Hawaii.” 232 Deciding which reference is the right one is a part of disambiguation. Ambiguity and Disambiguation The biggest problem in communicative exchange is that most utterances are ambiguous. Squad helps dog bite victim. Red-hot star to wed astronomer. Helicopter powered by human flies. Once-sagging cloth diaper industry saved by full dumps. Ambiguity Lexical Ambiguity a word has more than one meaning Syntactic Ambiguity (Structural Ambiguity) more than one possible parse for the phrase Semantic Ambiguity follows from lexical or syntactic ambiguity Referential Ambiguity semantic ambiguity caused by anaphoric expressions Pragmatic Ambiguity Speaker and hearer disagree on what the current situation is. Local Ambiguity A substring can be parsed several ways. Vagueness Natural languages are also vague • “It’s hot outside.” 233 Disambiguation Disambiguation is a question of diagnosis. Models of the world are used to provide possible interpretations of a speech act. Models of the speaker Models of the hearer It is difficult to pick the right interpretation because there may be several right ones. In general, disambiguation requires the combination of four models: the world model the mental model the language model the acoustic model Natural language often uses deliberate ambiguity. Most language understanding programs ignore this possibility Context free grammars do not provide a very useful language model. Probabilistic context-free grammars (PCFG’s) each rewrite rule has a probability associated with it S –> NP VP (0.9) S –> S Conjunction S (0.1) Discourse Understanding: A discourse is any string of language-usually one that is more than one sentence long. Textbooks, novels, weather reports and conversations are all discourses. So far we have largely ignored the problems of discourse, preferring to dissect language into individual sentences that can be studied in vitro. 234 We will look at two particular subproblems: reference resolution coherence. Reference resolution Reference resolution is the interpretation of a pronoun or a definite noun phrase that refers to an object in the world. The resolution is based on knowledge of the world and of the previous parts of the discourse. Consider the passage "John flagged down the waiter. He ordered a hani sandwich." To understand that "he" in the second sentence refers to John, we need to have understood that the first sentence mentions two people and that John is playing the role of a customer and hence is likely to order, whereas the waiter is not. The structure of coherent discourse If you open up this book to 10 random pages, and copy down the first sentence from each page. The result is bound to be incoherent. Similarly, if you take a coherent 10-sentence passage and permute the sentences, the result is incoherent. This demonstrates that sentences in natural language discourse are quite different from sentences in logic. In logic, if we TELL sentences A, B and C to a knowledge base, in any order, we end up with the conjunction A A B A C. In natural language, sentence order matters; consider the difference between "Go two blocks. Turn right." and "Turn right. Go two blocks." Grammar Induction: Grammar induction is the task of learning a grammar from data. It is an obvious task to attempt, given that it has proven to be so difficult to construct a grammar by hand and that billions of example utterances are available for free on the Internet. It is a difficult task because the space of possible grammars is infinite and because verifying that a given grammar generates a set of sentences is computationally expensive. Grammar induction can learn a grammar from examples, although there are limitations on how well the grammar will generalize. 235 PERCEPTION: Contents • Perception • Image Formation • Image Processing • Computer Vision • Representation and Description • Object Recognition Perception • Perception provides an agent with information about the world they inhabit – Provided by sensors • Anything that can record some aspect of the environment and pass it as input to a program – Simple 1 bit sensors…Complex human retina • There are basically two approaches for perception – Feature Extraction • Detect some small number of features in sensory input and pass them to their agent program • Agent program will combine features with other information • “bottom up” – Model Based • Sensory stimulus is used to reconstruct a model of the world • Start with a function that maps from a state of the world to a stimulus • “top down” • S = g(W) – Generating S from g and a real or imaginary world W is accomplished by computer graphics 236 • W = g-1(S) – Computer vision is in some sense the inverse of computer graphics • But not a proper inverse… – We cannot see around corners and thus we cannot recover all aspects of the world from a stimulus • In reality, both feature extraction and model-based approaches are needed – Not well understood how to combine these approaches – Knowledge representation of the model is the problem A Roadmap of Computer Vision 237 Computer Vision Systems Image Formation • An image is a rectangular grid of data of light values – Commonly known as pixels – Pixel values can be… 238 – Binary – Gray scale – Color – Multimodal • Many different wavelengths (IR, UV, SAR, etc) 239 240 • I(x,y,t) is the intensity at (x,y) at time t • CCD camera has approximately 1,000,000 pixels • Human eyes have approximately 240,000,000 “pixels” – i.e. 0.25 terabits /second • Read pages 865-869 in textbook “lightly” 241 Image Processing • Image processing operations often apply a function to an image and the result is another image – “Enhance the image” in some fashion – Smoothing – Histogram equalization – Edge detection • Image processing operations can be done in either the spatial domain or the frequency domain 242 243 • Image data can be represented in a spatial domain or a frequency domain • The transformation from the spatial domain to the frequency domain is accomplished by the Fourier Transform • By transforming image data to the frequency domain, it is often less computationally demanding to perform image processing operations 244 245 • Low Pass Filter 246 – Allows low frequencies to pass • High Pass Filter – Allows high frequencies to pass • Band Pass Filter – Allows frequencies in a given range to pass • Notch Filter – Suppresses frequencies in a range (attenuate) • High frequencies are more noisy – Similar to the “salt and pepper” fleck on a TV – Use a low pass filter to remove the high frequencies from an image – Convert image back to spatial domain – Result is a “smoothed image” 247 • Image enhancement can be done with high pass filters and amplifying the filter function – Sharper edges 248 • Transforming images to the frequency domain was (and is still) done to improve computational efficiency – Filters were just like addition and subtraction • Now computers are so fast that filter functions can be done in the spatial domain – Convolution • Convolution is the spatial equivalent to filtering in the frequency domain – More computation involved 249 • By changing the size and the values in the convolution window different filter functions can be obtained • AFTER PERFORMING IMAGE ENHANCEMENT, THE NEXT STEP IS USUALLY TO DETECT EDGES IN THE IMAGE – EDGE DETECTION – USE THE CONVOLUTION ALGORITHM WITH EDGE DETECTION FILTERS TO FIND VERTICAL AND HORIZONTAL EDGES COMPUTER VISION • ONCE EDGES ARE DETECTED, WE CAN USE THEM TO DO STEREOSCOPIC PROCESSING, DETECT MOTION, OR RECOGNIZE OBJECTS • SEGMENTATION IS THE PROCESS OF BREAKING AN IMAGE INTO GROUPS, BASED ON SIMILARITIES OF THE PIXELS 250 251 252 253 COMPUTER VISION 254 255 REPRESENTATION AND DESCRIPTION 256 257 258 COMPUTER VISION • CONTOUR TRACING • CONNECTED COMPONENT ANALYSIS – WHEN CAN WE SAY THAT 2 PIXELS ARE NEIGHBORS? – IN GENERAL, A CONNECTED COMPONENT IS A SET OF BLACK PIXELS, P, SUCH THAT FOR EVERY PAIR OF PIXELS PI AND PJ IN P, THERE EXISTS A SEQUENCE OF PIXELS PI, ..., PJ SUCH THAT: • ALL PIXELS IN THE SEQUENCE ARE IN THE SET P I.E. ARE BLACK, AND • EVERY 2 PIXELS THAT ARE ADJACENT IN THE SEQUENCE ARE "NEIGHBORS" 259 REPRESENTATION AND DESCRIPTION • TOPOLOGICAL DESCRIPTORS – “RUBBER SHEET DISTORTION” • DONUT AND COFFEE CUP – NUMBER OF HOLES – NUMBER OF CONNECTED COMPONENTS – EULER NUMBER • E = C -H 260 • EULER FORMULA W – Q + F = C – H • W IS NUMBER OF VERTICES • Q IS NUMBER OF EDGES • F IS NUMBER OF FACES • C IS NUMBER OF COMPONENTS • H IS NUMBER OF HOLES 7 – 11 + 2 = 1 – 3 = -2 261 OBJECT RECOGNITION • L-JUNCTION – A VERTEX DEFINED BY ONLY TWO LINES…THE ENDPOINTS TOUCH • Y-JUNCTION – A THREE LINE VERTEX WHERE THE ANGLE BETWEEN EACH OF THE LINES AND THE OTHERS IS LESS THAN 180O • W-JUNCTION – A THREE LINE VERTEX WHERE ONE OF THE ANGLES BETWEEN ADJACENT LINE PAIRS IS GREATER THAN 180O • T-JUNCTION – A THREE LINE VERTEX WHERE ONE OF THE ANGLES IS EXACTLY 180O • AN OCCLUDING EDGE IS MARKED WITH AN ARROW, – HIDES PART FROM VIEW • A CONVEX EDGE IS MARKED WITH A PLUS, + 262 – POINTING TOWARDS VIEWER • A CONCAVE EDGE IS MARKED WITH A MINUS, -– POINTING AWAY FROM THE VIEWER 263 264 • SHAPE CONTEXT MATCHING – BASIC IDEA: CONVERT SHAPE (A RELATIONAL CONCEPT) INTO A FIXED SET OF ATTRIBUTES USING THE SPATIAL CONTEXT OF EACH OF A FIXED SET OF POINTS ON THE SURFACE OF THE SHAPE. 265 • EACH POINT IS DESCRIBED BY ITS LOCAL CONTEXT HISTOGRAM – (NUMBER OF POINTS FALLING INTO EACH LOG-POLAR GRID BIN) • DETERMINE TOTAL DISTANCE BETWEEN SHAPES BY SUM OF DISTANCES FOR CORRESPONDING POINTS UNDER BEST MATCHING 266 SUMMARY • COMPUTER VISION IS HARD!!! – NOISE, AMBIGUITY, COMPLEXITY • PRIOR KNOWLEDGE IS ESSENTIAL TO CONSTRAIN THE PROBLEM • NEED TO COMBINE MULTIPLE CUES: MOTION, CONTOUR, SHADING, TEXTURE, STEREO • “LIBRARY" OBJECT REPRESENTATION: SHAPE VS. ASPECTS • IMAGE/OBJECT MATCHING: FEATURES, LINES, REGIONS, ETC. 267 ARTIFICIAL INTELLIGENCE QUESTION BANK UNIT-1 TWO MARKS: 1. Define AI? 2. What is meant by Turing Test? 3. What are the capabilities the computer would have during Turing Test? 4. Define Agent? 5. Define Rational Agent? 6. Define the following terms a) Percept b) Percept Sequence c) Agent Function d) Agent Program 7. Define Task Environment? 8. Describe PEAS with automated taxi driver? 9. List out the properties of task environment? 10. List out the four basic kinds of agent program? 11. What are the four components available in Learning Agent? 12. Define Problem Solving Agent? 13. List out the steps used to maximize the performance measure? 14. Define Search? 15. Define the term Problem? List out the component available in problem? 16. Define State space? 17. Define Path cost? 18. Define path? 19. List out the real world problem? 20. Define Blind Search? 21. Differentiate Uniformed Search and Heuristic Search? 22. Define BFS, DFS, Backtracking search, DLS, IDS, Iterative lengthening search, Bidirectional search? 23. What are the requirements by a computer to pass a Turing test? 24. Describe any two search strategies? 25. What is the role of agent program? 26. What are the advantages of one infer when machines perform intelligently? 27. Why does human being want machines to perform very similar to them? 28. Name two areas where machines cant excel human beings? 29. List some metrics to measure the success of an intelligent program? 268 30. Give the PEAS description of an “Interactive English Tutor” System? 31. Why problem formulation must follow the goal formulation? 32. Discuss the characteristics of AI Technique? 33. What are the advantages of Depth First Search? 34. What kinds of techniques will be useful for solving AI problems? 35. How will you know when you have succeed in building an intelligent program? 36. Give an example of a problem in which breadth first search would be better than DFS? 37. Define Fringe or Frontier. 38. Differentiate DFS with BFS? Big Question: 1. Explain different type’s agent with an example? 2. Explain briefly about the environment programs. Give Suitable for the same. 3. Define and solve the following problems a) Water jug problems b) Missionaries and cannibals problem 4. Define the Vacuum world problem and draw the corresponding state set space representations. 5. Explain briefly about Blind Search techniques with an example for each search. 6. Explain in detail the wumpus world environment. Discuss the model based agent and goal based agent? 7. Explain in detail the History of AI? 8. Write the algorithm for Depth First search and Breadth first search techniques? 9. What are the various domains of AI? 10. What are the requirements of intelligent agents? 11. Describe in brief the Depth First Search. 12. What is an ideal rational agent? 13. Name at least 5 agent types with percepts, actions and goals with environment. 14. Give the structure of agent with goal? 15. What is real world problem? How to formulate a concise problem out of it for solving a RWP? 16. Can robot replace human beings? Justify your answer? 17. With reference to understanding, Learning, Testing and performance during actual test, how do intelligent machines perform on encountering strange situations? Compare the performance of a man and a machine under normal situation and abnormal conditions. 18. What is meant by state space? 19. What is meant by PEAS? 20. List out few agent types and describe their PEAS? 21. Explain the structure of agent with neat diagram? 22. What are the four basic agent types of agent program in any intelligent system? Explain how did you convert them into learning agents? 23. Explain the following uninformed search strategies with example? I. BFS II. DFS 269 III. DLS 24. Explain the following uninformed search strategies I. DFS II. Iterative deepening DFS III. Bidirectional Search 25. Give an example of a problem for which breadth first search would work better than Depth First Search? 26. Implement Depth First Iterative Technique to the water jug problem? 27. Describe the DFS and BFS techniques. Explain with examples as to what type of problem each one of these techniques will be preferred? 28. Consider the DFS and BFS search methods. Write a short description to two distinct search problems each of which would solve by one of these methods and indicates the features of the problems and search method which led to your choice? UNIT-2 Two marks: 1. Define Heuristic Search? 2. What is the power of Heuristic Search? 3. When does one go for Heuristic Search? 4. What is meant by Heuristic Search techniques? 5. What is Local minima problem? 6. How does Alpha Beta pruning technique works? 7. What is the use of online search agents in unknown environment? 8. Specify the complexity of Expect minimax? 9. What is the use of Heuristic Functions? 10. How to improve the effectiveness of a search based problem solving technique? 11. What is constraint satisfaction problem? 12. What is Heuristic? 13. Define Admissibility? 14. What is Alpha Beta pruning? 15. Why the name minimax procedure? 16. Explain the concept of waiting Quiescence? 17. Is the minimax procedure a Depth first or Breadth first search procedure? 18. Differentiate Greedy Search with A* search? 19. List out the behavior of A* search? 20. Give example of for effective branching factor? 21. Define Relaxed Problem? 22. Give two examples of memory bounded search? 23. Could an agent learn how to search better? 24. Define Local Search algorithm? 25. List out the types of Local Search algorithm? 26. Write the drawbacks of Hill Climbing? 270 27. Define Simulated Annealing? 28. Briefly explain Genetic algorithm? 29. Differentiate online search with offline search? 30. Define Competive Ratio? 31. Define CSP. 32. Define Backtracking Search. 33. Define the following terms a) Chronological backtracking b) Back jumping 34. Define Degree Heuristic 35. What are the special constraints available in backtracking search? 36. Define Pruning? 37. Why study games? 38. Briefly discuss about Adversarial Search? 39. Write about Games Vs Search problem? Big Question: 1. What is heuristic search? How does it simplify learning or Knowledge representation? 2. Explain perfect decision in Game playing. Give an example? 3. Write the minimax algorithm and how does it works for the game of tic-tac-toe. 4. How does Hill Climbing ensure greedy local search? What are the problems of hill climbing? 5. How does Genetic algorithm come out with optimal solutions? 6. Write A* algorithm. Explain the processing of A* algorithm with a n example search tree with Heuristic values? 7. Explain the logic of Min-Max algorithm? 8. What are Constraint Satisfaction problem? How can you formulate them as a search problem? 9. Discuss the various issues associated with backtracking search CSPs .How they are addressed? 10. Explain the following local search strategies with example? a) Hill Climbing b) Genetic Algorithm c) Simulated Annealing d) Local Beam Search 11. Describe A* Search and give the proof of optimality of A*? 12. Give the algorithm for solving constraint satisfaction problem by local search? 13. Explain Min-Max algorithm and Alpha-Beta Pruning? 14. Explain the algorithm for steepest hill climbing? 15. Explain Min-Max Procedure? 16. Describe Alpha-Beta Pruning and give the other modifications to the minimax procedure to improve its performance? 17. Implement the Alpha-Beta seach procedure. Use it to play a Simple game such as tic-tac-toe? 18. What is Heuristic Search? Explain the 8-puzzle problem as a heuristic search algorithm? 271 19. Why does the search in game playing programs always proceed forward from the current position rather than backward from goal state? 20. A problem solving search can proceed either forward or backward .What factors determine the choice of direction of a particular problem? 21. What do you mean by Dependency Directed Backtracking? 22. Explain how the minimax algorithm is modified using alpha-beta pruning? 23. Explain the real time A* algorithm? UNIT-3 TWO MARKS: 1. Define First Order Logic. 2. Compare forward chaining and backward chaining? 3. What does ‘Description logics’ mean with reference to knowledge representation? 4. What do you understand by First Order Logic? 5. When does one go for forward chaining and backward chaining? 6. What is the requirement to develop knowledge base in FOL? 7. Draw the semantic net for the statement.”The Dog bit the mail carrier” 8. Explain modus ponen and And – elimination inference rules? 9. Write the semantics of the universal and Existential quantifiers? 10. How Tell and Ask are used in FOL? 11. What is unification algorithm? 12. Define Horn Clause? 13. Write the two functions of KB agent? 14. Define ground term, Inference logic, entailment? 15. State the advantage of generalized modus ponen rule? 16. List the names of inference rules with quantifiers? 17. How to convert the given sentence into CNF and INF form? 18. Write the short notes of canonical form. 19. Differentiate proposition logic with FOL? 20. Write the short notes of Resolution Strategies? Big Question: 1. Write the forward chaining and backward chaining algorithm and explain their use by taking simple examples? 2. Write the Unification algorithm and explain its working by taking suitable examples? 3. Discuss a backward chaining algorithm and its strength? 4. List out the conditions that are to be met to apply forward chaining. What are the steps involved in forward chaining? 5. How is First Order Logic different from higher order logic? 272 6. State additional inference rules for First Order Logic. How those can be exploited to do a proof comment on it. 7. What are the issues to be addressed in knowledge representation? 8. What are the steps involved in representing in knowledge using FOL? 9. What do you understand by symbols, interpretation and quantifiers? 10. Discuss unification and reduction to propositional inference? 11. Represent the following sentence using FOL a) Cats, dogs and horses are mammals. b) Every mammal has a parent. c) An offspring of a cat is a cat. d) Silky black is a cat. e) Silky black is a Browny’s parent. f) Offspring and parent are inverse relation. 12. Answer for the following queries a) Browny is acat? b) Silky black’s offspring is Browny? 13. Explain the steps in knowledge engineering? 14. Explain the various steps associated with knowledge engineeering process? Discuss them by applying the steps to any real world application of your choices? 15. Illustrate the use of First order logic to represent the knowledge. 16. Consider the following sentences: John likes all kinds of food. Apples are food. Chicken are food. Anything anyone eats and isn’t killed by is food. Bill eats peanuts and is still alive. Sue eats everything bill eats. i. Translate these sentences into formulas in Predicate Logic ii. Prove that john likes peanuts using backward chaining iii. Convert the formulas of a part into clause form. iv. Prove that John likes Peanuts using resolution. 17. How are the hidden properties of a world deduced in terms of its qualities? John went out to a restaurant last night. He ordered steak. When he paid for it ,he noticed that he was running out of money. He hurried home since it had started to rain. What are the hidden messages and how do you infer it? UNIT-4 TWO MARKS: 1. Define Leasrning? 2. List out the four components available in learning? 3. Briefly explain the types of feed back? 4. Define Information gain? 273 5. Define Learning curve? 6. Write about noise and overfitting? 7. Define Generalization? 8. Define Specialization? 9. Define Memoization? 10. Briefly explain EBL inputs? 11. List out the reasons why ILP gained popularity? Big Question: 1. Explain forms of Learning? 2. Explain Ensemble learning? 3. Explain Experience based learning? 4. Explain Inductive Logic Programming? UNIT-5 TWO MARKS: 1. What is Grammar induction? 2. What is meant by Communication? 3. Differentiate Bottom up and Top down parsing? 4. Give examples for open classes and closed classes? 5. Give example for augmented grammar? 6. What is meant by chart parsers? Big Question: 1. Perform Bottom up and Top down parsing for the input “The wumpus is dead”. 2. Describe the process involved in communication using the example sentence ”The wumpus is dead”. 3. How semantic interpretation is done for a sentence? Explain with an example? 4. Write short notes on a) Computer Vision b) Discourse understanding c) Grammar Induction 5. Explain the seven processes involved in communication with an example? 6. Explain the two types of parsing with suitable example? 7. Explain the chart parsing algorithm with suitable example?