Courses/CS 460/Search and the Boat-Torpedoes problem
From CSWiki
This page discusses search in the context of the boat-torpedoes problem.[1]
[edit] The essence of artificial intelligence is search
In most software, the algorithms that the software implements (or less formally, the processes that the software performs) solve a problem or perform a task in ways that are understood in advance. Artificial Intelligence is required when it is not known how to go about solving a problem or performing a task. In such cases one writes a program that searches for solutions to the problem or for ways to perform the task.
The following chapters from Russell and Norvig are most directly relevant to search.
Chapter 3. Solving Problems by Searching. This chapter discusses the basic search techniques.
Chapter 4. Informed Search and Exploration. This chapter discusses Best-first, A*, online search, and genetic algorithms.
Chapter 6. Adversarial Search. This chapter discusses search in which one is dealing with an adversary that is performing a search of the same search space but with opposing objectives.
Chapter 11. Planning. This chapter discusses search in a state space that is much less homogeneous than those discussed earlier.
Chapter 12. Planning in the Real World. This chapter discusses online search in an uncertain environment.
[edit] The boat-torpedoes problem
In the boat-torpedoes problem a boat on a featureless ocean has as a goal to reach a safe harbor, its home port. The boat is being pursued by torpedoes. The torpedoes are faster than the boat, but the boat is able to turn more sharply than the torpedoes. The boat's goal is to reach its home port without being hit by a torpedo.
All information is known. The boat knows where it is; it knowns where its home port is; and it knows where the torpedoes are. The torpedoes know where the boat and the boat's home port is.
The boat and the torpedoes move at constant speeds — with the torpedoes moving faster.
The problem is formulated as a time-stepped simulation. At each time step the boat and the torpedoes are required to turn either left or right by a fixed amount — with the boat turning more than the torpedoes — and then to move forward by a fixed amount, with the torpedoes moving farther than the boat. At the default settings, the torpedoes move twice as fast as the boat, and they turn 30% as sharply.
The boat reaches home if it comes within epsilon of its home. A torpedo hits the boat if it comes within epsilon of the boat. The overall space is square and toroidal. The sizes of the distance increments are such that at the default settings the boat would be able to traverse the height/width of the space in 400 steps were it to go in a straight line. Of course the boat can't move in a straight line because it is constrained to turn either left of right by a fixed angle (with a default value of 0.1 radians) at each time step.
[edit] Problem space size
The positions of the boat and torpedoes are recorded as double values. Were the space digitized more coarsely, it would consist of a square grid of at least 400 x 400 or 160,000 locations. The boat and torpedoes could each be at virtually any of those points. Each could be in any of more than 60 orientations. Under the default settings, there are 5 torpedoes. Hence the boat and torpedoes each could be in any of the nearly 160,000 locations facing in any of approximately 60 directions, resulting in an order of at least (160,000*60)6/5! ~= 10,000,0006/5! which is nearly 1040 possible states. In other words, the space is far too large to explore directly. Therefore one cannot approach the problem by presuming that one could enumerate the possible states and examine the entire problem space.
[edit] Intelligence
The torpedoes are constrained to be reactive. At each time step, they turn, as best they can, towards a point somewhere in front of where the boat is currently located in the direction in which it is currently headed. The closer a torpedo is to the boat, the less it leads the boat. (Currently an ad hoc approximation is used that seems to maximize the torpedoes' effectiveness in this regard.) The torpedoes do not work as a team. Each torpedo attempts to intercept the boat individually.
The boat is an intelligent agent. At each time step it projects ahead along its possible paths. It also projects how the torpedoes will behave given its possible moves. (It can do that because it knows how the torpedoes operate.) It then selects what it considers its best path. If time and space were not an issue, this would be a trivial problem. The boat would search for a path to its home base that avoided all the torpedoes. Depending on the initial conditions there may or may not be such a path. With no time and space constraints, this would be similar to a maze problem in which the boat is navigating a maze whose pathways predictably open or are blocked (by the presence of the torpedoes) as time passes and as a function of how the boat moves.
Because the search space is so huge, this problem can't be approach in the normal way. Furthermore, it is required that the boat respond in a fixed amount of time. This makes the problem a real-time search problem. The boat is required to take a step at each time tick after only a fixed amount of time has elapsed. (The torpedoes are also required to act in real-time. That isn't a problem for them. At each time tick, they perform a simple computation that determines how to adjust their heading as a function of their current heading and position and that of the boat.)
The rest of this page discusses the standard search strategies as presented in Russell and Norvig and then discusses a new search strategy that seems to be more effective for this problem.
[edit] Chapter 3. Basic tree search
For the problems addressed by Chapter 3, the objective is for an agent (the problem solver, in our case the boat) to solve a problem. A problem is defined in terms of what is often called a search space. Each point in the search space represents a state — of the world and of the search agent's relationship to the world. In the example discussed in the chapter, the agent is located at a starting city. The problem is to find a way for the agent to go to a goal city. Transitions from point to point represent possible transitions from one state to another, in this case from one city to another. In this problem, the transitions represent roads from one city to another. It is often desirable to find an optimal solution, i.e., the least expensive (according to some measure) way for the agent to move from its initial position to its goal.
Chapter 3 covers what are known as blind search strategies. The searches are called blind because the agent has (or uses) no information about the problem space other than the graph that defines possible transitions among states.
Most blind search approaches are best understood in terms of a tree. The agent is at the root of the tree. It is able to move from the root to various other states. Its job is to explore the tree until it finds a path (or an optimum path) to the goal state. Since the problem space itself is not necessarily a tree, i.e., there may be loops in the state transition graph, the agent must keep track of visited nodes so that it doesn't repeat itself. By doing this it essentially transforms the problem space graph into a tree with the initial (or current) state as the root. With this formulation, search can be re-examined as a problem in tree traversal.
The best known tree-traversal methods are breadth-first and depth-first. There are advantages and disadvantages to each. A variant of depth-first search, known as iterative deepening (depth-first search to increasingly large depth limits), eliminates some of the problems with depth-first search and gains some of the advantages of breadth-first search by paying a penalty in search time.
Both breadth-first search and depth-first search may be understood in terms of a frontier that keeps track of the nodes of the tree that are yet to be expanded. In bread-first search the frontier is a FIFO queue. In depth first search, the frontier is a LIFO stack. As the search progresses, the agent is at a particular node of the tree, which is the front element of the frontier. It generates a list of all the node's children. If it appends the list of children to the end of the frontier and draws nodes to be expanded from the front of the frontier, one has breadth-first search. If it inserts the list of children at the front of the frontier and then draws nodes to be examined from the front, one has depth-first search.
Blind search is not feasible for the boat-torpedoes problem. If one supposes that on average the boat will require 500 steps to dodge the torpedoes and reach home (an optimistic estimate), blind search would require that the boat explore a search tree of depth 500. Clearly no blind search strategy can do that in a reasonable amount of time.
More specifically, breadth-first search is unsuccessful because breadth-first search requires that the agent examine a full tree. Since each additional level of the tree represents only a relatively small distance traveled, the depth required to make any real difference is too great for realistic use.
Depth-first is no better. A pure depth-first search would expand nodes, continually turning in one direction until the generated path is hit by a torpedo — or reaches home. If hit by a torpedo, the search would back up one step, turn in the other direction, and continue turning in the original direction. But this approach too falls victim to the too-short-distance-per-step problem. Backing up a single step, turning in the other direction, and then proceeding in the original direction will almost always make very little difference. The torpedo that caught the boat along the first path will catch it along the second path as well. Eventually, the boat will backtrack far enough so that it will avoid that torpedo. But the number of nodes required to be examined will exceed any reasonable amount.
[edit] Chapter 4. Informed search
Informed search modifies the tree searches discussed in Chapter 3 by ordering the nodes in the frontier in a more sophisticated way. Instead of putting newly generated nodes either at the front or back of the frontier, one might evaluate a node and place it at a position in the frontier that corresponds to an estimate of how likely it is to produce a solution. Nodes that are deemed more likely to lead to a solution are placed near the front of the frontier. Those that are deemed less likely to lead to a solution are placed near the back of the frontier.
This approach is called informed search because the agent must have some way of evaluating nodes with respect to their likeliness of producing a solution. The two best-know informed search techniques are best-first and A*. In best-first search one orders the frontier on the basis of estimates of how close the nodes are to the goal. Nodes that are deemed closer to the goal (i.e., better) are placed toward the front of the frontier. In A* search, one orders the nodes on the basis of (a) how close the node is to the goal along with (b) a measure of how much work is required to reach that node. In other words, nodes are ordered according to an estimate of how good the solution will be if that node leads to a solution. The nodes that are deemed likely to lead to better overall solutions are put earlier in the frontier. For the boat-torpedoes problem, we don't care how long the boat stays out as long as it eventually gets home. So best-first is essentially equivalent to A*.
Best-first and A* search may seem more promising than pure blind search. Following a best-first search strategy the boat would explore the most direct path from its location to the home base. If that path is intercepted by a torpedo, the boat would back up a step and turn in the opposite direction from the direction that represents the intercepted path. It would then proceed toward its home base. But this approach also falls victim to the too-short-distance-per-step problem. As noted above, backing up a single step, turning in the other direction, and then proceeding toward home will almost always make very little difference. The torpedo that caught the boat along the first path will catch it along the second path as well.
[edit] The genetic algorithm
Sections 4.3 and 4.4 discuss searches in which one looks for solutions to a problem — rather than looking for paths that lead to goals. The genetic algorithm is the best known approach for this sort of search. The boat-torpedoes problem may be reformulated to fit this paradigm by reframing it as a search for a path from the current position to the home base. That is, one is no longer traversing a tree looking for a path from the root to a goal leaf. Instead one is standing back from the tree and looking at all possible paths from the root to determine whether one of them leads to the goal leaf. Since there are far too many paths to do that exhaustively, the genetic algorithm approach is to generate a population of possible paths and hope to evolve a successful path.
Originally, a form of genetic algorithm was used in this problem. The population consisted of paths through the tree, starting at the root. These paths were subjected to cross-over and mutation. They were evaluated on the basis of (a) whether they reached home and if not (b) some combination of how long they were before being hit by a torpedo and how close they came to reaching home.
The advantage of this approach is that it generates a population of fairly diverse paths — which do not suffer from the too-short-distance-per-step problem. That is, two elements of the population often differ significantly in terms of how they attempt to traverse the ocean.
There are two disadvantages to this approach.
- Genetic operators (especially cross-over) make no sense in this context. It makes no sense to append the end of one path to the start of another. More sophisticated cross-over schemes are even less applicable. If one drops cross-over and works strictly with mutation, this approach degenerates to looking for random paths through the tree.
- The population tends not to be diverse enough. Even though the population is diverse, it is not diverse in an important way. If one generates random paths through the tree, each path is likely to have more or less the same number of left and right turns. In fact, one would get a normal distribution with most of the paths bunched in the middle and having approximately equal left and right turns. The paths may be significantly different from each other on a turn-by-turn basis, but the population as a whole tends not to explore a very broad portion of the ocean.
Although a genetic algorithm approach turned out not to be successful, its deficiencies suggested an approach that was successful. Two variants of that approach, which work quite well, are described at the bottom of the page.
[edit] LRTA*
So far we have not attempted to say what we mean when we say that the searches require visits to too many nodes. Section 4.5 approaches this problem directly. As that section says, the searches we have examined so far are all what are called offline searches. The assumption is that the agent (the boat in our case) performs its search using some sort of internal computation before making any actual moves. That's possible for the boat-torpedo problem because the boat knows the locations of its home base and of the torpedoes, and it knows how the torpedoes will respond to its moves. In what Russell and Norvig call online search,the agent operates by interleaving computation and action; first it takes an action, then it observes the environment and computes the next action.The preceding, of course, is an accurate description of the boat-torpedo problem. The boat and the torpedoes are each required to do some time-limited computation and then to make a move in the simulated world. Consequently, one might hope that techniques that accommodate this framework might be useful.
According to Russell and Norvig, online searches are especially useful when an agent is attempting to explore an unknown environment. The agent has no way to learn about the environment except by moving around in it. That is not the case in our problem. The ocean is featureless; there are no hidden hazards. Furthermore, the boat knows its own location, the locations of the torpedoes, and the location of its home base. It also knows exactly how the torpedoes will move at each time step. Those moves will depend on the boat's own move, but the boat knows that also. So the boat doesn't have to move around in the environment to learn about it.
Nonetheless, section 4.5 discusses a form of search that may initially seem relevant to our problem. It is called Learning Real Time A* (LRTA*). For our purposes, the most relevant aspect of LRTA* is the requirement that the agent, like our boat, must act in real time. In other words, the agent is permitted only a fixed finite number of steps (either actual exploratory steps or internal node expansions) before it is required to make a move of some sort.
Unfortunately, LRTA* is less powerful than the searches we have already dismissed. That's because the agent is essentially constrained to position itself somewhere in the frontier. Consider the LRTA* variant of simple best-first. According to simple best first search, the agent has access to the entire frontier. As it expands the current best node, it inserts the children at appropriate places in the frontier. In LRTA* the agent is positioned somewhere in the frontier. The frontier cannot be ordered but corresponds roughly to the states the agent has already visited, along perhaps with neighbors of those states. Each frontier state is marked with an estimate of the cost to complete the path from that state. As the agent expands a node, it learns more about the possible paths from the expanded node. That gives it more information about the neighboring states, which it updates. But the agent is never able to develop a bird's-eye view of the entire frontier;. it can only follow the local gradient of the frontier to the best currently known node. Since A* itself is inadequate for the boat-torpedo problem, LRTA* is even worse. Not only is the boat still victim to the too-small-distance-per-step problem, the boat can't even jump from one point in the frontier to another that may look more promising.
Another mismatch between LRTA* and the boat torpedo problem is that LRTA* assumes what Russell and Norvig call a safe environment. The most salient feature of the boat-torpedo problem is that the environment is decidedly not safe: the boat may be destroyed by a torpedo. Fortunately, the boat is able to compute ahead and avoid many (but not all) of the torpedo hazards. In LRTA* search, the only way the agent can learn about hazards in the environment is to encounter them.
One might modify LRTA* by giving the agent the ability to sense a neighborhood around itself. Then the agent can learn about a hazard without necessarily falling victim to it. The real heart of the boat-torpedo problem is essentially that: to provide the boat a way of seeing far enough ahead in its environment so that it can avoid the torpedo hazards. The challenge is that in order to avoid a torpedo, the boat must generally make a decision about its direction very many moves in advance of its potential encounter with the torpedo.
[edit] Blind alley problems
One might call problems in which significant foresight is required blind alley problems. A blind alley problem is one that has the property that once an agent determines that the path it is exploring turns out to be a blind alley — with a torpedo at the end in our case — it's too late to turn back. It may be that the alley has some internal options. But the difficulty is that the torpedoes adjust their moves dynamically. Once a torpedo has the boat in range, it's often impossible for the boat to escape no matter how it turns. Thus the boat must anticipate blind alleys far in advance. None of the search techniques discussed so far allow the boat to explore multiple distinct paths in depth. It cannot look down an alley and determine what lurks at the end.
How in general should one approach a blind alley problem? Unless one can see through to the solution of a problem, every path is a potential blind alley. With time-limited computing, it's impossible to determine in general whether a path is a blind alley or not. The only advantage the agent has is that it can project ahead and keep a number of options open for itself. As time passes, and as it explores these options further, it may find that some of them are blind alleys and that some of them open up to new possibilities. The best the agent can do in a blind alley problem is to assure as best it can that it always has a range of available options. To do that it must ensure that the collection of options it maintains for itself is as diverse as possible. Since it doesn't know which of the options will eventually turn out to be dead-ends, it must provide for itself as many different kinds of options as possible.
In fact, one might want to modify the boat-torpedoes problem to be a problem in pure survival. Instead of having as a goal to reach a home base, the goal of the boat could be simply to avoid the torpedoes as long as possible. This would make the problem somewhat simpler to state and somewhat purer to solve. A strategy for the boat would then be evaluated on the basis of how long the boat stays out of the paths of the torpedoes. Although this seems like a worthwhile change, for this page we will stay with the original problem.
The more general case of this sort of problem is the generic strategy planning responsibility of people at very high levels of organizations. One never knows what's going to happen. So it's important to keep as many options open as possible. Of course it's also important to act when an opportunity presents itself. So there will always be a tension between committing to a course of action and ensuring that multiple options are available.
[edit] Chapter 6. Adversarial search
Adversarial search matches two adversaries with equal search capabilities. Adversarial search is used for game playing. Each participant knows the rules of the game; each participant knows what the possible moves are of the other participant; and each participant can determine how best to move to ensure its best possible result.
In adversarial search both participants construct what is called a game tree. The game tree indicates the states that would exist were the participants to make the various possible moves. Through a process of minimax (and its more efficient implementation called alpha-beta pruning) each participant can determine the best possible outcome.
In games in which the tree can be explored completely, the game is determined. In games, like chess, in which the tree is too large to explore completely, heuristics must be developed to assign values to the nodes that can be reached in the time available.
The boat-torpedo problem may be understood as something like a one-sided adversarial search problem. The boat is capable of making decisions about how to move. The torpedoes respond predictably at each opportunity. So one may imagine the boat expanding the game tree and looking for the best result. But putting the problem in those terms doesn't help. It simply points out that the problem is how to expand the tree. We already know that expanding the tree in any compact way (breadth-first, depth-first, best-first, A*, etc.) will be unsuccessful. Too many nodes are required before one gets far enough away from the root to be able to distinguish in a useful way among different paths.
[edit] Chapters 11 and 12. Planning
The chapters on planning extend search to problem spaces that have more sophisticated transitions. Like other searches, planning is an attempt to formulate a series of moves that will transform an initial state into a desired goal state. Planning is more complex than other searches in that the transition rules have preconditions for their application. In some ways this is not very different from saying that the road that leads from city 1 to city 2 requires that the agent be in city 1 before the road may be taken.
Since the boat-torpedo problem does not involve sophisticated transition rules, the additional power that sophisticated transition rules offer is of no help in this problem.
[edit] A successful approach
Interestingly, one of the approaches that Russell and Norvig dismiss turns out to be quite successful. It is iterated gradient ascent but with the the twist that the paths generated are intended not primarily for finding a solution but as a way to ensure that the boat has a diversity of options.
We have developed two versions of this approach. Under either of these approaches, the boat proceeds as follows. At each time step it generates a diverse population of possible paths. Actually, because the boat is always expanding a tree, it will always have a collection of options. At each time step it generates additional options and then selects which ones to keep.
It does this in real time by limiting itself to a fixed number of node expansions. All this work is done as a computation and does not involve the boat making any actual moves. Once the boat has generated its path options, it selects whether to move left or right. This approach then requires two types of decisions.
- How should the boat generate the population of paths?
- How should the boat select in which direction to more?
Our working framework has been (and continues to be) that the boat computes a tree of possibilities. Thus at each time step, the boat has access to the tree of nodes that it has already computed as being accessible, i.e., as being reachable without being hit by a torpedo. The boat must then select nodes from that tree to expand.
[edit] How should the boat select nodes to expand?
There are a number of possibilities.
- It could pick a node at random.
- It could pick the "best" node, where best is measured as closest to home or farthest from the nearest torpedo, or farthest from the root, or according to some other measure.
- It could select from nodes that are distributed between the two subtrees of its current root node. The argument for that strategy is that when it moves it will be required to select either the left or right subtree of its current root position. It therefore makes sense to attempt to explore nodes from both sides of those subtrees before it is required to discard one of them.
Once the boat has a strategy for selecting nodes to expand, it must generate expansion nodes.
[edit] How should the boat generate paths?
We developed three approaches. All of these approaches are intended to ensure that the generated paths are sufficiently different that as a collection they will offer the boat a reasonably diverse set of options. In all three versions, the boat generates paths from nodes selected for expansion. It generates each path until that path is intercepted by a torpedo or until that path reaches home.
- In one version the boat generates paths that reach toward home. A nice way to generate a diverse collection of long paths toward home is to realize that we are working in a toroidal world. From any point in that world, the boat has 9 straight line directions to its home base. To see this think of the world as a square surrounded on each side by a copy of itself and with four more copies at each of the four corners. From the boat's position in the middle square, which represents the actual world, there are nine lines to the home base images in each of the nine versions of the world. So to generate a path, the boat generates a best-first path to one of the nine images, selected at random. This approach tends to generate enough long paths that are different enough from each other that the boat buys itself time. Eventually, one of the randomly generated paths will actually reach its selected home image, and the boat will make it safely home.
- The other version is similar except that instead of generating best-first paths to a home image, the boat generates a collection of paths, each having a randomly selected percentage of left and right turns. That is, from the selected start node, the boat generates a path with a given random percentage of left and right turns. This too results in the generation of a diverse collection of sufficiently long paths that the boat buys time until it finds a way home. This approach requires that the boat favor path starting points that are closer to home. Note that this approach generates curves with higher or lower curvatures. A path with all right turns for example, will generate an approximate circle that reflects the tightest circle the boat can make.
- If, as suggested earlier on this page, we modified the problem to be one simply of staying alive, one could generate paths by selecting a random direction and then have the boat turn as closely as possible so that it is moving in that direction. The initial part of these paths will curve until the boat changes its heading to approximate the selected direction. After that the boat will zig-zag to move in the selected direction. This version would be similar to the approach of generating a straight line path to one of the 9 home images except that the boat is driven by a direction rather than a target point.
In both of these strategies, the boat continues to generate paths—terminating a path when it is hit by a torpedo or reaches home—until it reaches its node-expansion limit for the time tick.
[edit] How should the boat select which subtree to keep?
Once the boat generates its paths, it needn't commit itself to any one of them. All it has to do at each time tick is to commit itself to a single move: left or right. So the boat must decide not among paths but between trees. Is it better to keep the left subtree or the right subtree? The question becomes: is there a way of evaluating (sub)trees so that the boat can select one over the other?
- One option is to select the tree that contains the "best" node, however best is determined. But that may not be the best choice. The best node may eventually lead to a dead end, and the other paths in that tree may not provide enough viable options.
- Another possibility is to select the tree with the most nodes. That approach is based on the argument that the tree with the most nodes gives the boat the most chances. But if all the nodes of a tree are essentially corralled by the torpedoes, e.g., in a large blind alley, a simple count of the number of nodes may not be the best choice.
- Another possibility is to select the tree with the longest path. That tends not to be too bad a choice since a long path gives the boat a great many steps during which to generate new nodes before it eventually reaches the end of the initially selected path. By that time another path is likely to have opened up and be the longest.
- It would seem that the best choice is the (sub)tree with the most different paths. But it's not clear how to measure trees for path variety. At this point we have not developed an effective metric for comparing trees on this basis. We suspect that such a metric will combine path length with path direction. We will want long paths, and we will want paths that end up going in different directions. A subtree that has paths with both properties will probably offer the boat the broadest variety of viable options.
- ↑ The boat-torpedoes problem is based on a project created by Gerald DeJong in the late 90's for an artificial intelligence course.

