Sunday, December 20, 2009

A notion of schemata in the genetic algorithm (GA)

We address the issue mentioned above in this short review. To understand why the GA works, we have to introduce the notion of schemata. The key attributes of a genetic algorithm usually include a binary string representation of chromosomes and the notion of schemata.

A chromosome is a string of a given length consisting of boolean values, e.g. (10010), (11011), etc.

A schema is a template that allows us to explore similarities between chromosomes. A "don't care" symbol (*) represents either 0 or 1 and is considered part of the schema. For example, consider the schema of length 5: (1*010). It matches two chromosomes: (10010) and (11010). The schema (*****) represents all chromosomes of length 5. Clearly, every schema matches 2^n chromosomes, where n is the number of "don't care" (*) symbols.
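As a quick illustration, here is a minimal Python sketch of schema matching (the function name is ours, not part of any GA library; the schema and the chromosome are assumed to have equal length):

def matches(schema, chromosome):
    """Return True if the chromosome fits the schema template."""
    return all(s == '*' or s == c for s, c in zip(schema, chromosome))

print(matches("1*010", "10010"))  # True
print(matches("1*010", "11010"))  # True
print(matches("1*010", "11011"))  # False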

The number of fixed positions (the non-"don't care" symbols) is referred to as the order of the schema, denoted by o(S). For example, the following four schemata, each of length 5, have the following orders.

S_1=(**1*0); o(S_1)=2;
S_2=(10*01); o(S_2)=4;
S_3=(**1**); o(S_3)=1;
S_4=(11001); o(S_4)=5;

The distance between the first and the last fixed string positions is referred to as the defining length of the schema, denoted by delta(S). For example,

delta(S_1)=5-3=2; delta(S_2)=5-1=4; delta(S_3)=3-3=0; delta(S_4)=5-1=4.
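Both quantities are easy to compute; the following small Python sketch (the helper names are ours) reproduces the values above:

def order(schema):
    """Number of fixed (non-'*') positions."""
    return sum(1 for s in schema if s != '*')

def defining_length(schema):
    """Distance between the first and the last fixed positions."""
    fixed = [i for i, s in enumerate(schema) if s != '*']
    return fixed[-1] - fixed[0] if fixed else 0

for s in ("**1*0", "10*01", "**1**", "11001"):
    print(s, order(s), defining_length(s))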

Thursday, December 10, 2009

Constraint programming

Constraint programming is an exciting modelling technique that states the relations between variables in the form of constraints. There is a variety of constraints used in constraint satisfaction problems, e.g. boolean conditions ("R" is true and "M" is false), inequalities such as Y <= 10, equations such as X = 5, and so on. The variables can take real values, integer values, etc.
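As a toy illustration, here is a minimal brute-force sketch in plain Python (the variables and domains are made up for the example; a real constraint solver would propagate constraints far more cleverly):

from itertools import product

# Hypothetical toy problem: find X, Y, R, M satisfying
# X == 5, Y <= 10, R is true and M is false.
for X, Y, R, M in product(range(16), range(16), (True, False), (True, False)):
    if X == 5 and Y <= 10 and R and not M:
        print(X, Y, R, M)  # the first satisfying assignment
        break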

To be amended...

Tuesday, November 17, 2009

Breadth-first search

Breadth-first search (BFS) is a search algorithm analogous to depth-first search (DFS), except that instead of exploring the nodes in depth, it explores the neighboring nodes in breadth.

Consider the order in which the nodes are explored:

[Figure: the order of node exploration in BFS]

Both BFS and DFS are known to be methods of uninformed search. In other words, each method exhaustively searches the graph without any rules directing it towards the goal until it finally encounters the goal, i.e. it does not use any heuristics.
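For completeness, here is a minimal BFS sketch in Python using a queue (the small graph below is a made-up example, not the one from the figure):

from collections import deque

def bfs(graph, root):
    """Visit nodes level by level; graph maps a node to its neighbors."""
    visited, queue, order = {root}, deque([root]), []
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return order

g = {1: [2, 3], 2: [4, 5], 3: [6, 7]}
print(bfs(g, 1))  # [1, 2, 3, 4, 5, 6, 7]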

Depth-First search

Depth-first search (DFS) is a very popular algorithm for traversing (or searching) a tree structure or a graph. It is commonly used for a variety of applications in artificial intelligence (AI) and computer science!

The search starts at the root (there may be several roots in the case of a graph) and explores all nodes along each branch until either a goal node is found or the current node has no children to visit. Then the algorithm backtracks to the previous node to explore an adjacent branch, until the final node is reached.

Consider the following example of the order in which the nodes are visited:

[Figure: the order of node visits in DFS, nodes numbered (1) through (16)]

The algorithm starts from the root (1) and explores the left branch until node (4) is reached; then it backtracks to node (3) and proceeds to node (5), from which it again returns to (3) before going to (6). Since node (3) has no other descendants to visit, the algorithm returns to node (2) in order to explore node (7), and so on, until the final node (16) is reached.

As one can see, we have to return to the parent nodes many times. For example, since the third node has 3 descendants, we return to it 3 times. However, there are some advanced implementations of DFS that avoid returning to the parent nodes, thus saving a lot of time and effort!
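A minimal recursive DFS sketch in Python (the small graph is made up; the recursion stack does the backtracking for us):

def dfs(graph, node, visited=None, order=None):
    """Visit nodes depth-first; graph maps a node to its children."""
    if visited is None:
        visited, order = set(), []
    visited.add(node)
    order.append(node)
    for child in graph.get(node, []):
        if child not in visited:
            dfs(graph, child, visited, order)
    return order

g = {1: [2, 3], 2: [4], 3: [5, 6]}
print(dfs(g, 1))  # [1, 2, 4, 3, 5, 6]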

To be amended ...

Sunday, November 15, 2009

Focus on modelling : Travelling Salesman Problem (TSP)

In addition to well-known technical algorithms, it would also be interesting to occasionally discuss some of the most challenging problems of mathematical modelling. The problem we shall discuss today is the Travelling Salesman Problem, or TSP for short.

TSP is a well-known area of research having many practical applications. Its origins trace back to the 19th century, and, at first glance, TSP may seem to be a simple problem with the following description: a travelling merchant (salesman) is planning to visit every city exactly once before returning to his home town. Since the distances between the towns are known, the merchant plans his itinerary to minimize the total distance travelled. The problem is easily solvable by inspection of routes for a relatively small number of cities. However, it becomes ever harder as more cities become involved: the number of possible routes grows as n! with the number of cities n. The TSP was proven to be NP-hard. Therefore, exhaustive enumeration of routes becomes impossible for large instances.
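To see the factorial blow-up concretely, here is a tiny brute-force sketch in Python (the distance matrix is made up); fixing the home town, it checks all (n-1)! orderings of the remaining towns:

from itertools import permutations

# Hypothetical symmetric distances between 4 towns; town 0 is home.
dist = [[0, 2, 9, 10],
        [2, 0, 6, 4],
        [9, 6, 0, 3],
        [10, 4, 3, 0]]

def length(route):
    """Total length of the closed tour 0 -> route -> 0."""
    tour = (0,) + route + (0,)
    return sum(dist[a][b] for a, b in zip(tour, tour[1:]))

best = min(permutations(range(1, 4)), key=length)
print(best, length(best))  # shortest of the (n-1)! candidate routes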

Some state-of-the-art algorithms based on the branch-and-cut approach successfully solve relatively large TSP instances having hundreds or thousands of cities. However, due to the problem's complexity, heuristics-based methods also appear to be very popular. For example, genetic algorithms prove to be very effective at exploring various parts of the search space by gradually evolving populations of chromosomes towards the optimum. Here, the challenging questions to think about are how the chromosomes will be represented, and how recombination and mutation will work ...
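One natural possibility (our assumption here, not a settled choice) is to encode a chromosome as a permutation of city indices; a simple swap mutation then exchanges two cities and is guaranteed to keep the chromosome a valid tour:

import random

def mutate(tour):
    """Swap mutation: exchange two random cities, preserving validity."""
    i, j = random.sample(range(len(tour)), 2)
    tour = list(tour)
    tour[i], tour[j] = tour[j], tour[i]
    return tour

chromosome = [0, 1, 3, 2]  # one tour of 4 cities as a permutation
print(mutate(chromosome))  # still a permutation, i.e. a valid tour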

Meantime, the author will look for the most efficient representations of chromosomes and post them in the forthcoming days ... :)

Friday, November 13, 2009

Review of sorting methods

We would like to prepare a brief review of the state-of-the-art sorting algorithms. Readers are invited to send descriptions and/or references!

Thursday, November 12, 2009

Quick sorting

Today, we will discuss "quick sorting" (quicksort), one of the most efficient and widely used sorting algorithms. The method is based on the "divide-and-conquer" approach. It proceeds by recursion in the following 3 steps:

1. In the array of elements, choose any element a_p indexed by p;

2. Each element less than or equal to a_p is placed to the left of a_p, and each greater element to the right, so that we have two subarrays {a_j} and {a_k}:

{a_j} <= a_p < {a_k}

3. If more than one element is available in either subarray, launch the procedure recursively for that subarray.

Finally, we obtain a completely sorted sequence of elements.

___

But let us look further into the details:

Assume we have an array {a_i}, where i = 0, ..., N. Randomly choose an element a_p indexed by p, and follow the 4 basic steps below:

1. Let i and j be indexes pointing at the first and the last elements of the array, a_i and a_j respectively, so that i = 0 and j = N.

2. Increase index i by 1 until we find an element a_i such that a_i >= a_p. Similarly, decrease index j by 1 until we find an element a_j such that a_j <= a_p.

3. If i <= j, swap elements a_i and a_j.

4. While i <= j, repeat steps 2 and 3.

For example, consider the following array having 7 elements a_0, ..., a_6, where a_3 = 6 (remember, elements are counted starting from 0) is the splitting element (circled in the figure below):

[Figure: the example array with the splitting element a_3 = 6 circled]

By now, the array has been split into two parts: each element of the first part is less than or equal to a_p = 6, while each element of the second part is greater than or equal to a_p. Therefore, at this stage, the splitting has been done.
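To make the splitting step concrete, here is a minimal Python sketch of steps 1-4 under our reading of the description above (the pivot choice and index handling are illustrative):

def split(a, lo, hi):
    """Partition a[lo..hi] around a splitting element; return i, j."""
    p = a[(lo + hi) // 2]       # splitting element, e.g. in the middle
    i, j = lo, hi
    while i <= j:               # step 4: repeat steps 2 and 3
        while a[i] < p:         # step 2: advance i until a[i] >= p
            i += 1
        while a[j] > p:         # step 2: retreat j until a[j] <= p
            j -= 1
        if i <= j:              # step 3: swap and move both indexes
            a[i], a[j] = a[j], a[i]
            i, j = i + 1, j - 1
    return i, j

a = [3, 8, 1, 6, 9, 2, 7]       # a_3 = 6 is the splitting element
print(split(a, 0, len(a) - 1), a)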

To recap, the procedure can be sketched as follows (shown here as runnable Python rather than raw pseudocode).

def quick_sort(a):
    if len(a) <= 1:                      # nothing left to sort
        return a
    p = a[len(a) // 2]                   # select a_p, e.g. in the middle of a
    left = [x for x in a if x < p]       # subarray to the left of a_p
    middle = [x for x in a if x == p]    # elements equal to a_p
    right = [x for x in a if x > p]      # subarray to the right of a_p
    return quick_sort(left) + middle + quick_sort(right)

print(quick_sort([3, 8, 1, 6, 9, 2, 7]))  # [1, 2, 3, 6, 7, 8, 9]

Each split requires Theta(n) operations, and the depth of the recursion is about log n, provided the array is split into roughly equal parts at each step. Therefore, the total complexity is O(n log n) on average; in the worst case of highly unbalanced splits, quicksort degrades to O(n^2).