Tuesday, July 19, 2016

Genetic Algorithms with C#

In this post I want to present so called genetic algorithms, which belong to the larger class of evolutionary algorithms. Of course we will arrive at a complete, ready-to-use C# implementation in the end, but this post is also a step in the direction to broaden the topics of this blog, for example also including theoretical computer science.

What are genetic algorithms? Genetic algorithms are stochastical search heuristics which mimic the process of natural selection. For finding a solution they use a group of individuals, which all represent single solutions of the problem. In every round of the algorithm via certain operators from the old group a new population is created.
These operators are:

  • Survival of the fittest (the best individuals from the previous round are taken in the next one)
  • Crossover (two individuals of the previous round take the role of "parents" of an individual of the new population)
  • Mutation (an individual mutates random properties)
We hope, that the individuals of the populations get "fitter" over time, and after several rounds we get a good individual, and thus a good solution.
Genetic algorithms are or can be used to solve complex problems, usually NP-hard ones, for which the search space is too big to completely traverse it and one generally has little clues about how to solve the problem. Genetic algorithms are a good general approach to solve problems of all kinds.
But in this generality lies also a weakness of this heuristic: For many problems, also hard ones, there are very good problem specific solving approaches with which genetic algorithms cannot compute. Further it is not clear, how the idea of two parents translates to the algorithmic principle. Procedures like Simulated Annealing inherently use one parent, when staying to the analogy, and thus might make more sense.


Readers interested in this wide topic are referred to the existing literature. Here I now want to continue with the implementation of the genetic algorithm and solve instances of the Travelling Salesman Problem (TSP) with it.
The Travelling Salesman Problem is a relatively well-known NP-hard problem. It is about finding a cheap round trip between a set of cities - that means, every city has to be visited exactly once, and the salesman has to return to his starting city in the end. With n cities there are (n - 1)! possible orders to visit them. This is an incredibly fast growing function, for example for 10 cities we get 360.000 combinations, for 15 cities there are already 87 billion combinations and for 70 cities more than there are atoms in the entire universe.
Thus in principle genetic algorithms seem like a good candidate to get a relative good approximate solution. But on the other hand there are very good approximation algorithms for this problem, so think of the introductory words.

We manage a TSP instance in the class TSPInstance. For n cities the n x n integer matrix Connections is created, in which entry (i, j) describes the distance from city i to j. If there is no connection between these cities, the entry is set to infinity (Integer.MAX_VALUE).
Such an instance then is passed to the genetic algorithm. Starting function is the function Go() of the class GeneticAlgorithm.
In this we see the basic flow of the algorithm: First we initialize the starting population. Then we run for a certain number of iterations, in which we call the function Breed(), which creates the new population out of the previous one.
We represent a solution of the TSP instance simply as a permutation of the cities, which describes the order in which these are visited. So ADBC for example stands for the route A-D-B-C-A. In the program the cities are numbered, so we use an integer array, the content would then for example be 0, 3, 1, 2. The "fitness" of an individual are the costs of the tour (infinity if the tour is not possible).

At the beginning of the algorithm we create an array of individuals and fill this with random tours, so random permutations of the city. The size of the array corresponds to the size of the population. As a size we take 30. This parameter, as the following ones, has a huge impact on the quality of the solution. For every problem different parameters should be tried. For the population size a size of 30 is often used.

In the function Breed() we first iterate over the current population and calculate the fitness of the individuals. Then we sort them by ascending fitness.
This way we can simply take the best individuals from the previous population into the new one. The percentage of these is expressed in percent, here we use 10, which means that the 3 best individuals of the previous population are taken into the new one.
We then have to fill the remaining 27 spots in the new population, which we do via crossover and mutation.

When iterating over the old population we create the array SummedFitness, in which cumulated fitness values are saved. So at position i the sum of the fitness values of the individuals 0 - i is saved. With this we can do a so called fitness proportionate selection, in which individuals with a higer fitness are chosen preferably as parents. For this we create a random number between 0 and the last value of SummedFitness, the individual, in whose interval the number falls, is selected.
This way we choose two parents. Now we again let a random coin decide whether we want to do a crossover operation. I here chose a crossover probability of 80 %. If no crossover is done, the child is simply a clone of parent 1. If a crossover is done, the child becomes a genetic combination of both parents. For mixing the genomes there are different possibilites. Here we use the following approach: We randomly choose a crossover point X between 0 and n - 1.
In the child we then select the first X - 1 cities from the tour directly from parent 1. The remaining cities are ordered in the order how they occur in parent 2.
As an example let us consider the parents AGEDCBF and FDAGEBC. As a crossover point we choose 3. The child thus starts its tour with the cities AGE. Remaining are the cities D, C, B and F, which are ordered like in parent 2: FDBC. Thus we get as the complete tour for the child: AGEFDBC. Again let me say, this is just one possible way to do a crossover, for example there is also the possibility to split the string in 2 parts (Two-point crossover).

In the last step the child is randomly mutated. Also for this there are different possibilities, we use the following one: We iterate over every element of the tour and mutate with a probability of 10 %. If the mutation rate is too low, the algorithm cannot create enough good solution candidates and gets stuck in a local minimum, if it is too high, the algorithm becomes random search. If we mutate, we randomly select another city in the tour and switch the two.

With this, we created, based on the old population, a new child. We repeat this process until there are again 30 individuals in the new population.
Since the best individuals remain in the populations, the fitness of the best individual never decreases and hopefully increases over time. In the end we output the best solution.

Here the code of a console application, I think this should be relatively good readable with the above explanation:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace GA_TSP
{
    class Program
    {
        static void Main(string[] args)
        {
            TSPInstance Instance = CreateRandomTSPInstance(20);
            GeneticAlgorithm GA = new GeneticAlgorithm();
            GA.Go(Instance, true, 0.8, 0.1, 0.1, 30, 10000);
        }

        // Create a random TSP instance, connect two cities with probability 50 % and choose a length between 0 and 1000.
        public static TSPInstance CreateRandomTSPInstance(int n)
        {
            TSPInstance Instance = new TSPInstance(n);
            Random Rnd = new Random();
            for (int i = 0; i < n; i++)
            {
                for (int j = 0; j < n; j++)
                {
                    if (Rnd.Next(100) < 70)
                    {
                        Instance.AddConnection(i, j, Rnd.Next(1000));
                    }
                }
            }
            return Instance;
        }
    }

    public class TSPInstance
    {
        public int n;
        public int[][] Connections;

        public TSPInstance(int _n)
        {
            n = _n;
            Connections = new int[n][];
            for (int i = 0; i < n; i++)
            {
                Connections[i] = new int[n];
                for (int j = 0; j < n; j++)
                {
                    Connections[i][j] = int.MaxValue;
                }
            }
        }

        public void AddConnection(int start, int end, int distance)
        {
            Connections[start][end] = distance;
        }
    }

    public class GeneticAlgorithm
    {
        public class Individual : ICloneable, IComparable<Individual>
        {
            public double Fitness;
            public int[] Tour;

            public Individual(int length)
            {
                Tour = new int[length];
            }

            // needed for sorting, sort individuals by their fitness
            public int CompareTo(Individual other)
            {
                if (other.Fitness < Fitness)
                    return -1;
                if (other.Fitness > Fitness)
                    return 1;
                else
                    return 0;
            }

            // creates a deep copy of an individual
            public object Clone()
            {
                Individual Cloned = new Individual(Tour.Length);

                for (int i = 0; i < Tour.Length; i++)
                {
                    Cloned.Tour[i] = Tour[i];
                }
                Cloned.Fitness = Fitness;
                return Cloned;
            }
        }

        // algorithm parameters, experiment!
        double CrossoverFrequency;
        double MutationFrequency;
        double ElitistFrequency;
        int PopulationSize;
        int NrIterations;

        Individual[] Population;
        Individual Best = null;
        TSPInstance Instance;
        Boolean Print;

        public Individual Go(TSPInstance inst, Boolean print, double co, double mu, double el, int popsize, int it)
        {
            CrossoverFrequency = co;
            MutationFrequency = mu;
            ElitistFrequency = el;
            PopulationSize = popsize;
            NrIterations = it;

            Instance = inst;
            Population = new Individual[PopulationSize];
            Print = print;

            for (int i = 0; i < PopulationSize; i++)
            {
                Population[i] = new Individual(inst.n);
            }

            Initialize();

            for (int i = 0; i < NrIterations; i++)
            {
                Breed();
                if (Print && i % 100 == 0)
                {
                    Console.WriteLine("Iteration: " + i.ToString());
                }
            }

            if (Print)
                Console.WriteLine("Best: " + Best.Fitness.ToString());

            return Best;
        }

        public void Initialize()
        {
            // fill the initial population, for each individual create a random permutation of the cities
            Random Rnd = new Random();
            for (int i = 0; i < PopulationSize; i++)
            {
                List<int> Cities = new List<int>();
                for (int j = 0; j < Instance.n; j++)
                {
                    Cities.Add(j);
                }
                int Counter = 0;
                while (Cities.Count > 0)
                {
                    int Index = Rnd.Next(Cities.Count);
                    Population[i].Tour[Counter++] = Cities[Index];
                    Cities.RemoveAt(Index);
                }
            }
        }

        public void Breed()
        {
            double[] SummedFitness = new double[PopulationSize];
            // first calculate the fitness of each individual and sort in ascending order
            double SumFitness = 0;
            for (int i = 0; i < PopulationSize; i++)
            {
                Population[i].Fitness = CalculateFitness(Population[i]);
            }
            Array.Sort(Population);

            for (int i = 0; i < PopulationSize; i++)
            {
                if (Best == null || Population[i].Fitness < Best.Fitness)
                {
                    Best = Population[i];
                    if (Print)
                        Console.WriteLine("New best found, fitness: " + Best.Fitness.ToString());
                }
                SumFitness += Population[i].Fitness;
                SummedFitness[i] = SumFitness;
            }

            Random Rnd = new Random();

            Individual[] NewPopulation = new Individual[PopulationSize];

            // take best individuals from last population
            for (int i = 0; i < Math.Ceiling(PopulationSize * ElitistFrequency); i++)
            {
                NewPopulation[i] = (Individual)Population[i].Clone();
            }

            // fill the rest of the new populaton
            for (int i = (int)Math.Ceiling(PopulationSize * ElitistFrequency); i < PopulationSize; i++)
            {
                double RParent1 = Rnd.NextDouble() * SummedFitness[PopulationSize - 1];
                double RParent2 = Rnd.NextDouble() * SummedFitness[PopulationSize - 1];

                Individual Parent1 = null;
                Individual Parent2 = null;
                Individual Child = new Individual(Instance.n);
                // roulette wheel selection of the parents
                for (int j = 0; j < PopulationSize; j++)
                {
                    if (Parent1 == null && SummedFitness[j] >= RParent1)
                    {
                        Parent1 = Population[j];
                    }
                    if (Parent2 == null && SummedFitness[j] >= RParent2)
                    {
                        Parent2 = Population[j];
                    }
                }

                // do a crossover with chance CrossoverFrequency
                Child = (Individual)Parent1.Clone();
                if (Rnd.NextDouble() <= CrossoverFrequency)
                {
                    // when doing a crossover, first copy the tour from parent 1, then starting from COPoint fill in the remaining spots in the order of parent 2
                    int COPoint = Rnd.Next(Instance.n);
                    List<int> VisitedCities = new List<int>();
                    for (int j = 0; j < COPoint; j++)
                    {
                        VisitedCities.Add(Parent1.Tour[j]);
                    }
                    int Pos = COPoint;
                    for (int j = 0; j < Instance.n; j++)
                    {
                        if (!VisitedCities.Contains(Parent2.Tour[j]))
                        {
                            Child.Tour[Pos++] = Parent2.Tour[j];
                        }
                    }
                }

                for (int j = 0; j < Instance.n; j++)
                {
                    // for each position of the tour, do a mutation with chance MutationFrequency
                    if (Rnd.NextDouble() <= MutationFrequency)
                    {
                        // when mutating, switch 2 cities
                        int End = Rnd.Next(Instance.n);
                        int Temp = Population[i].Tour[j];
                        Population[i].Tour[j] = Population[i].Tour[End];
                        Population[i].Tour[End] = Temp;
                    }
                }
                NewPopulation[i] = Child;
            }
            Population = NewPopulation;
        }

        public int CalculateFitness(Individual ind)
        {
            // sum over entire tour and return costs
            int Costs = 0;
            for (int i = 0; i < ind.Tour.Length; i++)
            {
                if (i == ind.Tour.Length - 1)
                {
                    if (Instance.Connections[ind.Tour[i]][ind.Tour[0]] == int.MaxValue)
                        return int.MaxValue;
                    Costs += Instance.Connections[ind.Tour[i]][ind.Tour[0]];
                }
                else
                {
                    if (Instance.Connections[ind.Tour[i]][ind.Tour[i + 1]] == int.MaxValue)
                        return int.MaxValue;
                    Costs += Instance.Connections[ind.Tour[i]][ind.Tour[i + 1]];
                }
            }
            return Costs;
        }
    }
}

Play with it, write me in the comments for what you use genetic algorithms and whether you can find better parameters / improvements of the algorithm for the TSP problem, I am very interested in that!

Friday, March 11, 2016

Randomly Browse through the Internet with C#

In this post I want to show how one can surf through the Internet via C# by following random links. This is not only very exciting by itself (pretty interesting to see where one ends up after a couple links), but also has practical applications: For example Google's PageRank algorithm for rating the popularity of websites uses a similar model.

The following C# program contains a webbrowser control and a button. When the user clicks the button, the program searches the current website for a random link and displays the target website in the webbrowser.
The code should be relatively self-explanatory: The class Browser covers the searching for new links. It saves the current page as well as the current page source code. If the method GoNext() is called, this calls FindLink() to find a random link. For this a random starting position in the source code of the current page (the source code is obtained via a Webclient) is chosen and then the first link after it chosen. I find this method to be more efficient than to scan through the whole document first and then choose a random link. But watch out: This way certain links are preferably chosen since we do not directly work with the probabilities of links anymore! We now work with a probability distribution over strings, and since the links are probably not uniformly distributed over the source code (for example in the beginning there is a big header etc.) our selection has a certain bias.
When we found a link we use the method from the previous post to convert relative links to absolute ones, if necessary, and follow it.
This program works but is still relative basic, for example it can run in dead ends etc., also as previously noted, the link selection is not totally random. I post it here in this form because I think that for an application it wil be customized by the user anyway, and the applications differ heavily.

So have fun trying this out and leave me interesting link chains in the comments!

The code:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using System.Net;

namespace RandomSurfer
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        Browser B;

        private void Form1_Load(object sender, EventArgs e)
        {
            // start with some arbitrary page, here a random article on Wikipedia
            string StartPage = "https://de.wikipedia.org/wiki/Spezial:Zuf%C3%A4llige_Seite";
            B = new Browser(StartPage);
            webBrowser1.Navigate(StartPage);
        }

        private void button1_Click(object sender, EventArgs e)
        {
            webBrowser1.Navigate(B.GoNext());
        }
    }

    public class Browser
    {
        string CurrentPage; // stores url current website
        string CurrentContent; // stores content of current website

        public Browser(string Start)
        {
            CurrentPage = Start;
            CurrentContent = GetText(CurrentPage);
        }

        private string GetText(string url)
        {
            // use a webclient to download the source code of a website
            WebClient Webclient1 = new WebClient();
         
            Webclient1.Encoding = System.Text.Encoding.UTF8;
            string Content = ((Webclient1.DownloadString(url)));
            return (Content);
        }

        public string GoNext()
        {
            // randomly go to a new website
            CurrentPage = FindLink(); // for this find a random link
            CurrentContent = GetText(CurrentPage); // and for the next seach store the source code of the current webpage
            return CurrentPage;
        }

        private string FindLink()
        {
            // find a random link
            int Length = CurrentContent.Length;
            Random Rnd = new Random();
            int RndStart;
            int LinkStart = -1;
            int LinkEnd;
            string Link = "";

            // select a random starting point in the source code and find the first link after that
            // repeat if none find
            while (LinkStart == -1)
            {
                RndStart = Rnd.Next(Length);
                LinkStart = CurrentContent.IndexOf("a href=\"", RndStart);
            }

            // extract the link
            LinkEnd = CurrentContent.IndexOf("\"", LinkStart + 8);
            Link = CurrentContent.Substring(LinkStart + 8, LinkEnd - LinkStart - 8);

            // resolve its global url
            System.Uri Base = new System.Uri(CurrentPage);
            System.Uri ResolvedAbsoluteURL = new System.Uri(Base, Link);
            return ResolvedAbsoluteURL.ToString();
        }
    }
}

Monday, March 7, 2016

Resolve Relative URL with C#

Recently I wanted to browse a Webpage for links and then follow them with C#. During this of course also relative links were found and I found out that it is not that easy to in the Internet find a way to resolve these with C#. First I thought about writing my own function, but then I found a very easy method for this which I want to share here.
Let us begin with something about links in the Internet: In HTML one can refer to another page, thus create a link, as follows:

<a href="http://www.sudokusoftheday.blogspot.de/">Linktext</a>

In the above link I gave an absolute URL as target, which can be seen by http://www.
But I also can refer to pages relative to my current page, for example:

<a href="../../p/youtube-channel.html">Linktext</a>

This link calls the page http://csharp-tricks-en.blogspot.de/p/youtube-channel.html . Since this post is located in the virtual folder "/2016/03/", we browse two folders upwards towards the root URL via "../../", and from there call the page /p/youtube-channel.html.

When absolute links are encountered, this is no problem, we can simply follow them. But if we browse one page and want to follow a local link, this is a bit more complicated, also since all other known path expressions are allowed, like "../" seen above.

To not have to build these paths manually together one can use the class  System.Uri in the form System.Uri RelativeURL = new System.Uri(BaseUri, "relative Path");
As an example let us consider the Wikipedia article about the "Pronghorn". In this there is a relative link to the Wikipedia article about "Deer":

<a href="/wiki/Deer" title="Deer">deer</a>

To get the valid absolute link we execute the following code:
System.Uri Base = new System.Uri("https://en.wikipedia.org/wiki/Pronghorn");
System.Uri ResolvedAbsoluteURL = new System.Uri(Base, "/wiki/Deer");

Wednesday, September 30, 2015

Sudoku App for Android

Today some offtopic-theme: Next to my Sudoku page I now wrote an Android app, with which the Sudokus can be solved easily and conveniently on the smartphone. I would be happy, if some checked it out, or even got interested themselves in app programming. I wrote this app in Java, about the creation of Android apps with C# I wrote an introduction on this blog.
The app can be downloaded for free via Google Play Store.

Saturday, September 19, 2015

Matlab Tutorial Part 5 - Plots

In today's post I want to give a short introduction into the creation of plots with Matlab. It will only be a short summary, for more details I refer to the documentation of Matlab.  There all kinds of plots are listed with parameters, it is explained for example how to label axis, change colors etc.
Here I want to describe 2 kinds of 2 - dimensional plots, namely line and bar plots, as well as the 3 -dimensional surface plot.
We create a line plot with the function plot(). For this we first create some test data via x = rand(8, 1), a new vector with 8 random values between 0 and 1.
Then plot(x) gives us the following result:






















bar() creates a bar chart from the given data. bar(x) looks as follows:






















Now to a 3 - dimensional plot, the surface plot. This displays (for example) an inputted 2 -dimensional matrix in the third dimension (through height and color). On the point with the coordinates (i, j) thus the (i, j)th - entry of the matrix is displayed, meaning its value is represented by the z - coordinate and additionally colored corrrespondingly.
As an example via A = rand(32) we create a random 32 x 32 matrix. surf(A) then creates the following diagram:



Friday, September 18, 2015

Matlab Tutorial Part 4 - Scripts and Functions

In Matlab we have the possibility to create scripts and functions containing often used code, helping us to not having to type it frequently.
The scripts and functions are saved in files with the ending .m. Scripts are simply collections of lines of code, whereas functions have some return value.
We can edit these files conveniently out of Matlab: For that we edit the command edit Name.m and confirm. Then an editor opens with this file.
As an example we create the script MyScript.m with the following content:

A = rand(100);
d = det(A)

So a random 100 x 100 matrix A is created and the determinant d calculated. In Matlab we can call this script by entering the name MyScript. Then the script is executed and the variable d outputted. What is outputted we can control via the character ";". If we end one line with a semicolon, it is executed without output. If we omit it, the result of the operation is outputted. So if we remove the semicolon in the first line of the script, the complete matrix is displayed as well.

As a second example we now create a function which calculates and returns the average over the given values. For this we enter edit MyFunction.m.
As content we input:

function [y] = MyFunction(x)
y = sum(x);
y = y / numel(x);
end

As one can see a function declaration starts with the keyword function and ends with end. In square brackets we first list the return values - here only y is returned, multiple parameters are simply separated by comma. After the equal sign we write the name of the function (which should coincide with the file name) as well as the input parameters in round brackets - here only x.
In the function we then sum over x (thus we expect a vector) and then divide this by the number of elements in x. The result is then returned by the function in y. In Matlab we then can call the function and use the return value as follows: test = MyFunction([1,2,3,])
(Note: In Matlab of course the average value can be calculated easier - for example via the predefined function mean(). I only chose this example for explanation.)

Thursday, September 17, 2015

Matlab Tutorial Part 3 - Solve a System of Linear Equations

In today's part I want to show how to solve a system of linear equations. A system of linear equations has the form A * x = b, where A is the matrix representing the coefficients and b the right side of the equation system. We want to solve for x.
For this we simpy have to input A and x in Matlab and then call the command

x = linsolve(A, b)

One example:
We want to solve

x1   + x2     + x3   = 5
2x1             + 3x3 = 2
          10x2 - x3    = 1

Thus in Matlab we enter the matrix of coefficients (A = [1, 1, 1; 2, 0, 3; 0, 10, -1]) as well as the vector b (b = [5;2;1]).
After entering x = linsolve(A, b) Matlab returns the correct answer

x =

   15.6250
   -0.8750
   -9.7500