The Wall: 2012

středa 26. prosince 2012

Playing with Google Drive SDK and JavaScript

I am just starting to use the Google Drive SDK for one of my personal projects. The front-end of the application is written entirely in JavaScript. I am in the process of integrating Google Drive and it took me some time to get throw the API reference and get it to work, here are some useful code snippets. I have started with the JavaScript Quickstart available at the google sdk page and I have added couple useful methods:

The Google Drive API is a standard RESTful API. You can access the functionalities only by issuing HTTP requests, so you do not need any special SDK. However the requests have to be signed. OAuth protocol is used to secure the communication. Google provides a SDK for many languages, JavaScript being one of them. Using this SDK facilitates the creation of HTTP requests. The API provides a good compromise between the simplicity and flexibility.

The OAuth handshake

Every request has to be signed using OAuth token. The application has to first perform the OAuth handshake to obtain the token. JavaScript SDK provides gapi.auth.authorize function which can be used. This function takes the necessary parameters (OAuth client ID and the scope) and also the callback which will be executed when the handshake is over.

function checkAuth() {
 gapi.auth.authorize(
  {'client_id': CLIENT_ID, 'scope': SCOPES, 'immediate': true},
  handleAuthResult);
}

function handleAuthResult(authResult) {
 if (authResult && !authResult.error) {
  //Auth OK
 }
}

Once the client is authenticated, the SDK stores the token internally and adds it to any new request, created before the web page is closed.

Composing the Google Drive requests

Any simple request can be created with gapi.client.request function. The SDK will create a HTTP request using the supplied information. The method takes in JavaScript object. I have found that I am using mostly 4 fields in this object:

path – the url of the request
method – http method of the request (get/post/put/delete)
params – any information passed here will be added to the request as URL parameter.
headers – any information passed here will be added to the header of the HTTP request.
body – the body of the request. Usually posted JSON.

Getting first 10 items from the drive

function getItems() {
 var request = gapi.client.request({
  'path': 'drive/v2/files',
  'method': 'GET',
  'params': {
   'maxResults': '10'
  }
 });
 request.execute(listItems);
}

function listItems(resp) {
 var result = resp.items;
 var i = 0;
 for (i = 0; i < result.length; i++) {
  console.log(result[i].title);
 }
}

Creating a folder

function createFolder(folderName) {
 var request = gapi.client.request({
  'path': 'drive/v2/files',
  'method': 'POST',
  'body': {
   'title': folderName,
   'mimeType': 'application/vnd.google-apps.folder'
  }
 });
 request.execute(printout);
}

function printout(result) {
 console.log(result);
}

In this request, nothing is passed as parameter in the URL. Instead of that JSON object containing two fields (title and mimeType) is passed in the body of the request.

Searching for folders

function getAllFolders(folderName) {
 var request = gapi.client.request({
  'path': 'drive/v2/files',
  'method': 'GET',
  'params': {
   'maxResults': '20',
   'q':"mimeType = 'application/vnd.google-apps.folder' and title contains '" + folderName + "'"
  }
 });

 request.execute(listItems);
}

You can get more information about searching the google drive here.

úterý 23. října 2012

GIT stops after unpacking objects or resolving deltas

Clone, pull and push all seem to finish fine, but they always stop at "Resolving deltas" or "Unpacking objects". The root of this issue is the fact that MSGIT uses curl, which does not know how to talk to NTLM authenticated proxy. There are two solutions which worked for me, first one not being stable.

Solution 1: unstable, easiest

This *usually* works:

git clone -v https://myrepo.git
Cloning into 'myrepo-1'...
Password for 'https://user@myrepo.com':
POST git-upload-pack (174 bytes)
remote: Counting objects: 2287, done.
remote: Compressing objects: 100% (1276/1276), done.
remote: Total 2287 (delta 1340), reused 1692 (delta 883)
Receiving objects: 100% (2287/2287), 53.37 MiB | 632 KiB/s, done.
Resolving deltas: 100% (1340/1340), done.

Or another when, when pulling from a repo:

remote: Counting objects: 53, done.
remote: Compressing objects: 100% (28/28), done.
remote: Total 28 (delta 23), reused 2 (delta 0)
Unpacking objects: 100% (28/28), done.

The problem is that nothing is updated on the disk. I have tried using -v in order to obtain more information about what is going on, but I never get more than shown above. I am using Git over HTTPS with the HTTPS proxy well configured - so this should not be an issue.

After un-finished clone, I have noticed, that the objects directory inside the .git directory actually changed it's size quite a lot. So it seems, that the objects are well received but never written to the actual branch. That might sound like a user-rights issue. But I am admin on the machine and I have access write rights on the folder. So for now, I don't know the cause of this problem. I have simple solution to overcome this issue. What I do to fix the repository is quite straightforward:

$ git fsck
Checking object directories: 100% (256/256), done.
Checking objects: 100% (2187/2187), done.
dangling commit da575f887db63ccf97f37f9cf96316307398db82

Git fsck always founds one dangling commit. It is a utility which checks the integrity of the object database. Here the dangling commit is a commit which is not used by any branch. To fix the issue I can just merge the commit with the current branch.

$git merge da575f887db63ccf97f37f9cf96316307398db82

And everything works just fine...

Solution 2: stable solution

This needs sone more effort. The solution is to use the CNTLM as a proxy between the NTLM proxy and any application which needs to use it. The solution is exelently described on this blog.

Solution 3: not confirmed

I was not able to confirm this one, but at this thread on StackOverflow someone mentioned to use verify-pack function of git. Verify-pack verifies that .pack files, which are packed objects used by git during the transfer. If the files were transfered correctly you should be able to restore the commits contained inside the pack files. This actually did not work on my post, since the pack file was corrupted.

úterý 2. října 2012

Introduction to Fakes and migration from Moles

Fakes is a new test isolation framework from Microsoft. It is inspired by and resembles to Moles a framework which I have described in one of my previous blog posts. In this post I will briefly describe Fakes and than show the steps which have to be taken when migrating from Moles. You will find that the migration itself is not that complicated. Besides some changes in the structure of the project only few changes are needed in the code.

Code example related to this post are available at this GitHub repository.

Fakes framework contains two constructs which can be used to isolate code:

Stubs – should be used to implement interfaces and stub the behavior of public methods.
Shims – allow mocking the behavior of ANY method, including static and private methods inside .NET assemblies.

Stubs are generated classes. For each public method in the original interface a delegate is create which holds the action that should be executed while invoking the original method. In the case of Shims, such delegate is generated for all methods, even the private and static ones.

When using Stubs, you provide mocked implementation of any interface to the class or method which you are testing. This is done transparently before compilation and no changes are made to the code after the compilation. On the other hand Shims use the technic called IL weaving (injection of MSIL assembly at runtime). This way the code which should be executed is replaced at runtime by the code provided in the delegate.

The framework has caused some interesting discussions across the web. The pluralsight blog has found some negative points Rich Czyzewski describes how noninvasive tests can be create while using Fakes. And finally David Adsit nicely summarizes the benefits and the possible usage of the Fakes. From what has been said on the net, here is a simple summary of negative and positive points.

Pros

Lightweight framework, since all the power of this framework is based only on generated code and delegates.
Allows stubbing of any method (including private and static methods).
No complicated syntax. Just set the expected behavior to the delegate and you are ready to go.
Great tool when working with legacy code.

Cons

The test are based od generated code. That is to say, a phase of code generation is necessary to create the “stubs”.
No mocking is built-in into the framework. There is no built-in way to test whether a method has been called. This however can be achieved by manually adding specific code inside the stubbed method.

Migrating from Moles to Fakes

First a small warning, A bug was apparently introduced by the Code Contracts team, which causes a crash when building a solution which uses Fakes. You will need to install the latest version of Code Contracts. (If you do not know or use Code Contracts, you should not be impacted).

If you have already used Moles before, you might be wondering, how much code changes will the migration need. To give you a simple idea, I have migrated the code from my previous post about Moles in order to use Fakes. Two major steps have to be taken during the migration:

Change the structure of the project and generate new stubs
Rewrite the unit tests to use newly generated classes

To prepare the solution we have to remove all the references to Moles as well as the .moles files which were previously used during the code generation by the Moles framework. Next step is the generations of new stubs using Fakes framework. This is as simple as it has been before. Open the references window and right click on the DLL for which you want to generate the stubs. Than you should be able to select "Add Fakes Assembly” from the menu.

Following images show the difference between the old and new project structure (also note that I was using VS 2010 with Moles and I am now using VS 2012 with Fakes).

------------------>

The next step are the code changes.

Rewriting code using Shims

Here is a classical example of testing a method which depends on DataTime.Now value. The first snippet is isolated using Moles and the second contains the same test using Fakes:

[TestMethod]
[HostType("Moles")]
public void GetMessage()
{
  MDateTime.NowGet = () =>
  {
    return new DateTime(1, 1, 1);
  };
  string result = Utils.GetMessage();
  Assert.AreEqual(result, "Happy New Year!");
}

[TestMethod]
public void GetMessage()
{
 using (ShimsContext.Create())
 {
   System.Fakes.ShimDateTime.NowGet = () =>
   {
    return new DateTime(1, 1, 1);
   };
   string result = Utils.GetMessage();
   Assert.AreEqual(result, "Happy New Year!");
 }
}

The main differences:

Methods using Shims, do not need the HostType annotation previously needed by Moles.
On the other hand a ShimsContext has to be created and later disposed when the stubbing is not needed any more. The using directive provides a nice way to dispose the context right after its usage and marks the code block in which the system has “stubbed” behavior.
Only small changes are needed due to different names generated classes.

Rewriting code which is using only Stubs

Here the situation is even easier. Besides the changes in the naming of the generated classes, no additional changes are needed to migrate the solution. The following snippet test “MakeTransfer” method, which takes two accounts as parameters.

The service class containing the method needs Operations and Accounts repositories to be specified in the constructor. The behavior of these repositories is stubbed. This is might be typical business layer code of any CRUD application. First let’s see the example using Moles.

[TestMethod]
public void TestMakeTransfer()
{
 var operationsList = new List<Operation>();

 SIOperationRepository opRepository = new SIOperationRepository();
 opRepository.CreateOperationOperation = (x) =>
 {
  operationsList.Add(x);
 };

 SIAccountRepository acRepository = new SIAccountRepository();
 acRepository.UpdateAccountAccount = (x) =>
 {
  var acc1 = _accounts.SingleOrDefault(y => y.Id == x.Id);
  acc1.Operations = x.Operations;
 };

 AccountService service = new AccountService(acRepository, opRepository);
 service.MakeTransfer(_accounts[1], _accounts[0], 200);
 Assert.AreEqual(_accounts[1].Balance, 200);
 Assert.AreEqual(_accounts[0].Balance, 100);
 Assert.AreEqual(operationsList.Count, 2);
 Assert.AreEqual(_accounts[1].Operations.Count, 2);
 Assert.AreEqual(_accounts[0].Operations.Count, 3);
}

Note the way the repository methods are stubbed. Due to the fact that the stubs affect the globally defined variables (the list of operations, and the list of accounts) we can make assertions on these variables. This way we can achieve “mocking” and be sure that the CreateOperation method and the UpdateAccount method of Operation and Account repository have been executed. The operationsList variable in this example acts like a repository and we can easily Assert to see if the values have been changed in this list.

Let’s see the same example using Fakes:

[TestMethod]
public void TestMakeTransfer()
{
 var operationsList = new List<Operation>();

 StubIOperationRepository opRepository = new StubIOperationRepository();
 opRepository.CreateOperationOperation = (x) =>
 {
  operationsList.Add(x);
 };

 StubIAccountRepository acRepository = new StubIAccountRepository();
 acRepository.UpdateAccountAccount = (x) =>
 {
  var acc1 = _accounts.SingleOrDefault(y => y.Id == x.Id);
  acc1.Operations = x.Operations;
 };

 AccountService service = new AccountService(acRepository, opRepository);
 service.MakeTransfer(_accounts[1], _accounts[0], 200);
 //the asserts here....
}

You can see, that the code is almost identical. The only difference is in the prefix given to the stubs (SIAccountRepository becomes StubIAccountRepository). I am almost wondering if MS could not just keep the old names, than we would just need to change the using directive…

Fakes & Pex

One of the advantages of Moles compared to other isolation frameworks, was the fact that it was supported by Pex. When Pex explores the code, it enters deep into any isolation framework which is used. Since Moles is based purely on delegates, Pex is able to dive into the delegates and generated tests according the content inside the delegates. When using another isolation framework, Pex will try to enter the isolation framework itself, and thus will not be able to generate valid tests.

So now, when Fakes are here as replacement of Moles, the question is whether we will be able to use Pex with Fakes? Right now it is not possible. Pex add-on for Visual Studio 11 does not (yet) exists and I have no idea whether it will ever exists.

I guess Pex & Moles were not that adopted by the community. On the other hand both were good tools and found their users. Personally I would be glad if MS continued the investment into Pex and automated unit testing though I will not necessarily use it everyday in my professional projects. On the other hand I would always consider it as an option when starting new project.

neděle 2. září 2012

Reflection & Order of discovered properties

In .NET environement Reflection provides several methods to obtain information about any type from the type system. One of these methods is GetProperties method which retrieves a list of all the properties of a given type. This method returns an array of PropertyInfo objects.

PropertyInfo[] propListInfo = type.GetProperties();

In most cases you don't care, but the order of the properties does not have to be the same if you run this method several times. This is well described in the documentation of this method. Microsoft also states, that your code should not be depending on the order of the properties obtained.

I had a very nice example of a bug resulting from the misuse of this method. A ObjectComparer class, which is dedicated to comparison of two objects of same type by recursively comparing it's properties, which I have inherited as legacy code on my current Silverlight project.

I have noticed, that the results of the comparison are not the same every time I run the comparison. Concretely the first time the comparison was run on two same objects it always told me, that the objects are not equal. Take a look at the problematic code, which I have simplified a bit for this post:

private static bool CompareObjects(object initialObj, object currentObj, IList<String> filter)
{
 string returnMessage = string.Empty;

 Type type = initialObj.GetType();
 Type type2 = currentObj.GetType();

 PropertyInfo[] propListInfo = type.GetProperties(BindingFlags.GetProperty | BindingFlags.Public | BindingFlags.Instance).Where(x => !filter.Contains(x.Name)).ToArray();
 PropertyInfo[] propListInfo1 = type2.GetProperties(BindingFlags.GetProperty | BindingFlags.Public | BindingFlags.Instance).Where(x => !filter.Contains(x.Name)).ToArray();

 //if class type is native i.e. string, int, boolean, etc.
 if (type.IsSealed == true && type.IsGenericType == false)
 {
  if (!initialObj.Equals(currentObj))
  {
   return false;
  }
 }
 else //class type is object
 {
  //loop through each property of object
  for (int count = 0; count < propListInfo.Length; count++){
   var result = CompareValues(propListInfo[count].GetValue(initialObj),propListInfo2[count].GetValue(currentObj));
   if(result == false) {
    return result;
   }
  }
 }
 return true;
}

So in order to correct this code, you will have to order both arrays by MetadataToken, which is a unique identifier of each property.

propListInfo = propListInfo.OrderBy(x=>x.MetadataToken).ToArray();
propListInfo1 = propListInfo1.OrderBy(x=>x.MetadataToken).ToArray();

There is some more information about how reflection works in this blog post. The issue is that the Reflection engine holds a "cache" for each type, in which it stocks the already "discovered" properties. The problem is that, this cache is cleared during garbage collection. When we ask for the properties, than they are served from the cache in the order in which they have been discovered.

However in my case, this information does not help. The issue occures only the first time that I ask the ObjectComparator to compare the objects and there is no reason that there should be any garbage collection between the first and second run...well no idea here. Sorting by MetadataToken has fixed the issue for me.

středa 27. června 2012

Programming languages for the age of Cloud

This posts talks about the aspects which are influencing computer languages these days. We are in the age when the sequential execution is over. Even your laptop has a processor with several cores. The cloud provides us with tons of machines whic we can use to run our code on. We are in the age of distribution, parallelization, asynchronous programming and concurrency. As developers we have to deal with the chalenges which arise from this new environement. Computer language scientists have worked on the subject since the seventies. Nowadays concepts which have been studied for long time, influence the mainstream languages. This post describes how. The motivation for this post was this panel discussion at the last's year Lang.NEXT conference, where one of the greatest language architects of these days discuss what the ideal computer language should look like (Anders Hejlsberg - C#, Martin Odersky - Scala, Gilad Bracha - Newspeak, Java, Dart and Peter Alvaro).

Web and Cloud programming

"Web and cloud programming" was the title of the mentioned panel discussion. Cloud programming is quite noncommittal term. What do we mean by "cloud programming"? Is it programming on the cloud (with the IDE in the cloud)? or programming applications for the cloud (but that can be just any web app right)? It turns out this is just a term to depict the programming in distributed environment.

Programming in distributed environment

Programming in distributed environment is much better term - but again it might not be completely clear. The majority of todays applications is not sequential anymore. The code flow of the program is parallel, asynchronous and the program has to react to external events. The code and the application itself is distributed. It might be distributed over several cores, or nodes or it might be just server side - client side code separation. You can have code running on the backend, some another bits (maybe in different language) running on the front, some code is waiting for response from a web service and some other code is waiting for the response of the user on the client side. You as a developer have to handle the synchronization.

We might actually say that todays web programming is distributed and asynchronous. As developers, we have to make the switch from the traditional sequential programming to the distributed and asynchronous code. The advent of cloud computing is forcing this transition.

Non-sequential, parallel or asynchronous code is hard to write, hard to debug and even harder to maintain. Write asynchronous code is challenging, however write asynchronous application in a transparent maintainable manner might feel impossible. Just think about the global variables which you have to create to hold the information about ‘current situation’, so that when a response from a web service arrives you are able to decide and take the right actions. It is this maintance of the global state which is particulary dificult in asynchronous programming.

What are the tools which will help us with the transition to distruted asynchronous or parallel coding?

Here is a list of 3 tools which I think might be helpful:

Conteptual models - As developers we can follow some conceptual model - for instance the actors model in order to organize and architecture the program.
Libraries - To implement one of the models (or design patterns) we can use tested and well document code – for instance Akka
Computer languages - The biggest helper will be on the lowest level - the computer language itself.

Models, Libraries and languages

Libraries are and will be the principal tools to make developers life easier. There are several libraries available to help with the asynchronous, event-driven programming for many different languages. Akka, Node.JS or SignalR are just examples. But libraries themselves are build using languages. So the question is: What can languages bring to help make the life easier for developers in the age of cloud and distribution?

Modern languages have two characteristics:

They are build on powerful type systems: dynamic or static ones.
They borrow from different programming language paradigms.
Lately have been strongly influenced by functional programming.

Benefits of functional languages

Functional languages might be one of the helpers in the age of distributed computing. Some imperative languages are introducing functional aspects (for instance C# has been heading that way since long time), another ones designed from scratch are much more closer to "pure" functional style (Scala, F#, or the puriest - Haskel). Let's first define some terms and possible benefits of functional programming. From my point of view (and I admit quit simplified point of view) there are four aspects of functional programming that are useful in everyday coding.

Elimination of the "global state" – the result of method is guaranteed no matter what the actual state is.
The ability to treat functions as first class citizens.
Presence of immutable distributable structures - mainly collections.
Lazy evaluation – since there is no global state, we can postpone the execution and evaluation of methods till the time the results are needed.

Eliminating the state

It's hard to keep shared state in parallel systems written in imperative languages. Once we leave the save sequential execution of the language, we are never sure what are the values in the actual state. Callbacks might be executed about any time depending for example on network connection and the values in the main state could have changed a lot from the time of the "expected execution". Purely functional programming eliminates the outer "state of the system". The state has to be passed always locally. If we imagine such a language all the methods defined would need and additional parameter in the signature to pass the state.

int computeTheTaxes (List<income> incomes, StateOfTheWorld state);

That is really hard to imagine. So as it has been said in the discussion: pure functional programming is a lie. However we can keep this idea and apply it to some programming concerns. For instance the immutability of the collections might be seen as application of “no current state" paradigm.

Since the “current state” does not exists, the result of a function invoked with the same arguments should be always the same. This property is called “Referential transparency” and enables the lazy evaluation. So the elimination of the global state might be seen as the pre-condition for using other functional language features such as lazy evaluation

Function as a first class citizen

Another aspect of functional programming is the fact that functions become first class citizens. That means, that they can be passed as arguments to other functions (these are called higher order functions). This is extremely powerful concept and you can do a lot with it. Functions can be also composed and the functional compositions applied to values. So instead of applying a several consecutive functions on a collection of values, we can compose the resulting function and apply it at once. in C# the LINQ uses a form of functional composition, which will be discussed later.

Lambdas and Closures

Lambdas are anonymous functions - defined on the fly. Closure add the ability to capture the variables defined around the definition of the function. C# has closures and lambdas since the version 3.0, they should finally arrive to Java in the version 8. Talking about JavaScript, it just seems that they have always been there. Anytime you define an anonymous function directly in the code you know you can use the variables from the current scope in your function. Hence you are creating a closure.

var a = 10;
var input = 5
request(input, function(x) { 
   a = x;
});

Any time you use the variable from outer scope, in the inner anonymous function, we say that the variable is captured

Closures are also available in Python and since the version 11 they are even available in C++. Let's stop for a while here, because C++ adds the ability to distinguish between variables captured by reference and variables captured by value. The syntax for lambdas in C++ is a bit more complicated, but allows the developers for each variable to specify how it should be captured. In this folowing code the v1 variable is captured by value and all the variables are captured as references. So the value of v2 will depend on what happened before the lambda actually executed.

int v1 = 10;
inv v2 = 5;
for_each( v.begin(), v.end(), [=v1,&] (int val)
{
    cout << val + v2 - v1;
});

You can see, that even such and old school imperative language like C++ has been influenced and modified to embrass functional programming.

Closures add the ability to use the current state from the moment of the definition of the anonymous function, as the current state used during the functional execution.

Collections in functional programming

In functional programming languages (the pure ones), collections are immutable. Instead of modification a copy of the collection is returned on each operation which is performed on the collection. It is up to the designers of the language to force the compiler to reuse the maximum of the existing collection in order to lower the memory consumption. This allows the programmer to write the computation as a series of transformations overt the collections. Moreover thanks to lambdas and closures, these transformations may be defined on the fly. Here is a short example:

cars.Where(x=>x.Mark == ‘Citroen’).Select(x=>x.MaxSpeed);

This transformation will return an iterator of speeds of all Citroens in the collection. Here I am using C#/F# syntax, however almost the same code would compile in Scala.

The selector (“Where”) and the mapper (“Select”) both take as argument a function which takes an item of the collection. In the case of the selector the function is a predicate which returns “true” or “false” in case of the mapper, the function just returns a new object. Thanks to lambdas we can define both of them on the fly.

Language integrated data queries

Lazy loading also comes from functional languages. Since the result of the function does not depend on the "state of the world" it does not matter when we execute any given statement or computation. The designers of C# inspired themselves while creating LINQ. LINQ just enables the translation of the above presented chain of transformations to another domain specific language. Since the lazy loading is used, each computation is not performed separately, but rather a form of “functional composition” is used and the result is computed once it is needed. If the ‘cars’ collection would an abstraction for relational database table, the result would be translated into “select maxSpeed from cars where mark=’Citroen’. Instead of two queries (on for each function call).

Inside LINQ translates the C# query (the dotted pipeline of methods) into an expression tree. The tree is then analyzed (parsed) into domain specific language (such as SQL). Expression trees are a way to represent code (computation) as data in C#. So in order to develop and integrate the LINQ magic into the language, the language needs to support function as first class citizen and also has to be able to treat code as data.

Maybe as I wrote it, you are thinking about JavaScript and you are right. In JavaScript you can pass around functions and you can also pass around code and later execute it (the eval function). No wonder that there are several implementations of LINQ for JavaScript.

Similar concept inspired some Scala developers and since Scala posseses the necessary language features, we might see similar concepts in Scala also (ScalaQL).

Dynamic or Typed languages

What are the benefits of Dynamic or Strongly typed language? Let's look for the answer in everyday coding: Coding in dynamic language is at least at the beginning much faster than in a typed language. There is no need to declare the structure of the object before using it. No need to declare the type of the simple values nor objects. The type is just determined on the first assignment.

What works great for small programs might get problematic for larger ones. The biggest advantage of typed system in the language is the fact, that it is safer. It won't let you assign apples to oranges. It eliminates much of the errors such as looking for non-existing method of type.

The other advantage is the tooling which comes with the language. Auto-completion (code completion based on the knowledge of the type system) being example of one such a tool. The editor is able to propose you the correct types, methods, or properties. The types structure is used for analysis or later processing. For instance documentation might be easily generated from the type system just by adding certain metadata.

Several languages offer compromises between the strongly typed (safe) and dynamic (flexible) world. In C# we can use the dynamic keyword to postpone the determination of the object type to runtime. DART offers optional type system. Optional type systems let us use the tooling, without polluting our lifes with too much typing exercises. This comes handy sometimes.

JavaScript as the omnipresent language

JavaScript is everywhere and lately (with Node.JS and MS investing heavily into it) drawing more and more attention. The language has some nice features: it treats the functions as first class citizens, supports closures, it is dynamic, but one big drawback: It absolutely lacks any structure. Well it's not typed language, so we cannot expect any structure in the type system, but it also lacks any modularization.

Objects are defined as functions or 'just on the fly'. And there is always this giant current state which holds all the variables and everything, which get's propagated everywhere. I still did not learn how to write good structured JS programs. And there are to many concepts of JavaScript which I did not understand completely. As it has been said in the discussions: you can probably write big programs in JavaScript, but you cannot maintain them. That's why Google is working on DART. The future version of ECMAScript will try to solve the problems of JavaScript by bringing modular systems,classes, static typing. But the big questions will be of course the adoption by the browsers.

Summary

Libraries will be always the core pieces to enable writing distributed software
Language should be designed in a way to minimize the state and control 'purity' – functional languages are well studied and concepts coming from functional languages will become omnipresent in everyday programming.
Type systems should be there one we need them and should get out of our way when we don’t.

The future might be interesting. Lately I have been forced to write a lot of Java code and interact with some legacy code(<1.5) and besides the typing exercise it does not provide me any benefits. It just bores me. Well I am a fun of C#, because the authors of C# seem to search interesting features from other languages or concepts (closures, expression trees, dynamic typing, or later incorporating the asynchronous model directly to the language) and introduce them to the well established static typed, compiled (and for me well known) world.

But whether it is Scala, Python, Dart, JavaScript or C#/F# - I thing we should be trying to adopt modern languages as fast as possible and that for just one reason: to express more with less code.

středa 2. května 2012

Mocking the generic repository

This post describe one way to mock the generic repository. It assumes that you are familiar with the Service <-> Repository <-> Database architecture.
Another pre-requisity is the knowledge of the repository pattern and it's generic variant.

In the majority of my projects I am using the following generic repository class.

public interface IRepository
{
 T Load<T>(object id);
 T Get<T>(object id);
 IEnumerable<T> Find<T>(Expression<Func<T, bool>> matchingCriteria);
 IEnumerable<T> GetAll<T>();
 void Save<T>(T obj);
 void Update<T>(T obj);
 void Delete<T>(T obj);
 void Flush();
 int CountAll<T>();
 void Evict<T>(T obj);
 void Refresh<T>(T obj);
 void Clear();
 void SaveOrUpdate<T>(T obj);
}

Based on this technique, some people decide to implement concrete classes of this interface (CarRepository : IRepository), whereas others decide to keep using the generic implementation. That depends on the ORM that you are using. With EF and NHibernate you can easily implement the generic variant of the repository (check the links).

I am also using the generic variant (mostly with NHibernate). Now the question is: How to mock this generic repository? It can be a bit tricky to mock. When you have one class for each repository which works for one concrete type you can mock the repository quite easily. For example StudentRepository which handles entities of type Student might be backed up a list of students.

While when working with generic repository, it might be a bit harder. Here is how I have solved the problem:

public class MockedRepository :IRepository
{
 public MockedRepository()
 {
  cities = DeserializeList<City>("CityDto");
  stations = DeserializeList<Station>("StationDto");
  tips = DeserializeList<InformationTip>("InformationTipDto");
  countries = DeserializeList<Country>("CountryDto");
  
  dataDictionary = new Dictionary<Type, object>();
  dataDictionary.Add(typeof(City), cities);
  dataDictionary.Add(typeof(Station), stations);
  dataDictionary.Add(typeof(InformationTip), tips);
  dataDictionary.Add(typeof(Country), countries);
  }   

 public T Get<T>(object id)
 {
  Type type = typeof(T);
  var data = dataDictionary[type];
  IEnumerable<T> list = (IEnumerable<T>)data;
  var idProperty = type.GetProperty("Id");
  return list.FirstOrDefault(x=>(int)idProperty.GetValue(x,null) == (int)id);
 }

 public IEnumerable<T> Find<T>(Expression<Func<T, bool>> matchingCriteria)
 {
  Type type = typeof(T);
  var data = dataDictionary[type];
  IEnumerable<T> list = (IEnumerable<T>)data;
  var matchFunction = matchingCriteria.Compile();
  return list.Where(matchFunction);
 }

 public IEnumerable<T> GetAll<T>()
 {
  Type type = typeof(T);
  return (IEnumerable<T>)dataDictionary[type];
 }

 public void Save<T>(T obj)
 {
  Type type = typeof(T);
  List<T> data = (List<T>)dataDictionary[type];
  data.Add(obj);
 }
}

The main building block of this mocked repository is the dictionary which contains for each type in the repository the enumerable collection of objects. Each method in the mocked repository can use this dictionary to determine which is the collection addressed by the call (by using the generic type T.).

Type type = typeof(T);
var data = dataDictionary[type];
IEnumerable<T> list = (IEnumerable<T>)data;

Now what to do next, depends on each method. I have shown here only the methods which I needed to mock, but the other ones should not be harded to mock. The most interesting is the Find method, which takes as the parameter the matching criteria. In order to pass this criteria to the Where method on the collection, this criteria (represented by an Expression) has to be compiled into a predicate Func (in other words function which takes an object of type T and returns boolean value.

The Get also has some hidden complexity. In this implementation I assume, that there is a Id property defined on the object of type T. I am using reflection to obtain the value of that property and the whole thing happens inside the a LINQ statement.

This repository might be useful, but it is definitely not the only way to isolate your database. So the question is - Should this be the method to isolate my Unit or Integration tests? Let's take a look at other possible options:

Use mocking framework (there is quite a choice here)
This essentialy means that in each of your tests you define the behaviour of the repository class. This requires you to write a mock for each repository method that is called inside the service method. So it means more code to write. On the other hand you controll the behaviour needed for the particular tested method. While using mocking framework you have also the option to verify that methods have been caled.
Use the repository implementation and point it to in-memmory database (SQL Lite). That is a good option in the case when:
- You are able to populate the database with the data.
- You are sure of your repository implementation
Use the generic repository mock presented here. That is not a bad option if you have some way to populate the collections which serve as in-memmory database. I have used deserialization from JSON. Another option could be to use a framework such as AutoPoco to generate the data. You can also create one repository which can be used for the whole test suite (or application presentation).

Summary

As said before this might be a variant to consider. I am using it for Proof of Concepts and portable versions of database based applications. On the other hand for unit test you might consider either mocking framework or in-memory database. There is no clear winner in this comparison.

sobota 21. dubna 2012

Common.Logging and compatibility with other libraries

It has been the second time since I have run into the issue of configuring correctly Common.Logging on my project. So what is the problem? Let's start with the basics:

Common.Logging should be a generic interface for logging which can be used by other frameworks and libraries to perform logging. The final user (you or me) uses several frameworks in his final application and if all of these frameworks will use different logging framework it will turn into configuration nightmare.So our favorite frameworks such as Spring.NET, Quartz.NET are using Common.Logging. This interface in turn uses a concrete logging framework to perform the logging (the act of writing the log lines to somewhere).

Typical scenario can be for instance Common.Logging and Log4Net combination. In our application configuration file (web.config or app.config) we have to configure Common.Logging to use the Log4Net and than we can continue with the Log4Net configuration specifying what should be logged.

<common>
<logging>
  <factoryAdapter type="Common.Logging.Log4Net.Log4NetLoggerFactoryAdapter, Common.Logging.Log4Net">
 <arg key="configType" value="INLINE" />
  </factoryAdapter>
</logging>
</common>

<log4net>
<appender name="ConsoleAppender" type="log4net.Appender.ConsoleAppender">
  <layout type="log4net.Layout.PatternLayout">
 <conversionPattern value="%date %-5level %logger - %message%newline"/>
  </layout>
</appender>
</log4net>

My general problem is that Common.Loggin.Log4Net facade is looking for a concrete version of the Log4Net library. Concretely the version: 'log4net (= 1.2.10)'. That is not a problem if you are not using some other framework which depends on higher version of Log4Net.
In my case the le_log4net library (the logentries library) is using log4net 2.0. So if you are using NuGet, you might obtain the following exception while adding the references:

The similar thing might happen if you just decide to use the latest Log4Net by default. Then you might get an exception when initializing Spring.NET context or starting the Quartz.NET scheduler:

Could not load file or assembly 'log4net, Version=1.2.0.30714, Culture=neutral, PublicKeyToken=b32731d11ce58905' or one of its dependencies. The located assembly's manifest definition does not match the assembly reference. (Exception from HRESULT: 0x80131040)

Solution 1: Ignore NuGet, define Runtime Binding

One way to get around this is to define runtime assembly binding. But this solution forces you to add the reference to log4net manually. NuGet controls the version and wont let you at references on the fly the way that you would. So to get over add the latest Common.logging.Log4net façade and Log4Net version 2 (which you need for some reason). Than you have to define the assembly binding in the configuration file.

<runtime>
<assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
  <dependentAssembly>
 <assemblyIdentity name="Common.Logging" publicKeyToken="af08829b84f0328e"/>
 <bindingRedirect oldVersion="1.2.0.0" newVersion="2.0.0.0"/>
  </dependentAssembly>
</assemblyBinding>
</runtime>

Solution 2: Just use the older version of Log4Net (1.2.10)

If you do not have libraries that are dependent on Log4Net version 2.0.0 than just remember to always use log4net 1.2.10. This is the version which Common.Logging.Log4Net is looking for. Or just let NuGet manage it for you. You can add Common.Logging.Log4Net via NuGet and it will automatically load the correct version of Log4Net.

Solution 3: Try other logging library for instance NLog

This actually is not a real solution. I have experienced similar issues while using NLog, concretely try to use the latest NLog library with the Common.Logging.Nlog façade and you will obtain something similar to:

{"Could not load file or assembly 'NLog, Version=1.0.0.505, Culture=neutral, PublicKeyToken=5120e14c03d0593c' or one of its dependencies. The located assembly's manifest definition does not match the assembly reference. (Exception from HRESULT: 0x80131040)":"NLog, Version=1.0.0.505, Culture=neutral, PublicKeyToken=5120e14c03d0593c"}

The solution here is similar, you will have to define Runtime Binding:

<runtime>
<assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
  <dependentAssembly>
 <assemblyIdentity name="NLog" publicKeyToken="5120e14c03d0593c" culture="neutral" />
 <bindingRedirect oldVersion="0.0.0.0-2.0.0.0" newVersion="2.0.0.0" />
  </dependentAssembly>
</assemblyBinding>
</runtime>

What was interesting here, is that NuGet actually took care of this for me. I have just added the Common.Logging.NLog façade and I guess NuGet spotted that I have already NLog 2 and that this Runtime Binding is necessary. If you look at the documentation of bindingRedirect you will see, that we have the right to specify the range of versions in the oldVersion attribute. Here all the version will be bound to the 2.0.0.0 version.

Summary

Anyway NLog and Log4Net are both cool logging frameworks, just use the one you prefer. As I have showed above it is possible to use them together with Common.Logging it just takes a few more lines to configure it correctly.

úterý 3. dubna 2012

Mock Java Calendar - JMockit vs Mockito

To get the current time or day in Java, one should be using the Calendar class in the following way:

Calendar c = Calendar.getInstance();
int day = c.get(Calendar.DAY_OF_WEEK);

Now I imagine this code can be hidden somewhere inside a business method and the behaviour of that method would be dependent on the current day. Typical example can be the method which returns the schedule of the cinema on the current day.

public class ScheduleService{
  public Schedule getTodaySchedule(){
    Calendar c = Calendar.getInstance();
    int day = c.get(Calendar.DAY_OF_WEEK);

    //get it from DB or wherever you want
    Schedule s = lookupAccordingToDay(day);
  }
}

In order to test this method you have to mock the Calendar. You will have to verify, that for monday the service will return the schedule for monday. However since the test will be automatically called every day, not only monday, you will obtain whole different schedules and the assert will fail. There are several solutions to this. I have 3 in my mind.

Solution 1: create a separate service

First solution has nothing to do with mocking. The way to go here is to isolate the Calendar into a separate service (let's call it CurrentDayService). Than you can manually create a mock for this service. You will also have to change the body of your ScheduleService to use this CurrentDayService.

public interface ICurrentDayService {
   int getCurrentDay();
}

public class CurrentDayService {
   public int getCurrentDay(){
      Calendar c = Calendar.getInstance();
      return c.get(Calendar.DAY_OF_WEEK);
   }
}

public class CurrentDayServiceMock {
   private int dayToReturn;
   public CurrentDayServiceMock(int dayToReturn){
     this.dayToReturn = dayToReturn;
   }
   public int getCurrentDay(){
      return dayToReturn;
   }
}

public class ScheduleService {
  //@Autowire or inject this service
  private CurrentDayService dayService;
  
  public Schedule getTodaySchedule(){
    int day = dayService.getCurrentDay();
    //get it from DB or wherever you want
    Schedule s = lookupAccordingToDay(day);
  }
}

Now in the unit test your schedule service, can use the mock instead of the real implementation. If you are using Dependency Injection than you can define a different context for unit tests. If not, you will have to do it manually.

Solution 2: use Mockito

Mockito allows you to mock the real Calendar class. That means that you no longer need to wrap the Calendar by some CurrentDayService class just to be able to mock the behavior. However you will still have to add a mechanism to pass the mocked Calendar to your service. That is not that complicated. Have a look at the following definition o the ScheduleService and the unit test which comes with it.

public class ScheduleService{
  private Calendar calendar;
  public ScheduleService(){
    calendar = Calendar.getInstance();
  }
  public Schedule getTodaySchedule(){
    int day = calendar.get(Calendar.DAY_OF_WEEK);
    Schedule s = lookupAccordingToDay(day);
  }

  public setCalendar(Calendar c){
    calendar = c;
  }
}

@Test
public void testGetTodaySchedule() {
 Calendar c = Mockito.mock(Calendar.class);
 Mockito.when(c.get(Calendar.DAY_OF_WEEK)).thenReturn(2);

 ScheduleService sService = new SomeStrangeService();
 //there has to be a way to set the current calendar
 sService.setCalendar(c);
 Schedule schedule = sService.getTodaySchedule();
 //Assert your schedule values
}

To sum it up: if the setCalendar method is not called, than the Calendar is instantiated in the constructor. So in production, it will return the current day. In your unit test, you can easily mock it, to specify different behavior. Tha drawback: if someone accidentaly calls the setCalendar method in the production, you will get into truble.

Solution 3: use JMockit, mock all the calendars in you JVM

JMockit is strong framework which as some other mocking frameworks is using the Java Instrumentation API. The code that you want to execute in your mocks is injected as bytecode at runtime. This enables JMockit to, for instance mock all the instances of Calendar class in your JVM. Here is how you can achieve this:

@MockClass(realClass = Calendar.class)
public static class CallendarMock {

 private int hour;
 private int day;
 private int minute;

 public CallendarMock(int day, int hour, int minute) {
  this.hour = hour;
  this.day = day;
  this.minute = minute;
 }

 @Mock
 public int get(int id) {
  if (id == Calendar.HOUR_OF_DAY) {
   return hour;
  }
  if (id == Calendar.DAY_OF_WEEK) {
   return day;
  }

  if (id == Calendar.MINUTE) {
   return minute;
  }
  return -1;
 }
}

The previous code snippet is the infrastructure which I can use to mock the Calendar's get method. A utility class CalendarMock has to be created, which specifies the methods which are mocked. The realClass attribute in the MockClass annotation specifies which class is mocked by the defined class. So now the unit test is simplified. There is not need to specify the Calendar which should be used by the ScheduleService.

@Test
public void testGetTodaySchedule() {
 Mockit.setUpMocks(new CallendarMock(Calendar.MONDAY, 12, 20));
 ScheduleService sService = new SomeStrangeService();
 Schedule schedule = sService.getTodaySchedule();
 //Assert your schedule values
}
@After
public void destroyMock() {
    Mockit.tearDownMocks();
}

At the end, you have to remember to switch-off the mocking of the Calendar. If not the Calendar will be mocked in all the tests executed after this one. Hence the call to the tearDownMocks() method.

Summary

With Mockito you can mock the real Calendar. However you have to pass the instance of the mocked callendar to the class, which actually uses it. With JMockit you are able to tell to the JVM: "from now all my mocks behave like this...". For me this simplifies the situation, while I am not forced to create a setter for a Calendar to be passed to my service class. But it would take much more time and effort to compare the two frameworks. It might be that Mockito handles some situations better than JMockit.

sobota 17. března 2012

KnockoutJS and Google Maps binding

This post describes the integration between Google Maps and KnockoutJS. Concretely you can learn how to make the maps marker part of the View and automatically change it's position any time when the ViewModel behind changes. The ViewModel obviously has to contain the latitude and longitude positions of the point that you wish to visualize on the map.

Previously I have worked a bit with Silverlight/WPF which in general leaves one mark on a person: the preference for declarative definition of the UI leveraging the rich possibilities of data binding provided by the previously mentioned platforms. In this moment I have a small free-time project where I am visualizing a collection of points on a map. This post describes how to make the marker automatically change it's position after the model values behind changes. Just like in this picture bellow, where the position changes when user changes the values of latitude and longitude in the input boxes.

Since I like Model-View-ViewModel pattern I was looking for a framework to use this pattern in JS, obviously KnockoutJS saved me. The application that I am working on, has to visualize several markers on Google Maps. As far as I know there is no way to define the markers declaratively. You have to use JS:

marker = new google.maps.Marker({
  map:map,
  draggable:true,
  animation: google.maps.Animation.DROP,
  position: parliament
});
google.maps.event.addListener(marker, 'click', toggleBounce);

So let's say you have a ViewModel which holds a collection of interesting points, that will be visualized on the map. You have to iterate over this collection to show all of them on the map. One possible way around would be to use the subscribe method of KO. You could subscribe for example to the latitude of the point (assuming that the latitude would be an observable) and on any change perform the JS code. There is a better way.

Defining custom binding for Google Maps.

The way to go here is to define a custom binding, which will take care of the update of the point on the map, any time, that one of the observable properties (in basic scenario: latitude and longitude) would change.

ko.bindingHandlers.map = {
init: function (element, valueAccessor, allBindingsAccessor, viewModel) {
  var position = new google.maps.LatLng(allBindingsAccessor().latitude(), allBindingsAccessor().longitude());
  var marker = new google.maps.Marker({
    map: allBindingsAccessor().map,
    position: position,
    icon: 'Icons/star.png',
    title: name
  });
  viewModel._mapMarker = marker;
},
update: function (element, valueAccessor, allBindingsAccessor, viewModel) {
  var latlng = new google.maps.LatLng(allBindingsAccessor().latitude(), allBindingsAccessor().longitude());
  viewModel._mapMarker.setPosition(latlng);
 }
}

;<div data-bind="latitude: viewModel.Lat, longitude:viewModel.Lng, map:map" ></div>

So let's describe what is going on here. We have defined a map binding. This binding is used on a div element. Actually the type of the element is not important. There are also latitude and longitude bindings, which are not defined. That is because the map binding takes care of everything. The binding has two functions: init and update, first one called only once, the second one called every time the observable value changes.

The allBindingsAccessor parameter contains a collection of all sibling bindings passed to the element in the data-bind attribute. the valueAccessor, holds just the concrete binding (in this case the map value, because we are in definition of the map binding). So from the allBindingsAccessor we can easily obtain the values that we need:

allBindingsAccessor().latitude()
allBindingsAccessor().longitude()
allBindingsAccessor().map()

Notice that the map is passed to the binding in parameter (that is concretely the google.maps.Map object, not the DOM element. Once we have these values, there is nothing easier than to add the marker to the map.

And there is one important thing to do at the end – save the marker, somewhere so we can update it’s position later. Here again KO comes with rescue, because we can use the viewModel parameter passed to the binding and we can attach the marker to the ViewModel. Here I suppose that there is no existing variable with name _mapMarker in the viewModel and JS can happily add the variable to the ViewModel.

viewModel._mapMarker = marker;

The update method has an easy job, because the marker has been stored, and we only need to update it's position.

viewModel._mapMarker.setPosition(latlng);

Almost full example

Just check it here on JsFiddle.

Possible improvements

One thing that I do not like about this, is the fact, that you have to pass the map as an argument to the binding and the div element has to be outside of the map. Coming from Silverlight/WPF you would like to do something like this:

<div id=”map_canvas”>
<div data-bind=”latitude: vm.Lat, longitude:vm.Lng”>Whatever I want to show on the map marker</div>
</div>

That is actually the beauty of declarative UI definition. You can save a lot of code only by composing the elements in the correct order. However this is not possible – at least I was not able to get it to work. I was close however:

init: function (element, valueAccessor, allBindingsAccessor, viewModel) {
var map = element.parentNode;
var googleMap = new google.maps.Map(.map.);
//add the pointer
var contentToAddToMarker = element;
}

Again thanks to KO, here the element variable represents the DOM element to which the binding is attached. If the div element is inside the map, than we can get the parent element (which is the div for the map) and we are able to create a new map on this element. The problem which I had is the once, the new map was created, the div elements nested inside the map disappeared. Even if that would work, some mechanism would have to be introduced, in order to create the map only the first time (in case there are more markers to show on the map) and store it somewhere (probably as global JS variable).

On the other hand, thanks to the element you can get all the div which should be for example given as the description to the marker.

Summary: KnouckoutJS is great. It lets me get rid of the bordelic JS code.

neděle 11. března 2012

Collective Intelligence: Ants colony solving TSP

According to wikipedia:

“Collective intelligence is a shared or group intelligence that emerges from the collaboration and competition of many individuals and appears in consensus decision making in bacteria, animals, humans and computer networks”.

This article describes, how to make the ants, find the solution for TSP problem. Implemented in Python. Get the source from GitHUb.

The algorithms based on the collective intelligence have some “interesting” properties:

decentralization
parallelism
flexibility, adaptability
"robustness" (failures)
auto-organization

These algorithms are inspired by the nature. Here are some examples of collective intelligence which can be observed in the nature:

The swallows settle on wires before they are taking of for the next destination. There is no leader in the group. The decision whether to take of is taken collectively. The probability for the swallow to take of is getting higher when there are more swallows in the air. If the other swallows do not join, the swallow will again settle down on the wire. At one point the number of the swallows in the air reaches the “break-point” when all the swallows decide to take of.
The bees perform a special “dance” to show their peers where the foot-source is. This dance gives the information about the angle of the food source position with respect to the sun. All the bees do perform the dance when coming back in, which makes the algorithm adaptive.
When the ant founds food, he lays down a "positive pheromone" on his way back. This pheromone evaporates during the time. The other ants sniff for this pheromone when choosing their route and prefer to go in places where the concentration of the pheromone is higher. The shorter the path to the food source is, more pheromone stays on the track before it evaporates. The more pheromone there is, more ants take this path. When there is a obstacle in the route – the algorithm adapts easily to knew situation. Again the shortest route to evict the obstacle is chosen in the shortest time. Some details here.

There exist several applications of collective intelligence in engineering and optimization. The ants example has applications specially in routing. One of the basic problems which can be solved using an Ant colony is Travelling Salesman Problem.

Travelling Salesman Problem

Travelling Salesman Problem is a classical problem in the field of graph theory. Given n cities the salesman has to visit all of the nodes, come back to his starting location and and minimize traveled distance. Although the problem is NP - hard, several heuristic algorithms exists which obtain suitable results in polynomial time.

Ant colony algorithm for TSP

Ant colony algorithm was introduced in year 1997 by Italian researcher Marco Dorigo.

On the beginning the ants start to explore the graph. They choose their nodes randomly, until they visit all of the nodes. A this point the ant starts to copy his path back to his starting point. While he copies the path, on each edge he lays down certain amount of pheromone inversely proportional to the length of the tour. When each ant starts new route in each node he will compute the probability to choose an edge to continue his route. The probability of choosing edge e in each step is computed as follows.

π_e corresponds to the value of pheromone on the edge e.
η_e corresponds to the quality of the route. We can estimate this value by the length of the edge η_e = 1/d_e where d_e is the length of the edge.
J_e is a set of all the edges which the ant can use for his next step - includes all the edges except the one for which we compute the probability.
α and β are coefficients used to manage the impact of the length of the finished route to affect the decision other ants.
The amount of pheromone given to a certain edge is l = 1/routeLength^k, where k is a coefficient to amplify the impact of the length of the route to the decision.
During the time the pheromone evaporates on the edges. The evaporation can be expressed as: π_e = (1-ρ)π_e

The implementation in Python

Most of what I have learned and presented here was done during Collective Intelligence intensive course at Telecom ParisTech done by Jean-Luis Dessales, being part of the Athens program. The implementation was done in Python but as a module to a EvoLife program, which is a custom tool developed by Jean-Luis Dessales for scientific observations on genetic algorithms and collective intelligence algorithms. I have decided to make a stand-alone python program by taking the important bits out just for the Ants colony algorithm. I do almost never work in python, so for anyone out there if you see any big wrongdoing against the python culture, naming conventions etc, let me know.

The most important bits are in the Ant class.

self.Action = 'SearchNextNode'
self.Node
self.Path = []
self.PathLength = 0
self.ToVisit = []
self.Visited = []

The Action field remembers the Ant’s actual state. Each action has a corresponding method associated with it. The Node field holds the actual field. In ToVisit and Visited the ant stores the Nodes that he had already visited or that he needs to visit in order to achieve TSP completeness. Here is the “move” method which is called repetitively for each ant:

def moves(self):
#here is the ants move - one of the following actions is always selected
if self.Action == 'GoToNode':
self.goToNode()
if self.Action == 'SearchNextNode':
self.searchNextNode()
if self.Action == 'ReturnToStart':
self.returnToStart()
if self.Action == "GoToFirst":
self.goToFirst()

The most important of these method is searchNextNode, where the ant searches for his next edges to explore. In this method the behavior described in previous paragraph has to be defined.

def searchNextNode(self):
nodeindex = self.Node.Id        
#the maximal probability
pMax = 0
p = 0

#Try to select the best node by the pheromones in the direction
#have to iterate over all nodes
for i in range(NbNodes):
if i!=self.Node.Id and self.Visited[i]==0:
d = Land.Distances[self.Node.Id][i]

#get the value of pheromon on the edge
pi = Land.Edges[self.Node.Id][i]

#To prevent division by zero and get some info
#when d = 0 there would be problem in computation of d
if d==0:
print i
print self.Node.Id

#the quality of the route
nij = 1/d

pselected = math.pow(pi,alfa) * math.pow(nij,beta)

#normalization
#compute the sum of other options
sum = 0
for j in range(NbNodes):
if j!=self.Node.Id and self.Visited[j]==0 and j!=i:
dj = Land.Distances[self.Node.Id][j]
pij = Land.Edges[self.Node.Id][j]
nj = 1/dj
pj = math.pow(pij,alfa) * math.pow(nj,beta)
sum+=pj
if sum>0:
p = pselected/sum

#if we have a new best path - then remember the index
if p > pMax:
pMax = p
nodeindex = i

We have to iterate over all the neighbor nodes in order to choose the right one. For each node the probability of taking the edge going to this node is computed according to the formula given in previous paragraph. Some remarks: the value of the pheromone on each edge is stored in a global array: Land.Edge[nodeFrom][nodeTo]. Also the distances between all nodes are pre-calculated in Land.Distance[nodeFrom][nodeTo].

There is quite a lot of code, regarding the visualisation. The TkInter library was used for drawing. Also the PIL library has to be installed. It should not take too long the figure out the responsibility of each class.

The results

Here is how the resulting program looks like:

And here is the evolution of creating the final decision. First all edges have some amount of pheromone and during the time, the preferred edges are chosen. Because the ants choose the edges randomly on the beginning, the result is never assumed the same. The following three images show the evolution which resulted in quite not optimal solution.

The quality of the solution depends heavily on the values of the coefficients. These values can be changed in the Program.py file:

#level of pheromone to show
PToShow = 0.004
#factor which lowers the value given to a path on function of the paths length
k=1
#evaporation factor
theta = 0.07
#parameter which amplifies the value of the pheromon on the edge (pi^alfa)
alfa = 4
#parameter which amplifies the impact of the quality of the route  ni^beta; ni=1/de
beta = 2.5

Putting aside the coefficients described above, there is also PToShow value, which determines what is the minimal value of pheromone on the edge, in order for the edge to be dotted in the picture.

Before this implementation, I had one before – which was not at all efficient but quite funny. In this implementation the ants, could move freely around – there was no notion of edges. The ants simply took a directions towards a certain node and when they got close enough to it, they considered the node as reached and continued for the other one. This was useless, but I saved these funny graphics with the ants moving all around:

And the ants finding the solution:

The source code

Download it here.

As said before, the source was created as a module to a larger program and later taken out to be executable isolated. Therefor there still is quite a lot of refactoring which could be done, but I do not consider it necessary, since the purpose is merely to present the Ant colony algorithm.

čtvrtek 8. března 2012

Graph theory in Latex

For one of my previous posts, I needed some images of graphs. Initially I have taught, that I will just draw them in Inkscape or some other tool, but after a while I have decided to do something more clever – which might maybe serve me in the future – draw the graphs in Latex. After a quick search I have found this blog:

http://graphtheoryinlatex.wordpress.com/ and older posts from the same author: http://graphtheoryinlatex.blogspot.com/

This blog is about drawing graphs in TeX. So what do you need:

TikZ – a graphic system for Tex

Tkz-graph – style with basic graph drawing macros.

Tkz-berge – style with more complex drawing – such as predefined graphs (full graphs, bipartite graphs, grids etc.)

Some TeX editor. I am using Inlage. Which is not expensive and takes care for downloading all the necessary packages.

So here are the graphs and variants which I have used in the previous post:

Graph with 4 edges in a circle:

\begin{tikzpicture}
\GraphInit[vstyle=Welsh]
\SetGraphUnit{2}
\Vertices{circle}{A,B,C,D}
\Edges(A,B,C,D,A,C)
\SetVertexNoLabel
\end{tikzpicture}

Coloring some of the nodes:

\begin{tikzpicture}
\GraphInit[vstyle=Classic]
\SetGraphUnit{2}
\Vertices{circle}{A,B,C,D}
\Edges(A,B,C,D,A,C)
\SetVertexNoLabel
\AddVertexColor{red}{B,D}
\AddVertexColor{green}{A}
\AddVertexColor{blue}{C}
\end{tikzpicture}

Graph with labeled edges

\begin{tikzpicture}
\GraphInit[vstyle=Welsh]
\SetGraphUnit{2}
\Vertices{circle}{A,B,C,D,E}
\SetUpEdge[style={->}]
\Edges[label=1](A,B)
\Edges[label=1](B,C)
\Edges[label=1](C,D)
\Edges[label=$1$](D,E)
\Edges[label=x1](A,C)
\Edges[label=x2](A,D)
\Edges[label=x3](A,E)
\SetVertexNoLabel
\end{tikzpicture}

Graph with positioned vertices.

Using the EA, NOEA and similar macros (EA = East of first vertex define the second vertex, NOEA = Northeast of…)

\begin{tikzpicture}
\SetGraphUnit{2}
\GraphInit[vstyle=Normal]
\Vertex{S}
\NOEA(S){A} \SOEA(S){B}
\EA(A){T1} \EA(B){T2}
\SOEA(T1){F}
\Edges[label=1](S,A)
\Edges[label=1](S,B)
\Edges[label=4](A,T1)
\Edges[label=2](B,T2)
\Edges[label=1](T1,F)
\Edges[label=1](T2,F)
\Edges[style={pos=.25},label=3](A,T2)
\Edges[style={pos=.25},label=2](B,T1)
%Could use this for bending%
\tikzset{EdgeStyle/.append style = {bend left}}
\end{tikzpicture}

Bipartite graph using the berge styles:

\begin{tikzpicture}
\grCompleteBipartite[RA=4,RB=3,RS=6]{2}{2}
\end{tikzpicture}

For now I am content that I have found this blog and actually I have to say, that drawing graphs in TeX is much easier than I have expected.

neděle 4. března 2012

Wrong SystemRoot variable causes 0xC0150004

Last week I had experienced this quite "funny" situation on WinXP machine.

When I have run some post-build scripts using Jenkings & Maven I have obtained something like:

CMD is not known, not an executable file etc...

I took a look at it and I taught to myself - the "SystemRoot" variable is not set! I have to correct this and it will work fine. Actually this is not true, but apparently it Jenkins does some cleaning in the variables, so it might happend, that the scripts runned by Jenkins will not work as you have expected. Anyway that is not the point of my story. So what I did to "correct" the error was to add the following variable to my USER environement variables:

SystemRoot = SystemRoot\System32

I was too quick and too naive. First the SystemRoot had been already defined in the System variables and second the correct path is just C:\Windows. So I have used this recursion to define the varible - in wrong path.

When I have saved my master work, I could not execute almost any new program. I have always obtained the following error:

0xC0150004

So after a fast google I have found out, that this error appears when the SystemRoot variable is not correctly set. Cool so how to fix the mistake? You cannot launch any program or system utility at all and you need to correct the environement variable.

Well after a wail I have found this solution:

Login as another user
Start the CMD as the user with the wrong environement variable
Correct the variable using: SET SystemRoot=C:\Windows
Now you think it is ok, but after a restart you will find out, that the new settings is never taken into account. So do not bother to restart and try the following:
Run REGEDIT from the CMD which is opened using the account and where ther correct SystemRoot was set. Notice it will run....and before it was failing because regedit is dependent on that SystemRoot variable.
Find the user variables searching here: HKEY_CURRENT_USER\Environment
Now correct the SystemRoot variable and voila after restart you can log in again with the "infected account"

The moral of the story:

Lot of applications (notepad, regedit, explorer, task manager) would not start if the SystemRoot is not set correctly.

WinXP will let you set the variable to such a nonsence as:
SystemRoot = SystemRoot\System32

sobota 25. února 2012

Applications of Graph Theory

Graph Theory is just a beautiful part of mathematics. Not only Computer Science is heavily based on Graph Theory. There are a lot of applications of Graph Theory in Operational Research, Combinatorial Optimization, Bioinformatics.
For my personal clasification I have separated the tasks, which you can solve using Graph Theory into two groups:

Obvious applications – I mean, that the analogy between the graph and the problem is quite easy to imagine (maps, cities, relations etc.). In this post you can find the following:
- Vehicle Routing Problem
- Graph coloring
- Map coloring
Hidden applications - Tasks which you would never assume can be solved using Graph Theory. Than you see one of them, and than you think: “Wow, I wonder who came up with that one…”. I will provide the following ones in this post:
- Image or 3D model reconstruction from projections
- Prove of NP hardness of Integer Linear Programming
- Register allocation
- Approximation of data, data compression
- Task scheduling

Obvious applications

Here are some examples of problems for which the creation of the graph to model the problem is quite easy and obvious.

Vehicle routing problem and other variants of TSP

There is a whole set of problems, which are just variations of Traveling Salesman Problem. Vehicle Routing Problem (VRP) can be characterized by following description:

We are provided with a set of trucks, each of a certain capacity
Set of customers, each with certain need of the goods
The central repository, where the goods are stored

The tasks is to root the trucks, to distribute the goods to the clients and minimalize the distance. I have written a blog and a web utility which is able to solve the problem using two algorithms:

Clark & Wright Savings Algorithm
The Sweep Algorithm

You can find more algorithms for solving VRP here.

Graph coloring problem

Given a graph, we want to decide, whether it is possible to color each of the vertices in the graph in such way, that none of the vertices which are sharing and edge have the same color. Many real world problems can be formulated as Graph coloring problem. Ne first one of the Map coloring.

Map coloring

One of the first application is the map coloring problem. It has been proven, that each map can be colored using 4 colors. This problem can be converted to graph coloring problem by placing the vertex inside each country or region in the map. Two vertices are connected if and only if the two countries have a common border. More over here.

Hidden applications

There are tasks and problems for which you would not intuitively search the solution by applying graph theory. Here are some of them:

Image reconstruction from X-Rays – Computer tomography.

Tomography is a technique used to reconstruct an image or 3D model from series of projections, subsequently taken from different angles. When using technologies such as the x-rays, the image take from an angle gives for each pixel the total thickness of the scanned object. The questions is than how to reconstruct the image from several taken images which are containing only the thicknesses.

As described in great book "Network Flows – Theory, Algorithms and Applications”, concrete example of computer tomography is the “Reconstruction of the Left Ventricle from x-Rays projections”. This problem can be solved using the application of network flows theory. This method is applicable only to problems where the scanned object has a uniform structure. As mentioned in the book this assumes that the well-working Ventricle is filled uniformly with blood and dye mixture.

The following graphics was taken from the book. It explains the method on two dimensional image. Using two projections of the project, we obtain vectors which are containing for each pixel (or other unit) the probable mass hidden behind this pixel. Now is up to us to find out how this mass is distributed - that means where are the ‘1’ in the picture. The more projections we have, the better results we can obtain.

The problems is thus simplified to the problem of constructing binary matrix from the projection sums. This problem is a special case of the feasible flow problem.

The following image shows similar very simplified task, which I have taken from the Combinatorial Optimization course offered as part of Open Informatics program at CTU Prague.

The whole problem can be seen as the question of finding the feasible flow in a network (G, b, u, l,c). So what does network consist of:

Graph G
s – sources – the nodes which provide the fluid into the network – the nodes with positive values
t – appliances (or sinks) – the nodes which consume the fluid – the nodes with negative values
u – upper bound for the flow of each edge
l – lower bound for the flow of each edge
c – the actual flow in each edge – the one for which we are looking. The task is to find the values of c for each edge, in order to satisfy all sinks.

Here is the graph G which corresponds to the projections sumR and sumC from the previous image. Each edge in the graph corresponds to one pixel, connecting the two projections. The sumR are being sources in this network and the sumC edges are sinks.

For each edge the lower bound l(e) = 0, upper bound u(e) = 1 and we are looking for values of values of c(e), in order to for the flow to be feasible and also minimal. The edges which are used in the feasible and minimal flow are pixels which will have ‘1’ value in them.

Proving NP’s ness of some problems such as Integer Linear Programming

The graph coloring problem has been already mentioned above. We are trying to color each node of the graph in such a way, that nodes with same color cannot be connected by an edge.

Integer Linear Programming (ILP) is NP-hard problem. This can be proven by the polynomial reduction of Graph coloring problem to the ILP problem. Concretely we can say, that for each graph which can be colored using 3 colors, we are able to construct an ILP problem, which has a solution. From the theoretical point of view saying “we are able to construct” means that there is a polynomial reduction of Graph coloring problem to ILP problem. Polynomial reduction proves that:

If Graph Coloring is NP-hard problem, than ILP is also NP hard problem.

Polynomial reduction has to satisfy two conditions in order to prove the NP-hardness:

The reduction algorithm – the construction of one problem from another has to be performed in polynomial time
For each instance graph which can be colored with 3 colors an instance of ILP can be constructed which has a solution

Here is the reduction algorithm (the algorithm which explains how to define an ILP problem to given graph):

In the beginning we have a graph colored using 3 colors. We will try to create an instance of ILP out of this graph. That means we have to define the variables and the equations which build the ILP problem. We can do this in 3 steps.

Create N variables x_ncolor== 1 <=> the node n has the color c, where N is the number of nodes.
For each node in the graph add en equation to the ILP system:
- x_nred + x_nblue + n_ngreen = 1
for each edge e = {n_i, n_j} in the graph add following three equations in the system:
- x_nired + x_njred <= 1
- x_niblue + x_njblue <= 1
- x_nigreen + x_njgreen <= 1

Here is an example, we have an simple graph:

Now the first set of equations, which states, that each edge can have at most one color:

The following set of equations, which states, that nodes sharing edge cannot have the same color:

Now because the ILP problem can be reduced to graph coloring problem, we know, that this problem has solution, when the graph can be colored with three colors. Here is the solution:

Which corresponds to:

The coloring of the graph is NP hard, so also ILP is NP hard. If you wonder how to prove that NP graph coloring is NP hard: there is a polynomial reduction from one special type of SAT problem.

Register allocation

Register allocation is the process of assigning possibly infinite set of variables of assembly program to a finite set of registers which are available in the processor. Not all variables are used at the same time, so several variables can share a register (if not this mapping would not be possible). Even this problem is solved using graph coloring. For each variable a vertex is created. Vertices are connected if variables “live” in the program at the same time. The number of colors given to color the graph is equal to number of registers.

Approximation of the data – data compression

This technique is used in order to approximate the data which has to be stored while minimizing the loses of precision.

For example a data which represents taken temperatures during the time and builds a nice graph. However if this data was taken at high frequency, there might be too many records. The idea is to minimize the number of records, while keeping most of the information about the evolvement of the temperature.

The shortest path algorithm can be used to solve this problem. For instance the blue line in the following graphics represents the data to be stored. It is 10 values: the time x and Temperature( x) at the time x. The green and red line represent possible approximations, when we are leaving out one or two nodes. Of course there are several nodes which can be left out and the shortest path algorithm can help us to find which ones can be left out.

We can construct a full graph, which will contain 5 nodes, representing the 5 data points (the times x). Each edge represents the “precision loose” which we pay, when we take the direct path between the two nodes of the edge instead of passing the traditional way. The following picture represents the partial graph – the skipping edges start only in the first node ( A ). The edge with value x1 corresponds to the red line in the graph etc. The graph should be also filled with other edges starting in B and C (the only edge going from D to E is already present and there are no edges starting in E), but I have left them out for simplicity.

So without compression we have the simple path: A,B,C,D,E = 1 + 1 + 1 + 1 = 4

Taking the red edge and the rest of the path: A,C,D,E = 1 + 1 + 1+ x1

Taking the green edge and the rest of the path: A, D, E = 1 + 1 + x2

The values of the edges in the standard path should be the lowest (here all of them have value 1). On the other hand values of edges which will make us loose more precision should be the greatest. Then of course we can introduce some bonus to motivate the algorithm to take less edges (better compression). All this constraints can be modeled using heuristics.

One possible heuristics to evaluate the value of the edge is to measure the distance between the real data and the estimated date. For instance the value of the second point is 5. If we estimate the value using the red line (leaving out the second point) the corresponding value on the red line is 3. The distance between these two values is: 2.

If we use the the green line instead then the distance between the estimated value f’( x) and the real value f( x) is 1. On the other hand the green line also estimates the second point 3 point. And we see that the distance for the second point will be more or less 1.5. So we should add these distance together. So we get:

x1 = 2

x2 = 2.5

This is just a proposition. We could also multiply it by some coefficient to obtain some reasonable results.

With the developed and evaluated graph, finding the shortest path in the full graph from the node A to the node E will give us the best “size/precision” ratio.

Task scheduling

In this problem we have a group of workers and group of tasks. Each task can be processed by each worker. However the workers do not have the same performance on all tasks – the time for the processing of each task differs for each worker.

Let’s take a look at very simple example, we have two workers (A,B) and two tasks (T1,T2). The values in the table represent the processing time, that the worker needs for the given task.

This can be solved as finding the cheapest flow in the following graph.

Not that each edge has two values: u/c. The ‘u’ represents the capacity of the edge – it is always one. The ‘c’ represents the cost of the edge. Finding the cheapest flow in this graph from S to F will give us the best assignment of workers to tasks.

Other interesting applications

Development of custom processors for minimization of resources – data path optimization.
Scheduling on parallel processors – application of maximal flow finding problem.