
Thursday, May 9, 2013

Screen scraping in C# using WebClient

This post is intended to give you some useful tips for performing screen scraping in C#. First, let's make one thing clear: in an ideal world we should not be forced to do screen scraping at all. Every solid web site, application or service should expose a decent API to provide its data to other applications. If the application holds resources belonging to its users, then it should expose an OAuth-protected API and thus allow the users to access their data through another application. But we are not there yet.

Observing the communication

In order to know what kind of HTTP requests you have to issue, you have to observe what the browser does when you browse the web page. There is no better tool for the job than Fiddler. One feature you might find really useful is that it can automatically decrypt HTTPS traffic.

Getting the data

Once you determine which web requests you need to replay, you need the infrastructure to execute them. .NET provides the WebClient class. Note that WebClient is a facade for creating and handling HttpWebRequest and HttpWebResponse objects. Feel free to use these classes directly if you want, but by default the compiler will complain about some of their usages since they are marked as obsolete.
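
For a simple GET request, a minimal sketch might look like this (the URL is just a placeholder):

using System.Net;

using (var client = new WebClient())
{
    // Downloads the response body as a string using the detected encoding
    var html = client.DownloadString("http://example.com/some-page");
    // ... parse the html here
}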

Parsing the data

If you just need to screen scrape a simple site which is invoked by an HTTP GET request, then you do not need any special information. You can just fire up WebClient, obtain the string and then parse the result. When parsing the result, keep in mind that HTML is not a regular language, therefore you cannot always use regular expressions to parse it. However, you can usually get away with it. A common task is to match some information inside a concrete tag; here are two examples:

Matching any text inside a div with some special styles:

<div style="font:bold 11px verdana;color:#cf152c">Important information</div>
var addressTerm = new Regex("<div style=\"font:bold 11px verdana;color:#cf152c;\">(?<match>[^<]*?)</div>");

Matching two decimal values inside a div separated by BR tag:

<div style=\"margin-left:5px;float:left;font:bold 11px verdana\">10<br />12<br /></div>
var dataTerm = new Regex("<div style=\"margin-left:5px;float:left;font:bold 11px verdana;color:green\">(?<free>\\d*)<br />(?<places>\\d*)<br /></div>");
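
Applying such an expression and reading the named groups is then straightforward; a small sketch, assuming the page content is already in an html variable:

var match = dataTerm.Match(html);
if (match.Success)
{
    // The named groups defined in the pattern give direct access to the captured values
    var free = int.Parse(match.Groups["free"].Value);
    var places = int.Parse(match.Groups["places"].Value);
}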

Posting values

When submitting a form to a web application, the browser usually performs an HTTP POST request and URL-encodes the form values into the request body. In order to create such a request, you have to set the content type of the request to application/x-www-form-urlencoded. Then you can use the UploadValues method of the WebClient.

// NameValueCollection lives in System.Collections.Specialized
using (var client = new WebClient())
{
    var contentType = "application/x-www-form-urlencoded";
    client.Headers.Add("Content-Type", contentType);

    var values = new NameValueCollection();
    values.Add("name", name);
    values.Add("pass", pass);
    var response = client.UploadValues(url, "POST", values);
}
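
UploadValues returns the response body as a byte array, so inside the using block you typically decode it to a string before parsing (assuming UTF-8 encoded content here):

// Decode the raw response bytes before parsing them (Encoding is in System.Text)
var responseText = Encoding.UTF8.GetString(response);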

Handling the authentication

In some cases you have to pass authentication before you get to the information that you need. Most web sites use cookie-based authentication. Once the user is authenticated, the server generates an authentication cookie which is then automatically added to any successive request by the web browser. By default, WebClient does not store cookies. The infrastructure to handle cookies is implemented at the level of HttpWebRequest. I have found a very useful example of a "cookie aware" WebClient, which keeps all the cookies that it has received so far and adds them to any later request, at the following StackOverflow link:

http://stackoverflow.com/questions/1777221/using-cookiecontainer-with-webclient-class
public class WebClientEx : WebClient
{
    public WebClientEx(CookieContainer container)
    {
        this.container = container;
    }

    private readonly CookieContainer container = new CookieContainer();

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest r = base.GetWebRequest(address);
        var request = r as HttpWebRequest;
        if (request != null)
        {
            request.CookieContainer = container;
        }
        return r;
    }

    protected override WebResponse GetWebResponse(WebRequest request, IAsyncResult result)
    {
        WebResponse response = base.GetWebResponse(request, result);
        ReadCookies(response);
        return response;
    }

    protected override WebResponse GetWebResponse(WebRequest request)
    {
        WebResponse response = base.GetWebResponse(request);
        ReadCookies(response);
        return response;
    }

    private void ReadCookies(WebResponse r)
    {
        var response = r as HttpWebResponse;
        if (response != null)
        {
            CookieCollection cookies = response.Cookies;
            container.Add(cookies);
        }
    }
}

Digest authentication

Some web sites may employ "digest" authentication, which, being based on hashing, adds a little more security against "man-in-the-middle" attacks. In that case you will see that the login request is not just a simple POST with the "login" and "password" values. Instead, a combination of a random value (which the server knows) and the password is composed, hashed and sent to the server.

digestPassword = hash(hash(login+password)+nonce);

Nonce in the previous definition is a "Number Used Only Once", which is generated by the server and which the server keeps in a pool in order to keep track of already used values. Here are two simple methods to create the digestPassword:

public static String DigestResponse(String idClient, String password, String nonce)
{
 var cp = idClient + password;
 var hashedCP = CalculateSHA1(cp, Encoding.UTF8);
 var cnp = hashedCP + nonce;
 return CalculateSHA1(cnp, Encoding.UTF8);
}

public static string CalculateSHA1(string text, Encoding enc)
{
 byte[] buffer = enc.GetBytes(text);
 var cryptoTransformSHA1 = new SHA1CryptoServiceProvider();
 return BitConverter.ToString(cryptoTransformSHA1.ComputeHash(buffer)).Replace("-", "").ToLower();
}

Of course, when using digest authentication, the server has to provide the value of the nonce to the client. The value is usually part of the login page and the hashing is done in JavaScript.
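
Putting it together, a login sequence might look roughly like the sketch below. The nonce regex and the form field names are purely illustrative, they depend entirely on the target site:

using (var client = new WebClientEx(new CookieContainer()))
{
    // 1. Load the login page and extract the nonce embedded in it (hypothetical pattern)
    var loginPage = client.DownloadString(loginUrl);
    var nonce = Regex.Match(loginPage, "name=\"nonce\" value=\"(?<nonce>[^\"]+)\"").Groups["nonce"].Value;

    // 2. Send the digest instead of the plain password; the authentication cookie
    //    is kept by WebClientEx for the successive requests
    var values = new NameValueCollection();
    values.Add("login", login);
    values.Add("digest", DigestResponse(login, password, nonce));
    client.UploadValues(loginUrl, "POST", values);
}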

Stateful JSF applications

Most of the web applications that we see today are composed of stateless services. There are some really good reasons for that; however, it is still possible that you might have to analyze a stateful application. In that situation the order of the HTTP requests matters. JSF is one of the web technologies which favor stateful applications. In my case I needed to obtain a CSV file which was generated from the data previously shown to the user in an HTML table. The ID of the table element was passed to the CSV generation request, so these two requests were interconnected. More than that, the ID value was generated by JSF and I think it depended on the number of previously generated HTML elements. Typically the generated ID values are prefixed by "j_id", and since I wanted to hardcode this value, I had to always compose exactly the same sequence of HTTP requests.

values.Add("source", "j_id630");

Make them think you are a serious browser

Some web pages check which browser is accessing them; you can easily make them think you are a regular desktop browser by setting the User-Agent header:

var userAgent = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0)";
client.Headers.Add("User-Agent", userAgent);

Summary

If there is any other way to obtain the data, it is probably the better way. If you cannot avoid screen scraping, I hope this gave you a couple of hints.

Monday, March 25, 2013

Sample application: RavenDB, KnockoutJS, Bootstrap and more

While learning a new technology or framework, I always like to build a small but well-covering proof-of-concept application. It is even better if one can combine several new technologies in such a project. This is a description of one such project, which uses RavenDB, WebAPI, KnockoutJS, Bootstrap and D3.js.
The source code is available on GitHub.

The Use Case

Everyone renting an apartment or any other property knows that it might be quite difficult to track the expenses and income in order to assess the profitability of the given property. I have created an application which helps with just that, and thanks to this application I was able to learn the mentioned technologies. Now let's take a closer look at them.
  • KnockoutJS - to glue together the interaction on the client side. Knockout is one of the popular JavaScript MV* frameworks which provide a way to organise and facilitate JavaScript development. Unlike other frameworks (Backbone or Ember), KnockoutJS concentrates only on binding data and actions between the GUI (HTML) and the ViewModel (JavaScript) and does not take care of other aspects (such as client side routing). The framework is very flexible and allows you to bind almost anything to any DOM element's value or style.
  • RavenDB - to store the data. RavenDB is a document database which seamlessly integrates into any C# project.
  • WebAPI - to serve the data through REST services. WebAPI is a quite new technology from MS which is meant to provide better support for building REST services. Of course we have built REST services with WCF before, so the question is why should we change to WebAPI? WCF was created in the age of WSDL. It was adapted later to generate JSON; however, inside it still uses XML as its data transfer format. WebAPI is a complete rewrite which also provides other interesting features.
  • Bootstrap - to give it a decent GUI. As its name says, Bootstrap enables quick development of a web application's GUI. It is a great tool for all of us who just want to get the project out and still need a decent user interface.
  • D3.js - to visualize data using charts. D3.js is a JavaScript library enabling the user to manipulate the DOM and SVG elements.
  • KoExtensions - a very small set of tools which I have created, allowing easy creation of pie charts or binding to Google Maps while using KnockoutJS.
Here is how it looks at the end:

The architecture of the application

The architecture is visualized in the following diagram. The backend is composed of an MVC application which exposes several API controllers. These controllers talk directly to the database through RavenDB's IDocumentSession interface. The REST services are invoked by ViewModel code written in JavaScript. The content of the ViewModels is bound to the view using Knockout.


This application is as lightweight as possible. It is composed of an MVC 4 application with two types of controllers: standard and API. Standard controllers are used to render the base web pages.
Even though this application uses client side MVVM, the HTML and JavaScript of the client side app have to be hosted in some server side application. I have chosen to host the application inside a classic ASP.NET MVC application, but I could as well have chosen a standard ASP.NET application.
But like many on the web, I prefer MVC style applications. It is not a sin to mix server and client side MVC in one application.
This application has no service layer. All the logic can be found inside the controllers. The controllers all use the IDocumentSession of RavenDB directly to access the database. The correct approach to using RavenDB with ASP.NET MVC is described on the official web page. Basically, the RavenDB session is opened when the controller's action starts and is closed when the action terminates. The structure of an API controller differs a little bit, but the principle is the same.

When to use Knockout or client side MV*

There are probably a lot of people out there with exactly the same question. It basically comes down to whether or not to use a client side MVC JavaScript framework. From my purely personal point of view this makes sense when one or more of these conditions are met:
  • You have a good server side REST API (or you plan to build one) and want to use it to build a web page.
  • You are building more of a web application than a website. That is to say, your users will stay on the page for some time, perform multiple actions, keep some user state, and you need a responsive application for that.
  • You need a really dynamic page. Even if you used server side MVC, you would somehow need to include a lot of JavaScript for the dynamics of the page.
This is just my personal opinion; there is a lot of discussion around the internet and, as usual, no silver-bullet answer.

Data model

RavenDB is a NoSQL database or, better said, a non-relational database. The data is stored in document collections, serialized to JSON. Each document contains an object, or more specifically a graph of objects, serialized to JSON.
When working with relational databases, the aggregated graph of objects which is served to the user is usually constructed by several joins over several tables. On the other hand, when working with document databases, the data which is aggregated into one object graph should also be stored that way.
In this particular example, one property or asset can have several rents and several charges. A rent does not really make sense without the asset to which it is attached. That's why the rents and charges are stored directly inside each asset. This application is composed of two collections: Owners and Assets. Here are example Owner and Asset documents.
{
   "Name": null,
   "UserName": "test",   
   "Password": "test"
}
 
{
  "OwnerId": 1,
  "LastChargeId": 5,
  "LastRentId": 0,
  "Name": "Appartment #1",
  "Address": "5th Ave",
  "City": "New York",
  "Country": "USA",
  "ZipCode": "10021",
  "Latitude": 40.774,
  "Longitude": -73.965,
  "InitialCosts": 0.0,
  "Rents": [],
  "Charges": [
 {
   "Counterparty": "New York Electrics",
   "Type": null,
   "Automatic": false,
   "Regularity": "MONTH",
   "Id": 2,
   "Name": "Electricity",
   "PaymentDay": 4,
   "AccountNumber": "9084938890-2491",
   "Amount": 1000.0,
   "Unit": 3,
   "Notes": "",
   "End": "2013-03-19T23:00:00.0000000Z",
   "Start": "2013-03-10T23:00:00.0000000Z",
 },
 { ... },
 { ... }
  ],
  "Ebit": 0.0,
  "Size": 80.0,
  "PMS": 1250000.0,
  "Price": 100000000.0,
  "IncomeTax": 0.0,
  "InterestRate": 0.0
}
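
For reference, a minimal sketch of what the C# entities behind these documents might look like (the property names follow the JSON above; the real project classes may differ):

public class Owner
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string UserName { get; set; }
    public string Password { get; set; }
}

public class Asset
{
    public int Id { get; set; }
    public int OwnerId { get; set; }
    public int LastChargeId { get; set; }
    public int LastRentId { get; set; }
    public string Name { get; set; }
    public string Address { get; set; }
    public string City { get; set; }
    public string Country { get; set; }
    // Rent and Charge hold the fields shown in the JSON above
    public List<Rent> Rents { get; set; }
    public List<Charge> Charges { get; set; }
}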

One question you might be asking yourself is why I did not use only one collection of Owners, where each Owner document would then contain all the assets as an inner collection. This is just because I thought it might make sense in the future to have an asset shared by two owners. The current design allows us at any time in the future to connect the asset to a collection of Owners, simply by replacing the OwnerId property with a collection of integers containing all the ids of the owners.

The Backend

The backend is composed of a set of REST controllers. Here is the provided API:

  • GET api/assets - get the list of all the apartments of the current user
  • DELETE api/asset/{id} - remove an existing asset
  • PUT api/asset - add a new asset
  • PUT api/charges?assetID={id} - add a new charge to an existing asset
  • POST api/charges?assetID={id} - update an existing charge in the given asset
  • DELETE api/charges/{id}?assetID={assetID} - remove a charge from an existing asset
  • PUT api/rents?assetID={id} - add a new rent to an existing asset
  • POST api/rents?assetID={id} - update an existing rent
  • DELETE api/rents/{id}?assetID={assetID} - remove a rent from an existing asset

Getting all the assets

Without further introduction, let's take a look at the first controller, which returns all the apartments of the logged-in owner. This service is available at the api/assets URL.
[Authorize]
public IEnumerable<Object> Get()
{
 var owner = ObtainCurrentOwner();
 var assets = GetAssets(owner.Id);
 return assets;
}

protected Owner ObtainCurrentOwner()
{
 return RavenSession.Query<Owner>().SingleOrDefault(x => x.UserName == HttpContext.Current.User.Identity.Name);
}

public IEnumerable<Asset> GetAssets(int ownerID)
{
 return RavenSession.Query<Asset>().Where(x => x.OwnerId == ownerID);
}
 
This method is decorated with the [Authorize] attribute. This mechanism was previously known from WCF. ASP.NET checks for the authentication cookie within the request and, if no cookie is present, the request is rejected. Getting the current user and all of its assets is a matter of two LINQ queries using the RavenSession, which has to be opened beforehand.

Opening RavenDB session

All the controllers inherit from a base controller called RavenApiController. This controller opens the session to RavenDB when it is initialized and then saves the changes to the database when the work is finished. The Dispose method of the controller is the last method invoked when the work is over.
protected override void Initialize(System.Web.Http.Controllers.HttpControllerContext controllerContext)
{
   base.Initialize(controllerContext);
   if(RavenSession == null)
     RavenSession = WebApiApplication.Store.OpenSession();
}

protected override void Dispose(bool disposing)
{
 base.Dispose(disposing);
 using (RavenSession)
 {
  if (RavenSession != null)
   RavenSession.SaveChanges();
 }
}
public Object Post(Charge value, int assetID)
{
    var owner = ObtainCurrentOwner();
    var asset = GetAsset(assetID, owner);
    value.Id = asset.GenerateChargeId();
    if (asset.Charges == null)
    {
        asset.Charges = new List<Charge>();
    }
    asset.Charges.Add(value);
    return GetResponse(value, asset, true);
}

Since RavenDB provides change tracking, there is no need to perform additional work. RavenDB will notice that a new charge was added to the Charges collection, and when the SaveChanges method is invoked on the Raven session, the new charge will be persisted to the database. As explained before, SaveChanges is invoked while disposing the controller.

Note that if you want to access the Charge object in the future, you need to give it an ID. RavenDB only generates IDs for documents, not for any inner objects. The solution here is to give each Asset an ID counter for the charges, which is incremented any time a new charge is added.
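
A minimal sketch of such a generator inside the Asset entity, based on the LastChargeId counter visible in the document above (an assumption, not necessarily the project's exact code):

// Inside the Asset class: hand out the next charge id and
// remember the counter in the document itself
public int GenerateChargeId()
{
    return ++LastChargeId;
}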

The FrontEnd

All the logic on the client side resides in ViewModel classes. I assume you are familiar with the MVVM pattern; if not, you can still continue reading, as the understanding should be intuitive if you have worked with MVC frameworks before. The parent ViewModel, and the one which aggregates the others, is the OwnerViewModel. The ViewModels build up a hierarchy similar to the domain objects.

The OwnerViewModel has to get all the assets and build an AssetViewModel around each received Asset. The data is retrieved from the server as JSON using an asynchronous request.

function OwnerViewModel() {
 var self = this;
 $.extend(self, new BaseViewModel());
 self.assets = ko.observableArray([]);
 self.selectedAsset = ko.observable([]);
 self.isBusy(true);
 self.message('Loading');

 $.ajax("/../api/assets", {
  type: "get", contentType: "application/json",
  statusCode: {
   401: function () { window.location = "/en/Account/LogOn" }
  },
  success: function (data) {
   var mappedAssets = $.map(data, function (item) {
    return new AssetViewModel(self, item);
   });
   self.assets(mappedAssets);
  }
 });
}

You can notice that this ViewModel calls jQuery's $.extend method right at the beginning of the function. This is one of the ways to express inheritance in JavaScript. JavaScript is a prototype-based language: objects derive directly from other objects, not from classes. The extend method basically copies all properties from the object specified in the parameter.

All of my ViewModels have certain common properties such as isBusy or message. These are helper variables which I use in all ViewModels to visualize progress or show some info messages in the GUI. The BaseViewModel is a good place to define these common properties. Notice also the selectedAsset property, which holds the currently selected AssetViewModel (imagine the user selecting one line in the table of assets).

Without further ado, let's take a look at the AssetViewModel. There are several self-explanatory properties such as address, price and similar. More interesting are the arrays of rents and charges. These are observable arrays of ViewModels which are filled during the construction of the AssetViewModel object. The data for this object is passed from the OwnerViewModel. The asset also holds a reference to its owner in the parent property.

function AssetViewModel(parent,data) {
    var self = this;
    $.extend(self, new BaseViewModel());
    self.lat = ko.observable();
    self.lng = ko.observable();
    self.city = ko.observable();
    self.country = ko.observable();
    self.zipCode = ko.observable();
    self.address = ko.observable();
    self.name = ko.observable();
    self.charges = ko.observableArray([]);
    self.rents = ko.observableArray([]);
    self.parent = parent;
 
  if (data != null) {
        self.isNew(false);
        self.name(data.Name);
        //update all asset data here
        
  //fill the charges collection - note the rents are filled similarly
        if (data.Charges != null) {
            self.charges($.map(data.Charges, function (data) {
                return new ChargeViewModel(self, data);
            }));
        }
    }
}

To sum it up: when the OwnerViewModel is loaded into the screen, it immediately starts an HTTP request to obtain all the data. It will receive a JSON which contains all the assets, each asset containing its charges and rents inside. This JSON is parsed respectively by OwnerViewModel, AssetViewModel, ChargeViewModel and RentViewModel. At the end, a complete hierarchy of ViewModels is created on the client side which exactly mirrors the server side.

Before detailing the last missing ViewModels (rents and charges), let's take a look at the first part of the View. The parent layout is defined in _Layout.cshtml; however, the part mastered by Knockout is defined in the Index.cshtml file. The left side menu is composed of two smaller menus: one which contains the list of properties with the possibility to create a new one, and another which allows switching between the details of the selected property. Here is the view representing the first menu:

<div class="well sidebar-nav">
 <li class="nav-header">Property list:</li>
 <ul class="nav nav-list" data-bind="foreach:assets">
  <li><a data-bind="text:name,click:select" href="#"></a></li>
 </ul>
 <ul class="nav nav-list">
  <li class="nav-header">Actions:</li>
  <li><a href="#" data-bind="click: newAsset"><i class="icon-pencil"></i>@BasicResources.NewProperty</a></li>
 </ul>
</div>

The foreach binding is used in order to render all the apartments. For each apartment an anchor tag is emitted; the text of this tag is bound to the name of the apartment and the click action is bound to the select function. The creation of a new asset is handled by the newAsset function of the OwnerViewModel.

The second part of the menu is defined directly as HTML. Three anchor tags are rendered, each of them pointing to a different tab, using the same URL pattern. For example the URL "#/{property-name}/overview" should navigate to the "Overview" tab of the property with the given name.

Client side routing is used in order to execute certain actions depending on the accessed URL. To enable client side routing, the Path.js library is used. Knockout's attr binding is used to render the correct anchor tag.

<div class="well sidebar-nav" data-bind="with:selectedAsset">
 <ul class="nav nav-list">
  <li class="nav-header" data-bind="text:name"></li>
  <li><a data-bind="attr: {href: '#/' + name() + '/overview'}"><i class="icon-pencil"></i>Overview</a></li>
  <li><a data-bind="attr: {href: '#/' + name() + '/charges'}"><i class="icon-arrow-down"></i>Charges</a></li>
  <li><a data-bind="attr: {href: '#/' + name() + '/rents'}"><i class="icon-arrow-up"></i>Rents</a></li>
 </ul>
</div>

You can also notice that the with binding was used to set the current asset ViewModel as the context for the navigation div. The right part simply contains all 3 tabs (overview, charges and rents), only one of them visible at a time. In order to separate the content into multiple files, ASP.NET MVC partial views are used.

<div id="assetDetail" class="span9" data-bind="template: {data: selectedAsset, if:selectedAsset, afterRender: detailsRendered}">
 <div id="overview">
  @Html.Partial("Overview")
 </div>
 <div id="charges">
  @Html.Partial("Charges")
 </div>
 <div id="rents">
  @Html.Partial("Rents")
 </div>
</div>

Again, the selected apartment's ViewModel is used to back this part of the view.

Now let's go back to the ViewModels. ChargeViewModel and RentViewModel have a common ancestor called ObligationViewModel. Since both rents and charges have some common properties, such as the amount or the regularity, a common parent ViewModel is a good place to define them.
The most interesting part of ChargeViewModel is the save function, which uses jQuery to issue an HTTP request to the ChargesController. As previously described, two different operations are exposed at the same URL, one for creation (HTTP PUT) and another for update (HTTP POST). The ViewModel uses an isNew flag to distinguish these two cases. Before the request is executed, the ViewModel uses the Knockout-Validation plugin to perform the validation check via the errors property.
self.save = function () {
 if (self.errors().length != 0) {
  self.errors.showAllMessages();
  return;
 }

 self.isBusy(true);
 var data = self.toDto();
 var rUrl = "/../api/charges?assetID=" + self.parent.id();
 // PUT creates a new charge, POST updates an existing one (see the API list above)
 var opType = self.isNew() ? "put" : "post";

 $.ajax(rUrl, {
  data: JSON.stringify(data),
  type: opType, contentType: "application/json",
  success: function (result) {
   self.isBusy(false);
   self.message(result.message);
   if (self.isNew()) {
    self.update(result.dto);
    parent.charges.push(self);
   }
  }
 });
}

When there are no validation errors, the object which will be sent to the server is created from the ViewModel by the toDto method. It does not make sense to serialize the whole ViewModel and send it to the server; in the toDto method the ViewModel is converted to a plain JSON object which can be directly mapped to the server side entity. The ajax method of jQuery is then called, which creates a new HTTP request.

When the response from the server comes back, the callback is executed, which performs several operations. Besides updating the GUI helper variables, the callback does two different things. If a new charge was added, then it also has to be added to the parent ViewModel (the apartment, represented by AssetViewModel). The new charge also receives its server side ID, which has to be updated on the client. All other properties are already up to date.

Removing charge

The delete operation is very simple. Only the asset and charge ids have to be supplied to the controller. If the operation succeeds, then again the collection of charges inside the AssetViewModel has to be updated.

self.remove = function () {
 $.ajax("/../api/charges/" + self.id() + "?assetID=" + self.parent.id(), {
  type: "delete", contentType: "application/json",
  success: function (result) {
   self.isBusy(false);
   parent.charges.remove(self);
   parent.selectedCharge(null);
  }
 });
};

Charges View

The charges view is a classic master-detail view: a list of items on the left side and the detail of one of the items on the right. A table of charges is rendered using the foreach binding and the currently selected charge is rendered in a side div tag using the with binding.

<div class="row-fluid">
 <table class="table table-bordered table-condensed">
  <tbody data-bind="foreach: charges">
   <tr style="cursor: pointer;" data-bind="click: select">
    <td style="vertical-align: middle">
     <div data-bind='text: name'></div>
    </td>
    <td style="vertical-align: middle">
     <div data-bind="text: amount"></div>
    </td>
    <td style="vertical-align: middle">
     <div data-bind="text: amount"></div>
    </td>
    <td>
     <button type="submit" class="btn" data-bind="visibility: !isNew(), click:remove"><i class="icon-trash"></i></button>
    </td>
   </tr>
  </tbody>
 </table>
</div>
You can see that the click action of the table row is bound to the select method of the ChargeViewModel.

Using the KoExtensions

As you can see, there is a pie chart representing the distribution of the charges. This chart is rendered using D3.js, more specifically by a special binding from a small project of mine called KoExtensions. Rendering the graph is really simple. The only thing to do is to use the piechart binding which is part of KoExtensions. This binding takes 3 parameters: the collection of data to be rendered, a transformation function indicating which values inside the collection should be used to render the graph and, last but not least, the initialization parameters.


<div data-bind="piechart: charges, transformation:obligationToChart"></div>
function obligationToChart(d) {
 return { x: d.name(), y: d.amount() };
}
 
In order to render the graph, the KoExtensions binding needs to know which value in each collection item specifies the size of each arc in the pie chart and which value is the title. Internally these values are simply called x and y. The developer has to specify a function which returns an {x, y} pair for each item in the collection. The transformation function here uses the name and the amount values of the charge. The initialization parameters of the chart are not set, so the default ones are used.

Bootstrap style date-time picker

Bootstrap does not contain a date-time picker, nor is one on their roadmap. Luckily the community came up with a solution; I have used the one called bootstrap-datepicker.js. Since I needed to use it with Knockout, I came up with another special binding which you can find in KoExtensions. Its usage is fairly simple.

<div class="controls">
 <input type="text" data-bind="datepicker:end">
</div>
 

Binding to the map

The last usage of KoExtensions is the rendering of the map containing all the assets in the left-hand bar. I have created a binding which enables the rendering of one or more ViewModels on the map by specifying which properties contain the latitude and longitude values. Here the binding is used within a foreach binding in order to display all the apartments on the map.

<div class="row-fluid">
 <div data-bind="foreach: assets">
  <div data-bind="latitude: lat, longitude:lng, map:map, selected:selected">
  </div>
 </div>
 <div id="map" style="width: 100%; height: 300px">
 </div>
</div>
 
The map has to be initialized the usual way, as described in the official Google Maps tutorial; the binding does not take care of this. This lets the developer define the map exactly the way he likes. Any other elements can be rendered on the same map, simply by passing the same map object to other bindings. The selected property which is passed to the binding tells it which variable to update, or which function to call, when an element is selected on the map.

Knockout Validation and Bootstrap styles

One of Knockout's features which makes it a really great tool is the css binding, which gives you the ability to apply a concrete CSS style to a UI component when some condition in the ViewModel is met. One of the typical examples is giving the selected row in a table a highlight.

<tr style="cursor: pointer;" data-bind="css : {info:selected},click: select">...</tr>

Bootstrap provides ready-to-use styles for highlighting UI components such as textboxes.

Knockout-Validation is a great plugin which extends any observable value with an isValid property and enables the developer to define rules which determine the value of this property.

self.amount = ko.observable().extend({ required: true, number: true });
self.name = ko.observable().extend({ required: true });

<div class="control-group" data-bind="css : {error:!name.isValid()}">
 <label class="control-label">Name</label>
 <div class="controls">
  <input type="text" placeholder="@BasicResources.Name" data-bind="value:name">
  <span class="help-inline" data-bind="validationMessage: name"></span>
 </div>
</div>
<div class="control-group" data-bind="css : {error:!amount.isValid()}">
 <label class="control-label">@BasicResources.Amount</label>
 <div class="controls">
  <input type="text" placeholder="@BasicResources.Amount" data-bind="value: amount">
  <span class="help-inline" data-bind="validationMessage: amount"></span>
 </div>
</div>
 


By combining Bootstrap with Knockout-Validation, we can achieve a very nice effect of highlighting when the value is invalid.

What is not described in this article

I did not describe every line of code, but since the project is available on my GitHub account, you can easily examine it. There are a few interesting parts you might take a look at: JavaScript unit tests, an integration test for a WebAPI controller, and bundles to regroup and minify several JS files. Also, please note that the code is not perfect; I used it to play around, not to create a production-ready application.

Summary

I think that the frameworks which I have used are all great at what they do. RavenDB in a .NET project is extremely unobtrusive; you don't even have to think about your data storage layer. I know that this DB has much more to offer, but I did not dig into it enough to be able to talk about the performance or optimizations it provides; I will definitely check it out later.
KnockoutJS is great at UI data binding. It does not pretend to do more, but it does that perfectly. There is no better tool to declaratively define UI and behavior, and any time there is some challenging task to do, Knockout usually provides an elegant way to achieve it (like the css binding for the validation).
D3.js, even though I did not use it a lot, is very powerful. You can visualize any data any way you want. The only minus might be its size.
And Bootstrap is finally a tool which enables us to get a usable UI out in reasonable time, without having a designer at our side. This was not really possible before. Go and use them.

Tuesday, October 2, 2012

Introduction to Fakes and migration from Moles

Fakes is a new test isolation framework from Microsoft. It is inspired by and resembles Moles, a framework which I described in one of my previous blog posts. In this post I will briefly describe Fakes and then show the steps which have to be taken when migrating from Moles. You will find that the migration itself is not that complicated. Besides some changes in the structure of the project, only a few changes are needed in the code.

Code examples related to this post are available in this GitHub repository.

Fakes framework contains two constructs which can be used to isolate code:
  • Stubs – should be used to implement interfaces and stub the behavior of public methods.
  • Shims – allow mocking the behavior of ANY method, including static and private methods inside .NET assemblies.
Stubs are generated classes. For each public method in the original interface a delegate is created which holds the action that should be executed when the original method is invoked. In the case of Shims, such a delegate is generated for all methods, even private and static ones.

When using Stubs, you provide a mocked implementation of an interface to the class or method which you are testing. This is done transparently before compilation and no changes are made to the code after compilation. On the other hand, Shims use a technique called IL weaving (injection of MSIL at runtime). This way the code which should be executed is replaced at runtime by the code provided in the delegate.

The framework has caused some interesting discussions across the web. The Pluralsight blog has found some negative points, Rich Czyzewski describes how noninvasive tests can be created while using Fakes, and finally David Adsit nicely summarizes the benefits and the possible usage of Fakes. From what has been said on the net, here is a simple summary of negative and positive points.

Pros
  • Lightweight framework, since all the power of this framework is based only on generated code and delegates.
  • Allows stubbing of any method (including private and static methods).
  • No complicated syntax. Just set the expected behavior to the delegate and you are ready to go.
  • Great tool when working with legacy code.
Cons
  • The tests are based on generated code. That is to say, a phase of code generation is necessary to create the "stubs".
  • No mocking is built into the framework. There is no built-in way to test whether a method has been called. This can however be achieved by manually adding specific code inside the stubbed method (see the sketch below).
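
For instance, a hand-rolled verification with the stubs used later in this post could look like this sketch, counting the calls inside the stubbed delegate:

var createCalls = 0;
var opRepository = new StubIOperationRepository();
opRepository.CreateOperationOperation = x =>
{
    createCalls++;          // record that the method was invoked
};

// ... exercise the code under test ...

Assert.AreEqual(1, createCalls); // "verify" the call happened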

Migrating from Moles to Fakes

First a small warning: a bug was apparently introduced by the Code Contracts team which causes a crash when building a solution that uses Fakes. You will need to install the latest version of Code Contracts. (If you do not know or use Code Contracts, you should not be impacted.)

If you have already used Moles before, you might be wondering how many code changes the migration will need. To give you an idea, I have migrated the code from my previous post about Moles in order to use Fakes. Two major steps have to be taken during the migration:
  • Change the structure of the project and generate new stubs
  • Rewrite the unit tests to use the newly generated classes
To prepare the solution, we have to remove all the references to Moles as well as the .moles files which were previously used during code generation by the Moles framework. The next step is the generation of new stubs using the Fakes framework. This is as simple as it was before: open the References window and right-click the DLL for which you want to generate the stubs. Then you should be able to select "Add Fakes Assembly" from the menu.

The following images show the difference between the old and the new project structure (also note that I was using VS 2010 with Moles and I am now using VS 2012 with Fakes).

The next step is the code changes.

Rewriting code using Shims

Here is a classical example of testing a method which depends on the DateTime.Now value. The first snippet is isolated using Moles and the second contains the same test using Fakes:
[TestMethod]
[HostType("Moles")]
public void GetMessage()
{
  MDateTime.NowGet = () =>
  {
    return new DateTime(1, 1, 1);
  };
  string result = Utils.GetMessage();
  Assert.AreEqual(result, "Happy New Year!");
}

[TestMethod]
public void GetMessage()
{
 using (ShimsContext.Create())
 {
   System.Fakes.ShimDateTime.NowGet = () =>
   {
    return new DateTime(1, 1, 1);
   };
   string result = Utils.GetMessage();
   Assert.AreEqual(result, "Happy New Year!");
 }
}
The main differences:

  • Methods using Shims do not need the HostType annotation previously needed by Moles.
  • On the other hand, a ShimsContext has to be created and later disposed when the stubbing is no longer needed. The using directive provides a nice way to dispose the context right after its usage and marks the code block in which the system has "stubbed" behavior.
  • Only small changes are needed due to the different names of the generated classes.

Rewriting code which is using only Stubs


Here the situation is even easier. Besides the changes in the naming of the generated classes, no additional changes are needed to migrate the solution. The following snippet tests the "MakeTransfer" method, which takes two accounts as parameters.

The service class containing the method needs Operations and Accounts repositories to be specified in the constructor. The behavior of these repositories is stubbed. This might be typical business layer code of any CRUD application. First let's see the example using Moles.
[TestMethod]
public void TestMakeTransfer()
{
 var operationsList = new List<Operation>();

 SIOperationRepository opRepository = new SIOperationRepository();
 opRepository.CreateOperationOperation = (x) =>
 {
  operationsList.Add(x);
 };

 SIAccountRepository acRepository = new SIAccountRepository();
 acRepository.UpdateAccountAccount = (x) =>
 {
  var acc1 = _accounts.SingleOrDefault(y => y.Id == x.Id);
  acc1.Operations = x.Operations;
 };

 AccountService service = new AccountService(acRepository, opRepository);
 service.MakeTransfer(_accounts[1], _accounts[0], 200);
 Assert.AreEqual(_accounts[1].Balance, 200);
 Assert.AreEqual(_accounts[0].Balance, 100);
 Assert.AreEqual(operationsList.Count, 2);
 Assert.AreEqual(_accounts[1].Operations.Count, 2);
 Assert.AreEqual(_accounts[0].Operations.Count, 3);
}

Note the way the repository methods are stubbed. Because the stubs affect variables defined outside of them (the list of operations and the list of accounts), we can make assertions on these variables. This way we can achieve "mocking" and be sure that the CreateOperation method and the UpdateAccount method of the Operation and Account repositories have been executed. The operationsList variable in this example acts like a repository and we can easily assert whether the values in this list have changed.

Let’s see the same example using Fakes:
[TestMethod]
public void TestMakeTransfer()
{
 var operationsList = new List<Operation>();

 StubIOperationRepository opRepository = new StubIOperationRepository();
 opRepository.CreateOperationOperation = (x) =>
 {
  operationsList.Add(x);
 };

 StubIAccountRepository acRepository = new StubIAccountRepository();
 acRepository.UpdateAccountAccount = (x) =>
 {
  var acc1 = _accounts.SingleOrDefault(y => y.Id == x.Id);
  acc1.Operations = x.Operations;
 };

 AccountService service = new AccountService(acRepository, opRepository);
 service.MakeTransfer(_accounts[1], _accounts[0], 200);
 //the asserts here....
}

You can see that the code is almost identical. The only difference is in the prefix given to the stubs (SIAccountRepository becomes StubIAccountRepository). I am almost wondering whether MS could not have just kept the old names; then we would only need to change the using directive…

Fakes & Pex


One of the advantages of Moles compared to other isolation frameworks was the fact that it was supported by Pex. When Pex explores the code, it enters deep into any isolation framework which is used. Since Moles is based purely on delegates, Pex is able to dive into the delegates and generate tests according to the content inside the delegates. When using another isolation framework, Pex will try to enter the isolation framework itself, and thus will not be able to generate valid tests.

So now, when Fakes are here as a replacement for Moles, the question is whether we will be able to use Pex with Fakes. Right now it is not possible. A Pex add-on for Visual Studio 11 does not (yet) exist and I have no idea whether it ever will.

I guess Pex & Moles were not that widely adopted by the community. On the other hand, both were good tools and found their users. Personally, I would be glad if MS continued the investment into Pex and automated unit testing, though I will not necessarily use it every day in my professional projects. On the other hand, I would always consider it as an option when starting a new project.

Sunday, September 2, 2012

Reflection & Order of discovered properties

In the .NET environment, Reflection provides several methods to obtain information about any type in the type system. One of these methods is GetProperties, which retrieves a list of all the properties of a given type. This method returns an array of PropertyInfo objects.
PropertyInfo[] propListInfo = type.GetProperties();
In most cases you don't care, but the order of the properties does not have to be the same if you run this method several times. This is well described in the documentation of the method. Microsoft also states that your code should not depend on the order of the properties obtained.

I had a very nice example of a bug resulting from misuse of this method: an ObjectComparer class, dedicated to comparing two objects of the same type by recursively comparing their properties, which I inherited as legacy code on my current Silverlight project.

I noticed that the results of the comparison were not the same every time I ran it. Concretely, the first time the comparison was run on two identical objects, it always told me that the objects were not equal. Take a look at the problematic code, which I have simplified a bit for this post:
private static bool CompareObjects(object initialObj, object currentObj, IList<String> filter)
{
 string returnMessage = string.Empty;

 Type type = initialObj.GetType();
 Type type2 = currentObj.GetType();

 PropertyInfo[] propListInfo = type.GetProperties(BindingFlags.GetProperty | BindingFlags.Public | BindingFlags.Instance).Where(x => !filter.Contains(x.Name)).ToArray();
 PropertyInfo[] propListInfo1 = type2.GetProperties(BindingFlags.GetProperty | BindingFlags.Public | BindingFlags.Instance).Where(x => !filter.Contains(x.Name)).ToArray();

 //if class type is native i.e. string, int, boolean, etc.
 if (type.IsSealed == true && type.IsGenericType == false)
 {
  if (!initialObj.Equals(currentObj))
  {
   return false;
  }
 }
 else //class type is object
 {
  //loop through each property of object
  for (int count = 0; count < propListInfo.Length; count++){
    var result = CompareValues(propListInfo[count].GetValue(initialObj), propListInfo1[count].GetValue(currentObj));
   if(result == false) {
    return result;
   }
  }
 }
 return true;
}
So in order to correct this code, you have to order both arrays by MetadataToken, which is a unique identifier of each property.
propListInfo = propListInfo.OrderBy(x=>x.MetadataToken).ToArray();
propListInfo1 = propListInfo1.OrderBy(x=>x.MetadataToken).ToArray();
There is some more information about how reflection works in this blog post. The issue is that the Reflection engine holds a "cache" for each type, in which it stores the already "discovered" properties. The problem is that this cache is cleared during garbage collection. When we ask for the properties, they are served from the cache in the order in which they were discovered.

However, in my case this information does not help. The issue occurs only the first time I ask the ObjectComparer to compare the objects, and there is no reason there should be any garbage collection between the first and second run... well, no idea here. Sorting by MetadataToken has fixed the issue for me.

Wednesday, June 27, 2012

Programming languages for the age of Cloud

This post talks about the aspects which are influencing computer languages these days. We are in the age when sequential execution is over. Even your laptop has a processor with several cores. The cloud provides us with tons of machines on which we can run our code. We are in the age of distribution, parallelization, asynchronous programming and concurrency. As developers we have to deal with the challenges which arise from this new environment. Computer language scientists have worked on the subject since the seventies, and nowadays concepts which have been studied for a long time are influencing the mainstream languages. This post describes how. The motivation for this post was this panel discussion at last year's Lang.NEXT conference, where some of the greatest language architects of these days discuss what the ideal computer language should look like (Anders Hejlsberg - C#, Martin Odersky - Scala, Gilad Bracha - Newspeak, Java, Dart, and Peter Alvaro).

Web and Cloud programming

"Web and cloud programming" was the title of the mentioned panel discussion. Cloud programming is quite noncommittal term. What do we mean by "cloud programming"? Is it programming on the cloud (with the IDE in the cloud)? or programming applications for the cloud (but that can be just any web app right)? It turns out this is just a term to depict the programming in distributed environment.

Programming in distributed environment

Programming in a distributed environment is a much better term, but again it might not be completely clear. The majority of today's applications are not sequential anymore. The code flow of the program is parallel and asynchronous, and the program has to react to external events. The code and the application itself are distributed. It might be distributed over several cores or nodes, or it might just be the separation between server side and client side code. You can have code running on the backend, some other bits (maybe in a different language) running on the front, some code waiting for a response from a web service, and some other code waiting for the response of the user on the client side. You as a developer have to handle the synchronization.

We might actually say that today's web programming is distributed and asynchronous. As developers, we have to make the switch from traditional sequential programming to distributed and asynchronous code. The advent of cloud computing is forcing this transition.

Non-sequential, parallel or asynchronous code is hard to write, hard to debug and even harder to maintain. Writing asynchronous code is challenging; writing an asynchronous application in a transparent, maintainable manner might feel impossible. Just think about the global variables which you have to create to hold the information about the 'current situation', so that when a response from a web service arrives you are able to decide and take the right actions. It is this maintenance of the global state which is particularly difficult in asynchronous programming.

What are the tools which will help us with the transition to distributed, asynchronous or parallel coding?

Here is a list of 3 tools which I think might be helpful:

  • Conceptual models - As developers we can follow some conceptual model, for instance the actor model, in order to organize and architect the program.
  • Libraries - To implement one of the models (or design patterns) we can use tested and well-documented code, for instance Akka.
  • Computer languages - The biggest helper will be on the lowest level: the computer language itself.

Models, Libraries and languages

Libraries are and will be the principal tools for making developers' lives easier. There are several libraries available to help with asynchronous, event-driven programming for many different languages; Akka, Node.js or SignalR are just examples. But libraries themselves are built using languages. So the question is: what can languages bring to make life easier for developers in the age of cloud and distribution?

Modern languages are evolving along two axes, which the following sections discuss: the adoption of functional programming features, and the trade-off between dynamic and static typing.

Benefits of functional languages

Functional languages might be one of the helpers in the age of distributed computing. Some imperative languages are introducing functional aspects (for instance C# has been heading that way for a long time); others, designed from scratch, are much closer to a "pure" functional style (Scala, F#, or the purest - Haskell). Let's first define some terms and possible benefits of functional programming. From my point of view (and I admit a quite simplified point of view) there are four aspects of functional programming that are useful in everyday coding.

  • Elimination of the "global state" – the result of method is guaranteed no matter what the actual state is.
  • The ability to treat functions as first class citizens.
  • Presence of immutable distributable structures - mainly collections.
  • Lazy evaluation – since there is no global state, we can postpone the execution and evaluation of methods till the time the results are needed.

Eliminating the state

It's hard to keep shared state in parallel systems written in imperative languages. Once we leave the safe sequential execution of the language, we are never sure what the values in the actual state are. Callbacks might be executed at almost any time, depending for example on the network connection, and the values in the main state could have changed a lot since the time of the "expected execution". Purely functional programming eliminates the outer "state of the system"; the state always has to be passed locally. If we imagine such a language, all the methods defined would need an additional parameter in the signature to pass the state.

int ComputeTheTaxes(List<Income> incomes, StateOfTheWorld state);

That is really hard to imagine. So, as was said in the discussion: pure functional programming is a lie. However, we can keep this idea and apply it to some programming concerns. For instance, the immutability of collections might be seen as an application of the "no current state" paradigm.

Since the "current state" does not exist, the result of a function invoked with the same arguments should always be the same. This property is called "referential transparency" and it enables lazy evaluation. So the elimination of the global state might be seen as the precondition for using other functional language features such as lazy evaluation.
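
In C#, a related idea shows up as deferred execution: a LINQ query only describes the computation and nothing runs until the result is enumerated. A small illustration:

var numbers = new List<int> { 1, 2, 3 };

// Nothing is computed here, the query is only a description
var squares = numbers.Select(x => x * x);

numbers.Add(4);

// The computation runs now, over the current content of the list
Console.WriteLine(string.Join(",", squares)); // prints 1,4,9,16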

Function as a first class citizen

Another aspect of functional programming is the fact that functions become first class citizens. That means that they can be passed as arguments to other functions (functions taking functions are called higher order functions). This is an extremely powerful concept and you can do a lot with it. Functions can also be composed and the resulting composition applied to values. So instead of applying several consecutive functions to a collection of values, we can compose the resulting function and apply it at once. In C#, LINQ uses a form of functional composition, which will be discussed later.
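
A tiny C# illustration of both ideas, passing functions around and composing two functions into one:

Func<int, int> twice = x => x * 2;
Func<int, int> addOne = x => x + 1;

// Manual function composition: apply 'twice' first, then 'addOne'
Func<int, int> composed = x => addOne(twice(x));

Console.WriteLine(composed(5)); // 11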

Lambdas and Closures

Lambdas are anonymous functions, defined on the fly. Closures add the ability to capture the variables defined around the definition of the function. C# has had closures and lambdas since version 3.0; they should finally arrive in Java in version 8. As for JavaScript, it just seems that they have always been there. Any time you define an anonymous function directly in the code, you can use the variables from the current scope in your function; hence you are creating a closure.

var a = 10;
var input = 5
request(input, function(x) { 
   a = x;
});

Any time you use a variable from the outer scope in the inner anonymous function, we say that the variable is captured.

Closures are also available in Python, and since C++11 they are even available in C++. Let's stop for a while here, because C++ adds the ability to distinguish between variables captured by reference and variables captured by value. The syntax for lambdas in C++ is a bit more complicated, but it allows the developer to specify for each variable how it should be captured. In the following code the v1 variable is captured by value and all the other variables are captured by reference. So the value of v2 will depend on what happened before the lambda actually executes.

int v1 = 10;
int v2 = 5;
for_each( v.begin(), v.end(), [&, v1] (int val)
{
    cout << val + v2 - v1;
});

You can see that even such an old-school imperative language as C++ has been influenced and modified to embrace functional programming.

Closures add the ability to use the state from the moment the anonymous function was defined as the state used during its execution.
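
In C# (as in JavaScript) the variable itself is captured, not a copy of its value, so the closure observes later modifications, much like the by-reference capture in the C++ example above:

int a = 10;
Func<int> read = () => a;    // the variable 'a' is captured by the closure

a = 42;
Console.WriteLine(read());   // prints 42 - the closure sees the current value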

Collections in functional programming

In (pure) functional programming languages, collections are immutable. Instead of modifying the collection, a copy of it is returned by each operation performed on it. It is up to the designers of the language to make the compiler reuse as much of the existing collection as possible in order to lower memory consumption. This allows the programmer to write a computation as a series of transformations over collections. Moreover, thanks to lambdas and closures, these transformations may be defined on the fly. Here is a short example:

cars.Where(x => x.Mark == "Citroen").Select(x => x.MaxSpeed);

This transformation will return an iterator over the speeds of all Citroens in the collection. Here I am using C#/F# syntax; however, almost the same code would compile in Scala.

The selector ("Where") and the mapper ("Select") both take as an argument a function which receives an item of the collection. In the case of the selector the function is a predicate which returns "true" or "false"; in the case of the mapper, the function just returns a new object. Thanks to lambdas we can define both of them on the fly.

Language integrated data queries

Lazy evaluation also comes from functional languages. Since the result of a function does not depend on the "state of the world", it does not matter when any given statement or computation is executed. The designers of C# drew on this while creating LINQ. LINQ enables the translation of the above presented chain of transformations into another domain specific language. Since lazy evaluation is used, each transformation is not performed separately; instead a form of “functional composition” is applied and the result is computed only once it is needed. If the ‘cars’ collection were an abstraction over a relational database table, the chain would be translated into “select maxSpeed from cars where mark=’Citroen’” instead of two queries (one for each function call).

Internally, LINQ translates the C# query (the dotted pipeline of methods) into an expression tree. The tree is then analyzed and translated into a domain specific language (such as SQL). Expression trees are a way to represent code (computation) as data in C#. So in order to integrate the LINQ magic into the language, the language needs to support functions as first class citizens and also has to be able to treat code as data.
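
A minimal sketch of the difference between executable code and code as data (the Car class is again only illustrative):

// a compiled delegate - executable code, opaque to the caller
Func<Car, bool> compiled = car => car.Mark == "Citroen";

// the same lambda captured as an expression tree - data that can be inspected
Expression<Func<Car, bool>> tree = car => car.Mark == "Citroen";

// a LINQ provider can walk the tree and translate it, for example into SQL
var comparison = (BinaryExpression)tree.Body;
Console.WriteLine(comparison.NodeType); // Equal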

Maybe as you read this you are thinking about JavaScript, and you are right. In JavaScript you can pass around functions and you can also pass around code and later execute it (the eval function). No wonder that there are several implementations of LINQ for JavaScript.

A similar concept inspired some Scala developers, and since Scala possesses the necessary language features, we might see similar concepts in Scala as well (ScalaQL).

Dynamic or Typed languages

What are the benefits of dynamic versus strongly typed languages? Let's look for the answer in everyday coding: coding in a dynamic language is, at least at the beginning, much faster than in a typed language. There is no need to declare the structure of an object before using it, and no need to declare the type of simple values or objects. The type is simply determined on the first assignment.

What works great for small programs might get problematic for larger ones. The biggest advantage of a type system in a language is the fact that it is safer. It won't let you assign apples to oranges. It eliminates many of the errors, such as calling a non-existent method on a type.

The other advantage is the tooling which comes with the language, auto-completion (code completion based on the knowledge of the type system) being one example of such a tool. The editor is able to propose the correct types, methods, or properties. The type structure can also be used for analysis or later processing. For instance, documentation might easily be generated from the type system just by adding certain metadata.

Several languages offer compromises between the strongly typed (safe) and the dynamic (flexible) world. In C# we can use the dynamic keyword to postpone the determination of an object's type to runtime. DART offers an optional type system. Optional type systems let us use the tooling without polluting our lives with too much typing exercise. This comes in handy sometimes.
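
A small illustration of the dynamic keyword:

dynamic value = "hello";           // no static type - the runtime type is string
Console.WriteLine(value.Length);   // member lookup is resolved at runtime, prints 5

value = 42;                        // the same variable can later hold an int
// Console.WriteLine(value.Length); // would still compile, but fail at runtime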

JavaScript as the omnipresent language


JavaScript is everywhere and lately (with Node.js and MS investing heavily into it) it is drawing more and more attention. The language has some nice features: it treats functions as first class citizens, it supports closures, and it is dynamic, but it has one big drawback: it absolutely lacks any structure. Well, it is not a typed language, so we cannot expect any structure in a type system, but it also lacks any modularization.

Objects are defined as functions or 'just on the fly'. And there is always this giant current state which holds all the variables and everything, and which gets propagated everywhere. I still have not learned how to write well-structured JS programs, and there are too many concepts in JavaScript which I do not understand completely. As has been said in the discussions: you can probably write big programs in JavaScript, but you cannot maintain them. That is why Google is working on DART. The future version of ECMAScript will try to solve the problems of JavaScript by bringing a module system, classes, and static typing. But the big question will of course be adoption by the browsers.

Summary

  • Libraries will always be the core pieces enabling us to write distributed software.
  • Languages should be designed in a way that minimizes state and controls 'purity' – functional languages are well studied, and concepts coming from them will become omnipresent in everyday programming.
  • Type systems should be there when we need them and should get out of our way when we don't.
The future might be interesting. Lately I have been forced to write a lot of Java code and interact with some legacy code (< 1.5), and besides the typing exercise it does not provide me any benefits. It just bores me. Well, I am a fan of C#, because the authors of C# seem to look for interesting features and concepts in other languages (closures, expression trees, dynamic typing, or later incorporating the asynchronous model directly into the language) and introduce them into a well established, statically typed, compiled (and for me well known) world.

But whether it is Scala, Python, Dart, JavaScript or C#/F#, I think we should be trying to adopt modern languages as fast as possible, and for just one reason: to express more with less code.

středa 2. května 2012

Mocking the generic repository

This post describes one way to mock the generic repository. It assumes that you are familiar with the Service <-> Repository <-> Database architecture.
Another prerequisite is knowledge of the repository pattern and its generic variant.

In the majority of my projects I am using the following generic repository class.
public interface IRepository
{
 T Load<T>(object id);
 T Get<T>(object id);
 IEnumerable<T> Find<T>(Expression<Func<T, bool>> matchingCriteria);
 IEnumerable<T> GetAll<T>();
 void Save<T>(T obj);
 void Update<T>(T obj);
 void Delete<T>(T obj);
 void Flush();
 int CountAll<T>();
 void Evict<T>(T obj);
 void Refresh<T>(T obj);
 void Clear();
 void SaveOrUpdate<T>(T obj);
}

Based on this technique, some people decide to implement concrete classes of this interface (CarRepository : IRepository), whereas others keep using the generic implementation. That depends on the ORM that you are using. With EF and NHibernate you can easily implement the generic variant of the repository (check the links).
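
For illustration, a minimal sketch of what the generic variant can look like on top of NHibernate's ISession (only a few members shown, the rest of IRepository is omitted):

public class NHibernateRepository : IRepository
{
 private readonly ISession _session;

 public NHibernateRepository(ISession session)
 {
  _session = session;
 }

 public T Get<T>(object id)
 {
  return _session.Get<T>(id);
 }

 public IEnumerable<T> Find<T>(Expression<Func<T, bool>> matchingCriteria)
 {
  // requires NHibernate.Linq; the expression is translated into SQL by the provider
  return _session.Query<T>().Where(matchingCriteria);
 }

 public void Save<T>(T obj)
 {
  _session.Save(obj);
 }

 // ... remaining IRepository members omitted for brevity
}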

I am also using the generic variant (mostly with NHibernate). Now the question is: how to mock this generic repository? It can be a bit tricky. When you have one class for each repository which works with one concrete type, you can mock the repository quite easily. For example a StudentRepository which handles entities of type Student might be backed by a list of students.

When working with the generic repository, it is a bit harder. Here is how I have solved the problem:
public class MockedRepository : IRepository
{
 // maps each entity type to the in-memory collection which backs it
 private readonly Dictionary<Type, object> dataDictionary;

 public MockedRepository()
 {
  var cities = DeserializeList<City>("CityDto");
  var stations = DeserializeList<Station>("StationDto");
  var tips = DeserializeList<InformationTip>("InformationTipDto");
  var countries = DeserializeList<Country>("CountryDto");

  dataDictionary = new Dictionary<Type, object>();
  dataDictionary.Add(typeof(City), cities);
  dataDictionary.Add(typeof(Station), stations);
  dataDictionary.Add(typeof(InformationTip), tips);
  dataDictionary.Add(typeof(Country), countries);
 }

 public T Get<T>(object id)
 {
  Type type = typeof(T);
  var data = dataDictionary[type];
  IEnumerable<T> list = (IEnumerable<T>)data;
  var idProperty = type.GetProperty("Id");
  return list.FirstOrDefault(x=>(int)idProperty.GetValue(x,null) == (int)id);
 }

 public IEnumerable<T> Find<T>(Expression<Func<T, bool>> matchingCriteria)
 {
  Type type = typeof(T);
  var data = dataDictionary[type];
  IEnumerable<T> list = (IEnumerable<T>)data;
  var matchFunction = matchingCriteria.Compile();
  return list.Where(matchFunction);
 }

 public IEnumerable<T> GetAll<T>()
 {
  Type type = typeof(T);
  return (IEnumerable<T>)dataDictionary[type];
 }

 public void Save<T>(T obj)
 {
  Type type = typeof(T);
  List<T> data = (List<T>)dataDictionary[type];
  data.Add(obj);
 }
}
The main building block of this mocked repository is a dictionary which contains, for each type in the repository, the enumerable collection of objects of that type. Each method in the mocked repository can use this dictionary to determine which collection is addressed by the call (by using the generic type parameter T):
Type type = typeof(T);
var data = dataDictionary[type];
IEnumerable<T> list = (IEnumerable<T>)data;
What to do next depends on each method. I have shown here only the methods which I needed to mock, but the other ones should not be harder. The most interesting is the Find method, which takes the matching criteria as a parameter. In order to pass the criteria to the Where method on the collection, the criteria (represented by an Expression) has to be compiled into a predicate Func (in other words a function which takes an object of type T and returns a boolean value).

The Get method also has some hidden complexity. In this implementation I assume that there is an Id property defined on the object of type T. I use reflection to obtain the value of that property, and the whole thing happens inside a LINQ statement.

This repository might be useful, but it is definitely not the only way to isolate your database. So the question is: should this be the method to isolate my unit or integration tests? Let's take a look at the other possible options:

  • Use a mocking framework (there is quite a choice here)
    This essentially means that in each of your tests you define the behaviour of the repository class. It requires you to write a mock for each repository method that is called inside the tested service method, so it means more code to write. On the other hand you control the behaviour needed for the particular tested method, and a mocking framework also gives you the option to verify that methods have been called (a sketch follows after this list).
  • Use the repository implementation and point it to an in-memory database (SQLite). That is a good option in the case when:
    • You are able to populate the database with the data.
    • You are sure of your repository implementation
  • Use the generic repository mock presented here. That is not a bad option if you have some way to populate the collections which serve as the in-memory database. I have used deserialization from JSON. Another option could be to use a framework such as AutoPoco to generate the data. You can also create one repository which can be used for the whole test suite (or an application demo).
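
For comparison, here is a rough sketch of the first option using Moq (any other mocking framework would look similar); the CityService class and its GetAllCities method are made up for the example, while the Setup/Verify calls are the standard Moq API:

var repository = new Mock<IRepository>();
repository.Setup(r => r.GetAll<City>()).Returns(new List<City> { new City() });

// 'CityService' is a hypothetical service depending on IRepository
var service = new CityService(repository.Object);
var cities = service.GetAllCities();

// unlike a simple stub, the mock can verify that the method was actually called
repository.Verify(r => r.GetAll<City>(), Times.Once());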

Summary

As said before, this might be a variant to consider. I am using it for proofs of concept and portable versions of database-based applications. On the other hand, for unit tests you might consider either a mocking framework or an in-memory database. There is no clear winner in this comparison.

sobota 21. dubna 2012

Common.Logging and compatibility with other libraries

This is the second time that I have run into the issue of correctly configuring Common.Logging on a project. So what is the problem? Let's start with the basics:

Common.Logging is meant to be a generic interface for logging which can be used by other frameworks and libraries to perform logging. The final user (you or me) uses several frameworks in his application, and if each of these frameworks used a different logging framework, it would turn into a configuration nightmare. So our favorite frameworks such as Spring.NET and Quartz.NET use Common.Logging. This interface in turn uses a concrete logging framework to perform the logging (the act of writing the log lines somewhere).
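
For the application code the usage stays the same regardless of the concrete backend; a minimal sketch (the TransferService class is made up):

using Common.Logging;

public class TransferService
{
 // the logger is obtained through the Common.Logging facade;
 // which concrete framework does the writing is decided purely by configuration
 private static readonly ILog Log = LogManager.GetLogger(typeof(TransferService));

 public void MakeTransfer()
 {
  Log.Debug("Starting the transfer");
 }
}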

A typical scenario is for instance the Common.Logging and Log4Net combination. In our application configuration file (web.config or app.config) we have to configure Common.Logging to use Log4Net, and then we can continue with the Log4Net configuration specifying what should be logged.

<common>
<logging>
  <factoryAdapter type="Common.Logging.Log4Net.Log4NetLoggerFactoryAdapter, Common.Logging.Log4Net">
 <arg key="configType" value="INLINE" />
  </factoryAdapter>
</logging>
</common>

<log4net>
<appender name="ConsoleAppender" type="log4net.Appender.ConsoleAppender">
  <layout type="log4net.Layout.PatternLayout">
 <conversionPattern value="%date %-5level %logger - %message%newline"/>
  </layout>
</appender>
</log4net>

My general problem is that the Common.Logging.Log4Net facade is looking for a concrete version of the Log4Net library, specifically 'log4net (= 1.2.10)'. That is not a problem unless you are using some other framework which depends on a higher version of Log4Net.
In my case the le_log4net library (the Logentries library) uses log4net 2.0. So if you are using NuGet, you might obtain the following exception while adding the references:


[Image: NuGet reporting the log4net version conflict]


A similar thing might happen if you just decide to use the latest Log4Net by default. Then you might get an exception when initializing the Spring.NET context or starting the Quartz.NET scheduler:


Could not load file or assembly 'log4net, Version=1.2.0.30714, Culture=neutral, PublicKeyToken=b32731d11ce58905' or one of its dependencies. The located assembly's manifest definition does not match the assembly reference. (Exception from HRESULT: 0x80131040)


Solution 1: Ignore NuGet, define Runtime Binding


One way to get around this is to define a runtime assembly binding. But this solution forces you to add the reference to log4net manually, because NuGet controls the version and won't let you swap references on the fly the way you would like. So to get over it, add the latest Common.Logging.Log4Net façade and Log4Net version 2 (which you need for some reason). Then you have to define the assembly binding in the configuration file.

<runtime>
<assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
  <dependentAssembly>
 <assemblyIdentity name="Common.Logging" publicKeyToken="af08829b84f0328e"/>
 <bindingRedirect oldVersion="1.2.0.0" newVersion="2.0.0.0"/>
  </dependentAssembly>
</assemblyBinding>
</runtime>

Solution 2: Just use the older version of Log4Net (1.2.10)


If you do not have libraries that depend on Log4Net version 2.0.0, then just remember to always use log4net 1.2.10. This is the version which Common.Logging.Log4Net is looking for. Or just let NuGet manage it for you: you can add Common.Logging.Log4Net via NuGet and it will automatically load the correct version of Log4Net.


Solution 3: Try other logging library for instance NLog


This actually is not a real solution. I have experienced similar issues while using NLog; concretely, try to use the latest NLog library with the Common.Logging.NLog façade and you will obtain something similar to:


{"Could not load file or assembly 'NLog, Version=1.0.0.505, Culture=neutral, PublicKeyToken=5120e14c03d0593c' or one of its dependencies. The located assembly's manifest definition does not match the assembly reference. (Exception from HRESULT: 0x80131040)":"NLog, Version=1.0.0.505, Culture=neutral, PublicKeyToken=5120e14c03d0593c"}


The solution here is similar, you will have to define Runtime Binding:

<runtime>
<assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
  <dependentAssembly>
 <assemblyIdentity name="NLog" publicKeyToken="5120e14c03d0593c" culture="neutral" />
 <bindingRedirect oldVersion="0.0.0.0-2.0.0.0" newVersion="2.0.0.0" />
  </dependentAssembly>
</assemblyBinding>
</runtime>

What was interesting here is that NuGet actually took care of this for me. I just added the Common.Logging.NLog façade and I guess NuGet spotted that I already had NLog 2 and that this runtime binding was necessary. If you look at the documentation of bindingRedirect you will see that we can specify a range of versions in the oldVersion attribute. Here all the versions will be bound to the 2.0.0.0 version.


Summary


Anyway, NLog and Log4Net are both cool logging frameworks, just use the one you prefer. As I have shown above, it is possible to use them together with Common.Logging; it just takes a few more lines to configure it correctly.

pondělí 9. ledna 2012

NHibernate NFluent and custom HiLo generator

Azure SQL is not completely compatible with SQL Server. All the limitations are described over here. One of the limitations is that every table in Azure SQL needs a CLUSTERED INDEX.

If you are using NHibernate & NFluent, then any identity mapping will create a clustered index if it can.

If you want to use the HiLo generator to get the IDs, then you need to configure a special table for the generator. To use the generator you can let NHibernate create the table.
Id(x => x.Id).GeneratedBy.HiLo("1000");
However this way it will create only one table with one ID. In a typical scenario you will want to use one table and store the actual ID for each entity in a particular row or column.
Id(x => x.Id).GeneratedBy.HiLo("1000","hiloTable","myentity");
This assumes that you have a table called "hiloTable" which contains a "myentity" column.

However you would have to write the script for the table creation yourself, so you are losing the possibility to let NHibernate generate your database.

The solution which solves these two issues is to create your own generator and base it on the HiLo generator.
Here is the mapping for using the custom generator:
Id(x => x.Id).GeneratedBy.Custom<UniversalHiloGenerator>(
x => x.AddParam("table", "NH_HiLo")
.AddParam("column", "NextHi")
.AddParam("maxLo", "10000")
.AddParam("where", "TableKey='BalancePoint'"));

When overriding NHibernate.Id.TableHiLoGenerator we have the option to override the script which is used for the creation of the table containing the IDs. This can be achieved by overriding the SqlCreateStrings method, which returns an array of strings that are executed as SQL scripts against the database.

public class UniversalHiloGenerator : NHibernate.Id.TableHiLoGenerator
{
 public override string[] SqlCreateStrings(NHibernate.Dialect.Dialect dialect)
 {
  List<string> commands = new List<string>();
  var dialectName = dialect.ToString();

  // SQLite has no DROP TABLE IF EXISTS in this form and no clustered indexes
  if (dialectName != "NHibernate.Dialect.SQLiteDialect")
   commands.Add("IF OBJECT_ID('dbo.NH_HiLo', 'U') IS NOT NULL \n DROP TABLE dbo.NH_HiLo; \nGO");

  commands.Add("CREATE TABLE NH_HiLo (TableKey varchar(50), NextHi int)");

  if (dialectName != "NHibernate.Dialect.SQLiteDialect")
   commands.Add("CREATE CLUSTERED INDEX NH_HiLoIndex ON NH_HiLo (TableKey)");

  string[] tables = { "Operation", "Account" };

  var returnArray = commands.Concat(GetInserts(tables)).ToArray();
  return returnArray;
 }

 private IEnumerable<string> GetInserts(string[] tables)
 {
  foreach (var table in tables)
  {
   yield return String.Format("insert into NH_HiLo values ('{0}',1)", table);
  }
 }
}

This code is quite simple. The SQL scripts create the table for storing the IDs for all the entities in the database. In this particular case, each row of the HiLo table has two columns: one specifying the name of the table for which the ID is stored, and the other holding the next ID.

The code also checks the dialect of the database. This way it can create a CLUSTERED index on the table (which runs fine for SQL Server and Azure SQL, and is REQUIRED for Azure) and will skip the creation of the index for SQLite, where clustered indexes do not exist.

In the example above two entities are envisaged: Operations and Accounts, stored in separate tables.

This way several issues are solved:
  • The schema of the database can be created automatically by NHibernate.
  • The HiLo table is created with a row for each entity. To add an entity you simply add its name to the list of tables.
  • A clustered index is created on the HiLo table when the script is not run against SQLite.

sobota 12. listopadu 2011

Universal Naive Bayes Classifier for C#

This post is dedicated to describing the internal structure and the possible use of a Naive Bayes classifier implemented in C#.

I was searching for a machine learning library for C#, something that would be an equivalent of what WEKA is for Java. I found machine.codeplex.com, but it did not include Bayesian classification (the one I was interested in), so I decided to implement it and add it to the library.

How to use it

One of the aims of machine.codeplex.com is to allow the users to use simple POCOs for the classification. This can be achieved by using C# attributes. Take a look at the following example which handles the categorization of payments, based on two features: Amount and Description.
First, this is the Payment POCO object with added attributes:
public class Payment
{
    [StringFeature(SplitType = StringType.Word)]
    public String Description { get; set; }

    [Feature]
    public Decimal Amount { get; set; }

    [Label]
    public String Category { get; set; }
}
And here is how to train the Naive Bayes classifier using a set of payments and then classify a new payment.
var data = Payment.GetData();            
NaiveBayesModel<Payment> model = new NaiveBayesModel<Payment>();
var predictor = model.Generate(data);
var item = predictor.Predict(new Payment { Amount = 110, Description = "SPORT SF - PARIS 18 Rue Fleurus" });

After the execution the item.Category property should be set to a value based on the analysis of the previously supplied payments.

About Naive Bayes classifier

This is just a small and simplified introduction; refer to the Wikipedia article for more details about Bayesian classification.

Naive Bayes is a very simple classifier which is based on the premise that all the features (or characteristics) of the classified items are independent. This is not really true in real life, and that is why the model is called naive.
The probability of an item with features F1, F2, F3 being of category "C1" is proportional to:

p(C1|F1,F2,F3) ~ P(C1)*P(F1|C1)*P(F2|C1)*P(F3|C1)

Where P(C1) is the a priori probability of an item being of category C1, and P(F1|C1) is the conditional probability of observing feature F1 on an item of category C1 (in the code below I call these values the Posteriori probabilities).
That is simple for binary features (like "Tall", "Rich"...). For example p(Tall|UngulateAnimal) = 0.8 says that the probability of an animal being tall is 0.8, given that it is an ungulate.

If we have continuous features (just like the "Amount" in the payment example), the Posteriori probability is expressed slightly differently. For example P(Amount=123|Household) = 0.4 can be translated as: the probability (density) of the amount being 123$ is 0.4, given that the payment is one of my household payments.

When we classify, we compute this score for each category (or class if you want) and we select the category with the maximal value. We thus have to iterate over all the categories and all the features of each item and multiply the probabilities to obtain the score of the item for each class.

How it works inside

After calling the Generate method on the model, a NaiveBayesPredictor class is created. This class contains the Predict method to classify new objects.
My model can work with three types of features (or characteristics, or properties):
  • String properties. These properties have to be converted to binary vectors based on the words which they contain. The classifier builds a list of all the words existing in the set, and then each String feature can be represented as a set of binary features. For example if the bag of all words contains four words (Hello, World, Is, Cool), then the vector [0,1,0,1] represents the text "World Cool" (see the sketch after this list).
  • Binary properties. Simple true or false properties.
  • Continuous properties. By default these are Double or Decimal values, but the list could be extended to other types.
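
A rough sketch of the string-to-binary conversion (in a real run the word list would be built from the whole training set):

string[] allWords = { "Hello", "World", "Is", "Cool" };

// "World Cool" becomes the binary vector [0, 1, 0, 1]
var tokens = "World Cool".Split(' ');
int[] vector = allWords.Select(word => tokens.Contains(word) ? 1 : 0).ToArray();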
After converting the String features to binary features, we have two types of features:
  • Binary features
  • Continuous features
As mentioned in the introduction, for each feature of an item we have to compute the a priori and posteriori probabilities. The following pseudocode shows how to estimate their values. I use array-like notation, just because I have used arrays in the implementation as well.

Apriori probability

The computation of the Apriori probability is the same for both types of features.

Apriori[i] = #ItemsOfCategory[i] / #Items

Posteriori probability

The Posteriori probability for binary features is estimated as:

Posteriori[i][j] = #ItemsHavingFeature[j]AndCategory[i] / #ItemsOfCategory[i]

And the Posteriori probability for continuous features:

Posteriori[i][j] = Normal(Avg[i][j],Variance[i][j],value)

Where Normal refers to the normal probability density. Avg[i][j] is the average value of feature "j" for items of category "i", and Variance[i][j] is the variance of feature "j" for items of category "i".
If we want to know the probability of a payment with Amount=123 being of category "Food", and we have the average amount of all payments of that category, let's say Avg[Food][Amount] = 80, and the variance Variance[Food][Amount] = 24, then the posteriori probability will be equal to Normal(80, 24, 123).
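
The normal density itself is only a few lines of code; here is a sketch of what such a Gauss helper might look like (using the argument order value, mean, variance as in the Predict code below, and assuming the third parameter is the variance rather than the standard deviation):

public static double Gauss(double value, double mean, double variance)
{
    // density of the normal distribution N(mean, variance) evaluated at 'value'
    double exponent = -Math.Pow(value - mean, 2) / (2 * variance);
    return Math.Exp(exponent) / Math.Sqrt(2 * Math.PI * variance);
}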

What does the classifier need?

The answer to this question is quite simple: we need at least 4 structures. Their meaning should be clear from the previous explanation.

public double[][] Posteriori { get; set; }
public double[] Apriori { get; set; }
public double[][] CategoryFeatureAvg { get; set; }
public double[][] CategoryFeatureVariance { get; set; }

And how does it classify?

As said before, the classification is a loop over all the categories in the set. For each category we compute the probability by multiplying the a priori probability with the posteriori probability of each feature. As we have two types of features, the computation differs for each of them. Take a look at this quite simplified code:

public T Predict(T item)
{
  var values = ToFeatureVector(item); // the item represented as a vector of feature values (helper not shown)
  double maxProbability = 0;
  int maxCategory = 0;

  foreach (var category in Categories)
  {
      // start with the a priori probability of the category
      var probability = Apriori[category];

      for (var j = 0; j < Features.Count; j++)
      {
          var feature = Features[j];

          if (NaiveBayesModel<T>.ContinuesTypes.Contains(feature.Type))
          {
              // continuous feature: use the normal distribution estimated for this category
              var normalProbability = Helper.Gauss(values[j], CategoryFeatureAvg[category][j], CategoryFeatureVariance[category][j]);
              probability = probability * normalProbability;
          }

          if (feature.Type == typeof(bool)) // String properties are converted to binary features as well
          {
              probability = probability * Posteriori[category][j];
          }
      }

      if (probability > maxProbability)
      {
          maxProbability = probability;
          maxCategory = category;
      }
  }

  item.SetValue(maxCategory);
  return item;
}


That's all there is to it. Once you understand that we need just 4 arrays, it is just a question of how to fill them. That is not hard (it should be clear from the previous explanation), but it takes some plumbing and looping over all the items in the learning collection.
If you would like to see the Source Code - check my fork machine.codeplex.com.

úterý 26. dubna 2011

Pex & Moles - Testing business layer

The question is fairly simple: Should I use Pex to generate unit tests for my business layer?

Code examples related to this post are available at this GitHub repository.

In this post I would like to cover two parts:
  • Pex and Moles basics - just a quick overview, because this is covered by other blogs and by the official documentation.
  • Using Pex to test the business layer - I have been struggling to find a pattern for using Pex to generate unit tests for the business layers of my applications. The problem is that there are quite a lot of samples which explain the basic and advanced aspects of Pex, but not that many examples which show you how to use Pex in real life (putting aside the ambiguous definition of what real life is :).

Pex and Moles basics

Pex is a testing tool which helps you generate unit tests. Moles is a framework which enables you to isolate the parts being tested from the other application layers.

Pex basics

Pex is a tool which can help you generate inputs for your unit tests. To use Pex you have to write Parametrized Unit Tests. Parametrized Unit Tests are simple tests which accept parameters, and Pex can help you generate these parameters.

Let's take a look at a first example; here is a simple method which you would like to test:
public static string SomeDumbMethod(int i, int j)
{
 if (i > j )
 {
  if (j == 12)
   return "output1";
  else
   return "output2";
 }
 else
 {
  return "output3";
 }
}
To test this method, you should write at least 3 unit tests in order to cover all the branches of the method and thus all the possible outputs (that is not a generic rule). But instead of that we will write a unit test which accepts the possible inputs as parameters.
[PexClass(typeof(Utils))]
[TestClass]
public partial class UtilsTest
{
    [PexMethod]
    public string SomeDumbMethod(int i, int j)
    {
         string result = Utils.SomeDumbMethod(i, j);
         return result;
    }
}
I have decorated the method with the PexMethod attribute and the class with the PexClass attribute; this way Pex knows that this class is used to generate unit tests. Now, to ask Pex to generate the inputs, right-click on the body of the method and select Run Pex Explorations. Pex will generate 3 unit tests, which you can review in the Pex window.

How does Pex work

Pex analyzes your code to determine which inputs will achieve the maximal coverage of the exposed method. Pex does not randomly pick values to use as inputs; instead it uses a constraint solver (the MS Research Z3 project) to determine which parameter values will satisfy the conditions leading into a not-yet explored branch of code.
The main strength of Pex is above all the ability to generate parameters which cover all the branches of the tested method.

Moles basics

Moles is a stubbing framework. It allows you to isolate the parts of the code which you want to test from the other layers. Several other stubbing or mocking frameworks (RhinoMock, NMock) are out there, free or not, so the question is: what is the advantage of Moles?
There are basically two reasons to use Moles:
  • Moles works great with Pex. Because Pex explores the execution tree of your code, it also tries to enter all the mocking frameworks which you might use. This can be problematic, since Pex will generate inputs which cause exceptions inside the mocking frameworks. By contrast, Moles generates simple stubs of classes containing delegates for each method, which are completely customizable and transparent.
  • Moles allows you to stub static classes, including the ones of the .NET framework which are usually problematic to mock (typically DateTime, File, etc.).
As it says on the official web: "Moles allows you to replace any .NET method by delegate". So before writing your unit test, you can ask Moles to generate the needed stubs for any assembly (yours or another) and then use these moles in your tests.
Instead of complicated descriptions, here is a simple method which checks the current date and outputs a string based on it:
public static String GetMessage()
{
 if (DateTime.Now.DayOfYear == 1)
 {
  return "Happy New Year!";
 }
 return "Just a normal day!";
}
Now to test this method, we need to be able to control the output of the static DateTime.Now property. Moles will help us achieve this. In the following test method I use MDateTime, which is a mole for the DateTime class; it allows me to set the NowGet delegate, which gets called when DateTime.Now is asked for. To be able to use MDateTime you have to add the moles assemblies by right-clicking the References in your project.
After that you can write your method as follows:
[PexMethod]
public string GetMessage(bool newyear)
{
 MDateTime.NowGet = () =>
  {
   if (newyear)
   {
    return new DateTime(1,1,1);
   }
   return new DateTime(2,2,2);
  };

 string result = Utils.GetMessage();
 return result;
}
Note that here I am using Pex to play around a bit. I want to test both branches of my method. The only way Pex can influence the executed branch is by generating parameters, so I add a bool parameter to the test method, which I ask Pex to generate. Here is the result which I get:
This was a particular case, but the approach is always the same. When stubs are needed for a certain assembly, you can always generate them by right-clicking the reference and selecting Add moles assembly. Then you can use these stubs like any other classes in your test methods.

Use Pex to test business layer

So you are probably thinking that all that is nice, but does it really help in real projects? That is what I sometimes think as well, so here I would like to present an attempt to use Pex to test the business layer of a typical bank application. This application uses the Repository pattern: service classes which provide the business methods (like MakeTransfer etc.) use repositories to access the database (or any other data source).

In this example I introduce an AccountService class, which depends on two repositories: AccountRepository and OperationRepository. Here are the definitions of the repositories:
public interface IOperationRepository
{
 void CreateOperation(Operation o);
}

public interface IAccountRepository
{
 void CreateAccount(Account account);
 Account GetAccount(int id);
 void UpdateAccount(Account account);
}
The actual implementations of these repositories are not important, since I want to test just the AccountService class, which depends on these two repositories. To test just the AccountService class I will mock the repositories (more about that later).
Here is the AccountService class:
public class AccountService
{
 private IAccountRepository _accountRepository;
 private IOperationRepository _operationRepository;

 public AccountService(IAccountRepository accountRepository, IOperationRepository operationRepository)
 {
  _accountRepository = accountRepository;
  _operationRepository = operationRepository;
 }
    public void MakeTransfer(){ ... }
    public IList<Operation> GetOperationsForAccount() {...}
    public decimal ComputeInterest(Account account, double rate) { ... }
}
AccountService has three methods to test:
  • MakeTransfer
  • ComputeInterest
  • GetOperationsForAccount
Now to test these methods we have to stub or mock OperationRepository and AccountRepository.
Let's start with MakeTransfer method.
public void MakeTransfer(Account creditAccount, Account debitAccount, decimal amount)
{
 if (creditAccount == null)
 {
  throw new AccountServiceException("creditAccount null");
 }

 if (debitAccount == null)
 {
  throw new AccountServiceException("debitAccount null");
 }

 if (debitAccount.Balance < amount && debitAccount.AutorizeOverdraft == false)
 {
  throw new AccountServiceException("not enough money");
 }

 Operation creditOperation = new Operation() { Amount = amount, Direction = Direction.Credit};
 Operation debitOperation = new Operation() { Amount = amount, Direction = Direction.Debit };

 creditAccount.Operations.Add(creditOperation);
 debitAccount.Operations.Add(debitOperation);

 creditAccount.Balance += amount;
 debitAccount.Balance -= amount;


 _operationRepository.CreateOperation(creditOperation);
 _operationRepository.CreateOperation(debitOperation);

 _accountRepository.UpdateAccount(creditAccount);
 _accountRepository.UpdateAccount(debitAccount);
}
This method calls the CreateOperation method of OperationRepository and the UpdateAccount method of AccountRepository. Neither of these two methods returns a value, so in the unit test you do not have to define their exact behavior; you can provide simple stubs generated by Moles to the constructor of the AccountService class.
In the following example SIAccountRepository and SIOperationRepository are stubs generated by Moles.
[PexMethod, PexAllowedException("SimpleBank", "SimpleBank.AccountServiceException")]
public void MakeTransfer(Account creditAccount,Account debitAccount,decimal amount)
{
 SIAccountRepository accountRepository = new SIAccountRepository();
 SIOperationRepository operationRepository = new SIOperationRepository();
 AccountService service = new AccountService(accountRepository, operationRepository);
 service.MakeTransfer(creditAccount, debitAccount, amount);
}
Let's take a look at Pex's output after running the Pex test.
That is not bad: Pex generated 6 unit tests for me, which I would normally have to write, and it also discovered an Overflow exception which I did not cover in my code. What might be missing is the possibility to verify that the Update/Create method of each repository was called. In other words, we are limited by the fact that Moles can generate only stubs, which are not able to verify that a method was executed the way mocks would. If we wish to check whether the methods were called, we have to implement this on our own.
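
One way to implement it ourselves is to record the call inside the stub's delegate and assert on it afterwards. A rough sketch (the UpdateAccountAccount delegate name is an assumption following the same Moles naming convention as the GetAccountInt32 delegate used below):

[TestMethod]
public void MakeTransfer_UpdatesBothAccounts()
{
 int updateCalls = 0;

 SIAccountRepository accountRepository = new SIAccountRepository();
 // the stub delegate just records the invocation
 accountRepository.UpdateAccountAccount = account => { updateCalls++; };

 SIOperationRepository operationRepository = new SIOperationRepository();
 AccountService service = new AccountService(accountRepository, operationRepository);

 var credit = new Account { Balance = 0, Operations = new List<Operation>(), AutorizeOverdraft = true };
 var debit = new Account { Balance = 500, Operations = new List<Operation>(), AutorizeOverdraft = false };

 service.MakeTransfer(credit, debit, 100);

 Assert.AreEqual(2, updateCalls); // UpdateAccount should have been called for both accounts
}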
Now let's take a look at GetOperationsForAccount.
public List<Operation> GetOperationsForAccount(int accountID)
{
 Account account = _accountRepository.GetAccount(accountID);
 if (account == null)
 {
  return null;
 }

 if (account.Operations == null)
 {
  return null;
 }

 return account.Operations.ToList();
}
This method calls the GetAccount(int id) method of AccountRepository, then it performs some null checks and returns the result. So in order to test this method we have to provide the behavior of the GetAccount method. In the following snippet of code I use the SIAccountRepository stub generated by Moles and I specify the value which should be returned when the GetAccount(int x) method is called.
[PexMethod]
public List<Operation> GetOperationsForAccount(int accountID)
{
 List<Operation> operations1 = new List<Operation>();
 operations1.Add(new Operation { Amount = 100, Direction = Domain.Direction.Credit });
 operations1.Add(new Operation { Amount = 200, Direction = Domain.Direction.Debit });


 List<Account> accounts = new List<Account>();
 accounts.Add(new Account { Balance = 300, Operations = operations1, AutorizeOverdraft = true, Id = 1 });
 accounts.Add(new Account { Balance = 0, Operations = null, AutorizeOverdraft = false, Id = 2 });

 SIAccountRepository accountRepository = new SIAccountRepository();
 accountRepository.GetAccountInt32 = (x) =>
 {
  return accounts.SingleOrDefault(a => a.Id == x);
 };

 SIOperationRepository operationRepository = new SIOperationRepository();
 AccountService service = new AccountService(accountRepository, operationRepository);

 List<Operation> result = service.GetOperationsForAccount(accountID);
 return result;
}
At the beginning of the test method I define a list of two accounts, one having several operations and the other with no operations. Then I set the delegate of the GetAccount method of the SIAccountRepository stub to search the list by the account id. Now let's run Pex and see the result.
So Pex basically tried the two IDs of the accounts in the predefined list and also checked the null account. There is still a drawback, and that is the fact that I have to define my own list of accounts to stub the account repository. On the other hand I do it only once, and the way the stub of the GetAccount method is defined is quite straightforward: I only tell it to search the list, and I do not have to specify exactly which ID will provide which account. The last method is ComputeInterest, which should compute the monthly interest on an annual basis (note that this is here just for demonstration).
public decimal ComputeInterest(Account account, double annualRate, int months)
{
 if (account == null)
 {
  throw new AccountServiceException("Account is null");
 }

 double yearInterest = Math.Round((double)account.Balance * annualRate);
 double monthInterest = yearInterest / 12;

 return (decimal)(monthInterest * months);
 
}
This method takes the balance of the account, computes the annual interest and returns the interest for the given number of months (yes, it is a completely non-real-life method). Now let's take a look at the test for this method.
[PexMethod, PexAllowedException(typeof(AccountServiceException))]
public decimal ComputeInterest(Account account,double annualRate,int months)
{
 PexAssume.Implies(account != null, () => account.Balance = 1000);
 PexAssume.IsTrue(annualRate != 0);
 PexAssume.IsTrue(months != 0);

 SIAccountRepository accountRepository = new SIAccountRepository();
 SIOperationRepository operationRepository = new SIOperationRepository();

 AccountService service = new AccountService(accountRepository, operationRepository);

 decimal result = service.ComputeInterest(account, annualRate, months);

 return result;
}
Here we use PexAssume to shape the inputs of the unit tests. PexAssume is a static class which provides several methods to constrain the inputs. The most useful ones are IsTrue(cond), which shapes the inputs so that the condition is always true, and Implies(cond, fact), which allows a conditional constraint on the inputs.

Pex always tries the simplest inputs first, so right after trying a null account it will try an account with 0 balance. If we want Pex to provide an account with a different balance, then we have to use the PexAssume.Implies method. If we used just PexAssume.IsTrue(account.Balance == 1000), then we would obtain a null reference exception in the test for which Pex generates a null account. Now let's take a look at the result:
So here Pex generates only two cases, but that is exactly sufficient to cover all the code blocks. What is interesting is that we do not obtain the case for OverflowException here, maybe because the multiplications result in double values and the later conversion to decimal does not throw an OverflowException.

Summary

Pex is a great tool when it comes to code coverage. It will exercise all the paths in your code to look for errors or exceptions.
However, sometimes you will have to generate the data for your test by hand and provide it to Pex.
Moles is a great tool to provide stubs for static methods (and especially static framework methods) which are normally hard to test. It also cooperates well with Pex, because it is completely transparent. For each of your abstract classes or interfaces a stub is generated with delegates that you can redefine to fit your needs. If you tried to use another mock/stub framework, Pex would try to enter the scenes behind the framework, which might result in unexpected exceptions.
However, Moles lacks the "mocking" functionality. You can substitute any method with a delegate, but there is no built-in function which would tell you whether the delegate was invoked. On the other hand, this functionality can be easily developed.
The provided description is my personal experience; I am still not sure if I should use Pex in my personal projects and I am definitely not sure if I am using it the right way. From my point of view Pex is great for projects containing complex methods with several branches. Quite a lot of the time the code that I have to write is quite straightforward, and because Pex generates the simplest values, it will often finish with a single null value passed as a test parameter.
This post covers only a small fraction of Pex's capabilities and there is a lot more to learn. To start with, you can check PexFactories, which allow you to customize the generation of test inputs, the capabilities of PexAssert, or the cooperation of Pex and Code Contracts.

PS: If someone has another approach or some additional advice on how to use Pex, it would be great to share it; I have written this post partially because I would like to get some feedback on the subject.