BM-Bloggers

The blogs of Black Marble staff

The best way to enjoy Build on the road.

The way most big conferences now manage to live stream virtually everything is very impressive. I started watching the stream of yesterday's Microsoft Build keynote on the office's big projection screen with everyone else at Black Marble. I have always said the best way to enjoy a keynote is on a comfy sofa with a beer at the end of the day; so much better than an early queue followed by a usually over-air-conditioned hall with 10,000 close friends.

Unfortunately I had to leave about an hour into the keynote, so I fired up my Lumia 800 Windows Phone 7.8 handset (yes, an older one, but I like the size) and hit the Channel 9 site. This picked up the stream, right in the browser, and I was able to listen to the rest of the session over 3G whilst travelling home. I of course made sure the screen was off as I was driving. It was seamless.

Clever what this technology stuff can do now, isn't it?

A day of TFS upgrades

After last night's release of new TFS and Visual Studio bits at the Build conference, I spent this morning upgrading my demo VMs. First I upgraded to TFS 2012.3, took a snapshot, and then went on to the 2013 Preview. By switching snapshots I can now demo either version. In both cases the upgrade process was as expected, basically a rerun of the configuration wizard with all the fields bar the password prefilled. Martin Hinshelwood has done a nice post if you want more details on the process.

Looking at the sessions from Build on Channel 9, there are not too many on TFS; to find out more about the new features you are probably better off checking out the TechEd USA or TechEd Europe streams.

Why can’t I find my build settings on a Git based project on TFS Service?

I just wasted a bit of time trying to find the build tab on a TFS Team Project hosted on http://tfs.visualstudio.com using a Git repository. I was looking in Team Explorer expecting to see something like

image

But all I was seeing was the Visual Studio Git Changes option (just the top bit of the left panel above).

It took me ages to realise that the issue was that I had cloned the Git repository to my local PC using the Visual Studio Tools for Git. So I was using just the Git tools, not the TFS tools. As far as Visual Studio was concerned this was just some Git repository; it could have been local, on GitHub, on TFS Service or anywhere else that hosts Git.

To see the full features of TFS Service you need to connect to the service using Team Explorer (the green bits), not just as a Git client (the red bits).

image

Of course, if you only need Git-based source code management tools, just clone the repository and use the Git tooling, whether inside or outside Visual Studio. The Git repository in TFS is just a standard Git repo, so all tools should work. From the server end TFS does not care what client you use; in fact it will still associate your commits, irrespective of client, with TFS work items if you use the #1234 syntax for work item IDs in your comments.
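Just to show how little the client needs to know about TFS, here is a quick sketch of the pure-Git route (the account URL, project name and work item number are made up for illustration; any Git client would do the same job):

    REM Clone the TFS-hosted Git repository with plain Git tooling
    git clone https://myaccount.visualstudio.com/DefaultCollection/_git/MyProject
    cd MyProject
    REM Commit as normal; mentioning #1234 in the comment lets TFS link the
    REM commit to work item 1234 once it is pushed to the service
    git commit -am "Correct the discount calculation #1234"
    git push origin master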

However, if you are using hosted TFS from Visual Studio, it probably makes more sense to use a Team Explorer connection so that all the other TFS features light up, such as build. The best bit is that all the Git tools are still there, as Visual Studio knows it is still just a Git repository. Hopefully doing this will make things less confusing the next time I come to use a TFS feature!

Error adding a new widget to our BlogEngine.NET 2.8.0.0 server

Background

If you use Twitter on any website you will probably have noticed that they have switched off the 1.0 API; you now have to use the 1.1 version, which is stricter over OAuth. This meant the Twitter feeds into our blog server stopped working on the 10th of June. The old call of

http://api.twitter.com/1/statuses/user_timeline.rss?screen_name=blackmarble

did not work, and just changing the 1 to 1.1 did not work either, as the 1.1 API requires authenticated requests.
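For reference, a working 1.1 call needs an OAuth bearer token rather than an anonymous GET, and it returns JSON rather than RSS. A rough sketch of the application-only flow, using curl purely for illustration (CONSUMER_KEY, CONSUMER_SECRET and ACCESS_TOKEN are placeholders for your own Twitter application's credentials):

    REM Exchange the application's consumer key/secret for a bearer token
    curl -u "CONSUMER_KEY:CONSUMER_SECRET" --data "grant_type=client_credentials" https://api.twitter.com/oauth2/token
    REM Use the access_token value from the JSON response on each 1.1 request
    curl -H "Authorization: Bearer ACCESS_TOKEN" "https://api.twitter.com/1.1/statuses/user_timeline.json?screen_name=blackmarble&count=10"

That is rather more work than the old anonymous RSS call, so it is easier to let a widget handle it.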

So I decided to pull down a different widget for BlogEngine.NET to do the job, choosing Recent Tweets.

The Problem

However, when I tried to access our root/parent blog site and go to the customisation page to add the new widget, I got

Ooops! An unexpected error has occurred.

This one's down to me! Please accept my apologies for this - I'll see to it that the developer responsible for this happening is given 20 lashes (but only after he or she has fixed this problem).

Error Details:

Url : http://blogs.blackmarble.co.uk/blogs/admin/Extensions/default.cshtml
Raw Url : /blogs/admin/Extensions/default.cshtml
Message : Exception of type 'System.Web.HttpUnhandledException' was thrown.
Source : System.Web.WebPages
StackTrace : at System.Web.WebPages.WebPageHttpHandler.HandleError(Exception e)
at System.Web.WebPages.WebPageHttpHandler.ProcessRequestInternal(HttpContextBase httpContext)
at System.Web.HttpApplication.CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)
TargetSite : Boolean HandleError(System.Exception)
Message : Item has already been added. Key in dictionary: 'displayname' Key being added: 'displayname'

Looking at the discussion forums it seemed we had some DB issues.

The Fix

I could see nobody with exactly the same problem, so I pulled down the source code from CodePlex and had a look at DBBlogProvider.cs (line 2350), where the error was reported. I think the issue is that when a blog site is set to ‘Is for site aggregation’, as our root site where I needed to install the new widget is, the SQL query that generates the user profile list is not filtered by blog, so it sees duplicates.

I disabled ‘Is for site aggregation’ for our root blog and was then able to load the customisation page and add my widget.

Interestingly, I then switched back on ‘Is for site aggregation’ and all was still OK. I assume the act of opening the customisation page once fixes the problem.

Update: It turns out this is not the case; after a reboot of my client PC the error returned. It must have been some caching that made it appear to work.

Also worth noting ….

In case you had not seen it (I hadn’t), there is a patch for 2.8.0.0 that fixes a problem where the slug (the URL generated for a post) was not being created correctly, so multiple posts on the same day got grouped as one. This caused search and navigation issues. It is worth installing if you are likely to write more than one post on a blog in a day.

Claims, Explanations and Inferences, Oh My!

Last time, we spoke about how critical thinking is an aspect of data science that is often overlooked. In this post, we’re going to examine some of the fundamental building blocks of critical thinking: claims, explanations and inferences.

Claims
Firstly, claims. Claims are really just assertions:

  • The crime rate has fallen this year.
  • Unemployment is up.
  • Major corporations aren’t paying enough tax.

And they come in four main flavours:

  1. Evidence-based claims. These are claims that are stated as facts that can be checked by someone. Crime fell by 8% last year.
  2. Prediction-based claims, which are claims that state something will happen in the future. The UK is condemned to a decade of washed-out summers.
  3. Recommendation-based claims, which are those that make recommendations. We should drink 1.2 litres of water per day.
  4. Principle-based claims, which are those that express an opinion on what ought (or ought not) to be done. Major corporations should pay more tax.

So what should we do when we come across these claims in our work as data scientists? Well, as critical thinkers, we should always question them, asking ourselves questions like “Is this claim reasonable?”, “Is it significant?” and “What else do I need to know to make a judgement regarding this claim?”.

Explanations
Explanations are the things that sit between the claim and the inference. We want to get to the inference because that’s the thing that contains the action point or the conclusion to the argument, but without one or more explanations we can’t get there. Often the explanation sentence will start with “because…” or “due to…”, for example: the UK is condemned to a decade of washed out summers, due to global warming.

They often come in the form of a claim, as in this case. There are implied claims in this explanation, namely: global warming exists, global warming causes weather change and that the UK is affected by this weather change. The same questions asked of claims should be asked of these kinds of explanations too, and you should follow the claim –> explanation “rabbit hole” until it “bottoms out”, or until you satisfy yourself that the explanation is right or wrong.

Explanations can come in the form of single explanations, multiple independent explanations and joint explanations. We’ve covered single explanations; multiple independent explanations are just where more than one explanation can lead from the claim to the inference. For example: I will buy flowers because it is my wife’s birthday and because she likes flowers. Either explanation can be used to explain why flowers will be bought. Joint explanations are where two or more explanations are used, jointly, to explain a claim. In this case, however, the explanations are not independent, and if one of them is false then the claim falls. For example: I am going to get wet when leaving work because it is going to rain and I have no umbrella. Here, if it doesn’t rain, or someone lends me an umbrella, then the claim falls and I shan’t get wet.

Inference
An inference is the conclusion to an argument and often contains an action point; it follows the logical steps of claim –> explanation –> inference. Often the inference sentence will start with “So…” or “Therefore…”, for example: there is a huge demand for thingummies in the US, because legislation has been passed requiring every citizen to carry a thingummy, therefore we should increase thingummy sales to the US in the coming quarter.

So now you have been furnished with the basic steps that you should run through when you see claims made as the result of your own, or others’, data science. If the output is in the form of a claim (sales are up in the north west region), look for explanations that can support the claim and an inference that can help the business move forward.

Next time we’ll continue our exploration of critical thinking; until then, crunch those numbers! Smile

Using SYSPREP’d VM images as opposed to Templates in a new TFS 2012 Lab Management Environment

An interesting change with Lab Management 2012 and SCVMM 2012 is that templates become a lot less useful. In the SCVMM 2008 versions you had a choice when you stored VMs in the SCVMM library:

  • You could store a fully configured VM
  • or a generalised template.

When you added the template to a new environment you could enter details such as the machine name, domain to join, product key etc. If you try this with SCVMM 2012 you just see the message ‘These properties cannot be edited from Microsoft Test Manager’.

image

So you are meant to use SCVMM to manage everything about the templates, which is not great if you want to do everything from MTM. However, is that the only solution?

An alternative is to store a SYSPREP’d VM as a Virtual Machine in the SCVMM library. This VM can be added as many times as is required to an environment (though if you add it more than once you are asked if you are sure).
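For clarity, the VM is generalised inside the guest before being stored in the library, with something along these lines (a minimal sketch; pick the SYSPREP options that suit your build):

    REM Generalise the guest OS and shut it down, ready to store in the SCVMM library
    C:\Windows\System32\Sysprep\sysprep.exe /generalize /oobe /shutdown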

image

This method does, however, bring problems of its own. When the environment is started, assuming it is network isolated, the second network adaptor is added as expected. However, as there is no agent on the VM it cannot be configured; for a template, Lab Management would usually sort all this out, but because the VM is SYSPREP’d it is left sitting at the mini setup ‘Pick your region’ screen.

You need to manually configure the VM. The best process I have found is:

  1. Create the environment with your standard VMs and the SYSPREP’d one
  2. Boot the environment; the standard ready-to-use VMs get configured OK
  3. Manually connect to the SYSPREP’d VM and complete the mini setup. You will now have a PC on a workgroup
  4. The PC will have two network adaptors, neither connected to your corporate network; both are connected to the network isolated virtual LAN. You have a choice:
    • Connect the legacy adaptor to your corporate LAN, to get at a network share via SCVMM
    • Mount the TFS Test Agent ISO
  5. Either way you need to manually install the Test Agent and run the configuration (just select the defaults; it should know where the test controller is). This will configure the network isolated adaptor onto the 192.168.23.x network
  6. Now you can manually join the isolated domain (see the sketch after this list)
  7. Reboot the VM (or the environment) and all should be OK
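For step 6, the domain join can be done from the GUI or from an administrative command prompt on the VM. A sketch using netdom, assuming the netdom tool is available on the guest and using a made-up lab domain name and account:

    REM Join the network isolated lab domain; /passwordd:* prompts for the password
    netdom join %COMPUTERNAME% /domain:lab.local /userd:lab\administrator /passwordd:*
    REM Reboot for the domain join to take effect (step 7)
    shutdown /r /t 0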

All a bit long-winded, but it does mean it is easier to build generalised VMs from MTM without having to play around in SCVMM too much.

I think it would all be a good deal easier if the VM had the agents on it before the SYSPREP. I have not tried this yet, but in my opinion that is true of all VMs used for Lab Management: get the agents on as early as you can, it just speeds everything up.

Great experience moving my DotNetNuke site to PowerDNN

I posted recently about my experiences in upgrading DotNetNuke 5 to 7, and what fun that was! Well, I have now had to do the move for real. I expected to follow the same process, but had problems. It turns out the key was to go 5 > 6 > 7. Once I did this the upgrade worked; it turns out this is the recommended route. Why my previous trial worked directly, I don’t know.

Anyway, I ended up with a local DNN 7 site running against SQL 2012. It was still using a DNN 5-based skin (which has problems with IE 10) that I needed to alter, but it was functional. So it was time to move ISP.

Historically I had the site running on Zen Internet, but their Windows hosting is showing its age: they do not offer .NET 4, and appeared to have no plans to change this when I last asked. There is also no means to do a scripted/scheduled backup on their servers.

The lack of .NET 4 meant I could not use Zen for DNN 7. So I chose to move to PowerDNN, which is a DNN specialist, offers the latest Microsoft hosting and was cheaper.

I had expected the migration/setup to be awkward, but far from it. I uploaded my backups to PowerDNN’s FTP site and the site was live within 10 minutes. I had a good few questions over backup options, virtual directories for other .NET applications etc., and all were answered via email virtually instantly. Thus far the service has been excellent; PowerDNN is looking a good choice.

The Missing Aspect of Data Science

Hello there, my name’s Gary Short. If you’re a follower of Black Marble news, you’ll already know that I’ve recently joined the company as Head of Data Science, with the task of creating a flourishing Data Science Practice within the company.

And that is all I plan to say about that. Smile

With the traditional “first post introduction” out of the way, I’d like to spend the rest of this post talking about something far more interesting… data science, and in particular an aspect of data science that I don’t hear a lot of people talking about, and that’s critical thinking. Head over to Wikipedia (no, not now, after you’ve finished reading this!) and look up the definition of data science and you’ll see it’s defined as the intersection of a lot of Cool Stuff™.

Whilst all of these are valid, and I agree that they are all part of what makes a good data scientist, I do believe that this definition is missing the key aspect of critical thinking. So what is critical thinking? Well, if we pop over to Wikipedia again (not yet! What’s wrong with you people?), we can see that critical thinking is defined thusly:

Different sources define critical thinking variously as:

  • "reasonable reflective thinking focused on deciding what to believe or do"[2]
  • "the intellectually disciplined process of actively and skillfully conceptualizing, applying, analyzing, synthesizing, or evaluating information gathered from, or generated by, observation, experience, reflection, reasoning, or communication, as a guide to belief and action"[4]
  • "purposeful, self-regulatory judgment which results in interpretation, analysis, evaluation, and inference, as well as explanation of the evidential, conceptual, methodological, criteriological, or contextual considerations upon which that judgment is based"[5]
  • "includes a commitment to using reason in the formulation of our beliefs"[6]
Umm, yeah, well that’s clear then, isn’t it? Well no, not really. When I see something defined like this, and by “like this” I mean having several definitions, I immediately decide that the reason there isn’t one clear definition is that the definition is context sensitive; that is, it very much depends on the point of view of the observer. Since we’re going to be focusing on critical thinking from the point of view of the data scientist, I don’t think it helps us to layer on yet another definition. Instead, it will be much more productive if we just jump in and look at critical thinking by example instead of definition. So, for the remainder of this post, and in a number of future posts, we’ll do just that.

For us data scientists, critical thinking can best be thought of as the process of examining the significance and meaning of the claims made by the results of our statistical analysis. That’s all well and good, but what exactly is a claim? Well, much like we programmers are used to dealing with claims-based authentication, where a “claim” is made about a person or an organisation and a system must check the veracity of such a claim, in data science a claim may arise from the output of an analysis, and we must ascertain the veracity of such a claim.

Let’s take a recent example from the news. Here we’re told that a “Joint study shows extent of Scottish under-employment”, and that…

    The extent of under-employment in Scotland has been revealed in a study published by the Scottish government. An analysis jointly prepared with the STUC shows more than 250,000 workers want to work longer hours. That is a rise of 80,000 on 2008, before the downturn got under way. That makes 256,000 people, or more than 10% of the entire workforce.

Critical thinking will teach us to examine each of the claims in this article and to ask questions about them. So let’s do that now, by way of example:

Firstly, the article claims “The extent of under-employment in Scotland has been revealed”. Has it? The article doesn’t say how the figures were obtained, so we can’t draw our own conclusions with regard to sample bias.

The article claims that “250,000 workers want to work longer hours”. Do these people actually want to work longer hours, or do they want to earn more money for the full-time work they do? Do these people want to work longer hours, or do they want to move from part-time work into full-time work?

After we have examined such explicit claims, we can also examine the implicit claims of the article. An item such as this, on the BBC news site, comes with an implied claim that there is no bias, just a straight reporting of facts. We can apply critical thinking here too, and ask ourselves such questions as: do the Scottish Government and the STUC gain from maximising or minimising the reported number? If they do, is there evidence, in the article or elsewhere, that they have done such a thing?

The article quotes from the research, without linking to it, and from a spokesman for one of the report’s authors, but there is no balancing quote from the opposition, so we should ask ourselves: are these figures undisputed and therefore no balancing quote is required? If not, is the journalist showing bias here by excluding an opposing view, or did the supporters of an opposing view decline to comment? If the former, does this journalist have a history of doing so, or is this a one-off? If the latter, who holds the opposing view, and were they given enough time to formulate a response? The answers to these questions will go a long way in helping us to contextualise the output of this analysis and to give it the appropriate weight in our decision-making process.

As you can see, critical thinking is a key aspect of data science, the correct application of which will allow us to take full advantage of the output of our analysis and will allow us to interpret it properly for our end audience, or for our own benefit when consuming other data scientists’ output. In future blog posts we’ll delve further into this fascinating aspect of data science and I hope you’ll join me for those posts; until then, keep crunching those numbers! Smile

Using git tf to migrate code between TFS servers retaining history

Martin Hinshelwood did a recent post on moving source code between TFS servers using git tf. He mentioned that you could use the --deep option to get the whole changeset check-in history.

Being fairly new to using Git in anything other than the simplest scenarios, it took me a while to get the commands right. This is what I used in the end (using the Brian Keller VM for sample data):

    C:\tmp\git> git tf clone http://vsalm:8080/tfs/fabrikamfibercollection $/fabrikamfiber/Main oldserver --deep
    Connecting to TFS...
    Cloning $/fabrikamfiber/Main into C:\Tmp\git\oldserver: 100%, done.
    Cloned 5 changesets. Cloned last changeset 24 as 8b00d7d

    C:\tmp\git> git init newserver
    Initialized empty Git repository in C:/tmp/git/newserver/.git/

    C:\tmp\git> cd newserver

    C:\tmp\git\newserver [master]> git pull ..\oldserver --depth=100000000
    remote: Counting objects: 372, done.
    remote: Compressing objects: 100% (350/350), done.
    Receiving objects: 100% (372/372), 2.19 MiB | 4.14 MiB/s, done.
    Resolving deltas: 100% (110/110), done.
    From ..\oldserver
    * branch HEAD -> FETCH_HEAD

    C:\tmp\git\newserver [master]> git tf configure http://vsalm:8080/tfs/fabrikamfibercollection $/fabrikamfiber/NewLocation
    Configuring repository

    C:\tmp\git\newserver [master]> git tf checkin --deep --autosquash
    Connecting to TFS...
    Checking in to $/fabrikamfiber/NewLocation: 100%, done.
    Checked in 5 changesets, HEAD is changeset 30

The key was that I had missed the --autosquash option on the final checkin.

Once this was run I could see my check-in history. The process is quick and, once you have the right command line, straightforward. However, just like the TFS Integration Platform, time is compressed, and unlike the TFS Integration Platform you also lose the ownership of the original edits.

image

This all said, it is another useful tool in the migration arsenal.