Agile FAQs
  About   Slides   Home  

 
Managed Chaos
Naresh Jain's Random Thoughts on Software Development and Adventure Sports
     
`
 
RSS Feed

Recent Thoughts
Tags
Recent Comments

Archive for the ‘Programming’ Category

Unit Testing Dilemma: Should I Invest or Not?

Tuesday, November 1st, 2011

Every single line of code must be unit tested!

This sound advice rather seems quite extreme to me. IMHO a skilled programmer pragmatically decides when to invest in unit testing.

After practicing (automated) unit testing for over a decade, I’m a strong believer and proponent of automated unit testing. My take on why developers should care about Unit Testing and TDD.

However over the years I’ve realized that automated unit tests do have four, very important, costs associated with them:

  • Cost of writing the unit tests in the first place
  • Cost of running the unit tests regularly to get feedback
  • Cost of maintaining and updating the unit tests as and when required
  • Cost of understanding other’s unit tests
One also starts to recognize some other subtle costs associated with unit testing:
  • Illusion of safety: While unit tests gives you a great safety net, at times, it can also create an illusion of safety leading to developers too heavily relying on just unit tests (possibly doing more harm than good.)
  • Opportunity cost: If I did not invest in this test, what else could I have done in that time? Flip side of this argument is the opportunity cost of repetitive manually testing or even worse not testing at all.
  • Getting in the way: While unit tests help you drive your design, at times, they do get in the way of refactoring. Many times, I’ve refrained from refactoring the code because I get intimidated by the sheer effort of refactor/rewrite a large number of my tests as well. (I’ve learned many patterns to reduce this pain over the years, but the pain still exists.)
  • Obscures a simpler design: Many times, I find myself so engrossed in my tests and the design it leads to, that I become ignorant to a better, more simpler design. Also sometimes half-way through, even if I realize that there might be an alternative design, because I’ve already invested in a solution (plus all its tests), its harder to throw away the code. In retrospect this always seems like a bad choice.
If we consider all these factors, would you agree with me that:
Automated unit testing is extremely important, but each developer has to make a conscious, pragmatic decision when to invest in unit testing.
Its easy to say always write unit tests, but it takes years of first-hand experience to judge where to draw the line.

Importance of Unit Testing and Test Driven Development (TDD)

Tuesday, November 1st, 2011

Why should developers care of automated unit tests?

  • Keeps you out of the (time hungry) debugger!
  • Reduces bugs in new features and in existing features
  • Reduces the cost of change
  • Improves design
  • Encourages refactoring
  • Builds a safety net to defend against other programmers
  • Is fun
  • Forces you to slow down and think
  • Speeds up development by eliminating waste
  • Reduces fear

And how TDD takes it to the next level?

  • Improves productivity by
    • minimizing time spent debugging
    • reduces the need for manual (monkey) checking by developers and tester
    • helping developers to maintain focus
    • reduce wastage: hand overs
  • Improves communication
  • Creating living, up-to-date specification
  • Communicate design decisions
  • Learning: listen to your code
  • Baby steps: slow down and think
  • Improves confidence
  • Testable code by design + safety net
  • Loosely-coupled design
  • Refactoring

Is Code Coverage, Cyclomatic Complexity or Defect Density a good Measure of Quality?

Sunday, October 9th, 2011

In Software, Quality is one of those badly abused term, which is getting harder and harder to define what it really means. I think we have a sense of quality. When we see something in a specific context, we can say its high quality or low quality, but its hard to define (and hence measure) what absolute quality really is.

You can measure somethings about quality, but don’t fool yourself to believe that IS quality.

Quality is subjective, relative and contextual.

Some might say things like code coverage, cyclomatic complexity and defect density is a good measure of quality. I would argue that those are attributes/aspects of quality, but not quality itself (symptoms not the disease itself.) Its a classic case of Fundamental Attribution Error. (If you go to France and see the first 50 Frenchmen wear glasses, you cannot conclude all Frenchmen wear glasses. Nor can you conclude that, if I wear glasses I’ll also be French.)

BTW people already differentiate between Internal/Intrinsic Quality and External/Extrinsic Quality. This is not enough to complicate things, evangelists would like to further slice and dice quality along different parameters (structural, functional, UX, etc.)

Some anecdotes:

Code Smell of the Week: Obsessed with Out Parameters

Saturday, August 6th, 2011

Consider the following code to retrieve a user’s profile:

public bool GetUserProfile(string userName, out Profile userProfile, out string msg)
{
    msg = string.Empty;
    userProfile = null;
    if (some_validations_here))
    {
        msg = string.Format("Insufficient data to get Profile for username: {0}.", userName);
        return false;
    } 
 
    IList<User> users = // retrieve from database 
 
    if (users.Count() > 1)
    {
        msg = string.Format("Username {0} has {1} Profiles", userName, users.Count());
        return false;
    }
 
    if (users.Count() == 0)
    {
       userProfile = Profiles.Guest;
    }
    else
    {
        userProfile = users.Get(0).Profile;
    }
    return true;
}

Notice the bool return value and the use of out parameters. This code is heavily influenced by COM & C Programming. We don’t operate under the same constraints these days.

If we were to write a test for this method, what would it look like?

[TestClass]
public class ProfileControllerTest
{
    private string msg;
    private Profile userProfile;
    //create a fakeDB
    private ProfileController controller = new ProfileController(fakeDB);
    private const string UserName = "Naresh.Jain";
 
    [TestMethod]
    public void ValidUserNameIsRequiredToGetProfile()
    {      
        var emptyUserName = "";
        Assert.IsFalse(controller.GetUserProfile(emptyUserName, out userProfile, out msg));
        Assert.IsNull(userProfile);
        Assert.AreEqual("Insufficient data to get Profile for username: " + UserName + ".", msg);
    }
 
    [TestMethod]
    public void UsersCannotHaveMultipleProfiles()
    {     
        //fake DB returns 2 records
        Assert.IsFalse(controller.GetUserProfile(UserName, out userProfile, out msg));
        Assert.IsNull(userProfile);
        Assert.AreEqual("Username "+ UserName +" has 2 Profiles.", msg);
    }
 
    [TestMethod]
    public void ProfileDefaultedToGuestWhenNoRecordsAreFound()
    {       
        //fake DB does not return any records
        Assert.IsTrue(controller.GetUserProfile(UserName, out userProfile, out msg));
        Assert.AreEqual(Profiles.Guest, userProfile);
        Assert.IsNull(msg);
    }
 
    [TestMethod]
    public void MatchingProfileIsRetrievedForValidUserName()
    {         
        //fake DB returns valid tester
        Assert.IsTrue(controller.GetUserProfile(UserName, out userProfile, out msg));
        Assert.AreEqual(Profiles.Tester, userProfile);
        Assert.IsNull(msg);
    }
}

This code really stinks.

What problems do you see with this approach?

  • Code like this lacks encapsulation. All the out parameters could be encapsulated into an object.
  • Encourages duplication in both client code and inside this method.
  • The caller of this method needs to check the return value first. If its false then they need to get the msg and do the needful. Its very easy to ignore the failure conditions. (In fact with this very code we saw that happen in 4 out of 6 places.)
  • Tests have to validate multiple things to ensure the code is functions correctly.
  • Overall more difficult to understand

We can refactor this code as follows:

public Profile GetUserProfile(string userName)
{
    if (some_validations_here))   
        throw new Exception(string.Format("Insufficient data to get Profile for username: {0}.", userName)); 
 
    IList<User> users = // retrieve from database
 
    if (users.Count() > 1)  
        throw new Exception(string.Format(""Username {0} has {1} Profiles", userName, users.Count()));
 
    if (users.Count() == 0) return Profiles.Guest;
 
    return users.Get(0).Profile;
}

and Test code as:

[TestClass]
public class ProfileControllerTest
{
    //create a fakeDB
    private ProfileController controller = new ProfileController(fakeDB);
    private const string UserName = "Naresh.Jain";
 
    [TestMethod]   
    [ExpectedException(typeof(Exception), "Insufficient data to get Profile for username: .")]
    public void ValidUserNameIsRequiredToGetProfile()
    {      
        var emptyUserName = "";
        controller.GetUserProfile(emptyUserName); 
    }
 
    [TestMethod]    
    [ExpectedException(typeof(Exception), "Username "+ UserName +" has 2 Profiles.")]
    public void UsersCannotHaveMultipleProfiles()
    {     
        //fake DB returns 2 records
        controller.GetUserProfile(UserName); 
    }
 
    [TestMethod]
    public void ProfileDefaultedToGuestWhenNoRecordsAreFound()
    {       
        //fake DB does not return any records  
        Assert.AreEqual(Profiles.Guest, controller.GetUserProfile(UserName));   
    }
 
    [TestMethod]
    public void MatchingProfileIsRetrievedForValidUserName()
    {         
        //fake DB returns valid tester
        Assert.AreEqual(Profiles.Tester, controller.GetUserProfile(UserName));
    }
}

See how simple the client code (tests are also client code) can be.

My heart sinks when I see the following code:

public bool GetDataFromConfig(out double[] i, out double[] x, out double[] y, out double[] z)...
 
public bool AdjustDataBasedOnCorrelation(double correlation, out double[] i, out double[] x, out double[] y, out double[] z)...
 
public bool Add(double[][] factor, out double[] i, out double[] x, out double[] y, out double[] z)...

I sincerely hope we can find a home (encapsulation) for all these orphans (i, x, y and z).

Duplicate Code and Ceremony in Java

Thursday, July 21st, 2011

How would you kill this duplication in a strongly typed, static language like Java?

private int calculateAveragePreviousPercentageComplete() {
    int result = 0;
    for (StudentActivityByAlbum activity : activities)
        result += activity.getPreviousPercentageCompleted();
    return result / activities.size();
}
 
private int calculateAverageCurrentPercentageComplete() {
    int result = 0;
    for (StudentActivityByAlbum activity : activities)
        result += activity.getPercentageCompleted();
    return result / activities.size();
}
 
private int calculateAverageProgressPercentage() {
    int result = 0;
    for (StudentActivityByAlbum activity : activities)
        result += activity.getProgressPercentage();
    return result / activities.size();
}

Here is my horrible solution:

private int calculateAveragePreviousPercentageComplete() {
    return new Average(activities) {
        public int value(StudentActivityByAlbum activity) {
            return activity.getPreviousPercentageCompleted();
        }
    }.result;
}
 
private int calculateAverageCurrentPercentageComplete() {
    return new Average(activities) {
        public int value(StudentActivityByAlbum activity) {
            return activity.getPercentageCompleted();
        }
    }.result;
}
 
private int calculateAverageProgressPercentage() {
    return new Average(activities) {
        public int value(StudentActivityByAlbum activity) {
            return activity.getProgressPercentage();
        }
    }.result;
}
 
private static abstract class Average {
    public int result;
 
    public Average(List<StudentActivityByAlbum> activities) {
        int total = 0;
        for (StudentActivityByAlbum activity : activities)
            total += value(activity);
        result = total / activities.size();
    }
 
    protected abstract int value(StudentActivityByAlbum activity);
}

if this were Ruby

@activities.inject(0.0){ |total, activity| total + activity.previous_percentage_completed? } / @activities.size
@activities.inject(0.0){ |total, activity| total + activity.percentage_completed? } / @activities.size
@activities.inject(0.0){ |total, activity| total + activity.progress_percentage? } / @activities.size

or even something more kewler

average_of :previous_percentage_completed?
average_of :percentage_completed?
average_of :progress_percentage?
 
def average_of(message)
	@activities.inject(0.0){ |total, activity| total + activity.send message } / @activities.size
end

Dynamic Typing is NOT Weak Typing

Monday, July 11th, 2011

Till very recently I did not know the clear distinguish between Static/Dynamic and Strong/Weak typing. Thanks to Venkat for enlightening me.

Dynamic typing: Variables’ type declarations are not mandatory and they will be generated/inferred on the fly, by their first use.

Static typing: Variable declarations are mandatory before usage, else results in a compile-time error.

Strong typing: Once a variable is declared as a specific data type, it will be bound to that particular data type. You can explicitly cast the data type though.

Weak typing: Variables are not of a specific data type. However it doesn’t mean that variables are not “bound” to a specific data type. In weakly typed languages, once a block of memory is associated with an object it can be reinterpreted as a different type of object.

One thing I’ve realized, Strong vs. Weak and Dynamic vs. Static is a continuum rather than an absolute measure. For instance, SmallTalk is more strongly typed compared to Python which is more strongly typed than JavaScript.

There seem to be two major lines along which strong/weak typing is defined:

  • The more type coercions (implicit conversions) for built-in operators the language offers, the weaker the typing. (This could also be viewed as more built-in overloading of built-in operators.)
  • The easier it is in a language, or the more ways a language offers, to reinterpret a memory block (associated with a data value) as a different type, the weaker the typing.

In strongly typed languages if you cast to the wrong type, you get a runtime cast exception. While in weakly typed languages, your program might crash if you are lucky. Usually it leads to wrong behavior.

In most static languages you need to specify the data type at declaration. However in languages like Scala, you don’t need to specify the data types, the compiler is smart enough to infer the data types based in the context in which its used.

Also if you don’t have a compiler, then the language is surely dynamic language. However the inverse is not true. For example, Groovy is compiled, yet its a dynamic language.

In Strongly typed Dynamic languages, the type inference is postponed till runtime. This has many advantages:

  • One can achieve greater degree of polymorphism
  • One does not need to keep fighting the compiler by doing trivial type casting
  • One gets greater flexibility by deferring the implementation to a later point. i.e. the actual type verification is postponed to runtime; allowing us to modify the structure of the program between compile time and runtime.

I always thought weakly typed, dynamic language would be a disaster. However both VB and PHP (amongst most popular languages in the last 2 decades) fall into this category.

Having said that, these days I see more and more languages are strongly typed. Also the ability to infer types is gaining a lot of traction.

What do you prefer in your programming language and why?

Example of Test Driving Code with Events

Wednesday, July 6th, 2011

Following is a stripped down version of the actual code.

Start with a simple Test:

private static void ContinueMethod(object sender, EventArgs arg)
{
    ((Eventful)sender).ContinueExecutionFor("ContinueMethod");
}
 
[Test]
public void EventHandlerCanRequestFurtherCodeExecution()
{
    var eventful = new Eventful();
    eventful.ContinuationHandler += ContinueMethod;
    eventful.Execute();
    Assert.AreEqual("ContinueMethod wants to execute more logic", eventful.State);
}

and then wrote the production code:

public class Eventful
{
    public string State;
    public event EventHandler<EventArgs> ContinuationHandler;
 
    public void Execute()
    {
        // logic to get ready
 
        var args = new EventArgs();
        //stuff args with some data
        ContinuationHandler(this, args); 
    }
 
    public void ContinueExecutionFor(string delegateName)
    {
        State = delegateName + " wants to execute more logic";
        // some logic
    }
}

The wrote the next Test:

private static void StopMethod(object sender, EventArgs arg)
{
    // no call back
}
 
[Test]
public void EventHandlerCanStopFurtherCodeExecution()
{
    eventful.ContinuationHandler += StopMethod;
    eventful.Execute();
    Assert.IsNull(eventful.State);
}

and this went Green.

Then I wondered what happened if there were no registered event handlers. So I wrote this test.

[Test]
public void NoOneIsInterestedInThisEvent()
{
    eventful.Execute();
    Assert.IsNull(eventful.State);
}

This resulted in:

System.NullReferenceException : Object reference not set to an instance of an object.
at Eventful.Execute() in Eventful.cs: line 17
at EventTest.NoOneIsInterestedInThisEvent() in EventTest.cs: line 46

Then I updated the Execute() method in Eventful class:

public void Execute()
{
    // logic to get ready
 
    if (ContinuationHandler != null)
    {
        var args = new EventArgs();
        //stuff args with some data
        ContinuationHandler(this, args);
    }
}

This made all the 3 tests go Green!

Double Dispatch Demystified

Tuesday, July 5th, 2011

Single Dispatch: In most object-oriented programming languages, the concrete method that is invoked at runtime depends on the dynamic (runtime) type of the object on which the method is invoked.

For Example: If we have 2 cars; Car and derived class RubberCar.

public class Car
{
    public virtual string CrashInto(Wall wall)
    { 
        return "Car crashed into the Wall";
    }
}
 
public class RubberCar: Car
{
    public override string CrashInto(Wall wall)
    {
        return "Rubber Car crashed into the Wall";
    }
}

and its test code

[Test]
public void SingleDispathWorksCorrectly()
{
    Wall wall = new Wall();
    Car currentCar = new Car();
    Assert.AreEqual("Car crashed into the Wall", currentCar.CrashInto(wall));
    currentCar = new RubberCar();
    Assert.AreEqual("Rubber Car crashed into the Wall", currentCar.CrashInto(wall));
}

First time we call the method CrashInto() on currentCar, we are holding the reference of Car and hence Car’s CrashInto() method is invoked. However the second time, we’re holding the reference of RubberCar in currentCar and CrashInto() method from RubberCar is invoked correctly. In other words this is referred to as Polymorphism.

Its called Single Dispatch because one object’s runtime type (object on which the method is invoked) is used to decide which concrete method to invoke.

Double Dispatch: A mechanism that dispatches a method call to different concrete methods depending on the runtime types of two objects involved in the call (object and its parameter)

Let’s say, we had 2 types of wall, Wall and the derived class MagicWall.

Now when we called currentCar.CrashInto(new MagicWall()) we want to execute different behavior compared to currentCar.CrashInto(new Wall()). One classic way to implement this is:

// Car class
public virtual string CrashInto(Wall wall)
{
    if (wall is MagicWall)
        return "Car crashed into the Magic Wall";
    return "Car crashed into the Wall";
}
// RubberCar class
public override string CrashInto(Wall wall)
{
    if (wall is MagicWall)
        return "Rubber Car crashed into the Magic Wall";
    return "Rubber Car crashed into the Wall";
}

This of-course works. However an alternative way to avoid the Switch Statement Smell and to truly use Double Dispatch is:

In Car:

public virtual string CrashInto(Wall wall)
{
    return "Car crashed into the Wall";
}
 
public virtual string CrashInto(MagicWall wall)
{
    return "Car crashed into the Magic Wall";
}

and in RubberCar

public override string CrashInto(Wall wall)
{
    return "Rubber Car crashed into the Wall";
}
 
public override string CrashInto(MagicWall wall)
{
    return "Rubber Car crashed into the Magic Wall";
}

and its respective test:

[Test]
public void CarCanCrashIntoTheWall()
{
    Wall wall = new Wall();
    Assert.AreEqual("Car crashed into the Wall", new Car().CrashInto(wall)); 
}
 
[Test]
public void CarCanCrashIntoTheMagicWall()
{
    MagicWall wall = new MagicWall();
    Assert.AreEqual("Car crashed into the Magic Wall", new Car().CrashInto(wall));
}

and

[Test]
public void RubberCarCanCrashIntoTheWall()
{
    Wall wall = new Wall();
    Assert.AreEqual("Rubber Car crashed into the Wall", new RubberCar().CrashInto(wall)); 
}
 
[Test]
public void RubberCarCanCrashIntoTheMagicWall()
{
    MagicWall wall = new MagicWall();
    Assert.AreEqual("RubberCar crashed into the Magic Wall", new RubberCar().CrashInto(wall));
}

Believe it or not, this is Double Dispatch in action. Which concrete method to invoke, depends on runtime type of 2 objects (car and wall.)

However there is a catch with this method. If you did the following:

[Test]
public void MethodOverloadingTakesPlaceAtCompileTime()
{
    Wall wall = new MagicWall();
    Assert.AreEqual("Car crashed into the Wall", new Car().CrashInto(wall));
    //   Instead of "Car Crashed into the Magic Wall"
}

Since MagicWall’s reference is held in wall, which is of type Wall at compile type and overloaded method are bound at compile time, this method behaves unexpectedly.

One way to fix this issue is to use vars in .Net.

[Test]
public void UseOfVarForcesMethodOverloadingToTakesPlaceAtRunTime()
{
    var wall = new MagicWall();
    Assert.AreEqual("Car crashed into the Magic Wall", new Car().CrashInto(wall));
}

In Languages that don’t support Double Dispatch, one needs to do the following:

[Test]
public void ToAvoidTheConfusion()
{
    Wall wall = new MagicWall();
    Assert.AreEqual("Car crashed into the Magic Wall", wall.CollidesWith(car));
    Assert.AreEqual(0, car.DentCount);
}
 
[Test]
public void DoubleDispatch_OnWall()
{
    Wall wall = new Wall();
    Assert.AreEqual("Car crashed into the Wall", wall.CollidesWith(car));
    Assert.AreEqual(1, car.DentCount);
}

Production Code:

public class Wall
{
    public virtual string CollidesWithCar(Car car)
    {
        return car.CrashIntoWall(this);
    }   
 
    public virtual string CollidesWithRubberCar(RubberCar car)
    {
        return car.CrashIntoWall(this);
    }
}
 
public class MagicWall : Wall
{ 
    public override string CollidesWithCar(Car car)
    {
        return car.CrashIntoMagicWall(this);
    } 
 
    public override string CollidesWithRubberCar(RubberCar car)
    {
        return car.CrashIntoMagicWall(this);
    }
}
public class Car
{ 
    public virtual string CrashIntoWall(Wall wall)
    { 
        return "Car crashed into the Wall";
    }
 
    public virtual string CrashIntoMagicWall(Wall magicWall)
    {
        return "Car crashed into the Magic Wall";
    }
}
 
public class RubberCar:Car
{
    public override string CrashIntoWall(Wall wall)
    {
        return "Rubber Car crashed into the Wall";
    }
 
    public override string CrashIntoMagicWall(Wall magicWall)
    {
        return "Rubber Car crashed into the Magic Wall";
    }      
}

We can expand the same technique to use more than 2 objects to decide which concrete method to invoke at run time. This mechanism is called Multiple Dispatch.

Stop Over-Relying on your Beautifully Elegant Automated Tests

Tuesday, June 21st, 2011

Time and again, I find developers (including myself) over-relying on our automated tests. esp. unit tests which run fast and  reliably.

In the urge to save time today, we want to automate everything (which is good), but then we become blindfolded to this desire. Only later to realize that we’ve spent a lot more time debugging issues that our automated tests did not catch (leave the embarrassment caused because of this aside.)

I’ve come to believe:

A little bit of manual, sanity testing by the developer before checking in code can save hours of debugging and embarrassment.

Again this is contextual and needs personal judgement based on the nature of the change one makes to the code.

In addition to quickly doing some manual, sanity test on your machine before checking, it can be extremely important to do some exploratory testing as well. However, not always we can test things locally. In those cases, we can test them on a staging environment or on a live server. But its important for you to discover the issue much before your users encounter it.

P.S: Recently we faced Error CS0234: The type or namespace name ‘Specialized’ does not exist in the namespace ‘System.Collections’ issue, which prompted me to write this blog post.

Error CS0234: The type or namespace name ‘Specialized’ does not exist in the namespace ‘System.Collections’

Tuesday, June 21st, 2011

Industrial Logic’s eLearning has a feature where students can upload their programming exercise and get automated, personalized feedback. We do various server-side analysis (automated critique) of the student’s code to score them and to give them feedback about how well they performed in their programming exercises. To do this we need to compile their code on our server.

Recently we upgraded our .Net Compiler from version 2.0 to version 3.5. In version 2.0 we had to provide a reference to System.dll file (located in c:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\System.dll) for compiling. In version 3.5, they don’t have System.dll file. It was replaced by System.core.dll (located in c:\Program Files\Reference Assemblies\Microsoft\Framework\v3.5\System.Core.dll)

After making this change, when we ran the compiler using c:\WINDOWS\Microsoft.NET\Framework\v3.5\Csc.exe we got the following error:

Microsoft (R) Visual C# 2008 Compiler version 3.5.30729.1
for Microsoft (R) .NET Framework version 3.5
Copyright (C) Microsoft Corporation. All rights reserved.

SomeCSharpClass.cs: error CS0234: The type or namespace name ‘Specialized’ does not exist in the namespace ‘System.Collections’ (are you missing an assembly reference?)

It turns out that ‘System.collections.specialized’ namespace used to exist in System.dll in .net 2.0. In 3.5 System.Core.dll (which replaced the System.dll) does not contain it. Hence the compile time error.

We’ve fixed this issue by adding both System.Core.dll (v3.5) and System.dll (v2.0) to the compiler reference path. Not sure if this is the right thing to do. But it seems to work.

    Licensed under
Creative Commons License