Agile FAQs
  About   Slides   Home  

 
Managed Chaos
Naresh Jain’s Random Thoughts on Software Development and Adventure Sports
     
`
 
Discovering...
Industrial Logic

Microblog Feed
    Previous Feeds...
    Recent Thoughts

    Recent Comments
    Categories
    Archives
    March 2010
    M T W T F S S
    « Feb    
    1234567
    891011121314
    15161718192021
    22232425262728
    293031  
    RSS Feed
    Add to Technorati Favorites

    Refactoring Teaser 1: Take 1

    Tuesday, July 14th, 2009

    Last week I posted a small code snippet for refactoring under the heading Refactoring Teaser.

    In this post I’ll try to show step by step how I would try to refactor this mud ball.

    First and foremost cleaned up the tests to communicate the intent. Also notice I’ve changed the test class name to ContentTest instead of StringUtilTest, which means anything and everything.

    public class ContentTest {
        private Content helloWorldJava = new Content("Hello World Java");
        private Content helloWorld = new Content("Hello World!");
     
        @Test
        public void ignoreContentSmallerThan3Words() {
            assertEquals("", helloWorld.toString());
        }
     
        @Test
        public void buildOneTwoAndThreeWordPhrasesFromContent() {
            assertEquals("'Hello', 'World', 'Java', 'Hello World', 'World Java', 'Hello World Java'", helloWorldJava.toPhrases(6));
        }
     
        @Test
        public void numberOfOutputPhrasesAreConfigurable() {
            assertEquals("'Hello'", helloWorldJava.toPhrases(1));
            assertEquals("'Hello', 'World', 'Java', 'Hello World'", helloWorldJava.toPhrases(4));
        }
     
        @Test
        public void returnsAllPhrasesUptoTheNumberSpecified() {
            assertEquals("'Hello', 'World', 'Java', 'Hello World', 'World Java', 'Hello World Java'", helloWorldJava.toPhrases(10));
        }
    }

    Next, I created a class called Content, instead of StringUtil. Content is a first-class domain object. Also notice, no more side-effect intense statics.

    public class Content {
        private static final String BLANK_OUTPUT = "";
        private static final String SPACE = " ";
        private static final String DELIMITER = "', '";
        private static final String SINGLE_QUOTE = "'";
        private static final int MIN_NO_WORDS = 2;
        private static final Pattern ON_WHITESPACES = Pattern.compile("\\p{Z}|\\p{P}");
        private List phrases = new ArrayList();
     
        public Content(final String content) {
            String[] tokens = ON_WHITESPACES.split(content);
            if (tokens.length > MIN_NO_WORDS) {
                buildAllPhrasesUptoThreeWordsFrom(tokens);
            }
        }
     
        @Override
        public String toString() {
            return toPhrases(Integer.MAX_VALUE);
        }
     
        public String toPhrases(final int userRequestedSize) {
            if (phrases.isEmpty()) {
                return BLANK_OUTPUT;
            }
            List requiredPhrases = phrases.subList(0, numberOfPhrasesRequired(userRequestedSize));
            return withInQuotes(join(requiredPhrases, DELIMITER));
        }
     
        private String withInQuotes(final String phrases) {
            return SINGLE_QUOTE + phrases + SINGLE_QUOTE;
        }
     
        private int numberOfPhrasesRequired(final int userRequestedSize) {
            return userRequestedSize > phrases.size() ? phrases.size() : userRequestedSize;
        }
     
        private void buildAllPhrasesUptoThreeWordsFrom(final String[] words) {
            buildSingleWordPhrases(words);
            buildDoubleWordPhrases(words);
            buildTripleWordPhrases(words);
        }
     
        private void buildSingleWordPhrases(final String[] words) {
            for (int i = 0; i < words.length; ++i) {
                phrases.add(words[i]);
            }
        }
     
        private void buildDoubleWordPhrases(final String[] words) {
            for (int i = 0; i < words.length - 1; ++i) {
                phrases.add(words[i] + SPACE + words[i + 1]);
            }
        }
     
        private void buildTripleWordPhrases(final String[] words) {
            for (int i = 0; i < words.length - 2; ++i) {
                phrases.add(words[i] + SPACE + words[i + 1] + SPACE + words[i + 2]);
            }
        }
    }

    This was a big step forward, but not good enough. Next I focused on the following code:

        private void buildAllPhrasesUptoThreeWordsFrom(final String[] words) {
            buildSingleWordPhrases(words);
            buildDoubleWordPhrases(words);
            buildTripleWordPhrases(words);
        }
     
        private void buildSingleWordPhrases(final String[] words) {
            for (int i = 0; i < words.length; ++i) {
                phrases.add(words[i]);
            }
        }
     
        private void buildDoubleWordPhrases(final String[] words) {
            for (int i = 0; i < words.length - 1; ++i) {
                phrases.add(words[i] + SPACE + words[i + 1]);
            }
        }
     
        private void buildTripleWordPhrases(final String[] words) {
            for (int i = 0; i < words.length - 2; ++i) {
                phrases.add(words[i] + SPACE + words[i + 1] + SPACE + words[i + 2]);
            }
        }

    The above code violates the Open-Closed Principle (pdf). It also smells of duplication. Created a somewhat generic method to kill the duplication.

        private void buildAllPhrasesUptoThreeWordsFrom(final String[] fromWords) {
            buildPhrasesOf(ONE_WORD, fromWords);
            buildPhrasesOf(TWO_WORDS, fromWords);
            buildPhrasesOf(THREE_WORDS, fromWords);
        }
     
        private void buildPhrasesOf(final int phraseLength, final String[] tokens) {
            for (int i = 0; i <= tokens.length - phraseLength; ++i) {
                String phrase = phraseAt(i, tokens, phraseLength);
                phrases.add(phrase);
            }
        }
     
        private String phraseAt(final int currentIndex, final String[] tokens, final int phraseLength) {
            StringBuilder phrase = new StringBuilder(tokens[currentIndex]);
            for (int i = 1; i < phraseLength; i++) {
                phrase.append(SPACE + tokens[currentIndex + i]);
            }
            return phrase.toString();
        }

    Now I had a feeling that my Content class was doing too much and also suffered from the primitive obsession code smell. Looked like a concept/abstraction (class) was dying to be called out. So created a Words class as an inner class.

        private class Words {
            private String[] tokens;
            private static final String SPACE = " ";
     
            Words(final String content) {
                tokens = ON_WHITESPACES.split(content);
            }
     
            boolean has(final int minNoWords) {
                return tokens.length > minNoWords;
            }
     
            List phrasesOf(final int length) {
                List phrases = new ArrayList();
                for (int i = 0; i <= tokens.length - length; ++i) {
                    String phrase = phraseAt(i, length);
                    phrases.add(phrase);
                }
                return phrases;
            }
     
            private String phraseAt(final int index, final int length) {
                StringBuilder phrase = new StringBuilder(tokens[index]);
                for (int i = 1; i < length; i++) {
                    phrase.append(SPACE + tokens[index + i]);
                }
                return phrase.toString();
            }
        }

    In the constructor of the Content class we instantiate a Words class as follows:

        public Content(final String content) {
            Words words = new Words(content);
            if (words.has(MIN_NO_WORDS)) {
                phrases.addAll(words.phrasesOf(ONE_WORD));
                phrases.addAll(words.phrasesOf(TWO_WORDS));
                phrases.addAll(words.phrasesOf(THREE_WORDS));
            }
        }

    Even though this code communicates well, there is duplication and noise that can be removed without compromising on the communication.

         phrases.addAll(words.phrasesOf(ONE_WORD, TWO_WORDS, THREE_WORDS));

    There are few more version after this, but I think this should give you an idea about the direction I’m heading.

    • Share/Bookmark

    Behavior (verbs) as Test Class Names

    Monday, June 15th, 2009

    For the last 3 odd years, I’ve been exploring the use of behavior (verbs, instead of Nouns, Test) as my test class names. The verb describes what behavior you expect from your system (program).

    For example for a Veterinarian Information System (system responsible for billing and patient history), I would have tests called:

    ChargeAccountForServices with the following test methods:

    • makePaymentsAgainstAnAccount()
    • completePaymentsResultInZeroAmountDueOnReceipt()
    • incompletePaymentsDisplaysDueAmountOnTheReceipt()

    Another test class: GenerateBillsForClientAccount with

    • notifyIfAccountIsNotDueForPayment()
    • billContainsTotalCostOfAllServicesTaken()

    And another test class: ManageClientVisits with

    • trackAllServicesTakenByThePatient()
    • skipVisitIfPatientDidNotTakeAnyService()

    These tests helped us flush out Objects like Account, Procedure, Visit and so on…. When we started we had no idea we’ll need these objects.

    This style is mostly influenced from a pairing session with Corey Haines post our discussion about “There is no Spoon” @ the SDTConf 2006.

    For more about this approach…read my last post…There is No Spoon (Objects)

    • Share/Bookmark

    There is No Spoon (Objects)

    Monday, June 15th, 2009

    In OO, frequently, we get caught up in modeling our objects based on our perspective of the real world. Since everyone’s perspective is different we end up with a lot of thrashing and confusion. We end up arguing about these models in the air, baselessly. For example in a Veterinarian Information System (system responsible for billing and patient history), we end up arguing whether the Patient class should have a getShot(Vaccination) or should the Doctor class have a giveShot(Vaccination) method. In the end (when we finally start implementing the system) we realize that it really does not matter who gives the shot or takes the shot. What really matters is that we have recorded what shot was given to which patient. So we end up creating a Visit class with a method called add(Procedure).

    Long story short, there are no real-world objects, its all how we perceive things. If we try to model our software based on our understanding of the real world, the software would also be as complex as our understanding of the real world. Since in software we are dealing with creating abstractions, we really want to focus on behavior rather than Objects. TDD is a great way to do this.

    We start off with a test which expects that our system will exhibit some Behavior. At this point, we don’t know or care about which Object/s will provide this behavior. To make the tests work, we start creating methods that we need from our objects on the test for the time being. Once we have our test working, then we ask yourself, where does this behavior really belong? Sometimes real world objects pop-up. Sometimes we create Objects that don’t exist in real world. This is important because it helps us make our design simpler and more expressive. Slowly we expect more behavior from our system (by writing new tests) which starts flushing out our objects.

    • Share/Bookmark

    Another Example of Thin Slice

    Sunday, May 3rd, 2009

    On the Protest project when I was building the “integration with Ant” feature, I adopted the same thin slice principle. Following are the thin slices I came up with:

    • Create an Ant task which can call Protest, it simply returns the tests in the same order as given to it. (Essentially was a copy of the JUnit Ant task)
    • Add support for a voter (happened to be Dependency Voter), so that we can actually prioritize the tests based on the dependency algorithm. At this point we went ahead and released this task
    • Add support for multiple voters. By now we had created 3 different voters and we wanted to use all the voters
    • Provide a way to specify weightage for each voter. Some voters should be able to influence the prioritized list of test more than the others
    • Once we have a prioritized list of tests, provide a way to specify what top percentage of that list should be executed. This provides the user tighter control over how much feedback they need depending on the type of change they have just made
    • And so on…

    Now we could have sat and first designed how the Ant task should look and later wrote the task and then integrated with Protest. But the problem with this approach is:

    • We won’t have anything functional and usable until we finish all the tasks. Too scary from a feedback standpoint
    • We won’t be able to test anything for real until we finish all the tasks. There is a huge risk involved in this approach
    • Essentially we are building an inventory with each of those tasks that are not used immediately. Turn around time for a feature is high
    • It requires a lot of upfront thinking, which I’m not generally good at. At least we’ll have to think through all the input each voter would need and so on. Right now we don’t even have all the voters in place and this forces us to think about them now or introduces an unwanted dependency now.
    • Lots of people argue that the evolutionary approach will be less efficient (more expensive and time consuming) because it gives an impression of thrashing and rework. In my experience, the big upfront design leads to more rework generally. It creates an illusion of streamlined process but in reality, it is actually lot more work and also leads to a rigid and over-engineered design
    • We can also add all the disadvantages of big upfront design here

    I hope this example demonstrates the technique of thin slicing and its advantages over its alternatives.

    • Share/Bookmark

    Thin Slice

    Sunday, May 3rd, 2009

    For a given feature, we can come up with multiple thin-slices which can be incrementally (sequentially) built.

    Thin Slice is the simplest possible functional, usable, end to end slice of functionality.

    Thin Slicing a feature is not a new concept. Generally development teams consider the simple happy path scenario as the first thin slice. The alternative flows are considered as subsequent slices. Today when I think of a thin slice, its slightly more sophisticated than just considering happy path and alternative paths.

    Let me explain with an example: Lets consider we are building a web app and we need a feature which allows the user to upload their profile picture.

    We can come up with the following thin slices:

    • User can upload any photo (predefined set of image format) - won’t look good coz the image size can vary and hence where ever its displayed won’t align well. Might not have AJAX support (depends on what is simple and quick to do). All the bells and whistles are pushed for later.
    • Build an image scaling facility so that we can reduce the image resolution and hence its size
    • Provide an image cropping facility so that users can crop their profile images
    • Instead of uploading an image from my disk, provide a facility to pull it from the web. The user provides the URL
    • and so on….

    Each slice is functional (end to end, we are not just doing the UI or backend bit). This is great for internal feedback. Might not be good enough for a public release. Esp. with just the first slice. In some sense we are incrementally add more power to the feature with each thin slice. From another perspective it feels like we are iterating over the feature and refining the feature as we go along. In either case, We build the feature by adding one by one the thin slice, till the point we feel we can release this feature. Post release we can continue to add more slices.

    It is also possible to come up with thin slices which are perfectly releasable. Lets take the same example and see how we could come up with thin slices which are releasable.

    • User can upload a photo (predefined set of image format + clear message showing the expected size and resolution). Any image which does not qualify this criteria is rejected.
    • Knock off the resolution constraint and build image scaling facility so that we can reduce the image resolution and hence its size
    • Knock off the image size constraint and provide an image cropping facility so that users can crop their images to fit the right size
    • And so on…

    Check out another example of Thin Slicing from the Protest Project.

    Please note that the concept of thin slice is also applicable at a much broader scope than just at a feature level. For Ex: On a given project we use the same concept to plan our small & frequent releases. At a portfolio level we apply the same principle to break projects into smaller loosely coupled projects and then prioritize them.

    • Share/Bookmark

    ToDos Gone Wild!

    Tuesday, March 31st, 2009

    Writing ToDos in your code as a reminder that some needs to be fixed is almost considered as a holy grail (universal best practice) to manage technical debt. No one seems to talk about issues using ToDos.

    While I’m not against ToDos, I think, in some cases developers build a tendency that:

    I’ve put a ToDo, I’ve done my job and now I can move on.

    I’ve a problem with this behavior. In some cases there might be tangential work that is worth marking as a ToDo, but in lot of cases what should be fixed now, gets pushed off for later. Slowly the code quality keeps dropping and technical debt sets in.

    Building on this thought process, ToDos can encourage wrong behavior of not fixing stuff at the source and pushing them in the future. It’s kind of the broken window syndrome:

    There are so many ToDos in this code, one more won’t make a big difference.

    Even if someone wants to fix some, looking at the sheer numbers, the person can get intimidated.

    Some teams have so many ToDos in their code, that the team develops immunity towards them by now (it losses its value).

    So please think before adding your next ToDo.

    • Share/Bookmark

    Alternative System Metaphor for ProTest

    Monday, March 9th, 2009

    On the ProTest project, instead of using the Election metaphor, we had also considered using the ‘Google Search Engine’ Metaphor.

    Each Test is like a web page on the internet with the content we are searching for. The ProTest Search Engine would go through all the test and give a TestRank (page rank) based on various relevance algorithms (strategies). Finally our search engine would do a rank aggregation and present the tests in a prioritized order.

    If we went down the Google’s search engine metaphor, we would have ended up with a highly parallel/distributed test ranking algorithms. We would have also cached the tests and associated data.

    As you can see, this design would be very different from what we have now by using the Election metaphor. System Metaphors are really powerful and helpful, if used well.

    • Share/Bookmark

    ProTest’s System Metaphor

    Saturday, March 7th, 2009

    On the ProTest project, we are using the Election as our System Metaphor to identify the key objects and their interactions. .i.e. to explain the logical design.

    ProTest is a library to prioritize your tests such that you get fastest feedback by executing tests that are most likely to fail first. We use different strategies like a Dependency strategy which orders the tests based on dependencies of recently changed classes. If class ‘A’ was changed, then it makes sense to run all the tests that have a dependency on class ‘A’ first. We plan to have other strategies like Last failed test strategy which will order tests based on all the tests that failed in the last run, first. Others using cyclomatic complexity, test coverage and so on.

    The Election Metaphor

    All the tests that need to be executed are candidates standing for election (trying to get executed first). Each strategy is a voter, who votes the candidates. (We had to slightly change this metaphor. In our case, a voter can vote for multiple candidates.) Once all the voters cast their votes, we do a rank aggregation to determine the winners and hence come up with a prioritized list of tests. We plan to further enhance the metaphor to provide different weightages for each voter. Basically some voters are more powerful than the others.

    Recently I was explaining the project to team from Bolivia and this metaphor really helped. I wonder if this metaphor would make sense to the Chinese. ;)

    • Share/Bookmark

    Another Project Rescue Report

    Monday, February 9th, 2009

    Some time back, I spent 1 Week helping a project (Server written in Java) clear its Technical Debt. The code base is tiny because it leverages lot of existing server framework to do its job. This server handles extremely high volumes of data & request and is a very important part of our server infrastructure. Here are some results:

    Topic Before After
    Project Size Production Code

    • Package =1
    • Classes =4
    • Methods = 15 (average 3.75/class)
    • LOC = 172 (average 11.47/method and 43/class)
    • Average Cyclomatic Complexity/Method = 3.27

    Test Code

    • Package =0
    • Classes = 0
    • Methods = 0
    • LOC = 0
    Production Code

    • Package = 4
    • Classes =13
    • Methods = 68 (average 5.23/class)
    • LOC = 394 (average 5.79/method and 30.31/class)
    • Average Cyclomatic Complexity/Method = 1.58

    Test Code

    • Package = 6
    • Classes = 11
    • Methods = 90
    • LOC =458
    Code Coverage
    • Line Coverage: 0%
    • Block Coverage: 0%

    Old Code Coverage Report

    • Line Coverage: 96%
    • Block Coverage: 97%

    New Code Coverage Report

    Cyclomatic Complexity

    Cyclomatic Complexity report before Refactoring

    Cyclomatic Complexity report after Refactoring

    Obvious Dead Code Following public methods:

    • class DatabaseLayer: releasePool()

    Total: 1 method in 1 class

    Following public methods:

    • class DFService: overloaded constructor

    Total: 1 method in 1 class

    Note: This method is required by the tests.

    Automation
    Version Control Usage
    • Average Commits Per Day = 0
    • Average # of Files Changed Per Commit = 12
    • Average Commits Per Day = 7
    • Average # of Files Changed Per Commit = 4
    Coding Convention Violation 96 0

    Another similar report.

    • Share/Bookmark

    Project Rescue Report

    Monday, February 2nd, 2009

    Recently I spent 2 Weeks helping a project clear its Technical Debt. Here are some results:

    Topic Before After
    Project Size Production Code

    • Package = 7
    • Classes = 23
    • Methods = 104 (average 4.52/class)
    • LOC = 912 (average 8.77/method and 39.65/class)
    • Average Cyclomatic Complexity/Method = 2.04

    Test Code

    • Package = 1
    • Classes = 10
    • Methods = 92
    • LOC = 410
    Production Code

    • Package = 4
    • Classes = 20
    • Methods = 89 (average 4.45/class)
    • LOC = 627 (average 7.04/method and 31.35/class)
    • Average Cyclomatic Complexity/Method = 1.79

    Test Code

    • Package = 4
    • Classes = 18
    • Methods = 120
    • LOC = 771
    Code Coverage
    • Line Coverage: 46%
    • Block Coverage: 43%

    Coverage report before Refactoring

    • Line Coverage: 94%
    • Block Coverage: 96%

    Coverage report after refactoring

    Cyclomatic Complexity

    Cyclomatic Complexity report before Refactoring

    Cyclomatic Complexity report after Refactoring

    Obvious Dead Code Following public methods:

    • class CryptoUtils: String getSHA1HashOfString(String), String encryptString(String), String decryptString(String)
    • class DbLogger: writeToTable(String, String)
    • class DebugUtils: String convertListToString(java.util.List), String convertStrArrayToString(String)
    • class FileSystem: int getNumLinesInFile(String)

    Total: 7 methods in 4 classes

    Following public methods:

    • class BackgroundDBWriter: stop()

    Total: 1 method in 1 class

    Note: This method is required by the tests.

    Automation
    Version Control Usage
    • Average Commits Per Day = 1
    • Average # of Files Changed Per Commit = 2
    • Average Commits Per Day = 4
    • Average # of Files Changed Per Commit = 9

    Note: Since we are heavily refactoring, lots of files are touched for each commit. But the frequency of commit is fairly high to ensure we are not taking big leaps.

    Coding Convention Violation 976 0

    Something interesting to watch out is how the production code becomes more crisp (fewer packages, classes and LOC) and how the amount of test code becomes greater than the production code.

    Another similar report.

    • Share/Bookmark
        Licensed under
    Creative Commons License
    Design by vikivix