Last week I posted a small code snippet for refactoring under the heading Refactoring Teaser.
In this post I’ll try to show step by step how I would try to refactor this mud ball.
First and foremost cleaned up the tests to communicate the intent. Also notice I’ve changed the test class name to ContentTest instead of StringUtilTest, which means anything and everything.
Next, I created a class called Content, instead of StringUtil. Content is a first-class domain object. Also notice, no more side-effect intense statics.
publicclass Content {privatestaticfinalString BLANK_OUTPUT ="";privatestaticfinalString SPACE =" ";privatestaticfinalString DELIMITER ="', '";privatestaticfinalString SINGLE_QUOTE ="'";privatestaticfinalint MIN_NO_WORDS =2;privatestaticfinal Pattern ON_WHITESPACES = Pattern.compile("\\p{Z}|\\p{P}");privateList phrases =newArrayList();public Content(finalString content){String[] tokens = ON_WHITESPACES.split(content);if(tokens.length> MIN_NO_WORDS){
buildAllPhrasesUptoThreeWordsFrom(tokens);}}
@Override
publicString toString(){return toPhrases(Integer.MAX_VALUE);}publicString toPhrases(finalint userRequestedSize){if(phrases.isEmpty()){return BLANK_OUTPUT;}List requiredPhrases = phrases.subList(0, numberOfPhrasesRequired(userRequestedSize));return withInQuotes(join(requiredPhrases, DELIMITER));}privateString withInQuotes(finalString phrases){return SINGLE_QUOTE + phrases + SINGLE_QUOTE;}privateint numberOfPhrasesRequired(finalint userRequestedSize){return userRequestedSize > phrases.size()? phrases.size(): userRequestedSize;}privatevoid buildAllPhrasesUptoThreeWordsFrom(finalString[] words){
buildSingleWordPhrases(words);
buildDoubleWordPhrases(words);
buildTripleWordPhrases(words);}privatevoid buildSingleWordPhrases(finalString[] words){for(int i =0; i < words.length;++i){
phrases.add(words[i]);}}privatevoid buildDoubleWordPhrases(finalString[] words){for(int i =0; i < words.length-1;++i){
phrases.add(words[i]+ SPACE + words[i +1]);}}privatevoid buildTripleWordPhrases(finalString[] words){for(int i =0; i < words.length-2;++i){
phrases.add(words[i]+ SPACE + words[i +1]+ SPACE + words[i +2]);}}}
This was a big step forward, but not good enough. Next I focused on the following code:
privatevoid buildAllPhrasesUptoThreeWordsFrom(finalString[] words){
buildSingleWordPhrases(words);
buildDoubleWordPhrases(words);
buildTripleWordPhrases(words);}privatevoid buildSingleWordPhrases(finalString[] words){for(int i =0; i < words.length;++i){
phrases.add(words[i]);}}privatevoid buildDoubleWordPhrases(finalString[] words){for(int i =0; i < words.length-1;++i){
phrases.add(words[i]+ SPACE + words[i +1]);}}privatevoid buildTripleWordPhrases(finalString[] words){for(int i =0; i < words.length-2;++i){
phrases.add(words[i]+ SPACE + words[i +1]+ SPACE + words[i +2]);}}
The above code violates the Open-Closed Principle (pdf). It also smells of duplication. Created a somewhat generic method to kill the duplication.
Now I had a feeling that my Content class was doing too much and also suffered from the primitive obsession code smell. Looked like a concept/abstraction (class) was dying to be called out. So created a Words class as an inner class.
privateclass Words {privateString[] tokens;privatestaticfinalString SPACE =" ";
Words(finalString content){
tokens = ON_WHITESPACES.split(content);}boolean has(finalint minNoWords){return tokens.length> minNoWords;}List phrasesOf(finalint length){List phrases =newArrayList();for(int i =0; i <= tokens.length- length;++i){String phrase = phraseAt(i, length);
phrases.add(phrase);}return phrases;}privateString phraseAt(finalint index, finalint length){
StringBuilder phrase =new StringBuilder(tokens[index]);for(int i =1; i < length; i++){
phrase.append(SPACE + tokens[index + i]);}return phrase.toString();}}
In the constructor of the Content class we instantiate a Words class as follows:
public Content(finalString content){
Words words =new Words(content);if(words.has(MIN_NO_WORDS)){
phrases.addAll(words.phrasesOf(ONE_WORD));
phrases.addAll(words.phrasesOf(TWO_WORDS));
phrases.addAll(words.phrasesOf(THREE_WORDS));}}
Even though this code communicates well, there is duplication and noise that can be removed without compromising on the communication.
With TDD, I’m a lot more confident about my solutions. If I spot a design improvement, I can quickly jump in and fix it with confidence. When given feedback, I’m able to respond to it quickly. I feel I’m in control.
It really helps me keep my design and code minimalistic and clean. No bells and whistles. No buy one get one free combo offers. <Perfection in design is achieved not when there is nothing more to add, but rather when there is nothing more to take away>
Turns out that my code is lot more easier to test. Because its designed to be testable. Lots of people argue that they will write tests after writing the code and it would be more efficient. The biggest problem I find with this approach is that the code ends up being something not readily testable. Then either I’ve to spend time making it testable or I skip testing saying its not possible or not required.
It helps me build a safety net of executable, living, up-to-date specification for my classes and features. Its a great starting point for new team members to understand my software. Tests are a great reference point for someone who wants to use my code.
TDD is a great teacher. It helps me listen to my code and learn from it. It forces me to pay close attention to what is happening. Its easy to spot bad things faster. Its all about feedback and visibility.
TDD forces me to slow down and think. It encourages me to take baby-steps. Sometimes I find people are in such a hurry that they spill mess all over the place. It takes soo much more time and effort to clean up the mess they leave behind.
My tests tend to communicate my design choices much longer after I’m gone.
I massively reduce the amount of time I spend in the debugger or trying to manually test (monkey test) from the UI. When something breaks, I no longer need to crawl through the logs to figure out where things are going wrong. I get pin-pointed feedback.
TDD helps me maintain focus on measurable outcome (producing software that accomplishes a concrete objective). I’m not longer drifting down ratholes.
TDD also helps me reduce the hand-overs between developers and tests and hence the wastage introduced because of all that overhead and context switching.
And so on…
Having said that, TDD alone is not sufficient to achieve the above. At times you need to spike/prototype things. One needs to have (or at least start focusing on building) a good knowledge of Design Principles and Patterns. Its easy to get lost, having a pair can really help.
Again I don’t want to sound dogmatic. I don’t think TDD is the only way to build great software. There are lots of great developers out there, building amazing software without TDD. However I think TDD is a much easier approach to achieve the same.
Recently, during a project rescue effort, I introduced Checkstyle as part of the automated build. This morning I saw the build failing because of the following Checkstyle violation:
As you can see, except the name of the class, the 2 files are identical. On one hand I’m thrilled that Checkstyle does a great job at catching such errors, on other hand I’m sad that there are developers who write such sloppy code.