Over the last year, I’ve been helping (part-time) Freeset build their ecommerce website. David Hussman introduced me to folks from Freeset.
Following is a list of random topics (most of them are Agile/XP practices) about this project:
Project Inception: We started off with a couple of meetings with folks from Freeset to understand their needs. David quickly created an initial vision document with User Personas and their use cases (about 2 page long on Google Docs). Naomi and John from Freeset, quickly created some screen mock-ups in Photoshop to show user interaction. I don’t think we spent more than a week on all of this. This helped us get started.
Technology Choice: When we started we had to decide what platform are we going to use to build the site. We had to choose between customer site using Rails v/s using CMS. I think David was leaning towards RoR. I talked to folks at Directi (Sandeep, Jinesh, Latesh, etc) and we thought instead of building a custom website from scratch, we should use a CMS. After a bit of research, we settled on CMS Made Simple, for the following reasons
We needed different templates for different pages on the site.
PHP: Easiest to set up a PHP site with MySQL on any Shared Host Service Provider
Planning: We started off with an hour long, bi-weekly planning meetings (conf calls on Skype) on every Saturday morning (India time). We had a massively distributed team. John was in New Zealand. David and Deborah (from BestBuy) were in US. Kerry was in UK for a short while. Naomi, Kelsea and other were in Kolkatta and I was based out of Mumbai. Because of the time zone difference and because we’re all working on this part time, the whole bi-weekly planning meeting felt awkward and heavy weight. So after about 3 such meetings we abandoned it. We created a spreadsheet on Google Docs, added all the items that had high priority and started signing up for tasks. Whenever anyone updated an item on the sheet, everyone would be notified about the change.
User Stories: We started off with User Persona and Stories, but soon we just fell back to simple tasks on a shared spreadsheet. We had quite a few user related tasks, but just one liner in the spread sheet was more than sufficient. We used this spreadsheet as a sudo-backlog. (by no means we had the rigor to try and build a proper backlog).
Short Releases: We (were) only working on production environment. Every change made by a developer was immediately live. Only recently we created a development environment (replica of production), on which we do all our development. (I asked John from Freeset, if this change helped him, he had mixed feelings. Recently he did a large website restructuring (added some new section and moved some pages around), and he found the development environment useful for that. But for other things, when he wants to make some small changes, he finds it an over kill to make changes to dev and then sync it up with production. There are also things like news, which makes sense to do on the production server. Now he has to do in both places). So I’m thinking may be, we move back to just production environment and then create a prod on demand if we are plan to make big changes.
Testing: Original we had plans of at least recording or scripting some Selenium tests to make sure the site is behaving the way we expected it to. This kind of took a back seat and never really became an issue. Recently we had a slight set back when we moved a whole bunch of pages around and their link from other parts of the site were broken. Other than that, so far, its just been fine.
Evolutionary Design: Always believed in and continue to believe in “Do the Simplest, Dumbest, thing that could Possibly work“. Since we started, the project had taken interesting turns, we used quite a lot of different JavaScript libraries, hacked a bit of PHP code here and there. All of this is evolving and is working fine.
Usability: We still have lots of usability and optimization issues on our site. Since we don’t have an expert with us and we can’t afford one, we are doing the best we can with what we have on hand. We are hoping we’ll find a volunteer some day soon to help us on this front.
Versioning: We explored various options for versioning, but as of today we don’t have any repository under which we version our site (content and code). This is a drawback of using an online CMS. Having said that so far (been over a year), we did not really find the need for versioning. As of now we have 4 people working on this site and it just seems to work fine. Reminds me of YAGNI. (May be in future when we have more collaborators, we might need this).
Continuous Integration: With out Versioning and Testing, CI is out of question.
Automated Deployment: Until recently we only had one server (production) so there was no need for deployment. Since now we have a dev and a prod environment, Devdas and I quickly hacked a simple shell scrip (with mysqldump & rsync) that does automated deployment. It can’t get simpler than this.
Hosting: We talked about hosting the site on its own slice v/s using an existing shared host account. We could always move the site to another location when our existing, cheap hosting option will not suit our needs. So as of today, I’m hosting the site under one of my shared host account.
Rich Media Content: We questioned serving & hosting rich media content like videos from our site or using YouTube to host them. We went with YouTube for the following reasons
We wanted to redirect any possible traffic to other sites which are more tuned to catering high bandwidth content
We wanted to use YouTube’s existing customer base to attract traffic to our site
Since we knew we’ll be moving to another hosting service, we did not want to keep all those videos on the server which then will have to be moved to the new server
Customer Feedback: So far we have received great feedback from users of this site. We’ve also seen a huge growth in traffic to our site. Currently hovering around 1500 hits per day. Other than getting feedback from users. We also look at Google Analytics to see how users are responding to changes we’ve made and so on.
We don’t really have/need a System Metaphor and we are not paying as much attention to refactoring. We have some light conventions but we don’t really have any coding standards. Nor do we have the luxury to pair program.
Distributed/Virtual Team: Since all of us are distributed and traveling, we don’t really have the concept of site. Forget on-site customer or product owner.
Since all of this is voluntary work, Sustainable pace takes a very different meaning. Sometimes what we do is not sustainable, but that’s the need of the hour. However all of us really like and want to work on this project. We have a sense of ownership. (collective ownership)
We’ve never really sat down and done a retrospective. May be once in a while we ask a couple of questions regarding how something were going.
Overall, I’ve been extremely happy with the choices we’ve made. I’m not suggesting every project should be run this way. I’m trying to highlight an example of what being agile really means.
When you smell complexity and lack of clarity in the air, look around, you’ll find your code swimming in a (smelly) soup of primitives (low level data-types, functions and language components). Unable to bare the stink, your code is screaming and screeching, asking you to rescue it.
This is my friend, primitive obsession, the stinkiest code smell. You can rescue your code (yes we can) by creating higher level abstractions (functions, data types, objects) and giving some sense to this anarchy.
Primitive Obsession is about lack of abstractions. In the OO world, Methods, Objects, Packages/Namespaces are ways of creating abstraction. Similarly functions, procedures, modules, etc are also valid ways of creating abstractions.
Adding more objects does not always lead to better abstraction. Sometimes removing objects is more useful.
There are many different refactorings that can be used as a remedies:
Its been a while since the Fourth Refactoring Teaser was posted. So far, I think this is one of the trickiest refactorings I’ve tried. Refactored half of the solution and rewrote the rest of it.
Particularly thrilled about shrinkage in the code base. Getting rid of all those convoluted Strategies and Child Strategies with 2 main classes was real fun (and difficult as well). Even though the solution is not up to the mark, its come a long long way from where it was.
Ended up renaming IdentityGenerator to EmailSuggester. Renamed the PartialAcceptanceTest to EmailSuggesterTest. Also really like how that test looks now:
I’m not happy with this method. This is the roughest part of this code. All the
if(seed != lastName){
seems dodgy. But at least all of it is in one place instead of being scattered around 10 different classes with tons of duplicate code.
For each potential email data, we try to create an email address, if its available, we add it, else we move to the next potential email data, till we exhaust the list.
Given two tokens (user name and domain name), the Email class tries to creates an email address without Restricted Words and Celebrity Names in it.
30
31
32
33
34
35
privateString buildIdWithoutRestrictedWordsAndCelebrityNames(){
Email current =this;if(isCelebrityName())
current = trimLastCharacter();return buildIdWithoutRestrictedWordsAndCelebrityNames(current, 1);}
37
38
39
40
41
42
43
44
45
46
privateString buildIdWithoutRestrictedWordsAndCelebrityNames(final Email last, finalint count){if(count == MAX_ATTEMPTS)thrownewIllegalStateException("Exceeded the Max number of tries");String userName = findClosestNonRestrictiveWord(last.userName, RestrictedUserNames, 0);String domainName = findClosestNonRestrictiveWord(last.domainName, RestrictedDomainNames, 0);
Email id =new Email(userName, domainName, dns);if(!id.isCelebrityName())return id.asString();return buildIdWithoutRestrictedWordsAndCelebrityNames(id.trimLastCharacter(), count +1);}
Influenced by Functional Programming, I’ve tried to use Tail recursion and Immutable objects here.
Also to get rid of massive duplication in code, I had to introduce a new Interface and 2 anonymous inner classes.
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
publicinterface RestrictedWords {
RestrictedWords RestrictedUserNames =new RestrictedWords(){
@Override
publicboolean contains(finalString word, final DomainNameService dns){return dns.isRestrictedUserName(word);}};
RestrictedWords RestrictedDomainNames =new RestrictedWords(){
@Override
publicboolean contains(finalString word, final DomainNameService dns){return dns.isRestrictedDomainName(word);}};boolean contains(finalString word, DomainNameService dns);}
When faced with Legacy Code, I’ve found 3 possible options to deal with them:
Leave it alone for now: Very rarely used, code seems to work fine.
Piecemeal Refactoring: When its difficult to understand what the code does and how it does what it does. Its time for safe, slow and cumbersome refactoring process.
Rewrite: When its clear what the code does, but it very difficult to understand how it does what it does, it time to rescue the code by rewriting it from scratch. This can be applied at various levels (whole code base, single module, class or method).
To Rewrite or to Refactor?
One can easily spend hours or days trying to refactor some code, when clearly (in retrospect) rewriting the code would be a better option. Sometimes you decide its better to rewrite the code and end up implementing something that does not work in all situations or we miss out something important. Unfortunately there is no clear guideline when I would choose to refactor code v/s rewrite the code. The key to me is, if I understand what the code does not necessarily how it does what it does, then its time to rewrite the code.
Rewriting code: Play it safe
The analogy I use is, rewriting code is like building bridges. You know that the bridge helps you get from point A to point B. It might be very complicated and risky to use the bridge any more. But that does not mean you’ll go and blow the bridge apart. Instead you would slowing start building a new bridge along side. When the new bridge is ready, you would divert a sample traffic on this bridge and see if it actually works. If it does, then you migrate all the traffic to the new bridge and blow the old bridge apart.
I use the very same technique when rewriting code. During the process, I might leave the code working but in a much more messier (worse) state. During CodeChef TechTalks in Bangalore, Sai told me that he refers to this as an “Expand and Contract” cycle. You are temporarily expanding your code base so that you can come back and clean it up.
When I’m rewriting code, I find black-box style automated tests very helpful. If you don’t have tests, it might be worth investing the time to write a few.
Where to begin Refactoring Code
Outside-In: Start from a higher-level and refactor (delve) into the crux
Inside-Out: Start refactoring the crux and work your way out
At times its difficult to identify the crux and I spend some time exploring (via refactoring) before I can choose an approach. Tests can be a great probe to understand the code.
When refactoring legacy code, I usually use the Scaffolding Technique to break the Catch 22 situation (To refactor we need tests, to write tests we need to refactor). Scaffolding tests don’t necessarily have to be UI tests, I’ve used Unit tests as scaffolding tests as well.The key thing is they are temporary and meant to help you get started.
Thanks to the folks @ the Legacy Code BoF @ CodeChef TechTalks in Bangalore who prompted me to write this blog.
Then we look at who is constructing this class, and turns out that we have this deadly SuggestionsUtil class (love the name). This class suffers with various code smells:
So far, most of the refactoring teasers we’ve looked at, have suffered because of lack of modularity and with primitive obsession. This refactoring teaser is quite the opposite. Overall the code base is decent sized. So instead of trying to solve the whole problem in one go, let’s take it one step at a time.
Following is the crux of the Sender Edge Server IP Extraction Algo:
public IPAddressExtractor(finalList receivedHeaders, final Domain recepientDomain){
Domain recepientMXRecord = recepientDomain.retrieveFirstMXRecord();for(MatchingCriterion critierion : asList(MATCHING_DOMAIN, MATCHING_IP, MATCHING_SECOND_LEVEL_DOMAIN)){
Match result =foreach(receivedHeaders).and(recepientMXRecord).match(critierion);if(result.success()){
storeSenderIPWithDistance(result);break;}}}
To do the whole Fluent interfaces on line number 21, I had to create a private method:
Other than the switch statement smell and conditional complexity, the original code was obsessed with Primitive Obsession code smell. To fix this issue, the first thing I had to do was great first class citizens (Objects). So I ended up creating
Also notice that for testing purpose we don’t want to hit the network, so I created a FakeNetwork class which stubs out all Network calls. Network is injected into all Domain classes through the DomainFactory. (I’m not very happy with this design, it feels like a bit of a hack to inject Network this way.)
Last week I posted a small code snippet for refactoring under the heading Refactoring Teaser.
In this post I’ll try to show step by step how I would try to refactor this mud ball.
First and foremost cleaned up the tests to communicate the intent. Also notice I’ve changed the test class name to ContentTest instead of StringUtilTest, which means anything and everything.
Next, I created a class called Content, instead of StringUtil. Content is a first-class domain object. Also notice, no more side-effect intense statics.
publicclass Content {privatestaticfinalString BLANK_OUTPUT ="";privatestaticfinalString SPACE =" ";privatestaticfinalString DELIMITER ="', '";privatestaticfinalString SINGLE_QUOTE ="'";privatestaticfinalint MIN_NO_WORDS =2;privatestaticfinal Pattern ON_WHITESPACES = Pattern.compile("\\p{Z}|\\p{P}");privateList phrases =newArrayList();public Content(finalString content){String[] tokens = ON_WHITESPACES.split(content);if(tokens.length> MIN_NO_WORDS){
buildAllPhrasesUptoThreeWordsFrom(tokens);}}
@Override
publicString toString(){return toPhrases(Integer.MAX_VALUE);}publicString toPhrases(finalint userRequestedSize){if(phrases.isEmpty()){return BLANK_OUTPUT;}List requiredPhrases = phrases.subList(0, numberOfPhrasesRequired(userRequestedSize));return withInQuotes(join(requiredPhrases, DELIMITER));}privateString withInQuotes(finalString phrases){return SINGLE_QUOTE + phrases + SINGLE_QUOTE;}privateint numberOfPhrasesRequired(finalint userRequestedSize){return userRequestedSize > phrases.size()? phrases.size(): userRequestedSize;}privatevoid buildAllPhrasesUptoThreeWordsFrom(finalString[] words){
buildSingleWordPhrases(words);
buildDoubleWordPhrases(words);
buildTripleWordPhrases(words);}privatevoid buildSingleWordPhrases(finalString[] words){for(int i =0; i < words.length;++i){
phrases.add(words[i]);}}privatevoid buildDoubleWordPhrases(finalString[] words){for(int i =0; i < words.length-1;++i){
phrases.add(words[i]+ SPACE + words[i +1]);}}privatevoid buildTripleWordPhrases(finalString[] words){for(int i =0; i < words.length-2;++i){
phrases.add(words[i]+ SPACE + words[i +1]+ SPACE + words[i +2]);}}}
This was a big step forward, but not good enough. Next I focused on the following code:
privatevoid buildAllPhrasesUptoThreeWordsFrom(finalString[] words){
buildSingleWordPhrases(words);
buildDoubleWordPhrases(words);
buildTripleWordPhrases(words);}privatevoid buildSingleWordPhrases(finalString[] words){for(int i =0; i < words.length;++i){
phrases.add(words[i]);}}privatevoid buildDoubleWordPhrases(finalString[] words){for(int i =0; i < words.length-1;++i){
phrases.add(words[i]+ SPACE + words[i +1]);}}privatevoid buildTripleWordPhrases(finalString[] words){for(int i =0; i < words.length-2;++i){
phrases.add(words[i]+ SPACE + words[i +1]+ SPACE + words[i +2]);}}
The above code violates the Open-Closed Principle (pdf). It also smells of duplication. Created a somewhat generic method to kill the duplication.
Now I had a feeling that my Content class was doing too much and also suffered from the primitive obsession code smell. Looked like a concept/abstraction (class) was dying to be called out. So created a Words class as an inner class.
privateclass Words {privateString[] tokens;privatestaticfinalString SPACE =" ";
Words(finalString content){
tokens = ON_WHITESPACES.split(content);}boolean has(finalint minNoWords){return tokens.length> minNoWords;}List phrasesOf(finalint length){List phrases =newArrayList();for(int i =0; i <= tokens.length- length;++i){String phrase = phraseAt(i, length);
phrases.add(phrase);}return phrases;}privateString phraseAt(finalint index, finalint length){
StringBuilder phrase =new StringBuilder(tokens[index]);for(int i =1; i < length; i++){
phrase.append(SPACE + tokens[index + i]);}return phrase.toString();}}
In the constructor of the Content class we instantiate a Words class as follows:
public Content(finalString content){
Words words =new Words(content);if(words.has(MIN_NO_WORDS)){
phrases.addAll(words.phrasesOf(ONE_WORD));
phrases.addAll(words.phrasesOf(TWO_WORDS));
phrases.addAll(words.phrasesOf(THREE_WORDS));}}
Even though this code communicates well, there is duplication and noise that can be removed without compromising on the communication.