Its been a while since the Fourth Refactoring Teaser was posted. So far, I think this is one of the trickiest refactorings I’ve tried. Refactored half of the solution and rewrote the rest of it.
Particularly thrilled about shrinkage in the code base. Getting rid of all those convoluted Strategies and Child Strategies with 2 main classes was real fun (and difficult as well). Even though the solution is not up to the mark, its come a long long way from where it was.
Ended up renaming IdentityGenerator to EmailSuggester. Renamed the PartialAcceptanceTest to EmailSuggesterTest. Also really like how that test looks now:
I’m not happy with this method. This is the roughest part of this code. All the
if(seed != lastName){
seems dodgy. But at least all of it is in one place instead of being scattered around 10 different classes with tons of duplicate code.
For each potential email data, we try to create an email address, if its available, we add it, else we move to the next potential email data, till we exhaust the list.
Given two tokens (user name and domain name), the Email class tries to creates an email address without Restricted Words and Celebrity Names in it.
30
31
32
33
34
35
privateString buildIdWithoutRestrictedWordsAndCelebrityNames(){
Email current =this;if(isCelebrityName())
current = trimLastCharacter();return buildIdWithoutRestrictedWordsAndCelebrityNames(current, 1);}
37
38
39
40
41
42
43
44
45
46
privateString buildIdWithoutRestrictedWordsAndCelebrityNames(final Email last, finalint count){if(count == MAX_ATTEMPTS)thrownewIllegalStateException("Exceeded the Max number of tries");String userName = findClosestNonRestrictiveWord(last.userName, RestrictedUserNames, 0);String domainName = findClosestNonRestrictiveWord(last.domainName, RestrictedDomainNames, 0);
Email id =new Email(userName, domainName, dns);if(!id.isCelebrityName())return id.asString();return buildIdWithoutRestrictedWordsAndCelebrityNames(id.trimLastCharacter(), count +1);}
Influenced by Functional Programming, I’ve tried to use Tail recursion and Immutable objects here.
Also to get rid of massive duplication in code, I had to introduce a new Interface and 2 anonymous inner classes.
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
publicinterface RestrictedWords {
RestrictedWords RestrictedUserNames =new RestrictedWords(){
@Override
publicboolean contains(finalString word, final DomainNameService dns){return dns.isRestrictedUserName(word);}};
RestrictedWords RestrictedDomainNames =new RestrictedWords(){
@Override
publicboolean contains(finalString word, final DomainNameService dns){return dns.isRestrictedDomainName(word);}};boolean contains(finalString word, DomainNameService dns);}
//instance variableprivatefinal List<String> restrictedWords = retrieveRestrictedWordsFromDB();publicString findClosestNonRestrictiveWord(String word){for(int i =0; i < MAX_ATTEMPTS && word.length()>1; i++){if(!restrictedWords.contains(word))return word;
word = word.substring(0, word.length()-1);}thrownewIllegalArgumentException(word +" is a restricted word");}
One would say this code is performant and uses memory optimally (time and space complexity is low) but is not the most expressive (intent revealing) code.
publicString findClosestNonRestrictiveWord(finalString word, finalint count){if(count == MAX_ATTEMPTS || word.length()<=1)thrownewIllegalArgumentException(word +" is a restricted word");if(!restrictedWords.contains(word))return word;return findClosestNonRestrictiveWord(trimLastChar(word), count +1);}privateString trimLastChar(finalString word){return word.substring(0, word.length()-1);}
I would argue that this code is much easier to understand and has very similar space and time complexity as the iterator based approach in most languages. (I know some JVMs don’t do Tail Call Optimization.)
In this specific case, there is no big win either ways because the MAX_ATTEMPTS is 5. Lets take a slightly more complete example where we have two mutable (non-final) instance variables (note this is a dumbed down version of a complex class.):
publicclass EmailAddress {privatestaticfinalint MAX_ATTEMPTS =5;privateString userName;privateString domainName;public EmailAddress(finalString userName, finalString domainName){if(isEmpty(userName)|| isEmpty(domainName))thrownewRuntimeException("User Name or Domain Name is empty or null");this.userName= userName.toLowerCase();this.domainName= domainName.toLowerCase();}publicString asString(){for(int loopCounter =0; loopCounter < MAX_ATTEMPTS;++loopCounter)if(isEmpty(userName)|| isEmpty(domainName))thrownewRuntimeException();elseif(!EmailService.isAvailable(userName, domainName)){
userName = userName.substring(0, userName.length()-1);
domainName = domainName.substring(0, domainName.length()-1);}elsereturn userName +"@"+ domainName;thrownewRuntimeException();}privateboolean isEmpty(finalString token){return token ==null|| token.length()==0;}}
Now let’s imagine you had 2 threads calling the asString method. [Blast], this code would blow apart.
Well, not a problem, we can simply make the method synchronized. But wait a second, what does this mean? I can only execute this whole method by one thread. Which means for a single CPU, single threaded application this code performs well. But the moment you have a multi-core concurrent (multi-threaded) application, this code will show no performance improvement. That sucks!
So what might look performant (esp. after compromising on expressiveness) is not really scalable. So what option do we have?
We could create a new instance of EmailAddress for every thread. But what if that’s not under your control?
Another thing we can try is reduce the scope of the two instance variables to method parameter or local copies, but what if you cannot do that trivially?
Notice the two instance variables are final. Which means the EmailAddress object is immutable. Also notice that instead of trimming the original instance variables, we create a new EmailAddress with the trimmed values, which means we don’t manipulate any object level state and two threads are not sharing any data. Hence we avoid the whole synchronization issue.
Please note that in languages that do not support Tail Call Optimization, using tail recursion has similar space and time complexity as recursion. (For long recursive calls you run the risk of stack_over_flow). Also your Stack and Heap memory will grow linearly with every recursion. More about Iteration v/s Recursion and Mutable v/s Immutable.
Most people seem to use these 2 terms interchangeably. The key difference between the two is that functions are Referentially Transparent. Which means I can safely replace the return value of the function with its method call.
Why is this important? This is very important because, functions by definition are safe and have no side-effects. No matter how many times you call that function, in any random order, its always guaranteed to return the same value. While procedures typically update the state of a class/global scope variable or interacts with an external sub-system. Hence they have side-effects which might not be very obvious by looking at the procedure’s definition.
Functional Interfaces make your code very readable and removes any order dependencies and related complexity. Also from a testing point of view and from a decoupling point of view, functions are always preferred over procedures.
One of the visual cues you can adopt to differentiate between functions and procedures is to have functions with a well defined return type, but procedures won’t have a return type. For Ex: In Java, if the return type of a method is “void”, then its a procedure.
Interesting when you have visual cues, most of your methods become functions and you have a very small set of methods (procedures) with side-effects. And procedures are very clearly highlighted and separated out.
Steve has an interesting blog on how he discovered Refactoring. He also highlights how developers have become so dependent on automated refactoring tool that they refuse to accept dynamic languages like Ruby; because it does not have automated refectoring tools yet.
Personally if I’m using a Static language like Java or C#, I really appreciate the automated refactoring tool support. But I think its lame not to embrace dynamic or funcational languages because they don’t have automated refactoring tools. In my experience the amout of refactoring tool support you need in these languages is drastically reduced because its a different programming style/pradigm.