For Freeset, I’ve always been on a quest for the Simplest Thing that Could Possibly Work. In a previous post, I explained how we embraced an ultra-light process (call it lean, if you like) to build their e-commerce site.
In that post, I talked about our wish to create a Selenium test suite for regression testing. But it never got high enough on our priority list (especially because we mostly have static content served from a CMS as of now).
While that is something I wanted to tackle, last night, when I was moving Industrial Logic and Industrial XP’s sites over to new server hardware, I wanted a quick way to test whether all the pages were displayed correctly after the move. This was important since we switched from Apache to Nginx, and Nginx has a slightly different way of handling secure pages, etc.
So I asked on Twitter if anyone knew of a tool that could compare two deployments of the same website. A few people responded saying I could use curl/wget recursively and then diff the results. That seemed like the simplest thing that could work for now. So this morning I wrote a script.
rm -Rf * && mkdir live && cd live && wget -rkp -l5 -q -np -nH http://freesetglobal.com && cd .. && mkdir dev && cd dev && wget -rkp -l5 -q -np -nH http://dev.freesetglobal.com && cd .. && for i in `grep -l dev.freesetglobal.com \`find ./dev -name '*'\`` ; do sed -e 's/dev.freesetglobal.com/freesetglobal.com/g' $i > $i.xx && mv $i.xx $i; done && diff -r -y --suppress-common-lines -w -I '^.*' dev live
I’m planning to use this script as a simple regression test for our Freeset site. We have a live and a dev environment. We make changes on dev and frequently sync them to live. I’m thinking that before we sync, we can check that we’ve made the correct changes to the intended pages. If other pages that we did not expect show up in this diff, it’s a good way to catch such issues before the sync.
Note: One could also use diff with the -q option if all they want to know is which pages changed. Also note that on a Mac, sed’s -i (in-place edit) option does not behave the way it does in GNU sed: it requires a backup-suffix argument. If you give sed -i -e …, it takes -e as the suffix and ends up creating backup files with a -e extension (sed -i '' -e … avoids the backup). #fail.
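For readers who prefer code to shell one-liners, here is a minimal Java sketch of the same check: rewrite the dev host name to the live one, then list the pages that still differ (roughly what diff -q reports). The class and method names are hypothetical, and the sketch assumes both crawls have already been loaded into maps of path to page content.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper, not part of the original script.
class DeploymentDiff {

    /** Replaces the dev host name with the live one, mirroring the sed step. */
    static String normalise(String page, String devHost, String liveHost) {
        return page.replace(devHost, liveHost);
    }

    /** Returns, in sorted order, the paths that differ or exist on only one side. */
    static List<String> changedPages(Map<String, String> dev, Map<String, String> live,
                                     String devHost, String liveHost) {
        TreeSet<String> paths = new TreeSet<>(dev.keySet());
        paths.addAll(live.keySet());
        List<String> changed = new ArrayList<>();
        for (String path : paths) {
            String devPage = dev.get(path);
            String livePage = live.get(path);
            if (devPage == null || livePage == null
                    || !normalise(devPage, devHost, liveHost).equals(livePage)) {
                changed.add(path);
            }
        }
        return changed;
    }
}
```

Any path in the returned list that you did not intend to change is worth a look before syncing.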
I have a treat for crappy code scavengers. Here is a method with a Cyclomatic Complexity of 68 and an NPath Complexity of 34,632, and it is ONLY 189 lines long (154 NCSS).
/*
 * Main reading method
 */
public void read(final ByteBuffer byteBuffer) throws Exception {
    invalidateBuffer();
    // Check that the buffer is not bigger than 1 Megabyte. For security reasons
    // we will abort parsing when 1 Mega of queued chars was found.
    if (buffer.length() > maxBufferSize)
        throw new Exception("Stopped parsing never ending stanza");
    CharBuffer charBuffer = encoder.decode(byteBuffer);
    char[] buf = charBuffer.array();
    int readByte = charBuffer.remaining();

    // Just return if nothing was read
    if (readByte == 0)
        return;

    // Verify if the last received byte is an incomplete double byte character
    char lastChar = buf[readByte - 1];
    if (lastChar >= 0xfff0) {
        // Rewind the position one place so the last byte stays in the buffer
        // The missing byte should arrive in the next iteration. Once we have both
        // of bytes we will have the correct character
        byteBuffer.position(byteBuffer.position() - 1);
        // Decrease the number of bytes read by one
        readByte--;
        // Just return if nothing was read
        if (readByte == 0)
            return;
    }

    buffer.append(buf, 0, readByte);
    // Do nothing if the buffer only contains white spaces
    if (buffer.charAt(0) <= ' ' && buffer.charAt(buffer.length() - 1) <= ' ')
        if ("".equals(buffer.toString().trim())) {
            // Empty the buffer so there is no memory leak
            buffer.delete(0, buffer.length());
            return;
        }
    // Robot.
    char ch;
    boolean isHighSurrogate = false;
    for (int i = 0; i < readByte; i++) {
        ch = buf[i];
        if (ch < 0x20 && ch != 0x9 && ch != 0xA && ch != 0xD && ch != 0x0)
            // Unicode characters in the range 0x0000-0x001F other than 9, A, and D are not allowed in XML
            // We need to allow the NULL character, however, for Flash XMLSocket clients to work.
            throw new Exception("Disallowed character");
        if (isHighSurrogate) {
            if (Character.isLowSurrogate(ch))
                // Everything is fine. Clean up traces for surrogates
                isHighSurrogate = false;
            else
                // Trigger error. Found high surrogate not followed by low surrogate
                throw new Exception("Found high surrogate not followed by low surrogate");
        } else if (Character.isHighSurrogate(ch))
            isHighSurrogate = true;
        else if (Character.isLowSurrogate(ch))
            // Trigger error. Found low surrogate char without a preceding high surrogate
            throw new Exception("Found low surrogate char without a preceding high surrogate");
        if (status == XMLLightweightParser.TAIL) {
            // Looking for the close tag
            if (depth < 1 && ch == head.charAt(tailCount)) {
                tailCount++;
                if (tailCount == head.length()) {
                    // Close stanza found!
                    // Calculate the correct start,end position of the message into the buffer
                    int end = buffer.length() - readByte + i + 1;
                    String msg = buffer.substring(startLastMsg, end);
                    // Add message to the list
                    foundMsg(msg);
                    startLastMsg = end;
                }
            } else {
                tailCount = 0;
                status = XMLLightweightParser.INSIDE;
            }
        } else if (status == XMLLightweightParser.PRETAIL) {
            if (ch == XMLLightweightParser.CDATA_START[cdataOffset]) {
                cdataOffset++;
                if (cdataOffset == XMLLightweightParser.CDATA_START.length) {
                    status = XMLLightweightParser.INSIDE_CDATA;
                    cdataOffset = 0;
                    continue;
                }
            } else {
                cdataOffset = 0;
                status = XMLLightweightParser.INSIDE;
            }
            if (ch == '/') {
                status = XMLLightweightParser.TAIL;
                depth--;
            } else if (ch == '!')
                // This is a <! (comment) so ignore it
                status = XMLLightweightParser.INSIDE;
            else
                depth++;
        } else if (status == XMLLightweightParser.VERIFY_CLOSE_TAG) {
            if (ch == '>') {
                depth--;
                status = XMLLightweightParser.OUTSIDE;
                if (depth < 1) {
                    // Found a tag in the form <tag />
                    int end = buffer.length() - readByte + i + 1;
                    String msg = buffer.substring(startLastMsg, end);
                    // Add message to the list
                    foundMsg(msg);
                    startLastMsg = end;
                }
            } else if (ch == '<') {
                status = XMLLightweightParser.PRETAIL;
                insideChildrenTag = true;
            } else
                status = XMLLightweightParser.INSIDE;
        } else if (status == XMLLightweightParser.INSIDE_PARAM_VALUE) {
            if (ch == '"')
                status = XMLLightweightParser.INSIDE;
        } else if (status == XMLLightweightParser.INSIDE_CDATA) {
            if (ch == XMLLightweightParser.CDATA_END[cdataOffset]) {
                cdataOffset++;
                if (cdataOffset == XMLLightweightParser.CDATA_END.length) {
                    status = XMLLightweightParser.OUTSIDE;
                    cdataOffset = 0;
                }
            } else
                cdataOffset = 0;
        } else if (status == XMLLightweightParser.INSIDE) {
            if (ch == XMLLightweightParser.CDATA_START[cdataOffset]) {
                cdataOffset++;
                if (cdataOffset == XMLLightweightParser.CDATA_START.length) {
                    status = XMLLightweightParser.INSIDE_CDATA;
                    cdataOffset = 0;
                    continue;
                }
            } else {
                cdataOffset = 0;
                status = XMLLightweightParser.INSIDE;
            }
            if (ch == '"')
                status = XMLLightweightParser.INSIDE_PARAM_VALUE;
            else if (ch == '>') {
                status = XMLLightweightParser.OUTSIDE;
                if (insideRootTag
                        && ("stream:stream>".equals(head.toString()) || "?xml>".equals(head.toString()) || "flash:stream>".equals(head
                                .toString()))) {
                    // Found closing stream:stream
                    int end = buffer.length() - readByte + i + 1;
                    // Skip LF, CR and other "weird" characters that could appear
                    while (startLastMsg < end && '<' != buffer.charAt(startLastMsg))
                        startLastMsg++;
                    String msg = buffer.substring(startLastMsg, end);
                    foundMsg(msg);
                    startLastMsg = end;
                }
                insideRootTag = false;
            } else if (ch == '/')
                status = XMLLightweightParser.VERIFY_CLOSE_TAG;
        } else if (status == XMLLightweightParser.HEAD) {
            if (ch == ' ' || ch == '>') {
                // Append > to head to allow searching </tag>
                head.append(">");
                if (ch == '>')
                    status = XMLLightweightParser.OUTSIDE;
                else
                    status = XMLLightweightParser.INSIDE;
                insideRootTag = true;
                insideChildrenTag = false;
                continue;
            } else if (ch == '/' && head.length() > 0) {
                status = XMLLightweightParser.VERIFY_CLOSE_TAG;
                depth--;
            }
            head.append(ch);
        } else if (status == XMLLightweightParser.INIT) {
            if (ch == '<') {
                status = XMLLightweightParser.HEAD;
                depth = 1;
            } else
                startLastMsg++;
        } else if (status == XMLLightweightParser.OUTSIDE)
            if (ch == '<') {
                status = XMLLightweightParser.PRETAIL;
                cdataOffset = 1;
                insideChildrenTag = true;
            }
    }
    if (head.length() > 0
            && ("/stream:stream>".equals(head.toString()) || "/flash:stream>".equals(head.toString())))
        // Found closing stream:stream
        foundMsg("</stream:stream>");
}
What does this code actually do?
This method is inside a LightWeightXMLParser. It reads data from a socket channel (Java NIO) and accumulates it for as long as data is available on the channel. When a message is complete (fully formed XML), you can retrieve the messages by invoking the getMsgs() method, and you can invoke the areThereMsgs() method to find out whether at least one message is present.
/*
 * @return an array with all messages found
 */
public String[] getMsgs() {
    String[] res = new String[msgs.size()];
    for (int i = 0; i < res.length; i++)
        res[i] = msgs.get(i);
    msgs.clear();
    invalidateBuffer();
    return res;
}
The following tests might help you understand the code slightly better:
@Override
protected void setUp() throws Exception {
    super.setUp();
    // Create parser
    parser = new LightWeightXMLParser(CHARSET);
    // Create byte buffer and append text
    in = ByteBuffer.allocate(4096);
}

public void testHeader() throws Exception {
    String msg1 = "<stream:stream to=\"localhost\" xmlns=\"jabber:client\" xmlns:stream=\"http://etherx.jabber.org/streams\" version=\"1.0\">";
    in.put(msg1.getBytes());
    in.flip();
    // Fill parser with byte buffer content and parse it
    parser.read(in);
    // Make verifications
    assertTrue("Stream header is not being correctly parsed", parser.areThereMsgs());
    assertEquals("Wrong stanza was parsed", msg1, parser.getMsgs()[0]);
}
public void testHeaderWithXMLVersion() throws Exception {
    String msg1 = "<?xml version=\"1.0\"?>";
    String msg2 = "<stream:stream to=\"localhost\" xmlns=\"jabber:client\" xmlns:stream=\"http://etherx.jabber.org/streams\" version=\"1.0\">";
    in.put((msg1 + msg2).getBytes());
    in.flip();
    // Fill parser with byte buffer content and parse it
    parser.read(in);
    // Make verifications
    assertTrue("Stream header is not being correctly parsed", parser.areThereMsgs());
    String[] values = parser.getMsgs();
    assertEquals("Wrong number of parsed stanzas", 2, values.length);
    assertEquals("Wrong stanza was parsed", msg1, values[0]);
    assertEquals("Wrong stanza was parsed", msg2, values[1]);
}

public void testCompleteStanzas() throws Exception {
    String msg1 = "<stream:stream to=\"localhost\" xmlns=\"jabber:client\" xmlns:stream=\"http://etherx.jabber.org/streams\" version=\"1.0\">";
    String msg2 = "<starttls xmlns=\"urn:ietf:params:xml:ns:xmpp-tls\"/>";
    String msg3 = "<stream:stream to=\"localhost\" xmlns=\"jabber:client\" xmlns:stream=\"http://etherx.jabber.org/streams\" version=\"1.0\">";
    String msg4 = "<iq id=\"428qP-0\" to=\"localhost\" type=\"get\"><query xmlns=\"jabber:iq:register\"></query></iq>";
    String msg5 = "<stream:stream to=\"localhost\" xmlns=\"jabber:client\" xmlns:stream=\"http://etherx.jabber.org/streams\" version=\"1.0\">";
    String msg6 = "<presence id=\"428qP-5\"></presence>";
    String msg7 = "</stream:stream>";
    in.put(msg1.getBytes());
    in.put(msg2.getBytes());
    in.put(msg3.getBytes());
    in.put(msg4.getBytes());
    in.put(msg5.getBytes());
    in.put(msg6.getBytes());
    in.put(msg7.getBytes());
    in.flip();
    // Fill parser with byte buffer content and parse it
    parser.read(in);
    // Make verifications
    assertTrue("Stream header is not being correctly parsed", parser.areThereMsgs());
    String[] values = parser.getMsgs();
    assertEquals("Wrong number of parsed stanzas", 7, values.length);
    assertEquals("Wrong stanza was parsed", msg1, values[0]);
    assertEquals("Wrong stanza was parsed", msg2, values[1]);
    assertEquals("Wrong stanza was parsed", msg3, values[2]);
    assertEquals("Wrong stanza was parsed", msg4, values[3]);
    assertEquals("Wrong stanza was parsed", msg5, values[4]);
    assertEquals("Wrong stanza was parsed", msg6, values[5]);
    assertEquals("Wrong stanza was parsed", msg7, values[6]);
}
public void testIQ() throws Exception {
    String iq = "<iq type=\"set\" to=\"lachesis\" from=\"0sups/Connection Worker - 1\" id=\"360-22348\"><session xmlns=\"http://jabber.org/protocol/connectionmanager\" id=\"0sups87b1694\"><close/></session></iq>";
    in.put(iq.getBytes());
    in.flip();
    // Fill parser with byte buffer content and parse it
    parser.read(in);
    // Make verifications
    assertTrue("Stream header is not being correctly parsed", parser.areThereMsgs());
    String parsedIQ = parser.getMsgs()[0];
    assertEquals("Wrong stanza was parsed", iq, parsedIQ);
}
public void testNestedElements() throws Exception {
    String msg1 = "<message><message xmlns=\"e\">1</message></message>";
    in.put(msg1.getBytes());
    in.flip();
    // Fill parser with byte buffer content and parse it
    parser.read(in);
    // Make verifications
    assertTrue("Stream header is not being correctly parsed", parser.areThereMsgs());
    String[] values = parser.getMsgs();
    assertEquals("Wrong number of parsed stanzas", 1, values.length);
    assertEquals("Wrong stanza was parsed", msg1, values[0]);
}
public void testIncompleteStanza() throws Exception {
    String msg1 = "<message><something xmlns=\"http://idetalk.com/namespace\">12";
    in.put(msg1.getBytes());
    in.flip();
    // Fill parser with byte buffer content and parse it
    parser.read(in);
    // Make verifications
    assertFalse("Found messages in incomplete stanza", parser.areThereMsgs());
}

public void testCompletedStanza() throws Exception {
    String msg1 = "<message><something xmlns=\"http://idetalk.com/namespace\">12";
    in.put(msg1.getBytes());
    in.flip();
    // Fill parser with byte buffer content and parse it
    parser.read(in);
    // Make verifications
    assertFalse("Found messages in incomplete stanza", parser.areThereMsgs());
    String msg2 = "</something></message>";
    ByteBuffer in2 = ByteBuffer.allocate(4096);
    in2.put(msg2.getBytes());
    in2.flip();
    // Fill parser with byte buffer content and parse it
    parser.read(in2);
    in2.clear();
    assertTrue("Stream header is not being correctly parsed", parser.areThereMsgs());
    String[] values = parser.getMsgs();
    assertEquals("Wrong number of parsed stanzas", 1, values.length);
    assertEquals("Wrong stanza was parsed", msg1 + msg2, values[0]);
}
public void testStanzaWithComments() throws Exception {
    String msg1 = "<iq from=\"lg@jabber.org/spark\"><query xmlns=\"jabber:iq:privacy\"><!-- silly comment --></query></iq>";
    in.put(msg1.getBytes());
    in.flip();
    // Fill parser with byte buffer content and parse it
    parser.read(in);
    // Make verifications
    assertTrue("No messages were found in stanza", parser.areThereMsgs());
    String[] values = parser.getMsgs();
    assertEquals("Wrong number of parsed stanzas", 1, values.length);
    assertEquals("Wrong stanza was parsed", msg1, values[0]);
}
4. One other technique I find useful sometimes is to have my test implement or extend the dependency (class or interface). So the test acts as the real dependency.
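A minimal sketch of this technique, assuming a made-up Notifier dependency and OrderProcessor class (neither comes from the original code). The test class implements the dependency itself, so it can record what the code under test sends. A real test would use your test framework's assertions; this sketch leaves those out to stay self-contained.

```java
// Hypothetical dependency, chosen only for illustration.
interface Notifier {
    void send(String message);
}

// Hypothetical class under test: it reports its work through the Notifier.
class OrderProcessor {
    private final Notifier notifier;

    OrderProcessor(Notifier notifier) {
        this.notifier = notifier;
    }

    void process(String orderId) {
        notifier.send("Processed " + orderId);
    }
}

// The test implements the dependency and passes itself to the code under test.
class OrderProcessorTest implements Notifier {
    private String lastMessage;

    @Override
    public void send(String message) {
        lastMessage = message; // record the interaction for later verification
    }

    String runAndCaptureMessage(String orderId) {
        new OrderProcessor(this).process(orderId);
        return lastMessage;
    }
}
```

The appeal is that no mocking library is needed; the trade-off is that the test class picks up one extra method per dependency method.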
It’s been a while since the Fourth Refactoring Teaser was posted. So far, I think this is one of the trickiest refactorings I’ve tried. I refactored half of the solution and rewrote the rest of it.
I’m particularly thrilled about the shrinkage in the code base. Getting rid of all those convoluted Strategies and Child Strategies and replacing them with 2 main classes was real fun (and difficult as well). Even though the solution is not up to the mark, it’s come a long, long way from where it was.
I ended up renaming IdentityGenerator to EmailSuggester and PartialAcceptanceTest to EmailSuggesterTest. I also really like how that test looks now:
I’m not happy with this method. This is the roughest part of this code. All the
if (seed != lastName) {
checks seem dodgy. But at least all of it is in one place instead of being scattered around 10 different classes with tons of duplicate code.
For each potential email data, we try to create an email address; if it’s available, we add it, else we move on to the next potential email data, until we exhaust the list.
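The loop described above could be sketched roughly like this. It is an illustration only: SuggestionLoop, the toEmail factory and the isAvailable check are hypothetical stand-ins, and plain strings take the place of the real email data types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical sketch, not the actual EmailSuggester code.
class SuggestionLoop {

    static List<String> suggest(List<String> candidates,
                                Function<String, Optional<String>> toEmail,
                                Predicate<String> isAvailable) {
        List<String> suggestions = new ArrayList<>();
        for (String candidate : candidates) {
            // Try to build an address from this candidate.
            Optional<String> email = toEmail.apply(candidate);
            // Keep it only if one could be built and it is still available.
            if (email.isPresent() && isAvailable.test(email.get())) {
                suggestions.add(email.get());
            }
            // Otherwise move on to the next candidate.
        }
        return suggestions;
    }
}
```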
Given two tokens (user name and domain name), the Email class tries to create an email address without Restricted Words and Celebrity Names in it.
private String buildIdWithoutRestrictedWordsAndCelebrityNames() {
    Email current = this;
    if (isCelebrityName())
        current = trimLastCharacter();
    return buildIdWithoutRestrictedWordsAndCelebrityNames(current, 1);
}

private String buildIdWithoutRestrictedWordsAndCelebrityNames(final Email last, final int count) {
    if (count == MAX_ATTEMPTS)
        throw new IllegalStateException("Exceeded the Max number of tries");
    String userName = findClosestNonRestrictiveWord(last.userName, RestrictedUserNames, 0);
    String domainName = findClosestNonRestrictiveWord(last.domainName, RestrictedDomainNames, 0);
    Email id = new Email(userName, domainName, dns);
    if (!id.isCelebrityName())
        return id.asString();
    return buildIdWithoutRestrictedWordsAndCelebrityNames(id.trimLastCharacter(), count + 1);
}
Influenced by Functional Programming, I’ve tried to use Tail recursion and Immutable objects here.
Also to get rid of massive duplication in code, I had to introduce a new Interface and 2 anonymous inner classes.
public interface RestrictedWords {
    RestrictedWords RestrictedUserNames = new RestrictedWords() {
        @Override
        public boolean contains(final String word, final DomainNameService dns) {
            return dns.isRestrictedUserName(word);
        }
    };

    RestrictedWords RestrictedDomainNames = new RestrictedWords() {
        @Override
        public boolean contains(final String word, final DomainNameService dns) {
            return dns.isRestrictedDomainName(word);
        }
    };

    boolean contains(final String word, DomainNameService dns);
}
Of late I’ve been toying around with a new way of using Fluent Interfaces with a Context Object for my tests, especially when I’m using Mockito.
In this post (Fluent Interfaces improve readability of my Tests), I’ve taken an example and demonstrated how I evolved my tests to be more expressive. In my quest to get my tests to communicate precisely and to the point by hiding everything else, which is noise, I’ve started exploring another way of using Fluent Interfaces.
lets and on are both Context objects which provide a fluent, domain-specific API that makes the test very easy to read (communicative and expressive). They also help me hide all my mocking/stubbing-related code.
If you compare this with the original code, you can get a sense of what I’m talking about:
Recently I was working on some code. The code was trying to tell me many things, but I was not sure if I was understanding what it was trying to communicate. It just felt irrelevant or noise at that moment. Somehow the right level of abstraction was missing.
As you can see, my first reaction after looking at this code was that there is too much going on, most of which is duplicate. So I cleaned it up a bit and made it more expressive.
By introducing a new class called Context and moving all the mocking code into that, my test looked a lot clearer. I was also able to create an abstraction that could communicate intent much more easily.
Next I reduced the clutter further by creating another level of abstraction as follows
But at this point the code had ended up very dense, and it was very difficult to understand what was going on and why. In a desperate search for simplicity and better communication, I ended up with
What is interesting about this is that I made some simple assumptions:
every name is not a celebrity name unless specified
every user name is a valid (non-restricted) user name unless specified
every domain name is a valid (non-restricted) domain name unless specified
every identity is available unless specified
All these assumptions are now captured in my Context object, and the rest of my tests can happily focus on what really matters. I really liked the way this reduced the clutter in my tests without compromising on communication.
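To make the idea concrete, here is a minimal sketch of a Context object that bakes in those four defaults and exposes fluent overrides. The Context described in this post hides Mockito stubbing; this sketch swaps that for plain sets so it stays self-contained, and the class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical Context: everything is "fine" unless a test says otherwise.
class TestContext {
    private final Set<String> celebrityNames = new HashSet<>();
    private final Set<String> restrictedUserNames = new HashSet<>();
    private final Set<String> restrictedDomainNames = new HashSet<>();
    private final Set<String> takenIdentities = new HashSet<>();

    // Fluent overrides: each returns this so that calls can be chained.
    TestContext celebrityName(String name) { celebrityNames.add(name); return this; }
    TestContext restrictedUserName(String name) { restrictedUserNames.add(name); return this; }
    TestContext restrictedDomainName(String name) { restrictedDomainNames.add(name); return this; }
    TestContext identityTaken(String identity) { takenIdentities.add(identity); return this; }

    // Queries: the defaults fall out of the sets starting empty.
    boolean isCelebrityName(String name) { return celebrityNames.contains(name); }
    boolean isRestrictedUserName(String name) { return restrictedUserNames.contains(name); }
    boolean isRestrictedDomainName(String name) { return restrictedDomainNames.contains(name); }
    boolean isAvailable(String identity) { return !takenIdentities.contains(identity); }
}
```

A test then states only the exceptions it cares about, for example new TestContext().celebrityName("madonna"), and leaves everything else at its default.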
When faced with Legacy Code, I’ve found 3 possible options to deal with it:
Leave it alone for now: Very rarely used, code seems to work fine.
Piecemeal Refactoring: When it’s difficult to understand what the code does and how it does it, it’s time for a safe, slow and cumbersome refactoring process.
Rewrite: When it’s clear what the code does, but it’s very difficult to understand how it does it, it’s time to rescue the code by rewriting it from scratch. This can be applied at various levels (whole code base, single module, class or method).
To Rewrite or to Refactor?
One can easily spend hours or days trying to refactor some code, when clearly (in retrospect) rewriting the code would have been a better option. Sometimes you decide it’s better to rewrite the code and end up implementing something that does not work in all situations, or you miss something important. Unfortunately there is no clear guideline for when to refactor code vs. rewrite it. The key for me is: if I understand what the code does, though not necessarily how it does it, then it’s time to rewrite the code.
Rewriting code: Play it safe
The analogy I use is that rewriting code is like building bridges. You know that the bridge helps you get from point A to point B. It might be very complicated and risky to use the bridge any more. But that does not mean you’ll go and blow the bridge apart. Instead you would slowly start building a new bridge alongside. When the new bridge is ready, you would divert a sample of the traffic onto this bridge and see if it actually works. If it does, then you migrate all the traffic to the new bridge and blow the old bridge apart.
I use the very same technique when rewriting code. During the process, I might leave the code working but in a much messier (worse) state. During CodeChef TechTalks in Bangalore, Sai told me that he refers to this as an “Expand and Contract” cycle. You are temporarily expanding your code base so that you can come back and clean it up.
When I’m rewriting code, I find black-box style automated tests very helpful. If you don’t have tests, it might be worth investing the time to write a few.
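One way to picture the "divert a sample of the traffic" step in code is a small black-box harness that feeds the same sample inputs to the old and the new implementation and reports where they disagree. This is only a sketch under that assumption: ParallelRun and its method are hypothetical names, and both implementations are assumed to be callable as plain functions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;

// Hypothetical harness: runs old and new code side by side on sample inputs.
class ParallelRun {

    /** Returns the sample inputs for which the two implementations disagree. */
    static <I, O> List<I> mismatches(List<I> samples,
                                     Function<I, O> legacy,
                                     Function<I, O> rewritten) {
        List<I> failed = new ArrayList<>();
        for (I input : samples) {
            if (!Objects.equals(legacy.apply(input), rewritten.apply(input))) {
                failed.add(input);
            }
        }
        return failed;
    }
}
```

An empty result on a representative sample is the signal to start moving the rest of the traffic over; anything else points at behaviour the rewrite has missed.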
Where to begin Refactoring Code
Outside-In: Start from a higher-level and refactor (delve) into the crux
Inside-Out: Start refactoring the crux and work your way out
At times it’s difficult to identify the crux, and I spend some time exploring (via refactoring) before I can choose an approach. Tests can be a great probe to understand the code.
When refactoring legacy code, I usually use the Scaffolding Technique to break the Catch-22 situation (to refactor we need tests, to write tests we need to refactor). Scaffolding tests don’t necessarily have to be UI tests; I’ve used unit tests as scaffolding tests as well. The key thing is that they are temporary and meant to help you get started.
Thanks to the folks @ the Legacy Code BoF @ CodeChef TechTalks in Bangalore who prompted me to write this blog.