
 
Managed Chaos
Naresh Jain's Random Thoughts on Software Development and Adventure Sports
     

Archive for the ‘Deployment’ Category

Setting up Virtual Hosts on Mac OS X

Saturday, March 23rd, 2013

If you are building a web app that uses OAuth for user authentication across Facebook, Google, Twitter and other social media, testing the app locally on your development machine can be a real challenge.

On your local machine, the app URL might look like http://localhost/my_app/login.xxx while in the production environment the URL would be http://my_app.com/login.xxx

Now, when you try to test the OAuth integration using Facebook (or any other resource server), it will not work locally, because when you create the Facebook app you need to give the URL where the code will be located, and this URL differs between the local and production environments.

So how do you resolve this issue?

One way to resolve this issue is to set up a Virtual Host on your machine, such that your local environment has the same URL as the production code.

To achieve this, follow these 4 simple steps:

1. Map your domain name to your local IP address
Add the following line to the /etc/hosts file:
127.0.0.1 my_app.com

Now when you request http://my_app.com in your browser, it will direct the request to your local machine.

2. Activate virtual hosts in Apache

Uncomment the following line (remove the #) in /private/etc/apache2/httpd.conf

#Include /private/etc/apache2/extra/httpd-vhosts.conf

3. Add the virtual host in Apache

Add the following VHost entry to the /private/etc/apache2/extra/httpd-vhosts.conf file

<VirtualHost *:80>
    DocumentRoot "/Users/username/Sites/my_app"
    ServerName my_app.com
</VirtualHost>

4. Restart Apache
System Preferences > “Sharing” > uncheck the “Web Sharing” box (Apache will stop), then check it again (Apache will start).

Now, http://my_app.com/login.xxx will be served locally.
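To sanity-check step 1 without opening a browser, here is a minimal Python sketch (an illustration, not part of the original steps). It parses hosts-file text and reports which IP a name is mapped to. The helper name lookup_hosts_entry and the sample content are hypothetical; on a real machine you would pass in the contents of /etc/hosts.

```python
def lookup_hosts_entry(hosts_text, hostname):
    """Return the first IP mapped to hostname in hosts-file text, or None."""
    for raw_line in hosts_text.splitlines():
        # Strip trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        if hostname in names:
            return ip
    return None


# Hypothetical sample mirroring the line added in step 1.
SAMPLE_HOSTS = """\
# local development overrides
127.0.0.1 localhost
127.0.0.1 my_app.com
"""

if __name__ == "__main__":
    print(lookup_hosts_entry(SAMPLE_HOSTS, "my_app.com"))
```

If the function returns None for your production domain, the browser will still go to the real server instead of your local Apache.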

Inverting the Testing Pyramid

Tuesday, March 19th, 2013

As more and more companies are moving to the Cloud, they want their latest, greatest software features to be available to their users as quickly as they are built. However, there are several issues blocking them from moving ahead.

One key issue is the massive amount of time it takes for someone to certify that the new feature is indeed working as expected and also to assure that the rest of the features will continue to work. In spite of this long waiting cycle, we still cannot assure that our software will not have any issues. In fact, many times our assumptions about the user’s needs or behavior might themselves be wrong. But this long testing cycle only helps us validate that our assumptions work as assumed.

How can we break out of this rut & get thin slices of our features in front of our users to validate our assumptions early?

Most software organizations today suffer from what I call the “Inverted Testing Pyramid” problem. They spend maximum time and effort manually checking software. Some invest in automation, but mostly in building slow, complex, fragile end-to-end GUI tests. Very little effort is spent on building a solid foundation of unit & acceptance tests.

This over-investment in end-to-end tests is a slippery slope. Once you start on this path, you end up investing even more time & effort on testing which gives you diminishing returns.

In this session Naresh Jain will explain the key misconceptions that have led to the inverted testing pyramid approach being massively adopted, the main drawbacks of this approach and how to turn your organization around to get the right testing pyramid.

How to upgrade CMS Made Simple from 1.9.x.x to 1.10.x

Monday, July 30th, 2012

Recently I had the “pleasure” of upgrading from CMSMS 1.9.3 to 1.10.3.

  • Downloaded the cmsmadesimple-1.10.3-full.tar.gz
  • Unzipped it overwriting some of the existing files from the older version (1.9.3) [tar -xvf cmsmadesimple-1.10.3-full.tar.gz -C my_existing_site_installation_folder]
  • Ran the upgrade script by opening http://my-site.com/install/upgrade.php

I was constantly getting stuck at step 3, where it was complaining:

Fatal error: Call to undefined method cms_config::save() in /install/lib/classes/CMSUpgradePage3.class.php on line 30

Digging around a little, I realized that cms_config is no longer available.

Then I tried downloading cmsmadesimple-1.9.4.3-full.tar.gz instead.

Luckily this time I was able to go past step 3 without any problem.

So now I was on version 1.9.4.3, but I wanted to get to 1.10.3. So I:

  • As per their advice, upgraded all my modules to the latest version
  • Downloaded cmsmadesimple-1.10.3-full.tar.gz,
  • Copied its contents
  • Tried to run the upgrade script.

Everything went fine; it even updated my database schema to version 35 successfully. But then when I hit continue on step 6, it was stuck there forever. Eventually it came back with Internal Error 500. Looking at the log file, all I could see was

“2012/07/28 06:28:35 [error] 23816#0: *3319000 upstream timed out (110: Connection timed out) while reading response header from upstream”

Turns out that in 1.10, the CMSMS dev team broke a whole bunch of backward compatibility. In Step 6 of the upgrade, it tries to upgrade and install installed modules. But during this process it just conks out.

Then I tried to uninstall all my modules and run the upgrade script. Abracadabra, the upgrade went just fine.

  • Then I had to go in and install those modules again.
  • Also had to update most of the modules to the latest version which is compatible with 1.10.
  • And restore the data used by the modules.

If only I had known all of this, it could have saved me a few hours of my precious life.

P.S: Just when I finished all of this, I saw the CMSMS dev team released the latest stable version 1.11

Various Prefixes for Nginx’s Location Directive

Thursday, November 3rd, 2011

Often we need to create short, more expressive URLs. If you are using Nginx as a reverse proxy, one easy way to create short URLs is to define different locations under the respective server directive and then do a permanent rewrite to the actual URL in the Nginx conf file as follows:

http { 
    ....
    server {
        listen          80;
        server_name     www.agilefaqs.com agilefaqs.com;
        server_name_in_redirect on;
        port_in_redirect        on; 
 
        location ^~ /training {
            rewrite ^ http://agilefaqs.com/a/long/url$uri permanent;
        }
 
        location ^~ /coaching {
            rewrite ^ http://agilecoach.in$uri permanent;  
        }
 
        location = /blog {
            rewrite ^ http://blogs.agilefaqs.com/show?action=posts permanent;  
        }
 
        location / {
            root   /path/to/static/web/pages;
            index   index.html; 
        }
 
        location ~* ^.+\.(gif|jpg|jpeg|png|css|js)$ {
            add_header Cache-Control public;
            expires max;
            root   /path/to/static/content;
        }
    } 
}

I’ve been using this feature of Nginx for over 2 years, but never actually fully understood the different prefixes for the location directive.

If you check Nginx’s documentation for the syntax of the location directive, you’ll see:

location [=|~|~*|^~|@] /uri/ { ... }

The URI can be a literal string or a regular expression (regexp).

For regexps, there are two prefixes:

  • “~” for case sensitive matching
  • “~*” for case insensitive matching

If we have a list of locations using regexps, Nginx checks each location in the order it’s defined in the configuration file. The first regexp to match the requested URL will stop the search. If no regexp matches are found, then it uses the longest matching literal string.

For example, if we have the following locations:

location ~* /.*php$ {
   rewrite ^ http://content.agilefaqs.com$uri permanent; 
}
 
location ~ /.*blogs.* {
    rewrite ^ http://blogs.agilefaqs.com$uri permanent;    
}  
 
location /blogsin {
    rewrite ^ http://agilecoach.in/blog$uri permanent;    
} 
 
location /blogsinphp {
    root   /path/to/static/web/pages;
    index   index.html; 
}

If the requested URL is http://agilefaqs.com/blogs/index.php, Nginx will permanently redirect the request to http://content.agilefaqs.com/blogs/index.php. Even though both regexps (/.*php$ and /.*blogs.*) match the requested URL, the first satisfying regexp (/.*php$) is picked and the search is terminated.

However, let’s say the requested URL was http://agilefaqs.com/blogsinphp. Nginx will first consider the /blogsin location and then the /blogsinphp location, remembering /blogsinphp as the longest matching literal string. If there were more literal string locations, it would consider them as well. The regexp locations are then checked in order, and because /.*php$ also matches this URL, the regexp location wins; the remembered literal string is used only when no regexp matches.

If you want this literal location to be used without any regexp being checked, which also slightly speeds up the process, you should use the “=” prefix, i.e.

location = /blogsinphp {
    root   /path/to/static/web/pages;
    index   index.html; 
}

and move this location right to the top, above the other locations. Nginx will then look at this location first; if it’s an exact literal string match, it stops right there without looking at any other location directives.

However, note that if http://agilefaqs.com/my/blogsinphp is requested, none of the literal strings will match and hence the first regexp (/.*php$) would be picked up instead of the string literal.

And if http://agilefaqs.com/blogsinphp/my is requested, the “=” location will not apply since the URL is not an exact match, and hence the first matching regexp (/.*blogs.*) is selected.

What if you don’t know the exact string literal, but you want to avoid checking all the regexps?

We can achieve this by using the “^~” prefix as follows:

location = /blogsin {
    rewrite ^ http://agilecoach.in/blog$uri permanent;    
}
 
location ^~ /blogsinphp {
    root   /path/to/static/web/pages;
    index   index.html; 
}
 
location ~* /.*php$ {
   rewrite ^ http://content.agilefaqs.com$uri permanent; 
}
 
location ~ /.*blogs.* {
    rewrite ^ http://blogs.agilefaqs.com$uri permanent;    
}

Now when we request http://agilefaqs.com/blogsinphp/my, Nginx checks the first location (= /blogsin) and finds that /blogsinphp/my is not an exact match. It then looks at (^~ /blogsinphp). It’s not an exact match either, but since we’ve used the ^~ prefix, this location is selected and all the remaining regexp locations are discarded.

However, if http://agilefaqs.com/blogsin is requested, Nginx will permanently redirect the request to http://agilecoach.in/blog/blogsin without even considering any other locations.

To summarize:

  1. Search stops if location with “=” prefix has an exact matching literal string.
  2. All remaining literal string locations are matched and the longest match is remembered. If that longest matching location uses the “^~” prefix, it is used and regexp locations are not searched.
  3. Regexp locations are matched in the order they are defined in the configuration file. Search stops on first matching regexp.
  4. If none of the regexp matches, the longest matching literal string location is used.
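The four rules above can be sketched as a small Python simulation. This is my own simplified model, not Nginx's implementation: it ignores URI normalization, nested and named (@) locations, and assumes Python's re module treats these simple patterns the same way Nginx's regexp engine does. The function name select_location and the tuple representation of a location are hypothetical.

```python
import re


def select_location(locations, uri):
    """Pick a location for uri.

    locations is a list of (modifier, pattern) tuples in config-file order,
    where modifier is one of "", "=", "^~", "~" or "~*".
    Returns the winning tuple, or None if nothing matches.
    """
    # Rule 1: an exact "=" match ends the search immediately.
    for modifier, pattern in locations:
        if modifier == "=" and uri == pattern:
            return (modifier, pattern)

    # Rule 2: remember the longest matching literal (prefix) location.
    best = None
    for modifier, pattern in locations:
        if modifier in ("", "^~") and uri.startswith(pattern):
            if best is None or len(pattern) > len(best[1]):
                best = (modifier, pattern)
    if best is not None and best[0] == "^~":
        return best  # "^~" means: skip the regexp locations

    # Rule 3: regexps are tried in config order; the first match wins.
    for modifier, pattern in locations:
        if modifier in ("~", "~*"):
            flags = re.IGNORECASE if modifier == "~*" else 0
            if re.search(pattern, uri, flags):
                return (modifier, pattern)

    # Rule 4: no regexp matched, fall back to the remembered literal.
    return best


# The locations from the "^~" example above, in the same order.
LOCATIONS = [
    ("=", "/blogsin"),
    ("^~", "/blogsinphp"),
    ("~*", "/.*php$"),
    ("~", "/.*blogs.*"),
]

if __name__ == "__main__":
    for uri in ("/blogsin", "/blogsinphp/my", "/blogs/index.php"):
        print(uri, "->", select_location(LOCATIONS, uri))
```

Feeding your own list of locations and a few sample URIs through a model like this is a quick way to reason about which block would be picked before reloading the real configuration.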

Even though the order of the literal string locations doesn’t matter, it’s generally a good practice to declare the locations in the following order:

  1. start with all the “=” prefix,
  2. followed by “^~” prefix,
  3. then all the literal string locations
  4. finally all the regexp locations (since the order matters, place them with the most likely ones first)

BTW, adding a break directive inside any of the location directives has no effect.

Continuous Deployment Demystified – Agile India 2012 Proposal

Tuesday, November 1st, 2011

“Release Early, Release Often” is a proven mantra, but what happens when you push this practice to its limits, i.e. deploying the latest code changes to the production servers every time a developer checks in code?

At Industrial Logic, developers are deploying code dozens of times a day, rapidly responding to their customers and reducing their “code inventory”.

This talk will demonstrate our approach, deployment architecture, tools and culture needed for CD and how at Industrial Logic, we gradually got there.

Process/Mechanics

This will be a 60 mins interactive talk with a demo. Also has a small group activity as an icebreaker.

Key takeaway: When we started about 2 years ago, achieving CD felt like a huge step, almost all or nothing. Over the next 6 months we were able to break down the problem and achieve CD in baby steps. I think the approach we took to CD is a key takeaway from this session.

Talk Outline

  1. Context Setting: Need for Continuous Integration (3 mins)
  2. Next steps to CI (2 mins)
  3. Intro to Continuous Deployment (5 mins)
  4. Demo of CD at Freeset (for Content Delivery on Web) (10 mins) – a quick, live walk thru of how the deployment and servers are set up
  5. Benefits of CD (5 mins)
  6. Demo of CD for Industrial Logic’s eLearning (15 mins) – a detailed walk thru of our evolution and live demo of the steps that take place during our CD process
  7. Zero Downtime deployment (10 mins)
  8. CD’s Impact on Team Culture (5 mins)
  9. Q&A (5 mins)

Target Audience

  • CTO
  • Architect
  • Tech Lead
  • Developers
  • Operations

Context

Industrial Logic’s eLearning context? Number of changes, developers, customers, etc.?

Industrial Logic’s eLearning has rich multi-media interactive content delivered over the web. Our eLearning modules (called Albums) have pictures & text, videos, quizzes, programming exercises (labs) in 5 different programming languages, a packing system to validate & produce the labs, plugins for different IDEs on different platforms to record programming sessions, an analysis engine to score students’ lab work in different languages, a commenting system, a reporting system to generate different kinds of student reports, etc.

We have 2 kinds of changes, eLearning platform changes (requires updating code or configuration) or content changes (either code or any other multi-media changes.) This is managed by 5 distributed contributors.

On average we’ve seen about 12 check-ins per day.

Our customers are developers, managers and L&D teams from companies like Google, GE Energy, HP, EMC, Philips, and many other Fortune 100 companies. Our customers have very high expectations of us. We have to demonstrate what we preach.

Learning outcomes

  • General Architectural considerations for CD
  • Tools and Cultural change required to embrace CD
  • How to achieve Zero-downtime deploys (including databases)
  • How to slice work (stories) such that something is deployable and usable very early on
  • How to build different visibility levels such that new/experimental features are only visible to subset of users
  • What Delivery tests do
  • You should walk away with some good ideas of how your company can practice CD

Slides from Previous Talks

Locked Yourself Out? Rescue your IP from CSF’s Temporary Blacklist

Sunday, October 9th, 2011

We have a few Red Hat Enterprise Linux servers, all of which run ConfigServer Security & Firewall (CSF), which is a Stateful Packet Inspection (SPI) firewall, Login/Intrusion Detection and Security application for Linux servers. Amongst various other things, it looks for port scans, multiple login failures and other things that it thinks are ominous, and locks out the originating IP address by rewriting the iptables firewall rules.

For example, if you try to connect to the same server via http, https, ssh and svn within some short window of time, you are quite likely to incur its wrath. Developers at Industrial Logic often lock themselves out by getting blacklisted.

Generally when this happens, we ssh into one of our other servers, connect to the server that has blacklisted us, and execute the following command to see what is going on:

$ sudo /usr/sbin/csf -t

A/D IP address Port Dir Time To Live Comment
DENY 117.193.150.62 * in 9m 58s lfd – *Port Scan* detected from 117.193.150.62 (IN/India/-). 11 hits in the last 36 seconds

As you can see, csf blacklisted my IP for port scanning.

If your IP is the only record, you can flush the whole temporary block list by executing:

$ sudo /usr/sbin/csf -tf
DROP all opt -- in !lo out * 117.193.150.62 -> 0.0.0.0/0
csf: 117.193.150.62 temporary block removed
csf: There are no temporary IP allows

Alternatively you can execute the following command to just remove a specific IP:

$ sudo /usr/sbin/csf -tr <ip-address>

The easiest way to find your (external) IP address is to visit http://www.whatsmyip.org/

If you have a static IP, then you can whitelist yourself by:

$ sudo /usr/sbin/csf -a <ip-address>

Upstream Connection Time Out Error in Nginx

Thursday, July 21st, 2011

Currently at Industrial Logic we use Nginx as a reverse proxy to our Tomcat web server cluster.

Today, while running a particular report with a large dataset, we started getting timeout errors. When we looked at the Nginx error.log, we found the following error:

[error] 26649#0: *9155803 upstream timed out (110: Connection timed out) 
while reading response header from upstream, 
client: xxx.xxx.xxx.xxx, server: elearning.industriallogic.com, request: 
"GET our_url HTTP/1.1", upstream: "internal_server_url", 
host: "elearning.industriallogic.com", referrer: "requested_url"

After digging around for a while, I discovered that our web server was taking more than 60 secs to respond. Nginx has a directive called proxy_read_timeout, which defaults to 60 secs. It sets how long Nginx waits between two successive reads of the response from the proxied server before giving up.

In the nginx.conf file, setting proxy_read_timeout to 120 secs solved our problem:

server {
    listen       80;
    server_name  elearning.industriallogic.com;
    server_name_in_redirect off;
    port_in_redirect        off;
 
    location / {
        proxy_set_header  X-Real-IP  $remote_addr;
        proxy_set_header  X-Real-Host  $host;
        proxy_read_timeout 120;
        ...
    }
    ...
}

Reverse DNS Lookup freaking out on Windows Server for Chinese IP Address

Sunday, July 10th, 2011

Recently an important client of Industrial Logic’s eLearning reported that access to our Agile eLearning website was extremely slow (23+ secs per page load.) This came as a shock; we’ve never seen such poor performance from any part of the world. Besides, a 23+ secs page load basically puts our eLearning in the category of “useless junk”.

[Screenshots: page load times from China and from India]

Notice that from China it’s taking 23.34 secs, while from any other country it takes less than 3 secs to load the page. Clearly the problem occurred when the request originated from China. We suspected network latency issues. So we tried a traceroute.

Sure enough, the traceroute does look suspicious. But then we soon realized that since traceroute and web access (HTTP) use different protocols, they could use completely different routes to reach the destination. (In fact, China has a law by which access to all public websites should go through the Chinese Firewall [The Great Wall]. VPN can only be used for internal server access.)

Ahh..The Great Wall! Could The Great Wall have something to do with this issue?

To nail the issue, we used a VPN from China to test our site. Great, with the VPN, we were getting 3 secs page load.

After cursing The Great Wall, just as we were exploring options for hosting our server inside The Great Wall, we noticed something strange. Certain pages were consistently loading faster than others. On further investigation, we realized that all pages served from our Windows servers were slower by at least 14 secs compared to pages served by our Linux servers.

Hmmm…somehow the content served by our Windows Server is triggering a check inside the Great Wall.

What keywords could the Great Wall be checking for?

Well, we don’t have any option other than brute forcing the keywords.

Wait a sec….we serve our content via HTTPS, could the Great Wall be looking for keywords inside an HTTPS stream? Hope not!

Maybe it has to do with some difference in the headers, since most firewalls look at header info to take decisions.

But after thinking a little more, it occurred to me that there cannot be any header difference (except one parameter in the URL and maybe something in the Cookie.) That’s because we use Nginx as our reverse proxy. Whether the actual content is being served from Windows or Linux servers should be transparent to clients.

Just to be sure that something was not slipping by, we decided to do a small experiment: have the exact same content served by both the Windows and Linux boxes and see if it made any difference. Interestingly, the exact same content served from the Windows server was still slower by at least 14 secs.

Let’s look at the server response from the browser again:

Notice the 15 secs for the initial response to the submit request. This happens only when the request is served by the Windows Server.

We had to find out where those 15 secs were coming from. So we decided to take a deeper look using a network analysis tool. And look what we found:

A 14+ sec response from our server side. However this happens only when the request is coming from China. Since our application does not have any country specific code, who else could be interfering with this? There are 3 possibilities:

  • Firewall settings on the Windows Server: It was easy to rule this out, since we had disabled the firewall for all requests coming from our Reverse Proxy Server.
  • Our Datacenter Network Settings: To protect against DDoS attacks from Chinese hackers. A possibility.
  • Low level Windows Network Stack: God knows what…

We opened a ticket with our Datacenter. They responded back with their standard response (from a template) saying: “Please check with your client’s ISP.”

Just as I was losing hope, I explained this problem to Devdas. When he heard about the 14 secs delay, he immediately told me that it sounded like a standard Reverse DNS Lookup timeout.

I was pretty sure we did not do any reverse DNS lookup. Besides if we did it in our code, both Windows and Linux Servers should have the same delay.

To verify this, we installed Wireshark on our Windows servers to monitor Reverse DNS Lookups. Sure enough, nothing showed up.

I was losing hope by the minute. Just out of curiosity, one night, I searched our whole code base for any reverse DNS lookup code. Surprise! Surprise!

I found a piece of logging code, which was taking the User IP and trying to find its host name. That has to be the culprit. But then why don’t we see the same delay on Linux server?

On further investigation, I figured out that our Windows Server did not have any DNS servers configured for the private Ethernet interface we were using, while Linux had them.

We eliminated the useless logging code and configured the right DNS servers on our Windows Servers. And guess what, all requests from Windows and Linux are now served in less than 2 secs. (Better than before, because we eliminated a useless reverse DNS lookup, which was timing out for China.)
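For illustration, here is a minimal Python sketch of the kind of call that bit us, wrapped so that it cannot stall a request. The original logging code is not shown in this post and was not written in Python; the function name reverse_lookup, the one-second default and the injectable resolver parameter are all assumptions made for this example. socket.gethostbyaddr is the standard-library call that performs the reverse lookup.

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def reverse_lookup(ip, timeout_secs=1.0, resolver=socket.gethostbyaddr):
    """Return the host name for ip, or None if the lookup fails or is too slow.

    resolver is injectable so the behavior can be exercised without a network.
    """
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(resolver, ip)
    try:
        # gethostbyaddr returns (hostname, aliases, addresses).
        hostname, _aliases, _addresses = future.result(timeout=timeout_secs)
        return hostname
    except (FutureTimeout, OSError):
        # Timed out, or no reverse record / no reachable DNS server.
        return None
    finally:
        # Do not block the caller on a lookup that is still stuck.
        executor.shutdown(wait=False)
```

With a bound like this, a missing DNS server costs at most timeout_secs per request rather than the 14+ secs we were seeing. Dropping the lookup altogether, as we did, is cheaper still.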

This was fun! Great learning experience.

Stop Over-Relying on your Beautifully Elegant Automated Tests

Tuesday, June 21st, 2011

Time and again, I find developers (including myself) over-relying on automated tests, especially unit tests, which run fast and reliably.

In the urge to save time today, we want to automate everything (which is good), but then we get blinded by this desire, only to realize later that we’ve spent a lot more time debugging issues that our automated tests did not catch (leaving aside the embarrassment this causes.)

I’ve come to believe:

A little bit of manual, sanity testing by the developer before checking in code can save hours of debugging and embarrassment.

Again this is contextual and needs personal judgement based on the nature of the change one makes to the code.

In addition to quickly doing some manual sanity testing on your machine before checking in, it can be extremely important to do some exploratory testing as well. However, we cannot always test things locally. In those cases, we can test them on a staging environment or on a live server. But it’s important for you to discover the issue well before your users encounter it.

P.S: Recently we faced Error CS0234: The type or namespace name ‘Specialized’ does not exist in the namespace ‘System.Collections’ issue, which prompted me to write this blog post.

Error CS0234: The type or namespace name ‘Specialized’ does not exist in the namespace ‘System.Collections’

Tuesday, June 21st, 2011

Industrial Logic’s eLearning has a feature where students can upload their programming exercise and get automated, personalized feedback. We do various server-side analysis (automated critique) of the student’s code to score them and to give them feedback about how well they performed in their programming exercises. To do this we need to compile their code on our server.

Recently we upgraded our .Net Compiler from version 2.0 to version 3.5. In version 2.0 we had to provide a reference to the System.dll file (located in c:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\System.dll) for compiling. In version 3.5, they don’t have a System.dll file. It was replaced by System.Core.dll (located in c:\Program Files\Reference Assemblies\Microsoft\Framework\v3.5\System.Core.dll).

After making this change, when we ran the compiler using c:\WINDOWS\Microsoft.NET\Framework\v3.5\Csc.exe we got the following error:

Microsoft (R) Visual C# 2008 Compiler version 3.5.30729.1
for Microsoft (R) .NET Framework version 3.5
Copyright (C) Microsoft Corporation. All rights reserved.

SomeCSharpClass.cs: error CS0234: The type or namespace name ‘Specialized’ does not exist in the namespace ‘System.Collections’ (are you missing an assembly reference?)

It turns out that the ‘System.Collections.Specialized’ namespace used to exist in System.dll in .Net 2.0. In 3.5, System.Core.dll (which replaced System.dll) does not contain it. Hence the compile time error.

We’ve fixed this issue by adding both System.Core.dll (v3.5) and System.dll (v2.0) to the compiler reference path. Not sure if this is the right thing to do. But it seems to work.
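As a rough sketch of what passing both assemblies to the compiler can look like, here is a small Python helper that builds the Csc.exe command line. This is not our actual server code: the helper name build_csc_command and the output file name are hypothetical, and it assumes the references are passed one by one with the compiler's /reference: option rather than through a library search path. The three paths are the ones mentioned above.

```python
# Paths as mentioned in this post; adjust for your own machine.
CSC_EXE = r"c:\WINDOWS\Microsoft.NET\Framework\v3.5\Csc.exe"
SYSTEM_DLL = r"c:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\System.dll"
SYSTEM_CORE_DLL = (
    r"c:\Program Files\Reference Assemblies\Microsoft\Framework\v3.5\System.Core.dll"
)


def build_csc_command(csc_path, source_file, references, output_file):
    """Build an argument list for csc that references every given assembly."""
    command = [csc_path, "/nologo", "/target:library", "/out:" + output_file]
    # One /reference: option per assembly, in the order given.
    command += ["/reference:" + ref for ref in references]
    command.append(source_file)
    return command


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = build_csc_command(
        CSC_EXE,
        "SomeCSharpClass.cs",
        [SYSTEM_CORE_DLL, SYSTEM_DLL],
        "SomeCSharpClass.dll",
    )
    print(" ".join(cmd))
```

The resulting list can be handed to subprocess.run on the Windows server; keeping it as a list avoids quoting problems with the space in "Program Files".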

    Licensed under
Creative Commons License