Imagine you’re surfing through the web and all the pages show you the wait spinner for 15 seconds. How long would you continue to surf without losing your patience? How long until you move to doing something else? What if one website does this to you? Would you visit that page again without complaining? Would you recommend the website to anyone else?
Most users reject web applications when they take more than 4 or 5 seconds to load. Performance is a key factor for user adoption. And one of the key determinants of performance is geography. Let’s say the site works perfectly well in our own country. Do we even think about its performance in other regions? Or on different mobile networks?
A rather sluggish app
I am the QA for a ThoughtWorks sales application. Our users are out in the field most of the time and they need a quick, mobile-friendly way to input data into our sales systems. After we announced the initial release of our app, some of our users complained that its response time was so slow that they were unable to use it. A series of video calls revealed the performance issues that these users were facing. We saw that it actually took around 3 minutes to load each page, which was obviously too slow for anyone to make use of the app.
Diagnosing the issue
Having seen the issues, we brainstormed some potential causes for poor performance. Here’s what we thought could be the case:
- We might have been testing on unrealistically high-speed internet. Until then, we had tested the app only on office wi-fi networks. Out in the field, the application would probably be run on mobile 2G/3G or average-speed wi-fi networks.
- The servers were close to our region, so there was a chance that more remote regions were suffering from latency and/or packet loss.
- It could be that a country-specific firewall was interfering with some of our requests.
We had a bunch of possibilities in front of us, but we couldn’t be sure of the real cause yet.
Getting to the real culprit
Among all the above possibilities, we had to figure out the actual cause of this performance problem. We adopted the method of elimination to get to the root of this.
We first tested locally on 2G and 3G mobile networks. Our users helped us a lot in this by testing in different locations: indoors, out in the cities, as well as in some remote areas. Most of them seemed quite satisfied with the results. In some areas where the mobile network coverage was not very good, we saw a few problems. Overall though, the application’s response time was close to acceptable limits.
In order to rule out latency and packet loss issues, we set up calls with our users in every region: Africa, Asia, Australia, Europe, and North and South America. As it turned out, users everywhere except one country were quite comfortable using the application and reported no performance issues. It was soon evident that our issue was specific to that country’s firewall.
Solving the problem
We spoke to other ThoughtWorks teams who support applications in that country to understand how they deal with performance problems there. It turned out that most of these apps were on internal office networks, so their setup was not equivalent.
We started by pinging the app from that region. The response times were good, which confused us. After some investigation, we learned that the firewall filters traffic at the TCP layer, whereas ping sends ICMP packets, which travel directly over IP, below TCP. These pass right through the firewall with little or no delay. This meant that we could use ping to test that a network path existed, but could not assume that the application’s TCP traffic would be fast or even permitted.
We then set up two computers, one to run the app and one to record video of the other computer’s screen. We connected the video computer to a separate network from the application computer so that the video stream didn't interfere with the application’s performance. This was particularly important when testing on mobile networks where bandwidth is constrained.
We studied the network traffic for each type of network: private office wi-fi, public wi-fi, and mobile 3G. We learned that it is important to test on multiple mobile providers as they each implement the government’s blacklist in their own way.
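Because ping says nothing about TCP, a more telling probe is to time an actual TCP connection to the port the application uses. The following is a minimal sketch of that idea, not the tooling we used; the function name `tcp_connect_time`, the default port 443 and the 5-second timeout are assumptions chosen for illustration.

```python
import socket
import time
from typing import Optional


def tcp_connect_time(host: str, port: int = 443, timeout: float = 5.0) -> Optional[float]:
    """Return the seconds taken to open a TCP connection, or None on failure.

    The measurement includes name resolution as well as the TCP handshake,
    so it is closer to what a browser experiences than an ICMP ping.
    """
    start = time.monotonic()
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Timeouts, refused connections and DNS failures all end up here.
        return None
```

Calling `tcp_connect_time("app.example.com")` (a placeholder host name) from inside the affected region and comparing the result with the ping time would show whether TCP traffic is being delayed or dropped even though ICMP gets through. A `None` result means the connection failed or did not complete within the timeout.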
Our learnings and optimisations follow.
Small changes, big impacts
- Google properties were blocked in this country. Hence, fonts.googleapis.com was not accessible. This meant that the page waited until the request timed out (60s) before loading the rest of the page. We mitigated the issue by downloading the fonts and serving them from our server. This greatly improved performance.
- Our analytics tool was not blocked, but was slow to download from its own site and delayed the loading of other assets. We made two improvements here. We now serve it up from our server and we do so in parallel with other assets so as not to delay their loading.
- Previously, static assets were minified but not compressed, which slowed load time. Enabling compression in our web server reduced them to about a quarter of their previous size.
- Many other AJAX calls were run sequentially. Most of these now run in parallel.
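Two of the changes above, compression and parallel calls, are easy to demonstrate in isolation. The sketch below is a Python analogue rather than our actual front-end or server code: `gzip_ratio`, `run_sequential`, `run_parallel` and `fake_call` are hypothetical names, and `fake_call` merely sleeps to stand in for a slow network request.

```python
import gzip
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def gzip_ratio(payload: bytes) -> float:
    """Return compressed size divided by original size (smaller is better)."""
    if not payload:
        raise ValueError("payload must not be empty")
    return len(gzip.compress(payload)) / len(payload)


def run_sequential(calls: Sequence[Callable[[], str]]) -> List[str]:
    """Run each call only after the previous one has finished."""
    return [call() for call in calls]


def run_parallel(calls: Sequence[Callable[[], str]]) -> List[str]:
    """Start all calls at once and collect the results in the original order."""
    with ThreadPoolExecutor(max_workers=max(1, len(calls))) as pool:
        futures = [pool.submit(call) for call in calls]
        return [future.result() for future in futures]


def fake_call(delay: float = 0.2) -> str:
    """Stand-in for a network request: waits for `delay` seconds, then returns."""
    time.sleep(delay)
    return "ok"
```

With three `fake_call` requests, `run_sequential` takes roughly the sum of the delays while `run_parallel` takes roughly the longest single delay. Similarly, `gzip_ratio` on a typical minified script gives a figure well below 1, although the exact ratio depends on the content.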
After we made the above changes, the page response time dropped to 3 to 5 seconds, depending on the internet connection. Our users were very happy with our quick turnaround time. We continued to check in with them periodically to ensure that nothing had fallen apart. Furthermore, while analysing later stories we kept these points in mind and implemented them in such a way that our users wouldn't have to go through this kind of trouble again.
And so, we learnt...
From this experience, we learned to consider a whole new dimension that we’d clearly missed at the start.
- Depending on the needs of the application, the analysis and requirements should consider from the start how geography could affect performance and other cross-functional requirements.
- User testing should be an integral part of the Software Development Lifecycle. The later a problem is found, the more expensive it tends to be to fix.
- Testing on 2G, 3G and other mobile data networks is essential for mobile apps, especially in the locations where the users will use them.