Imagine you’re surfing the web and every page shows you a loading spinner for 15 seconds. How long would you keep going before losing your patience? How long until you moved on to something else? What if just one website did this to you? Would you visit it again without complaining? Would you recommend it to anyone else?
Most users reject web applications that take more than 4 or 5 seconds to load. Performance is a key factor in user adoption, and one of the key determinants of performance is geography. Let’s say the site works perfectly well in our own country. Do we even think about its performance in other regions? Or on different mobile networks?
I am the QA for a Thoughtworks sales application. Our users are out in the field most of the time and they need a quick, mobile-friendly way to input data into our sales systems. After we announced the initial release of our app, some of our users complained that its response time was so slow that they were unable to use it. A series of video calls showed us the performance issues these users were facing. It actually took around 3 minutes to load each page, which was far too slow for anyone to make use of the app.
Having seen the issues, we brainstormed some potential causes for poor performance. Here’s what we thought could be the case:
We had a number of possibilities in front of us, but we couldn’t yet be sure of the real cause. To get to the root of the problem, we worked through them by a process of elimination.
We first tested locally on 2G and 3G mobile networks. Our users helped us a lot here by testing in different locations: indoors, out in the cities, and in some remote areas. Most of them seemed quite satisfied with the results. In some areas where mobile network coverage was poor, we saw a few problems. Overall, though, the application’s response time was close to acceptable limits.
To rule out latency and packet loss issues, we set up calls with our users in every region: Africa, Asia, Australia, Europe, and North and South America. As it turned out, users everywhere were quite comfortable using the application, with the exception of one country. It was soon evident that our issue was specific to that country’s firewall.
We spoke to other Thoughtworks teams who support applications in that country to understand how they deal with performance problems there. It turned out that most of these apps were on internal office networks, so their setup was not equivalent.
We started by pinging the app from that region. The response times were good, which confused us. After some investigation, we learned that the firewall filters traffic at the TCP layer. Ping, however, sends ICMP packets, which operate at the lower IP layer, and these pass through the firewall with little or no delay. This meant that we could use ping to test that a network path existed, but could not assume that TCP communication over that path would be fast or even permitted.
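To illustrate the difference, here is a minimal Python sketch of a probe that times a TCP handshake instead of an ICMP echo. It is not the tooling we used on the project; the function names `tcp_connect_seconds` and `probe`, the default port 443, and the timeout values are all assumptions made for the example.

```python
import socket
import time


def tcp_connect_seconds(host, port=443, timeout=10.0):
    """Return the seconds taken to complete a TCP handshake with host:port.

    Raises OSError if the connection is refused, blocked, or does not
    complete within `timeout` seconds.
    """
    start = time.perf_counter()
    # create_connection returns only once the TCP handshake has completed,
    # so this exercises the layer that ping never touches.
    with socket.create_connection((host, port), timeout=timeout):
        return time.perf_counter() - start


def probe(host, port=443, attempts=5, timeout=10.0):
    """Time several handshakes; a failed attempt is recorded as None."""
    results = []
    for _ in range(attempts):
        try:
            results.append(tcp_connect_seconds(host, port, timeout))
        except OSError:
            results.append(None)
    return results
```

Calling `probe("app.example.com")` (a placeholder hostname) from inside the affected region and comparing the result with ping times from the same machine would show whether TCP traffic is being slowed or dropped while ICMP passes freely. A handshake is still only the first step of a page load, so a fast result here does not guarantee a fast application.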
We then set up two computers: one to run the app and one to record video of the first. We connected the video computer to a separate network from the application computer so that the video stream didn't interfere with the application’s performance. This was particularly important when testing on mobile networks, where bandwidth is constrained.
We studied the network traffic on each type of network: private office Wi-Fi, public Wi-Fi, and mobile 3G. We learned that it is important to test on multiple mobile providers, as each implements the government’s blacklist in its own way.
Our learnings and optimisations follow.
After we made the above changes, the page response time dropped to 3 to 5 seconds, depending on the internet connection. Our users were very happy with our quick turnaround. We continued to check in with them periodically to make sure that nothing had regressed. Furthermore, while analysing new stories we kept these points in mind and implemented them in such a way that our users wouldn't have to go through this kind of trouble again.
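A periodic check like this can also be partly automated. The sketch below is an illustration rather than what we actually ran: it times a plain HTTP fetch and compares it with a response-time budget. The names `fetch_seconds` and `within_budget` and the 5-second default budget are assumptions for the example.

```python
import time
import urllib.request


def fetch_seconds(url, timeout=30.0):
    """Return the seconds taken to download the full response body for `url`."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()
    return time.perf_counter() - start


def within_budget(url, budget_seconds=5.0, timeout=30.0):
    """Return True if `url` was fetched within the budget, False otherwise.

    Network-level failures (refused connections, timeouts, HTTP error
    statuses) count as being over budget.
    """
    try:
        return fetch_seconds(url, timeout) <= budget_seconds
    except OSError:
        return False
```

This times a single request for a single resource, not the full page as a user sees it in a browser, so it complements checking in with real users in the affected region rather than replacing it.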
From this experience, we learned to consider a whole new dimension that we’d clearly missed at the start.