Sunday, October 7, 2018

Information Technology - Performance - The Killer - Don't let your customers down



Ding... Ding.. Ding.. 
Customer calls Support team (ST):

Customer : Hey Guys, it seems that the web application (ZZZ) is not responding (or very slow). 
                   Could you check this and fix it for us?
ST            : Sure, please wait a moment while I pull up all of your information
Customer:  Yeah, sure
ST            : Is your network speed good?
Customer : Yes, speed shows 100 Mbps
ST            : OK, on which screen are you facing this issue?
Customer : I am seeing this issue exactly where all of our customers are being listed and 
                   I am trying to pull one of our customer's information. 
                   So, when I search that customer information, it is not loading up. 
ST            : I am trying to do the same thing here and the screen is loading instantaneously.
                   I will escalate this to the Level 2 support team and they will help you solve this problem within 24 hours.

ST (L2)    : We checked the screen's loading time to evaluate performance. All of the server's diagnostic parameters are quite normal and we are not seeing any anomalies. (This person probably checked during off-peak time, at 2 in the morning.) The ticket (request) is then closed.
From there on, customers are made to believe that they are responsible for this performance-related issue and that they will have to live with the delay (problem).
Most entry-level or junior professionals don't put themselves in the customer's shoes and, because of SLA pressure, close the ticket as soon as possible without actually resolving it.
This behaviour makes it very difficult for customers to carry on with their day-to-day work.
I have heard things like the following from users of a system. They will surprise you (alas!!!), but not the users themselves.
  1. When I run the report, it takes 10 minutes to load, and I use this time as my coffee break (wow!!!)
  2. During the shift change, employees from both shifts are present for the handover, so we assumed the system would be slow because of the increased number of users at that time (phew!!!)
    1. So, users put themselves in the shoes of a developer, empathise, and live with the problems
So, it is very important for developers to know that people won't readily give feedback (in terms of reporting issues), especially when it is related to performance. Internally, however, their opinion of the system will change to: "The system will always be slow and I will have to live with it".

Always remember this.
Customers don’t expect you to be perfect. They do expect you to fix things when they go wrong. - Donald Porter
If you are not listening, somebody else will
Customers don't lie, and we have to take their word as is
I have been in this place at earlier stages of my career. But when customers say that they have a problem, they have a problem. We have to accept (believe) that and move on. Only then can we help customers. By defending ourselves, we only let down our customers, which nobody wants.

As a service provider, you will have to monitor a few performance-related metrics to keep your customers happy
  1. DDoS (Distributed Denial of Service) attacks recorded in the firewall at that time
  2. Network bandwidth usage/ latency at that time
  3. Have you set up reverse DNS on your server network?
    1. Use mtr (Matt's traceroute) to check that network connectivity works properly in both directions
  4. Do you see any pattern in when problems are reported?
    1. Specific days (When invoices are generated for that month)
    2. Specific timing of the day (When people login for the day/ logout for the day)
    3. When specific modules are accessed (A work allocation planner program that generates task schedule for all machines in a manufacturing plant) 
    4. When new projects (orders) are being created? [This is a killer, as this is what generates revenue for your customer and for your business]
    5. When specific reports are generated
  5. The points mentioned above will help narrow down the area where performance is an issue
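
One way to look for such patterns is to count reported problems by day of the week and by hour of the day. The sketch below is only an illustration: the ticket timestamps are made up, and the function name `report_patterns` and the timestamp format are assumptions, not something taken from a real ticketing system.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def report_patterns(timestamps):
    """Count problem reports per weekday and per hour of the day.

    `timestamps` is a list of strings in "YYYY-MM-DD HH:MM" format
    (an assumed format; adapt it to whatever your ticket export uses).
    """
    by_weekday = Counter()
    by_hour = Counter()
    for ts in timestamps:
        when = datetime.strptime(ts, "%Y-%m-%d %H:%M")
        by_weekday[WEEKDAYS[when.weekday()]] += 1
        by_hour[when.hour] += 1
    return by_weekday, by_hour


# Hypothetical ticket times, invented for illustration only.
tickets = [
    "2018-10-01 09:05",
    "2018-10-01 09:12",
    "2018-10-02 17:58",
    "2018-10-03 14:30",
    "2018-10-08 09:03",
]

by_weekday, by_hour = report_patterns(tickets)
print("Busiest day :", by_weekday.most_common(1))
print("Busiest hour:", by_hour.most_common(1))
```

If most reports cluster around one day or one hour, that is the window in which to compare the server metrics.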
What other system performance metrics should you look into on a regular basis?
Set up a cron job (scheduler) to capture the following parameters once in 5 minutes/ 0 minutes, so that the data is available when resolving performance-related issues
  1. OS log
  2. Application log
  3. No. of open files
  4. Load average of server
  5. CPU queue length
  6. Free RAM size
  7. I/O throughput
  8. Whether database tables (NoSQL collections) have relevant indexes
  9. Slow query log (Just enable)
  10. Deadlocks in programming (if any)
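
A few of the parameters above (load average, free RAM, number of open files) can be captured with a small script that cron runs at the chosen interval. This is a minimal sketch, assuming a Linux server where `/proc/meminfo` and `/proc/sys/fs/file-nr` exist; the function names and the output field names are my own choices, and the allocated file handle count is used here as a rough stand-in for "number of open files".

```python
import json
import os
import time
from pathlib import Path


def parse_meminfo(text):
    """Turn /proc/meminfo content into a dict of field name -> integer value."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def parse_file_nr(text):
    """/proc/sys/fs/file-nr holds: allocated handles, unused handles, maximum."""
    allocated, _unused, maximum = (int(x) for x in text.split()[:3])
    return {"allocated_file_handles": allocated, "max_file_handles": maximum}


def read_text(path):
    """Return the file content, or None if it cannot be read (e.g. not Linux)."""
    try:
        return Path(path).read_text()
    except OSError:
        return None


def snapshot():
    """Collect one record of basic server metrics."""
    record = {"timestamp": time.strftime("%Y-%m-%dT%H:%M:%S")}
    try:
        record["load_avg"] = list(os.getloadavg())  # 1, 5 and 15 minute averages
    except (AttributeError, OSError):
        record["load_avg"] = None
    meminfo = read_text("/proc/meminfo")
    if meminfo is not None:
        record["mem_available_kb"] = parse_meminfo(meminfo).get("MemAvailable")
    file_nr = read_text("/proc/sys/fs/file-nr")
    if file_nr is not None:
        record.update(parse_file_nr(file_nr))
    return record


if __name__ == "__main__":
    # One JSON line per run, so cron can append the output to a log file.
    print(json.dumps(snapshot()))
```

Saved under a hypothetical path such as /opt/monitor/snapshot.py, it could be scheduled with a crontab entry like `*/5 * * * * /usr/bin/python3 /opt/monitor/snapshot.py >> /var/log/perf_snapshot.jsonl` (both paths are placeholders). Logs, I/O throughput, slow queries and deadlocks need their own tools and are not covered by this sketch.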

If you are measuring all of this information and all of these metrics are satisfactory,
then when your customer reports problems, you could show this information to your customer and tell them that there is something they need to fix

No, I'm kidding.
When you start monitoring all of these minimal parameters, you will start to realise that there are additional parameters that you should look into.
Eventually, you will know where the problem is, work on fixing the issue, and keep your customers happy!!!

Any performance improvement program is an ongoing (never-ending) journey, and performance will have to be checked whenever new modules (or) new enhancements are deployed to a production environment.
As a practice,
 always use the latest (stable!!!) version of the operating system, application server and database server.

Last but not least,
  We succeed when our customers succeed :-)