Anthony Fajri

I am The Story of This Blog

Tips to Troubleshoot

Since most of my work deals with networks and systems, both in implementation and in support, I often have to troubleshoot.

I have some troubleshooting tips to share with you, my blog readers (if any, hahahahah).

Steps to troubleshoot
1. Reproduce the Problem
This tip usually helps in the support phase rather than the project phase. If you did not face the problem directly, or somebody reported the issue to you, reproducing the problem will help you understand the flow of the issue, which later helps you find its root cause.
Once you can reproduce the problem, you can say that you have solved 50% of it.

Tip: if you deal with many parties to solve the issue, never say that nothing is wrong in your system and simply pass the issue back to the customer. In my opinion, that is not polite. It is better to advise the customer on what needs to be done to solve the issue (e.g. escalate to another party). However, it depends on the SLA.

2. Isolate the Problem
This tip helps in both the support phase and the project phase. Once you have the problem (by facing it directly or by reproducing it), try to isolate it.
Imagine that many blocks of the system are involved when the issue happens. Try to find out which block the issue comes from.

This step might be a little bit difficult and needs a good understanding of the system/network. Once you can isolate the problem (getting the right block), you have solved another 20% of the issue.
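
As a rough sketch of what isolating the problem can look like: if the blocks form a chain and you can test "does it still work up to block N?", you can halve the search each time instead of checking every block in order. The chain, the `first_failing_block` function and the `simulated_check` stand-in below are all made up for illustration. The sketch also assumes there is a single break point; in a real network the check would be a ping, a traceroute or a port test.

```python
from typing import Callable, Optional, Sequence


def first_failing_block(
    blocks: Sequence[str],
    works_up_to: Callable[[int], bool],
) -> Optional[str]:
    """Return the first block at which the end-to-end path stops working.

    works_up_to(i) must return True when traffic passes through blocks[0..i].
    Assumes a single break point: everything before it works, everything
    from it onward fails.
    """
    lo, hi = 0, len(blocks)          # the faulty index lies in [lo, hi)
    while lo < hi:
        mid = (lo + hi) // 2
        if works_up_to(mid):
            lo = mid + 1             # fault is further down the chain
        else:
            hi = mid                 # fault is at mid or earlier
    return blocks[lo] if lo < len(blocks) else None


# Hypothetical chain of blocks between a client and an application.
chain = ["client NIC", "access switch", "router", "firewall", "app server"]


def simulated_check(index: int) -> bool:
    # Stand-in for a real test; pretend everything up to the router works.
    return index < 3


print(first_failing_block(chain, simulated_check))  # prints "firewall"
```

With five blocks this takes three checks instead of up to five. The saving gets bigger as the chain gets longer.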

3. Find the Root Cause
After you have the block the issue may come from, you need to find out the root cause of the problem.
E.g. you know that something is wrong with Windows networking.
You need to find out what the root cause is: is it the Ethernet, the driver, the switch, etc.?

Once you pass this step, you have solved another 20% of the issue.

4. Find the Solution
Once you have the root cause, you can try to solve the issue, e.g. by reconfiguration, scripting, reinstallation, hardware replacement, etc.

Well, I can say that finding the solution is only 10% of the work. Maybe after hours or days of troubleshooting, the solution you need to add is only one line on the command line.
Well, yes, the objective is to find the solution.
And sometimes you can find the solution without knowing the root cause. This is OK, but be careful of a time bomb, because the same issue might happen again later.

Sometimes you also need to find both a temporary solution and a permanent solution, since the permanent solution may involve commercial issues.

To do all the steps above, keep the following in mind:
1. Use your skill.
Well, in my opinion, a network engineer should have the following skills: network skills, system skills, and scripting skills. On top of those, analytical skill is a must.
To troubleshoot, you need to know which area needs to be fixed. Then you know how to deal with the issue.

2. Google (if possible)
Most of the time, Google will help you find the solution. But you need to know what you are looking for. Google is a tool; it is not the one doing the troubleshooting.
But sometimes it is not possible to access Google. Imagine that you are at the South Pole, an undersea cable has been cut by an earthquake, and you need to fix it. No internet connection. Of course you cannot use Google.

3. You are working alone
When you are troubleshooting, especially in the project phase, you are alone. Nobody will help you but yourself.
The reason is that this is your responsibility, and only you know the design of the project.
E.g. you have a routing issue. If you escalate it to someone who understands routing, it will be useless unless he also understands the design and the field situation.

Escalating to your supervisor will sometimes help. But please be reminded that escalating to a supervisor is not a technical escalation; it is more of a non-technical one. The supervisor will tell you what to do (e.g. roll back, escalate to the vendor, wait for something), not how to solve the problem.

If you are working in a team, remember that you are part of that team. In this case, “alone” means your team, not just you yourself.

4. Try to identify the problem as soon as possible
If you identify the problem earlier, you will have more time to troubleshoot, or you can even prevent the issue from happening.
In the project phase, a good design helps you recognize the issue earlier.
In the support phase, a monitoring tool helps you identify or recognize the issue earlier.

5. Use tools (if possible)
Tools will help you with the troubleshooting steps, e.g. Ethereal (Wireshark) or tcpdump will help you understand the traffic, and Cacti/Nagios/JFFNMS/MRTG will help you monitor.
But sometimes the problem happens only rarely, yet it is critical.
E.g. every 3 months the routers hang.
In this case, a tool might not help, because you don’t know when it will happen, unless you can reproduce the problem.
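
To give an idea of the simplest kind of monitoring, here is a minimal sketch of a probe that checks whether a TCP port answers and writes down every state change with a timestamp. It is not how Cacti or Nagios work internally. The function names `tcp_port_open` and `watch` are made up for this example, and a TCP connect is only one possible health check.

```python
import socket
import time
from datetime import datetime
from typing import List, Tuple


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


def watch(host: str, port: int, interval: float, rounds: int) -> List[Tuple[str, bool]]:
    """Probe host:port `rounds` times and record each change of state."""
    events: List[Tuple[str, bool]] = []
    last = None
    for _ in range(rounds):
        state = tcp_port_open(host, port)
        if state != last:
            # Only transitions are kept, so a long quiet period stays short.
            stamp = datetime.now().isoformat(timespec="seconds")
            events.append((stamp, state))
            last = state
        time.sleep(interval)
    return events
```

For example, `watch("192.0.2.10", 22, interval=60, rounds=1440)` would probe a (hypothetical) router's SSH port once a minute for a day. For a hang that comes every 3 months this will not tell you why it happened, but at least it tells you when.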

6. Focus on finding the root cause of the big issue, ignore other small problems.
I remember last month when I was in Melbourne, a server crashed. Since the server was not my responsibility, I escalated the problem to the other party (located in a country in Southeast Asia). I reported the server crash by email. But the reply he sent asked about the timing of my email (PS: I sent the email from Melbourne (GMT+11), my mail server is in Singapore (GMT+8), and the recipient is in GMT+7), which was a waste of time.
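
For what it is worth, a timing question like that takes a few lines to answer, so it should never hold up the real work. The sketch below uses the three offsets from the story. The send time itself is invented, because the real one is not important.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets taken from the story above.
MELBOURNE = timezone(timedelta(hours=11))  # sender, GMT+11
SINGAPORE = timezone(timedelta(hours=8))   # mail server, GMT+8
RECIPIENT = timezone(timedelta(hours=7))   # recipient, GMT+7

# Hypothetical send time; the real one is not given in the post.
sent = datetime(2008, 2, 15, 21, 30, tzinfo=MELBOURNE)

# The same instant, shown on each party's clock.
for name, zone in [("Melbourne", MELBOURNE), ("Singapore", SINGAPORE), ("Recipient", RECIPIENT)]:
    print(f"{name:<10} {sent.astimezone(zone):%Y-%m-%d %H:%M %z}")
```

An email sent at 21:30 in Melbourne shows as 18:30 on the Singapore mail server and 17:30 for the recipient. All three are the same moment.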

7. Stop talking and complaining, start troubleshooting
Some issues might be related to “whose mistake”.
When an issue happens (in either the project phase or the support phase), in my opinion it is not good to look for someone to blame.
The most important thing is to find the solution, both temporary and permanent. Don’t waste time blaming others.

Then, once the issue is solved, you may need to make a complaint list for the other party. The objective of this complaint list is not to blame others, but to prevent the same issue (mostly non-technical issues) from happening in the future. It will be faster for me to understand my mistake in the team if you tell me what it was.

I faced this situation before. We were doing a project; I was the project engineer and my colleague was the project manager. Suddenly, we faced a serious issue. My colleague then called our vendor to report it. The problem was that my colleague and the vendor stayed on the phone for around 5 hours (from 8pm to 1am), and my colleague was just complaining and complaining. Fortunately, my other colleague and I, together with the other members of our vendor’s team, managed to find the problem and solve 90% of it.

Well, my point is, talking without acting is bullshit.

March 16th, 2008 Posted by Anthony Fajri| information technology | one comment

Calculating VoIP Bandwidth

This site provides a good tool to calculate VoIP bandwidth. The Cisco website also provides a nice explanation of the theory.
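
If you just want a feel for the numbers, here is a small sketch of the usual per-call calculation: voice payload plus headers, times packets per second. It is not taken from the site mentioned above. The function name is made up, and the defaults are assumptions: 40 bytes of IP/UDP/RTP headers (20 + 8 + 12), 18 bytes of Ethernet overhead, no header compression and no silence suppression. The result is for one direction of one call.

```python
def voip_bandwidth_kbps(
    codec_kbps: float,
    packet_ms: float,
    l2_overhead_bytes: int = 18,   # assumed: Ethernet header plus FCS
    ip_udp_rtp_bytes: int = 40,    # assumed: IP 20 + UDP 8 + RTP 12
) -> float:
    """Bandwidth of one call in one direction, in kbps."""
    # Voice bytes carried in each packet.
    payload_bytes = codec_kbps * 1000 / 8 * packet_ms / 1000
    packets_per_second = 1000 / packet_ms
    total_bytes = payload_bytes + ip_udp_rtp_bytes + l2_overhead_bytes
    return total_bytes * 8 * packets_per_second / 1000


# G.711 is a 64 kbps codec, G.729 an 8 kbps codec; both with 20 ms packets.
print(f"G.711: {voip_bandwidth_kbps(64, 20):.1f} kbps")  # G.711: 87.2 kbps
print(f"G.729: {voip_bandwidth_kbps(8, 20):.1f} kbps")   # G.729: 31.2 kbps
```

Notice that for G.729 the headers are bigger than the voice payload, which is why the bandwidth on the wire is almost four times the codec rate.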

February 3rd, 2008 Posted by Anthony Fajri| information technology | no comments

Survival Tips in Datacenter

Sometimes you have to work for hours or days in a datacenter. Whether you work in an operational environment or in a lab or pre-operational environment, you have to work with high concentration.

You might also work for hours or days on end (or you may not even be able to leave the datacenter until your work is done), so you need to keep your stamina in good condition.

Below are some survival tips from me:
1. The datacenter is very cold and very noisy, so it is better to bring a jacket. If you have noise-canceling headphones, it is OK to bring them.
2. Since the datacenter is very cold and dry (because of the aircon), you had better drink mineral water every hour. You will not feel thirsty in the datacenter, but your body actually needs water.
3. Make sure you know how to get to the toilet (where it is, whether you need a key to go out to it, etc.), because you might go to the toilet quite often.
4. Look around and find something warm. The rear side of a Sunfire 6800 is a good place to warm up your body.
5. Do not expect to find a comfortable place to work. You may sit on the floor or stand for hours, depending on where you can put your laptop. So don’t forget to take a rest every hour, just to walk around for a few minutes.

These are some tips. I will update later.

January 1st, 2008 Posted by Anthony Fajri| information technology | one comment
