Troubleshooting Process CCNA

Summary

This topic compares troubleshooting methods that use a systematic, layered approach.

Note: This topic is part of Module 12 of the Cisco CCNA 3 course. To follow the course in order, go to the CCNA 3 section.

General Troubleshooting Procedures

Troubleshooting can be time-consuming because networks differ, problems differ, and troubleshooting experience varies. However, experienced administrators know that using a structured troubleshooting method will shorten overall troubleshooting time.

Therefore, the troubleshooting process should be guided by structured methods. This requires well-defined and documented troubleshooting procedures to minimize the time wasted on erratic hit-and-miss troubleshooting. However, these methods are not static. The troubleshooting steps taken to solve a problem are not always the same, nor are they always executed in the same order.

There are several troubleshooting processes that can be used to solve a problem. The figure displays the logic flowchart of a simplified three-stage troubleshooting process. However, a more detailed process may be more helpful to solve a network problem.

Figure: General Troubleshooting Procedures

Seven-Step Troubleshooting Process

The figure displays a more detailed seven-step troubleshooting process. Notice how some steps interconnect. This is because some technicians may be able to jump between steps based on their level of experience.

Figure: Seven-Step Troubleshooting Process

The steps to solve a network problem are described in detail below.

Step 1: Define the Problem

The goal of this stage is to verify that there is a problem and then properly define what the problem is. Problems are usually identified by a symptom (e.g., the network is slow or has stopped working). Network symptoms may appear in many different forms, including alerts from the network management system, console messages, and user complaints.

While gathering symptoms, it is important to ask questions and investigate the issue in order to localize the problem to a smaller range of possibilities. For example, is the problem restricted to a single device, a group of devices, or an entire subnet or network of devices?
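As a rough illustration of localizing a problem, the sketch below classifies its scope from a list of affected addresses. The function name `classify_scope`, the scope labels, and the sample addresses are hypothetical and exist only to mirror the question asked above.

```python
import ipaddress
from typing import Iterable


def classify_scope(affected: Iterable[str], known_hosts: Iterable[str]) -> str:
    """Classify how widespread a problem is among the known hosts of one subnet."""
    affected_set = {ipaddress.ip_address(a) for a in affected}
    known_set = {ipaddress.ip_address(h) for h in known_hosts}
    hit = affected_set & known_set  # only count hosts we actually know about
    if not hit:
        return "no known host affected"
    if len(hit) == 1:
        return "single device"
    if hit == known_set:
        return "entire subnet"
    return "group of devices"


# Hypothetical inventory of one subnet.
hosts = ["192.168.10.11", "192.168.10.12", "192.168.10.13"]
print(classify_scope(["192.168.10.12"], hosts))
print(classify_scope(hosts, hosts))
```

In practice the list of affected addresses would come from trouble tickets or monitoring alerts; here it is simply typed in.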

In an organization, problems are typically assigned to network technicians as trouble tickets. These tickets are created using trouble ticketing software that tracks the progress of each ticket. Trouble ticketing software may also include a self-service user portal to submit tickets, access to a searchable trouble tickets knowledge base, remote control capabilities to solve end-user issues, and more.

Step 2: Gather Information

In this step, targets (i.e., hosts, devices) to be investigated must be identified, access to the target devices must be obtained, and information must be gathered. During this step, the technician may gather and document more symptoms, depending on the characteristics that are identified.

If the problem is outside the boundary of the organization’s control (e.g., lost internet connectivity outside of the autonomous system), contact an administrator for the external system before gathering additional network symptoms.

Step 3: Analyze Information

Possible causes must be identified. The gathered information is interpreted and analyzed by consulting network documentation and network baselines, searching organizational knowledge bases, searching the internet, and talking with other technicians.

Step 4: Eliminate Possible Causes

If multiple causes are identified, then the list must be reduced by progressively eliminating possible causes until the most probable cause remains. Troubleshooting experience is extremely valuable for quickly eliminating causes and identifying the most probable one.

Step 5: Propose Hypothesis

When the most probable cause has been identified, a solution must be formulated. At this stage, troubleshooting experience is very valuable when proposing a plan.

Step 6: Test Hypothesis

Before testing the solution, it is important to assess the impact and urgency of the problem. For instance, could the solution have an adverse effect on other systems or processes? The severity of the problem should be weighed against the impact of the solution. For example, if a critical server or router must be offline for a significant amount of time, it may be better to wait until the end of the workday to implement the fix. Sometimes, a workaround can be created until the actual problem is resolved.

Create a rollback plan identifying how to quickly reverse a solution. This may prove to be necessary if the solution fails.

Implement the solution and verify that it has solved the problem. Sometimes a solution introduces an unexpected problem. Therefore, it is important that a solution be thoroughly verified before proceeding to the next step.

If the solution fails, the attempted solution is documented and the changes are removed. The technician must then return to the Gather Information step and isolate the issue.

Step 7: Solve the Problem

When the problem is solved, inform the users and anyone involved in the troubleshooting process that the problem has been resolved. Other IT team members should be informed of the solution. Appropriate documentation of the cause and the fix will assist other support technicians in preventing and solving similar problems in the future.
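The test, rollback, and document cycle described above can be sketched as a small loop. This is a minimal sketch under stated assumptions, not a prescribed implementation: the function `try_fixes`, the `Fix` tuple layout, and the sample interface settings are all hypothetical.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical layout: (description, apply, rollback, verify).
Fix = Tuple[str, Callable[[], object], Callable[[], object], Callable[[], bool]]


def try_fixes(fixes: Iterable[Fix]) -> Tuple[Optional[str], List[str]]:
    """Try candidate fixes in order; roll back and document each one that fails."""
    log: List[str] = []
    for description, apply, rollback, verify in fixes:
        apply()
        if verify():
            log.append(f"solved: {description}")
            return description, log
        rollback()  # remove the changes before trying anything else
        log.append(f"failed, rolled back: {description}")
    # Nothing worked: the technician goes back to gathering information.
    return None, log


# Hypothetical interface settings standing in for a real device.
settings = {"speed": 10, "duplex": "half"}
candidates = [
    ("set speed to 100", lambda: settings.update(speed=100),
     lambda: settings.update(speed=10), lambda: False),
    ("set duplex to full", lambda: settings.update(duplex="full"),
     lambda: settings.update(duplex="half"),
     lambda: settings["duplex"] == "full"),
]
solved, log = try_fixes(candidates)
print(solved)
print(log)
```

Pairing every change with its own rollback keeps a failed attempt from leaving side effects behind, and the log doubles as the documentation that the final step calls for.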

Question End Users

Many network problems are initially reported by an end user. However, the information provided is often vague or misleading. For example, users often report problems such as “the network is down”, “I cannot access my email”, or “my computer is slow”.

In most cases, additional information is required to fully understand a problem. This usually involves interacting with the affected user to discover the “who”, “what”, and “when” of the problem.

The following recommendations should be followed when communicating with users:

  • Speak at a technical level they can understand and avoid using complex terminology.
  • Always listen to, or carefully read, what the user is saying. Taking notes can be helpful when documenting a complex problem.
  • Always be considerate and empathize with users while letting them know you will help them solve their problem. Users reporting a problem may be under stress and anxious to resolve the problem as quickly as possible.

When interviewing the user, guide the conversation and use effective questioning techniques to quickly ascertain the problem. For instance, use open questions (i.e., questions that require a detailed response) and closed questions (i.e., questions answered with yes, no, or a single word) to discover important facts about the network problem.

The table provides some questioning guidelines and sample open-ended end-user questions.

When done interviewing the user, repeat your understanding of the problem to the user to ensure that you both agree on the problem being reported.

Guidelines with Example Open-Ended End-User Questions
Ask pertinent questions.
  • What does not work?
  • What exactly is the problem?
  • What are you trying to accomplish?
Determine the scope of the problem.
  • Who does this issue affect? Is it just you or others?
  • What device is this happening on?
Determine when the problem occurred / occurs.
  • When exactly does the problem occur?
  • When was the problem first noticed?
  • Were there any error message(s) displayed?
Determine if the problem is constant or intermittent.
  • Can you reproduce the problem?
  • Can you send me a screenshot or video of the problem?
Determine if anything has changed.
  • What has changed since the last time it did work?
Use questions to eliminate or discover possible problems.
  • What works?
  • What does not work?

Gather Information

To gather symptoms from a suspected networking device, use Cisco IOS commands and other tools such as packet captures and device logs.

The table describes common Cisco IOS commands used to gather the symptoms of a network problem.

Command Description
ping {host | ip-address}
  • Sends an echo request packet to an address, then waits for a reply
  • The host or ip-address variable is the IP alias or IP address of the target system
traceroute destination
  • Identifies the path a packet takes through the networks
  • The destination variable is the hostname or IP address of the target system
telnet {host | ip-address}
  • Connects to an IP address using the Telnet application
  • Use SSH whenever possible instead of Telnet
ssh -l user-id ip-address
  • Connects to an IP address using SSH
  • SSH is more secure than Telnet
show ip interface brief
show ipv6 interface brief
  • Displays a summary status of all interfaces on a device
  • Useful for quickly identifying IP addressing on all interfaces
show ip route
show ipv6 route
  • Displays the current IPv4 and IPv6 routing tables, which contain the routes to all known network destinations
show protocols
  • Displays the configured protocols and shows the global and interface-specific status of any configured Layer 3 protocol
debug
  • Displays a list of options for enabling or disabling debugging events

Note: Although the debug command is an important tool for gathering symptoms, it generates a large amount of console message traffic and the performance of a network device can be noticeably affected. If the debug must be performed during normal working hours, warn network users that a troubleshooting effort is underway, and that network performance may be affected. Remember to disable debugging when you are done.
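Once command output has been collected, it often helps to filter it rather than read it line by line. The sketch below parses text in the usual column layout of show ip interface brief and lists interfaces that are not up/up. The sample output, addresses, and function names are hypothetical, and the parser assumes the common six-column layout.

```python
from typing import Dict, List

# Hypothetical output, typed in for illustration only.
SAMPLE_OUTPUT = """
Interface              IP-Address      OK? Method Status                Protocol
GigabitEthernet0/0/0   192.168.1.1     YES manual up                    up
GigabitEthernet0/0/1   unassigned      YES unset  administratively down down
Serial0/1/0            10.1.1.1        YES manual up                    down
"""


def parse_interface_brief(output: str) -> List[Dict[str, str]]:
    """Turn 'show ip interface brief' style text into one dict per interface."""
    rows: List[Dict[str, str]] = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 6 or fields[0] == "Interface":
            continue  # skip blank lines and the header row
        rows.append({
            "interface": fields[0],
            "ip_address": fields[1],
            # Status may be two words, e.g. "administratively down".
            "status": " ".join(fields[4:-1]),
            "protocol": fields[-1],
        })
    return rows


def interfaces_not_up(rows: List[Dict[str, str]]) -> List[str]:
    """Return the names of interfaces whose status or protocol is not 'up'."""
    return [r["interface"] for r in rows
            if r["status"] != "up" or r["protocol"] != "up"]


print(interfaces_not_up(parse_interface_brief(SAMPLE_OUTPUT)))
```

How the text reaches the script (an SSH session, a saved log, a copy and paste) is left open here on purpose.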

Troubleshooting with Layered Models

The OSI and TCP/IP models can be applied to isolate network problems when troubleshooting. For example, if the symptoms suggest a physical connection problem, the network technician can focus on troubleshooting the circuit that operates at the physical layer.

The figure shows some common devices and the OSI layers that must be examined during the troubleshooting process for that device.

Figure: Troubleshooting with Layered Models

Notice that routers and multilayer switches are shown at Layer 4, the transport layer. Although routers and multilayer switches usually make forwarding decisions at Layer 3, ACLs on these devices can be used to make filtering decisions using Layer 4 information.

Structured Troubleshooting Methods

There are several structured troubleshooting approaches that can be used. Which one to use will depend on the situation. Each approach has its advantages and disadvantages. This topic describes methods and provides guidelines for choosing the best method for a specific situation.

The different troubleshooting approaches that can be used are described below.

In bottom-up troubleshooting, you start with the physical components of the network and move up through the layers of the OSI model until the cause of the problem is identified, as shown in the figure.

Bottom-up troubleshooting is a good approach to use when the problem is suspected to be a physical one. Most networking problems reside at the lower levels, so implementing the bottom-up approach is often effective.

The disadvantage of the bottom-up troubleshooting approach is that it requires you to check every device and interface on the network until the possible cause of the problem is found. Remember that each conclusion and possibility must be documented, so there can be a lot of paperwork associated with this approach. A further challenge is to determine which devices to examine first.

Figure: Bottom-Up

In the figure, top-down troubleshooting starts with the end-user applications and moves down through the layers of the OSI model until the cause of the problem has been identified.

End-user applications of an end system are tested before tackling the more specific networking pieces. Use this approach for simpler problems, or when you think the problem is with a piece of software.

The disadvantage of the top-down approach is that it requires checking every network application until the possible cause of the problem is found. Each conclusion and possibility must be documented. The challenge is to determine which application to examine first.

Figure: Top-Down

The figure shows the divide-and-conquer approach to troubleshooting a networking problem.

The network administrator selects a layer and tests in both directions from that layer.

In divide-and-conquer troubleshooting, you start by collecting user experiences of the problem, document the symptoms, and then use that information to make an informed guess about the OSI layer at which to start your investigation. When a layer is verified to be functioning properly, it can be assumed that the layers below it are functioning, and the administrator can work up the OSI layers. If an OSI layer is not functioning properly, the administrator can work down the OSI layer model.

For example, if users cannot access the web server, but they can ping the server, then the problem is above Layer 3. If pinging the server is unsuccessful, then the problem is likely at a lower OSI layer.
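The up-or-down decision described above can be sketched in a few lines. This is only an illustration: the function name `divide_and_conquer` and the `layer_ok` callback (which stands in for whatever test you run at a given layer, such as a ping for Layer 3) are hypothetical.

```python
from typing import Callable, Optional


def divide_and_conquer(layer_ok: Callable[[int], bool], start: int = 3) -> Optional[int]:
    """Return the lowest OSI layer that fails its check, or None if none fails."""
    if layer_ok(start):
        # Layers below `start` are assumed to be working, so work up.
        for layer in range(start + 1, 8):
            if not layer_ok(layer):
                return layer
        return None
    # `start` is not working: work down to find the lowest failing layer.
    faulty = start
    for layer in range(start - 1, 0, -1):
        if layer_ok(layer):
            break
        faulty = layer
    return faulty


# Users can ping the web server but cannot reach the site:
# Layers 1 to 3 pass, so the search moves up from Layer 3.
print(divide_and_conquer(lambda layer: layer <= 3, start=3))

# The ping fails and so does everything below it:
# the search moves down to the physical layer.
print(divide_and_conquer(lambda layer: False, start=3))
```

Starting in the middle means a single test rules out either the layers below or the layers above, which is why this approach tends to need fewer checks than walking the whole stack.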

Figure: Divide-and-Conquer

The follow-the-path approach is one of the most basic troubleshooting techniques. It first discovers the actual traffic path all the way from source to destination. The scope of troubleshooting is then reduced to just the links and devices that are in the forwarding path. The objective is to eliminate the links and devices that are irrelevant to the troubleshooting task at hand. This approach usually complements one of the other approaches.

The substitution approach is also called swap-the-component because you physically swap the problematic device with a known, working one. If the problem is fixed, then the problem is with the removed device. If the problem remains, then the cause may be elsewhere.

In specific situations, this can be an ideal method for quick problem resolution, such as with a critical single point of failure. For example, if a border router goes down, it may be more beneficial to simply replace the device and restore service than to troubleshoot the issue.

If the problem lies within multiple devices, it may not be possible to correctly isolate the problem.

The comparison approach is also called the spot-the-differences approach. It attempts to resolve the problem by changing the nonoperational elements to be consistent with the working ones. You compare configurations, software versions, hardware, or other device properties, links, or processes between working and nonworking situations and spot significant differences between them.

The weakness of this method is that it might lead to a working solution, without clearly revealing the root cause of the problem.
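Spotting the differences between a working and a nonworking configuration is easy to automate. The sketch below uses Python's standard difflib module. The two configuration snippets and the function name `config_differences` are hypothetical.

```python
import difflib
from typing import List

# Hypothetical configuration excerpts from a working and a nonworking device.
WORKING = """interface GigabitEthernet0/0/0
 ip address 192.168.1.1 255.255.255.0
 duplex full
 no shutdown"""

BROKEN = """interface GigabitEthernet0/0/0
 ip address 192.168.1.1 255.255.255.0
 duplex half
 no shutdown"""


def config_differences(working: str, broken: str) -> List[str]:
    """Return only the lines that differ, prefixed '- ' (working) or '+ ' (broken)."""
    diff = difflib.ndiff(working.splitlines(), broken.splitlines())
    # ndiff also emits unchanged lines ('  ') and hint lines ('? '); drop those.
    return [line for line in diff if line.startswith(("- ", "+ "))]


for line in config_differences(WORKING, BROKEN):
    print(line)
```

As noted above, a difference found this way points to a fix but does not by itself explain the root cause.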

The educated guess approach is also called the shoot-from-the-hip troubleshooting approach. This is a less-structured troubleshooting method that uses an educated guess based on the symptoms of the problem. Success of this method varies based on your troubleshooting experience and ability. Seasoned technicians are more successful because they can rely on their extensive knowledge and experience to decisively isolate and solve network issues. With a less-experienced network administrator, this troubleshooting method may be more like random troubleshooting.

Guidelines for Selecting a Troubleshooting Method

To quickly resolve network problems, take the time to select the most effective network troubleshooting method.

The figure illustrates which method could be used when a certain type of problem is discovered.

Figure: Guidelines for Selecting a Troubleshooting Method

For instance, software problems are often solved using a top-down approach, while hardware-based problems are solved using the bottom-up approach. New problems may be solved by an experienced technician using the divide-and-conquer method. Otherwise, the bottom-up approach may be used.
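The selection rule in the paragraph above can be written as a short function. This sketch encodes only what that paragraph states, not the full flowchart in the figure. The function name `select_method` and the problem-type strings are hypothetical.

```python
def select_method(problem_type: str, experienced: bool = False) -> str:
    """Pick a troubleshooting method from the problem type and technician experience."""
    if problem_type == "software":
        return "top-down"
    if problem_type == "hardware":
        return "bottom-up"
    # A new or unfamiliar kind of problem.
    if experienced:
        return "divide-and-conquer"
    return "bottom-up"


print(select_method("software"))
print(select_method("new", experienced=True))
print(select_method("new"))
```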

Troubleshooting is a skill that is developed by doing it. Every network problem you identify and solve gets added to your skill set.
