
Network and System Monitoring

 

Today's users expect seamless connectivity to their network, servers, and applications. When an incident occurs and a system begins having difficulties, they also expect a swift turnaround in resolving it.

 

Unless an administrator has some insight into the network, resolving many of the technical glitches that can occur is a time-consuming process. Implementing a network monitoring solution offers a proactive approach to managing your IT infrastructure.

 

What can I do with Network Monitoring?

 There are three main areas to monitoring:

 

Fault Management/Uptime

Network monitoring with an eye towards fault management and uptime provides a user's view of how the network or system is performing. Administrators can identify lag, pre-empt service crashes, pinpoint the network layer where a problem lies, and more. This enables the administrator to correct the situation with minimal delay to the users. Network monitoring provides the insight administrators need to troubleshoot their network, applications, and systems in an effective and timely manner.

 

Performance Management

Meeting Service Level Agreement (SLA) expectations of three or five nines of uptime (99.9% or 99.999%) requires a proactive approach. Even if your SLA is not so stringent, network monitoring collects the data needed to confirm that you are in compliance with every SLA and that your environment is performing as expected. As with other areas of monitoring, you can monitor your network, systems, or applications and identify trends that may lead to future difficulties within your IT systems.
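
To put "three or five nines" into concrete terms, the short sketch below converts an availability target into a yearly downtime budget. It is only an illustration: the function name is made up for this example, and it assumes a 365-day year with no allowance for scheduled maintenance windows that an actual SLA might exclude.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(nines: int) -> float:
    """Return the yearly downtime budget, in minutes, for a given number of nines."""
    unavailability = 10 ** -nines  # three nines -> 0.001, five nines -> 0.00001
    return MINUTES_PER_YEAR * unavailability


for nines in (3, 5):
    availability_pct = 100 * (1 - 10 ** -nines)
    print(f"{availability_pct:.3f}% uptime allows about "
          f"{allowed_downtime_minutes(nines):.1f} minutes of downtime per year")
```

Three nines leaves roughly 8.8 hours of downtime a year, while five nines leaves little more than 5 minutes, which is why the stricter target is hard to meet without continuous monitoring.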

 

Capacity Planning

Many administrators today are routinely asked whether their networks or servers can handle new applications, additional servers, or an increased number of users. It is still not uncommon for a small organization to experience a growth spurt and suddenly find its users having a very unhappy experience as the network slows to a crawl, and complaining loudly and vociferously to one another. Collecting data over time allows the administrator to see how well a system or network is running and to plug in new information to test the capacity and load impact of added users, new systems, or new programs. Capacity planning is the key to maintaining a healthy, scalable network.
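
One simple way to use collected data for capacity planning is to fit a trend line to past utilization and ask when it would reach a limit, with or without extra load plugged in. The sketch below assumes growth is roughly linear, which real networks may not follow. The sample values, the 80% limit, and all function names are hypothetical.

```python
def linear_fit(samples):
    """Least-squares line through (x, y) pairs; returns (slope, intercept)."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def weeks_until(samples, limit_pct, added_load_pct=0.0):
    """Estimate the week at which the trend reaches limit_pct.

    added_load_pct models a what-if: extra utilization from new users or systems.
    Returns None when utilization is not growing.
    """
    slope, intercept = linear_fit(samples)
    if slope <= 0:
        return None
    return max(0.0, (limit_pct - intercept - added_load_pct) / slope)


# Hypothetical weekly peak utilization (%) of one segment: (week, percent).
history = [(0, 40.0), (1, 42.0), (2, 44.0), (3, 46.0), (4, 48.0)]
print(weeks_until(history, 80.0))                       # current trend only
print(weeks_until(history, 80.0, added_load_pct=10.0))  # what-if: a new application
```

Comparing the two results shows how much sooner the limit would be reached if the proposed load were added, which is the kind of answer an administrator is usually asked for.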

 

What Data can I Gather?

When monitoring your network, the data you can gather is limited only by the information transmitted in each packet on each segment. Every network is different, with its own concerns and issues, so the data you gather needs to make sense for your organization.

 

Some of the more useful data you can monitor includes:

·        Segment utilization data is gathered to generate trends for capacity planning, base-lining, and performance information.

·        Latency measurements taken through echo tests provide insight into performance trends.

·        Error rates offer performance indicators. Baselining the network's error rate and correlating it with utilization provides further indicators of physical-layer network problems.

·        Protocol distribution generates trends for changing application mixes. It lets you monitor the usage of new applications and their effect on the network.

·        Identifying the top network talkers helps an administrator spot network and system performance concerns, system and application configuration errors, and the application and service load on the network. It may also lead to the discovery of new or unauthorized systems and applications on the network.

·        You can also gather system-specific performance data such as disk capacity, temperature, pagefile activity, and processor and memory utilization. There is a wealth of device-specific and application-specific data.
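
As an illustration of the first item in the list, segment utilization is commonly derived from two readings of an interface's octet counter taken a known interval apart. The sketch below assumes a 32-bit counter that wraps to zero (as classic SNMP counters do) and at most one wrap between polls. The readings, link speed, and function names are hypothetical.

```python
COUNTER32_MODULUS = 2 ** 32  # a 32-bit counter wraps back to zero after this many counts


def counter_delta(previous: int, current: int) -> int:
    """Difference between two readings of a 32-bit counter, allowing one wrap."""
    return (current - previous) % COUNTER32_MODULUS


def utilization_pct(prev_octets, curr_octets, interval_s, link_bps):
    """Percent of link capacity used between two polls of an octet counter."""
    bits = counter_delta(prev_octets, curr_octets) * 8
    return 100.0 * bits / (interval_s * link_bps)


# Hypothetical readings taken 60 seconds apart on a 100 Mbit/s segment.
print(utilization_pct(1_000_000, 76_000_000, 60, 100_000_000))
```

Stored over weeks or months, values like this form the baseline against which error rates, latency, and new application traffic can be compared.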

 

How does it work?

A remote monitoring solution typically has a central console that communicates with agent probes located on remote segments. These probes collect information about the traffic they see. Further, if application-specific or server-specific information is desired, the probes may receive it from agents installed on the systems in question, or they may contact a system and pull the information directly from the device.

 

There are many factors to consider when deciding on the most appropriate architecture, including security, bandwidth utilization, and the impact of a server hosting a network monitoring agent versus a probe contacting the server and retrieving the information on a timed schedule.

 

Regardless of whether the data is collected by push or by pull, it is typically transferred across the network using the Simple Network Management Protocol (SNMP) and its Remote Monitoring (RMON) specification.

 

SNMP is an information-oriented protocol rather than a command-oriented one like HTTP with its GET requests. SNMP operations are implemented using variables that are maintained in managed devices such as routers, firewalls, servers, and tape drives. Rather than issuing commands, a monitoring probe checks the status of a device by reading these variables and, if permitted, controls the operation of the device by writing new values into them.

 

These variables are stored in what is called a Management Information Base (MIB). The MIBs identify the management data variables that are available in the device.
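
The read-and-write model described above can be sketched with a toy device that holds a small table of variables. This is a simulation only: it sends no SNMP traffic and uses no SNMP library, and the class and its values are invented for illustration. The three object identifiers are the standard MIB-II ones for sysDescr, sysName, and ifAdminStatus.

```python
SYS_DESCR = "1.3.6.1.2.1.1.1.0"            # sysDescr.0, read-only
SYS_NAME = "1.3.6.1.2.1.1.5.0"             # sysName.0, read-write
IF_ADMIN_STATUS = "1.3.6.1.2.1.2.2.1.7.1"  # ifAdminStatus.1 (1 = up, 2 = down)


class ManagedDevice:
    """Toy stand-in for a managed device: a table of variables keyed by OID."""

    def __init__(self):
        # Each entry is (value, writable). The values are hypothetical.
        self._variables = {
            SYS_DESCR: ("Example router", False),
            SYS_NAME: ("edge-router-1", True),
            IF_ADMIN_STATUS: (1, True),
        }

    def get(self, oid):
        """Read a variable, as a probe does when checking status."""
        value, _ = self._variables[oid]
        return value

    def set(self, oid, value):
        """Write a variable, if the device permits it."""
        _, writable = self._variables[oid]
        if not writable:
            raise PermissionError(f"{oid} is read-only")
        self._variables[oid] = (value, True)


device = ManagedDevice()
print(device.get(IF_ADMIN_STATUS))  # status check: read the variable
device.set(IF_ADMIN_STATUS, 2)      # control: writing 2 asks for the interface to go down
print(device.get(IF_ADMIN_STATUS))
```

Note that nothing here resembles a "shut down interface" command. The probe simply changes the value of a variable, and the device acts on the new value.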


The Remote Network Monitoring (RMON) specification, which is a part of SNMP, identifies a series of groups such as alarms, events, statistics, buffer capture, and more. The MIB variables then exist within these groups. The difficulty is that not all manufacturers adhere to the same set of MIBs. The MIB set on a Cisco router is not necessarily the same as the MIB set used on an HP router.

 

Alarms and events are the RMON constructs of most concern in this discussion, since these groups are what allow important information to be communicated immediately to the monitoring probe.

 

The administrator has full control over what conditions will cause an alarm to be “sounded” and how an event is generated. This includes specifying what variables or statistics to monitor, how often to check them, and what values will trigger an alarm. A log entry may also be recorded when an event occurs. If an event results in transmission of a trap message, the administrator will thus be notified and can decide how to respond, depending on the severity of the event.
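
The alarm behavior described above can be sketched as a pair of thresholds checked against each polled sample. The example follows the rising and falling threshold idea used by RMON alarms, where a rising event cannot repeat until the value has dropped back to the falling threshold. It is a simplified illustration and not an implementation of the RMON alarm group: the class name, thresholds, and samples are hypothetical, and it assumes only a rising alarm may fire first.

```python
class ThresholdAlarm:
    """Rising/falling threshold check with hysteresis, in the style of an RMON alarm."""

    def __init__(self, rising, falling):
        if falling > rising:
            raise ValueError("falling threshold must not exceed rising threshold")
        self.rising = rising
        self.falling = falling
        # Simplifying assumption: start as if a falling event had already
        # occurred, so the first event that can fire is a rising one.
        self._last_event = "falling"

    def check(self, sample):
        """Return "rising", "falling", or None for one polled sample."""
        if sample >= self.rising and self._last_event != "rising":
            self._last_event = "rising"
            return "rising"
        if sample <= self.falling and self._last_event != "falling":
            self._last_event = "falling"
            return "falling"
        return None


# Hypothetical utilization samples (%), one per polling interval.
alarm = ThresholdAlarm(rising=80, falling=60)
samples = [50, 85, 90, 70, 82, 55, 88]
events = [alarm.check(s) for s in samples]
print(events)
# [None, 'rising', None, None, None, 'falling', 'rising']
```

The gap between the two thresholds keeps a value that hovers around 80 from flooding the administrator with repeated alarms. Each returned event is the point at which a log entry or trap message would be generated.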

 




For questions, comments, or more information, you can contact us any time via email or phone (816-471-3553).