Learning NAGIOS 3.0

Resource Monitoring

For servers or workstations to stay responsive and avoid being overloaded, it is also worth monitoring system usage with several additional measures. Nagios offers plugins that monitor resource usage and report when the limits set for these checks are exceeded.

System Load

The first thing that should always be monitored is the system load. This value reflects the number of processes and the amount of CPU capacity they are using. For example, if one process is using 50% of the CPU capacity, the value will be around 0.5; if four processes each try to use the maximum CPU capacity, the value will be around 4.0. The system load is reported as three values: the average load over the last minute, the last 5 minutes, and the last 15 minutes. It is checked with the check_load plugin, whose syntax is as follows:

check_load [-r] -w wload1,wload5,wload15 -c cload1,cload5,cload15

The -w and -c options each take three comma-separated values, which are the limits for the 1-minute, 5-minute, and 15-minute averages. If any of the load averages exceeds its limit, a warning or critical status is returned, respectively. Here is a sample command definition that takes the warning and critical load limits as arguments:

  define command
  {
    command_name  check_load
    command_line  $USER1$/check_load -w $ARG1$ -c $ARG2$
  }
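To make the comparison rule concrete, here is a small Python sketch that models the check described above. It is not the plugin's source code. It assumes that each average is compared with its own limit using a strict "greater than", and that CRITICAL takes precedence over WARNING. The names parse_triplet and load_status are hypothetical, and the per-CPU scaling used for -r is an assumption.

```python
import os


def parse_triplet(spec: str) -> tuple[float, float, float]:
    """Parse a limit such as "5.0,4.0,3.0" into (1-min, 5-min, 15-min) floats."""
    parts = [float(p) for p in spec.split(",")]
    if len(parts) != 3:
        raise ValueError(f"expected three comma-separated values, got {spec!r}")
    return parts[0], parts[1], parts[2]


def load_status(loads, warn: str, crit: str, per_cpu: bool = False, cpus: int = 1) -> int:
    """Return a Nagios-style code: 0 = OK, 1 = WARNING, 2 = CRITICAL."""
    if per_cpu:
        # Assumption: -r divides every average by the number of CPUs.
        loads = tuple(value / cpus for value in loads)
    wload = parse_triplet(warn)
    cload = parse_triplet(crit)
    # One average above its limit is enough; CRITICAL is checked first.
    if any(value > limit for value, limit in zip(loads, cload)):
        return 2
    if any(value > limit for value, limit in zip(loads, wload)):
        return 1
    return 0


if __name__ == "__main__" and hasattr(os, "getloadavg"):
    # os.getloadavg() returns the 1-, 5- and 15-minute averages on Unix-like systems.
    current = os.getloadavg()
    print(current, load_status(current, "5.0,4.0,3.0", "10.0,6.0,4.0"))
```

With the sample limits 5.0,4.0,3.0 and 10.0,6.0,4.0, a 15-minute average of 4.5 returns 2 (CRITICAL) even when the other two averages are low.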

Checking Processes

Nagios also offers a way to monitor the total number of processes, using the check_procs plugin. It can be configured to count all processes, only running ones, those consuming CPU, those consuming memory, or a combination of these criteria. The syntax and options are as follows:

check_procs -w <range> -c <range> [-m metric] [-s state]
            [-p ppid] [-u user] [-r rss] [-z vsz] [-P %cpu]
            [-a argument-array] [-C command] [-t timeout] [-v]

Values for the -w and -c options can either be a single value or take the form <min>:<max>. In the first case, a warning or critical state is returned if the value (by default, the number of processes) exceeds the specified number. In the second case, the corresponding state is returned if the value is lower than <min> or higher than <max>.

The following sample commands monitor the total number of processes and the number of processes running a specific command. The second one can be used, for example, to check that a particular server process is running and has not spawned too many processes. In this case, the warning and critical ranges should start at 1 (for example, 1:10), so that an alert is raised when no matching process is running.

  define command
  {
    command_name  check_procs_num
    command_line  $USER1$/check_procs -m PROCS -w $ARG1$ -c $ARG2$
  }
  define command
  {
    command_name  check_procs_cmd
    command_line  $USER1$/check_procs -C $ARG1$ -w $ARG2$ -c $ARG3$
  }
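The two threshold forms can be modelled in the same way. The following Python sketch uses hypothetical helper names and is not Nagios code. It treats a single value N as the range 0:N and takes a <min>:<max> value literally. Any other range syntax that newer plugin releases may accept is deliberately not handled.

```python
def parse_range(spec: str) -> tuple[float, float]:
    """Turn "N" into (0, N) and "MIN:MAX" into (MIN, MAX)."""
    if ":" in spec:
        low, high = spec.split(":", 1)
        return float(low), float(high)
    # A single value only bounds the count from above.
    return 0.0, float(spec)


def outside(spec: str, value: float) -> bool:
    """True when value is below the range minimum or above its maximum."""
    low, high = parse_range(spec)
    return value < low or value > high


def procs_status(count: int, warn: str, crit: str) -> int:
    """Return 0 = OK, 1 = WARNING, 2 = CRITICAL for a process count."""
    if outside(crit, count):
        return 2
    if outside(warn, count):
        return 1
    return 0


if __name__ == "__main__":
    for n in (0, 5, 15, 25):
        print(n, procs_status(n, "1:10", "1:20"))
```

For example, with a warning range of 1:10 and a critical range of 1:20, zero matching processes falls outside both ranges and returns 2 (CRITICAL). This is why the ranges for a specific server process should start at 1.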

Monitoring Logged-in Users

It is also possible to use Nagios to monitor the number of users currently logged in to a particular machine. The syntax is very simple: apart from the warning and critical limits, there are no options.

check_users -w limit -c limit

A command definition that takes the warning and critical limits as arguments is as follows:

  define command
  {
    command_name  check_users
    command_line  $USER1$/check_users -w $ARG1$ -c $ARG2$
  }
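A comparable Python sketch for the user count follows. As a simplification, it assumes that each non-empty line printed by the Unix who command is one login session. It also assumes that a limit is exceeded only when the count is strictly greater than it. The names count_sessions and users_status are hypothetical and are not part of Nagios.

```python
import shutil
import subprocess


def count_sessions(who_output: str) -> int:
    """Count one login session per non-empty line of `who` output (an assumption)."""
    return sum(1 for line in who_output.splitlines() if line.strip())


def users_status(count: int, warn: int, crit: int) -> int:
    """Return 0 = OK, 1 = WARNING, 2 = CRITICAL for a number of sessions."""
    if count > crit:
        return 2
    if count > warn:
        return 1
    return 0


if __name__ == "__main__" and shutil.which("who"):
    # Run `who` only if it exists on this host, then evaluate with sample limits.
    result = subprocess.run(["who"], capture_output=True, text=True, check=False)
    sessions = count_sessions(result.stdout)
    print(sessions, users_status(sessions, warn=5, crit=10))
```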