
Resource Monitoring
To keep servers and workstations responsive and to prevent them from being overloaded, it is also worth monitoring system usage with several additional measures. Nagios offers several plugins that monitor resource usage and report when the limits set for these checks are exceeded.
System Load
The first thing that should always be monitored is the system load. This value reflects the number of processes and the amount of CPU capacity that they are utilizing. This means that if one process is using about 50% of the CPU capacity, the value will be around 0.5; and if four processes try to utilize the maximum CPU capacity, the value will be around 4.0. The system load is reported as three values: the average load over the last minute, the last 5 minutes, and the last 15 minutes. The syntax of the command is as follows:
check_load [-r] -w wload1,wload5,wload15 -c cload1,cload5,cload15

Values for the -w and -c options should be in the form of three values separated by commas. If any of the load averages exceeds the specified limits, a warning or critical status will be returned, respectively. The optional -r flag divides the load averages by the number of CPUs, so that the limits can be expressed per CPU. Here is a sample command definition that uses warning and critical load limits as arguments:
define command {
  command_name  check_load
  command_line  $USER1$/check_load -w $ARG1$ -c $ARG2$
}
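As a rough illustration of how the two arguments are passed, a service definition along the following lines could use this command. The host name localhost and the generic-service template are assumptions borrowed from the sample Nagios configuration, and the limits are arbitrary values chosen for the example, not recommendations.

```cfg
define service {
  use                  generic-service   ; assumed template that supplies intervals, contacts, and so on
  host_name            localhost         ; assumed host; it must be defined elsewhere
  service_description  System Load
  check_command        check_load!5.0,4.0,3.0!10.0,6.0,4.0
}
```

Here $ARG1$ becomes 5.0,4.0,3.0 (the warning limits for the 1, 5, and 15 minute averages) and $ARG2$ becomes 10.0,6.0,4.0 (the critical limits).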
Checking Processes
Nagios also offers a way to monitor the total number of processes. Nagios can be configured to monitor all processes, only running ones, those consuming CPU, those consuming memory, or a combination of these criteria. The syntax and options are as follows:
check_procs -w <range> -c <range> [-m metric] [-s state] [-p ppid] [-u user] [-r rss] [-z vsz] [-P %cpu] [-a argument-array] [-C command] [-t timeout] [-v]

Values for the -w and -c options can either be a single value or take the form <min>:<max>. In the first case, a warning or critical state is returned if the value (the number of processes by default) exceeds the specified number. In the second case, the appropriate status is returned if the value is lower than <min> or higher than <max>. Sample commands to monitor the total number of processes and the number of specific processes are as follows. The second command, for example, can be used to check that a specific server process is running and has not created too many processes. In this case, the warning and critical values should be specified as ranges starting from 1.
define command {
  command_name  check_procs_num
  command_line  $USER1$/check_procs -m PROCS -w $ARG1$ -c $ARG2$
}
define command {
  command_name  check_procs_cmd
  command_line  $USER1$/check_procs -C $ARG1$ -w $ARG2$ -c $ARG3$
}
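The two commands might be used from service definitions such as the ones sketched below. This sketch assumes that check_procs_cmd takes the process name as $ARG1$ and the warning and critical ranges as $ARG2$ and $ARG3$. The host localhost, the generic-service template, the sshd process name, and all numeric limits are illustrative assumptions.

```cfg
; Total number of processes: warning above 150, critical above 200
define service {
  use                  generic-service   ; assumed template
  host_name            localhost         ; assumed host
  service_description  Total Processes
  check_command        check_procs_num!150!200
}

; Number of sshd processes: warning outside 1:5, critical outside 1:10
define service {
  use                  generic-service   ; assumed template
  host_name            localhost         ; assumed host
  service_description  SSH Daemon Processes
  check_command        check_procs_cmd!sshd!1:5!1:10
}
```

With ranges that start from 1, a count of zero falls below both minimums, so a stopped daemon is reported as critical rather than passing unnoticed.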
Monitoring Logged-in Users
It is also possible to use Nagios to monitor the number of users currently logged in to a particular machine. The syntax is very simple, and there are no options other than the warning and critical limits.
check_users -w limit -c limit
A command definition that uses warning and critical limits specified in the arguments is as follows:
define command {
  command_name  check_users
  command_line  $USER1$/check_users -w $ARG1$ -c $ARG2$
}
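A service that uses this command could look roughly like the following. The host localhost, the generic-service template, and the limits of 20 and 50 users are assumptions made for the sake of the example.

```cfg
define service {
  use                  generic-service   ; assumed template
  host_name            localhost         ; assumed host
  service_description  Logged-in Users
  check_command        check_users!20!50 ; warning above 20 users, critical above 50
}
```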