Zabbix Network Monitoring (Second Edition)

Monitoring quickstart

Now that we have a basic understanding of the frontend navigation, it's time to look at the basis for data gathering in Zabbix—items. In general, anything you want to gather data about will eventually go into an item.

Note

An item in Zabbix is a configuration entity that holds information on gathered metrics. It is the very basis of information flowing into Zabbix, and without items, nothing can be retrieved. An item does not hold any information on thresholds—that functionality is covered by triggers.

If items are so important in Zabbix, we should create some. After all, if no data retrieval is possible without items, we can't monitor anything without them. To get started with item configuration, open Configuration | Hosts. If it's not selected by default, choose Zabbix servers in the Group drop-down menu (in the top-right corner). This is a location we will visit quite a lot, as it provides easy access to other entity configurations, including Items and Triggers. Let's figure out what's what in this area. The most interesting functionality is the host list:

Primarily, it provides access to host details in the very first column, but that's not all. The usefulness of this screen comes from the other columns, which not only provide access to elements that are associated with hosts but also list the count of those elements. Further down the host entry, we can see a quick overview of the most important host configuration parameters as well as status information, which we will explore in more detail later:

We came here looking for items, so click on Items next to the Zabbix server. You should see a list similar to the one in the following screenshot:

Note the method we used to reach the items list for a particular host—we used convenience links for host elements, which is a fairly easy way to get there and the reason why we will use Configuration | Hosts often.

Back to what we were after, we can see a fairly long list of already existing items. But wait, didn't the Zabbix status screen that we saw in the first screenshot claim there's a single host and no items? That's clearly wrong! Return to Reports | Status of Zabbix (or Monitoring | Dashboard, which shows the same data). It indeed shows zero items. Now move the mouse cursor over the text that reads Number of items (enabled/disabled/not supported), and take a look at the tooltip:

Aha! So it counts only those items that are assigned to enabled hosts. As this example host, Zabbix server, is disabled, it's now clear why the Zabbix status report shows zero items. This is handy to remember later, when you try to evaluate a more complex configuration.

Creating a host

Instead of using this predefined host configuration, we want to understand how items work. But items can't exist in an empty space—each item has to be attached to a host.

Note

In Zabbix, a host is a logical entity that groups items. The definition of what a host is can be freely adapted to specific environments and situations. Zabbix in no way limits this choice; thus, a host can be a network switch, a physical server, a virtual machine, or a website.

If a host is required to attach items to, then we must create one. Head over to Configuration | Hosts and click on the Create host button, located in the top-right corner. You will be presented with a host creation screen. This time, we won't concern ourselves with the details, so let's input only some relevant information:

  • Name: Enter A test host.
  • Groups: Select Linux servers from the right-hand listbox, named Other groups, and press the arrow button between the two listboxes to add this group. Then, select Zabbix servers from the In groups listbox and press the opposite arrow button to remove our new host from this predefined group.

Note

Why did we have to select a group for this host? All permissions are assigned to host groups, not individual hosts, and thus, a host must belong to at least one group. We will cover permissions in more detail in Chapter 5, Managing Hosts, Users and Permissions.

The fields that we changed for our host should look as follows:

When you are ready, click on the Add button at the bottom.
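
The same host could also be described as a request to the Zabbix JSON-RPC API. The following is only a sketch that builds the request body without sending it. The group ID and the agent interface on 127.0.0.1:10050 are assumptions that are not part of the steps above, and the field names should be checked against the API documentation for your Zabbix version:

```python
import json

# Placeholder: look up the real ID of the "Linux servers" group in your installation.
LINUX_SERVERS_GROUP_ID = "2"


def build_host_create_request(name, group_id, auth_token=None, request_id=1):
    """Build (but do not send) a JSON-RPC request body for host.create."""
    return {
        "jsonrpc": "2.0",
        "method": "host.create",
        "params": {
            "host": name,
            # A host must belong to at least one group.
            "groups": [{"groupid": group_id}],
            # Assumed agent interface on the local machine (not set in the text).
            "interfaces": [
                {"type": 1, "main": 1, "useip": 1,
                 "ip": "127.0.0.1", "dns": "", "port": "10050"}
            ],
        },
        "auth": auth_token,  # session token obtained by logging in; None here
        "id": request_id,
    }


host_request = build_host_create_request("A test host", LINUX_SERVERS_GROUP_ID)
print(json.dumps(host_request, indent=2))
```

Printing the body instead of sending it keeps the sketch free of side effects; the frontend form remains the method this chapter follows.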

Creating an item

So, we have created our very own first host. But given that items are the basis of all the data, it's probably of little use right now. To give it more substance, we should create items, so select Linux servers from the Groups dropdown, and then click on Items next to the host we just created, A test host. This host has no items to list—click on the Create item button in the upper-right corner.

There's a form, vaguely resembling the one for host creation, so let's fill in some values:

  • Name: Enter CPU load into this field. This is how the item will be named—basically, the name that you will use to refer to the item in most places.
  • Key: The value in this field will be system.cpu.load. This is the "technical name" of the item that identifies what information it gathers.
  • Type of information: Choose Numeric (float). This defines which formatting and type the incoming data will have.

After filling in all the required information, you will be presented with the following screen:

We will look at the other defaults in more detail later, so click on the Add button at the bottom.

Note

More information on item keys is provided in Chapter 3, Monitoring with Zabbix Agents and Basic Protocols.
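
To see how the three form fields relate to each other, here is a minimal sketch of the parameters that an item.create call to the Zabbix API would carry. The host ID, interface ID, and the 30-second update interval are placeholders and assumptions, not values from the text, and the function name is hypothetical:

```python
# "Type of information" choices and the numeric codes the API uses for them.
VALUE_TYPES = {
    "Numeric (float)": 0,
    "Character": 1,
    "Log": 2,
    "Numeric (unsigned)": 3,
    "Text": 4,
}

AGENT_ITEM_TYPE = 0  # item type "Zabbix agent"


def build_item_params(name, key, host_id, interface_id, info_type, delay=30):
    """Build the params object for item.create; nothing is sent anywhere."""
    return {
        "name": name,            # human-readable name shown in most places
        "key_": key,             # technical name: what the item actually gathers
        "hostid": host_id,       # every item must be attached to a host
        "interfaceid": interface_id,
        "type": AGENT_ITEM_TYPE,
        "value_type": VALUE_TYPES[info_type],
        "delay": delay,          # update interval in seconds (assumed)
    }


# "10105" and "1" are placeholder IDs for illustration only.
item_params = build_item_params(
    "CPU load", "system.cpu.load", "10105", "1", "Numeric (float)"
)
print(item_params)
```

Note that the name is free text, while the key has to match something the agent understands.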

You should now see your new item in the list. But we are interested in the associated data, so navigate to Monitoring | Latest data. Notice the filter that takes up half the page? This time, we will want to use it right away.

Starting with Zabbix 2.4, the Latest data page does not show any data by default for performance reasons; thus, we have to set the filter first:

In the Filter, type test in the Hosts field. Our new host should appear. Click on it, then click on the Filter button. Below the filter, expand the - other - section if it's collapsed. You might have to wait up to a minute after saving the item, and then you should see that this newly created item has already gathered some data:

What should you do if you don't see any entries at all? This usually means that data has not been gathered, which can happen for a variety of reasons. If this is the case, check for these common causes:

  • Did you enter item configuration exactly as in the screenshot? Check the item key and type of information.
  • Are both the agent and the server running? You can check this by executing the following as root:
    # netstat -ntpl | grep zabbix
    
    The output should list both the server and agent daemons listening on the correct ports:
    tcp 0 0 0.0.0.0:10050 0.0.0.0:* LISTEN 23569/zabbix_agentd
    tcp 0 0 0.0.0.0:10051 0.0.0.0:* LISTEN 23539/zabbix_server
    

    If any one of them is missing, make sure to start it.

  • Can the server connect to the agent? You can verify this by executing the following from the Zabbix server:
    $ telnet localhost 10050
    

    If the connection fails, it could mean that either the agent is not running or some restrictive firewall setting is preventing the connection. In some cases, SELinux might prevent that connection, too.

    If the connection succeeds but is immediately closed, then the IP address that the agent receives the connection from does not match the one specified for the Server directive in the zabbix_agentd.conf configuration file. On some distributions, this can be caused by IPv6 being used by default, so try adding the IPv6 localhost representation, ::1, as another comma-delimited value on the same line.
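
The connectivity checks from this list can also be scripted. The following is a rough sketch, not a replacement for netstat or telnet: port_open only tests whether a TCP connection can be established, and server_directive_allows is a simplified, hypothetical model of how a peer address is compared with the Server directive (it handles literal IP addresses only and skips host names):

```python
import ipaddress
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or name resolution failed.
        return False


def server_directive_allows(server_value, peer_ip):
    """Check a peer address against a comma-delimited Server value.

    Simplification: only literal IP addresses are compared.
    """
    peer = ipaddress.ip_address(peer_ip)
    for entry in server_value.split(","):
        entry = entry.strip()
        try:
            if ipaddress.ip_address(entry) == peer:
                return True
        except ValueError:
            continue  # not a literal IP address (for example, a host name)
    return False


for daemon, port in (("agent", 10050), ("server", 10051)):
    state = "open" if port_open("localhost", port) else "closed or filtered"
    print(f"{daemon} port {port}: {state}")

# A connection arriving from ::1 is rejected until ::1 is listed as well.
print(server_directive_allows("127.0.0.1", "::1"))
print(server_directive_allows("127.0.0.1,::1", "::1"))
```

The last two lines illustrate the IPv6 case described above: the same local connection is accepted only after ::1 is added to the directive.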

The Zabbix server reads all the information on items to monitor into its cache every minute by default. This means that configuration changes, such as adding a new item, might take up to a minute to affect the collected data. This interval can be tweaked in zabbix_server.conf by changing the CacheUpdateFrequency parameter.

Once data starts arriving, you might see no value in the Change column. This means you moved to this display quickly, and the item managed to gather a single value only; thus, there's no change yet. If that is the case, waiting a bit should result in the page automatically refreshing (look at the page title—remember the 30-second refresh we left untouched in the user profile?), and the Change column will be populated. So, we are now monitoring a single value: the UNIX system load. Data is automatically retrieved and stored in the database. If you are not familiar with the concept, it might be a good idea to read the overview at https://en.wikipedia.org/wiki/Load_(computing).

Introducing simple graphs

If you went away to read about system load, several minutes should have passed. Now is a good time to look at another feature in Zabbix—Graphs. Graphs are freely available for any monitored numeric item, without any additional configuration.

You should still be on the Latest data screen with the CPU load item visible, so click on the link named Graph. You'll get something like this:

While you will probably get less data, unless reading about system load took you more than an hour, your screen should look very similar overall. Let's explore some basic graph controls.

Tip

If you don't see any data even after several minutes have passed, try dragging the scrollbar above the graph to the rightmost position.

The Zoom controls in the upper-left corner allow you to quickly switch the displayed period. Clicking on any of the entries will make the graph show data for the chosen duration. At first, Zabbix is a bit confused about us having so little data; it thus shows all the available time periods here. As more data is gathered, only the relevant periods will be shown, and longer zoom periods will gradually become available.

Below these controls are options that seek through time periods; clicking on them will move the displayed period by the exact time period that was clicked on.

The scrollbar at the top allows you to make small changes to the displayed period: drag it to the left (and notice the period at the top of the graph changing) and then release, and the graph is updated to reflect the period changes. Notice the arrows at both ends of the scrollbar: they allow you to change the duration displayed. Drag these with your mouse just like the scrollbar. You can also click on the buttons at both ends for exact adjustments. Using these buttons moves the period back and forth by the time period that we currently have displayed.

The date entries in the upper-right corner show the start and end times for the currently displayed data, and they also provide calendar widgets that allow a wider range of arbitrary period settings. Clicking on one of these time periods will open a calendar, where you can enter the time and date and have either the start or end time set to this choice:

Try entering a time in the past for the starting (leftmost) calendar. This will move the displayed period without changing its length. This is great if we are interested in a time period of a specific length, but what if we want to look at a graph for the previous day, from 08:30 till 17:00? For this, the control (fixed) in the lower-right corner of the scrollbar will help. Click on it once—it changes to (dynamic). If you now use calendar widgets to enter the start or end time for the displayed period, only this edge of the period will be changed.

For example, if a 1-hour period from 10:00 to 11:00 is displayed, setting the first calendar to 09:00 while in (fixed) mode will display the period from 09:00 till 10:00. If the same is done while in (dynamic) mode, a two-hour period from 09:00 till 11:00 will be displayed. The end edge of the period is not moved in the second case.
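
The difference between the two modes comes down to simple date arithmetic. Here is a small sketch of it; the function name and the date are made up for illustration, and the frontend itself may well implement this differently:

```python
from datetime import datetime


def set_start(start, end, new_start, mode):
    """Return the (start, end) period after changing the start time.

    mode "fixed":   the period keeps its length, so the end moves as well.
    mode "dynamic": only the start edge moves; the end stays where it was.
    """
    if mode == "fixed":
        return new_start, new_start + (end - start)
    if mode == "dynamic":
        return new_start, end
    raise ValueError(f"unknown mode: {mode!r}")


day = (2016, 1, 1)  # arbitrary date, used only for illustration
nine = datetime(*day, 9, 0)
ten = datetime(*day, 10, 0)
eleven = datetime(*day, 11, 0)

print(set_start(ten, eleven, nine, "fixed"))    # period from 09:00 till 10:00
print(set_start(ten, eleven, nine, "dynamic"))  # period from 09:00 till 11:00
```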

Tip

Depending on the time at which you are looking at the graphs, some areas of the graph might have a gray background. This is the time outside of working hours, as defined in Zabbix. We will explore this in more detail later.

Clicking and dragging over the graph area will zoom in to the selected period once the mouse button is released. This is handy for a quick drilldown to some problematic or interesting period:

The yellow area denotes the time period we selected by clicking, holding down the mouse button, and dragging over the graph area. When we release the mouse button, the graph is zoomed to the selected period.

Note

The graph period can't be shorter than one minute in Zabbix. Attempting to set it to a smaller value will do nothing. Before version 3.0, the shortest possible time period was one hour.

Creating triggers

Now that we have an item successfully gathering data, we can look at it and verify whether it is reporting as expected (in our case, that the system is not overloaded). Sitting and staring at a single parameter would make for a very boring job. Doing that with thousands of parameters doesn't sound too entertaining, so we are going to create a trigger. In Zabbix, a trigger is an entry containing an expression to automatically recognize problems with monitored items.

Note

An item alone does nothing more than collect the data. To define thresholds and things that are considered a problem, we have to use triggers.

Navigate to Configuration | Hosts, click on Triggers next to A test host, and click on Create trigger.

Here, only two fields need to be filled in:

  • Name: Enter CPU load too high on A test host for last 3 minutes
  • Expression: Enter {A test host:system.cpu.load.avg(180)}>1

It is important to get the expression correct down to the last symbol. Once done, click on the Add button at the bottom. Don't worry about understanding the exact trigger syntax yet; we will get to that later.

Notice how our trigger expressions refer to the item key, not the name. Whenever you have to reference an item inside Zabbix, it will be done by the item key.
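
Without going into the trigger syntax yet, the idea behind avg(180) and >1 can be sketched in plain Python: average the values from the last 180 seconds and compare the result with 1. The sample data, the handling of the window edges, and the UNKNOWN result for an empty window are assumptions of this sketch, not a description of how the Zabbix server evaluates triggers:

```python
def avg_last_seconds(samples, now, period):
    """Average the values whose timestamps fall within the last `period` seconds.

    `samples` is a list of (unix_timestamp, value) pairs. Returns None when
    no sample falls into the window.
    """
    recent = [value for ts, value in samples if now - period < ts <= now]
    if not recent:
        return None
    return sum(recent) / len(recent)


def trigger_state(samples, now, period=180, threshold=1.0):
    """Mimic avg(180) > 1: PROBLEM if the recent average exceeds the threshold."""
    average = avg_last_seconds(samples, now, period)
    if average is None:
        return "UNKNOWN"
    return "PROBLEM" if average > threshold else "OK"


# Made-up load values, one every 30 seconds.
load_samples = [(0, 0.2), (30, 0.4), (60, 1.5), (90, 1.8),
                (120, 2.0), (150, 1.9), (180, 1.6)]
print(trigger_state(load_samples, now=180))
```

Averaging over three minutes means a single short spike above 1 does not flip the trigger on its own.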

The trigger list should now be displayed, with a single trigger, the one we just created. Let's take a look at what we just added: open Monitoring | Triggers. You should see our freshly added trigger, hopefully already updated, with a green OK flashing in the STATUS column:

You might see PROBLEM in the STATUS field. This means exactly what the trigger name says: the CPU load on A test host has been too high for the last 3 minutes.

Notice the big filter?

We can filter displayed triggers, but why is our OK trigger displayed even though the default filter says Recent problem? The thing is, by default, Zabbix shows triggers that have recently changed their state with the status indicator flashing. Such triggers show for 30 minutes, and then they obey normal filtering rules. Click on Filter to close the filter. We will explore this filter in more detail later.
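
The display rule just described can be written down as a small decision function. This is only a sketch of the behaviour as explained above, with hypothetical names; the 30-minute window is the value mentioned in the text:

```python
from datetime import datetime, timedelta

RECENT_WINDOW = timedelta(minutes=30)  # how long recent changes stay visible


def is_listed(status, last_change, now):
    """Decide whether a trigger appears when the filter shows problems.

    status is "PROBLEM" or "OK"; last_change is when the status last changed.
    """
    if status == "PROBLEM":
        return True  # problems are always listed
    # An OK trigger is listed only while its last change is still recent.
    return now - last_change <= RECENT_WINDOW


moment = datetime(2016, 1, 1, 12, 0)  # arbitrary moment for illustration
print(is_listed("OK", moment - timedelta(minutes=5), moment))
print(is_listed("OK", moment - timedelta(minutes=45), moment))
```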

You could take a break now and notice how, in 30 minutes, there are no triggers displayed. With the filter set to only show problems, this screen becomes quite useful for a quick overview of all issues concerning monitored hosts. While that sounds much better than staring at plain data, we would still want to get some more to-the-point notifications delivered.

Configuring e-mail parameters

The most common notification method is e-mail. Whenever something interesting happens in Zabbix, some action can be taken, and we will set it up so that an e-mail is sent to us. Before we decide when and what should be sent, we have to tell Zabbix how to send it.

To configure the parameters for e-mail sending, open Administration | Media types and click on Email in the NAME column. You'll get a simple form to fill in with appropriate values for your environment:

Change the SMTP server, SMTP helo, and SMTP email fields to use a valid e-mail server. The SMTP email address will be used as the From address, so make sure it's set to something your server will accept. If needed, configure the SMTP authentication, and then click on the Update button.

We have configured the server to send e-mail messages and set what the From address should be, but it still doesn't know the e-mail addresses that our defined users have, which is required to send alerts to them. To assign an e-mail address to a user, open Administration | Users. You should see only two users: Admin and Guest. Click on Admin in the ALIAS column and switch to the Media tab:

Click on the Add button:

The only thing you have to enter here is a valid e-mail address in the Send to textbox—preferably yours. Once you are done, click on Add and then Update in the user properties screen.

That finishes the very basic configuration required to send out notifications through e-mail for this user.
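
Before relying on Zabbix to deliver alerts, it can help to confirm that the mail server accepts these values at all. The following sketch uses Python's standard smtplib to show how the SMTP server, SMTP helo, and SMTP email fields map onto an SMTP session. All addresses and host names are placeholders, port 25 without authentication is an assumption, and the sending call is left commented out:

```python
import smtplib
from email.message import EmailMessage


def build_test_message(from_addr, to_addr):
    """Create a short test message with the given From and To addresses."""
    msg = EmailMessage()
    msg["From"] = from_addr  # corresponds to the SMTP email field
    msg["To"] = to_addr      # corresponds to the user's Send to address
    msg["Subject"] = "Zabbix e-mail settings test"
    msg.set_content("If you can read this, the SMTP settings work.")
    return msg


def send_test_message(smtp_server, smtp_helo, msg, port=25):
    """Send msg through smtp_server, announcing smtp_helo in the greeting."""
    with smtplib.SMTP(smtp_server, port, local_hostname=smtp_helo,
                      timeout=10) as smtp:
        smtp.send_message(msg)


# Placeholder values; replace them with the ones entered in the forms above.
test_message = build_test_message("zabbix@example.com", "admin@example.com")
print(test_message)
# send_test_message("mail.example.com", "example.com", test_message)
```

If the server rejects the From address here, it will most likely reject the messages sent by Zabbix as well.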

Creating an action

And now, it's time to tie all this together and tell Zabbix that we want to receive e-mail notifications when our test box is under heavy load.

Things that tell the Zabbix server to do something upon certain conditions are called actions. An action has three main components:

  • Main configuration: This allows us to set up general options, such as the e-mail subject and the message.
  • Action operations: These specify what exactly has to be done, including who to send the message to and what message to send.
  • Action conditions: These allow us to specify when this action is used and when operations are performed. Zabbix allows us to set many detailed conditions, including hosts, host groups, time, specific problems (triggers) and their severity, as well as others.

To configure actions, open Configuration | Actions and click on Create action. A form is presented that lets you configure preconditions and the action to take:

First, enter a NAME for our new action, such as Test action, and check the Recovery message checkbox. Next, we should define the operation to perform, so switch to the Operations tab. In the Operations tab, insert 3600 in Default operation step duration, as shown in the following screenshot:

In here, click on New in the Action operations block. This will open the Operation details block:

In the Send to Users section, click on the Add control. In the resulting popup, click on the Admin user. Now, locate the Add control for the Operation details block. This can be a bit confusing as the page has four controls or buttons called Add right now. The correct one is highlighted here:

Click on the highlighted Add control. Congratulations! You have just configured the simplest possible action, so click on the Add button in the Action block.
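
To recap how the three components of an action fit together, here is a minimal data model of what we just configured. The class and field names are hypothetical and chosen for readability; they are not the Zabbix database schema or API, and the conditions that we left at their defaults are not modelled:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Operation:
    """What to do: here, send a message to a list of users."""
    send_to_users: List[str]


@dataclass
class Condition:
    """When the action applies, e.g. a host, host group, or trigger severity."""
    kind: str
    value: str


@dataclass
class Action:
    """Main configuration plus the operations and conditions attached to it."""
    name: str
    recovery_message: bool = False
    default_step_duration: int = 3600  # seconds
    operations: List[Operation] = field(default_factory=list)
    conditions: List[Condition] = field(default_factory=list)


test_action = Action(
    name="Test action",
    recovery_message=True,
    default_step_duration=3600,
    operations=[Operation(send_to_users=["Admin"])],
)
print(test_action)
```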