Nagios is a widely used open-source monitoring tool for supervising infrastructure such as servers and network devices. It is highly customizable and has a large community of users worldwide. This guide does not go into depth about Nagios itself.
The basics of setting up a Nagios installation are as follows:
For each device that Nagios is to monitor, a host has to be added to the configuration. Hosts are the main objects within Nagios. Each host has a host-alive check, which is a basic check Nagios performs to see whether the device is reachable over the network (usually a 'ping'). All additional checks are called services; they need to be configured separately and then assigned to hosts.
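As an illustration of the host and service objects described above, the following Python sketch renders one host definition and one service definition in the Nagios object configuration syntax. The helper render_object, the host name web01 and the address 192.0.2.10 are hypothetical. The names generic-host, generic-service, check-host-alive and check_http come from the sample configuration shipped with Nagios Core and are assumed to be defined on the target system.

```python
def render_object(kind, directives):
    """Render one Nagios object definition (e.g. host or service) as text."""
    width = max(len(name) for name in directives)
    lines = [f"define {kind} {{"]
    for name, value in directives.items():
        # Pad directive names so the values line up in one column.
        lines.append(f"    {name.ljust(width)}  {value}")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical host: name and address are placeholders.
# The check_command acts as the host-alive check (a ping in the sample config).
host = render_object("host", {
    "use": "generic-host",          # assumed template supplying the remaining directives
    "host_name": "web01",
    "address": "192.0.2.10",
    "check_command": "check-host-alive",
})

# A service is defined separately and assigned to the host via host_name.
service = render_object("service", {
    "use": "generic-service",       # assumed template
    "host_name": "web01",
    "service_description": "HTTP",
    "check_command": "check_http",
})

if __name__ == "__main__":
    print(host)
    print()
    print(service)
```

The printed text could be saved to a configuration file that Nagios is set up to read. Generating definitions this way is only one option; many installations simply maintain these files by hand.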
Nagios allows hosts and services to be grouped into hostgroups or servicegroups for a better overview. Nagios also allows hosts and services to have parent-hosts or parent-services. If one of the parents fails, any events triggered by the objects beneath it will be correlated with that failure (e.g. ignored).
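The effect of parent relationships can be sketched in a few lines of Python. This is not how Nagios implements it internally; it is a minimal model that assumes a failed host is suppressed only when all of its parents have failed as well, which with a single parent matches the description above. The host names and the function classify are hypothetical.

```python
def classify(failed, parents):
    """Split failed hosts into root causes and suppressed (correlated) hosts.

    failed  -- set of host names that currently fail their host-alive check
    parents -- dict mapping a host name to the list of its parent hosts
    """
    root_causes, suppressed = set(), set()
    for host in failed:
        host_parents = parents.get(host, [])
        # Assumption: a host is suppressed only if it has parents
        # and every one of them has failed too.
        if host_parents and all(p in failed for p in host_parents):
            suppressed.add(host)
        else:
            root_causes.add(host)
    return root_causes, suppressed


# Hypothetical topology: router01 -> switch01 -> web01, db01
parents = {
    "switch01": ["router01"],
    "web01": ["switch01"],
    "db01": ["switch01"],
}
failed = {"switch01", "web01", "db01"}

root_causes, suppressed = classify(failed, parents)
```

In this example only switch01 is reported as a root cause, because its parent router01 is still up. The failures of web01 and db01 are attributed to the switch and would not raise alarms of their own.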
There is a practical limit to the number of services that one Nagios instance can monitor. Using certain plugins, it is possible to set up a distributed Nagios hierarchy. Certain servers within the structure are then responsible for actively checking the devices. These servers are called collectors.
The results they gather are then forwarded to a central server, which runs the web interface. This server is called the monitor server. It is typically also responsible for sending out alarms. The monitor can also be configured to check for stale services, meaning that an alarm will be triggered if a collector fails to send new information about a device for a specified amount of time (passive checking).
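The stale-service check can be pictured as a comparison between the time of the last result received from a collector and a configured threshold. The Python sketch below is only a model of that idea and not Nagios code. In Nagios itself this behaviour is enabled per service with the check_freshness and freshness_threshold directives. The function find_stale, the service names and the timestamps are hypothetical.

```python
def find_stale(last_result_at, now, freshness_threshold):
    """Return the services whose last passive result is older than the threshold.

    last_result_at      -- dict mapping "host/service" to the time (in seconds)
                           at which a collector last delivered a result
    now                 -- current time in seconds
    freshness_threshold -- maximum allowed age of a result in seconds
    """
    return sorted(
        name
        for name, received_at in last_result_at.items()
        if now - received_at > freshness_threshold
    )


# Hypothetical state on the monitor server.
last_result_at = {
    "web01/HTTP": 1000.0,   # result is 100 seconds old
    "db01/MySQL": 400.0,    # result is 700 seconds old
}

stale = find_stale(last_result_at, now=1100.0, freshness_threshold=600.0)
```

With a threshold of 600 seconds, only db01/MySQL is considered stale here, so the monitor would raise an alarm for that service even though no collector reported a failure.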