User Guide

Architecture

This section introduces you to the various possibilities to install NetEye: single node, which is the most basic setup, and Cluster.

Within a cluster, which provides redundancy and dedicated communication and management channels among nodes, dedicated Elasticsearch nodes and Voting only nodes can be configured. See Section Cluster Nodes for more information. Additionally, Satellite nodes can complete a Cluster setup.

Single Node

NetEye can run in a Single Node Architecture, that is, as a self-contained server. This setup is ideal for small environments and infrastructure, where limited resources are necessary, and requires only to install it, carry out the initial configuration, and then start working on it: define services, hosts, and so on and so forth.

On NetEye Single Node installations, NetEye services are managed by systemd, see next section.

However, when dealing with large infrastructures, in which hundreds of hosts and services are present, and many more of NetEye’s functionalities are required, a clustered NetEye installation will prove more effective.

NetEye Systemd Targets

A single systemd target is responsible for managing start and stop operations of services which belong to NetEye. Systemd Targets are special systemd units, which provide no service by themselves, but are needed to both group sub-services and serve as reference point for other systemd services and systemd targets.

In NetEye all systemd services that depend on one of the several neteye-[...].target systemd units, are connected in such a way, that whenever the systemd target is started or stopped, each dependent systemd service will also be started or stopped.

Note

Even though there exist more than one neteye-[...].target, the NetEye autosetup scripts take care of enabling only one of them based on the contents of /etc/neteye-cluster.

If you want to discover which NetEye Systemd Target is currently active, you can use the following code snippet:

# systemctl list-units "neteye*.target"
UNIT                        LOAD   ACTIVE SUB    DESCRIPTION
neteye-cluster-local.target loaded active active NetEye Cluster Local Services Target

For single node installations, this systemd target is called neteye.target. You can verify which services are bound to the target by either using systemctl list-dependencies command, even if the systemd target is currently not enabled, like so:

# systemctl list-dependencies neteye.target
neteye.target
● ├─elasticsearch.service
● ├─eventhandlerd.service
● ├─grafana-server.service
● ├─httpd.service
● ├─icinga2-master.service
[...]

The commands /usr/sbin/neteye start, /usr/bin/neteye stop and /usr/bin/neteye status are wrapper scripts around this systemd functionality to expose an ergonomic interface.

Cluster

The clustering service of NetEye 4 is based on the RedHat 7 High Availability Clustering technologies, including Corosync, Pacemaker, and DRBD, used to set up an HA cluster composed of a combination of operating nodes, elastic-only nodes, and voting-only nodes. NetEye cluster is a failover cluster at service level, meaning that it provides redundancy to avoid any downtime or service disruption whenever one node in the cluster goes offline. In such a case, indeed, services are moved to another node if necessary.

Reasons for a node to be offline include–but are not limited to:

  • A networking issue (failure of a network interface or in the connectivity) which prevents a node to communicate with the other nodes

  • A hardware or software issue which freezes or blocks a node

  • A problem with the synchronisation of the data

All the cluster services run on a dedicated network called Corporate Network: every cluster node has therefore two IP addresses: A public one, accessible by the running service (including e.g., SSH), and a private one, used by Corosync, Pacemaker, DRBD, and Elastic-only nodes.

Cluster resources are typically quartets consisting of a floating IP in the Corporate Network, a DRBD device, a filesystem, and a (systemd) service. Fig. 1 shows the general case for High Availability, where cluster services are distributed across nodes, while other services (e.g., Icinga 2, Elasticsearch) handle their own clustering requirements. The remainder of this section details the architecture and implementation of a NetEye cluster.

NetEye cluster architecture

Fig. 1 The NetEye cluster architecture.

If you have not yet installed clustering services, please turn to the Cluster Installation page for setup instructions.

Type of Nodes

Within a NetEye cluster, different types of nodes can be setup:

Operative node

On an operative node runs any services offered by NetEye, like e.g., Tornado, Icinga 2, slmd, and so on. They can be seen as single nodes, connected by the clustering technologies mentioned above.

Elastic-only node

Elastic-only nodes host only the DB component of the Elastic Stack, while FileBeat, Kibana, and other Elastic Stack components are still clusterised resources and run on operative nodes. Elastic-only nodes are used for either data storage or to add to the cluster more resources and processing abilities of elasticsearch data. In the latter case, the following are typical use cases:

  • Process log data in some way, for example with Machine Learning tools

  • Implement an hot-warm-cold architecture

  • Increase data retention, redundancy, or storage to archive old data

Note

An operative node may also run services of the Elastic Stack, including its DB component. In other words, it is not necessary to have a dedicated node for Elastic services.

Voting-only node

Nodes of this type are a kind of silent nodes: They do not run any service and therefore require limited computational resources compared to the other nodes. They are needed only in case of a node failure to establish the quorum and avoid cluster disruption.

See also

Voting-only nodes and their use are described with great details in a NetEye blog post: https://www.neteye-blog.com/2020/03/neteye-voting-only-node/

Clustering and Special Nodes

The following services use their own native clustering capabilities rather than Red Hat HA Clustering. NetEye will also take advantage of their inbuilt load balancing capabilities.

Icinga 2 Cluster

An Icinga 2 cluster is composed by one master instance holding configuration files and by a variable number of satellites and agents.

See also

Icinga 2 clusters are described in great detail in the official Icinga documentation

Elasticsearch

Each cluster node runs a local master-eligible Elasticsearch service, connected to all other nodes. Elasticsearch itself chooses which nodes can form a quorum (note that all NetEye cluster nodes are master eligible by default), and so manual quorum setup is no longer required.

See also

Elastic clusters and Elastic-only nodes are described with more details in the General Elasticsearch Cluster Information section.

Clustering Services

The combination of the following software is at the core of the NetEye’s clustering functionalities:

  • Corosync: Provides group communication between a set of nodes, application restart upon failure, and a quorum system.

  • Pacemaker: Provides cluster management, lock management, and fencing.

  • DRBD: Provides data redundancy by mirroring devices (hard drives, partitions, logical volumes, etc.) between hosts in real time.

“Local” NetEye services running simultaneously on each NetEye node ( i.e. not managed by Pacemaker and Corosync ), are managed by a dedicated systemd target unit called neteye-cluster-local.target. This reduced set of local services is managed exactly alike the Single Node neteye target:

# systemctl list-dependencies neteye-cluster-local.target
neteye-cluster-local.target
● └─drbd.service
● └─elasticsearch.service
● └─icinga2.service
[...]

Cluster Management

There are several CLI commands to be used in the management and troubleshooting of clusters, most notably drbdmon, drbdadm, and pcs.

The first one, drbdmon is used to monitor the status of DRBD, i.e., to verify if the nodes of a cluster communicate flawlessly or if there is some ongoing issue, like e.g., a node or network failure, or a split brain.

The second command, drbdadm allows to carry out administrative tasks on DRBD.

Finally, the pcs command is used to manage resources on a pcs cluster only; its main purpose is to move services between the cluster nodes when required.

In particular, pcs status retrieves the current status of the nodes and services, while pcs node standby and pcs node unstandby put a node offline and back online, respectively.

Secure Intracluster Communication

Security between the nodes in a cluster is just as important as front-facing security. Because nodes in a cluster must trust each other completely to provide failover services and be efficient, the lack of an intracluster security mechanism means one compromised cluster node can read and modify data throughout the cluster.

NetEye uses certificates signed by a Certificate Authority to ensure that only trusted nodes can join the cluster, to encrypt data passing between nodes so that externals cannot tamper with your data, and allows for certificate revocation for the certificates of each component in each module.

Two examples of cluster-based modules are:

  • DRBD, which replicates block devices over the network

  • The ELK stack, which the NetEye 4 Log Management is based on.

Modules that Use Intracluster Security

The Log Manager modules use secure communication:

Module

Enforcement Mechanism

Component

Log Manager

X-Pack Security

Elasticsearch

Logstash

Kibana

Satellite

Master-Satellite Architecture

A Satellite is a NetEye instance which depends on a main NetEye installation, the Master, and is responsible for different tasks, including but not limited to,

  • execute Icinga 2 checks and forward results to the Master

  • collect logs and forward them to the Master

  • forward data through NATS

  • collect data through Tornado Collectors and forward them to the Master to be processed by Tornado

NetEye implements secure communication between Satellites and Master; each Satellite is responsible to handle a set of hosts. On hosts can be also installed different agents, software responsible to perform different tasks on the host itself and are connected to the Satellite.

See also

Icinga 2 Agents are presented in section Agent Nodes

A Satellite proves useful in two scenarios: to offload the Master and to implement multi tenancy.

As an example of the first scenario, consider an infrastructure that needs to monitor a large number of servers and devices, possibly located in multiple remote locations.

NetEye Satellites allow to reduce the load on Master and also the number of requests between Master and hosts. Indeed, all the checks are scheduled and executed by the Satellite and only their results are forwarded to the Master.

The second scenario sees NetEye Satellites operate in an isolated environment by implementing multi tenancy. For each tenant, multiple satellites can be specified that are responsible for monitoring and collecting logs. The Master receives data only via Satellites and identifies each tenant by means of the certificate installed on each Satellite.

Starting with NetEye 4.19 Satellites are officially supported by NetEye and many manual steps previously required are now available as neteye satellites commands.

See also

Please refer to Prerequisites to configure a Satellite; Update and upgrade procedures are explained in NetEye Satellites, Satellite Upgrade and Satellite Upgrade, respectively.

Satellites communicate with other nodes using the NATS Server, the default message broker in NetEye. If you want to learn more about NATS you can refer to the official NATS documentation

Satellite Services

The services that need to run on the NetEye Satellite Nodes are managed by a dedicated systemd target named neteye-satellite.target, which takes care of starting and stopping the services of the Satellite when needed.

Various services are configured and activated out of the box on NetEye Satellites: among others, the Tornado Collectors and those provided by Icinga 2.

For the complete list of the services enabled on a NetEye Satellite, on the Satellite you can execute:

systemctl list-dependencies neteye-satellite.target

To further understand how systemd targets are used in NetEye and how they can be inspected, you can have a look at NetEye Systemd Targets.

Data Gathering in NetEye Satellites

NetEye Satellites are used to collect data and send them to the Master, where they are stored and processed; for example they can later be used to set up dashboards.

Communication between NetEye Satellites and Master is encrypted using NATS, which operates excellently also in multi-tenancy environments. The remainder of this section details how NATS is deployed on NetEye.

Multi Tenancy and NATS Leaf

One interesting functionality provided by NATS Server is the support for a secure, TLS-based, multi tenancy, that can be secured using multiple accounts. According to the Multi Tenancy using Accounts documentation, it is thus possible to create self-contained, isolated communications from multiple clients to a single server, that will then process independently all data streams. This ability can be exploited on NetEye clusters from 4.12 onwards, in which the single server is the NetEye master and the clients are the NetEye satellites.

The architecture is depicted in image Fig. 2. Here, we see similar configurations on the NetEye master (bottom) and on the satellites (top). On the master, there are Telegraf consumers that process data coming from clients to the NATS server. On each satellite, a Telegraf instance sends data to the local NATS server. Here, data can be processed immediately, but the can also be forwarded to the Master’s NATS server, thanks to a NATS leaf node, configured to add authentication and a security layer to the data to prevent any third-party interception.

../_images/satellite-architecture.jpg

Fig. 2 Architecture of NetEye with two satellites.

On the Master, one Telegraf local consumer instance for each Tenant is spawned: the service is called telegraf-local@neteye_consumer_influxdb_<tenant_name> and will consume only contents from subject <tenant_name>.telegraf.metrics. If you are in a cluster environment, an instance of Telegraf local consumer is started on each node of the cluster, to exploit the NATS built-in load balancing feature called distributed queue. For more information about this feature, see the official NATS documentation <https://docs.nats.io/nats-concepts/queue> Data are stored in InfluxDB: data from each Satellite are written in a specific database, that belongs to the Tenant, called <tenant_name> in order to allow data isolation in a multi-tenant environment.

To learn more about Telegraf configuration please check Telegraf Configuration section

Multi Tenancy configuration explained

The procedure to configure a NetEye Satellite automatically configures NATS Accounts on the Master and NATS Leaf Node on the Satellites. In this section we will give an insight into the most relevant configurations performed by the procedure.

The automatic procedure configures the following:

  1. NATS Server

    1. On the NATS Server of the NetEye Master, for each NetEye Tenant a dedicated Account is created. For each satellite a user is created and associated to its Tenant account. This is done with the purpose to isolate the traffic of each Tenant. This way, the NATS subscribers on the NetEye Master will receive the messages coming from the Satellites and from the Master itself. NATS Subscribers on a NetEye Satellite will not be able to access the messages coming from the other NetEye Tenants.

    2. The stream subjects coming from the NetEye Satellites are prefixed with the Tenant unique identifier defined during the NetEye Satellite configuration. This is done in order to let subscribers securely pinpoint the origin of the messages, by solely relying on the NATS subject. So, for example, if the NATS Leaf Node of NetEye Satellite acmesatellite belonging to the tenantA publishes a message on subject mysubject, NATS subscribers on the NetEye Master will need to subscribe to the subject tenantA.mysubject in order to receive the message.

  2. NATS Satellite:

    1. A server certificate for the Satellite NATS Leaf Node is generated with the Root CA of the NetEye Satellite. This must be trusted by the clients that need to connect to the NetEye Satellite NATS Leaf Node.

    2. A client certificate is generated with the Root CA of the NetEye Master. This is used by the NATS Leaf Nodes to authenticate to the NetEye Master NATS Server.

    3. The NATS Leaf Node is configured to talk to the NATS Server of the NetEye Master, using the FQDN defined during the NetEye Satellite configuration and the port 7422.