User Guide

Architecture

This section introduces the possible ways to install NetEye: as a Single Node, which is the most basic setup, or as a Cluster.

Within a cluster, which provides redundancy as well as dedicated communication and management channels among nodes, dedicated Elasticsearch nodes and voting-only nodes can be configured; see Section Cluster Nodes for more information. Additionally, Satellite nodes can complete a Cluster setup.

Single Node

NetEye can run in a Single Node Architecture, that is, as a self-contained server. This setup is ideal for small environments and infrastructures with limited resources: you only need to install NetEye, carry out the initial configuration, and then start working on it by defining services, hosts, and so on.

On NetEye Single Node installations, NetEye services are managed by systemd, see next section.

However, when dealing with large infrastructures, in which hundreds of hosts and services are present and more of NetEye’s functionalities are required, a clustered NetEye installation will prove more effective.

NetEye Systemd Targets

A single systemd target is responsible for starting and stopping the services which belong to NetEye. Systemd targets are special systemd units which provide no service by themselves, but both group sub-services and serve as reference points for other systemd services and targets.

In NetEye, all systemd services that depend on one of the several neteye-[...].target systemd units are connected in such a way that whenever the systemd target is started or stopped, each dependent systemd service is started or stopped with it.
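This grouping can be inspected with standard systemctl commands; a sketch, assuming a Single Node installation whose target is named neteye.target:

```shell
# Print the unit file of the NetEye target to see how it groups services
systemctl cat neteye.target

# List the units that the target pulls in when it is started
systemctl show -p Wants neteye.target
```

These commands are read-only and safe to run on a live system.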

Note

Even though more than one neteye-[...].target exists, the NetEye autosetup scripts take care of enabling only one of them, based on the contents of /etc/neteye-cluster.

If you want to discover which NetEye systemd target is currently active, you can use the following command:

# systemctl list-units "neteye*.target"
UNIT                        LOAD   ACTIVE SUB    DESCRIPTION
neteye-cluster-local.target loaded active active NetEye Cluster Local Services Target

For Single Node installations, this systemd target is called neteye.target. You can verify which services are bound to the target with the systemctl list-dependencies command, even if the target is currently not enabled, like so:

# systemctl list-dependencies neteye.target
neteye.target
● ├─elasticsearch.service
● ├─eventhandlerd.service
● ├─grafana-server.service
● ├─httpd.service
● ├─icinga2-master.service
[...]

The commands /usr/sbin/neteye start, /usr/bin/neteye stop, and /usr/bin/neteye status are wrapper scripts around this systemd functionality, exposing a more ergonomic interface.
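For example, a full restart of all NetEye services can be sketched as follows (to be run as root on a NetEye host; the commands do nothing outside a NetEye installation):

```shell
# Stop every service bound to the active NetEye target ...
neteye stop

# ... start them again in the correct order ...
neteye start

# ... and check that each managed service is up
neteye status
```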

Cluster

The clustering service of NetEye 4 is based on the Red Hat 7 High Availability Clustering technologies, including Corosync, Pacemaker, and DRBD, which are used to set up an HA cluster composed of a combination of operative nodes, Elastic-only nodes, and voting-only nodes. A NetEye cluster is a failover cluster at the service level: it provides redundancy to avoid downtime or service disruption whenever one node in the cluster goes offline, by moving services to another node if necessary.

Reasons for a node to be offline include, but are not limited to:

  • A networking issue (failure of a network interface or of the connectivity) which prevents a node from communicating with the other nodes

  • A hardware or software issue which freezes or blocks a node

  • A problem with the synchronisation of the data

All the cluster services run on a dedicated network called the Corporate Network; every cluster node therefore has two IP addresses: a public one, accessible by the running services (including e.g., SSH), and a private one, used by Corosync, Pacemaker, DRBD, and Elastic-only nodes.

Cluster resources are typically quartets consisting of a floating IP in the Corporate Network, a DRBD device, a filesystem, and a (systemd) service. Fig. 1 shows the general case for High Availability, where cluster services are distributed across nodes, while other services (e.g., Icinga 2, Elasticsearch) handle their own clustering requirements. The remainder of this section details the architecture and implementation of a NetEye cluster.
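On an installed cluster, these resource quartets can be inspected with pcs; a sketch, assuming an already configured cluster (the exact subcommand depends on the pcs version):

```shell
# Show the full configuration of all cluster resources, including
# floating IPs, DRBD devices, filesystems and systemd services
pcs resource config          # newer pcs versions
pcs resource show --full     # older pcs versions (e.g., on RHEL 7)

# Show which node each resource is currently running on
pcs status resources
```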


Fig. 1 The NetEye cluster architecture.

If you have not yet installed clustering services, please turn to the Cluster Installation page for setup instructions.

Type of Nodes

Within a NetEye cluster, different types of nodes can be set up:

Operative node

An operative node runs any of the services offered by NetEye, e.g., Tornado, Icinga 2, slmd, and so on. Operative nodes can be seen as single nodes, connected by the clustering technologies mentioned above.

Elastic-only node

Elastic-only nodes host only the DB component of the Elastic Stack, while Filebeat, Kibana, and other Elastic Stack components remain clusterised resources and run on operative nodes. Elastic-only nodes are used either for data storage or to add resources and Elasticsearch data-processing capacity to the cluster. In the latter case, the following are typical use cases:

  • Process log data in some way, for example with Machine Learning tools

  • Implement a hot-warm-cold architecture

  • Increase data retention, redundancy, or storage to archive old data

Note

An operative node may also run services of the Elastic Stack, including its DB component. In other words, it is not necessary to have a dedicated node for Elastic services.

Voting-only node

Nodes of this type are silent nodes: they do not run any service and therefore require limited computational resources compared to the other nodes. They are needed only in case of a node failure, to establish the quorum and avoid cluster disruption.

Cluster Failure and Voting-only Nodes

A cluster composed of N nodes requires that at least N/2 + 1 nodes, rounding the division down (the quorum), be online to operate properly. For example, a cluster composed of 2 nodes has a quorum of 2, so whenever one of the two nodes goes offline the cluster stops working, because the quorum can no longer be reached. To make sure the cluster remains operational in such cases, adding a voting-only node is the solution: it counts as a regular node for quorum purposes, turning the 2-node cluster into a 3-node cluster whose quorum of 2 can still be reached with one node offline.
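The quorum rule can be sketched as a one-line calculation (integer division rounds down, matching the rule above):

```shell
# quorum(N) = floor(N/2) + 1
quorum() { echo $(( $1 / 2 + 1 )); }

quorum 2   # -> 2: a two-node cluster cannot lose a node and keep quorum
quorum 3   # -> 2: with a voting-only node added, one node may fail
```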

See also

Voting-only nodes and their use are described in great detail in a NetEye blog post: https://www.neteye-blog.com/2020/03/neteye-voting-only-node/

Clustering and Special Nodes

The following services use their own native clustering capabilities rather than Red Hat HA Clustering. NetEye will also take advantage of their inbuilt load balancing capabilities.

Icinga 2 Cluster

An Icinga 2 cluster is composed of one master instance, holding the configuration files, and a variable number of satellites and agents.

See also

Icinga 2 clusters are described in great detail in the official Icinga documentation

Elasticsearch

Each cluster node runs a local master-eligible Elasticsearch service, connected to all other nodes. Elasticsearch itself chooses which nodes can form a quorum (note that all NetEye cluster nodes are master eligible by default), and so manual quorum setup is no longer required.
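The formed Elasticsearch cluster and the elected master can be verified via the Elasticsearch REST API; a sketch, assuming the default port 9200 and a placeholder hostname (adapt host, TLS options, and authentication to your setup):

```shell
# Overall cluster health, including the number of nodes that have joined
curl -s "https://elasticsearch.example.local:9200/_cluster/health?pretty"

# One line per node; the "master" column marks the elected master with *
curl -s "https://elasticsearch.example.local:9200/_cat/nodes?v&h=name,master,node.role"
```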

See also

Elastic clusters and Elastic-only nodes are described in more detail in the General Elasticsearch Cluster Information section.

Clustering Services

The combination of the following software is at the core of NetEye’s clustering functionalities:

  • Corosync: Provides group communication between a set of nodes, application restart upon failure, and a quorum system.

  • Pacemaker: Provides cluster management, lock management, and fencing.

  • DRBD: Provides data redundancy by mirroring devices (hard drives, partitions, logical volumes, etc.) between hosts in real time.

“Local” NetEye services, which run simultaneously on each NetEye node (i.e., not managed by Pacemaker and Corosync), are managed by a dedicated systemd target unit called neteye-cluster-local.target. This reduced set of local services is managed exactly like the Single Node neteye.target:

# systemctl list-dependencies neteye-cluster-local.target
neteye-cluster-local.target
● ├─drbd.service
● ├─elasticsearch.service
● ├─icinga2.service
[...]

Cluster Management

There are several CLI commands to be used in the management and troubleshooting of clusters, most notably drbdmon, drbdadm, and pcs.

The first one, drbdmon, is used to monitor the status of DRBD, i.e., to verify whether the nodes of a cluster communicate flawlessly or whether there is some ongoing issue, like e.g., a node or network failure, or a split brain.

The second command, drbdadm, allows you to carry out administrative tasks on DRBD.

Finally, the pcs command is used to manage resources on a pcs cluster only; its main purpose is to move services between the cluster nodes when required.

In particular, pcs status retrieves the current status of the nodes and services, while pcs node standby and pcs node unstandby put a node offline and back online, respectively.
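Putting a node into standby, e.g., before maintenance, can then be sketched as follows (neteye02.example.com is a placeholder hostname):

```shell
# Move all resources away from the node and mark it as
# unable to host resources
pcs node standby neteye02.example.com

# Verify that resources have migrated to the remaining nodes
pcs status

# After maintenance, allow the node to host resources again
pcs node unstandby neteye02.example.com
```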

Cluster Types and Node Visibility

While a NetEye cluster is unique, we can distinguish two types of clusters that run in parallel therein: Elasticsearch clusters and pcs clusters.

Elasticsearch clusters

They are composed of all Elastic-only nodes. They do not rely on the technologies described in Clustering Services, but have their own synchronisation mechanism. Each node has its own configuration and runs its own service.

pcs clusters

They rely on Corosync, Pacemaker, and DRBD to keep data and configuration synchronised. They are composed of operative nodes and voting-only nodes, while Elasticsearch nodes are completely invisible to pcs clusters. Voting-only nodes show up in the cluster status only when using the command pcs quorum status.

Secure Intracluster Communication

Security between the nodes in a cluster is just as important as front-facing security. Because nodes in a cluster must trust each other completely to provide failover services and be efficient, the lack of an intracluster security mechanism means one compromised cluster node can read and modify data throughout the cluster.

NetEye uses certificates signed by a Certificate Authority to ensure that only trusted nodes can join the cluster and to encrypt data passing between nodes so that external parties cannot tamper with it; it also allows for revoking the certificates of each component in each module.
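A node certificate can be checked with openssl, e.g., to see its issuing CA and validity period; a sketch with placeholder file paths:

```shell
# Print issuer, subject and validity window of a node certificate
openssl x509 -in /path/to/node-cert.pem -noout -issuer -subject -dates

# Verify that the certificate chains up to the cluster CA
openssl verify -CAfile /path/to/ca.pem /path/to/node-cert.pem
```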

Two examples of cluster-based modules are:

  • DRBD, which replicates block devices over the network

  • The ELK stack, which the NetEye 4 Log Management is based on.

Modules that Use Intracluster Security

The Log Manager modules use secure communication:

Module        Enforcement Mechanism   Component
------------  ----------------------  -------------
Log Manager   X-Pack Security         Elasticsearch
                                      Logstash
                                      Kibana

Satellite

Master-Satellite Architecture

A Satellite is a NetEye instance which depends on a main NetEye installation, the Master, and is responsible for different tasks, such as, but not limited to:

  • executing Icinga 2 checks and forwarding the results to the Master

  • collecting logs and forwarding them to the Master

  • forwarding data through NATS

NetEye allows you to implement secure communication between Satellites and the Master; each Satellite is responsible for handling a set of hosts. Different agents can also be installed on hosts: these are pieces of software that perform various tasks on the host itself and are connected to the Satellite. Icinga 2 Agents are presented in section Agent Nodes.

A Satellite is useful for two purposes: offloading the Master and implementing multi-tenancy.

When monitoring large numbers of servers and devices, especially in multiple remote locations, Satellites reduce both the load on the Master and the number of requests between the Master and the hosts. Indeed, Icinga 2 checks are scheduled and executed by the Satellite, and only the results are forwarded to the Master.

Satellites can be used to implement multi-tenancy, providing an isolated environment: each tenant has a specific Satellite responsible for monitoring and collecting logs. The Master receives data only via Satellites and can identify each tenant through the certificate installed on its Satellite.

Starting with NetEye 4.19, Satellites are officially supported by NetEye, and many previously manual steps are now available as neteye satellites commands.

See also

Please refer to Prerequisites to configure a Satellite; update and upgrade procedures are explained in Update NetEye Satellites and Upgrade NetEye Satellites, respectively.

Satellites communicate with other nodes using the NATS Server, the default message broker in NetEye. If you want to learn more about NATS, you can refer to the official NATS documentation.


Multi Tenancy and NATS Leaf

One interesting functionality provided by the NATS Server is its support for secure, TLS-based multi-tenancy using multiple accounts. According to the Multi Tenancy using Accounts documentation, it is thus possible to create self-contained, isolated communications from multiple clients to a single server, which then processes all data streams independently. This ability can be exploited on NetEye clusters from 4.12 onwards, in which the single server is the NetEye Master and the clients are the NetEye Satellites.

The architecture is depicted in the image below, which shows similar configurations on the NetEye Master (left) and on the Satellite (right; only one is depicted, but multiple can be used). On the Master, Telegraf consumers process the data that clients send to the NATS server. On each Satellite, a Telegraf instance sends data to the local NATS server. There, data can be processed immediately, but they can also be forwarded to the Master’s NATS server thanks to a NATS leaf node, which is configured to add authentication and a security layer that prevents any third-party interception of the data.


Fig. 2 Architecture of NATS Server with two satellites.

On the Master, one Telegraf local consumer instance is spawned for each Satellite: the service is called telegraf-local@neteye_consumer_influxdb_<satellite_name> and consumes only content from the subject <satellite_name>.telegraf.metrics. In a cluster environment, an instance of the Telegraf local consumer is started on each node of the cluster, to exploit the NATS built-in load balancing feature called distributed queue; for more information about this feature, see the official NATS documentation <https://docs.nats.io/nats-concepts/queue>. Data are stored in InfluxDB: data from each Satellite are written to a dedicated database called <satellite_name>, in order to allow data isolation in a multi-tenant environment.
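As a concrete example, for a hypothetical Satellite named acmesatellite, substituting <satellite_name> yields the following names; this is plain string substitution:

```shell
satellite_name="acmesatellite"  # example satellite identifier

# systemd unit of the dedicated Telegraf consumer on the Master
echo "telegraf-local@neteye_consumer_influxdb_${satellite_name}.service"
# -> telegraf-local@neteye_consumer_influxdb_acmesatellite.service

# NATS subject this consumer subscribes to
echo "${satellite_name}.telegraf.metrics"
# -> acmesatellite.telegraf.metrics

# InfluxDB database the data ends up in
echo "${satellite_name}"
# -> acmesatellite
```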

To learn more about Telegraf configuration, please check the Telegraf Configuration section.

Multi Tenancy configuration explained

The procedure to configure a NetEye Satellite automatically configures NATS Accounts on the Master and the NATS Leaf Node on the Satellites. This section gives an insight into the most relevant configurations performed by the procedure.

The automatic procedure configures the following:

  1. NATS Server

    1. On the NATS Server of the NetEye Master, a dedicated Account is created for each NetEye Satellite, in order to isolate the traffic of each Satellite. This way, the NATS subscribers on the NetEye Master will receive the messages coming from the Satellites and from the Master itself, while NATS subscribers on a NetEye Satellite will not be able to access the messages coming from the other NetEye Satellites.

    2. The stream subjects coming from the NetEye Satellites are prefixed with the Satellite unique identifier defined during the NetEye Satellite configuration. This is done in order to let subscribers securely pinpoint the origin of the messages, by solely relying on the NATS subject. So, for example, if the NATS Leaf Node of NetEye Satellite acmesatellite publishes a message on subject mysubject, NATS subscribers on the NetEye Master will need to subscribe to the subject acmesatellite.mysubject in order to receive the message.

  2. NATS Satellite:

    1. A server certificate for the Satellite NATS Server is generated with the Root CA of the NetEye Satellite. This must be trusted by the clients that need to connect to the NetEye Satellite NATS Leaf Node.

    2. A client certificate is generated with the Root CA of the NetEye Master. This is used by the NATS Leaf Nodes to authenticate to the NetEye Master NATS Server.

    3. The NATS Leaf Node is configured to talk to the NATS Server of the NetEye Master, using the FQDN defined during the NetEye Satellite configuration and the port 7422.
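The steps above correspond, in essence, to a NATS leaf-node configuration fragment like the following sketch; the FQDN and file paths are placeholders, since the actual files are generated by the neteye satellites procedure:

```
# Leaf-node section of the Satellite's NATS server configuration (sketch)
leafnodes {
  remotes = [
    {
      # NATS Server of the NetEye Master, leaf-node port 7422
      url: "tls://neteye-master.example.com:7422"
      tls {
        # Client certificate issued by the Master's Root CA (step 2.2)
        cert_file: "/path/to/satellite-client.crt.pem"
        key_file:  "/path/to/satellite-client.key.pem"
        # CA used to verify the Master's server certificate
        ca_file:   "/path/to/master-root-ca.crt.pem"
      }
    }
  ]
}
```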