Cluster Nodes¶
NetEye 4’s clustering service is based on the Red Hat Enterprise Linux 8 High Availability Clustering technologies:
Corosync: Provides group communication between a set of nodes, application restart upon failure, and a quorum system.
Pacemaker: Provides cluster management, lock management, and fencing.
DRBD: Provides data redundancy by mirroring devices (hard drives, partitions, logical volumes, etc.) between hosts in real time.
Cluster resources are typically quartets consisting of an internal floating IP, a DRBD device, a filesystem, and a (systemd) service.
Once you have installed clustering services according to the information on this page, please turn to the Cluster Architecture page for more information on configuration and how to update.
See also
For more information about the Red Hat Cluster, check the official Red Hat documentation on High Availability Clusters.
Prerequisites¶
A NetEye 4 cluster must consist of between 2 and 16 identical servers (“Nodes”) running RHEL 8; each node must satisfy the following requirements:
Networking:
Bonding across NICs must be configured
A dedicated cluster network interface, named exactly the same on each node
One external static IP address which will serve as the external Cluster IP
One IP Address for each cluster node (i.e., N addresses)
One virtual (internal) subnet for internal floating service IPs (this subnet MUST NOT be reachable from any machine except cluster nodes, as it poses a security risk otherwise)
All nodes must know the internal IPs (Virtual IP) of all other nodes, which must be stored in the file /etc/hosts (see the example after this list)
All nodes must be reachable over the internal network
The Corporate Network’s NIC must be in firewall zone public, while the Heartbeat Network’s NIC must be in firewall zone trusted
Storage:
At least one volume group with enough free storage to host all service DRBD devices defined in Services.conf
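For instance, the /etc/hosts entries for the internal (heartbeat) network of a three-node cluster might look like the following sketch, where all hostnames and addresses are purely illustrative:
192.168.47.10   my-neteye-01.neteyelocal
192.168.47.11   my-neteye-02.neteyelocal
192.168.47.12   my-neteye-03.neteyelocal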
In general, each node in a NetEye Cluster…
must have SSH keys generated for the root user
must store the SSH keys of all nodes in the file /root/.ssh/authorized_keys
needs Internet connectivity, including the ability to reach repositories of Würth Phoenix and Red Hat
must have the dnf group neteye installed
must have its tags set with the command neteye node tags set; for more details about this command, please refer to neteye node tags set
must be subscribed with a valid Red Hat Enterprise Linux license, which can be done with the command neteye node register; for more details about this command, please refer to neteye node register
must have the latest operating system and NetEye 4 updates installed
if it is a virtual Cluster Node, its RAM must be completely reserved
must satisfy the same hostname character requirements as Single and Satellite Nodes, which can be checked in the installation procedure
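As a quick sanity check before proceeding, you can verify on each node that the neteye dnf group is installed and that the Red Hat subscription is active; a minimal sketch:
# dnf group list --installed | grep -i neteye
# subscription-manager status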
See also
Section Cluster Requirements and Best Practices contains more detailed requirements for NetEye cluster installation.
Installation Procedure¶
The first step of a NetEye Cluster installation is to install the NetEye ISO image, after which you need to follow, for each Node, the installation’s Part 1: Single Nodes and Satellite Nodes. Then, make sure to copy the SSH key of each node into the /root/.ssh/authorized_keys file of all the other Nodes. To accomplish this, you can run the following command on each node:
cluster# ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.27.0.3
Repeat the command for each Node, replacing 172.27.0.3 with the IP address of each of the other Nodes.
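If you prefer, the key distribution can be scripted; the following sketch assumes that root’s SSH key pair has already been generated and that 172.27.0.2, 172.27.0.3 and 172.27.0.4 are the internal addresses of the other Nodes:
cluster# for ip in 172.27.0.2 172.27.0.3 172.27.0.4; do ssh-copy-id -i /root/.ssh/id_rsa.pub root@$ip; done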
Once done, depending on the type of nodes you are installing in your cluster, proceed with the appropriate one of the following procedures: Cluster Services Configuration, NetEye Service Configuration, or Single Purpose Nodes
Once done, if your NetEye Cluster setup includes satellites, please make sure to carry out the steps in section Satellite Nodes Only after each Satellite Node’s installation.
Basic Cluster Installation¶
This task consists of the following steps:
Copy the cluster configuration JSON template from /usr/share/neteye/cluster/templates/ClusterSetup.conf.tpl to /etc/neteye-cluster and edit it to match your intended setup. You will be required to fill in the following fields:

Key                Type     Description
ClusterInterface   str      The name of the internal cluster network interface
Hostname           str      Cluster’s FQDN that will resolve to ClusterIp
ClusterIp          str      Floating IP address reserved for the cluster
ClusterCIDR        int      Netmask in CIDR notation (8-32)
Nodes              list     List of Operative nodes (must be at least 2)
VotingOnlyNode     object   (Optional) Definition of the Voting only node
ElasticOnlyNodes   list     (Optional) List of Elastic only nodes

All the nodes specified in Nodes, VotingOnlyNode and ElasticOnlyNodes must have all of the following fields:

Key            Type   Description
addr           str    The internal IP address of the node
hostname       str    Internal FQDN of the node
hostname_ext   str    External FQDN of the node
roles          list   List of roles assigned to the node; the complete list of the roles assignable to a node can be found in /usr/share/neteye/cluster/config_validators/roles.d/
id             int    A unique, progressive number (Note: ElasticOnlyNodes don’t require this field)
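For illustration only, a filled-in /etc/neteye-cluster for a two-node cluster with a Voting-only node might look like the sketch below; the interface name, hostnames, IP addresses and the <role_name> placeholders are hypothetical and must be adapted to your environment (valid role names are listed in /usr/share/neteye/cluster/config_validators/roles.d/):
{
    "ClusterInterface": "eth1",
    "Hostname": "my-neteye.example.com",
    "ClusterIp": "10.0.0.100",
    "ClusterCIDR": 24,
    "Nodes": [
        {
            "addr": "192.168.47.10",
            "hostname": "my-neteye-01.neteyelocal",
            "hostname_ext": "my-neteye-01.example.com",
            "roles": ["<role_name>"],
            "id": 1
        },
        {
            "addr": "192.168.47.11",
            "hostname": "my-neteye-02.neteyelocal",
            "hostname_ext": "my-neteye-02.example.com",
            "roles": ["<role_name>"],
            "id": 2
        }
    ],
    "VotingOnlyNode": {
        "addr": "192.168.47.12",
        "hostname": "my-neteye-03.neteyelocal",
        "hostname_ext": "my-neteye-03.example.com",
        "roles": ["<role_name>"],
        "id": 3
    }
}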
Note
Take into account that the first node defined in the Nodes array of the /etc/neteye-cluster file will act as the NetEye Active Node during the update and upgrade procedures.
After setting up the cluster configuration in /etc/neteye-cluster, run the command neteye config cluster check to verify that the configuration is correct. This command checks that the configuration defined in the /etc/neteye-cluster file is correct and that all the roles have a valid configuration in terms of node distribution.
cluster# neteye config cluster check
Run the cluster setup command neteye cluster install to install a basic Corosync/Pacemaker cluster with a floating Cluster IP. In case of any issue that prevents the script from completing correctly, you can run the same command again adding the option --force to override; note that this will destroy any existing cluster on the nodes.
cluster# neteye cluster install
Note
Any unrecognised option given to the neteye cluster install command will be passed on to the internal Ansible installation command.
At this point, all cluster nodes must be online. As a last step, verify that the Cluster installation completed successfully by running the command:
cluster# pcs status | grep -A4 "Node List"
This command returns something like:
Node List:
  * Online: [ my-neteye-01.example.com my-neteye-02.example.com ]
If the installation also includes a Voting-only Node, check that it is online by running:
cluster# pcs quorum status
The bottom part of the output is similar to the following snippet:
Membership information
----------------------
    Nodeid      Votes    Qdevice Name
         1          1    A,V,NMW my-neteye-01.example.com (local)
         2          1    A,V,NMW my-neteye-02.example.com
         0          1            Qdevice
The last line shows that the Voting-only Node is correctly online.
Cluster Fencing Configuration¶
This section describes the procedures you can use to configure, test, and manage the fence devices in a cluster. Fencing is needed when a node becomes unresponsive but may still be accessing data: the only way to be certain that your data is safe is to fence the node using STONITH. STONITH is an acronym for “Shoot The Other Node In The Head” and it protects your data from being corrupted by rogue nodes or concurrent access. Using STONITH, you can be certain that a node is truly offline before allowing the data to be accessed from another node.
See also
For more complete general information on fencing and its importance in a Red Hat High Availability cluster, see Fencing in a Red Hat High Availability Cluster.
Initial Setup
Fencing is enabled by setting a cluster property. However, it is recommended to keep fencing disabled until it has been configured properly:
pcs property set stonith-enabled=false
pcs stonith cleanup
Install the ipmilan fence agent on each node:
yum install fence-agents-ipmilan
Test that the iDRAC interface is reachable on UDP port 623 from each node:
nmap -sU -p623 10.255.6.106
Note
Fencing on VMware Cluster
If you are installing a virtual cluster, please keep in mind that the fencing device must be different from IPMI. To install the fence agents for a VMware Cluster, run the following command:
dnf install fence-agents-vmware-rest fence-agents-vmware-soap
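A fence device is then created with one of these agents instead of fence_idrac. The following line is only a hypothetical sketch: the vCenter address, credentials and host-to-VM mapping are placeholders, and the exact parameter names should be verified with pcs stonith describe fence_vmware_rest:
pcs stonith create vmware_node1 fence_vmware_rest ip="vcenter.example.com" username="fencing_user" password="fencing_password" ssl_insecure="1" pcmk_delay_base="5" pcmk_host_map="node1.neteyelocal:node1-vm-name"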
IDRAC Configuration
Enable IPMI access to IDRAC: IDRAC Settings > Connectivity > Network > IPMI Settings
Enable IPMI Over LAN: Enable
Channel Privilege Level Limit: Administrator
Encryption Key*: a mandatory random string (00000000 is also accepted)
Create a new user with a username and password of your choice, giving it Read-only privileges on the console but administrative privileges on IPMI (IDRAC Settings > Users > Local Users > Add)
User Role: Read Only
Login to IDRAC: enable
Advanced Settings
LAN Privilege Level: Administrator
To test that the settings were properly applied to the new user, you can check the chassis status from a NetEye machine:
ipmitool -I lanplus -H <IDRAC IP> -U <your_IPMI_username> -P <your_IPMI_password> -y <your_encryption_key> -v chassis status
PCS Configuration
To obtain information about your fence device run:
pcs stonith list
pcs stonith describe fence_idrac
Create a fence device
The following instructions will help you create a fence device.
pcs stonith create <fence_device_name> fence_idrac ipaddr="<ip or fqdn>" pcmk_delay_base="5" lanplus="1" login="IPMI_username" passwd="IPMI_password" method="onoff" pcmk_host_list="<host_to_be_fenced>"
Where:
fence_device_name: device name of your choice (e.g. idrac_node1)
fencing_agent: in this case fence_idrac; the list of available agents can be obtained with pcs stonith list
ipaddr: the iDRAC IP or FQDN
pcmk_delay_base: defaults to 0; it must differ between nodes by 5 seconds or more, depending on how fast the iDRAC can initiate a shutdown
lanplus: must always be set to 1, otherwise the connection will fail
login: the IPMI username (created before)
passwd: the IPMI password (created before)
passwd_script: an alternative to passwd; if available, use it instead of a plain-text password
method: use ‘onoff’ if available, otherwise a clean restart (power off/power on) is not guaranteed
pcmk_host_list: the list of hosts controlled by this fence device
Warning
In a 2-node cluster it may happen that both nodes are unable to communicate and both try to fence each other, causing a reboot of both nodes. To avoid this, set a different pcmk_delay_base parameter for each fence device; this way one of the nodes takes priority over the other.
It is strongly suggested to set this parameter for EVERY cluster, regardless of the number of its nodes.
Note
If possible use a passwd_script instead of passwd, as anybody with access to PCS can see the IPMI password. A password script is a simple bash script that echoes the password, which also helps to avoid escaping problems, e.g.
#!/bin/bash
echo "my_secret_psw"
Make sure that only the root user has read and execute privileges on it (e.g. chmod 500). You must put this script on all nodes, e.g. in /usr/local/bin
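For instance, deploying such a script could look like the following sketch, where the script name and the node address are placeholders:
cluster# chmod 500 /usr/local/bin/fencing_passwd.sh
cluster# scp -p /usr/local/bin/fencing_passwd.sh root@192.168.47.11:/usr/local/bin/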
Example:
pcs stonith create idrac_node1 fence_idrac ipaddr="idrac-neteye06.intra.tndigit.it" lanplus="1" login="neteye_fencing" passwd_script="/usr/local/bin/fencing_passwd.sh" method="onoff" pcmk_host_list="node1.neteyelocal"
If your fence device has been properly configured, running pcs status should show the fencing device in status Stopped; otherwise check /var/log/messages. The command pcs stonith show <fence device> shows the current setup of the device.
Now you have to create a fence device for each node of your cluster (remember to increase the delay).
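For example, the device for the second node might be created as follows; all values are illustrative, and note the increased pcmk_delay_base:
pcs stonith create idrac_node2 fence_idrac ipaddr="idrac-node2.example.com" pcmk_delay_base="10" lanplus="1" login="neteye_fencing" passwd_script="/usr/local/bin/fencing_passwd.sh" method="onoff" pcmk_host_list="node2.neteyelocal"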
Note
If you need to update a fence device’s properties, use the update command, e.g.:
pcs stonith update <fence device> property="value"
Only for ‘onoff’ method
edit the power key setting in /etc/systemd/logind.conf:
HandlePowerKey=ignore
To do it programmatically:
sed -i 's/#HandlePowerKey=poweroff/HandlePowerKey=ignore/g' /etc/systemd/logind.conf
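Keep in mind that the change only takes effect once systemd-logind re-reads its configuration, for example after restarting the service (or at the next reboot):
systemctl restart systemd-logind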
Increase totem token timeout
Increasing the totem token timeout to at least 5 seconds avoids unwanted fencing (the default is 1 second); on clusters with virtual nodes it should be set to 10 seconds. It is not recommended to set the timeout to more than 10 seconds.
pcs cluster config update totem token=10000
To check if the value has been updated:
corosync-cmapctl | grep totem.token
Warning
STONITH acts only after the totem token has expired, therefore it may take as long as 30-40 seconds to fence a node.
Testing
To fence a node you can use the following command:
pcs stonith fence <node1.neteyelocal>
Warning
The host will be shut down. Fencing should be tested on a node in standby.
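For example, a possible test sequence, assuming node1.neteyelocal is the node under test, could be:
cluster# pcs node standby node1.neteyelocal
cluster# pcs stonith fence node1.neteyelocal
Once the node has rebooted and rejoined the cluster, remove it from standby with pcs node unstandby node1.neteyelocal.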
Enable fencing
To enable fencing, set the stonith-enabled property to true:
pcs property set stonith-enabled=true
pcs stonith cleanup
Warning
If fencing fails, the cluster freezes and resources will not be relocated to a different node. Always disable fencing during updates/upgrades. Disable fencing on virtual machines before shutting them down: it may happen that a fence device restarts a VM that has just been shut down. A restart of a physical node may require several minutes, so please be patient.
Cluster Services Configuration¶
PCS-managed Services¶
When installing a feature module, all of its related services must run on the NetEye Cluster. In order to do that, adjust all the necessary options, including IPs, ports, DRBD devices and sizes, in the various *.conf.tpl files found in the directory /usr/share/neteye/cluster/templates/.
In a typical configuration, e.g. the one below, you need to update only selected options:
{
"volume_group": "vg00",
"ip_pre": "192.168.1",
"Services": [
{
"name": "my-service",
"ip_post": "33",
"drbd_minor": 12,
"drbd_port": 7788,
"folder": "/neteye/shared/my-service",
"size": "1024",
"service": "fancy-optional-name"
}
]
}
ip_pre, the prefix of the IP (e.g., 192.168.1 for 192.168.1.0/24), which will be used to generate the virtual IP for the resource
cidr_netmask, the CIDR of the internal subnet used by IP resources (e.g., 24 for 192.168.1.0/24).
size, the amount of storage assigned to the service in MB. You can specify it explicitly, otherwise default values will be applied.
Run the cluster_service_setup.pl script on each *.conf.tpl file, starting from Services-core.conf.tpl:
# cd /usr/share/neteye/scripts/cluster
# ./cluster_service_setup.pl -c Services-core.conf.tpl
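If several template files are present, you can run the script on each of them in turn, e.g. with a loop like the following sketch; it assumes, as in the example above, that template file names can be passed directly to -c and that Services-core.conf.tpl has already been processed:
# for tpl in /usr/share/neteye/cluster/templates/*.conf.tpl; do name=$(basename "$tpl"); [ "$name" = "Services-core.conf.tpl" ] && continue; ./cluster_service_setup.pl -c "$name"; done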
The cluster_service_setup.pl script is designed to report
the last command executed in case there were any errors. If you
manually fix an error, you will need to remove the successfully
configured resource template from Services.conf
and re-run
that command. Then you should re-execute the
cluster_service_setup.pl script in
order to finalize the configuration.
Non PCS-managed Services¶
For Cluster services that are not managed by PCS, the service must be configured manually in the cluster. Similarly to the PCS-managed services, you need to define all the necessary options, including IPs, ports, volume groups and sizes, in a .yaml file that will be used by the service configuration during the installation.
Templates with the default options for all the available service configuration files can be found in the directory /etc/neteye-services.d/<module_name>; copy a template into the same directory to compose the actual configuration file. The configuration file name must be the same as the template file name, with the .tpl extension removed.
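For example, preparing the configuration file for a hypothetical module and service (the module and file names are placeholders) might look like this:
cluster# cd /etc/neteye-services.d/<module_name>
cluster# cp my-service.yaml.tpl my-service.yaml
cluster# vi my-service.yaml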
After all the service configuration files are in place, you can proceed to the next step of the installation procedure.
NetEye Service Configuration¶
Run the neteye install script only once, on any cluster node. This script is designed to handle the configuration of all nodes specified in the cluster configuration file found at /etc/neteye-cluster.
cluster# neteye install
Set up the Director field API user on slave nodes.
Single Purpose Nodes¶
This section applies only if you are going to set up a Single Purpose Node, i.e., an Elastic-only or a NetEye Voting-only node.
Both Elastic-only and Voting-only nodes have the same prerequisites and follow the same installation procedure as a standard NetEye Cluster Node.
After installation, a Single Purpose Node must be configured as Elastic-only or Voting-only: please refer to Section General Elasticsearch Cluster Information for guidelines.