Cluster Nodes¶
NetEye 4’s clustering service is based on the Red Hat Enterprise Linux 8 High Availability Clustering technologies:
Corosync: Provides group communication between a set of nodes, application restart upon failure, and a quorum system.
Pacemaker: Provides cluster management, lock management, and fencing.
DRBD: Provides data redundancy by mirroring devices (hard drives, partitions, logical volumes, etc.) between hosts in real time.
Cluster resources are typically quartets consisting of an internal floating IP, a DRBD device, a filesystem, and a (systemd) service.
Once you have installed clustering services according to the information on this page, please turn to the Cluster Architecture page for more information on configuration and how to update.
See also
For more information about the Red Hat Cluster, check the official Red Hat documentation on High Availability Clusters.
Prerequisites¶
A NetEye 4 cluster must consist of between 2 and 16 identical servers (“Nodes”) running RHEL 8; each node must satisfy the following requirements:
Networking:
Bonding across NICs must be configured
A dedicated cluster network interface, named exactly the same on each node
One external static IP address which will serve as the external Cluster IP
One IP Address for each cluster node (i.e., N addresses)
One virtual (internal) subnet for internal floating service IPs (this subnet MUST NOT be reachable from any machine except cluster nodes, as it poses a security risk otherwise)
All nodes must know the internal IPs (Virtual IP) of all other nodes, which must be stored in the file /etc/hosts (see the example after this list)
All nodes must be reachable over the internal network
The Corporate Network’s NIC must be in firewall zone public, while the Heartbeat Network’s NIC must be in firewall zone trusted
Storage:
At least one volume group with enough free storage to host all service DRBD devices defined in Services.conf
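For instance, the /etc/hosts entries for the internal (heartbeat) network of a three-node cluster might look like the following sketch, where all hostnames and addresses are purely illustrative:
192.168.47.10   my-neteye-01.neteyelocal
192.168.47.11   my-neteye-02.neteyelocal
192.168.47.12   my-neteye-03.neteyelocal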
In general, each node in a NetEye Cluster…
must have SSH keys generated for the root user
must store the SSH keys of all nodes in the file /root/.ssh/authorized_keys
needs Internet connectivity, including the ability to reach repositories of Würth Phoenix and Red Hat
must have the dnf group neteye installed
must have its tags set with the command neteye node tags set; for more details about this command, please refer to neteye node tags set
must be subscribed with a valid Red Hat Enterprise Linux license, which can be done with the command neteye node register; for more details about this command, please refer to neteye node register
must have the latest operating system and NetEye 4 updates installed
if it is a virtual Cluster Node, its RAM must be completely reserved
must satisfy the same hostname character requirements as Single and Satellite Nodes, which can be checked in the installation procedure
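As a quick sanity check before proceeding, you can verify on each node that the neteye dnf group is installed and that the Red Hat subscription is active; a minimal sketch:
# dnf group list --installed | grep -i neteye
# subscription-manager status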
See also
Section Cluster Requirements and Best Practices contains more detailed requirements for NetEye cluster installation.
Installation Procedure¶
The first step of a NetEye Cluster installation is to install the NetEye ISO image, after which you need to follow, for each Node, the installation’s Part 1: Single Nodes and Satellite Nodes. Then, make sure to copy the SSH key of each node into the /root/.ssh/authorized_keys file of all the other Nodes. To accomplish this, you can run the following command on each node:
cluster# ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.27.0.3
Repeat the command for each Node, replacing 172.27.0.3 with the IP address of each of the other Nodes.
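If you prefer, the key distribution can be scripted; the following sketch assumes that root’s SSH key pair has already been generated and that 172.27.0.2, 172.27.0.3 and 172.27.0.4 are the internal addresses of the other Nodes:
cluster# for ip in 172.27.0.2 172.27.0.3 172.27.0.4; do ssh-copy-id -i /root/.ssh/id_rsa.pub root@$ip; done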
Once done, depending on the type of nodes you are installing in your cluster, proceed with the appropriate one of the following procedures: Cluster Services Configuration, NetEye Service Configuration, or Single Purpose Nodes
Once done, if your NetEye Cluster setup includes satellites, please make sure to carry out the steps in section Satellite Nodes Only after each Satellite Node’s installation.
Basic Cluster Installation¶
This task consists of the following steps:
Copy the cluster configuration JSON template from /usr/share/neteye/cluster/templates/ClusterSetup.conf.tpl to /etc/neteye-cluster and edit it to match your intended setup. You will be required to fill in the following fields:

Key                Type     Description
ClusterInterface   str      The name of the internal cluster network interface
Hostname           str      Cluster’s FQDN that will resolve to ClusterIp
ClusterIp          str      Floating IP address reserved for the cluster
ClusterCIDR        int      Netmask in CIDR notation (8-32)
Nodes              list     List of Operative nodes (must be at least 2)
VotingOnlyNode     object   (Optional) Definition of the Voting only node
ElasticOnlyNodes   list     (Optional) List of Elastic only nodes

All the nodes specified in Nodes, VotingOnlyNode and ElasticOnlyNodes must have all of the following fields:

Key            Type   Description
addr           str    The internal IP address of the node
hostname       str    Internal FQDN of the node
hostname_ext   str    External FQDN of the node
roles          list   List of roles assigned to the node; the complete list of the roles assignable to a node can be found in /usr/share/neteye/cluster/config_validators/roles.d/
id             int    A unique, progressive number (Note: ElasticOnlyNodes don’t require this field)
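For illustration only, a filled-in /etc/neteye-cluster for a two-node cluster with a Voting-only node might look like the sketch below; the interface name, hostnames, IP addresses and the <role_name> placeholders are hypothetical and must be adapted to your environment (valid role names are listed in /usr/share/neteye/cluster/config_validators/roles.d/):
{
    "ClusterInterface": "eth1",
    "Hostname": "my-neteye.example.com",
    "ClusterIp": "10.0.0.100",
    "ClusterCIDR": 24,
    "Nodes": [
        {
            "addr": "192.168.47.10",
            "hostname": "my-neteye-01.neteyelocal",
            "hostname_ext": "my-neteye-01.example.com",
            "roles": ["<role_name>"],
            "id": 1
        },
        {
            "addr": "192.168.47.11",
            "hostname": "my-neteye-02.neteyelocal",
            "hostname_ext": "my-neteye-02.example.com",
            "roles": ["<role_name>"],
            "id": 2
        }
    ],
    "VotingOnlyNode": {
        "addr": "192.168.47.12",
        "hostname": "my-neteye-03.neteyelocal",
        "hostname_ext": "my-neteye-03.example.com",
        "roles": ["<role_name>"],
        "id": 3
    }
}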
Note
Take into account that the first node defined in the Nodes array of the /etc/neteye-cluster file will act as the NetEye Active Node during the update and upgrade procedures.
After setting up the cluster configuration in /etc/neteye-cluster, run the command neteye config cluster check to verify that the configuration is correct. This command checks that the configuration defined in the /etc/neteye-cluster file is correct and that all the roles have a valid configuration in terms of node distribution.
cluster# neteye config cluster check
Run the cluster setup command neteye cluster install to install a basic Corosync/Pacemaker cluster with a floating Cluster IP. In case of any issue that prevents the script from completing correctly, you can run the same command again adding the option --force to override; note that this will destroy any existing cluster on the nodes.
cluster# neteye cluster install
Note
Any unrecognised option given to the neteye cluster install command will be passed on to the internal Ansible installation command.
At this point, all cluster nodes must be online. As a last step, verify that the Cluster installation completed successfully by running the command:
cluster# pcs status | grep -A4 "Node List"
This command returns something like:
Node List:
  * Online: [ my-neteye-01.example.com my-neteye-02.example.com ]
If the installation also includes a Voting-only Node, check that it is online by running:
cluster# pcs quorum status
The bottom part of the output is similar to the following snippet:
Membership information
----------------------
    Nodeid      Votes    Qdevice Name
         1          1    A,V,NMW my-neteye-01.example.com (local)
         2          1    A,V,NMW my-neteye-02.example.com
         0          1            Qdevice
The last line shows that the Voting-only Node is correctly online.
Cluster Fencing Configuration¶
This section describes the procedures you can use to configure, test, and manage the fence devices in a cluster. Fencing is needed when a node becomes unresponsive but may still be accessing data: the only way to be certain that your data is safe is to fence the node using STONITH. STONITH is an acronym for “Shoot The Other Node In The Head” and it protects your data from being corrupted by rogue nodes or concurrent access. Using STONITH, you can be certain that a node is truly offline before allowing the data to be accessed from another node.
See also
For more complete general information on fencing and its importance in a Red Hat High Availability cluster, see Fencing in a Red Hat High Availability Cluster.
Initial Setup
Fencing is enabled by setting a cluster property. However, it is recommended to keep fencing disabled until it has been configured properly:
pcs property set stonith-enabled=false
pcs stonith cleanup
Install the ipmilan fence agent on each node:
yum install fence-agents-ipmilan
Test that the iDRAC interface is reachable on UDP port 623 from each node:
nmap -sU -p623 10.255.6.106
Note
Fencing on VMware Cluster
If you are installing a virtual cluster, please keep in mind that the fencing device must be different from IPMI. To install the fence agents for a VMware Cluster, run the following command:
dnf install fence-agents-vmware-rest fence-agents-vmware-soap
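A fence device is then created with one of these agents instead of fence_idrac. The following line is only a hypothetical sketch: the vCenter address, credentials and host-to-VM mapping are placeholders, and the exact parameter names should be verified with pcs stonith describe fence_vmware_rest:
pcs stonith create vmware_node1 fence_vmware_rest ip="vcenter.example.com" username="fencing_user" password="fencing_password" ssl_insecure="1" pcmk_delay_base="5" pcmk_host_map="node1.neteyelocal:node1-vm-name"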
IDRAC Configuration
Enable IPMI access to IDRAC: IDRAC Settings > Connectivity > Network > IPMI Settings
Enable IPMI Over LAN: Enable
Channel Privilege Level Limit: Administrator
Encryption Key*: a mandatory random string (00000000 is also accepted)
Create a new user with a username and password of your choice, giving it Read-only privileges on the console but administrative privileges on IPMI (IDRAC Settings > Users > Local Users > Add)
User Role: Read Only
Login to IDRAC: enable
Advanced Settings
LAN Privilege Level: Administrator
To test that the settings were properly applied to the new user, you can check the chassis status from a NetEye machine:
ipmitool -I lanplus -H <IDRAC IP> -U <your_IPMI_username> -P <your_IPMI_password> -y <your_encryption_key> -v chassis status
PCS Configuration
To obtain information about your fence device run:
pcs stonith list
pcs stonith describe fence_idrac
Create a fence device
The following instructions will help you create a fence device.
pcs stonith create <fence_device_name> fence_idrac ipaddr="<ip or fqdn>" pcmk_delay_base="5" lanplus="1" login="IPMI_username" passwd="IPMI_password" method="onoff" pcmk_host_list="<host_to_be_fenced>"
Where:
fence_device_name: device name of your choice (e.g. idrac_node1)
fencing_agent: in this case fence_idrac; the list of available agents can be obtained with pcs stonith list
ipaddr: the iDRAC IP or FQDN
pcmk_delay_base: defaults to 0; it must differ between nodes by 5 seconds or more, depending on how fast the iDRAC can initiate a shutdown
lanplus: must always be set to 1, otherwise the connection will fail
login: the IPMI username (created before)
passwd: the IPMI password (created before)
passwd_script: an alternative to passwd; if available, use it instead of a plain-text password
method: use ‘onoff’ if available, otherwise a clean restart (power off/power on) is not guaranteed
pcmk_host_list: the list of hosts controlled by this fence device
Warning
In a 2-node cluster it may happen that both nodes are unable to communicate and both try to fence each other, causing a reboot of both nodes. To avoid this, set a different pcmk_delay_base parameter for each fence device; this way one of the nodes takes priority over the other.
It is strongly suggested to set this parameter for EVERY cluster, regardless of the number of its nodes.
Note
If possible use a passwd_script instead of passwd, as anybody with access to PCS can see the IPMI password. A password script is a simple bash script that echoes the password, which also helps to avoid escaping problems, e.g.
#!/bin/bash
echo "my_secret_psw"
Make sure that only the root user has read and execute privileges on it (e.g. chmod 500). You must put this script on all nodes, e.g. in /usr/local/bin
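For instance, deploying such a script could look like the following sketch, where the script name and the node address are placeholders:
cluster# chmod 500 /usr/local/bin/fencing_passwd.sh
cluster# scp -p /usr/local/bin/fencing_passwd.sh root@192.168.47.11:/usr/local/bin/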
Example:
pcs stonith create idrac_node1 fence_idrac ipaddr="idrac-neteye06.intra.tndigit.it" lanplus="1" login="neteye_fencing" passwd_script="/usr/local/bin/fencing_passwd.sh" method="onoff" pcmk_host_list="node1.neteyelocal"
If your fence device has been properly configured, running pcs status should show the fencing device in status Stopped; otherwise check /var/log/messages. The command pcs stonith show <fence device> shows the current setup of the device.
Now you have to create a fence device for each node of your cluster (remember to increase the delay).
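For example, the device for the second node might be created as follows; all values are illustrative, and note the increased pcmk_delay_base:
pcs stonith create idrac_node2 fence_idrac ipaddr="idrac-node2.example.com" pcmk_delay_base="10" lanplus="1" login="neteye_fencing" passwd_script="/usr/local/bin/fencing_passwd.sh" method="onoff" pcmk_host_list="node2.neteyelocal"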
Note
If you need to update a fence device’s properties, use the update command, e.g.:
pcs stonith update <fence device> property="value"
Only for ‘onoff’ method
edit the power key setting in /etc/systemd/logind.conf:
HandlePowerKey=ignore
To do it programmatically:
sed -i 's/#HandlePowerKey=poweroff/HandlePowerKey=ignore/g' /etc/systemd/logind.conf
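Keep in mind that the change only takes effect once systemd-logind re-reads its configuration, for example after restarting the service (or at the next reboot):
systemctl restart systemd-logind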
Increase totem token timeout
Increasing the totem token timeout to at least 5 seconds avoids unwanted fencing (the default is 1 second); on clusters with virtual nodes it should be set to 10 seconds. It is not recommended to set the timeout to more than 10 seconds.
pcs cluster config update totem token=10000
To check if the value has been updated:
corosync-cmapctl | grep totem.token
Warning
STONITH acts only after the totem token has expired, therefore it may take as long as 30-40 seconds to fence a node.
Testing
To fence a node you can use the following command:
pcs stonith fence <node1.neteyelocal>
Warning
The host will be shut down. Fencing should be tested on a node in standby.
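For example, a possible test sequence, assuming node1.neteyelocal is the node under test, could be:
cluster# pcs node standby node1.neteyelocal
cluster# pcs stonith fence node1.neteyelocal
Once the node has rebooted and rejoined the cluster, remove it from standby with pcs node unstandby node1.neteyelocal.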
Enable fencing
To enable fencing, set the stonith-enabled property to true:
pcs property set stonith-enabled=true
pcs stonith cleanup
Warning
If fencing fails, the cluster freezes and resources will not be relocated to a different node. Always disable fencing during updates/upgrades. Disable fencing on virtual machines before shutting them down: it may happen that a fence device restarts a VM that has just been shut down. A restart of a physical node may require several minutes, so please be patient.
Cluster Services Configuration¶
PCS-managed Services¶
When installing a feature module, all of its related services must run on the NetEye Cluster. In order to do that, adjust all the necessary options, including IPs, ports, DRBD devices and sizes, in the various *.conf.tpl files found in the directory /usr/share/neteye/cluster/templates/.
In a typical configuration, e.g. the one below, you need to update only selected options:
{
"volume_group": "vg00",
"ip_pre": "192.168.1",
"Services": [
{
"name": "my-service",
"ip_post": "33",
"drbd_minor": 12,
"drbd_port": 7788,
"folder": "/neteye/shared/my-service",
"size": "1024",
"service": "fancy-optional-name"
}
]
}
ip_pre, the prefix of the IP (e.g., 192.168.1 for 192.168.1.0/24), which will be used to generate the virtual IP for the resource
cidr_netmask, the CIDR of the internal subnet used by IP resources (e.g., 24 for 192.168.1.0/24).
size, the amount of storage assigned to the service in MB. You can specify it explicitly, otherwise default values will be applied.
Run the cluster_service_setup.pl script on each *.conf.tpl file, starting from Services-core.conf.tpl:
# cd /usr/share/neteye/scripts/cluster
# ./cluster_service_setup.pl -c Services-core.conf.tpl
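If several template files are present, you can run the script on each of them in turn, e.g. with a loop like the following sketch; it assumes, as in the example above, that template file names can be passed directly to -c and that Services-core.conf.tpl has already been processed:
# for tpl in /usr/share/neteye/cluster/templates/*.conf.tpl; do name=$(basename "$tpl"); [ "$name" = "Services-core.conf.tpl" ] && continue; ./cluster_service_setup.pl -c "$name"; done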
The cluster_service_setup.pl script is designed to report
the last command executed in case there were any errors. If you
manually fix an error, you will need to remove the successfully
configured resource template from Services.conf
and re-run
that command. Then you should re-execute the
cluster_service_setup.pl script in
order to finalize the configuration.
Non PCS-managed Services¶
For Cluster services that are not managed by PCS, the service must be configured manually in the cluster. Similarly to the PCS-managed services, you need to define all the necessary options, including IPs, ports, volume groups and sizes, in a .yaml file that will be used by the service configuration during the installation.
Templates with the default options for all the available service configuration files can be found in the directory /etc/neteye-services.d/<module_name>; copy a template into the same directory to compose the actual configuration file. The configuration file name must be the same as the template file name, with the .tpl extension removed.
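For example, preparing the configuration file for a hypothetical module and service (the module and file names are placeholders) might look like this:
cluster# cd /etc/neteye-services.d/<module_name>
cluster# cp my-service.yaml.tpl my-service.yaml
cluster# vi my-service.yaml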
After all the service configuration files are in place, you can proceed to the next step of the installation procedure.
NetEye Service Configuration¶
Run the neteye install script only once, on any cluster node. This script is designed to handle the configuration of all nodes specified in the cluster configuration file found at /etc/neteye-cluster.
cluster# neteye install
Set up the Director field API user on slave nodes.
Single Purpose Nodes¶
This section applies only if you are going to set up a Single Purpose Node, i.e., an Elastic-only or a NetEye Voting-only node.
Both Elastic-only and Voting-only nodes have the same prerequisites and follow the same installation procedure as a standard NetEye Cluster Node.
After installation, a Single Purpose Node must be configured as Elastic-only or Voting-only: please refer to Section General Elasticsearch Cluster Information for guidelines.