Cluster¶
Service Resource Management¶
To manage service resources, several scripts have been developed by the NetEye team and are provided with every NetEye installation. These scripts are wrappers of the PCS and DRBD APIs and their use is showcased in section Adding a Service Resource to a Cluster. Examples of commands that are useful for NetEye Cluster troubleshooting are introduced in section Cluster Management Commands.
Cluster Nodes Roles¶
In a NetEye cluster environment, some of the distributed NetEye services that run on multiple nodes can be configured to run only on specific nodes of the cluster. This functionality is useful to balance the load across the cluster nodes and to assign specific services to specific nodes depending on the needs of the customer.
To assign a specific role to a node or to modify the roles configuration of the cluster, it is necessary to edit the role assignment in the cluster configuration file /etc/neteye-cluster, adding or modifying the “roles” section of the desired node:
{
   "Hostname" : "my-neteye-cluster.example.com",
   "Nodes" : [
      {
         "addr" : "192.168.1.1",
         "hostname" : "my-neteye-01",
         "hostname_ext" : "my-neteye-01.example.com",
         "roles": [
            "mariadb"
         ],
         "id" : 1
      }
   ]
}
The roles that can be assigned to a node can be found in /usr/share/neteye/cluster/config_validators/roles.d/.
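To quickly see which roles are available on your installation, you can list that directory; the exact set of role files depends on the NetEye version and the installed modules:
cluster# ls /usr/share/neteye/cluster/config_validators/roles.d/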
After modifying the configuration file, it is necessary to sync the cluster configuration to all the nodes in the cluster. This can be done by executing the following command:
cluster# neteye config cluster sync
Finally, to apply the changes to the cluster services configuration, it is necessary to execute the install procedure focused on the service you want to apply the changes to:
cluster# neteye install --restrict-to-services <service_name>
Where <service_name> is the name of the service, or a comma-separated list of services, usually corresponding to the roles whose assignment has been modified.
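For example, after assigning the mariadb role as in the snippet above, the command might look like the following (use the service name matching the role you actually changed):
cluster# neteye install --restrict-to-services mariadb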
Please refer to neteye install for more information about the neteye install command.
Cluster Services Configuration¶
When dealing with services in a NetEye cluster, it is possible to define and configure NetEye-specific parameters related to the architecture, such as the IP address of the service or the volume group to use.
These parameters can be configured in the dedicated directory /etc/neteye-services.d/<module_name>, which contains a set of YAML files, each one named after the service it refers to.
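As a purely illustrative sketch (the actual parameter names and accepted values are defined by each service and documented in the shipped YAML files, so check those before editing), overriding the service IP and volume group could look like this:
# hypothetical example for a file in /etc/neteye-services.d/<module_name>/
# the real keys may differ; refer to the YAML file shipped with your NetEye installation
ip: 192.168.1.48
volume_group: vg00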
Once the configuration of a specific service has been performed, it is necessary to apply the changes to the cluster services configuration. This can be done by executing the install procedure focused on the service you want to apply the changes to:
cluster# neteye install --restrict-to-services <service_name>
Where <service_name> is the name of the service you want to apply the changes to.
Please refer to neteye install for more information about the neteye install command.
Adding a Service Resource to a Cluster¶
Service resources can be added by copying an existing template, located under the /usr/share/neteye/cluster/templates/ directory, to a suitable location, modifying it, and finally passing it to the setup script.
For example, consider the Services-core-nats-server.conf.tpl template.
{
   "volume_group": "vg00",
   "ip_pre" : "192.168.1",
   "Services": [
      {
         "name": "nats-server",
         "ip_post": "48",
         "drbd_minor": 23,
         "drbd_port": 7810,
         "folder": "/neteye/shared/nats-server/",
         "collocation_resource": "cluster_ip",
         "size": "1024"
      }
   ]
}
Copy it, then edit it.
cluster# cd /usr/share/neteye/cluster/templates/
cluster# cp Services-core-nats-server.conf.tpl /tmp/Services-core-nats-server.conf
cluster# vi /tmp/Services-core-nats-server.conf
Hint
You can copy the edited file to any other location, to be used for reference or in case you need to change settings at any point in the future.
In the file, make sure to change the following values to match your infrastructure network:
ip_pre: the corporate network address, i.e., the first three octets of the IP address.
ip_post: the last octet of the IP address.
For example, with ip_pre set to 192.168.1 and ip_post set to 48 as in the template above, the resulting address is 192.168.1.48.
Once done, make sure that the JSON file you saved is valid syntactically, for example by using the jq utility:
cluster# jq . /tmp/Services-core-nats-server.conf
A valid file will be printed back in full, while if there is a syntactic mistake in the file, an explanatory message will provide a hint to fix the problem. Some possible messages are shown next.
parse error: Expected separator between values at line 7, column 21
parse error: Objects must consist of key:value pairs at line 12, column 10
Note
Even if multiple errors are present in the file, only one error message is shown at a time, so keep fixing errors and re-running jq until you see the whole content of the file instead of an error message: this proves the file contains valid JSON.
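If you prefer a non-interactive check, for example from a script, you can rely on jq returning a non-zero exit status when it fails to parse its input, as in this minimal sketch:
cluster# jq . /tmp/Services-core-nats-server.conf > /dev/null && echo "JSON OK" || echo "JSON invalid"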
Finally, let the cluster pick up the new configuration:
cluster# cd /usr/share/neteye/scripts/cluster
cluster# ./cluster_service_setup.pl -c /tmp/Services-core-nats-server.conf
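Once the script has completed, it can be useful to verify that the new resource and its DRBD device are healthy; for the nats-server example, checks along these lines can be used (resource and device names may differ on your system):
cluster# pcs status resources | grep nats
cluster# drbdadm status nats-server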
Cluster Management Commands¶
The most important commands for checking the status of a (NetEye) Cluster and troubleshooting problems are:
drbdmon, a small utility for monitoring the DRBD devices and connections in real time
drbdadm, DRBD’s primary administration tool
pcs, used to manage a cluster, verify its resources, constraints, fencing devices and much more
Hint
You can find more information about all their functionalities and sub-commands in their respective manual pages: drbdmon, drbdadm, and pcs.
In the remainder, we show some typical uses of these commands, starting from the simplest one.
cluster# drbdmon
As its name implies, this command monitors what is happening in DRBD and shows in real time a lot of information about the DRBD status. Within the interface, any resource highlighted in red is in a degraded status and therefore requires some inspection and fixing. Press p to show only problematic resources.
The next command is the Swiss army knife of DRBD and is used to carry out all configuration, tuning, and management of a DRBD infrastructure. The most important option of the drbdadm command is -d (long option: --dry-run): the command behaves exactly as it would without the option, but makes no changes to the system. This option should always be used before making any change to the configuration, to check for possible problems and unexpected side effects.
The command itself has a lot of options and subcommands, extensively described in the above-mentioned man page. Within a NetEye Cluster, the most used subcommand is perhaps
cluster# drbdadm --dry-run adjust all
This command checks the content of the configuration files and adjusts the running DRBD resources so that they match it. As given, the command only shows what would happen; remove the --dry-run option to actually run it and make the changes.
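For example, once the dry run reports only the expected actions, the same command without the option applies them:
cluster# drbdadm adjust all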
The third command is the main tool to manage the corosync/pacemaker stack of a cluster: pcs. Like drbdadm, it has a number of subcommands and options.
cluster# pcs status
This command prints the current status of the NetEye Cluster, its nodes, and its resources, and allows you to check whether there are any ongoing issues.
In the output, right above the Full list of resources, all the nodes (if any) are shown, along with their state; Online, Offline, and Standby are the most common.
The presence of Offline nodes, that is, nodes disconnected from the cluster or even shut down, is usually a sign of an ongoing problem and requires a quick reaction. Indeed, the only legitimate situation in which a node can be Offline is after a planned reboot (e.g., for a kernel update or a hardware upgrade).
On the other hand, nodes should be in the Standby state only during updates: if this is not the case, it is worth checking that node for problems.
If in the list of resources there is any resource marked as Stopped, some log entries for each stopped service appear below the list, right above the Daemon status. While these logs should suffice to give a hint about the reason why the resource is stopped, it is possible to check the full status and log files using the commands systemctl status <resource name> and journalctl -u <resource name>.
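For instance, if the nats-server resource from the earlier example were the one marked as Stopped, the checks could look like the following (substitute the name of the resource that is actually stopped on your cluster):
cluster# systemctl status nats-server
cluster# journalctl -u nats-server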
Additional sub commands of pcs are:
cluster# pcs property list
This command returns some information about the cluster; its output is similar to the following snippet:
Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: NetEye
 dc-version: 1.1.23-1.el7_9.1-9acf116022
 have-watchdog: false
 last-lrm-refresh: 1648467995
 stonith-enabled: false
Node Attributes:
 neteye02.neteyelocal: standby=on
The important points here are:
stonith-enabled: false. This should always be true; a value of false, like in the example, means that cluster fencing has been disabled. This should happen only during maintenance windows, otherwise an immediate inspection is required, because it may result in a split-brain situation. It is important to remark that fencing must always be configured on a cluster before starting any resource.
neteye02.neteyelocal: standby=on. The node is in Standby status, meaning it cannot host any running services or resources, but will still vote in the quorum.
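If a node turns out to be in Standby outside of a planned update and is otherwise healthy, it can be brought back online with pcs; depending on the pcs version installed, the subcommand is one of the following (shown for the node in the example above):
cluster# pcs node unstandby neteye02.neteyelocal
cluster# pcs cluster unstandby neteye02.neteyelocal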
See also
Fencing is described in great detail in NetEye’s blog post Configuring Fencing on Dell Servers.
cluster# pcs constraint
Returns a list of all active constraints on the cluster.
cluster# pcs resource show [cluster_ip]
This command shows all the configured resources; if the parameter cluster_ip is added, it shows only the Cluster IP address resource.
See also
For more information, troubleshooting options, and debugging commands, you can refer to RedHat’s Reference Documentation for Pacemaker and high-availability, in particular Chapters 3. The pcs CLI, 9.7 Displaying fencing devices, and 10.3. Displaying Configured Resources.