Restarting Stopped Services
This section helps you resolve problems with starting the services that run on NetEye.
Single Node Installations
If you run the neteye status command and the output marks a service as DOWN when it should in fact be UP and running, you can try to restart it. For example, take this excerpt from the output of neteye status.
UP [0] tornado.service
DOWN [3] tornado_email_collector.service
DOWN [0] tornado_icinga2_collector.service
DOWN [3] tornado_nats_json_collector.service
DOWN [3] tornado_webhook_collector.service
As an example, suppose you have just installed Tornado. In this case, the output shows Tornado as running (green UP) but the collectors as stopped (red DOWN), so you will need to start all the collectors you need. You can do so with the following commands, one per collector:
# systemctl start tornado_email_collector.service
# systemctl start tornado_icinga2_collector.service
# systemctl start tornado_nats_json_collector.service
# systemctl start tornado_webhook_collector.service
After a short wait, whose length depends on how many services are starting, run the status command again:
# neteye status
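As a rough sketch of the steps above, the following shell function picks out the units marked DOWN from output shaped like the excerpt (status, bracketed code, unit name), so that the list can be reviewed and then passed to systemctl start. The function name list_down_services is hypothetical, and the three-column layout is an assumption based only on the excerpt shown here; a different layout or embedded color codes would require adjusting the filter.

```shell
# Hypothetical helper: print the unit names that are marked DOWN.
# Assumes each input line looks like "DOWN [3] name.service",
# as in the excerpt above.
list_down_services() {
    awk '$1 == "DOWN" && $3 ~ /\.service$/ { print $3 }'
}

# Usage sketch (as root, on a single node):
#   neteye status | list_down_services            # review the list first
#   neteye status | list_down_services | xargs -r systemctl start
```

Starting every DOWN unit is only appropriate when you actually need all of them; otherwise start the individual services by hand as shown above.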
Alternatively, you can check the status (and restart if necessary) of one service only, for example if you only need the email collector:
# systemctl status tornado_email_collector.service
The output should include the line Active: active (running):
● tornado_email_collector.service - Tornado Email Collector - Data Collector for procmail
Loaded: loaded (/usr/lib/systemd/system/tornado_email_collector.service; disabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/tornado_email_collector.service.d
└─neteye.conf
Active: active (running) since Mon 2021-07-26 15:59:49 CEST; 15s ago
Main PID: 24389 (tornado_email_c)
CGroup: /docker/fe7ce88742f2ec8405c470064b414e8c73b384c2f5292605109ddfa7cf6b5373/system.slice/tornado_email_collector.service
Jul 26 15:59:49 fe7ce88742f2 systemd[1]: Started Tornado Email Collector - Data Collector for procmail.
Jul 26 15:59:49 fe7ce88742f2 tornado_email_collector[24389]: Jul 26 15:59:49.313 INFO tornado_email_collector: Email collector started
Jul 26 15:59:49 fe7ce88742f2 tornado_email_collector[24389]: Jul 26 15:59:49.313 INFO tornado_email_collector: Connect to Tornado through NATS
If the Tornado email collector still shows up as Active: inactive (dead), check the log entries at the end of the output of the above command. If they do not help, check the full log with the journalctl -u tornado_email_collector.service command.
Hint
If the service is stopped, the -n 50 option (show the last 50 lines) can prove useful, while to follow the log in real time, add the -f option to the command.
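The options mentioned in the hint can be combined as in the sketch below. The wrapper name show_unit_log and its "follow" argument are hypothetical; -u, -n, and -f are standard journalctl options.

```shell
# Hypothetical wrapper around journalctl for a single systemd unit.
# $1 = unit name; $2 = "follow" to keep streaming new entries (optional).
show_unit_log() {
    unit="$1"
    if [ "${2:-}" = "follow" ]; then
        # Print the last 50 lines, then follow the log in real time
        journalctl -u "$unit" -n 50 -f
    else
        # Print only the last 50 lines
        journalctl -u "$unit" -n 50
    fi
}

# Usage sketch:
#   show_unit_log tornado_email_collector.service
#   show_unit_log tornado_email_collector.service follow
```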
Cluster Installations
On a NetEye Cluster, resources (i.e., services running on a cluster) should restart automatically when something goes wrong and they are terminated. If this does not happen, you need to check the cluster to find where the resource was last started, and then try to restart it manually or inspect its configuration.
On a cluster, the neteye status, neteye start, and neteye stop commands are not available, so you need to rely on pcs, which you can run on any node of the cluster:
# pcs status
Cluster name: NetEye
Stack: corosync
Current DC: neteye01.local (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Wed Jul 28 09:47:52 2021
Last change: Tue Jul 27 15:04:36 2021 by root via cibadmin on neteye02.local
2 nodes configured
74 resource instances configured
Online: [ neteye01.local neteye02.local ]
Full list of resources:
cluster_ip (ocf::heartbeat:IPaddr2): Started neteye02.local
Resource Group: tornado_rsyslog_collector_group
tornado_rsyslog_collector_drbd_fs (ocf::heartbeat:Filesystem): Started neteye02.local
Resource Group: tornado_group
tornado_drbd_fs (ocf::heartbeat:Filesystem): Started neteye02.local
tornado_virt_ip (ocf::heartbeat:IPaddr2): Started neteye02.local
tornado (systemd:tornado): Started neteye02.local
Resource Group: tornado_email_collector_group
tornado_email_collector_drbd_fs (ocf::heartbeat:Filesystem): Started neteye02.local
tornado_email_collector (systemd:tornado_email_collector): FAILED neteye02.local
Resource Group: tornado_icinga2_collector_group
tornado_icinga2_collector_drbd_fs (ocf::heartbeat:Filesystem): Started neteye01.local
tornado_icinga2_collector (systemd:tornado_icinga2_collector): Started neteye01.local
Resource Group: tornado_webhook_collector_group
tornado_webhook_collector_drbd_fs (ocf::heartbeat:Filesystem): Started neteye02.local
tornado_webhook_collector (systemd:tornado_webhook_collector): Started neteye02.local
Failed Resource Actions:
* tornado_email_collector_monitor_30000 on neteye02.local 'not running' (7): call=414, status=complete, exitreason='',
last-rc-change='Wed Jul 28 09:57:21 2021', queued=0ms, exec=0ms
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
This snippet shows only Tornado-related resources, but the actual output of the pcs status command can be much longer. At the bottom we see that tornado_email_collector is in a failed state (this is also shown by the tornado_email_collector (systemd:tornado_email_collector): FAILED neteye02.local line).
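Because the full output can be long, it may help to filter it. The sketch below prints each resource reported as FAILED together with its node, assuming resource lines shaped like those in the listing above (name, type in parentheses followed by a colon, state, node). The function name list_failed_resources is hypothetical.

```shell
# Hypothetical filter for "pcs status" output read from stdin.
# Assumes resource lines of the form:
#   <name> (<type>): <state> <node>
list_failed_resources() {
    awk '$2 ~ /^\(.*\):$/ && $3 == "FAILED" { print $1, $4 }'
}

# Usage sketch (on any cluster node):
#   pcs status | list_failed_resources
```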
Usually a failed resource on a cluster is restarted automatically, so you can simply run pcs status again to check whether it has started. If it has not, start the resource manually by issuing these two commands:
# pcs resource cleanup tornado_email_collector
# pcs resource enable tornado_email_collector
Here, cleanup resets the status of the resource, while enable allows the cluster to start the resource.
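The two commands can be wrapped in a small function that also re-checks the state afterwards. This is only a sketch: the name recover_resource and the 10-second default pause are assumptions made for illustration, not part of NetEye or pcs.

```shell
# Hypothetical helper: reset a failed resource, allow it to start,
# then show the pcs status lines that mention it.
# $1 = resource name; $2 = seconds to wait before re-checking (assumed default: 10).
recover_resource() {
    resource="$1"
    wait_seconds="${2:-10}"
    pcs resource cleanup "$resource" || return 1   # reset the failure state
    pcs resource enable "$resource" || return 1    # let the cluster start it
    sleep "$wait_seconds"
    # Print every status line containing the resource name
    pcs status | grep -F "$resource"
}

# Usage sketch:
#   recover_resource tornado_email_collector
```

The final grep also matches related lines, such as the resource group and its file system resource, which is usually helpful context when checking the result.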
If you only need or want to restart a running resource, use:
# pcs resource restart tornado_email_collector
If the resource still does not start and remains in a failed state, you will need to inspect the logs. Most logs can be checked with the same journalctl command shown in the previous section, provided you are on the node where the resource failed. The Failed Resource Actions part of the pcs status output tells you which node that is:
1 | Failed Resource Actions:
2 | * tornado_email_collector_monitor_30000 on neteye02.local 'not running' (7): call=414, status=complete, exitreason='',
3 |     last-rc-change='Wed Jul 28 09:57:21 2021', queued=0ms, exec=0ms
In our case, line 2 shows that tornado_email_collector was launched on neteye02.local, so make sure to log in to that node before checking the log.
# ssh neteye02.local
# journalctl -u tornado_email_collector
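The node name can also be extracted from the failed action line instead of being read by eye. The sketch below assumes the "* <resource>_<operation>_<interval> on <node> ..." layout seen in the output above; failed_node_for is a hypothetical name.

```shell
# Hypothetical helper: read "pcs status" output on stdin and print the node
# on which the given resource has a failed action.
# Assumes lines of the form:
#   * <resource>_<operation>_<interval> on <node> ...
failed_node_for() {
    resource="$1"
    awk -v res="$resource" '
        $1 == "*" && $3 == "on" && $2 ~ ("^" res "_[a-z]+_[0-9]+$") {
            print $4
            exit
        }
    '
}

# Usage sketch:
#   node="$(pcs status | failed_node_for tornado_email_collector)"
#   ssh "$node" journalctl -u tornado_email_collector -n 50
```

Matching on the full action name, rather than on a prefix, avoids confusing a resource such as tornado with tornado_email_collector.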
Other resources may have dedicated log files saved separately on disk. This is the case, for example, for logstash, whose log is saved by default in /neteye/shared/logstash/log/logstash-plain.log.