
Availability

Defining SLA Types

An SLA is a commitment between a service provider and a client, defining particular aspects of a service. Within the SLM module, an SLA type can be associated with a customer contract and defines limits for metrics to be guaranteed by the service provider as well as the exact temporal boundaries during which the metrics must be guaranteed.

When the SLM module is first launched, the SLA Type panel is focused, displaying a row for each configured SLA type. Additional panels let you define Contracts and Customers, respectively. Search functionality is available for all three panels, but in the first two panels the search is restricted to the Name and Contract columns.

Note

Before you can successfully set up SLA types and contracts, you need to have defined a few other objects in the Director, namely one or more TimePeriods (as Operational Time) and a filter expression (as Object Filter). An Object Filter defines the host(s) or service(s) to which the contract applies; in other words, the availability required by the customer will be calculated on these hosts or services. Examples of valid input for the Object Filter are: host_name=*neteye* or service_description=jenkins*.
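To make the effect of the wildcards in these examples concrete, the following Python sketch mimics how such an expression might select objects. It assumes plain shell-style, case-sensitive wildcard matching on a single column; the helper name matches_filter and the sample host names are hypothetical, and the real Icinga filter engine supports more operators than shown here.

```python
from fnmatch import fnmatchcase

def matches_filter(obj: dict, expression: str) -> bool:
    """Check one object against a 'column=pattern' expression.

    Assumption: '*' behaves like a shell wildcard and matching is
    case-sensitive. This is an illustration, not the Icinga filter engine.
    """
    column, pattern = expression.split("=", 1)
    return fnmatchcase(str(obj.get(column, "")), pattern)

# Hypothetical monitored hosts
hosts = [
    {"host_name": "neteye-master"},
    {"host_name": "my-neteye-satellite"},
    {"host_name": "db01"},
]

matched = [h["host_name"] for h in hosts if matches_filter(h, "host_name=*neteye*")]
print(matched)  # ['neteye-master', 'my-neteye-satellite']
```

An expression that matches nothing yields an empty list, which corresponds to a contract with no monitored objects.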

Before you can create an SLA contract to see the availability of monitored objects for a customer, you must first define the parameters for the SLAs you intend to use. You can do this at SLM / SLA Types, using the following options:

  • Name: The name of this SLA type (e.g., “Gold level”)

  • Description: A more user-friendly description of the SLA type

  • Operational Time: The exact time(s) during which all elements necessary for a monitored object to function properly should in fact be in operation. The operational time is precisely defined by a TimePeriod object in Director. This field lets you select either a pre-defined TimePeriod object or one that you have created at Icinga Director / Timeperiods / Timeperiods.

  • Calculation Period: The unit of time over which the data will be aggregated into service level reports. For instance, if you want an availability report for the current year, you might want it broken down into “Monthly” or “Weekly” subsections.

  • Availability %: The target percentage of SLA availability for the calculation period. For hosts, only Down states have a negative impact on availability. For services, both Critical and Unknown states (but not Warning) decrease availability.

  • Downtime: When this box is checked, the scheduled downtimes of monitored objects will be taken into account for any related availability calculations. When downtime is in effect, the related monitored object is considered available, regardless of its actual state during that period. Once the scheduled downtime ends, the object’s state will be reset to the value of its most recent state change.
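The rules for Availability % and Downtime can be illustrated with a small Python sketch. It is a simplified illustration only: it assumes the whole period counts as operational time, that the state history is already split into non-overlapping intervals, and it uses hypothetical names (availability_percent, UNAVAILABLE_STATES). The algorithm actually used by the module is described in How the availability is calculated.

```python
from datetime import datetime

# States that reduce availability, per object type (as stated above)
UNAVAILABLE_STATES = {
    "host": {"DOWN"},
    "service": {"CRITICAL", "UNKNOWN"},
}

def availability_percent(intervals, object_type, consider_downtime=True):
    """Percentage of time an object counts as available.

    intervals: list of (start, end, state, in_downtime) tuples.
    Assumption: intervals do not overlap and cover only operational time.
    """
    total = 0.0
    unavailable = 0.0
    for start, end, state, in_downtime in intervals:
        seconds = (end - start).total_seconds()
        total += seconds
        bad_state = state in UNAVAILABLE_STATES[object_type]
        # During scheduled downtime the object is considered available
        if bad_state and not (consider_downtime and in_downtime):
            unavailable += seconds
    if total == 0:
        raise ValueError("no time covered by the given intervals")
    return 100.0 * (total - unavailable) / total

day = datetime(2019, 7, 1)
history = [
    (day.replace(hour=0), day.replace(hour=6), "OK", False),
    (day.replace(hour=6), day.replace(hour=7), "WARNING", False),
    (day.replace(hour=7), day.replace(hour=8), "CRITICAL", True),   # in downtime
    (day.replace(hour=8), day.replace(hour=9), "UNKNOWN", False),
    (day.replace(hour=9), datetime(2019, 7, 2), "OK", False),
]

with_downtime = availability_percent(history, "service", consider_downtime=True)
without_downtime = availability_percent(history, "service", consider_downtime=False)
print(round(with_downtime, 2), round(without_downtime, 2))  # 95.83 91.67
```

With the Downtime box checked, only the Unknown hour counts against the service (23 of 24 hours available); without it, the Critical hour counts as well.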

At the moment the only supported TimePeriod values (i.e., Ranges) are exact dates and names of weekdays. There is also currently no support for excluded ranges and included ranges.

More precise definitions of Calculation Period, Availability, Downtime, and other terms can be found in the glossary, while an example of the algorithm on which the SLM module is based is shown in How the availability is calculated.

The following table defines each calculation period more precisely:

Calculation Period    Unit of Time
daily                 from 00:00 to 24:00 of that same day
weekly_sunday         from 00:00 on Sunday to 24:00 on Saturday
weekly_monday         from 00:00 on Monday to 24:00 on Sunday
monthly_1             from 00:00 on the first day of one month to 24:00 on the last day of the same month
monthly_2             from 00:00 on the first day of one month to 24:00 on the last day of the subsequent month
monthly_3             from 00:00 on the first day of one month to 24:00 on the last day of the third month
monthly_4             from 00:00 on the first day of one month to 24:00 on the last day of the fourth month
monthly_6             from 00:00 on the first day of one month to 24:00 on the last day of the sixth month
monthly_12            from 00:00 on the first day of one month to 24:00 on the last day of the twelfth month

Note

The “last day” of a month may be the 28th or 29th for February, and the 30th or 31st otherwise.
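As an illustration of the monthly rows and the note on the “last day”, the Python sketch below computes the last calendar day of a monthly_N period from its first day. The function name monthly_period_last_day is hypothetical, and the sketch assumes a period may begin on the first day of any month; how the module aligns multi-month periods is not stated here.

```python
import calendar
from datetime import date

def monthly_period_last_day(first_day: date, months: int) -> date:
    """Last calendar day of a monthly_<months> period starting on first_day.

    Assumption: the period starts on the first day of a month, as in the
    table above; the period ends at 24:00 on the returned date.
    """
    if first_day.day != 1 or months < 1:
        raise ValueError("period must start on day 1 and span >= 1 month")
    # Zero-based index of the last month in the period
    index = first_day.year * 12 + (first_day.month - 1) + (months - 1)
    year, month0 = divmod(index, 12)
    last = calendar.monthrange(year, month0 + 1)[1]  # 28, 29, 30 or 31
    return date(year, month0 + 1, last)

print(monthly_period_last_day(date(2019, 2, 1), 1))   # 2019-02-28
print(monthly_period_last_day(date(2020, 2, 1), 1))   # 2020-02-29 (leap year)
print(monthly_period_last_day(date(2019, 11, 1), 3))  # 2020-01-31
```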

Creating SLA Contracts

Average Availability provides aggregated statistics to support the verification of SLAs for customers with more complex contract definitions. Specifically, every report includes by default the measure of the Average Availability of all hosts and services in the time period. To enable Average Availability, tick the Include Average Availability checkbox as described below. You can override this setting by changing the same variable in the report creation section (see Availability Reports).

Once you have defined an SLA type, you can begin creating Service Level Agreement contracts. Click on SLM / Contracts and enter appropriate values for the following options:

  • Name: The name of this contract

  • Description: A more user-friendly description of the contract. This description can be displayed in the report and supports formatting through GitHub Flavored Markdown.

  • Customer: The customer, as defined in the Customer tab, whose monitored objects (typically hosts and services) will be included in the availability report. Only customers with the same role as the logged-in user are displayed.

  • SLA Type: The type of SLA you defined in the section above

  • Consider Event Adjustments: This checkbox should be set by an administrator if you want to allow event adjustments to be considered when generating a report.

  • Include Average Availability: This checkbox should be set if you want to include average availability when generating a report.

  • Render Contract Description in Report: This checkbox should be set if you want to insert the contract description in the rendered report.

  • Objects Type: Can be set to host or service to include only host or only service objects, respectively, in the current Contract, or to all to consider both hosts and services.

  • Objects Filter: A set of monitored objects determined by an Icinga filter expression. It is important to check that the filter expression actually returns at least one monitored object.

  • Monitoring Views: Depending on the chosen Objects Type, this field shows in parentheses the count of hosts, services, or both that match the object filter. Clicking on the Hosts link or the Services link will take you to the related Monitoring Overview.

Availability Reports

The SLM module is compatible with Icinga’s Reporting module. One use of the data provided via SLM is for creating availability reports for the monitored objects included in each customer contract.

Note

Before configuring a new report, make sure appropriate permissions are granted to the user’s NetEye role. A user who needs to define a new report should have at least the General module access enabled for both the SLM module and Reporting, and the reporting/reports permission enabled under the Reporting section.

To create an availability report, you will need to:

  • Configure one or more customers, SLA types, and contracts in the SLM module

  • Create a new report in the Reporting module and set the following fields, which are all compulsory:

    • Name: A name that uniquely identifies the report

    • Timeframe: The span of time that the report will cover. This value must be longer than the calculation period defined in the SLA Type for which the report is generated; otherwise the report will be empty (see Invalid Report Configurations).

    • Report: Set this to SLM Report, whereupon the form will add the following fields

    • Customers: Choose the customer you want to create the report for

    • Contract: Choose the contract corresponding to the data to be processed in the report. In the Contract dropdown, you will be able to select only those contracts where the Calculation Period is defined to be smaller than the selected Timeframe. This ensures that the report will have an appropriate number of pages.

    • Consider Event Adjustments: With this drop-down, users with the appropriate permission can choose whether or not to consider user-defined event adjustments in this report. There are three possible choices:

      • <Yes/No> (inherited from “< ContractName >”): This option contains the Consider Event Adjustment flag value (i.e. Yes or No) with the contract name from which the value is inherited.

      • Yes: Override that value, forcing event adjustments to be considered

      • No: Override that value, forcing event adjustments to be ignored

    • Include Average Availability: With this drop-down, users can choose whether or not to include average availability in this report. There are three possible choices:

      • <Yes/No> (inherited from “< ContractName >”): This option contains the Include Average Availability flag value (i.e. Yes or No) with the contract name from which the value is inherited.

      • Yes: Override that value, forcing average availability to be included

      • No: Override that value, forcing average availability to be ignored

    • Show Outages Count: Show how many Outages are defined per calculation period.

    • Show Outages List: Show Outages in the report. If set to Yes the following fields will appear:

      • Show Outages List Limit: Set the maximum number of outages to show per calculation period.

      • Sort Outages by: Users can sort Outages by Duration or by Start/End Date. If not specified, Outages are sorted by Duration.

      • Outages order: Users can invert the Outages sorting order by selecting the Ascending or Descending option in this field. Descending is the default order.
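The interplay of the three outage list options can be sketched in Python as follows. The function name select_outages and the dictionary layout are hypothetical; the defaults (sort by Duration, Descending order) come from the field descriptions above, while applying the limit after sorting is an assumption.

```python
from datetime import datetime

def select_outages(outages, sort_by="duration", order="descending", limit=None):
    """Sort outages of one calculation period and keep at most `limit`.

    Assumption: the limit is applied after sorting.
    """
    keys = {
        "duration": lambda o: o["end"] - o["start"],
        "start": lambda o: o["start"],
        "end": lambda o: o["end"],
    }
    ordered = sorted(outages, key=keys[sort_by], reverse=(order == "descending"))
    return ordered if limit is None else ordered[:limit]

# Hypothetical outages within one calculation period
outages = [
    {"name": "A", "start": datetime(2019, 7, 1, 10, 0), "end": datetime(2019, 7, 1, 10, 30)},
    {"name": "B", "start": datetime(2019, 7, 2, 8, 0), "end": datetime(2019, 7, 2, 11, 0)},
    {"name": "C", "start": datetime(2019, 7, 3, 9, 0), "end": datetime(2019, 7, 3, 10, 0)},
]

longest_two = [o["name"] for o in select_outages(outages, limit=2)]
chronological = [o["name"] for o in select_outages(outages, sort_by="start", order="ascending")]
print(longest_two)    # ['B', 'C']
print(chronological)  # ['A', 'B', 'C']
```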

After you click on “Create Report”, the report will appear in the list of available reports.

Within each report, you can read the details related to the selected contract and its monitored objects. This information is typically divided into hosts and services, and represents their percentage of availability. The availability of a monitored object will be green if it is above the threshold defined in its SLA Type, and red if not. In addition, all monitored objects that did not record any events during the reporting period will be listed separately.

You should ensure that the filter expression used in the Objects Filter field on the Contract tab returns at least one monitored object (e.g., at least one host or service).

Invalid Report Configurations

While the SLM module does not allow users to create incorrect report configurations, there are circumstances in which reports may seemingly contain wrong data, namely when the report is empty or very large. The reasons behind these two cases, along with solutions, are explained in the next sections.

Configurations Leading to Empty Reports

If the report’s time frame and the contract’s calculation period aren’t compatible, the report generated will be empty. This can happen when:

  • The Calculation Period is greater than the Time Frame. For example, setting the calculation period to monthly_12 and defining the time frame to be from 01.01.2019 to 01.06.2019. This would be like trying to fit 12 months inside 6 months.

  • The time frame doesn’t contain at least one entire and valid calculation period. For instance, when you define a report with a monthly calculation period, while the time frame is defined to start on 02.07.2019 and finish on 29.08.2019. Here, neither the time within July nor the time within August represents a complete month.

If you find you have created a report definition matching one of these cases, you can fix it with one of the following solutions:

  • Make the time frame defined for the specific report longer

  • Select a different SLA Type in the contract form, with a smaller calculation period

  • Select a smaller calculation period in the definition of the SLA type associated with the contract used for creating the report
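The two cases that produce empty reports can be checked with a short Python sketch that counts how many complete monthly periods fit into a time frame. It rests on assumptions: the time frame is treated as the half-open interval from 00:00 on the start date to 00:00 on the end date, a period may begin on the first day of any month, and the helper names (add_months, complete_monthly_periods) are hypothetical.

```python
from datetime import date

def add_months(first_day: date, months: int) -> date:
    """First day of the month that lies `months` months after first_day."""
    index = first_day.year * 12 + (first_day.month - 1) + months
    year, month0 = divmod(index, 12)
    return date(year, month0 + 1, 1)

def complete_monthly_periods(start: date, end: date, months: int) -> int:
    """Count whole monthly_<months> periods inside [start, end)."""
    if months < 1:
        raise ValueError("months must be >= 1")
    # First month boundary on or after the start of the time frame
    period_start = start if start.day == 1 else add_months(start.replace(day=1), 1)
    count = 0
    while True:
        period_end = add_months(period_start, months)  # 24:00 of the last day
        if period_end > end:
            return count
        count += 1
        period_start = period_end

# Calculation period greater than the time frame: empty report
print(complete_monthly_periods(date(2019, 1, 1), date(2019, 6, 1), 12))  # 0
# No complete month between 02.07.2019 and 29.08.2019: empty report
print(complete_monthly_periods(date(2019, 7, 2), date(2019, 8, 29), 1))  # 0
# Longer time frame: two complete months
print(complete_monthly_periods(date(2019, 7, 1), date(2019, 9, 1), 1))   # 2
```

A result of 0 corresponds to an empty report; lengthening the time frame or choosing a smaller calculation period raises the count.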

Configurations Leading to Very Large Reports

If the combination of a report’s time frame and the contract’s calculation period would lead to a number of calculation period slots higher than a pre-determined limit, it strongly implies that the report produced would have an excessive number of pages.

NetEye attempts to avoid this situation by preventing a user from creating very large reports. In general, reports consisting of hundreds of pages are not useful. However, should you wish to override this upper bound for the allowed number of calculation periods, you can change this limit in the SLM module’s configuration page (Configuration / Modules / slm / Configuration) with the field Maximum report size. If you do so, please note that increasing this limit will lead to a proportional decrease in performance.
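As a rough illustration of this check, the Python sketch below counts the daily slots in a time frame and compares the result with a limit. The limit value of 100 is purely hypothetical (the actual default of Maximum report size is not given here), as are the function names; the time frame is assumed to run from 00:00 on the start date to 00:00 on the end date.

```python
from datetime import date

def daily_slot_count(start: date, end: date) -> int:
    """Number of whole days in the half-open time frame [start, end)."""
    return max((end - start).days, 0)

def is_too_large(slot_count: int, maximum_report_size: int) -> bool:
    """True if the number of slots is higher than the configured limit."""
    return slot_count > maximum_report_size

HYPOTHETICAL_LIMIT = 100  # assumption, not the module's default

slots = daily_slot_count(date(2019, 1, 1), date(2020, 1, 1))
print(slots, is_too_large(slots, HYPOTHETICAL_LIMIT))  # 365 True
```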

Outages

Annotations

Every outage can have an Outage Annotation that will be shown in the SLA report when rendered. The annotation can be easily customized and supports Markdown formatting. You can add an annotation directly from SLM / Outage Annotations. To associate an annotation with the desired outage, select a date between the start and the end of the outage itself.
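The date-based association can be pictured with a minimal Python sketch. The function name outage_for_annotation and the data layout are hypothetical, and treating the start and end instants as inclusive is an assumption.

```python
from datetime import datetime

def outage_for_annotation(outages, annotation_time):
    """Return the first outage whose time span contains annotation_time.

    Assumption: both the start and the end instant count as inside.
    """
    for outage in outages:
        if outage["start"] <= annotation_time <= outage["end"]:
            return outage
    return None

# Hypothetical outages
outages = [
    {"name": "A", "start": datetime(2019, 7, 1, 10, 0), "end": datetime(2019, 7, 1, 12, 0)},
    {"name": "B", "start": datetime(2019, 7, 5, 8, 0), "end": datetime(2019, 7, 6, 8, 0)},
]

hit = outage_for_annotation(outages, datetime(2019, 7, 5, 20, 0))
miss = outage_for_annotation(outages, datetime(2019, 7, 3, 0, 0))
print(hit["name"], miss)  # B None
```

A date outside every outage matches nothing, which is why the selected date must fall within the outage being annotated.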

Add Outage Annotation from monitoring event

The annotation can also be inserted directly from the monitoring event tab. Select the link under “Outage Annotations” in the outage event itself. The form for creating the annotation will be partially pre-filled with the information related to the selected outage.