...

By default, the Occurrence counter is disabled.
To enable it, activate the switch and add one or more counter definitions to your BVQ alert condition.

...


Figure 1: Use Occurrence counter


After enabling the Occurrence counter, choose one of the available methods: Consecutive violations or Violations per time.

...

INFO: SLA mode and non-SLA mode cannot be mixed within a single Alert Condition.

Consecutive violations

...





Type | Type of this Occurrence counter | Consecutive violations - define how many consecutive violations in a row must occur before the condition is fulfilled
Amount | Number of violations | Maximum number of violation occurrences that are counted until the Alert level is raised.
Alert level | BVQ Alert level | Desired Alert level, one of: OK, INFO, WARN, ERROR, UNKNOWN


Violations per timeframe

...





Type | Type of this Occurrence counter | Violations per timeframe - define how often in a certain timeframe a condition must match until the condition is fulfilled
Time | Minutes | Size of the sliding timeframe, in minutes. Only available for type "Violations per time" if SLA mode is turned off.
Amount | Number of violations | Maximum number of violation occurrences that may be counted until the Alert level is raised.
Alert level | BVQ Alert level | Desired Alert level, one of: OK, INFO, WARN, ERROR, UNKNOWN


Violations per SLA interval (fixed window)

...




Type | Type of this Occurrence counter | Violations per SLA interval - define how often in a fixed timeframe a condition must match until the condition is fulfilled
Time

...

If SLA mode is enabled, the Time setting is not available. The number of Alert conditions within the SLA interval is counted.
Amount | Number of violations | Maximum number of violation occurrences that may be counted until the Alert level is raised.
Alert level | BVQ Alert level | Desired Alert level, one of: OK, INFO, WARN, ERROR, UNKNOWN


Figure 2: Differences between the options
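
The Consecutive violations method can be illustrated with a short Python sketch. This is not BVQ code: the function name, the sample latencies, and the assumption that the condition is fulfilled as soon as the streak reaches the configured Amount are all made up for illustration.

```python
from typing import Iterable


def consecutive_violations_fulfilled(violations: Iterable[bool], amount: int) -> bool:
    """Return True once `amount` violations in a row have been seen.

    Assumption: the condition counts as fulfilled when the streak
    reaches `amount` (not only when it exceeds it).
    """
    streak = 0
    for violated in violations:
        # Any measurement without a violation resets the streak.
        streak = streak + 1 if violated else 0
        if streak >= amount:
            return True
    return False


# Hypothetical latency samples in ms, one per PI measurement.
latencies_ms = [2.1, 3.4, 3.8, 2.9, 3.2, 3.5, 4.1, 3.9, 3.3, 2.0]
violations = [value > 3.0 for value in latencies_ms]

print(consecutive_violations_fulfilled(violations, amount=5))  # True
print(consecutive_violations_fulfilled(violations, amount=6))  # False
```

The single measurement of 2.9 ms interrupts the first streak, so only the later run of five values above 3 ms fulfils the condition.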

Illustration of the different methods:

...

YELLOW: Only a certain number of exceedances within a specified timeframe triggers an alarm. (Occurrence counter ON - Violations per time) - (delayed trigger mode)


Figure 3: Real-life example and illustration of the different methods

Example & description

This table describes the different options using a real-life example.
More detailed information about each option is given, as well as a description of how to activate one of the options.

Methods of choice | Example | How does it work? | What to fill in

Default (direct trigger mode)

...



Each exceedance of a predefined threshold triggers an event at the designated warning level (Info, Warning, Error).

If a latency exceeds 3 ms, the threshold is reached and an error message is triggered.

For clarity, the example with the red border shows only one of these events, but the same behavior applies to all further exceedances.


The default raises an Error event as soon as the rule is violated for the first time.

...




Consecutive Violations

...



An event is triggered at the designated warning level (Info, Warning, Error) only when 5 consecutive errors occur.

This condition is shown inside the blue frame and occurs only once in this performance chart.


The latency for the MDisk is above the 3 ms threshold 5 times in a row, and therefore this rule raises an Error.

...


Violations per timeframe

...




The yellow box shows the "Violations per time" setting, which specifies a number of violations per timeframe.

As soon as this number is reached, the designated warning level is displayed.

However, if this value is no longer exceeded, the status of the alert falls back to the next better value.


In this example, 3 violations per 60-minute timeframe are specified. When this limit is exceeded, an alert message is raised.

...




Figure 4: The different options using a real-life example
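
The "Violations per timeframe" option from the table above can be sketched in Python as follows. This is an illustration under stated assumptions, not the BVQ implementation: the function name is hypothetical, the window is assumed to hold exactly Time / PI timing measurements, and the condition is assumed to be fulfilled once the count reaches the Amount.

```python
from collections import deque


def sliding_window_states(violations, pi_minutes, time_minutes, amount):
    """Return, per measurement, whether the counter is currently fulfilled.

    Assumptions: one measurement every `pi_minutes`, and the sliding
    window covers the last `time_minutes // pi_minutes` measurements.
    """
    window = deque(maxlen=time_minutes // pi_minutes)
    states = []
    for violated in violations:
        window.append(violated)  # the oldest measurement drops out automatically
        states.append(sum(window) >= amount)
    return states


# 3 violations per 60-minute timeframe with a 5-minute PI timing.
violations = [True, False, True, False, True] + [False] * 12
states = sliding_window_states(violations, pi_minutes=5, time_minutes=60, amount=3)

for index, fulfilled in enumerate(states):
    print(f"minute {(index + 1) * 5:3d}: {'fulfilled' if fulfilled else 'not fulfilled'}")
```

In this sketch the condition becomes fulfilled with the third violation and stops being fulfilled as soon as the first violation leaves the 60-minute window, which mirrors the status falling back to a better value.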

When using the non-SLA sliding window mode, the status of a rule varies depending on the current value.
If the value falls below the critical value again in the next measurement, this is reflected in the status of the alert rule, which changes, for example, from Error level back to an OK state.
The PI timing and the time specified for the sliding window are relevant here.
SLA mode (fixed window) adds the option of maintaining this state for longer, over a defined SLA time interval.

...

The main difference from the Sliding Window (used without the SLA option) is that the state of the Alert rule persists.
By default, SLA mode is disabled. To enable it, change the SLA INTERVAL from "SLA mode not enabled" to the desired timeframe.
The following table compares the two modes:


Standard timing mode (sliding window) | SLA timing mode (fixed window)
Intended for normal monitoring purposes | Intended to prove the health of service level objectives
Allows a separate timing to be defined for each Occurrence counter definition | Forces all Occurrence counter definitions to use the configured SLA timing
Uses the timing for a sliding window: the Time setting within the Occurrence counter definitions | Uses the timing for a fixed window: the Time setting for the SLA interval
Resets to the default Alert level (typically OK) when the measurements in the sliding window are below the counts of all Occurrence counter definitions | Resets to the default Alert level (typically OK) at each start of the SLA interval fixed window

Figure 5: Standard timing mode versus SLA timing mode

...

Step 1: Start of the interval

A rule is checked every 5 minutes and leads to the Error status if it is exceeded once or several times.

The picture shows that the state starts in the "OK" status and that another error occurs every 5 minutes after that.

Accordingly, regardless of whether the sliding or the fixed window is used, the status changes first to Info, then to Warning, and finally to Error.

Thus, both options are in the Error status in interval 25.


...

Step 2: Behaviour of the fixed window mode

The SLA interval is set to 30 minutes in this example, so the alert level status is reset to OK after exactly this period.

With the default time mode, the status is re-evaluated every time the warning rule is exceeded.
As step 3 shows, the status changes over time because each re-evaluation counts how many violations of the rule have occurred within the time window.




Step 3: Behaviour of the sliding window mode

This example shows the different alarm levels. In SLA mode, the status is reset after exactly 30 minutes, as shown in step 2, and the counting of rule violations starts again.

In sliding window mode, by contrast, the current state and the number of state changes in the time window are taken into account. After the 4th PI interval, i.e. after minute 20, the rule is no longer violated. The 30-minute sliding window now counts how often the rule was violated within the last 30 minutes.

Here you can clearly see the "expiration" of the alarm level status.
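
The difference between the two windows in steps 1 to 3 can be reproduced with a small Python sketch. It only counts violations and does not map the counts to alert levels, because the exact thresholds are not stated here. The function name and the boundary handling (a fixed window covers the minutes from its start up to, but not including, the next start) are assumptions for illustration.

```python
def window_counts(violation_minutes, pi_minutes, window_minutes, end_minute):
    """Return (minute, sliding_count, fixed_count) for every PI measurement."""
    rows = []
    for now in range(pi_minutes, end_minute + 1, pi_minutes):
        # Sliding window: violations within the last `window_minutes` up to now.
        sliding = sum(1 for t in violation_minutes if now - window_minutes < t <= now)
        # Fixed window: violations since the start of the current SLA interval.
        interval_start = (now // window_minutes) * window_minutes
        fixed = sum(1 for t in violation_minutes if interval_start <= t <= now)
        rows.append((now, sliding, fixed))
    return rows


# Violations in the first four PI intervals, none after minute 20.
rows = window_counts([5, 10, 15, 20], pi_minutes=5, window_minutes=30, end_minute=60)

for minute, sliding, fixed in rows:
    print(f"minute {minute:2d}: sliding window = {sliding}, fixed window = {fixed}")
```

Under these assumptions, the fixed-window count drops from 4 to 0 at minute 30, while the sliding-window count decreases step by step (4, 3, 2, 1, 0) between minute 30 and minute 50.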


...

  • ERROR at 3 ms
  • Violations per time 5, 10, 20 occurrences per time
  • consecutive violation by more than 5 occurrences

...


Figure 6: Real-life example



BVQ Alert rule - SLA Mode not enabled (sliding window) | Each box represents a 1-hour sliding window and the color shows the current status | Explanation
Name | SVC MDisk violated


INFO: Each colored box represents a sliding window of 1 hour and indicates the current status by its color.

From 07:15 until 08:55 the 3 ms latency rule was exceeded 5 times. → Status raised to ERROR

The following 5-minute PI measurements also contained more than 5 violations of the 3 ms latency rule. → Status still at ERROR
From 09:15 the status is lowered in severity because the 3 ms latency rule is no longer exceeded 5 times in that 60-minute sliding window. → Status WARN


Performance indicator timing | 5 minutes
SLA interval | NONE (OFF)
AR Condition | Latency > 3ms
1. AR Condition > 3 ms | ERROR

Figure 7: Disabled SLA mode


BVQ Alert rule - SLA Mode enabled (fixed window) | SLA interval of one day | Explanation
Name | Daily SVC MDisk SLA violated

INFO: Each box shows by its color the status since the beginning of the day up to that moment. 


SLA mode sets the status to the default OK at the start of the day.

At 08:55, 5 violations are counted, which raises the status to INFO.

At 09:10, 10 violations of the 3 ms latency rule are counted, so the status is raised to WARN.

At 09:15, 5 consecutive violations occur. This triggers the 4th alert rule, and the status is therefore raised directly to ERROR.

Because of SLA mode, this status persists until the next start of the day.

It is then reset to OK.




Performance indicator timing | 5 minutes
SLA interval | 1 day
AR Condition | Latency > 3ms
1. AR Condition Occurrence counter | ERROR | 20 times per SLA interval
2. AR Condition Occurrence counter | WARN | 10 times per SLA interval
3. AR Condition Occurrence counter | INFO | 5 times per SLA interval
4. AR Condition Occurrence counter | ERROR | 5 times in a row per SLA interval

Figure 8: Enabled SLA mode
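
The four Occurrence counter definitions from Figure 8 can be combined in a short Python sketch. It is a simplified model and not BVQ code: the function name and the severity ordering are assumptions, the UNKNOWN level is left out because its rank is not described here, and the status is assumed never to drop within one SLA interval.

```python
# Assumed severity ordering for illustration only.
SEVERITY = {"OK": 0, "INFO": 1, "WARN": 2, "ERROR": 3}


def sla_status(violations, per_interval, in_a_row):
    """Return the status after processing one SLA interval's measurements.

    `per_interval` and `in_a_row` are lists of (level, amount) pairs.
    The status starts at OK and only ever rises within the interval.
    """
    status, total, streak = "OK", 0, 0
    for violated in violations:
        total += 1 if violated else 0
        streak = streak + 1 if violated else 0
        fulfilled = [level for level, amount in per_interval if total >= amount]
        fulfilled += [level for level, amount in in_a_row if streak >= amount]
        for level in fulfilled:
            if SEVERITY[level] > SEVERITY[status]:
                status = level
    return status


# Definitions taken from Figure 8.
per_interval = [("ERROR", 20), ("WARN", 10), ("INFO", 5)]
in_a_row = [("ERROR", 5)]

print(sla_status([True, False] * 5, per_interval, in_a_row))   # INFO
print(sla_status([True, False] * 10, per_interval, in_a_row))  # WARN
print(sla_status([True] * 5, per_interval, in_a_row))          # ERROR
```

Five scattered violations only reach INFO, whereas five violations in a row fulfil the fourth definition and raise the status directly to ERROR, as in the 09:15 entry of the explanation.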

...

  • Violations per timeframe (sliding window) → Time must be greater than Amount of violations × PI timing

  • Violations per timeframe (fixed window) → SLA interval must be greater than Amount of violations × PI timing
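
The two timing rules above amount to a single arithmetic check, sketched here in Python. The function name is hypothetical; the example values (3 violations per 60 minutes, 20 violations per 1-day SLA interval, 5-minute PI timing) are taken from the examples on this page.

```python
def timing_is_valid(window_minutes: float, amount: int, pi_minutes: float) -> bool:
    """Check that the window is greater than Amount of violations x PI timing.

    `window_minutes` is the Time setting (sliding window) or the
    SLA interval (fixed window), expressed in minutes.
    """
    return window_minutes > amount * pi_minutes


# Sliding window: 3 violations per 60 minutes, PI timing 5 minutes (60 > 15).
print(timing_is_valid(60, amount=3, pi_minutes=5))        # True

# Fixed window: 20 violations per 1-day SLA interval (1440 > 100).
print(timing_is_valid(24 * 60, amount=20, pi_minutes=5))  # True

# 12 violations would need more than 60 minutes at a 5-minute PI timing.
print(timing_is_valid(60, amount=12, pi_minutes=5))       # False
```

If the check fails, the window cannot contain enough PI measurements for the configured number of violations to be counted, so the condition could never be fulfilled.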

Summary

The new BVQ Occurrence counter lets you specify how often and in what way a condition must be violated before the Alerting options report it. With this option, you can cover scenarios ranging from very simple to highly complex.

...