
Version | Released | Changelog
2022.H1.9
2022.H1.8
2022.H1.7
2022.H1.6
2022.H1.5
2022.H1.4
2022.H1.3
2022.H1.2
2022.H1.1
2022.H1.0

Download

BVQ downloads are available on the BVQ° Website

Due to major changes in the AIX model, historical AIX data collected by previous BVQ releases will be deleted during the upgrade to version 2022.H1.2. Please be aware that these model changes could also affect compatibility with custom reports, dashboards, alert rules, GUI favorites and external applications using BVQ REST.

Please see AIX OS model refinements for further details.

Content

Highlights

Linux OS model

Brief description

BVQ now supports the "Linux operating system" as a new platform. This is currently an extension to the BVQ PowerVM platform. An agent needs to be installed on each Linux instance to collect performance and topology information that isn't available via HMC.

Licensing

For customers with BVQ licensed PowerVM LPARs, the Linux and AIX OS agents can be run without additional licensing. Please review BVQ License Information Page or ask your sales contact for more information.

Information gathering

Similar to the BVQ AIX OS agent, the BVQ Linux OS Agent receives data sent from the BVQ agent on the scanned systems. An RPM package including the BVQ agent has to be installed on each OS instance to be monitored. This agent is responsible for collecting, packaging, encrypting and sending the data to the BVQ server using the scp protocol once a minute.

The data contains topology information coming from several Linux CLI commands as well as performance statistics gathered by njmon.

Which information is collected?

  • Topology configuration information
    • collected every minute (default)
    • 17 object types containing a total of ~105 attributes
  • Performance statistics
    • collected every minute (default)
    • 10 object types containing a total of ~165 performance statistics


Additional information:

Object types

A bunch of new object types come along with the BVQ Agent for Linux. The following table provides an overview of those new object types, their meaning and whether they offer performance information:

Group | BVQ Name | Description | Linux device | PVM OT | Perf OT?
System | Linux Instance group

Group of Linux OS instances configured to report their information to the same BVQ Scanner instance. Its name and ID is copied from the BVQ Scanner instance name.

Origin: BVQ Scanner logic

N/A | N/A | (error)
Linux Instance

Linux OS instance running on an LPAR of a PVM Managed system. Its name represents the configured Linux host name.

Origin: Linux OS attributes, /proc/stat, /proc/vmstat, /proc/meminfo, /proc/net metrics

N/A

PVM LPAR

(tick)
Linux Physical CPU

Physical CPU core reported by Linux. It usually runs multiple logical CPU threads. Its name is constructed from "vcpu" + ID

Origin: Linux "lscpu" attributes

N/A | PVM vCPU | (error)
Linux Logical CPU

Logical CPU threads running on a multithreaded physical CPU core reported by Linux. Its name is constructed from "lcpu" + ID

Origin: Linux "lscpu" attributes, /proc/stat/<cpu> metrics

N/A | cpu<num> | N/A | (tick)
Storage | Linux Swap space

Linux paging/swap space. Its name is taken from the block device or file it runs on.

Origin: Linux "swapon", "findmnt" attributes, /proc/diskstats metrics

/dev/<blockdev> | N/A | (tick)
Linux Filesystem

Linux Filesystem usually running on block volume device. Its name represents the mount point in the filesystem tree. Performance statistics are copied from the block device of the filesystem. 

Origin: Linux "findmnt", "lsblk", "df" attributes, /proc/diskstats metrics

/dev/<blockdev> | N/A | (tick)
Linux Disk path

Path of a Linux disk storage volume to a target port of a storage device via a storage adapter. A single copy of the disk device is created as fake path for single path disk devices. 

Origin: Linux "lsscsi", "lsblk" attributes, /proc/diskstats metrics

/dev/<blockdev>

PVM vSCSI map


(tick)
Linux Disk

Linux single or multipath disk storage volume device provisioned by an integrated or external storage system attached via storage adapter. Multipath disks are managed by the Linux multipath device mapper function.

Origin: Linux "lsblk", "multipath" attributes, /proc/diskstats metrics

/dev/<blockdev> | PVM vSCSI device


(tick)
Linux Disk partition

Partition of a Linux disk storage volume device.

Origin: Linux "lsblk" attributes, /proc/diskstats metrics

/dev/<blockdev> | N/A | (tick)
Linux RAID disk

RAID layer block device containing one or more Disks or Disk partitions managed by Linux md layer (mdadm).

Origin: Linux "lsblk" attributes, /proc/diskstats metrics

/dev/<blockdev> | N/A | (tick)
Linux Block device

BVQ created general block device representing a Linux Disk, Disk partition, RAID disk or LV. That eases path selection and report design to show the block device for a FS, Swap space or PV without knowing its type.

Origin: Linux "lsblk" attributes

/dev/<blockdev> | N/A | (error)
Linux Storage adapter

Linux storage host bus adapter device to attach block storage devices via FC, SCSI, NVMe or other protocols. Its name is constructed by: "host"+<id>

Origin: Linux "lsscsi" attributes

N/A

PVM vSCSI client adapter

PVM vFC client adapter

PVM FC port

(error)
LVM | Linux Logical volume

Logical Volume (LV) provisioned by one or more Physical Volumes of a Volume group configured by the Linux Logical Volume Manager. 

Origin: Linux "lvs", "lsblk" attributes, /proc/diskstats metrics

/dev/<blockdev> | N/A | (tick)
Linux Physical volume

Physical Volume (PV) part of a Volume group configured by the Linux Logical Volume Manager. Its name is taken from the block device of the PV.

Origin: Linux "pvs", "lsblk" attributes

/dev/<blockdev> | N/A | (tick)
Linux Volume group

Volume group (VG) containing one or more Physical Volumes provisioning Logical Volumes configured by the Linux Logical Volume Manager. 

Origin: Linux "vgs" attributes

N/A | N/A | (tick)
Network | Linux Network adapter

Linux local area network interface.

Origin: Linux "ethtool", "ip" attributes, /proc/net/dev metrics

eth<num>

PVM vNIC

PVM Network client adapter

(tick)

Object relations

The following graphs show the Linux object types and their connections to existing BVQ platforms:

Object tips & tricks

Linux Block device

BVQ is designed to support end-to-end analytics. We followed this paradigm for the Linux OS as well, where the relations between Filesystems or Swap spaces and the disks they depend on are not obvious. Linux storage can be configured in many ways. For example, the path from a Filesystem to its Disk paths seldom goes down the full stack through LV > PV > RAID disk > Disk partition > Disk. Some configurations don't use LVM, others don't use MD RAID, and others omit partitioning. To respect that, we added the abstract object type "Linux Block device" as a connection object between a Filesystem or Swap space and the Block device it resides on. The Block device can be a Logical volume, a Disk, a Disk partition or a RAID disk, and roughly corresponds to the output of the lsblk CLI command.

The Linux Block device object offers the following advantages:

  • A single object type list represents an overview of all important block devices in your configuration
  • The Path Filesystem / Block device and Swap space / Block device always shows the Block device below it. No need for you to know the exact type of the block device.
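
The idea of a single, type-independent block device list can be sketched as follows. This is only an illustration, not BVQ's implementation: the sample JSON is a hypothetical excerpt in the shape printed by `lsblk --json -o NAME,TYPE,MOUNTPOINT`, and the function names `flatten` and `block_device_for` are made up for this example.

```python
import json

# Hypothetical sample in the shape printed by `lsblk --json -o NAME,TYPE,MOUNTPOINT`.
SAMPLE = """
{"blockdevices": [
  {"name": "sda", "type": "disk", "mountpoint": null, "children": [
    {"name": "sda1", "type": "part", "mountpoint": "/boot"},
    {"name": "sda2", "type": "part", "mountpoint": null, "children": [
      {"name": "vg0-root", "type": "lvm", "mountpoint": "/"},
      {"name": "vg0-swap", "type": "lvm", "mountpoint": "[SWAP]"}
    ]}
  ]}
]}
"""


def flatten(devices, parent=None):
    """Yield one record per block device, regardless of its type."""
    for dev in devices:
        yield {
            "name": dev["name"],
            "type": dev["type"],
            "mountpoint": dev.get("mountpoint"),
            "parent": parent,
        }
        # Recurse into partitions, RAID members, LVs and so on.
        yield from flatten(dev.get("children", []), parent=dev["name"])


def block_device_for(mountpoint, records):
    """Return the block device directly below a mount point, or None."""
    for rec in records:
        if rec["mountpoint"] == mountpoint:
            return rec
    return None


records = list(flatten(json.loads(SAMPLE)["blockdevices"]))
root = block_device_for("/", records)
print(root["name"], root["type"])
```

The caller asks for the device below a mount point and gets a record back without needing to know whether it is a partition, an LV or a plain disk.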

Linux Disk path

Most Linux storage configurations use multipathing but some don't (e.g. direct attached disks). The BVQ Linux Disk path Object was implemented to always exist between a Disk and a Storage adapter. In cases of single path attached Disks, BVQ creates an artificial Disk path object and copies some attributes of the Disk (name, addresses, IDs, ...) into it. That eases the navigation through the path of dependencies a lot.
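
A minimal sketch of this idea, assuming disks are represented as plain dictionaries. The keys `name`, `uid`, `adapter` and `paths` and the function `ensure_disk_paths` are hypothetical and are not BVQ attribute names.

```python
def ensure_disk_paths(disk):
    """Return the disk's paths; create one artificial path if it has none."""
    if disk.get("paths"):
        return disk["paths"]
    # Single-path disk: copy identifying attributes of the disk into an
    # artificial path so a path object always sits between disk and adapter.
    return [{
        "name": disk["name"],
        "uid": disk.get("uid"),
        "adapter": disk.get("adapter"),
        "artificial": True,
    }]


local_disk = {"name": "sda", "uid": "uid-1", "adapter": "host0", "paths": []}
print(ensure_disk_paths(local_disk))
```

With this in place, navigation code can always walk Disk > Disk path > Storage adapter, whether or not multipathing is configured.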

Linux Storage adapter performance stats

The IBM PowerVM drivers ibmvfc and ibmvscsi don't deliver performance stats, but BVQ nevertheless shows some for the related BVQ Linux Storage adapter objects. BVQ aggregates these from the statistics of all Disk paths provided by a Storage adapter. That provides the bandwidth, IO rate, latency and more for all disk-based IO flowing through an adapter. Be aware that non-disk IO (e.g. tape) isn't counted.
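
One way such an aggregation could look is sketched below. The exact formulas BVQ uses are not documented here, so this is an assumption: rates are summed, and latency is averaged weighted by IO rate. The keys `iops`, `mbps` and `latency_ms` are hypothetical.

```python
def aggregate_adapter_stats(paths):
    """Aggregate per-path statistics into adapter-level statistics."""
    total_iops = sum(p["iops"] for p in paths)
    total_mbps = sum(p["mbps"] for p in paths)
    if total_iops > 0:
        # Assumption: weight each path's latency by the IO it carries.
        latency = sum(p["latency_ms"] * p["iops"] for p in paths) / total_iops
    else:
        latency = 0.0
    return {"iops": total_iops, "mbps": total_mbps, "latency_ms": latency}


paths = [
    {"iops": 100, "mbps": 10, "latency_ms": 1.0},
    {"iops": 300, "mbps": 30, "latency_ms": 3.0},
]
print(aggregate_adapter_stats(paths))
```

Weighting by IO rate keeps an idle path with an outlier latency from distorting the adapter value. A plain mean would be a simpler alternative.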

Linux Filesystem and Swap space

Linux Filesystems and Swap spaces don't deliver performance statistics. That makes it especially hard to identify the filesystem that causes the most IO on a Disk. To ease that, we copied the performance stats of the nearest underlying block device into the Filesystem and Swap space object. With that, each Filesystem and Swap space directly shows the block IO load it causes.

Kernel path vs Device path

Multiple names can be configured for a Linux Block device, but only one of them is the primary name; the others are just aliases represented as links to the primary entry in the /dev directory. To respect this, the BVQ storage objects provide all names of the block device. The primary name is represented as the "kernel path", alias names as "device_path", and additionally as "lvm_path" for LV objects.

Identifying a Disk by UID

Often it is hard to relate a Disk occurring in a Linux or AIX OS to the volume provided by the storage subsystem. In many cases the volume passes through one or more virtualization / hypervisor configuration stacks that obscure its unique ID before it reaches the OS. Another problem is that a volume's ID can have multiple formats (NAA, EUI, MD5, vendor serial) that can differ completely from each other. To address that, BVQ tries hard to get all native IDs for a disk. That's why the BVQ Linux Disk object has "Volume vendor UID", "Volume NAA UID", "Volume EUI UID" and "Volume MD5 UID" in addition to "Volume UID", which represents the NAA format as long as the Volume provides it.

Full storage stack statistics

Linux allows the configuration of multiple storage block device layers controlled by an internal device mapper. This stack provides block devices to applications like a filesystem from which IO flows through the following (optional) layers down to the storage adapter and storage backend devices:

  • LVM2
  • MD RAID
  • Disk partition
  • Disk
  • Disk path

Linux delivers performance statistics for each of these layers, and BVQ collects them and makes them usable in all BVQ applications. This is very useful for identifying the cause of high storage load or performance bottlenecks in the stack.

Some use case examples are:

  • Stats per LV or Disk partition allow you to monitor the storage performance of the application using it.
  • Stats per Disk below an MD-controlled RAID mirror allow you to monitor whether load is equally balanced across all mirrors.
  • Stats per Disk path below a multipathed Disk allow you to monitor whether load is equally balanced across all paths.
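
The balance checks in the last two use cases can be expressed as a simple metric. The sketch below is one possible definition, not a BVQ formula: it reports the largest deviation from the mean IO rate as a fraction of that mean. The function name `imbalance` is hypothetical.

```python
def imbalance(io_rates):
    """Return the largest deviation from the mean, as a fraction of the mean.

    0.0 means perfectly balanced; 0.5 means one path or mirror deviates
    from the average load by 50 percent.
    """
    if not io_rates:
        return 0.0
    mean = sum(io_rates) / len(io_rates)
    if mean == 0:
        return 0.0
    return max(abs(rate - mean) for rate in io_rates) / mean


print(imbalance([100, 100, 100, 100]))
print(imbalance([300, 100]))
```

A threshold on this value, for example above 0.2, could then serve as the condition of a custom alert rule. The threshold itself is an arbitrary example.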

Predefined Alert Rules

Category | Alert rule | Severity | Description
Configuration | VG without LV

LOW

There are no logical volumes in a volume group!

SUGGESTED ACTION:
Check if the Volume Group can be deleted.

 RAID1 disk without redundancy 

MEDIUM

A valid RAID1 array must consist of two disks!

SUGGESTED ACTION:
Check which configuration is intended: Either remove the RAID from the disk or add a 2nd one so that data can properly be mirrored.

 SCSI queue depth different to block device queue depth 

MEDIUM

The SCSI device (=OS volume) queue depth should be equal to the block device queue depth.

SUGGESTED ACTION:
Adjust either queue depth value. Block device queue depth is defined in /sys/block//queue/nr_requests and SCSI device queue depth is defined in /sys/bus/scsi/H:C:T:L/queue_depth. 

 Storage adapter without disk paths 

LOW

A valid RAID1 array must consist of two disks!

SUGGESTED ACTION:
Check which configuration is intended: Either remove the RAID from the disk or add a 2nd one so that data can properly be mirrored.

 Disk contains unused, unpartitioned capacity 

LOW

Unpartitioned space on a disk indicates that the space does not belong to any partition and no data can be written to it.

SUGGESTED ACTION:
Take advantage of this additional space for resizing a file system.

Status | Disk path unavailable

MEDIUM

There are disk paths in an unhealthy state. All disk paths should be running.

SUGGESTED ACTION:
Please check the disk path.

 FC adapter not online 

MEDIUM

There are FC adapters that are not online.

SUGGESTED ACTION:
Please check why the adapter is not online.

 Disk state error 

MEDIUM

Disk state is not "running".

SUGGESTED ACTION:
Please check the corresponding disk to solve the problem.

 LV state error 

MEDIUM

Logical volume state is not "running".

SUGGESTED ACTION:
Please check the corresponding logical volume to solve the problem.

 Network adapter not up 

MEDIUM

There are network adapters not in state "UP,LOWER_UP".

SUGGESTED ACTION:
Please check why the network adapter is not up. Maybe it is not in use and can be removed from the system.

BVQ Expert GUI representation

All objects, attributes and statistics of this platform can be interactively browsed within the BVQ Expert GUI. 

Where to find Linux objects?

Similar to all other objects, they show up in the Path browser inside the property panel of the BVQ Expert GUI or can be opened as a Table view from the Favorite menu. Several Linux objects can be combined in an end to end relation to objects of other platforms.

Path browser | Table views

Object attributes

BVQ displays a lot of interesting attributes provided by Linux. Please explore the BVQ Linux table view favorites to get an impression of the most important attributes.

Performance and error statistics

The BVQ Agent for Linux collects many performance statistics using njmon.

Typical questions around performance, load and utilization are answered by BVQ:

  • Is the LPAR performance negatively affected by the SAN congestion or storage latencies?
  • Which LPAR puts a heavy load on a storage volume?
  • Does a latency peak have influence on the performance of an LPAR?
  • Does LPAR load generate latency peaks in storage?
  • Is the LPAR or host performance affected by slow drainers in the SAN?
  • How is the LPAR connected to the SAN?

Predefined Favorites

Specific Table view type favorites are available for Linux and AIX. General overview favorites are also defined and can be found in the menu path: Favorites / System / OS / Generic. See the "Generic OS model" section below for more information.

Generic OS model

Brief description

On top of the specific AIX and Linux OS models, a third model, the Generic OS model, is defined in BVQ. It links generic objects, attributes and performance statistics to the specific models. The generic model concept was introduced in BVQ version 6.2. It provides a unified view of essential configuration and statistical data. With that, you do not need to care about the OS type of your VM or LPAR: you can define abstract dashboards, reports, alert rules or BVQ Expert GUI Favorites for the Generic OS model. All links and relations between generic objects and others are based on the specific models below it.

Object types

A bunch of object types come along with the Generic BVQ OS model. The following table provides an overview of those object types, their meaning and relation to the Linux or AIX model object types and whether they offer performance information:

Group | BVQ Name | Description | AIX OT | Linux OT | Perf?
System | OS Instance | Instance of an operating system running on a physical or virtual machine. | AIX Instance | Linux Instance | (tick)
System | OS Physical CPU | Physical CPU core reported by the OS. It can be a real physical core or one virtualized by a hypervisor. | AIX vCPU | Linux vCPU | (error)
System | OS Logical CPU | Logical CPU running as a thread on a multithreaded physical CPU core. | AIX lCPU | Linux lCPU | (tick)
Network | OS Network adapter | Local area network adapter (usually Ethernet) device. | AIX Network adapter | Linux Network adapter | (tick)
Storage | OS Volume | Block storage disk/volume device provisioned by an integrated or external storage system attached via storage adapter. | AIX Disk | Linux Disk | (tick)
Storage | OS Volume path | Path between a Storage disk/volume device and a storage subsystem target port mapped to a storage adapter. | AIX Disk path | Linux Disk path | (error)
Storage | OS Storage adapter | Host bus adapter device to access storage devices via FC, SCSI, NVMe or other block based protocols. | AIX Storage adapter | Linux Storage adapter | (tick)
Storage | OS Filesystem | Filesystem mounted from a block storage volume or network file system. | AIX Filesystem | Linux Filesystem | (tick)
Storage | OS Swap space | Swap or Paging space to expand physical memory to virtual memory. | AIX Paging space | Linux Swap space | (tick)
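
For external tooling that has to translate between the models, the table above can be captured as a lookup structure. This is an illustrative sketch only: the type names are copied from the table, while the dictionary `GENERIC_OS_MODEL` and the function `generic_type_of` are hypothetical and not part of any BVQ API.

```python
# Generic OT -> (AIX OT, Linux OT), copied from the table above.
GENERIC_OS_MODEL = {
    "OS Instance": ("AIX Instance", "Linux Instance"),
    "OS Physical CPU": ("AIX vCPU", "Linux vCPU"),
    "OS Logical CPU": ("AIX lCPU", "Linux lCPU"),
    "OS Network adapter": ("AIX Network adapter", "Linux Network adapter"),
    "OS Volume": ("AIX Disk", "Linux Disk"),
    "OS Volume path": ("AIX Disk path", "Linux Disk path"),
    "OS Storage adapter": ("AIX Storage adapter", "Linux Storage adapter"),
    "OS Filesystem": ("AIX Filesystem", "Linux Filesystem"),
    "OS Swap space": ("AIX Paging space", "Linux Swap space"),
}


def generic_type_of(specific_type):
    """Return the generic object type for an AIX or Linux type, or None."""
    for generic, specifics in GENERIC_OS_MODEL.items():
        if specific_type in specifics:
            return generic
    return None


print(generic_type_of("AIX Paging space"))
print(generic_type_of("Linux Disk"))
```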

Predefined Alert Rules

Category | Alert rule | Severity | Description
Configuration | BVQ OS Agent version

MEDIUM

OS Agent version is incompatible with BVQ Scanner version

SUGGESTED ACTION:
Please go to "Scanner" and locate the OS scanner configuration. Click on "Edit" and download the agent installation package.
Use this package to upgrade all OS Instances which are not running the latest BVQ Agent version.

OS Volume path redundancy check

MEDIUM

OS Volume path not redundant.

SUGGESTED ACTION:

Typically, 4 or 8 paths per LUN are optimal for performance and redundancy. There are circumstances where the maximum of 16 paths should be configured.

OS Filesystem inodes usage

HIGH


Capacity | OS Filesystem usage

HIGH

Filesystem is running out of space!

SUGGESTED ACTION:
Delete files or increase the filesystem.

 OS Swap space usage 

HIGH

Paging or swap space is running out of space!

SUGGESTED ACTION:
Check whether memory usage of running processes can be reduced or increase the swap space.

Performance | OS Volume too busy

HIGH

OS Volume (Disk) is very busy!

SUGGESTED ACTION:
The queue depth of the OS Volume might need to be changed or the workload should be reduced.

 OS Run queue too high 

HIGH

There are insufficient computing resources to handle the workload.

SUGGESTED ACTION:
Check or change the workload or adjust the number of threads per physical CPU. 

SCSI queue depth too low

HIGH

Configured OS Volume SCSI queue depth too low.

SUGGESTED ACTION:
The SCSI queue depth is a configurable attribute per disk/volume.
Setting it too low results in a lower bandwidth than possible.
Setting it too high risks overloading the target storage device, which then responds with latency-increasing queue full messages.

Set the queue depth according to the vendor documentation of the attached storage device.

OS Network adapter utilization

HIGH

Network adapter with high utilization detected.

SUGGESTED ACTION:
Check and reduce adapter workload or redistribute workload to other network adapters.

OS Volume high latency

MEDIUM

Volume is responding slowly. This might lead to performance degradation or system instability.

SUGGESTED ACTION:
Please check the corresponding volume and storage system to solve the problem.

Predefined Web Dashboards

Dashboard | Highlights
PowerVM - LPAR/OS end to end performance | Comprehensive view across all layers from OS Instance (Linux or AIX) LPAR with essential CPU, memory, storage, and network resource consumption down to its PowerVM System, attached VIOS, SAN and storage systems.

BVQ Expert GUI representation

All objects, attributes and statistics of this model can be interactively browsed within the BVQ Expert GUI. 

Where to find Generic OS objects?

Similar to all other objects, they show up in the Path browser inside the property panel of the BVQ Expert GUI or can be opened as a Table view from the Favorite menu. Several Generic OS objects can be combined in an end to end relation to objects of other platforms.

Path browser | Table views

Object attributes

BVQ displays a lot of interesting attributes provided by an OS. Please explore the BVQ Generic OS table view favorites to get an impression of the most important attributes.

Performance and error statistics

The BVQ Agent for Linux or AIX collects many performance statistics using njmon.

Typical questions around performance, load and utilization are answered by BVQ:

  • Is the LPAR performance negatively affected by the SAN congestion or storage latencies?
  • Which LPAR puts a heavy load on a storage volume?
  • Does a latency peak have influence on the performance of an LPAR?
  • Does LPAR load generate latency peaks in storage?
  • Is the LPAR or host performance affected by slow drainers in the SAN?
  • How is the LPAR connected to the SAN?

Predefined Favorites

Generic OS Table view type favorites are available. General overview favorites are also defined and can be found in the menu path: "Favorites / System / OS / Generic":


To view performance statistics, we recommend a first look at the Favorite "General OS performance overview". Select one or multiple OS Instance objects, and refresh the view by menu entry Window > Refresh.

AIX OS model refinements

To adapt the existing AIX OS model to the Generic OS model, we had to change some Object and Attribute definitions. These changes could break compatibility with your self-defined dashboards, reports, alert rules, GUI favorites or external applications using the BVQ REST API. It should be easy to adapt these to the new model. Please do not hesitate to ask our BVQ support team for assistance. The following table provides an overview of the most notable changes:

Change | Notes

Removed OT "AIX System"

This was a duplicate of the already existing PVM LPAR, PVM VIOS or PVM Partition OT.
Please use these to address the attributes and stats of the AIX System OT.

Joined OT "AIX Storage adapter"

The previous OT "AIX Disk adapter" and "AIX FC adapter" were joined into "AIX Storage adapter".

Joined OT "AIX Network adapter"

The previous OT "AIX Network interface" was joined into "AIX Network adapter". That implies the restriction, that an Ethernet adapter, e.g. "ent0" may only provide one type of interface, either "en0" or the rarely used "et0". Both at the same time are not supported. 

New OT "AIX Physical volume"

That adds support for configurations, where an AIX Disk (hdisk) is not used as an LVM PV. All allocated / unallocated capacity attributes from AIX Disk are moved to this OT.

OT "AIX Disk path" now visible

This OT was enriched with additional attributes. With that it made sense to show these objects in BVQ.

Attribute key ID & name changes | Many attribute names and key names have been changed to generic names without the "aix_" prefix.
Performance stat key ID changes

Most of the performance key IDs are unchanged. Exceptions are:

AIX Instance:

  • kernel_run_queue_per_occurrence > KERNEL_THREADS_RUNNING_AVG
  • kernel_proc_iowait > KERNEL_THREADS_BLOCKED_CUR
  • mem_paging_space_pages_in > MEM_PAGING_PAGE_INS
  • mem_paging_space_pages_out > MEM_PAGING_PAGE_OUTS
"AIX Instance" additional Attributes

os_build_date, boot_time, vendor, mtm, serial_number, processor_type, processor_speed

"AIX Filesystem" additional Attributes 

fs_nfs_remote_host, fs_nfs_remote_path

"AIX Disk" additional Attributes 

device_path, mtm, volume_queue_depth, generic_status

"AIX Disk path" additional Attributes 

hostmap_lun, fc_wwnn, fc_nportid

"AIX Storage adapter" additional Attributes 

fc_wwnn, port_speed_connected_rate

"AIX Network adapter" additional Attributes 

port_speed_connected_rate, generic_status
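
For external scripts that still reference the old AIX Instance performance key IDs, the four renames listed above can be applied with a small mapping. This is a sketch under assumptions: how keys appear in your own client code is unknown, and `cpu_user` is a made-up example of an unchanged key.

```python
# Old AIX Instance performance key ID -> new key ID, as listed above.
RENAMED_KEYS = {
    "kernel_run_queue_per_occurrence": "KERNEL_THREADS_RUNNING_AVG",
    "kernel_proc_iowait": "KERNEL_THREADS_BLOCKED_CUR",
    "mem_paging_space_pages_in": "MEM_PAGING_PAGE_INS",
    "mem_paging_space_pages_out": "MEM_PAGING_PAGE_OUTS",
}


def migrate_keys(keys):
    """Replace renamed key IDs and leave all other keys untouched."""
    return [RENAMED_KEYS.get(key, key) for key in keys]


# "cpu_user" is a hypothetical key that is assumed to be unchanged.
print(migrate_keys(["kernel_proc_iowait", "cpu_user"]))
```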

Predefined Web Dashboards

Dashboard | Highlights
PowerVM - LPAR/AIX end to end performance | Comprehensive view across all layers from AIX Instance LPAR with essential CPU, memory, storage, and network resource consumption down to its PowerVM System, attached VIOS, SAN and storage systems.

Alerting: Occurrence counter

Up to this release of BVQ, a custom alert rule changed its state as soon as its condition was violated. Often, however, a single violation of a threshold is not critical, and only multiple violations within a certain timeframe indicate a real issue.

To be able to express such violations in a custom alert rule, the BVQ alerting has been enhanced by a new feature called "Occurrence counter".

An Occurrence counter allows you to define violation counts for each condition and desired Alert level.

By default, Occurrence counters are disabled. To enable them, turn on the "USE OCCURRENCE COUNTER" switch and add one or more Occurrence counter definitions to your BVQ Alert condition:

After addition, choose a type and fill in the mandatory information:

Attribute | Values | Description
Type | Type of this Occurrence counter | Two Occurrence counter types can be selected:
  • Violations per time - define how often within a certain timeframe a condition must match until the condition counts as violated
  • Consecutive violations - define how many consecutive violations must occur until the condition counts as violated
Time | Minutes | Only available for type "Violations per time" if SLA mode is turned off. It contains the width of the sliding window timeframe in minutes.
Amount | Number of violations | Maximum number of violation occurrences that may be counted until the Alert level is raised.
Alert level | BVQ Alert level | Desired Alert level, one of: OK, INFO, WARN, ERROR, UNKNOWN

Save the alert condition after you are done.

You can add multiple Occurrence counters to each Condition. That is useful to define multiple Alert levels with separate counts and to combine "per time" with "consecutive" violations. You can reorder them by dragging them to the desired position. As soon as one of the upper ones matches, the ones below it are no longer evaluated, so keep the largest counts at the top of the list.
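
The evaluation order described above can be sketched as follows. This is not BVQ's implementation: it assumes one measurement per minute, treats a counter as matched once the count reaches the configured amount, and uses hypothetical names (`OccurrenceCounter`, `evaluate`).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OccurrenceCounter:
    kind: str                      # "per_time" or "consecutive"
    amount: int                    # number of violations needed to match
    level: str                     # Alert level raised on a match
    minutes: Optional[int] = None  # sliding window width, "per_time" only


def evaluate(history: List[bool], counters: List[OccurrenceCounter],
             default: str = "OK") -> str:
    """Return the level of the first matching counter, top to bottom.

    history holds one boolean per minute, oldest first; True means the
    alert condition was violated in that minute.
    """
    for counter in counters:
        if counter.kind == "per_time":
            window = history[-counter.minutes:] if counter.minutes else history
            if sum(window) >= counter.amount:
                return counter.level
        elif counter.kind == "consecutive":
            run = 0
            for violated in reversed(history):
                if not violated:
                    break
                run += 1
            if run >= counter.amount:
                return counter.level
    return default


counters = [
    OccurrenceCounter("per_time", amount=4, level="ERROR", minutes=5),
    OccurrenceCounter("consecutive", amount=3, level="WARN"),
]
print(evaluate([False, False, True, True, True], counters))
```

Because the first match wins, placing the counter with the largest count first ensures that the most severe level is reported when several counters would match.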


SLA timing mode for BVQ Occurrence counter

Another feature of the BVQ Alerting Occurrence counter is the Service Level Agreement (SLA) timing mode. If you need to prove that a given condition is not violated in a fixed timeframe, the SLA mode is what you are looking for.

Differences between Standard timing and SLA timing mode for a BVQ Occurrence counter:

Standard timing mode | SLA timing mode
Intended to be used for normal monitoring purposes | Intended to prove the health of service level objectives
Allows defining a separate timing for each Occurrence counter definition | Forces all Occurrence counter definitions to use the configured SLA timing
Uses the timing for a sliding window | Uses the timing for a fixed window, aligned to Monday 00:00
Resets to the default Alert level (typ. OK) when the measurements in the sliding window are below the counts of all Occurrence counter definitions | Resets to the default Alert level (typ. OK) at each start of the SLA interval fixed window


Advanced example: You need to attest that a group of storage volumes (SVC VDisks) stays within a complex set of boundaries. Main condition: 5-minute average latencies > 20ms must not occur more often than 40 times in the last day. Further constraints: the latency only counts for this rule if the average IO rate, transfer size, cache hit (> 30%) and R/W ratio are within defined boundaries.

BVQ Alert rule
Name | Daily SVC VDisk SLA violated
Performance indicator timing | 5 minutes
SLA interval | 1 day
AR Condition (simplified) | Latency > 20ms AND IO rate > 100IO/s AND Transfer size < 8k AND Cache % hit > 30% AND R% > 50%
1. AR Condition occurrence counter | ERROR | 40 times per SLA interval
2. AR Condition occurrence counter | WARN | 20 times per SLA interval
3. AR Condition occurrence counter | INFO | 10 times per SLA interval
4. AR Condition occurrence counter | INFO | 5 times in a row per SLA interval

While the normal occurrence counter timing is based on a shorter sliding window (some minutes), the SLA timing mode is based on a fixed window to assess the state of a custom alert rule over a larger defined timeframe (hours, days, weeks). The SLA timing fixed window is aligned to Monday 00:00. At the start of each SLA interval, the Alert level is switched back to the default level (OK), and as soon as the occurrence count is met, the Alert level is raised to the level configured in the occurrence counter. For the example above, a day could look like this:

  • 00:00 → set to OK
  • 12:00 → raised to INFO directly when occurrence is = 10
  • 14:00 → raised to WARN at 20 occurrences
  • 16:00 → raised to ERROR at 40 occurrences
  • 00:00 next day → reset to OK
  • ...

If SLA timing mode is enabled, an occurrence counter must be defined in each Alert condition and all occurrence counters are restricted to use the SLA timing window instead of the individual ones.

By default, SLA mode is disabled. To enable it, change the SLA INTERVAL from "SLA mode not enabled" to the desired timeframe.
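
The fixed-window alignment and the escalation within one SLA interval can be illustrated with a short sketch. It rests on assumptions: windows are counted from the most recent Monday 00:00 (which fits intervals that divide a week evenly), timestamps are naive local times, and the names `sla_window_start` and `sla_level` are hypothetical.

```python
from datetime import datetime, timedelta


def sla_window_start(ts, interval):
    """Start of the fixed SLA window containing ts, aligned to Monday 00:00."""
    monday = (ts - timedelta(days=ts.weekday())).replace(
        hour=0, minute=0, second=0, microsecond=0)
    # Number of complete intervals elapsed since Monday 00:00.
    completed = (ts - monday) // interval
    return monday + completed * interval


def sla_level(count, thresholds, default="OK"):
    """Map a violation count to an Alert level; thresholds are largest first."""
    for amount, level in thresholds:
        if count >= amount:
            return level
    return default


# Thresholds taken from the example alert rule (per SLA interval).
THRESHOLDS = [(40, "ERROR"), (20, "WARN"), (10, "INFO")]

ts = datetime(2022, 6, 15, 13, 30)  # a Wednesday
print(sla_window_start(ts, timedelta(days=1)))
print(sla_level(25, THRESHOLDS))
```

At each new window start the count begins at zero again, which corresponds to the reset to OK described above.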

Alerting: Which objects violate an Alert rule most frequently?

If you are interested to know which objects most often violate an alert rule in a given timeframe, BVQ now offers a new table to represent this. To see that, drill down to the Results of a single Alert rule in the BVQ Web GUI (Alerting > Results > of Alert rule). Find the new table "Historically matched objects" here:

The table shows the 500 objects that most often violated the rule in the timeframe given by the configured "Chart timing" field of the "Filter" panel above the table.
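
Conceptually, such a ranking is a frequency count over the rule's historical matches. The sketch below only illustrates that idea and says nothing about how BVQ computes the table internally; the input format (one object name per match) and the function `most_violating` are assumptions.

```python
from collections import Counter


def most_violating(matched_objects, limit=500):
    """Return (object, match count) pairs, most frequent first."""
    return Counter(matched_objects).most_common(limit)


# Hypothetical match history: one entry per rule match in the timeframe.
matches = ["vdisk_a", "vdisk_b", "vdisk_a", "vdisk_c", "vdisk_a", "vdisk_b"]
print(most_violating(matches, limit=2))
```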

Grafana: Alert result tables

BVQ Alerting is one of the most valuable features for getting an overview of the monitored system. From now on it is possible to show the results known from the BVQ SHM in Grafana as well.

We introduced a new query mode in the BVQ Plugin called "Alert result table", with different options to control the appearance. As Grafana does not have a sunburst visualization, the results are shown in tabular format.


SHM Basics

  • The BVQ SHM has 4 different views, which differ in how alert results are grouped and aggregated
  • Each of these views is based on a chart with multiple layers
  • Each layer represents a different aggregation of alert rules
  • Each view can be filtered by system


Query

It is possible to select different queries, which result in different representations of the results. This reflects the different representations available in the BVQ-Server.

Result mode

The result mode defines which layers are requested from the server and shown. Layers are the different rings in an SHM.

Show aggregated results | All levels are shown from the selected entry level down to the alert rule.
Show aggregated results for layer | Only the selected level is shown.
Show alert rules | Just the alert rules are shown, grouped by the selected level. Higher levels are not shown.

Systems

To narrow down the results further, it is possible to filter them by BVQ Systems. To do so, define a Grafana template variable to be used in this field.

Layer selection

The last option depends on the selected query. It lets you specify the layer at which you want to enter the hierarchy.

Examples

The examples will show the SHM in a view displayed in Grafana.

Miscellaneous

User experience evaluation with matomo

In order to optimize and enhance the user experience of the BVQ Web GUI, we are now asking the user for permission to collect usage data. Of course, the data collected is strictly anonymous! BVQ is using a tool called Matomo, which is hosted locally within SVA GmbH.
Please support us by agreeing and activating this feature via "Send usage data".

Please note: BVQ will enter maintenance mode after every restart as long as this option is disabled!

Changing the option is possible at any time in the user settings:

BVQ Editor in WebUI

The Expert GUI can be used to qualify BVQ objects. Part of this functionality is now available in the Web UI as well which makes modification of multiple objects much easier and faster. In the main menu in the Web UI, you can find a new section "Editor". Go to "Room / Site" to add, edit or delete room and site assignments or go to "Controller" to do the same for controller merge objects.

Drag & drop editors to assign Platform HW Objects to BVQ Room, Site and SVC Controllers to BVQ Controller groups

Aspects for all chartable attributes in Expert GUI

Objects in BVQ Treemaps can be sized using aspects. In previous BVQ releases only selected attributes and all performance metrics were available for aspects. In this release of BVQ, all chartable attributes can be chosen for aspects. You can select them below the "By attribute" entry in the Aspect menu:

Reporting Enhancements

The usability of the reporting template editor has been improved with the following new features:

  1. Header levels for the table of contents can be defined
  2. Snippets show content excerpt in the header for better clarity
  3. Table snippets can show a result row at the bottom


Grafana Dashboards

Several dashboards for new and existing platforms have been added and existing dashboards have been reworked and improved to utilize the new features in Grafana 8.

Requirements and restrictions 

Requirements of the HW/SW environment

Please see Supported Environments

Minimum BVQ version required for an update

2021.H2

Known Issues

See https://customercenter.sva.de/home/x/NwwgAw
(support agreement needed to get access credentials)
