Scanner Issues
Detecting a scanner problem is easy because the affected scanners are listed at the top of the scanner homepage in the WebUI. Finding the root cause of the failure can be more difficult. Here are some tips on how to troubleshoot scanner issues:
...
Critical configurations are listed at the top of the page. There are two buttons for each scan category (Topology, Performance, Event, HFTCS): the left one shows the scan status, the right one the persist status.
Click on a red icon to see the log file for the selected item.
Set severity to "Error" to limit the amount of information.
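When a log file has been exported from the WebUI, the same severity filter can be applied offline. The following is a minimal Python sketch, assuming the line layout seen in the log excerpts on this page (timestamp, severity, thread in square brackets, message). The function name `filter_by_severity` is hypothetical and not part of BVQ.

```python
import re

# Assumed layout: "<timestamp> <SEVERITY> [<thread>]: <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<severity>[A-Z]+)\s+\[(?P<thread>[^\]]+)\]:\s*(?P<message>.*)$"
)


def filter_by_severity(lines, severity="ERROR"):
    """Return parsed entries with the given severity; other lines are skipped."""
    entries = []
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("severity") == severity:
            entries.append(match.groupdict())
    return entries


sample = [
    "2023-06-30T12:07:51,354 INFO [performance-scan_Worker-7]: Execution of command "
    "[ScanPowerVmPerformanceData] took [1204342]ms [HMC_xxx] [PerfScan] (RawCommandExecutorImpl)",
    "2021-07-19T15:43:45,609 ERROR [persistExecutor_3]: Error during command execution: "
    "Error during powervm topology persist execution! [TopoPersist] (AbstractCommandExecutor)",
]

errors = filter_by_severity(sample)
for entry in errors:
    print(entry["ts"], entry["thread"], entry["message"])
```

Continuation lines such as stack traces or XML bodies do not start with a timestamp and are therefore skipped by this sketch.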
In most cases, the error can now be narrowed down to either a configuration issue or a product defect:
...
No Format |
---|
Got error from Brocade Switch with IP 10.10.101.147! (body: {
"errors": {
"error": [
{
"error-type": "application",
"error-tag": "operation-failed",
"error-app-tag": "Error",
"error-path": "/rest/login",
"error-message": "Max limit for REST sessions reached",
"error-info": {
"error-code": 14,
"error-module": "auth"
} |
BVQ Version: 6.2 and above
Suggested Action:
By default, the REST session limit on Brocade switches is set to 3. The number of REST sessions can be increased up to 10 by using the CLI command mgmtapp --config -maxrestsession <1...10>
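To tell this session-limit error apart from other login failures, the error body can be inspected programmatically. The Python sketch below is illustrative only: the closing brackets of the JSON body are completed by assumption because the excerpt above is truncated, and the helper name `rest_session_limit_reached` is hypothetical.

```python
import json

# Error body as shown in the log excerpt; the closing brackets are
# completed here for illustration (assumption, the excerpt is truncated).
body = """
{
  "errors": {
    "error": [
      {
        "error-type": "application",
        "error-tag": "operation-failed",
        "error-app-tag": "Error",
        "error-path": "/rest/login",
        "error-message": "Max limit for REST sessions reached",
        "error-info": {"error-code": 14, "error-module": "auth"}
      }
    ]
  }
}
"""


def rest_session_limit_reached(error_body):
    """Return True if any error entry reports the REST session limit."""
    data = json.loads(error_body)
    entries = data.get("errors", {}).get("error", [])
    return any(
        "Max limit for REST sessions" in entry.get("error-message", "")
        for entry in entries
    )


limit_hit = rest_session_limit_reached(body)
print(limit_hit)
```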
Error Message: HTTP-Error 500 "Unable to connect to Database"
No Format |
---|
2021-04-08T09:17:37,383 ERROR [topology-scan_Worker-1]: Error executing call to [/rest/api/uom/ManagedSystem/<system_id>/<object_type>], got status [500 INTERNAL_SERVER_ERROR] with message [
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:ns2="http://a9.com/-/spec/opensearch/1.1/" xmlns:ns3="http://www.w3.org/1999/xhtml">
<id>5fc9dc1c-03be-4e80-b8fb-4f9ea9dc0b4a</id>
<title>HttpErrorResponse</title>
<published>2021-04-08T09:17:46.640+02:00</published>
<author>
<name>IBM Power Systems Management Console</name>
</author>
<content type="application/vnd.ibm.powervm.web+xml; type=HttpErrorResponse">
<HttpErrorResponse:HttpErrorResponse xmlns:HttpErrorResponse="http://www.ibm.com/xmlns/systems/power/firmware/web/mc/2012_10/" xmlns="http://www.ibm.com/xmlns/systems/power/firmware/web/mc/2012_10/" xmlns:ns2="http://www.w3.org/XML/1998/namespace/k2" schemaVersion="V1_0">
<Metadata>
<Atom/>
</Metadata>
<HTTPStatus kb="ROR" kxe="false">500</HTTPStatus>
<RequestURI kxe="false" kb="ROR">/rest/api/uom/ManagedSystem/3870ebf9-dd78-31e5-a91e-719c4f86178b/NetworkBridge</RequestURI>
<ReasonCode kb="ROR" kxe="false">Unknown internal error.</ReasonCode>
<Message kb="ROO" kxe="false">com.ibm.pmc.rest.provider.exceptions.RESTProviderException: Exception While getting SEA:: Unable to connect to Database. </Message>
<RequestBody kb="ROO" kxe="false"/>
<RequestHeaders kxe="false" kb="ROO">
{x-forwarded-server=hmc101.localdomain, x-forwarded-host=172.16.146.192:12443, X-Transaction-ID=XT10011179, host=172.16.146.192:12443, connection=Keep-Alive, x-api-session=9PLm73Uh1Zg84V-wivc1TrsRpNLdrV-VbWeDWSfBEheHxNiuwGEgl_0OWwcLpFlVDaydA2z37DWTvHLMo9McsEwN9X_at8GfE5_ayfeF9Qjzf2EGlzYxJW0G4BzEaznehGmjRR7GfsY3ktUmcte6LT_-JHq5tCqUzfx4nVz6E9wu4dbG2Bo-DhJnkH81b1HLqpjCZyPQlAHf1fqpFOFWr29_7xq-GN_J5tiE1zwSXvY=, x-forwarded-for=172.16.168.84, accept-encoding=gzip,deflate, accept=application/vnd.ibm.powervm.uom+xml;, user-agent=Apache-HttpClient/4.5.12 (Java/11)}
</RequestHeaders>
</HttpErrorResponse:HttpErrorResponse>
</content>
</entry>
] (PowerVmClient) |
BVQ Version: 2021.H1.3 and above
Suggested Action:
The Postgres DB on the VIO servers, which the HMC queries for system information (virtual networks, virtual storage, etc.), is broken, or the Postgres service (vio_daemon) is not running. This causes HTTP 500 error messages on both the HMC and the BVQ scanner.
The issue can be fixed using the following commands on the VIOS:
No Format |
---|
ssh padmin@${vios}
$ oem_setup_env
# stopsrc -s vio_daemon
# /usr/sbin/slibclean
# rm -rf /home/ios/CM
# startsrc -s vio_daemon -a '-d 4' |
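The HMC error body in the log above is an Atom entry that wraps an HttpErrorResponse element. The Python sketch below shows one way to pull out the status, request URI and message so that errors of this kind can be compared quickly. It assumes the XML has already been cut out of the surrounding log line (the text between `with message [` and `] (PowerVmClient)`); the sample is a shortened copy of the response above, and the function name `summarize_hmc_error` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespaces as they appear in the logged response
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "hmc": "http://www.ibm.com/xmlns/systems/power/firmware/web/mc/2012_10/",
}

# Shortened copy of the logged response (headers and metadata omitted)
sample = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>HttpErrorResponse</title>
  <content type="application/vnd.ibm.powervm.web+xml; type=HttpErrorResponse">
    <HttpErrorResponse:HttpErrorResponse xmlns:HttpErrorResponse="http://www.ibm.com/xmlns/systems/power/firmware/web/mc/2012_10/" xmlns="http://www.ibm.com/xmlns/systems/power/firmware/web/mc/2012_10/" schemaVersion="V1_0">
      <HTTPStatus kb="ROR" kxe="false">500</HTTPStatus>
      <RequestURI kxe="false" kb="ROR">/rest/api/uom/ManagedSystem/3870ebf9-dd78-31e5-a91e-719c4f86178b/NetworkBridge</RequestURI>
      <ReasonCode kb="ROR" kxe="false">Unknown internal error.</ReasonCode>
      <Message kb="ROO" kxe="false">com.ibm.pmc.rest.provider.exceptions.RESTProviderException: Exception While getting SEA:: Unable to connect to Database. </Message>
    </HttpErrorResponse:HttpErrorResponse>
  </content>
</entry>"""


def summarize_hmc_error(xml_text):
    """Return status, URI and message of an HttpErrorResponse, or None."""
    root = ET.fromstring(xml_text)
    response = root.find("atom:content/hmc:HttpErrorResponse", NS)
    if response is None:
        return None

    def text_of(tag):
        return response.findtext(f"hmc:{tag}", default="", namespaces=NS).strip()

    return {
        "status": text_of("HTTPStatus"),
        "uri": text_of("RequestURI"),
        "message": text_of("Message"),
    }


summary = summarize_hmc_error(sample)
print(summary["status"], summary["uri"])
print(summary["message"])
```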
HMC responding slowly to REST API calls
Typically, REST API calls to the HMC are answered within milliseconds to a few seconds, depending on the amount of data requested by the command. If these calls take much longer and performance scan durations increase to an unacceptable level, the problem might be caused by a bug in the IBM HMC.
No Format |
---|
2023-06-30T12:07:51,354 INFO [performance-scan_Worker-7]: Execution of command [ScanPowerVmPerformanceData] took [1204342]ms [HMC_xxx] [PerfScan] (RawCommandExecutorImpl) |
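Log lines like the one above can be searched for unusually long executions. The Python sketch below is a rough aid, assuming the wording "Execution of command [...] took [...]ms" seen in this excerpt. The threshold of 60,000 ms and the function name `slow_commands` are arbitrary assumptions and should be adapted to the scan interval in use.

```python
import re

# Pattern based on the log excerpt above
DURATION = re.compile(
    r"Execution of command \[(?P<command>[^\]]+)\] took \[(?P<ms>\d+)\]ms"
)


def slow_commands(lines, threshold_ms=60_000):
    """Return (command, duration_ms) pairs for executions above threshold_ms."""
    hits = []
    for line in lines:
        match = DURATION.search(line)
        if match and int(match.group("ms")) > threshold_ms:
            hits.append((match.group("command"), int(match.group("ms"))))
    return hits


sample = [
    "2023-06-30T12:07:51,354 INFO [performance-scan_Worker-7]: Execution of command "
    "[ScanPowerVmPerformanceData] took [1204342]ms [HMC_xxx] [PerfScan] (RawCommandExecutorImpl)",
]

slow = slow_commands(sample)
for command, duration_ms in slow:
    print(f"{command}: {duration_ms / 60000:.1f} min")
```

In the excerpt above, the performance scan command ran for roughly 20 minutes.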
BVQ Version: any
Suggested Action:
Upgrade the HMC to V10R2M1040 + iFix MF71107 or V10R2M1041 or higher.
Important: Upgrading to this level only resolves the issue if performance collection on the HMC was disabled prior to the upgrade. Otherwise, one of the following two methods is required:
- Reinstall the HMC
- Request the pesh password from IBM and reset the Postgres DB on the HMC. This reset procedure deletes all previously collected performance statistics. Note: An IBM maintenance contract is required to get the pesh password:
No Format |
---|
### Postgres DB reset after V10R2M1031 installation
# Just for a better feeling ;-)
hmcshutdown -t now -r
# As hscroot, if possible save PCM data
saveupgdata -r disksftp -h <IP> -u <USER> -d <Directory> -i perfmon[,netcfg] --migrate
# As hscpe
pesh <HMC_Serial>
# Then enter the "password of the day"
su -
# Enter the root password. It can be changed beforehand as hscroot with: chhmcusr -u root -t passwd
# Yes, twice! This resets the Postgres DB - all PCM data will be lost
/opt/hsc/bin/hscSignal 511
/opt/hsc/bin/hscSignal 511
exit
hmcshutdown -t now -r
# Once again, it's another bug :-(
pesh <HMC_Serial>
# Then enter the "password of the day"
su -
/opt/hsc/bin/hscSignal 511
/opt/hsc/bin/hscSignal 511
exit
hmcshutdown -t now -r
# After reboot, enable PCM data collection again
# Not tested yet, restore PCM data
rstupgdata -r sftp -h <IP> -u <USER> -d <Directory> --migrate
|
Error Message: HTTP-Error 500 "Exception while getting SEA" during topo scan
No Format |
---|
2021-04-08T09:17:37,383 ERROR [topology-scan_Worker-61]: Error executing call to [/rest/api/uom/ManagedSystem/b8f44367-98bd-377d-8227-7db6208f1c4c/NetworkBridge], got status [500 INTERNAL_SERVER_ERROR] with message [
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:ns2="http://a9.com/-/spec/opensearch/1.1/" xmlns:ns3="http://www.w3.org/1999/xhtml">
<id>7af9ea16-352a-4ab1-890f-1e24405102e7</id>
<title>HttpErrorResponse</title>
<published>2021-05-18T19:33:27.285+02:00</published>
<author>
<name>IBM Power Systems Management Console</name>
</author>
<content type="application/vnd.ibm.powervm.web+xml; type=HttpErrorResponse">
<HttpErrorResponse:HttpErrorResponse xmlns:HttpErrorResponse="http://www.ibm.com/xmlns/systems/power/firmware/web/mc/2012_10/" xmlns="http://www.ibm.com/xmlns/systems/power/firmware/web/mc/2012_10/" xmlns:ns2="http://www.w3.org/XML/1998/namespace/k2" schemaVersion="V1_0">
<Metadata>
<Atom/>
</Metadata>
<HTTPStatus kxe="false" kb="ROR">500</HTTPStatus>
<RequestURI kb="ROR" kxe="false">/rest/api/uom/ManagedSystem/b8f44367-98bd-377d-8227-7db6208f1c4c/NetworkBridge</RequestURI>
<ReasonCode kb="ROR" kxe="false">Unknown internal error.</ReasonCode>
<Message kb="ROO" kxe="false">com.ibm.pmc.rest.provider.exceptions.RESTProviderException: Exception While getting SEA:: The system is currently too busy to complete the specified request. Please retry the operation at a later time. If the operation continues to fail, check the error log to see if the filesystem is full.
</Message>
<RequestBody kb="ROO" kxe="false"/>
<RequestHeaders kxe="false" kb="ROO">
{x-forwarded-server=hmc3.labwi.sva.de, x-forwarded-host=hmc3.labwi.sva.de:12443, X-Transaction-ID=XT11438532, host=hmc3.labwi.sva.de:12443, connection=Keep-Alive, x-api-session=xnqKwOuPE9APo0rubodXReDeoXN2SnlAXIeEzEu4guge1pd6sg4oCF0WlAE94qpB7NjiX5q8L5xHLJMUuS4LWqRvxopaTucnrOqa6TACGCWhAMYJ4DekkrJtxlpM_s0GkNkoerZ5JSvutYojiYro9N2TNortma44FydeyORKQF260PAUjI2SLytd10mS8PJTpb9uzkbo6h0P0quXOSqRXg==, x-forwarded-for=10.10.120.73, accept-encoding=gzip,deflate, accept=application/vnd.ibm.powervm.uom+xml;, user-agent=Apache-HttpClient/4.5.12 (Java/11)}
</RequestHeaders>
</HttpErrorResponse:HttpErrorResponse>
</content>
</entry>
] (PowerVmClient) |
BVQ Version: 2021.H1.3 and above
Suggested Action:
Different problems can result in this error. One reason might be a full filesystem (as the error message itself suggests). Another reason might be a faulty Shared Ethernet Adapter (SEA). Run the command
No Format |
---|
entstat -all entX |
to see if there are Limbo Packets, which indicate that the SEA has detected that its physical network is not operational.
Error Message: "ObjectNotValidException"
No Format |
---|
2021-07-19T15:43:45,609 ERROR [persistExecutor_3]: Error during command execution: Error during powervm topology persist execution! [TopoPersist] (AbstractCommandExecutor)
de.sva.bvq.data.grid.api.exception.ObjectNotValidException: Create operation not valid for object of type [pvm_physical_volume_to_virtual_io_server]! pvm_physical_volume_location_code: [[[pvm_physical_volume_location_code] must not be null!][[pvm_physical_volume_location_code] is primary key (not auto generated) and must be set!]] |
BVQ Version: 2021.H1.3 and above
Suggested Action:
The information provided by AIX and the information returned via the REST API do not match. In this case, an hdisk looked fine from the AIX point of view but reported no location code when queried via the REST API.
The root cause is not understood, but rebooting the VIO servers one after the other probably fixes the issue.
Error Message: "NullPointerException" during topo persist
BVQ Version: 2021.H1.3 and above
Suggested Action:
This problem can be caused by a corrupt CMDB on a VIO server. There is a script available from IBM to clean up the CMDB which probably fixes the issue.
Error Message: "DuplicateKeyException" during topo persist
BVQ Version: 2022.H1 and above
Suggested Action:
This problem can be caused by a corrupt CMDB on a VIO server. There is a script available from IBM to clean up the CMDB which probably fixes the issue.
Network
Error message: "Max limit for REST sessions reached"
BVQ Version: 6.2 and above
Suggested Action:
By default, the REST session limit on Brocade switches is set to 3. The number of REST sessions can be increased up to 10 by using the CLI command mgmtapp --config -maxrestsession <1...10>
Brocade scanner fails with "authentication failed"
Due to a bug in FOS versions 8.2.3a and 8.2.3a1, communication via SNMP and REST APIs is broken after an update, causing Brocade REST scanners to fail. The root cause is a file descriptor leak that occurs in weblinker during LDAP authentication. Once all file descriptors are consumed, a verify error is logged. This also causes webtools authentication to fail, so that SANnav, BNA or BVQ cannot authenticate with the switch.
BVQ Version: any
Suggested Action:
Workaround
Final Fix
Error message: "E11000 duplicate key error collection: bvq.dgx_brocade_rule"
There is a known Brocade bug which leads to duplicate Brocade rule entries. See Brocade documentation FOS-845272. This bug is fixed in Brocade FOS 9.1.1.c, 9.2.0 or higher.
BVQ Version: 2023.H1.2
Suggested Action:
There are two ways to resolve this issue:
Storage
Error Message: "rbash: line xxx: yyy Killed"
BVQ Version: all
Suggested Action:
None. This is an SVC limitation. The error typically disappears the next time the system is scanned.
Error Message: "the svc raised an error -> CMMVC6098"
BVQ Version: all
Suggested Action:
None. The error occurs when the SVC is busy, e.g. copying files between nodes. The error typically disappears the next time the system is scanned.