Scanner Issues

...

VMware


IBM PowerVM

HMC responding slowly to REST API calls

Typically, the HMC answers REST API calls within milliseconds to a few seconds, depending on the amount of data the command requests. If these calls take much longer and performance scan durations become unacceptably long, the cause might be a bug in the IBM HMC.

No Format
2023-06-30T12:07:51,354 INFO  [performance-scan_Worker-7]: Execution of command [ScanPowerVmPerformanceData] took [1204342]ms [HMC_xxx] [PerfScan] (RawCommandExecutorImpl)
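
To spot affected scans in a BVQ scanner log, lines like the one above can be filtered by their reported duration. The following is a minimal sketch, not part of BVQ: the function name `find_slow_commands` and the 60-second threshold are assumptions chosen for illustration, and the pattern is derived only from the single log line shown above.

```python
import re

# Pattern derived from the sample log line above (assumption: other
# "Execution of command" lines in the log follow the same layout).
TOOK_RE = re.compile(
    r"^(?P<ts>\S+)\s+\w+\s+\[(?P<thread>[^\]]+)\]: "
    r"Execution of command \[(?P<command>[^\]]+)\] took \[(?P<ms>\d+)\]ms"
)


def find_slow_commands(lines, threshold_ms=60_000):
    """Return (timestamp, command, duration_ms) for every command whose
    reported duration exceeds threshold_ms (hypothetical default: 60 s)."""
    slow = []
    for line in lines:
        match = TOOK_RE.match(line)
        if match and int(match.group("ms")) > threshold_ms:
            slow.append(
                (match.group("ts"), match.group("command"), int(match.group("ms")))
            )
    return slow


sample = [
    "2023-06-30T12:07:51,354 INFO  [performance-scan_Worker-7]: "
    "Execution of command [ScanPowerVmPerformanceData] took [1204342]ms "
    "[HMC_xxx] [PerfScan] (RawCommandExecutorImpl)",
]

for ts, command, ms in find_slow_commands(sample):
    print(f"{ts} {command} took {ms / 60000:.1f} min")
```

A duration of 1204342 ms corresponds to roughly 20 minutes for a single command, far beyond the "few seconds" that are normal.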

BVQ Version: any

Suggested Action:

Upgrade the HMC to V10R2 M1031 or higher.

Important: Upgrading to this level only resolves the issue if performance collection on the HMC was disabled before the upgrade. Otherwise, one of the following two methods is required:

  1. Reinstall the HMC
  2. Request the pesh password from IBM and reset the Postgres DB on the HMC. Note: An IBM maintenance contract is required to get the pesh password:
No Format
### Postgres DB reset after V10R2M1031 installation
 
# Optional: reboot first to start from a clean state
hmcshutdown -t now -r
 
# As hscroot, if possible save PCM data
saveupgdata -r disksftp -h <IP> -u <USER> -d <Directory> -i perfmon[,netcfg] --migrate
 
# As hscpe
pesh <HMC_Serial>
# Enter the "password of the day" when prompted
 
su - 
# Enter the root password. You can change it beforehand: as hscroot, run chhmcusr -u root -t passwd
 
# yes twice! This resets the postgres DB - all PCM data will be lost
/opt/hsc/bin/hscSignal 511
/opt/hsc/bin/hscSignal 511
 
exit
 
hmcshutdown -t now -r
 
# Repeat the reset once more; this is needed because of another bug
pesh <HMC_Serial>
# Enter the "password of the day" when prompted
 
su -
 
/opt/hsc/bin/hscSignal 511
/opt/hsc/bin/hscSignal 511
 
exit
 
hmcshutdown -t now -r
 
# After reboot, enable PCM data collection again
 
# Restore PCM data (not tested yet)
 
rstupgdata -r sftp -h <IP> -u <USER> -d <Directory> --migrate 



Error Message: HTTP-Error 500 "Unable to connect to Database" during topo scan

No Format
2021-04-08T09:17:37,383 ERROR [topology-scan_Worker-1]: Error executing call to [/rest/api/uom/ManagedSystem/<system_id>/<object_type>], got status [500 INTERNAL_SERVER_ERROR] with message [
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:ns2="http://a9.com/-/spec/opensearch/1.1/" xmlns:ns3="http://www.w3.org/1999/xhtml">
<id>5fc9dc1c-03be-4e80-b8fb-4f9ea9dc0b4a</id>
<title>HttpErrorResponse</title>
<published>2021-04-08T09:17:46.640+02:00</published>
<author>
<name>IBM Power Systems Management Console</name>
</author>
<content type="application/vnd.ibm.powervm.web+xml; type=HttpErrorResponse">
<HttpErrorResponse:HttpErrorResponse xmlns:HttpErrorResponse="http://www.ibm.com/xmlns/systems/power/firmware/web/mc/2012_10/" xmlns="http://www.ibm.com/xmlns/systems/power/firmware/web/mc/2012_10/" xmlns:ns2="http://www.w3.org/XML/1998/namespace/k2" schemaVersion="V1_0">
<Metadata>
<Atom/>
</Metadata>
<HTTPStatus kb="ROR" kxe="false">500</HTTPStatus>
<RequestURI kxe="false" kb="ROR">/rest/api/uom/ManagedSystem/3870ebf9-dd78-31e5-a91e-719c4f86178b/NetworkBridge</RequestURI>
<ReasonCode kb="ROR" kxe="false">Unknown internal error.</ReasonCode>
<Message kb="ROO" kxe="false">com.ibm.pmc.rest.provider.exceptions.RESTProviderException: Exception While getting SEA:: Unable to connect to Database. </Message>
<RequestBody kb="ROO" kxe="false"/>
<RequestHeaders kxe="false" kb="ROO">
{x-forwarded-server=hmc101.localdomain, x-forwarded-host=172.16.146.192:12443, X-Transaction-ID=XT10011179, host=172.16.146.192:12443, connection=Keep-Alive, x-api-session=9PLm73Uh1Zg84V-wivc1TrsRpNLdrV-VbWeDWSfBEheHxNiuwGEgl_0OWwcLpFlVDaydA2z37DWTvHLMo9McsEwN9X_at8GfE5_ayfeF9Qjzf2EGlzYxJW0G4BzEaznehGmjRR7GfsY3ktUmcte6LT_-JHq5tCqUzfx4nVz6E9wu4dbG2Bo-DhJnkH81b1HLqpjCZyPQlAHf1fqpFOFWr29_7xq-GN_J5tiE1zwSXvY=, x-forwarded-for=172.16.168.84, accept-encoding=gzip,deflate, accept=application/vnd.ibm.powervm.uom+xml;, user-agent=Apache-HttpClient/4.5.12 (Java/11)}
</RequestHeaders>
</HttpErrorResponse:HttpErrorResponse>
</content>
</entry>
] (PowerVmClient)
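
The fields that matter for diagnosis (HTTP status, request URI, and message) are buried in the Atom response. A minimal sketch for extracting them is shown below. It assumes the XML between `with message [` and `] (PowerVmClient)` has already been cut out of the log line; the function name `summarize_hmc_error` is hypothetical, and the sample is a shortened copy of the response above.

```python
import xml.etree.ElementTree as ET

# Namespace of the HttpErrorResponse payload, taken from the response above.
HMC_NS = "http://www.ibm.com/xmlns/systems/power/firmware/web/mc/2012_10/"


def summarize_hmc_error(xml_text):
    """Extract status, URI, reason, and message from an HMC HttpErrorResponse."""
    root = ET.fromstring(xml_text)

    def field(name):
        node = root.find(f".//{{{HMC_NS}}}{name}")
        return node.text.strip() if node is not None and node.text else None

    return {
        "status": field("HTTPStatus"),
        "uri": field("RequestURI"),
        "reason": field("ReasonCode"),
        "message": field("Message"),
    }


# Shortened copy of the response shown above (headers and metadata omitted).
sample = """<entry xmlns="http://www.w3.org/2005/Atom">
<title>HttpErrorResponse</title>
<content type="application/vnd.ibm.powervm.web+xml; type=HttpErrorResponse">
<HttpErrorResponse:HttpErrorResponse xmlns:HttpErrorResponse="http://www.ibm.com/xmlns/systems/power/firmware/web/mc/2012_10/" xmlns="http://www.ibm.com/xmlns/systems/power/firmware/web/mc/2012_10/" schemaVersion="V1_0">
<HTTPStatus kb="ROR" kxe="false">500</HTTPStatus>
<RequestURI kxe="false" kb="ROR">/rest/api/uom/ManagedSystem/3870ebf9-dd78-31e5-a91e-719c4f86178b/NetworkBridge</RequestURI>
<ReasonCode kb="ROR" kxe="false">Unknown internal error.</ReasonCode>
<Message kb="ROO" kxe="false">com.ibm.pmc.rest.provider.exceptions.RESTProviderException: Exception While getting SEA:: Unable to connect to Database. </Message>
</HttpErrorResponse:HttpErrorResponse>
</content>
</entry>"""

info = summarize_hmc_error(sample)
print(info["status"], info["uri"])
print(info["message"])
```

The text after `Exception While getting SEA::` in the message is what distinguishes this issue from other HTTP 500 errors on the same endpoint.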

BVQ Version: 2021.H1.3 and above

Suggested Action:

The Postgres DB on the VIO servers, which the HMC queries for system information (such as virtual networks and virtual storage), is broken, or the Postgres service (vio_daemon) is not running. This causes HTTP 500 error messages on both the HMC and the BVQ scanner.

The issue can be fixed using the following commands on the VIOS:

No Format
ssh padmin@${vios}
$ oem_setup_env
# stopsrc -s vio_daemon
# /usr/sbin/slibclean
# rm -rf /home/ios/CM
# startsrc -s vio_daemon -a '-d 4'



Error Message: HTTP-Error 500 "Exception while getting SEA" during topo scan

No Format
2021-05-18T19:33:27,290 ERROR [topology-scan_Worker-6]: Error executing call to [/rest/api/uom/ManagedSystem/b8f44367-98bd-377d-8227-7db6208f1c4c/NetworkBridge], got status [500 INTERNAL_SERVER_ERROR] with message [
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:ns2="http://a9.com/-/spec/opensearch/1.1/" xmlns:ns3="http://www.w3.org/1999/xhtml">
<id>7af9ea16-352a-4ab1-890f-1e24405102e7</id>
<title>HttpErrorResponse</title>
<published>2021-05-18T19:33:27.285+02:00</published>
<author>
<name>IBM Power Systems Management Console</name>
</author>
<content type="application/vnd.ibm.powervm.web+xml; type=HttpErrorResponse">
<HttpErrorResponse:HttpErrorResponse xmlns:HttpErrorResponse="http://www.ibm.com/xmlns/systems/power/firmware/web/mc/2012_10/" xmlns="http://www.ibm.com/xmlns/systems/power/firmware/web/mc/2012_10/" xmlns:ns2="http://www.w3.org/XML/1998/namespace/k2" schemaVersion="V1_0">
<Metadata>
<Atom/>
</Metadata>
<HTTPStatus kxe="false" kb="ROR">500</HTTPStatus>
<RequestURI kb="ROR" kxe="false">/rest/api/uom/ManagedSystem/b8f44367-98bd-377d-8227-7db6208f1c4c/NetworkBridge</RequestURI>
<ReasonCode kb="ROR" kxe="false">Unknown internal error.</ReasonCode>
<Message kb="ROO" kxe="false">com.ibm.pmc.rest.provider.exceptions.RESTProviderException: Exception While getting SEA:: The system is currently too busy to complete the specified request. Please retry the operation at a later time. If the operation continues to fail, check the error log to see if the filesystem is full. </Message>
<RequestBody kb="ROO" kxe="false"/>
<RequestHeaders kxe="false" kb="ROO">{x-forwarded-server=hmc3.labwi.sva.de, x-forwarded-host=hmc3.labwi.sva.de:12443, X-Transaction-ID=XT11438532, host=hmc3.labwi.sva.de:12443, connection=Keep-Alive, x-api-session=xnqKwOuPE9APo0rubodXReDeoXN2SnlAXIeEzEu4guge1pd6sg4oCF0WlAE94qpB7NjiX5q8L5xHLJMUuS4LWqRvxopaTucnrOqa6TACGCWhAMYJ4DekkrJtxlpM_s0GkNkoerZ5JSvutYojiYro9N2TNortma44FydeyORKQF260PAUjI2SLytd10mS8PJTpb9uzkbo6h0P0quXOSqRXg==, x-forwarded-for=10.10.120.73, accept-encoding=gzip,deflate, accept=application/vnd.ibm.powervm.uom+xml;, user-agent=Apache-HttpClient/4.5.12 (Java/11)}</RequestHeaders>
</HttpErrorResponse:HttpErrorResponse>
</content>
</entry>
] (PowerVmClient)

BVQ Version: 2021.H1.3 and above

Suggested Action:

Several different problems can cause this error. One is a full filesystem, as the error message itself suggests. Another is a faulty Shared Ethernet Adapter (SEA). Run the following command on the VIOS, replacing entX with the SEA device name,

No Format
entstat -all entX

to check for Limbo Packets, which indicate that the SEA has detected that its physical network is not operational.
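
When the entstat output has been saved to a file or captured over SSH, the counter can be checked automatically. The sketch below assumes the counter appears as a line of the form `Limbo Packets: <number>`; the sample text is a hypothetical excerpt, not real output, and the function name `limbo_packet_count` is made up for illustration.

```python
import re

# Assumption: the counter is printed as "Limbo Packets: <number>".
LIMBO_RE = re.compile(r"Limbo Packets:\s*(\d+)")


def limbo_packet_count(entstat_output):
    """Return the first Limbo Packets counter found, or None if absent."""
    match = LIMBO_RE.search(entstat_output)
    return int(match.group(1)) if match else None


# Hypothetical excerpt for illustration only; real output contains many more lines.
sample_output = """
    Limbo Packets: 12
"""

count = limbo_packet_count(sample_output)
if count is None:
    print("No Limbo Packets counter found; is entX really an SEA?")
elif count > 0:
    print(f"SEA reports {count} limbo packets; check the physical network")
else:
    print("No limbo packets reported")
```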



Error Message: "ObjectNotValidException" during topo persist

No Format
2021-07-19T15:43:45,609 ERROR [persistExecutor_3]: Error during command execution: Error during powervm topology persist execution! [TopoPersist] (AbstractCommandExecutor)
de.sva.bvq.data.grid.api.exception.ObjectNotValidException: Create operation not valid for object of type [pvm_physical_volume_to_virtual_io_server]! pvm_physical_volume_location_code: [[[pvm_physical_volume_location_code] must not be null!][[pvm_physical_volume_location_code] is primary key (not auto generated) and must be set!]]

BVQ Version: 2021.H1.3 and above

Suggested Action:

The information provided by AIX and the information returned via REST do not match. In this case, an hdisk looked fine from the AIX point of view but reported no location code when queried via the REST API.

The root cause is not understood, but rebooting the VIO servers, one after the other, will probably fix the issue.


Error Message: "NullPointerException" during topo persist

No Format
2022-03-17T11:34:01,746 ERROR [PersistExecutor_1]: Error during command execution [TopoPersist] (BaseJobExecutor) java.lang.NullPointerException: null 
at de.sva.bvq.persister.powervm.commands.PersistVirtualNetworkBridgeCommand.executeCommand(PersistVirtualNetworkBridgeCommand.java:53) ~[bvq-powervm-persist-2021.H2.9.jar!/:?]

BVQ Version: 2021.H1.3 and above

Suggested Action:

This problem can be caused by a corrupt CMDB on a VIO server. IBM provides a script to clean up the CMDB, which will probably fix the issue.

...

Brocade (REST)

Error Message: "Max limit for REST sessions reached"

No Format
Got error from Brocade Switch with IP 10.10.101.147! (body: {
 "errors": {
  "error": [
   {
    "error-type": "application",
    "error-tag": "operation-failed",
    "error-app-tag": "Error",
    "error-path": "/rest/login",
    "error-message": "Max limit for REST sessions reached",
    "error-info": {
     "error-code": 14,
     "error-module": "auth"
    }

BVQ Version: 6.2 and above

Suggested Action:

By default, the REST session limit on Brocade switches is set to 3. The number of REST sessions can be increased up to 10 by using the CLI command mgmtapp --config -maxrestsession <1...10>
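
A client that talks to the switch can recognize this condition from the error body before deciding whether to retry or to raise the session limit. The sketch below is an illustration only: the function name `rest_session_limit_reached` is hypothetical, and the sample body is the excerpt above with its missing closing brackets filled in.

```python
import json

SESSION_LIMIT_MESSAGE = "Max limit for REST sessions reached"


def rest_session_limit_reached(body):
    """Return True if a Brocade REST error body reports the session limit."""
    try:
        doc = json.loads(body)
    except ValueError:
        return False
    if not isinstance(doc, dict):
        return False
    errors = doc.get("errors")
    if not isinstance(errors, dict):
        return False
    entries = errors.get("error")
    if not isinstance(entries, list):
        return False
    return any(
        isinstance(entry, dict)
        and entry.get("error-message") == SESSION_LIMIT_MESSAGE
        for entry in entries
    )


# Error body from the log above; the closing brackets are added here
# because the original excerpt is truncated.
sample_body = """{
 "errors": {
  "error": [
   {
    "error-type": "application",
    "error-tag": "operation-failed",
    "error-app-tag": "Error",
    "error-path": "/rest/login",
    "error-message": "Max limit for REST sessions reached",
    "error-info": {
     "error-code": 14,
     "error-module": "auth"
    }
   }
  ]
 }
}"""

if rest_session_limit_reached(sample_body):
    print("Switch has no free REST sessions")
```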



SAN (REST) scanner fails with "authentication failed"

Due to a bug in FOS versions 8.2.3a and 8.2.3a1, communication via SNMP and REST APIs breaks after an update, causing Brocade REST scanners to fail.

The root cause is a file descriptor leak that occurs in weblinker during LDAP authentication. Once all file descriptors are consumed, a verify error is logged. Webtools authentication then fails as well, so SANnav, BNA, or BVQ cannot authenticate with the switch.

BVQ Version: any

Suggested Action:

Workaround
Performing an hafailover on a director or an hareboot on a non-director restores connectivity. Afterwards, disabling LDAP prevents the issue from recurring. If LDAP cannot be disabled, reducing the number of login attempts via HTTP/Webtools extends the time before the issue is observed again. Downgrading from 8.2.3a or 8.2.3a1 to a lower release also stops the issue from occurring.

Final Fix
FOS 8.2.3a2

...