Let me start by saying this is definitely not an all-inclusive guide, but hopefully it helps a few people or gets you started. I’ve been working with vRealize Automation for the last year, and I’ve had to troubleshoot quite a few problems. I’m not sure whether that’s normal given the kind of product it is, or whether I’ve just had bad luck or a lack of knowledge. Either way, hopefully my issues can help someone.
There are always three obvious things to check on every machine in a VMware environment: DNS, NTP, and disk space. Once the low-hanging fruit is out of the way, start looking at logs and error messages. You will need SSH access to the vRealize Automation VM to run these commands.
Disk space shouldn’t be an issue after 8.6.2, but if you have less than 20% free it can become a problem. This command searches /var/log for files larger than 100 MB.
find /var/log -size +100M -exec du -h {} \; | less
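To see which filesystems are under the 20% mark before hunting for big files, something like the following can help. It is only a sketch: low_space_report is a helper name I made up, and it assumes the standard column layout of df -P (capacity in column 5, mount point in column 6).

```shell
# Read `df -P` output on stdin and print every filesystem whose free space
# is below the given percentage (default 20, the figure used in this post).
# low_space_report is a hypothetical helper name, not part of vRA.
low_space_report() {
  min_free="${1:-20}"
  awk -v min="$min_free" '
    NR > 1 {
      used = $5; sub(/%/, "", used)   # column 5 is "Capacity", e.g. 83%
      free = 100 - used
      if (free < min) printf "%s on %s has only %d%% free\n", $1, $6, free
    }'
}

# Usage on the appliance:
#   df -P | low_space_report 20
```

Reading from stdin keeps the helper easy to try against saved df output from another machine.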
There are also some log files you can delete to free up space: any file whose name ends in “heap_dump.hprof.xz”, and any log bundles you’ve created previously. A sample command might look like this.
rm /var/log/services-logs/prelude/ebs-aps/file-logs/…_heap_dump.hprof.xz
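Before deleting anything, it may be worth listing the heap dumps and their sizes first. The sketch below only lists files. The function name list_heap_dumps is made up for illustration, and the default directory is simply the top of the services-logs path used in the rm example.

```shell
# List heap dump files under a directory with their sizes; nothing is deleted.
# list_heap_dumps is a hypothetical helper name. The default directory is the
# top of the services-logs tree that appears in the rm example.
list_heap_dumps() {
  dir="${1:-/var/log/services-logs}"
  find "$dir" -type f -name '*heap_dump.hprof.xz' -exec du -h {} +
}

# Usage on the appliance:
#   list_heap_dumps
```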
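The following command truncates files in /var/log to zero length and records the name of each one in /tmp/files_truncated.txt. The -mtime +1 test limits it to files last modified at least two days ago. This can free up a LOT of space.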
find /var/log/ -mount -type f -mtime +1 -exec echo {} \; -exec truncate -cs 0 {} \; 2>&1 | tee /tmp/files_truncated.txt
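Since truncation can't be undone, a dry run that uses the same find filters but only prints file sizes can be a useful first step. This is a sketch, and preview_truncate is a name I picked for it.

```shell
# Print the files the truncate command would touch, with sizes, but change nothing.
# Same filters as the command above: stay on one filesystem (-mount), regular
# files only, last modified at least two days ago (-mtime +1).
# preview_truncate is a hypothetical helper name.
preview_truncate() {
  dir="${1:-/var/log/}"
  find "$dir" -mount -type f -mtime +1 -exec du -h {} +
}

# Usage on the appliance:
#   preview_truncate /var/log/ | sort -h
```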
One thing that happened to me quite a few times was upgrades failing due to disk space. The crazy thing is that the pre-checks (which check for space) all came back good. Look for a minimum of 20% free space per disk. You can check this at the command line by typing the following.
vracli disk-mgr
The output reports how full each disk is.
If none of the above works, you may need to resize a disk. To do this, first increase the size of the VMDK. vRealize Automation 8.x has multiple VMDKs, so make sure you choose the correct one. You can resize without shutting down the VM. Then run the following command:
vracli disk-mgr resize
vRealize Automation 8.x uses Docker containers. These should be started and stopped either through vRealize Lifecycle Manager or through a script. I have shut the VM down hard before and ended up killing it. To stop the containers, run the following:
/opt/scripts/svc-stop.sh
sleep 120
/opt/scripts/deploy.sh --onlyClean
To start the containers back up, run the following script:
/opt/scripts/deploy.sh
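If you do this often, the stop and start steps can be wrapped in one function that aborts as soon as a step fails. This is just a sketch. SCRIPTS_DIR and SETTLE_SECONDS are variables I added for illustration, and their defaults match the path and the 120-second pause used above.

```shell
# Stop the services, wait, clean up, then redeploy; abort at the first failure.
# SCRIPTS_DIR and SETTLE_SECONDS are hypothetical knobs added for this sketch;
# the defaults match the /opt/scripts path and the 120-second pause shown above.
restart_vra_services() {
  scripts="${SCRIPTS_DIR:-/opt/scripts}"
  "$scripts/svc-stop.sh" || return 1
  sleep "${SETTLE_SECONDS:-120}"
  "$scripts/deploy.sh" --onlyClean || return 1
  "$scripts/deploy.sh"
}

# Usage on the appliance:
#   restart_vra_services
```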
There are occasions where the above script doesn’t start everything properly. I’ve been able to get things running again by going to vRealize Lifecycle Manager and “starting” the machine up from there. It automatically recognizes the machine’s power status and attempts to start the containers.
To check the status of the containers, you can use regular docker commands or vracli commands like the following:
vracli service status
Since the containers run under Kubernetes, you can also use kubectl commands.
kubectl get nodes
This shows the nodes and their status. Each node should show as Ready.
kubectl get pods --all-namespaces
This command shows the pods running in every namespace in Kubernetes. Typically, infrastructure pods are in the “kube-system” namespace, and as of 8.6.2 vRA runs 99% of its pods in the “prelude” namespace. You can also specify just the namespace you want.
kubectl get pods --namespace prelude
This filters the pods to just the “prelude” namespace.
kubectl -n prelude get pods
This does the same thing using the -n shorthand.
NTP is extremely important in vRealize Automation (as it is in most VMware products). I’ve had commands and other operations fail when there is a variance of 10 minutes or more.
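When there are a lot of pods, it helps to filter the list down to the ones that are not healthy. The sketch below assumes the default kubectl get pods columns (NAME, READY, STATUS, RESTARTS, AGE). The name unhealthy_pods is one I made up.

```shell
# Print pods in a namespace whose STATUS is neither Running nor Completed.
# Relies on the default `kubectl get pods` columns: NAME READY STATUS RESTARTS AGE.
# unhealthy_pods is a hypothetical helper name.
unhealthy_pods() {
  ns="${1:-prelude}"
  kubectl -n "$ns" get pods --no-headers |
    awk '$3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Usage on the appliance:
#   unhealthy_pods prelude
```

No output means every pod in that namespace reports Running or Completed.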
vracli ntp esxi
This command enables NTP and syncs with the ESXi host.
vracli ntp status
This command shows the status of NTP and which source it is syncing with.
vracli ntp systemd --set NAME_OF_NTP_SERVER
This syncs with an NTP server of your choosing. You can add multiple servers by wrapping each one in single quotes and separating them with commas.
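The shell strips the quotes before vracli sees them, so the multi-server form ends up as a single comma-separated argument. Assuming that is what the --set flag expects, a small helper can build the list for you. The function name and the example server names below are made up.

```shell
# Join server names with commas to build the value passed to --set.
# join_ntp_servers is a hypothetical helper; the body runs in a subshell
# so the IFS change does not leak into the calling shell.
join_ntp_servers() (
  IFS=,
  printf '%s\n' "$*"
)

# Usage on the appliance (server names are placeholders):
#   vracli ntp systemd --set "$(join_ntp_servers ntp1.example.com ntp2.example.com)"
```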
Finally, I will go over logs a little bit. There are two ways of obtaining logs from vRA 8.x (three if you count sending logs to another place). The first is to use Lifecycle Manager: select “Generate Logs” from the menu and then download them. The second is to create them on the appliance: SSH into it and run the following command:
vracli log-bundle
You can also view logs inside Kubernetes by querying a pod, for example:
kubectl logs -n prelude postgres-0
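Pod logs can be long, so it can help to look at only the recent lines that mention an error. Here is a rough sketch using kubectl's --tail flag. The name recent_pod_errors and the default of 500 lines are my own choices.

```shell
# Show lines mentioning "error" (any case) in the most recent log lines of a pod
# in the prelude namespace. recent_pod_errors and the 500-line default are
# hypothetical choices for this sketch.
recent_pod_errors() {
  pod="$1"
  lines="${2:-500}"
  kubectl logs -n prelude "$pod" --tail="$lines" | grep -i 'error'
}

# Usage on the appliance:
#   recent_pod_errors postgres-0 1000
```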
Hopefully some of the above helps you on your journey. A few other resources have helped me as well: VMware’s documentation helps some, and I’ve also relied on other blogs such as Steven Bright’s. Thanks!
-Mike