This section helps you troubleshoot common issues with the NextGen Gateway.

Checking Gateway Running Status

  1. Use the following command to verify whether NextGen Gateway POD (nextgen-gw-0) is running or not.

    kubectl get pods

  2. If you see the POD status as Running, it means NextGen Gateway is running successfully.
    To ensure that the Gateway tunnel is properly established to the cloud, use the following command to verify the vprobe container logs.

    kubectl logs nextgen-gw-0 -c vprobe --tail=200 | grep TlsMonComm

  3. Make sure that the connection status is True. If the connection is False, use the following command to check the complete vprobe logs for additional information.

    kubectl logs nextgen-gw-0 -c vprobe -f

  4. If you see POD status other than Running, then you must debug the pod. Use the following command to check the current status of the POD.

    kubectl describe pod ${POD_NAME}
    Example:
    ubuntu@nextgen-gateway:~$ kubectl describe pod nextgen-gw-0
    Name:     	nextgen-gw-0
    Namespace:	default
    Priority: 	0
    Node:     	nextgen-gateway/10.248.157.185
    Start Time:   Fri, 28 Oct 2022 16:57:45 +0530
    Labels:   	app=nextgen-gw
              	controller-revision-hash=nextgen-gw-6744bddc6f
              	statefulset.kubernetes.io/pod-name=nextgen-gw-0
    Annotations:  <none>
    Status:   	Running
    IP:       	10.42.0.60
    IPs:
      IP:       	10.42.0.60
    Controlled By:  StatefulSet/nextgen-gw
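
Steps 1 and 4 above can be combined into a small script. The following is a minimal sketch, not part of the OpsRamp tooling: the function name check_gateway_pod is hypothetical, and it assumes kubectl is already configured for the cluster and that the pod lives in the current namespace.

```shell
#!/usr/bin/env bash
# Sketch: report whether the Gateway pod is Running, and describe it otherwise.
# check_gateway_pod is a hypothetical helper name, not an OpsRamp command.
check_gateway_pod() {
  local pod="${1:-nextgen-gw-0}"
  local phase
  # .status.phase is the pod lifecycle phase (Pending, Running, Succeeded, Failed, Unknown).
  phase="$(kubectl get pod "$pod" -o jsonpath='{.status.phase}')" || return 2
  if [ "$phase" = "Running" ]; then
    echo "$pod is Running"
  else
    echo "$pod is in phase '${phase:-unknown}', describing pod for details"
    kubectl describe pod "$pod"
    return 1
  fi
}
```

Calling check_gateway_pod nextgen-gw-0 prints a single line when the pod is Running. Note that the phase can be Running while an individual container is restarting, so the STATUS column of kubectl get pods and the vprobe logs are still worth checking.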

Accessing Gateway Logs

Kubernetes keeps detailed logs of all cluster and application activities, which you can use to narrow down the causes of any failures.

You can access the Gateway logs in two ways:

  1. Using the kubectl command
  2. Accessing log files directly from the node

1. Using the kubectl command

The kubectl command is the built-in way to view logs on your Kubernetes cluster.

How to view detailed logs for each container?

  1. To view the detailed logs for a container, first get the pod name using the following command.

    kubectl get pods -A

  2. To list the containers running within the nextgen-gw-0 pod, use the following command.

    kubectl get pod nextgen-gw-0 -o="custom-columns=NAME:.metadata.name,CONTAINERS:.spec.containers[*].name"

  3. To check the container logs, use the following command.

    kubectl logs <pod name> --container <container name> -f

  4. To check the logs of the previous (terminated) instance of a container, use the following command.

    kubectl logs <pod name> --container <container name> -f -p
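
When it is not yet clear which container is failing, it can help to look at a short tail of every container at once. The following is a minimal sketch that chains the commands above; tail_all_containers is a hypothetical helper name, and the default of 50 lines is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Sketch: print the last N log lines of every container in a pod.
# tail_all_containers is a hypothetical helper name.
tail_all_containers() {
  local pod="${1:-nextgen-gw-0}"
  local lines="${2:-50}"
  local containers c
  # jsonpath returns the container names separated by spaces.
  containers="$(kubectl get pod "$pod" -o jsonpath='{.spec.containers[*].name}')" || return 1
  for c in $containers; do
    echo "===== $pod / $c (last $lines lines) ====="
    kubectl logs "$pod" --container "$c" --tail="$lines"
  done
}
```

For example, tail_all_containers nextgen-gw-0 20 prints one labelled block per container, which makes it easier to decide which of the container logs described below to follow with -f.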

Vprobe Container Logs:

The vprobe container is the core container. If you observe any issues with connectivity, discovery, monitoring, scheduling, app install/uninstall, or app upgrade, you must check the vprobe container logs.

kubectl logs nextgen-gw-0 --container vprobe -f

Nativebridge Container Logs:

Nativebridge is responsible for native commands and script executions. If you observe any issues with modules that use native commands or script executions, you should check the nativebridge container logs.

Examples: Ping, EMC VNX, EMC VNXe, EMC CLARiiON, and RSE.

kubectl logs nextgen-gw-0 --container nativebridge -f

Postgres Container Logs:

Postgres container is responsible for persisting the data. If you observe any issues with postgres container startup, you should check the postgres container logs.

kubectl logs nextgen-gw-0 --container postgres -f

2. Accessing log files directly from the node

If the Kubernetes service is down and you need to check the pod logs, the kubectl approach will not work. In this case, you can access the log files directly on the node.

By default, Kubernetes stores pod logs in the /var/log/pods/ directory on the node. You can inspect the log files manually from this location.


You can change the directory to the required pod and find the container folders inside it.
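
The lookup described above can also be scripted. The following is a minimal sketch that assumes the usual layout of /var/log/pods/, where each pod directory is named <namespace>_<pod-name>_<pod-uid> and holds one folder per container with numbered .log files; verify the layout on your node first, because it can differ between Kubernetes versions. The function name node_container_log is hypothetical, and reading these files normally requires root privileges.

```shell
#!/usr/bin/env bash
# Sketch: print the tail of the newest log file for one container, read directly on the node.
# node_container_log is a hypothetical helper; the directory layout is an assumption.
node_container_log() {
  local pod="$1" container="$2" lines="${3:-100}" root="${4:-/var/log/pods}"
  local newest
  # Pod directories are assumed to be named <namespace>_<pod-name>_<pod-uid>.
  newest="$(ls -t "$root"/*_"$pod"_*/"$container"/*.log 2>/dev/null | head -n 1)"
  if [ -z "$newest" ]; then
    echo "no log file found for $pod/$container under $root" >&2
    return 1
  fi
  tail -n "$lines" "$newest"
}
```

For example, running node_container_log nextgen-gw-0 vprobe 200 with sudo shows the last 200 lines of the most recent vprobe log file without going through the Kubernetes API.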


Debugging Connectivity Issues

Unable to register the NextGen Gateway?

  1. The OpsRamp IP address must be reachable from the Gateway. Refer to the OpsRamp IP list for the addresses.

    Example:
    telnet ${OPSRAMP_IP} 443

  2. OpenSSL must be able to connect to the OpsRamp IP address. See the following examples:

    • Direct Connection

      openssl s_client -connect ${OPSRAMP_IP}:443

    • Proxy Connection

      openssl s_client -connect ${OPSRAMP_IP}:443 -proxy ${PROXY_SERVER_IP}:${PROXY_PORT}
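
Because telnet is interactive, a scripted reachability test can be more convenient when checking several addresses. The following is a minimal sketch that relies on bash's /dev/tcp redirection and the coreutils timeout command; tcp_reachable is a hypothetical helper name. It only tests a direct TCP connection, so it does not replace the proxy variant of the openssl check.

```shell
#!/usr/bin/env bash
# Sketch: non-interactive TCP reachability test (a scripted stand-in for telnet).
# tcp_reachable is a hypothetical helper; it needs bash and the coreutils timeout command.
tcp_reachable() {
  local host="$1" port="${2:-443}" wait_secs="${3:-5}"
  # Opening /dev/tcp/<host>/<port> makes bash attempt a TCP connection.
  if timeout "$wait_secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
    echo "$host:$port reachable"
  else
    echo "$host:$port NOT reachable" >&2
    return 1
  fi
}
```

For example, tcp_reachable "${OPSRAMP_IP}" 443 returns a non-zero exit code when the connection is refused or does not complete within five seconds.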

The Gateway tunnel is not up after registering the Gateway?

  1. The OpsRamp connection grid IP address must be reachable. For example:

    telnet ${CONNECTION_NODE_IP} 443
    You can find the connection node address in the vprobe logs, in a line such as the following.
    ERROR 17-Nov-22 06:03:10,330 TlsMonComm#189: CommChannelsProcessor. Connection Node : {"httpHost":"cn01-gi01-sjc.opsramp.net","httpPort":8443,"tlsHost":"cn01-gi01-sjc.opsramp.net","tlsPort":443,"resourceToken":"GWXHWfRnBfFj","apiHost":"nextgen.asura.api.opsramp.net"}
    Copy the host value from this line (cn01-gi01-sjc.opsramp.net in this example).

  2. OpenSSL must be able to connect to the connection node. See the following examples:

    • Direct Connection
      openssl s_client -connect ${CONNECTION_NODE_IP}:443
    • Proxy Connection
      openssl s_client -connect ${CONNECTION_NODE_IP}:443 -proxy ${PROXY_SERVER_IP}:${PROXY_PORT}
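
Copying the host by hand from the vprobe log can be replaced with a small filter. The following is a minimal sketch that reads log lines on standard input and prints the last tlsHost:tlsPort pair it finds; connection_node_endpoint is a hypothetical helper name, and the pattern assumes the field names and field order of the sample log line above.

```shell
#!/usr/bin/env bash
# Sketch: extract "tlsHost:tlsPort" from vprobe "Connection Node" log lines read from stdin.
# connection_node_endpoint is a hypothetical helper; the field names and their order
# are taken from the sample log line and may differ in other versions.
connection_node_endpoint() {
  sed -n 's/.*Connection Node : .*"tlsHost":"\([^"]*\)","tlsPort":\([0-9][0-9]*\).*/\1:\2/p' | tail -n 1
}
```

For example, piping the output of kubectl logs nextgen-gw-0 -c vprobe into connection_node_endpoint prints cn01-gi01-sjc.opsramp.net:443 for the sample line, and that value can be passed directly to openssl s_client -connect.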

Verifying Memory Usage

To verify the memory usage of Kubernetes pods, make sure that the metrics server is enabled in the Kubernetes cluster. The kubectl top command retrieves a snapshot of the resource utilization of the pods or nodes in your cluster.

Verify POD memory usage

$ kubectl top pods
NAME           CPU(cores)   MEMORY(bytes)  
nextgen-gw-0   48m          1375Mi

Verify Node memory usage

$ kubectl top nodes
NAME              CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%  
nextgen-gateway   189m         9%     3969Mi          49%
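
To flag high memory usage automatically, the output of kubectl top pods can be filtered. The following is a minimal sketch; pods_over_memory is a hypothetical helper name, it expects the default three-column output shown above (without -A), and it only understands values reported in Mi.

```shell
#!/usr/bin/env bash
# Sketch: list pods whose memory usage exceeds a limit, given "kubectl top pods" output on stdin.
# pods_over_memory is a hypothetical helper; it only understands values reported in Mi.
pods_over_memory() {
  local limit_mi="$1"
  awk -v limit="$limit_mi" '
    $3 ~ /^[0-9]+Mi$/ {          # skips the header row and any non-Mi values
      mem = $3
      sub(/Mi$/, "", mem)
      if (mem + 0 > limit + 0) print $1, mem "Mi"
    }'
}
```

For example, kubectl top pods | pods_over_memory 1000 prints nextgen-gw-0 1375Mi for the sample output above and prints nothing when every pod is below the limit.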

Gateway Pre-checks

The OpsRamp collector will check the basic requirements for registering the NextGen Gateway to the OpsRamp cloud at the time of registration. This includes the following:

  • CoreDNS Check
  • Helm/Docker Repository Check
  • OpsRamp Cloud Check
  • System Resources Check (Memory, Disk, and CPU)
  • Connection Node Check

CoreDNS Check

During this pre-check, the OpsRamp collector verifies the CoreDNS status by checking the following:

  • Kubernetes POD to internal service communication
  • POD to an external network
  • Internal network (with and without proxy)
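
If this pre-check fails, name resolution can be reproduced manually from inside the cluster. The following is a minimal sketch and is not what the collector itself runs: dns_lookup_from_pod is a hypothetical helper name, the pod name dns-check and the busybox:1.28 image are assumptions (the node must be able to pull the image), and kubernetes.default.svc.cluster.local assumes the default cluster domain.

```shell
#!/usr/bin/env bash
# Sketch: resolve a name from inside a throwaway pod to exercise CoreDNS.
# dns_lookup_from_pod is a hypothetical helper; the pod name and image are assumptions.
dns_lookup_from_pod() {
  local name="${1:-kubernetes.default.svc.cluster.local}"
  # --rm deletes the pod once the command finishes; --restart=Never runs it as a one-off pod.
  kubectl run dns-check --rm -i --restart=Never --image=busybox:1.28 -- nslookup "$name"
}
```

Running dns_lookup_from_pod with no argument tests an internal service name, and passing an external host name as the argument tests resolution of names outside the cluster.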

Helm/Docker Repository Check

During this pre-check, the OpsRamp collector will verify repository accessibility from the node and the container (with and without proxy).

If you see the error shown in the figure below, the possible causes are:

  • The repository URL you passed is not valid.
  • The repository URL is not reachable from the node.
[Figure: Helm/Docker repository check error]

OpsRamp Cloud Check

During this pre-check, the OpsRamp collector will verify OpsRamp cloud accessibility from the node and the container (with and without proxy).

If you see the error shown in the figure below, the possible causes are:

  • The OpsRamp cloud URL you passed is not correct.
  • The cloud URL is not reachable from the node.
  • The cloud URL is not whitelisted in the network.
[Figure: OpsRamp cloud check error]

System Resources Check

In this pre-check, the OpsRamp collector will verify that sufficient system resources are assigned before registering the Gateway.
The system resource prerequisites are:

  • Disk - 60GB
  • Memory - 8GB
  • CPU - 4 Core
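
The prerequisites above can be checked by hand before running the collector. The following is a minimal sketch for a Linux node with GNU coreutils and is only an approximation of the pre-check: the function names are hypothetical, measuring the total size of the root filesystem is an assumption about which disk matters, and memory is rounded up to the next whole GB because the usable memory on an 8 GB machine reads slightly lower.

```shell
#!/usr/bin/env bash
# Sketch: compare node resources with the documented minimums (60 GB disk, 8 GB memory, 4 cores).
# This is a manual approximation, not the collector's own check.
report_minimum() {
  # usage: report_minimum <label> <actual> <required>
  if [ "$2" -ge "$3" ]; then
    echo "OK   $1: $2 (required $3)"
  else
    echo "FAIL $1: $2 (required $3)"
    return 1
  fi
}

check_node_resources() {
  local cpus mem_gb disk_gb rc=0
  cpus="$(nproc)"
  # MemTotal is reported in kB; round up to the next whole GB.
  mem_gb="$(awk '/^MemTotal:/ { printf "%d", ($2 + 1048575) / 1048576 }' /proc/meminfo)"
  # Total size of the root filesystem in GB (assumed to be the relevant disk).
  disk_gb="$(df -BG --output=size / | tail -n 1 | tr -dc '0-9')"
  report_minimum "CPU cores" "$cpus" 4 || rc=1
  report_minimum "Memory GB" "$mem_gb" 8 || rc=1
  report_minimum "Disk GB" "$disk_gb" 60 || rc=1
  return "$rc"
}
```

Running check_node_resources prints one OK or FAIL line per resource and returns a non-zero exit code if any minimum is not met.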

Possible issues:

  • If you do not allocate the required memory, you will receive the following error. Allocate the required memory to resolve the issue.
    [Figure: insufficient memory error]
  • If you do not allocate the required disk space, you will receive the following error. Allocate the required disk space to resolve the issue.
    [Figure: insufficient disk error]
  • If you do not allocate the required CPU, you will receive the following error. Allocate the required CPU to resolve the issue.
    [Figure: insufficient CPU error]

Connection Node Check

In this pre-check, the OpsRamp collector will get all the connection nodes from the OpsRamp cloud before registering the Gateway and will check whether they are accessible or not.

Possible issues:

  • If you pass an incorrect access token, you will see the following error.
    [Figure: incorrect access token error]
  • If the connection node is not reachable from the node, you will see the following error.
    Make sure the connection node is reachable from the node, and then try to register the Gateway again.
    [Figure: connection node not reachable error]