Built-in checks and error handling
Diagnose runs a set of built-in checks to help you understand where a scope might be failing.
Each check looks at a specific aspect of your scope and answers a clear question, such as whether traffic can reach the workload, whether pods are healthy, or whether services are configured correctly.
This guide groups common troubleshooting actions by check category so you can find a resolution faster.
How to use this guide
When Diagnose reports a failure:
- Identify the failed check and its category.
- Open the check to review the explanation and logs.
- Use the sections below to understand common causes and next steps.
- Apply a fix, redeploy the scope, and then run Diagnose again to confirm the redeployed scope is healthy.
Networking issues
Networking issues usually prevent traffic from reaching your scope.
Ingress Existence
What this usually means
No ingress resource was found for the scope.
Common causes
- Ingress was never created
- Ingress exists in a different namespace
- Labels do not match the scope
What to do next
- Verify that an ingress resource exists for the scope
- Check that the ingress is in the correct namespace
- Confirm ingress labels match the scope configuration
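If you have kubectl access to the cluster, a quick check along these lines can confirm whether an ingress exists and whether its namespace and labels line up with the scope; `<ingress-name>` and `<namespace>` are placeholders for your own values.

```shell
# List ingresses in every namespace and look for one that belongs to the scope
kubectl get ingress --all-namespaces

# Inspect the ingress and compare its namespace and labels with the scope configuration
kubectl describe ingress <ingress-name> -n <namespace>
```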
Ingress Class
What this usually means
The ingress references an invalid or missing ingress class.
Common causes
- Ingress class does not exist in the cluster
- Ingress controller is not installed
- Deprecated annotations are used instead of ingressClassName
What to do next
- Verify the ingress class exists
- Confirm the ingress controller is running
- Update the ingress to use a valid ingress class
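As a sketch of what this looks like with kubectl; the ingress-nginx namespace is only an example, so use whichever namespace your controller is installed in.

```shell
# List the ingress classes available in the cluster
kubectl get ingressclass

# Confirm the ingress references one of them through spec.ingressClassName
kubectl get ingress <ingress-name> -n <namespace> -o jsonpath='{.spec.ingressClassName}'

# Check that the controller pods are running (namespace depends on your installation)
kubectl get pods -n ingress-nginx
```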
Ingress Controller Sync
What this usually means
The ingress controller failed to reconcile the ingress.
Common causes
- Controller pods are not running
- Backend services are missing or unhealthy
- Cloud provider configuration errors
What to do next
- Check ingress controller logs
- Verify backend services exist and have endpoints
- Review cloud provider events related to load balancers
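If your controller is ingress-nginx, commands like the following can surface reconciliation errors; adjust the namespace and label selector to match your own controller installation.

```shell
# Tail the ingress controller logs for sync or reconciliation errors
kubectl logs -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx --tail=100

# Review recent events in the scope's namespace, newest last
kubectl get events -n <namespace> --sort-by=.lastTimestamp
```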
Ingress Backend Service
What this usually means
Ingress backends do not exist or have no healthy endpoints.
Common causes
- Service does not exist
- Service selectors do not match pods
- Pods are not ready
What to do next
- Verify the service exists
- Check service selectors and pod labels
- Review pod readiness status
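A minimal kubectl check, assuming you know the service name referenced by the ingress; both names are placeholders.

```shell
# The Backends column in the rules table shows which endpoints the ingress routes to
kubectl describe ingress <ingress-name> -n <namespace>

# Confirm the referenced service has ready endpoints (an empty list means no healthy pods)
kubectl get endpoints <service-name> -n <namespace>
```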
TLS Configuration
What this usually means
TLS configuration for the ingress is invalid.
Common causes
- TLS secret does not exist
- Certificate does not match hostnames
- Certificate is expired or invalid
What to do next
- Verify the TLS secret exists
- Check certificate validity and host coverage
- Regenerate or update certificates if needed
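One way to inspect the certificate directly, assuming openssl and a base64 that accepts -d are available on your machine; the secret name is a placeholder.

```shell
# Confirm the TLS secret referenced by the ingress exists
kubectl get secret <tls-secret-name> -n <namespace>

# Decode the certificate and check its validity window and subject
kubectl get secret <tls-secret-name> -n <namespace> -o jsonpath='{.data.tls\.crt}' \
  | base64 -d | openssl x509 -noout -dates -subject
```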
ALB Capacity
What this usually means
The load balancer could not be provisioned or assigned capacity.
Common causes
- Subnet IP exhaustion
- Invalid load balancer annotations
- Cloud provider limits
What to do next
- Review cloud provider logs and events
- Verify subnet capacity and configuration
- Adjust load balancer annotations
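Provisioning errors from the cloud provider usually surface as events and annotations on the ingress itself, which you can review with kubectl; the ingress name is a placeholder.

```shell
# The Events section records load balancer provisioning failures reported by the controller
kubectl describe ingress <ingress-name> -n <namespace>

# Review the load balancer annotations for typos or unsupported values
kubectl get ingress <ingress-name> -n <namespace> -o jsonpath='{.metadata.annotations}'
```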
Host & Path Rules
What this usually means
Your ingress rules have an issue that can break routing, even if the ingress resource exists.
Common causes
- No rules defined and no default backend configured
- Invalid `pathType` (must be `Exact`, `Prefix`, or `ImplementationSpecific`)
- Trailing `/` on a `Prefix` path (can cause unexpected routing behavior)
- Duplicate host rules
- Wildcard hosts without the right configuration
What to do next
- Define at least one rule or configure a default backend
- Confirm every rule uses a valid `pathType`
- Remove trailing slashes from `Prefix` paths
- Consolidate duplicate host rules
- Prefer explicit hostnames over wildcards when possible
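With kubectl, reviewing the raw rules is often the quickest way to spot these problems; the ingress name is a placeholder.

```shell
# Review spec.rules for missing rules, invalid pathType values,
# trailing slashes on Prefix paths, and duplicate hosts
kubectl get ingress <ingress-name> -n <namespace> -o yaml
```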
Scope issues
Scope issues usually indicate problems with the workload itself.
Pod Existence
What this usually means
No pods were found for the scope.
Common causes
- Deployment failed to create pods
- Pods failed to schedule
- Pods were deleted or evicted
What to do next
- Check deployment status and events
- Review scheduling errors
- Verify resource availability
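With kubectl, a check along these lines shows whether the deployment produced the pods it should have; the names and label selector are placeholders.

```shell
# Compare desired, current, and available replica counts
kubectl get deployment <deployment-name> -n <namespace>

# The Conditions and Events sections explain why pods were not created or scheduled
kubectl describe deployment <deployment-name> -n <namespace>

# List the pods that do exist for the scope
kubectl get pods -n <namespace> -l <scope-label>
```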
Container Crash Detection
What this usually means
One or more containers are repeatedly crashing.
Common causes
- Application startup errors
- Missing configuration or secrets
- Insufficient memory or CPU
What to do next
- Review container logs
- Verify configuration and environment variables
- Check resource limits and usage
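For example, assuming you have kubectl access and know the crashing pod's name:

```shell
# Restart counts and the current container state (for example CrashLoopBackOff)
kubectl get pods -n <namespace>
kubectl describe pod <pod-name> -n <namespace>

# Logs from the previous, crashed container instance
kubectl logs <pod-name> -n <namespace> --previous
```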
Image Pull Status
What this usually means
Container images could not be pulled.
Common causes
- Incorrect image name or tag
- Missing image pull credentials
- Registry connectivity issues
What to do next
- Verify image name and tag
- Check image pull secrets
- Confirm registry access from the cluster
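A quick way to see the exact pull error with kubectl; the pod and secret names are placeholders.

```shell
# The Events section shows ErrImagePull / ImagePullBackOff details, including the failing image reference
kubectl describe pod <pod-name> -n <namespace>

# Confirm the pull secret referenced by the pod spec exists in the same namespace
kubectl get secret <pull-secret-name> -n <namespace>
```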
Memory Limits
What this usually means
Containers do not have memory limits configured.
Common causes
- Resource limits were omitted
- Defaults were not applied
What to do next
- Add memory limits to container specifications
- Review namespace limit ranges
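To confirm what is actually configured, something like the following works; the deployment name is a placeholder.

```shell
# Print the resource limits declared on each container in the deployment
kubectl get deployment <deployment-name> -n <namespace> \
  -o jsonpath='{.spec.template.spec.containers[*].resources.limits}'

# Check whether the namespace applies default limits through a LimitRange
kubectl get limitrange -n <namespace>
```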
Container Port Health
What this usually means
Containers are not listening on the ports defined in the deployment.
Common causes
- Application is bound to a different port than configured
- Port binding failed (permissions, conflicts)
- Wrong port set in the deployment spec
- Environment variable for the port is missing or incorrect
What to do next
- Verify containerPort matches the application’s listen port
- Check startup logs for port binding errors
- Confirm port-related environment variables are set correctly
- Test port binding from inside the container if needed
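A rough sketch of those checks, assuming the container image ships a shell and either ss or netstat; the pod name is a placeholder.

```shell
# The containerPort values declared in the pod spec
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].ports[*].containerPort}'

# Listening sockets as seen from inside the container
kubectl exec -it <pod-name> -n <namespace> -- sh -c 'ss -tlnp || netstat -tlnp'
```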
Health Probe Endpoints
What this usually means
Health check probes are not configured correctly or are failing.
Common causes
- Liveness or readiness probe endpoints do not exist
- Probe paths return non-success status codes
- Probe timeouts are too short for the application
- Wrong HTTP headers or ports in probe configuration
What to do next
- Verify probe endpoints exist and return success codes
- Check probe configuration matches application routes
- Increase timeout or initial delay if the application needs more time to start
- Review probe logs in pod events
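To compare the probe configuration with what the application actually serves, a kubectl check like this can help; the pod name is a placeholder.

```shell
# Inspect the configured liveness and readiness probes
kubectl get pod <pod-name> -n <namespace> \
  -o jsonpath='{.spec.containers[*].livenessProbe}{"\n"}{.spec.containers[*].readinessProbe}'

# Probe failures are recorded as events on the pod
kubectl describe pod <pod-name> -n <namespace>
```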
Storage Mounting
What this usually means
Volumes failed to mount.
Common causes
- Missing persistent volume claims
- Invalid storage class
- Volume attachment errors
What to do next
- Verify PVCs exist and are bound
- Check storage class configuration
- Review node and CSI driver logs
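For example, with kubectl access (the claim name is a placeholder):

```shell
# PVCs should report a Bound status; Pending usually points at the storage class or capacity
kubectl get pvc -n <namespace>
kubectl describe pvc <pvc-name> -n <namespace>

# Verify the storage class referenced by the claim exists
kubectl get storageclass
```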
Resource Availability
What this usually means
Your pods cannot be scheduled because the cluster does not currently have enough available capacity or scheduling rules prevent placement.
Common causes
- CPU or memory requests are higher than what any node can provide
- Nodes are at capacity
- Namespace resource quotas are exceeded
- Taints and missing tolerations prevent scheduling
- Node selector or affinity rules are too restrictive
- Too many replicas requested for the current cluster size
What to do next
- Reduce CPU/memory requests in your deployment
- Scale up the cluster by adding nodes
- Relax node selector or affinity constraints
- Check resource quotas and adjust if needed
- Review node taints and add tolerations if appropriate
- Consider priority classes if this workload must schedule ahead of others
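A few commands that make the scheduler's reasoning visible, assuming kubectl access; the names are placeholders.

```shell
# The Events section of a Pending pod states exactly why it cannot be scheduled
kubectl describe pod <pod-name> -n <namespace>

# Allocatable capacity and current requests per node
kubectl describe nodes

# Namespace quotas and node taints that may block placement
kubectl get resourcequota -n <namespace>
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints}{"\n"}{end}'
```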
Service issues
Service issues usually prevent internal traffic from reaching pods.
Service Existence
What this usually means
No service exists for the scope.
Common causes
- Service was not created
- Service exists in a different namespace
What to do next
- Verify service exists
- Confirm namespace and labels
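For example, with kubectl (the namespace is a placeholder):

```shell
# List services in the scope's namespace
kubectl get services -n <namespace>

# If nothing matches, search every namespace
kubectl get services --all-namespaces
```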
Service Selector Match
What this usually means
Service selectors do not match any pods.
Common causes
- Label mismatches
- Incorrect selector configuration
What to do next
- Compare service selectors with pod labels
- Update selectors or labels as needed
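With kubectl, a quick side-by-side comparison looks like this; the service name is a placeholder.

```shell
# The selector the service uses to find pods
kubectl get service <service-name> -n <namespace> -o jsonpath='{.spec.selector}'

# Pod labels in the same namespace; at least one pod must carry every selector key/value
kubectl get pods -n <namespace> --show-labels
```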
Service Endpoints
What this usually means
The service has no healthy endpoints.
Common causes
- Pods are not ready
- Selector mismatch
- Network policies blocking traffic
What to do next
- Verify pod readiness
- Review service selectors
- Check network policies
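For example, with kubectl (the service name is a placeholder):

```shell
# A service with no healthy endpoints shows <none> or an empty address list here
kubectl get endpoints <service-name> -n <namespace>

# Check pod readiness and any network policies in the namespace
kubectl get pods -n <namespace> -o wide
kubectl get networkpolicy -n <namespace>
```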
Service Port Configuration
What this usually means
Service ports do not map correctly to container ports.
Common causes
- Incorrect targetPort
- Container not listening on expected port
What to do next
- Verify container ports
- Update service port configuration
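To compare the two sides directly with kubectl (names are placeholders):

```shell
# The service's port and targetPort mapping
kubectl get service <service-name> -n <namespace> -o jsonpath='{.spec.ports}'

# The ports the containers actually declare
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].ports}'
```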
Service Type Validation
What this usually means
The service type is invalid or unsupported in the current environment.
Common causes
- LoadBalancer not supported in the cluster
- NodePort outside the allowed range
What to do next
- Update the service type to one your cluster supports
- Verify cluster support and cloud provider configuration for the selected service type
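One quick check with kubectl, assuming the service name is known:

```shell
# Shows the service type and, for LoadBalancer services, the external address
# (an EXTERNAL-IP stuck at <pending> usually means the cluster has no load balancer integration)
kubectl get service <service-name> -n <namespace> -o wide
```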
After applying a fix
After you apply a fix, redeploy the scope so the change takes effect.
Once the new deployment is live, you can run Diagnose again to confirm the issue is resolved on the updated scope.
Diagnose is safe to run multiple times and works best as an iterative troubleshooting tool after each deployment.