Built-in checks and error handling
Diagnose runs a set of built-in checks to help you understand where a scope might be failing.
Each check looks at a specific aspect of your scope and answers a clear question, such as whether traffic can reach the workload, whether pods are healthy, or whether services are configured correctly.
This guide groups common troubleshooting actions by check category so you can find a resolution faster.
How to use this guide
When Diagnose reports a failure:
- Identify the failed check and its category.
- Open the check to review the explanation and logs.
- Use the sections below to understand common causes and next steps.
- Apply a fix, redeploy the scope, and then run Diagnose again to confirm the redeployed scope is healthy.
Networking issues
Networking issues usually prevent traffic from reaching your scope.
Ingress Existence
What this usually means
No ingress resource was found for the scope.
Common causes
- Ingress was never created
- Ingress exists in a different namespace
- Labels do not match the scope
What to do next
- Verify that an ingress resource exists for the scope
- Check that the ingress is in the correct namespace
- Confirm ingress labels match the scope configuration
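If you have kubectl access to the cluster, a quick check along these lines can confirm whether an ingress exists and whether its namespace and labels line up with the scope; `<ingress-name>` and `<namespace>` are placeholders for your own values.

```shell
# List ingresses in every namespace and look for one that belongs to the scope
kubectl get ingress --all-namespaces

# Inspect the ingress and compare its namespace and labels with the scope configuration
kubectl describe ingress <ingress-name> -n <namespace>
```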
Ingress Class
What this usually means
The ingress references an invalid or missing ingress class.
Common causes
- Ingress class does not exist in the cluster
- Ingress controller is not installed
- Deprecated annotations are used instead of ingressClassName
What to do next
- Verify the ingress class exists
- Confirm the ingress controller is running
- Update the ingress to use a valid ingress class
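As a sketch of what this looks like with kubectl; the ingress-nginx namespace is only an example, so use whichever namespace your controller is installed in.

```shell
# List the ingress classes available in the cluster
kubectl get ingressclass

# Confirm the ingress references one of them through spec.ingressClassName
kubectl get ingress <ingress-name> -n <namespace> -o jsonpath='{.spec.ingressClassName}'

# Check that the controller pods are running (namespace depends on your installation)
kubectl get pods -n ingress-nginx
```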
Ingress Controller Sync
What this usually means
The ingress controller failed to reconcile the ingress.
Common causes
- Controller pods are not running
- Backend services are missing or unhealthy
- Cloud provider configuration errors
What to do next
- Check ingress controller logs
- Verify backend services exist and have endpoints
- Review cloud provider events related to load balancers
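If your controller is ingress-nginx, commands like the following can surface reconciliation errors; adjust the namespace and label selector to match your own controller installation.

```shell
# Tail the ingress controller logs for sync or reconciliation errors
kubectl logs -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx --tail=100

# Review recent events in the scope's namespace, newest last
kubectl get events -n <namespace> --sort-by=.lastTimestamp
```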
Ingress Backend Service
What this usually means
Ingress backends do not exist or have no healthy endpoints.
Common causes
- Service does not exist
- Service selectors do not match pods
- Pods are not ready
What to do next
- Verify the service exists
- Check service selectors and pod labels
- Review pod readiness status
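A minimal kubectl check, assuming you know the service name referenced by the ingress; both names are placeholders.

```shell
# The Backends column in the rules table shows which endpoints the ingress routes to
kubectl describe ingress <ingress-name> -n <namespace>

# Confirm the referenced service has ready endpoints (an empty list means no healthy pods)
kubectl get endpoints <service-name> -n <namespace>
```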
TLS Configuration
What this usually means
TLS configuration for the ingress is invalid.
Common causes
- TLS secret does not exist
- Certificate does not match hostnames
- Certificate is expired or invalid
What to do next
- Verify the TLS secret exists
- Check certificate validity and host coverage
- Regenerate or update certificates if needed
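One way to inspect the certificate directly, assuming openssl and a base64 that accepts -d are available on your machine; the secret name is a placeholder.

```shell
# Confirm the TLS secret referenced by the ingress exists
kubectl get secret <tls-secret-name> -n <namespace>

# Decode the certificate and check its validity window and subject
kubectl get secret <tls-secret-name> -n <namespace> -o jsonpath='{.data.tls\.crt}' \
  | base64 -d | openssl x509 -noout -dates -subject
```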
ALB Capacity
What this usually means
The load balancer could not be provisioned or assigned capacity.
Common causes
- Subnet IP exhaustion
- Invalid load balancer annotations
- Cloud provider limits
What to do next
- Review cloud provider logs and events
- Verify subnet capacity and configuration
- Adjust load balancer annotations
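Provisioning errors from the cloud provider usually surface as events and annotations on the ingress itself, which you can review with kubectl; the ingress name is a placeholder.

```shell
# The Events section records load balancer provisioning failures reported by the controller
kubectl describe ingress <ingress-name> -n <namespace>

# Review the load balancer annotations for typos or unsupported values
kubectl get ingress <ingress-name> -n <namespace> -o jsonpath='{.metadata.annotations}'
```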
Host & Path Rules
What this usually means
Your ingress rules have an issue that can break routing, even if the ingress resource exists.
Common causes
- No rules defined and no default backend configured
- Invalid `pathType` (must be `Exact`, `Prefix`, or `ImplementationSpecific`)
- Trailing `/` on a `Prefix` path (can cause unexpected routing behavior)
- Duplicate host rules
- Wildcard hosts without the right configuration
What to do next
- Define at least one rule or configure a default backend
- Confirm every rule uses a valid `pathType`
- Remove trailing slashes from `Prefix` paths
- Consolidate duplicate host rules
- Prefer explicit hostnames over wildcards when possible
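With kubectl, reviewing the raw rules is often the quickest way to spot these problems; the ingress name is a placeholder.

```shell
# Review spec.rules for missing rules, invalid pathType values,
# trailing slashes on Prefix paths, and duplicate hosts
kubectl get ingress <ingress-name> -n <namespace> -o yaml
```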
Scope issues
Scope issues usually indicate problems with the workload itself.
Pod Existence
What this usually means
No pods were found for the scope.
Common causes
- Deployment failed to create pods
- Pods failed to schedule
- Pods were deleted or evicted
What to do next
- Check deployment status and events
- Review scheduling errors
- Verify resource availability
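With kubectl, a check along these lines shows whether the deployment produced the pods it should have; the names and label selector are placeholders.

```shell
# Compare desired, current, and available replica counts
kubectl get deployment <deployment-name> -n <namespace>

# The Conditions and Events sections explain why pods were not created or scheduled
kubectl describe deployment <deployment-name> -n <namespace>

# List the pods that do exist for the scope
kubectl get pods -n <namespace> -l <scope-label>
```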
Container Crash Detection
What this usually means
One or more containers are repeatedly crashing.
Common causes
- Application startup errors
- Missing configuration or secrets
- Insufficient memory or CPU
What to do next
- Review container logs
- Verify configuration and environment variables
- Check resource limits and usage
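For example, assuming you have kubectl access and know the crashing pod's name:

```shell
# Restart counts and the current container state (for example CrashLoopBackOff)
kubectl get pods -n <namespace>
kubectl describe pod <pod-name> -n <namespace>

# Logs from the previous, crashed container instance
kubectl logs <pod-name> -n <namespace> --previous
```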
Image Pull Status
What this usually means
Container images could not be pulled.
Common causes
- Incorrect image name or tag
- Missing image pull credentials
- Registry connectivity issues
What to do next
- Verify image name and tag
- Check image pull secrets
- Confirm registry access from the cluster
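A quick way to see the exact pull error with kubectl; the pod and secret names are placeholders.

```shell
# The Events section shows ErrImagePull / ImagePullBackOff details, including the failing image reference
kubectl describe pod <pod-name> -n <namespace>

# Confirm the pull secret referenced by the pod spec exists in the same namespace
kubectl get secret <pull-secret-name> -n <namespace>
```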
Memory Limits
What this usually means
Containers do not have memory limits configured.
Common causes
- Resource limits were omitted
- Defaults were not applied
What to do next
- Add memory limits to container specifications
- Review namespace limit ranges
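To confirm what is actually configured, something like the following works; the deployment name is a placeholder.

```shell
# Print the resource limits declared on each container in the deployment
kubectl get deployment <deployment-name> -n <namespace> \
  -o jsonpath='{.spec.template.spec.containers[*].resources.limits}'

# Check whether the namespace applies default limits through a LimitRange
kubectl get limitrange -n <namespace>
```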
Container Port Health
What this usually means
Containers are not listening on the ports defined in the deployment.
Common causes
- Application is bound to a different port than configured
- Port binding failed (permissions, conflicts)
- Wrong port set in the deployment spec
- Environment variable for the port is missing or incorrect
What to do next
- Verify containerPort matches the application’s listen port
- Check startup logs for port binding errors
- Confirm port-related environment variables are set correctly
- Test port binding from inside the container if needed
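A rough sketch of those checks, assuming the container image ships a shell and either ss or netstat; the pod name is a placeholder.

```shell
# The containerPort values declared in the pod spec
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].ports[*].containerPort}'

# Listening sockets as seen from inside the container
kubectl exec -it <pod-name> -n <namespace> -- sh -c 'ss -tlnp || netstat -tlnp'
```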
Health Probe Endpoints
What this usually means
Health check probes are not configured correctly or are failing.
Common causes
- Liveness or readiness probe endpoints do not exist
- Probe paths return non-success status codes
- Probe timeouts are too short for the application
- Wrong HTTP headers or ports in probe configuration
What to do next
- Verify probe endpoints exist and return success codes
- Check probe configuration matches application routes
- Increase timeout or initial delay if the application needs more time to start
- Review probe logs in pod events
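To compare the probe configuration with what the application actually serves, a kubectl check like this can help; the pod name is a placeholder.

```shell
# Inspect the configured liveness and readiness probes
kubectl get pod <pod-name> -n <namespace> \
  -o jsonpath='{.spec.containers[*].livenessProbe}{"\n"}{.spec.containers[*].readinessProbe}'

# Probe failures are recorded as events on the pod
kubectl describe pod <pod-name> -n <namespace>
```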
Storage Mounting
What this usually means
Volumes failed to mount.
Common causes
- Missing persistent volume claims
- Invalid storage class
- Volume attachment errors
What to do next
- Verify PVCs exist and are bound
- Check storage class configuration
- Review node and CSI driver logs
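For example, with kubectl access (the claim name is a placeholder):

```shell
# PVCs should report a Bound status; Pending usually points at the storage class or capacity
kubectl get pvc -n <namespace>
kubectl describe pvc <pvc-name> -n <namespace>

# Verify the storage class referenced by the claim exists
kubectl get storageclass
```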
Resource Availability
What this usually means
Your pods cannot be scheduled because the cluster does not currently have enough available capacity or scheduling rules prevent placement.
Common causes
- CPU or memory requests are higher than what any node can provide
- Nodes are at capacity
- Namespace resource quotas are exceeded
- Taints and missing tolerations prevent scheduling
- Node selector or affinity rules are too restrictive
- Too many replicas requested for the current cluster size
What to do next
- Reduce CPU/memory requests in your deployment
- Scale up the cluster by adding nodes
- Relax node selector or affinity constraints
- Check resource quotas and adjust if needed
- Review node taints and add tolerations if appropriate
- Consider priority classes if this workload must schedule ahead of others
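A few commands that make the scheduler's reasoning visible, assuming kubectl access; the names are placeholders.

```shell
# The Events section of a Pending pod states exactly why it cannot be scheduled
kubectl describe pod <pod-name> -n <namespace>

# Allocatable capacity and current requests per node
kubectl describe nodes

# Namespace quotas and node taints that may block placement
kubectl get resourcequota -n <namespace>
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints}{"\n"}{end}'
```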
Service issues
Service issues usually prevent internal traffic from reaching pods.
Service Existence
What this usually means
No service exists for the scope.
Common causes
- Service was not created
- Service exists in a different namespace
What to do next
- Verify service exists
- Confirm namespace and labels
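For example, with kubectl (the namespace is a placeholder):

```shell
# List services in the scope's namespace
kubectl get services -n <namespace>

# If nothing matches, search every namespace
kubectl get services --all-namespaces
```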
Service Selector Match
What this usually means
Service selectors do not match any pods.
Common causes
- Label mismatches
- Incorrect selector configuration
What to do next
- Compare service selectors with pod labels
- Update selectors or labels as needed
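With kubectl, a quick side-by-side comparison looks like this; the service name is a placeholder.

```shell
# The selector the service uses to find pods
kubectl get service <service-name> -n <namespace> -o jsonpath='{.spec.selector}'

# Pod labels in the same namespace; at least one pod must carry every selector key/value
kubectl get pods -n <namespace> --show-labels
```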
Service Endpoints
What this usually means
The service has no healthy endpoints.
Common causes
- Pods are not ready
- Selector mismatch
- Network policies blocking traffic
What to do next
- Verify pod readiness
- Review service selectors
- Check network policies
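For example, with kubectl (the service name is a placeholder):

```shell
# A service with no healthy endpoints shows <none> or an empty address list here
kubectl get endpoints <service-name> -n <namespace>

# Check pod readiness and any network policies in the namespace
kubectl get pods -n <namespace> -o wide
kubectl get networkpolicy -n <namespace>
```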
Service Port Configuration
What this usually means
Service ports do not map correctly to container ports.
Common causes
- Incorrect targetPort
- Container not listening on expected port
What to do next
- Verify container ports
- Update service port configuration
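To compare the two sides directly with kubectl (names are placeholders):

```shell
# The service's port and targetPort mapping
kubectl get service <service-name> -n <namespace> -o jsonpath='{.spec.ports}'

# The ports the containers actually declare
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].ports}'
```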
Service Type Validation
What this usually means
The service type is invalid or unsupported in the current environment.
Common causes
- LoadBalancer not supported in the cluster
- NodePort outside the allowed range
What to do next
- Update the service type to one your cluster supports
- Verify cluster support and cloud provider configuration for the selected service type
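One quick check with kubectl, assuming the service name is known:

```shell
# Shows the service type and, for LoadBalancer services, the external address
# (an EXTERNAL-IP stuck at <pending> usually means the cluster has no load balancer integration)
kubectl get service <service-name> -n <namespace> -o wide
```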
After applying a fix
After you apply a fix, redeploy the scope so the change takes effect.
Once the new deployment is live, you can run Diagnose again to confirm the issue is resolved on the updated scope.
Diagnose is safe to run multiple times and works best as an iterative troubleshooting tool after each deployment.