Add Diagnose to your scopes

Diagnose runs automated checks on your scopes and deployments. If your scope doesn't have the Diagnose tool yet, or you want to build your own diagnose flow, you need to register the right action specifications on your scope specification.

How Diagnose connects to your scope

Diagnose relies on three components working together:

  1. Action specifications register the Diagnose actions so they appear in the UI.
  2. Workflows define the execution flow when an action is triggered.
  3. The diagnose module contains the context builder and individual checks.

Action specifications

The agent registers two action specifications that make Diagnose available in the UI:

Diagnose Scope

Runs diagnostics against a scope. This action appears on the Manage and Performance tabs.

{
  "name": "Diagnose Scope",
  "slug": "diagnose-scope",
  "type": "diagnose",
  "retryable": true,
  "parameters": {
    "schema": {
      "type": "object",
      "required": ["scope_id"],
      "properties": {
        "scope_id": {
          "type": "number",
          "readOnly": true,
          "visibleOn": ["read"]
        }
      }
    },
    "values": {}
  },
  "results": {
    "schema": {
      "type": "object",
      "required": [],
      "properties": {}
    },
    "values": {}
  },
  "annotations": {
    "show_on": ["manage", "performance"],
    "runs_over": "scope"
  }
}

Diagnose Deployment

Runs diagnostics against a specific deployment. This action appears on the Deployment tab and includes both scope_id and deployment_id.

{
  "name": "Diagnose Deployment",
  "slug": "diagnose-deployment",
  "type": "diagnose",
  "retryable": true,
  "parameters": {
    "schema": {
      "type": "object",
      "required": ["scope_id", "deployment_id"],
      "properties": {
        "scope_id": {
          "type": "number",
          "readOnly": true,
          "visibleOn": ["read"]
        },
        "deployment_id": {
          "type": "number",
          "readOnly": true,
          "visibleOn": ["read"]
        }
      }
    },
    "values": {}
  },
  "results": {
    "schema": {
      "type": "object",
      "required": [],
      "properties": {}
    },
    "values": {}
  },
  "annotations": {
    "show_on": ["deployment"],
    "runs_over": "deployment"
  }
}

Annotations: show_on and runs_over

Diagnose action specifications use annotations to tell the UI where to display the action button and what entity the action targets. Without annotations, the UI won't know where to render the Diagnose tool or what data to pass to the workflow.

show_on

Controls which sections of the UI display the Diagnose tool. The value is an array, so a single action can appear in multiple places.

| Value | Where the button appears |
| --- | --- |
| manage | The Manage tab of the scope |
| performance | The Performance tab of the scope |
| deployment | The Deployment detail view |

For example, "show_on": ["manage", "performance"] makes the Diagnose Scope button visible in both the Manage and Performance tabs of the scope.

runs_over

Tells the UI what entity the action runs against. This determines which parameters the UI passes to the workflow when a user clicks the button.

| Value | What the action targets | Parameters passed |
| --- | --- | --- |
| scope | The scope as a whole | scope_id |
| deployment | A specific deployment | scope_id and deployment_id |

A scope-level action uses the scope's label selector to gather resources, while a deployment-level action narrows the scope further with the deployment ID.

How they work together

The combination of show_on and runs_over gives you precise control over the Diagnose experience:

"annotations": {
  "show_on": ["manage", "performance"],
  "runs_over": "scope"
}

This registers a button on the Manage and Performance tabs that runs diagnostics at the scope level.

"annotations": {
  "show_on": ["deployment"],
  "runs_over": "deployment"
}

This registers a button on the Deployment view that runs diagnostics scoped to that specific deployment.

Match runs_over with the right parameters

If runs_over is "deployment", make sure deployment_id is in the required array of your parameters schema. The UI will pass it automatically, but the workflow will fail if the schema doesn't expect it.

Register the action specifications

If Diagnose isn't available on your scope yet, you need to register these action specifications manually. Use the CLI or API.

Here's an example for the Diagnose Scope action:

np service-specification action-specification create \
  --serviceSpecificationId $service-spec-id \
  --body '{
    "name": "Diagnose Scope",
    "slug": "diagnose-scope",
    "type": "diagnose",
    "retryable": true,
    "parameters": {
      "schema": {
        "type": "object",
        "required": ["scope_id"],
        "properties": {
          "scope_id": {
            "type": "number",
            "readOnly": true,
            "visibleOn": ["read"]
          }
        }
      },
      "values": {}
    },
    "results": {
      "schema": {
        "type": "object",
        "required": [],
        "properties": {}
      },
      "values": {}
    },
    "annotations": {
      "show_on": ["manage", "performance"],
      "runs_over": "scope"
    }
  }'
Note: The $service-spec-id is the ID of your scope specification. You can find it in the response when you create the specification, or by listing your existing ones.

Follow the same pattern to register the Diagnose Deployment action specification. See the lifecycle action specifications guide for more details on how action specs work.

The diagnose workflow

When you trigger a Diagnose action, the agent runs a workflow with three steps:

1. Load utility functions

The workflow starts by sourcing helper functions that every check relies on. These include output formatting (print_success, print_error, print_warning), resource validation (require_pods, require_services, require_ingresses), and the update_check_result function that reports check outcomes.

2. Build context

Before any checks run, the build_context script collects a point-in-time snapshot of the Kubernetes cluster. This snapshot ensures every check evaluates the same data, even if the cluster changes during the run.

The context includes:

| Resource | What it collects |
| --- | --- |
| Pods | All pods matching the scope labels |
| Services | Services associated with the deployment |
| Endpoints | Service endpoint information |
| Ingresses | Ingress resources for the scope |
| Secrets | Metadata only, no secret data |
| IngressClasses | Available ingress classes in the cluster |
| Events | Recent Kubernetes events |
| ALB controller | Controller pods and logs, if applicable |

All data is stored as JSON files in a data/ directory. Checks read from these files instead of making direct API calls, which keeps the run fast and consistent.
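Because the snapshot lives in plain JSON files, a check reduces to reading a file and asserting on its contents. Here is a minimal sketch in Python (the built-in checks are shell scripts; the function name `check_pods_ready` and the exact shape of `pods.json` are illustrative assumptions, not part of the real module):

```python
import json
from pathlib import Path

def check_pods_ready(data_dir: str) -> dict:
    """Evaluate pod readiness from the snapshot files, not the live cluster."""
    # Assumed layout: data/pods.json holds a kubectl-style list with an "items" key.
    pods = json.loads(Path(data_dir, "pods.json").read_text())["items"]
    not_ready = [
        p["metadata"]["name"]
        for p in pods
        if p["status"].get("phase") != "Running"
    ]
    return {
        "status": "failed" if not_ready else "success",
        "evidence": {"pods_checked": len(pods), "not_ready": not_ready},
    }
```

Every check reading from the same snapshot is what makes the results of one run internally consistent.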

3. Execute checks

The workflow uses an executor step that discovers and runs all check scripts in parallel from three folders:

  • diagnose/service/ -- Kubernetes Service checks
  • diagnose/scope/ -- pod and workload checks
  • diagnose/networking/ -- ingress and routing checks

For each check, the executor:

  1. Runs a before_each hook that sets the check status to "running" and notifies the UI.
  2. Executes the check script.
  3. Runs an after_each hook that collects the result and sends it to the UI.

This is what makes the Diagnose UI update in real time as each check completes.
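The hook pattern can be sketched as a simple loop. This Python illustration is a simplification of the real executor, which runs checks in parallel and uses the notify scripts as its hooks:

```python
import subprocess
from pathlib import Path

def run_checks(folders, before_each, after_each):
    """Discover check scripts in each folder and run them between two hooks.

    Sequential sketch; the real executor runs checks in parallel.
    """
    for folder in folders:
        for script in sorted(Path(folder).iterdir()):
            before_each(script.name)  # e.g. mark the check "running" in the UI
            result = subprocess.run([str(script)], capture_output=True)
            after_each(script.name, result.returncode)  # report the outcome
```

Because the hooks fire per check rather than once at the end, the UI can stream status updates as each script finishes.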

What the workflow looks like

Both the scope and deployment workflows share the same structure:

continue_on_error: true
include:
  - "$SERVICE_PATH/values.yaml"
steps:
  - name: load_functions
    type: script
    file: "$SERVICE_PATH/diagnose/utils/diagnose_utils"
    output:
      - name: update_check_result
        type: function
      - name: notify_results
        type: function
  - name: build context
    type: script
    file: "$SERVICE_PATH/diagnose/build_context"
    output:
      - name: CONTEXT
        type: environment
      - name: LABEL_SELECTOR
        type: environment
  - name: diagnose
    type: executor
    before_each:
      name: notify_check_running
      type: script
      file: "$SERVICE_PATH/diagnose/notify_check_running"
    after_each:
      name: notify_check_results
      type: script
      file: "$SERVICE_PATH/diagnose/notify_diagnose_results"
    folders:
      - "$SERVICE_PATH/diagnose/service"
      - "$SERVICE_PATH/diagnose/scope"
      - "$SERVICE_PATH/diagnose/networking"

The continue_on_error: true flag ensures that if one check fails, the remaining checks still run. This is important because you want to see the full picture, not just the first failure.

What you need to enable Diagnose

If you're adding Diagnose to your scope or verifying that it's properly configured:

  1. Action specifications must be registered for the scope. The agent handles this automatically during setup, but you can also register them manually.
  2. Workflow files must exist at k8s/scope/workflows/diagnose.yaml and k8s/deployment/workflows/diagnose.yaml.
  3. The diagnose module must be present at k8s/diagnose/ with its full directory structure.
  4. kubectl access must be available inside the agent, since build_context uses it to collect cluster data.

For standard Kubernetes agents, all of this is included out of the box. You only need to think about these components if you're customizing the agent or building support for a new runtime.

Build your own diagnose flow

If the built-in Diagnose doesn't fit your scope — or your scope runs on a runtime other than Kubernetes — you can build your own diagnose flow from scratch. This means writing your own data collection, checks, and result reporting, while following the contract that the UI expects.

What the UI needs from your workflow

The Diagnose UI renders results based on a specific structure. Your workflow must produce results that match this contract, regardless of what your checks actually inspect.

Each check must produce a JSON result with these fields:

{
  "name": "My Check",
  "description": "What this check validates",
  "category": "Networking",
  "status": "success",
  "evidence": { "pods_checked": 3 },
  "logs": ["✓ All endpoints healthy", "✓ Port 8080 open"],
  "start_at": "2025-01-15T10:30:00Z",
  "end_at": "2025-01-15T10:30:02Z"
}

| Field | Description |
| --- | --- |
| name | Display name shown in the UI |
| description | Short explanation of what the check validates |
| category | Groups checks in the UI. Use any string (e.g., Networking, Scope, Database) |
| status | One of success, failed, warning, or skipped |
| evidence | Arbitrary JSON with data that helps explain the result |
| logs | Array of strings shown as check logs in the UI |
| start_at / end_at | ISO 8601 timestamps for the check execution window |
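If you generate results programmatically, it helps to centralize the contract in one constructor so every check emits the same shape. A hedged Python sketch (`make_check_result` is a hypothetical helper, not part of the nullplatform tooling):

```python
from datetime import datetime, timezone

# Statuses the Diagnose UI accepts for a finished check.
VALID_STATUSES = {"success", "failed", "warning", "skipped"}

def make_check_result(name, description, category, status,
                      evidence=None, logs=None, start_at=None, end_at=None):
    """Build one check result matching the contract the Diagnose UI expects."""
    if status not in VALID_STATUSES:
        raise ValueError(f"invalid status: {status}")
    now = datetime.now(timezone.utc).isoformat()
    return {
        "name": name,
        "description": description,
        "category": category,
        "status": status,
        "evidence": evidence or {},
        "logs": logs or [],
        "start_at": start_at or now,
        "end_at": end_at or now,
    }
```

Validating the status at construction time surfaces contract violations in your workflow instead of as rendering glitches in the UI.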

How results reach the UI

At the end of the workflow, results must be sent to the nullplatform API using the service action update endpoint. The payload groups checks by category:

{
  "results": {
    "categories": [
      {
        "category": "Networking",
        "summary": {
          "pending": 0,
          "running": 0,
          "success": 2,
          "failed": 1,
          "warning": 0,
          "skipped": 0
        },
        "checks": [
          { "name": "...", "status": "...", "..." : "..." }
        ]
      }
    ]
  }
}

The built-in diagnose flow handles this automatically through the notify_results function, which reads all check result files and groups them. If you're building your own flow, you need to build and send this payload yourself.
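Building that payload yourself amounts to grouping check results by category and counting statuses. A minimal Python sketch (`build_results_payload` is a hypothetical name; sending the payload to the API is left out):

```python
from collections import defaultdict

# Summary counters the payload reports per category.
STATUSES = ["pending", "running", "success", "failed", "warning", "skipped"]

def build_results_payload(checks):
    """Group check results by category and compute per-category summaries."""
    by_category = defaultdict(list)
    for check in checks:
        by_category[check["category"]].append(check)
    categories = []
    for category, items in by_category.items():
        summary = {s: sum(1 for c in items if c["status"] == s) for s in STATUSES}
        categories.append({"category": category, "summary": summary, "checks": items})
    return {"results": {"categories": categories}}
```

The returned dictionary is what you would serialize and send to the service action update endpoint at the end of your workflow.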

Wiring it up

To create a complete custom diagnose flow:

  1. Register the action specifications with "type": "diagnose" as shown above. Use annotations to control where the action button appears in the UI and what entity it targets.

  2. Create the workflow file at the expected path. The filename must match the action type:

    • k8s/scope/workflows/diagnose.yaml for scope-level diagnosis
    • k8s/deployment/workflows/diagnose.yaml for deployment-level diagnosis
  3. Collect your data. Replace build_context with whatever data collection makes sense for your runtime. The built-in flow uses kubectl to snapshot Kubernetes resources, but yours could query a database, call an API, or read metrics.

  4. Write your checks. Each check should analyze the collected data and report a result with status and evidence. See Create a custom check for the check structure and conventions.

  5. Report results by sending the grouped result payload to the nullplatform API at the end of the workflow.

Use the built-in flow as a reference

The standard Kubernetes diagnose workflow is a good template for your own. It follows the same collect → check → report pattern described here. You can browse the source at nullplatform/scopes — k8s/diagnose.