Add Diagnose to your scopes
Diagnose runs automated checks on your scopes and deployments. If your scope doesn't have the Diagnose tool yet, or you want to build your own diagnose flow, you need to register the right action specifications on your scope specification.
How Diagnose connects to your scope
Diagnose relies on three components working together:
- Action specifications register the Diagnose actions so they appear in the UI.
- Workflows define the execution flow when an action is triggered.
- The diagnose module contains the context builder and individual checks.
Action specifications
The agent registers two action specifications that make Diagnose available in the UI:
Diagnose Scope
Runs diagnostics against a scope. This action appears on the manage and performance tabs.
```json
{
  "name": "Diagnose Scope",
  "slug": "diagnose-scope",
  "type": "diagnose",
  "retryable": true,
  "parameters": {
    "schema": {
      "type": "object",
      "required": ["scope_id"],
      "properties": {
        "scope_id": {
          "type": "number",
          "readOnly": true,
          "visibleOn": ["read"]
        }
      }
    },
    "values": {}
  },
  "results": {
    "schema": {
      "type": "object",
      "required": [],
      "properties": {}
    },
    "values": {}
  },
  "annotations": {
    "show_on": ["manage", "performance"],
    "runs_over": "scope"
  }
}
```
Diagnose Deployment
Runs diagnostics against a specific deployment. This action appears on the deployment tab and includes both scope_id and deployment_id.
```json
{
  "name": "Diagnose Deployment",
  "slug": "diagnose-deployment",
  "type": "diagnose",
  "retryable": true,
  "parameters": {
    "schema": {
      "type": "object",
      "required": ["scope_id", "deployment_id"],
      "properties": {
        "scope_id": {
          "type": "number",
          "readOnly": true,
          "visibleOn": ["read"]
        },
        "deployment_id": {
          "type": "number",
          "readOnly": true,
          "visibleOn": ["read"]
        }
      }
    },
    "values": {}
  },
  "results": {
    "schema": {
      "type": "object",
      "required": [],
      "properties": {}
    },
    "values": {}
  },
  "annotations": {
    "show_on": ["deployment"],
    "runs_over": "deployment"
  }
}
```
Annotations: show_on and runs_over
Diagnose action specifications use annotations to tell the UI where to display the action button and what entity the action targets. Without annotations, the UI won't know where to render the Diagnose tool or what data to pass to the workflow.
show_on
Controls which sections of the UI display the Diagnose tool. The value is an array, so a single action can appear in multiple places.
| Value | Where the button appears |
|---|---|
| manage | The Manage tab of the scope |
| performance | The Performance tab of the scope |
| deployment | The Deployment detail view |
For example, "show_on": ["manage", "performance"] makes the Diagnose Scope button visible in both the Manage and Performance tabs of the scope.
runs_over
Tells the UI what entity the action runs against. This determines which parameters the UI passes to the workflow when a user clicks the button.
| Value | What the action targets | Parameters passed |
|---|---|---|
| scope | The scope as a whole | scope_id |
| deployment | A specific deployment | scope_id and deployment_id |
A scope-level action uses the scope's label selector to gather resources, while a deployment-level action narrows the scope further with the deployment ID.
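To make the narrowing concrete, here is a minimal sketch of how the two selectors could be built. The label keys (nullplatform.com/scope-id, nullplatform.com/deployment-id) are illustrative assumptions, not necessarily the agent's actual labels:

```shell
# Hypothetical label keys -- check your agent's actual labels before relying on these.
scope_id=1234
deployment_id=5678

# Scope-level action: select everything the scope owns.
scope_selector="nullplatform.com/scope-id=${scope_id}"

# Deployment-level action: narrow the same selector with the deployment ID.
deployment_selector="${scope_selector},nullplatform.com/deployment-id=${deployment_id}"

echo "$scope_selector"
echo "$deployment_selector"
```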
How they work together
The combination of show_on and runs_over gives you precise control over the Diagnose experience:
```json
"annotations": {
  "show_on": ["manage", "performance"],
  "runs_over": "scope"
}
```
This registers a button on the Manage and Performance tabs that runs diagnostics at the scope level.
```json
"annotations": {
  "show_on": ["deployment"],
  "runs_over": "deployment"
}
```
This registers a button on the Deployment view that runs diagnostics scoped to that specific deployment.
Use runs_over with the right parameters

If runs_over is "deployment", make sure deployment_id is in the required array of your parameters schema. The UI will pass it automatically, but the workflow will fail if the schema doesn't expect it.
Register the action specifications
If Diagnose isn't available on your scope yet, you need to register these action specifications manually. Use the CLI or API.
Here's an example for the Diagnose Scope action:
- CLI
- cURL
```shell
np service-specification action-specification create \
  --serviceSpecificationId $service-spec-id \
  --body '{
    "name": "Diagnose Scope",
    "slug": "diagnose-scope",
    "type": "diagnose",
    "retryable": true,
    "parameters": {
      "schema": {
        "type": "object",
        "required": ["scope_id"],
        "properties": {
          "scope_id": {
            "type": "number",
            "readOnly": true,
            "visibleOn": ["read"]
          }
        }
      },
      "values": {}
    },
    "results": {
      "schema": {
        "type": "object",
        "required": [],
        "properties": {}
      },
      "values": {}
    },
    "annotations": {
      "show_on": ["manage", "performance"],
      "runs_over": "scope"
    }
  }'
```
```shell
curl -L -X POST 'https://api.nullplatform.com/service_specification/$service-spec-id/action_specification' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <token>' \
  -d '{
    "name": "Diagnose Scope",
    "slug": "diagnose-scope",
    "type": "diagnose",
    "retryable": true,
    "parameters": {
      "schema": {
        "type": "object",
        "required": ["scope_id"],
        "properties": {
          "scope_id": {
            "type": "number",
            "readOnly": true,
            "visibleOn": ["read"]
          }
        }
      },
      "values": {}
    },
    "results": {
      "schema": {
        "type": "object",
        "required": [],
        "properties": {}
      },
      "values": {}
    },
    "annotations": {
      "show_on": ["manage", "performance"],
      "runs_over": "scope"
    }
  }'
```
The $service-spec-id is the ID of your scope specification. You can find it in the response when you create the specification, or by listing your existing ones.
Follow the same pattern to register the Diagnose Deployment action specification. See the lifecycle action specifications guide for more details on how action specs work.
The diagnose workflow
When you trigger a Diagnose action, the agent runs a workflow with three steps:
1. Load utility functions
The workflow starts by sourcing helper functions that every check relies on. These include output formatting (print_success, print_error, print_warning), resource validation (require_pods, require_services, require_ingresses), and the update_check_result function that reports check outcomes.
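As a rough sketch, an update_check_result-style helper could look like the following. The function body and file layout are assumptions for illustration; the real helper in diagnose_utils also handles status transitions and notifies the UI:

```shell
# Illustrative sketch only -- the real update_check_result in diagnose_utils does more.
update_check_result() {
  local check_name="$1" status="$2" message="$3"
  mkdir -p results
  # Persist the check outcome as a small JSON file for later aggregation.
  printf '{"name":"%s","status":"%s","logs":["%s"]}\n' \
    "$check_name" "$status" "$message" > "results/${check_name}.json"
}

update_check_result "endpoint-health" "success" "All endpoints healthy"
cat "results/endpoint-health.json"
```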
2. Build context
Before any checks run, the build_context script collects a point-in-time snapshot of the Kubernetes cluster. This snapshot ensures every check evaluates the same data, even if the cluster changes during the run.
The context includes:
| Resource | What it collects |
|---|---|
| Pods | All pods matching the scope labels |
| Services | Services associated with the deployment |
| Endpoints | Service endpoint information |
| Ingresses | Ingress resources for the scope |
| Secrets | Metadata only, no secret data |
| IngressClasses | Available ingress classes in the cluster |
| Events | Recent Kubernetes events |
| ALB controller | Controller pods and logs, if applicable |
All data is stored as JSON files in a data/ directory. Checks read from these files instead of making direct API calls, which keeps the run fast and consistent.
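For example, a check can count running pods from the snapshot file alone, with no cluster access. The snapshot content below is fabricated for illustration, and the grep-based counting is a simplification (a real check would use a proper JSON parser):

```shell
mkdir -p data
# Pretend build_context already wrote this snapshot (fabricated sample data).
cat > data/pods.json <<'EOF'
{"items":[{"status":{"phase":"Running"}},{"status":{"phase":"Pending"}}]}
EOF

# The check reads the file, never the live cluster.
running=$(grep -o '"phase":"Running"' data/pods.json | wc -l | tr -d ' ')
echo "running pods: $running"
```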
3. Execute checks
The workflow uses an executor step that discovers and runs all check scripts in parallel from three folders:
- diagnose/service/ -- Kubernetes Service checks
- diagnose/scope/ -- pod and workload checks
- diagnose/networking/ -- ingress and routing checks
For each check, the executor:
- Runs a before_each hook that sets the check status to "running" and notifies the UI.
- Executes the check script.
- Runs an after_each hook that collects the result and sends it to the UI.
This is what makes the Diagnose UI update in real time as each check completes.
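The executor loop can be sketched roughly as follows. It is sequential here for clarity (the real executor runs checks in parallel), and the hooks are reduced to plain echo statements where the real before_each/after_each scripts notify the UI:

```shell
# Simplified executor sketch: run every script in a folder, with hooks around each.
run_checks() {
  local folder="$1"
  for check in "$folder"/*; do
    [ -f "$check" ] || continue
    echo "running: $(basename "$check")"            # before_each: mark as running
    if bash "$check"; then status=success; else status=failed; fi
    echo "result: $(basename "$check") -> $status"  # after_each: report the outcome
  done
}

# Demo folder with one passing and one failing check.
mkdir -p demo_checks
printf 'exit 0\n' > demo_checks/ok_check
printf 'exit 1\n' > demo_checks/bad_check
run_checks demo_checks
```

Note that a failing check only marks its own result as failed; the loop moves on to the next script, mirroring the continue_on_error behavior described below.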
What the workflow looks like
Both the scope and deployment workflows share the same structure:
```yaml
continue_on_error: true
include:
  - "$SERVICE_PATH/values.yaml"
steps:
  - name: load_functions
    type: script
    file: "$SERVICE_PATH/diagnose/utils/diagnose_utils"
    output:
      - name: update_check_result
        type: function
      - name: notify_results
        type: function
  - name: build_context
    type: script
    file: "$SERVICE_PATH/diagnose/build_context"
    output:
      - name: CONTEXT
        type: environment
      - name: LABEL_SELECTOR
        type: environment
  - name: diagnose
    type: executor
    before_each:
      name: notify_check_running
      type: script
      file: "$SERVICE_PATH/diagnose/notify_check_running"
    after_each:
      name: notify_check_results
      type: script
      file: "$SERVICE_PATH/diagnose/notify_diagnose_results"
    folders:
      - "$SERVICE_PATH/diagnose/service"
      - "$SERVICE_PATH/diagnose/scope"
      - "$SERVICE_PATH/diagnose/networking"
```
The continue_on_error: true flag ensures that if one check fails, the remaining checks still run. This is important because you want to see the full picture, not just the first failure.
What you need to enable Diagnose
If you're adding Diagnose to your scope or verifying that it's properly configured:
- Action specifications must be registered for the scope. The agent handles this automatically during setup, but you can also register them manually.
- Workflow files must exist at k8s/scope/workflows/diagnose.yaml and k8s/deployment/workflows/diagnose.yaml.
- The diagnose module must be present at k8s/diagnose/ with its full directory structure.
- kubectl access must be available inside the agent, since build_context uses it to collect cluster data.
For standard Kubernetes agents, all of this is included out of the box. You only need to think about these components if you're customizing the agent or building support for a new runtime.
Build your own diagnose flow
If the built-in Diagnose doesn't fit your scope — or your scope runs on a runtime other than Kubernetes — you can build your own diagnose flow from scratch. This means writing your own data collection, checks, and result reporting, while following the contract that the UI expects.
What the UI needs from your workflow
The Diagnose UI renders results based on a specific structure. Your workflow must produce results that match this contract, regardless of what your checks actually inspect.
Each check must produce a JSON result with these fields:
```json
{
  "name": "My Check",
  "description": "What this check validates",
  "category": "Networking",
  "status": "success",
  "evidence": { "pods_checked": 3 },
  "logs": ["✓ All endpoints healthy", "✓ Port 8080 open"],
  "start_at": "2025-01-15T10:30:00Z",
  "end_at": "2025-01-15T10:30:02Z"
}
```
| Field | Description |
|---|---|
| name | Display name shown in the UI |
| description | Short explanation of what the check validates |
| category | Groups checks in the UI. Use any string (e.g., Networking, Scope, Database) |
| status | One of success, failed, warning, or skipped |
| evidence | Arbitrary JSON with data that helps explain the result |
| logs | Array of strings shown as check logs in the UI |
| start_at / end_at | ISO 8601 timestamps for the check execution window |
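A check script might assemble this result like so. The file name check_result.json is an arbitrary choice for this sketch; where the result is stored depends on your flow:

```shell
# Record the execution window around the check logic.
start_at=$(date -u +%Y-%m-%dT%H:%M:%SZ)
# ...the actual check logic would run here...
end_at=$(date -u +%Y-%m-%dT%H:%M:%SZ)

# Emit a result matching the contract the UI expects.
cat > check_result.json <<EOF
{
  "name": "My Check",
  "description": "What this check validates",
  "category": "Networking",
  "status": "success",
  "evidence": {},
  "logs": ["check completed"],
  "start_at": "$start_at",
  "end_at": "$end_at"
}
EOF
```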
How results reach the UI
At the end of the workflow, results must be sent to the nullplatform API using the service action update endpoint. The payload groups checks by category:
```json
{
  "results": {
    "categories": [
      {
        "category": "Networking",
        "summary": {
          "pending": 0,
          "running": 0,
          "success": 2,
          "failed": 1,
          "warning": 0,
          "skipped": 0
        },
        "checks": [
          { "name": "...", "status": "...", "...": "..." }
        ]
      }
    ]
  }
}
```
The built-in diagnose flow handles this automatically through the notify_results function, which reads all check result files and groups them. If you're building your own flow, you need to build and send this payload yourself.
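A custom flow has to do that aggregation itself. As a minimal sketch (fabricated result files, grep-based counting for brevity, and only two of the six summary counters), it might look like:

```shell
# Fabricated check results for illustration.
mkdir -p results
printf '{"name":"a","category":"Networking","status":"success"}\n' > results/a.json
printf '{"name":"b","category":"Networking","status":"failed"}\n'  > results/b.json

# Count outcomes across result files (a real flow would also group by category).
success=$(grep -l '"status":"success"' results/*.json | wc -l | tr -d ' ')
failed=$(grep -l '"status":"failed"' results/*.json | wc -l | tr -d ' ')

# Assemble the payload to send to the service action update endpoint.
printf '{"results":{"categories":[{"category":"Networking","summary":{"success":%s,"failed":%s}}]}}\n' \
  "$success" "$failed" > payload.json
cat payload.json
```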
Wiring it up
To create a complete custom diagnose flow:
1. Register the action specifications with "type": "diagnose" as shown above. Use annotations to control where the action button appears in the UI and what entity it targets.
2. Create the workflow file at the expected path. The filename must match the action type:
   - k8s/scope/workflows/diagnose.yaml for scope-level diagnosis
   - k8s/deployment/workflows/diagnose.yaml for deployment-level diagnosis
3. Collect your data. Replace build_context with whatever data collection makes sense for your runtime. The built-in flow uses kubectl to snapshot Kubernetes resources, but yours could query a database, call an API, or read metrics.
4. Write your checks. Each check should analyze the collected data and report a result with status and evidence. See Create a custom check for the check structure and conventions.
5. Report results by sending the grouped result payload to the nullplatform API at the end of the workflow.
The standard Kubernetes diagnose workflow is a good template for your own. It follows the same collect → check → report pattern described here. You can browse the source at nullplatform/scopes — k8s/diagnose.