Skip to content

Runtime Diagnostics

IMPORTANT

TL;DR — LoopTroop exposes three different diagnostic surfaces: a local runtime-stall report, persisted blocked-error diagnostics on ticket failures, and structured retry diagnostics on artifacts that needed correction or re-prompting. Use the surface that matches the failure mode instead of treating everything as a generic "the ticket broke" event.

This page covers the diagnostics that help explain slow local behavior, blocked ticket runs, and recoverable structured-output failures.

1. Choose the Right Diagnostic Surface

SurfaceWhen it appearsWhere to inspect itBest for
Runtime stall reportYou run npm run diagnose:stall while the app is slow or behaving oddlytmp/diagnostics/runtime-stall-*.logSlow refreshes, missing tickets after reload, OpenCode reachability issues, disk / CPU / memory pressure
Blocked-error diagnosticsA phase ends in BLOCKED_ERRORTicket error view and persisted error occurrence dataProvider failures, timeouts, session errors, transport failures, model output truncation
Structured retry diagnosticsA structured-output phase rejects one or more model attempts before validating or finally failingArtifact processing notices and artifact detail viewsWhy a response was retried, what validation failed, and what excerpt caused the retry

2. Runtime Stall Report

Run the report while npm run dev is still running, ideally during the slowdown:

bash
npm run diagnose:stall

The command writes a timestamped local report under tmp/diagnostics/, for example:

text
tmp/diagnostics/runtime-stall-YYYYMMDD-HHMMSS.log

The script is read-only. It does not mutate tickets, repair databases, or modify attached projects.

2.1 Platform Support

The diagnostic script runs on Linux, WSL2, macOS, and Windows.

FeatureLinux/WSLmacOSWindows
Process /proc inspection
Pressure-stall metrics
Cgroup resource snapshot
TCP stats✅ (ss)✅ (netstat)✅ (netstat)
FD limits
Zombie process count
vm_stat / top integration
Shell baselinebash / shbash / shPowerShell

Platform-specific sections that are unavailable simply show as unavailable or n/a; the report still runs.

2.2 What the Report Captures

The report combines several layers of evidence:

  • Environment and startup context — resolved ports, candidate/listener PIDs, backend env snapshot, watcher context, shell startup latency, and focused ticket path resolution.
  • Endpoint probes — frontend, backend health, startup status, projects, tickets, and OpenCode reachability.
  • Short repeated samples — repeated backend and ticket probes to confirm whether the app was actually stalled during capture.
  • Runtime trend window — by default a 3-minute trend that samples backend health, /api/tickets, watched-process CPU/RSS/I/O, Linux pressure deltas, app/project DB and log growth, and trend-wide whole-system read/write/RSS/CPU leaders.
  • Process activity — backend, frontend, and OpenCode memory snapshots, wait state, thread count, FD count, and I/O counters.
  • System resource state — load, memory, pressure-stall metrics, cgroup state, vmstat, disk stats, and top resource consumers.
  • Storage and project state — mount type, free space, inode usage, filesystem latency, SQLite / WAL / SHM sizes, project ticket/session state, and Git responsiveness.
  • Focused ticket runtime sizing — when --ticket-path is supplied, the report also tracks runtime log growth, largest runtime subdirectories, and large artifact files such as build outputs.
  • Advanced probes — event-loop lag, localhost DNS probe, TCP states, zombie process count, swap pressure, and a diagnostic heap snapshot.

2.3 Useful Flags

FlagDefaultUse it when
--timeout-ms <ms>4000Probes are timing out too aggressively and you want slow endpoints or shell checks to finish
--sample-ms <ms>1000CPU or I/O spikes are brief and you want a wider one-window process sample
--trend-ms <ms>180000You want a longer or shorter observation window; use 0 to disable the trend entirely
--trend-interval-ms <ms>1000You want finer or coarser trend granularity
--ticket-path <path>noneYou want runtime sizing focused on one ticket; pass a .ticket dir, its runtime dir, or the worktree root
--backend-port, --frontend-port, --opencode-urlauto-detect when possibleYou started the stack on non-default ports or against a non-default OpenCode server
--no-coloroffYou are piping the report or running in CI; NO_COLOR is also respected

Examples:

bash
npm run diagnose:stall -- --timeout-ms 8000
npm run diagnose:stall -- --sample-ms 5000
npm run diagnose:stall -- --trend-ms 120000 --trend-interval-ms 1000
npm run diagnose:stall -- --ticket-path /path/to/worktree/.ticket
npm run diagnose:stall -- --backend-port 3001 --frontend-port 5175 --opencode-url http://127.0.0.1:4097

2.4 Reading the Report

The top-level sections map directly to the report banners:

  • 🔍 ENVIRONMENT & CONFIGURATION — resolved ports, detected PIDs, backend env vars, shell latency baseline, and focused ticket path resolution.
  • 🌐 NETWORK & ENDPOINT HEALTH — current HTTP probe results for frontend, backend, ticket/project routes, startup status, and OpenCode.
  • 🔁 REPEATED RUNTIME SAMPLES — repeated backend/ticket probes plus the longer Runtime Observation Trend output.
  • ⚙️ APPLICATION PROCESS ACTIVITY — backend/frontend/OpenCode candidate processes, memory snapshots, open files, and per-process CPU / I/O samples.
  • 💻 SYSTEM RESOURCES — pressure, memory, uptime, and whole-system top CPU / RSS / read / write consumers.
  • 💾 STORAGE, MOUNTS & FILESYSTEM — mount details, disk and inode usage, filesystem latency, and optional focused ticket-runtime sizing.
  • 🗄️ DATABASE & PROJECT STATE — app DB pathing, project DB/WAL state, recent ticket/session state, and execution-log tailing.
  • 🔀 GIT RESPONSIVENESSgit status, Trace2 perf output, branch resolution, and other responsiveness checks for attached repos.
  • 🧬 ADVANCED DIAGNOSTICS — event-loop lag, DNS, TCP state counts, zombie counts, swap pressure, and diagnostic heap output.

For intermittent issues, save at least one report from a healthy moment and one from a slow moment. The diff between the two is usually more useful than either report alone.

3. Blocked-Error Diagnostics

When a phase fails hard enough to enter BLOCKED_ERROR, LoopTroop persists a normalized diagnostic payload alongside the error occurrence. That payload is normalized by shared/errorDiagnostics.ts, typically assembled by server/opencode/blockedErrorDiagnostics.ts, and rendered in the ticket error view as Underlying error.

Use this surface when the ticket already blocked and you want the reason, not the whole-machine health picture.

3.1 OpenCode Provider Error Enrichment

OpenCode sometimes streams only Provider returned error even though its local log contains the exact provider failure. LoopTroop best-effort correlates those generic stream errors with recent OpenCode log files by session.id and replaces the generic summary with a sanitized provider summary when a match exists.

The enrichment keeps only compact diagnostic fields such as HTTP status, retryability, provider/model identity, request model, provider error type/title/message, and a short response-body preview. It does not persist prompt bodies, raw request payloads, headers, cookies, authorization values, or URL query strings.

  • Managed local OpenCode: LoopTroop checks the default local OpenCode log directory.
  • External or nonstandard OpenCode: set LOOPTROOP_OPENCODE_LOG_DIR to the log directory.
  • No matching log found: the ticket keeps the generic provider error and adds a troubleshooting hint instead of inventing details.

3.2 Diagnostic Classification

Each blocked-error diagnostic has a normalized kind and source.

Kind (BlockedErrorDiagnosticKind)

KindMeaning
model_output_truncatedOpenCode reported a finish reason such as length, so the model response was cut off
opencode_providerThe provider returned an API-style failure such as auth, quota, rate limit, or invalid request
opencode_sessionSession creation, reconnect, or session-level lifecycle failure
timeoutA prompt or execution deadline was exceeded
transportConnection reset, DNS, socket, or other transport failure
runtimeInternal LoopTroop runtime or orchestration failure
unknownThe failure could not be classified

Source (BlockedErrorDiagnosticSource)

SourceMeaning
opencodeOriginated from the OpenCode integration layer
providerOriginated from the underlying model provider
systemOriginated from a LoopTroop system-level source
runtimeOriginated from a runtime execution error

In the main OpenCode blocked-error builder, provider-like failures are currently emitted with source provider; the other generated OpenCode failure kinds use source opencode.

3.3 Sensitive Data Redaction

Before persistence, normalizeBlockedErrorDiagnostics() sanitizes string fields:

  • API keys, bearer tokens, passwords, and similar secrets are replaced with [redacted].
  • Query strings are stripped from error text so token-like values in URLs are not persisted.
  • The redacted payload still keeps enough structure to debug the issue: status codes, provider/model identity, finish reason, and token counts survive when present.

3.4 Persisted Fields

The normalized blocked-error payload may include:

FieldMeaning
summaryRequired short explanation shown in the UI when it differs from the primary ticket error
modelIdLoopTroop/OpenCode model identifier used for the failed run
sessionIdOpenCode session involved in the failure
providerIdProvider identifier such as openai
providerModelIdProvider-native model identifier when it differs from the requested model
requestModelExact request model recorded by provider diagnostics
statusCodeHTTP status code when available
isRetryableWhether provider diagnostics marked the failure as retryable
providerErrorTypeProvider error type/classification
providerErrorTitleProvider error title or headline
providerErrorMessageRedacted provider error message
responseBodyPreviewShort redacted preview of the provider response body
finishReasonOpenCode finish reason for truncation-style failures
inputTokens / outputTokens / reasoningTokensToken counts reported by OpenCode
cacheReadTokens / cacheWriteTokensToken-cache counts when OpenCode exposes them

The current compact blocked-error panel renders the most actionable subset of those fields. Less common fields, such as responseBodyPreview and cache token counts, can still exist in persisted payloads even if that panel does not show them today.

4. Structured Retry Diagnostics

Structured retry diagnostics explain recoverable structured-output failures. They are normalized by shared/structuredRetryDiagnostics.ts, merged into structured-output metadata by server/structuredOutput/metadata.ts, and rendered in artifact notices / viewers as Retry Attempts.

Use this surface when a phase kept going after rejecting one or more malformed outputs, or when a final failed artifact needs to show exactly why previous attempts were rejected.

4.1 What Gets Stored

Each retry entry captures one rejected attempt:

FieldMeaning
attempt1-based retry attempt number
validationErrorWhy the parser or validator rejected that attempt
failureClassOptional coarse classification such as validation_error, output_truncated, empty_response, provider_error, connection_reset, session_protocol_error, or transport_error
targetOptional target field or schema area that failed
line / columnOptional location for line-oriented parse failures
excerptBest-effort trimmed excerpt from the rejected output

Malformed retry entries are dropped during normalization, and duplicate retry entries are collapsed so the UI does not show the same failure repeatedly.

4.2 Where It Shows Up

Structured retry diagnostics currently surface in artifact-oriented UI rather than the blocked-error panel:

  • artifact processing notices
  • expanded artifact viewers
  • per-owner aggregate vote views when retry metadata exists on contributing artifacts

These diagnostics complement blocked-error diagnostics rather than replacing them:

  • Blocked-error diagnostics answer "why did the ticket stop?"
  • Structured retry diagnostics answer "why was this earlier model output rejected before we recovered or finally gave up?"

LoopTroop documentation for the current runtime.