
Insigh8s: An opinionated MCP server for Kubernetes triage

Your AI assistant can already call kubectl. What it can't do is tell you which 40 pods to look at first.


It's 2am. Your phone goes off. A cost alert fires for the payments namespace: 23% over budget for the week. You open your laptop.

You check the Grafana dashboard. Spend is up, but it doesn't say why. You switch to OpenCost: fine-grained allocation, but no correlation with what changed. You kubectl get pods -n payments to see what's running. Forty pods. Which ones matter? You check the last few Argo CD syncs. Something was deployed three days ago. Was that it?

Then the security on-call pings you: there's a PCI audit next quarter, and three of those pods are running as root. They've been like that for weeks but nobody flagged it. Now it's blocking the audit.

So you're sitting there at 2am, with five browser tabs open, correlating five sources of data by hand, trying to figure out: what actually broke, what does it cost, what's the priority, and what's the fix?

This is the job. Every platform engineer, SRE, FinOps lead, and security reviewer knows this flow. The tools are all there. The data is all there. The gap is everything that lives between the tools.

That gap is what Insigh8s is trying to fill.

The problem isn't visibility. It's judgement.

Here's the thing nobody says out loud: your AI assistant can already see all of this data.

Claude Desktop can install the Kubernetes MCP server and run kubectl for you. There's an AKS MCP server. OpenCost ships with a built-in MCP on port 8081. Kyverno's PolicyReports are CRDs that any MCP can query. Even Microsoft Sentinel now has MCP support for security data.

You can wire a dozen MCPs to your AI today. Most teams aren't doing it, but the capability exists.

What you can't wire in is the judgement. Knowing:

  • Which of those data sources to query for a given question

  • How to join results across them (pod name → cost allocation → policy violation → recent deploy → network flow)

  • What thresholds actually matter (is 72% CPU headroom "waste"? what about 35%?)

  • How to rank findings by what will actually hurt if you don't fix it today

  • What a copy-paste remediation looks like for each category

That judgement still lives in the head of the engineer on-call at 2am. Every 2am. For every incident. Forever.

LLMs can get better at reasoning, but they can't learn your cluster's specific triage playbook just by getting smarter. And even if they could, you probably don't want "the AI decided what to do this time" as your answer for a PCI auditor.

What Insigh8s is

Insigh8s is one MCP server. Not a SaaS, not a dashboard, not an AI agent. A single open-source program you run on your cluster that registers itself once with your AI assistant.

It exposes a handful of composite tools: functions like audit_namespace, find_expensive_violators, and rightsize_workloads. Each one corresponds to a triage question a human would actually ask.

Here's the mental model I find useful: think of Insigh8s like a SQL view.

In a database, you could write a complex JOIN across five tables every time you want an answer. Or you could create a view like v_namespace_audit that encodes the join once, tested and named. Anyone querying the view gets the same clean answer. The DBA who wrote the view encoded their understanding of the schema once, and every user benefits.

Insigh8s is the view. Your AI assistant is the SQL client. The underlying systems (kubectl, OpenCost, Prometheus, Kyverno, Hubble) are the tables.

The three v0.1 tools, one per intent

Early on, we tried to build one composite tool that answered everything: audit, cost, triage, compliance, all from a single call. It felt clever. It was also wrong.

Different people asking different questions want different answers. A security reviewer auditing for PCI doesn't want pod restart counts mixed into the report. An SRE at 2am doesn't want a compliance score. A FinOps analyst asking about spend doesn't want a network flow summary. Bundling those concerns into one tool creates an output that's noisy for every caller and clean for none.

So v0.1 ships three composite tools, each answering one clear question:

  • investigate_namespace(namespace, window) → SRE workflow. What's wrong and why?

  • namespace_cost(namespace, window) → FinOps workflow. What does this cost?

  • audit_namespace(namespace, framework) → compliance workflow. Is this compliant with [framework X]?

Plus one helper, list_audit_frameworks(), which the AI calls when the user asks for an audit without specifying which framework they mean.
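This tool surface can be sketched as a registry mapping each intent to exactly one handler. Everything below (types, handler bodies, report strings) is hypothetical and only illustrates the shape; the real server registers these through an MCP SDK:

```go
package main

import (
	"errors"
	"fmt"
)

// Handler is one composite tool: named arguments in, a rendered report out.
type Handler func(args map[string]string) (string, error)

// tools maps each intent to exactly one handler -- no god-tool.
var tools = map[string]Handler{
	"investigate_namespace": func(args map[string]string) (string, error) {
		return fmt.Sprintf("investigation report for %s over %s", args["namespace"], args["window"]), nil
	},
	"namespace_cost": func(args map[string]string) (string, error) {
		return fmt.Sprintf("cost report for %s over %s", args["namespace"], args["window"]), nil
	},
	"audit_namespace": func(args map[string]string) (string, error) {
		return fmt.Sprintf("%s audit for %s", args["framework"], args["namespace"]), nil
	},
	"list_audit_frameworks": func(map[string]string) (string, error) {
		return "pod-security-standards-restricted, cis-kubernetes-benchmark", nil
	},
}

// call is what a single MCP tool call resolves to on the server side.
func call(name string, args map[string]string) (string, error) {
	h, ok := tools[name]
	if !ok {
		return "", errors.New("unknown tool: " + name)
	}
	return h(args)
}

func main() {
	out, _ := call("investigate_namespace", map[string]string{"namespace": "payments", "window": "15m"})
	fmt.Println(out)
}
```

The point of the shape: one name per question, and the AI's only job is picking the right name and filling in the arguments.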

What investigate_namespace actually does

Let's take the SRE tool, since it's the one most people will hit first. Say you call investigate_namespace("payments", window="15m"). Here's what happens inside Insigh8s's code, not in your AI's reasoning:

  1. Shell out to kubectl get pods -n payments and classify each pod as healthy, degraded (recent restarts), or failed (CrashLoopBackOff, ImagePullBackOff, Pending too long)

  2. For each failed pod, pull the last 20 lines of logs and look for known error patterns (OOMKilled, context deadline, connection refused, DNS failures)

  3. Query kubectl get deployments -n payments with revision history, find any deploy that landed within the window

  4. Correlate: did the failures start after the most recent deploy? If yes, flag it as the likely cause

  5. PromQL query for error rate delta: compare last 15 minutes to the hour before

  6. Optionally, if Hubble is installed, pull flow logs for unusual rejection patterns

  7. Optionally, query PolicyReport CRDs for admission denials in the window
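The classification in step 1 can be sketched roughly like this. The thresholds and status fields here are illustrative assumptions, not Insigh8s defaults; real pod status parsing has more cases:

```go
package main

import "fmt"

// PodStatus is the slice of `kubectl get pods -o json` output we care about.
type PodStatus struct {
	Name          string
	Phase         string // Running, Pending, Succeeded, Failed
	WaitingReason string // e.g. CrashLoopBackOff, ImagePullBackOff
	Restarts      int
	PendingSecs   int
}

// classify buckets a pod into healthy / degraded / failed.
func classify(p PodStatus) string {
	switch {
	case p.WaitingReason == "CrashLoopBackOff" || p.WaitingReason == "ImagePullBackOff":
		return "failed"
	case p.Phase == "Pending" && p.PendingSecs > 300:
		return "failed" // pending too long (illustrative 5-minute cutoff)
	case p.Restarts > 0:
		return "degraded" // recent restarts
	case p.Phase == "Running":
		return "healthy"
	default:
		return "degraded"
	}
}

func main() {
	pods := []PodStatus{
		{Name: "api-7d9", Phase: "Running"},
		{Name: "worker-x2", Phase: "Running", Restarts: 4},
		{Name: "cron-9z", Phase: "Pending", WaitingReason: "ImagePullBackOff"},
	}
	for _, p := range pods {
		fmt.Printf("%s -> %s\n", p.Name, classify(p))
	}
}
```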

Then, in Go code:

  1. Join findings by pod and workload

  2. Rank by severity and blast radius

  3. Generate a concise, prioritized summary with suggested next steps (rollback command, pod to describe, log lines to investigate)

Your AI assistant makes one tool call. The orchestration, the joins, the pattern recognition: all of that lives in reviewable, versionable, testable Go code. Not in the LLM's interpretation of your prompt.
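The join-and-rank steps can be sketched as below. The Finding type and severity scale are hypothetical simplifications ("blast radius" is reduced to a single number here):

```go
package main

import (
	"fmt"
	"sort"
)

// Finding is one signal from any source (pods, logs, deploys, PromQL).
type Finding struct {
	Workload string
	Source   string
	Severity int // higher is worse; illustrative scale
	Detail   string
}

// rank joins findings by workload and orders workloads by their worst severity.
func rank(findings []Finding) []string {
	worst := map[string]int{}
	for _, f := range findings {
		if f.Severity > worst[f.Workload] {
			worst[f.Workload] = f.Severity
		}
	}
	workloads := make([]string, 0, len(worst))
	for w := range worst {
		workloads = append(workloads, w)
	}
	sort.Slice(workloads, func(i, j int) bool { return worst[workloads[i]] > worst[workloads[j]] })
	return workloads
}

func main() {
	findings := []Finding{
		{Workload: "checkout", Source: "logs", Severity: 2, Detail: "connection refused"},
		{Workload: "ledger", Source: "pods", Severity: 3, Detail: "CrashLoopBackOff"},
		{Workload: "checkout", Source: "deploys", Severity: 1, Detail: "deployed 12m ago"},
	}
	fmt.Println(rank(findings)) // ledger first: it has the failed pod
}
```

Because this ordering is plain code, "what gets flagged first" is the same on every run and reviewable in a diff.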

What namespace_cost does

Completely separate tool, deliberately narrow. Given a namespace and a window, it hits OpenCost's allocation API, returns spend broken down by workload, computes week-over-week delta, and ranks the top cost drivers.

No policy findings. No rightsizing recommendations (that's a future tool, find_waste). No investigation signals. Just cost, because that's what the caller asked for.
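The delta-and-rank step might look like this, assuming per-workload spend has already been fetched from OpenCost's allocation API for two adjacent windows (types and field names are illustrative):

```go
package main

import (
	"fmt"
	"sort"
)

// Spend holds per-workload cost for two adjacent windows (this week / last week).
type Spend struct {
	Workload   string
	This, Prev float64
}

// topDrivers ranks workloads by absolute spend increase and reports the % delta.
func topDrivers(spend []Spend) []string {
	sort.Slice(spend, func(i, j int) bool {
		return spend[i].This-spend[i].Prev > spend[j].This-spend[j].Prev
	})
	out := make([]string, 0, len(spend))
	for _, s := range spend {
		pct := 0.0
		if s.Prev > 0 {
			pct = (s.This - s.Prev) / s.Prev * 100
		}
		out = append(out, fmt.Sprintf("%s: $%.2f (%+.0f%% WoW)", s.Workload, s.This, pct))
	}
	return out
}

func main() {
	report := topDrivers([]Spend{
		{"checkout", 410, 300},
		{"ledger", 120, 125},
	})
	for _, line := range report {
		fmt.Println(line)
	}
}
```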

What audit_namespace does

Also separate, also deliberately narrow. Takes a namespace and a required framework parameter. v0.1 supports two frameworks:

  • pod-security-standards-restricted: the upstream Kubernetes Pod Security Standards, Restricted profile

  • cis-kubernetes-benchmark: the CIS Kubernetes Benchmark, a widely recognized hardening spec

The tool checks each relevant control for the framework, lists which pods or containers violate it, and suggests remediation (a Kyverno policy, a kubectl patch command, or a YAML edit).
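A check like that reduces to code of roughly this shape. This sketch covers only two controls from the Restricted profile and uses hypothetical types; the real framework has many more controls per pod spec:

```go
package main

import "fmt"

// Container is the subset of a pod spec a Restricted-profile check inspects.
type Container struct {
	Name                     string
	RunAsNonRoot             bool
	AllowPrivilegeEscalation bool
}

// Violation ties a failed control to a container and a suggested fix.
type Violation struct {
	Container, Control, Fix string
}

// checkRestricted applies two Restricted-profile controls (illustrative subset).
func checkRestricted(cs []Container) []Violation {
	var out []Violation
	for _, c := range cs {
		if !c.RunAsNonRoot {
			out = append(out, Violation{c.Name, "runAsNonRoot",
				"set securityContext.runAsNonRoot: true"})
		}
		if c.AllowPrivilegeEscalation {
			out = append(out, Violation{c.Name, "allowPrivilegeEscalation",
				"set securityContext.allowPrivilegeEscalation: false"})
		}
	}
	return out
}

func main() {
	vs := checkRestricted([]Container{
		{Name: "api", RunAsNonRoot: true},
		{Name: "legacy", RunAsNonRoot: false, AllowPrivilegeEscalation: true},
	})
	for _, v := range vs {
		fmt.Printf("%s: fails %s -> %s\n", v.Container, v.Control, v.Fix)
	}
}
```

Each violation carries its remediation, which is what makes the report copy-pasteable rather than just a list of failures.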

What if the user says "audit the payments namespace" without naming a framework? Your AI assistant calls list_audit_frameworks() first, reads the options, and asks the user which one they want. That's the right division of labor: Insigh8s provides the capabilities, your AI handles the conversation.

More frameworks (SOC2 CC6, ISO 27001 A.12, PCI-DSS 4, NIST 800-190) are planned for v0.2+. Those require more interpretation and sometimes external context, so they're second-wave work.

Why this matters even as AI gets better

The obvious objection: "Won't Claude 7 or GPT-6 just do all of this natively?"

Maybe. But that's missing what this kind of tool actually is.

Cursor still exists even though Claude writes code. k9s still exists even though kubectl works fine. Terraform still exists even though every cloud has a web console. The "raw capability exists in the platform" vs "opinionated product for teams" distinction is durable.

When an enterprise ops team handles a real incident, they don't want "what did the AI decide to do this time." They want:

  • Deterministic answers. Same question today and tomorrow.

  • Versioned tools. audit_namespace v1.2 is a diff you can review.

  • Compliance-readable code. Your security team can read what gets checked.

  • Works offline. Because Claude's API goes down sometimes, and you still have an incident.

  • Consistent across models. Claude, GPT, Gemini, local Llama all give the same answer because they're all calling the same code.

As LLMs take on more critical work, the pressure for this kind of reviewability increases, not decreases.

Who this is for

Four audiences, each mapped to the tool that answers their question:

Developers → investigate_namespace

"My deploy failed. What broke?" → one tool call returns the failing pod, the deploy that preceded the failure, the error pattern in the logs, and the likely fix, instead of digging through kubectl, logs, and recent git commits in three separate tools.

Platform / SRE → investigate_namespace

"Something's wrong in the payments namespace." → one tool call returns unhealthy pods, recent deploys that correlate with the problems, log error patterns, error rate changes, and unusual flows. Your 2am triage playbook, encoded. Same tool the developer uses, because "what's broken?" is fundamentally the same question regardless of role.

FinOps → namespace_cost

"What is the payments namespace costing us?" → one tool call returns spend broken down by workload, week-over-week delta, and top cost drivers. Clean output focused on the money question. For rightsizing and waste-hunting specifically, future tools will land in v0.2 (find_waste, idle_workloads).

Security and auditors → audit_namespace

"Audit the payments namespace against CIS Kubernetes Benchmark." → one tool call returns a compliance report: which controls pass, which fail, which pods are the violators, and what the remediation looks like. v0.1 supports Pod Security Standards (Restricted) and CIS Kubernetes Benchmark. If you don't name a framework, your AI asks which one you want.

The same principle underlies all of them: one tool per intent, no god-tool. An SRE investigating a problem should get investigation output. A FinOps engineer asking about cost should get cost output. A compliance reviewer running an audit should get audit output. No tool tries to answer every question at once.

What's next

Insigh8s is a work in progress. v0.1 is coming soon as open source under Apache 2.0, with three composite tools (investigate_namespace, namespace_cost, audit_namespace) plus the list_audit_frameworks helper. It supports AKS, EKS, GKE, or any CNCF-conformant cluster. No telemetry, no phone-home, no SaaS tier.

v0.2 and beyond will expand each intent: find_waste and idle_workloads on the FinOps side, more audit frameworks (SOC2, ISO 27001, PCI-DSS, NIST), and investigate_pod and trace_latency for deeper SRE work. The roadmap is public on GitHub and genuinely open to input.

If the composite-tool approach resonates with you, there are three ways to get involved early:

  1. Star the repo at github.com/insigh8s to follow progress and see the roadmap take shape.

  2. Shape the design by joining the GitHub Discussions. The early decisions (which tools to ship next, what the default thresholds should be, how to handle graceful degradation when Kyverno or OpenCost aren't installed) are still open.

  3. Write code. Good-first-issue labels, a simple Go architecture, and a roadmap you can pick from.

If you just want to be notified when v0.1 drops, there's a minimal email form at insigh8s.io. One email when it ships. Nothing else.

One last thing

If you've ever been the human correlation engine at 2am, you already know why this has to exist. The tools have always been there. The data has always been there. What's been missing is something that encodes the judgement (the knowing-which-thing-matters part) into code that your AI can call once and get right.

That's Insigh8s. An opinionated MCP for Kubernetes triage, built by people who've done the 2am correlation by hand enough times to want something better.

Come build it with us.


Insigh8s is a community project, open-source under Apache 2.0. Website: insigh8s.io. GitHub: github.com/insigh8s. Blog: blog.insigh8s.io.