Stop Sharing Context: How to Let Grafana Assistant Pre-Study Your Infrastructure for Faster Fixes

Published: 2026-05-12 19:56:29 | Category: Education & Careers

Introduction

When an unexpected alert fires, most engineers instinctively turn to their AI assistant for help. But without pre-loaded knowledge, the assistant requires extensive context sharing—what data sources are connected, which services are running, how they depend on each other. Every conversation starts from scratch, eating into valuable troubleshooting time. Grafana Assistant eliminates this friction by automatically building and maintaining a persistent knowledge base of your infrastructure before you ever ask a question. This guide walks you through setting up and leveraging that capability, so you can jump straight into fixing issues instead of wasting minutes explaining your environment.

What You Need

  • A Grafana Cloud account (Free, Pro, or Advanced tier)
  • At least one Prometheus data source configured (for metrics)
  • Loki and Tempo data sources (recommended for logs and traces enrichment)
  • Grafana Assistant enabled in your stack (available in most plans; check your settings)
  • Admin or Editor permissions to modify Grafana settings
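
If you manage your stack as code, the data-source prerequisites above can be expressed in a Grafana provisioning file. This is only an illustrative sketch: Grafana Cloud typically pre-provisions its own Prometheus, Loki, and Tempo data sources, and the names and URLs below are placeholders for any additional instances you connect.

```yaml
# provisioning/datasources/observability.yaml (illustrative; URLs are hypothetical)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    url: https://prometheus.example.internal
    access: proxy
    isDefault: true
  - name: Loki
    type: loki
    url: https://loki.example.internal
    access: proxy
  - name: Tempo
    type: tempo
    url: https://tempo.example.internal
    access: proxy
```

Whether provisioned this way or configured in the UI, the result is the same: every connected data source becomes raw material for the assistant's knowledge base.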

Step-by-Step Guide

  1. Enable Grafana Assistant
    Navigate to Administration → General → Grafana Assistant in your Grafana Cloud stack. Toggle the feature on if it isn't already enabled. This activates the background AI agents that will scan your infrastructure. (In some plans it's on by default; verify the status.)
  2. Connect All Relevant Data Sources
    Ensure your Prometheus, Loki, and Tempo data sources are properly configured in Configuration → Data Sources. The assistant automatically discovers every connected data source in your stack. For maximum context, include all Prometheus instances (metrics), Loki logs, and Tempo traces. No additional configuration or API keys are needed—the assistant uses existing connections.
  3. Let the AI Agents Work in the Background
    After enabling the assistant and confirming your data sources, the system runs a swarm of AI agents that:
    • Identify all Prometheus, Loki, and Tempo data sources in your Grafana Cloud stack.
    • Query Prometheus metrics in parallel to discover services, deployments, and infrastructure components.
    • Correlate logs (Loki) and traces (Tempo) with their corresponding metrics, enriching the knowledge base with log formats, trace structures, and service dependencies.
    • For each discovered service group, generate structured documentation covering: what the service does, its key metrics and labels, how it's deployed, its upstream/downstream dependencies, and relevant health indicators.
      This whole process happens automatically and continuously—no manual triggers required. Expect the first full scan to complete within a few minutes depending on your stack size.
  4. Verify the Knowledge Base
    After a short wait, test what the assistant knows. In the Grafana Assistant chat interface (accessible from the toolbar or directly via URL), ask a simple question like:
    • "What services are running on my infrastructure?"
    • "Which downstream services does my payment system depend on?"
    • "Show me the key latency metrics for the checkout service."
    If the assistant responds with accurate, detailed information without asking for context, the knowledge base is populated and ready for action.
  5. Use Assistant for Incident Response
    When an incident occurs, simply ask questions directly—no need to re-explain your setup. For example:
    • "Why is my checkout service slow?" – The assistant already knows its metrics live in a specific Prometheus data source and its logs are structured JSON in Loki. It will correlate metrics, logs, and traces to pinpoint the root cause.
    • "What upstream services could be causing errors?" – It knows dependencies from pre-scanned relations, so even if you're new to the system, you get accurate answers instantly.
    • "Show recent traces for the order service." – Traces are linked to metrics and logs without additional query construction.
    The assistant's pre-built context shaves minutes off mean time to resolution (MTTR), especially for teams where not everyone knows the full infrastructure.
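
Behind a question like "show me the key latency metrics for the checkout service," the assistant constructs ordinary PromQL. As a sketch of what that looks like (the metric name, `service` label, and threshold are illustrative and depend on your instrumentation):

```promql
# p95 request latency for the checkout service over the last 5 minutes
# (metric and label names are illustrative; use your instrumentation's names)
histogram_quantile(
  0.95,
  sum by (le) (
    rate(http_request_duration_seconds_bucket{service="checkout"}[5m])
  )
)
```

The point of the pre-built knowledge base is that you no longer have to write this yourself, or even know which data source holds the histogram.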

Tips for Optimal Results

  • Keep data sources comprehensive: The more Prometheus, Loki, and Tempo instances you connect, the richer the assistant's knowledge base becomes. Don't leave out secondary environments if they're part of your monitoring stack.
  • Ensure consistent naming conventions: Use meaningful service names and labels across metrics, logs, and traces. This helps the AI agents correlate entities accurately. Inconsistent naming can lead to missed dependencies.
  • Periodically review knowledge health: Grafana Assistant updates the knowledge base automatically, but check occasionally by asking broad questions like "List all my services." If something seems missing, verify that data sources are still correctly configured and accessible.
  • Combine with runbooks: While the assistant knows your architecture, it doesn't replace incident runbooks. Use it alongside your standard operating procedures for maximum efficiency.
  • Leverage for onboarding new team members: The persistent knowledge base is a powerful on-ramp for engineers unfamiliar with your infrastructure. Encourage them to ask questions via the assistant instead of bothering senior colleagues for basic context.
  • Report anomalies: If you notice the assistant giving outdated or inaccurate information, contact Grafana support. The AI agents rely on the latest data, but configuration changes may cause temporary inconsistencies.
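
The naming-conventions tip is easiest to enforce at the instrumentation layer. As a hedged sketch, a Prometheus scrape config can pin the same `service` label that your Loki log labels and Tempo resource attributes use, so the assistant's agents can correlate all three signals by one name (the target and label values here are hypothetical):

```yaml
# prometheus.yml fragment (illustrative): attach a consistent "service" label
# so metrics, logs, and traces correlate under the same name.
scrape_configs:
  - job_name: checkout
    static_configs:
      - targets: ["checkout.internal:9090"]  # hypothetical scrape target
        labels:
          service: checkout        # same value used in Loki log labels
          environment: production  # and in Tempo resource attributes
```

A single agreed-on label set across signals is what turns "inconsistent naming can lead to missed dependencies" from a warning into a non-issue.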

By following these steps, you transform Grafana Assistant from a reactive helper into a proactive partner that already knows your infrastructure inside out. No more context sharing—just faster, smarter troubleshooting.