Displaying ASP.NET Core health checks with Grafana and InfluxDB

After finishing my writing about ASP.NET Core health checks, I started looking for a way to visualize health check results so I can display them on a wall-mounted TV or big screen. This blog post shows how to visualize ASP.NET Core health checks with Grafana and InfluxDB.

About TIG-stack

The visualization side of system monitoring can be handled with the open-source TIG stack:

  • Telegraf – data collector, reports to InfluxDB
  • InfluxDB – time series database, easy to use and integrate
  • Grafana – web-based reporting solution, reads data from InfluxDB

I installed these tools on one of my test machines and created a simple dashboard to show metrics of the same box. It’s good to plan a few hours to get everything running, build some dashboards and see how things work.

Grafana dashboard

Grafana also supports automatic refreshing of dashboards, and those who need more widgets or ready-made dashboards can download them from the Grafana website.

What we are building

Our goal is to build a web application that outputs granular health statuses and a data collector application that reports these statuses to InfluxDB for Grafana.

Reporting ASP.NET Core health checks to Grafana

In this setup, Telegraf collects other metrics like CPU, memory and disk space. I don’t cover these metrics in this post.

Preparing health checks

For this post I will use the ping-based health check from my blog post Avoiding ping flood in ASP.NET Core health checks.

using System;
using System.Net.NetworkInformation;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

public class PingHealthCheck : IHealthCheck
{
    private readonly string _host;
    private readonly int _timeout;       // ping timeout in milliseconds
    private readonly int _pingInterval;  // seconds between real pings; 0 = ping on every check
    private readonly object _locker = new object();
    private DateTime _lastPingTime = DateTime.MinValue;
    private HealthCheckResult _lastPingResult = HealthCheckResult.Healthy();

    public PingHealthCheck(string host, int timeout, int pingInterval = 0)
    {
        _host = host;
        _timeout = timeout;
        _pingInterval = pingInterval;
    }

    private bool IsCacheExpired()
    {
        return _pingInterval == 0 || _lastPingTime.AddSeconds(_pingInterval) <= DateTime.UtcNow;
    }

    public Task<HealthCheckResult> CheckHealthAsync(HealthCheckContext context, CancellationToken cancellationToken = default)
    {
        // Serve the cached result while it is still fresh
        if (!IsCacheExpired())
        {
            return Task.FromResult(_lastPingResult);
        }

        // Only one caller pings at a time; concurrent callers get the last known result
        if (Monitor.TryEnter(_locker))
        {
            try
            {
                // Re-check: another caller may have refreshed the result meanwhile
                if (IsCacheExpired())
                {
                    PingService();
                }
            }
            finally
            {
                Monitor.Exit(_locker);
            }
        }

        return Task.FromResult(_lastPingResult);
    }

    private void PingService()
    {
        try
        {
            using (var ping = new Ping())
            {
                _lastPingTime = DateTime.UtcNow;

                var reply = ping.Send(_host, _timeout);

                if (reply.Status != IPStatus.Success)
                {
                    _lastPingResult = HealthCheckResult.Unhealthy();
                }
                else if (reply.RoundtripTime >= _timeout)
                {
                    _lastPingResult = HealthCheckResult.Degraded();
                }
                else
                {
                    _lastPingResult = HealthCheckResult.Healthy();
                }
            }
        }
        catch
        {
            // Name resolution failures and other ping errors count as unhealthy
            _lastPingResult = HealthCheckResult.Unhealthy();
        }
    }
}

We can register this health check several times, once for each external host we want to ping to make sure it is alive.

To format the output, I used the trick introduced by Dejan Stojanovic in his blog post Adding healthchecks just got a lot easier in ASP.NET Core 2.2.

public void Configure(IApplicationBuilder app, IWebHostEnvironment env)
{
    // ...

    var options = new HealthCheckOptions();
    options.ResponseWriter = async (c, r) => {

        c.Response.ContentType = "application/json";

        var result = JsonConvert.SerializeObject(new
        {
            status = r.Status.ToString(),
            errors = r.Entries.Select(e => new { key = e.Key, value = e.Value.Status.ToString() })
        });

        await c.Response.WriteAsync(result);
    };

    app.UseHealthChecks("/hc", options);

    // ...
}

This trick gave me the following ping check output.

{
  "status": "Healthy",
  "errors": [
    {
      "key": "ping1",
      "value": "Healthy"
    },
    {
      "key": "ping2",
      "value": "Healthy"
    }
  ]
}

It’s okay, but there are a few things I want to change to make health checks easier to read and report.

Formatting health checks for reporting

To report health checks to InfluxDB, I found it easier to output the results as a JSON array. Grafana loves numbers, so instead of health status names I went with the integer values of the HealthStatus enum.

public enum HealthStatus
{
    //
    // Summary:
    //     Indicates that the health check determined that the component was unhealthy,
    //     or an unhandled exception was thrown while executing the health check.
    Unhealthy = 0,
    //
    // Summary:
    //     Indicates that the health check determined that the component was in a degraded
    //     state.
    Degraded = 1,
    //
    // Summary:
    //     Indicates that the health check determined that the component was healthy.
    Healthy = 2
}

I created a DTO class for the array elements as shown here.

public class ServiceStatus
{
    public string Service { get; set; }
    public int Status { get; set; }
}

Health checks are registered and configured in the Startup class as shown here.

public void ConfigureServices(IServiceCollection services)
{
    // Each check pings one host; its name becomes the service name in the report.
    // www.__Dbing1.com is not a resolvable host, so the Database check reports Unhealthy.
    services.AddHealthChecks()
            .AddCheck("ERP", new PingHealthCheck("www.google.com", 100))
            .AddCheck("Accounting", new PingHealthCheck("www.bing.com", 10))
            .AddCheck("Database", new PingHealthCheck("www.__Dbing1.com", 100));

    services.AddControllersWithViews();
    services.AddRazorPages();
}

public void Configure(IApplicationBuilder app, IWebHostEnvironment env)
{
    if (env.IsDevelopment())
    {
        app.UseDeveloperExceptionPage();
    }
    else
    {
        app.UseExceptionHandler("/Home/Error");
    }

    var options = new HealthCheckOptions();
    options.ResponseWriter = async (c, r) => {

        c.Response.ContentType = "application/json";
        var result = new List<ServiceStatus>();
        result.Add(new ServiceStatus { Service = "OverAll", Status = (int)r.Status });
        result.AddRange(r.Entries.Select(e => new ServiceStatus { Service = e.Key, Status = (int)e.Value.Status }));

        var json = JsonConvert.SerializeObject(result);

        await c.Response.WriteAsync(json);
    };

    app.UseHealthChecks("/hc", options);
    app.UseStaticFiles();
    app.UseRouting();
    app.UseAuthorization();

    app.UseEndpoints(endpoints =>
    {
        endpoints.MapControllerRoute(
            name: "default",
            pattern: "{controller=Home}/{action=Index}/{id?}");
        endpoints.MapRazorPages();
    });
}

Take a look at how ResponseWriter is defined for HealthCheckOptions. It first creates a list of ServiceStatus objects. The first item in the list is the overall health status of the system, and the following items are the statuses returned by the individual health checks. Finally, the list is serialized to JSON and written to the response stream. Here’s the result.

[
  {
    "Service": "OverAll",
    "Status": 0
  },
  {
    "Service": "ERP",
    "Status": 2
  },
  {
    "Service": "Accounting",
    "Status": 2
  },
  {
    "Service": "Database",
    "Status": 0
  }
]
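Note that OverAll is 0 even though two of the three checks are healthy. As far as I know, HealthReport derives its overall Status from the worst entry, and because the enum orders Unhealthy = 0 < Degraded = 1 < Healthy = 2, the worst status is simply the smallest integer. Here is a minimal sketch of that rule, using a local copy of the enum so it runs without ASP.NET Core packages; StatusValue and OverallStatus are hypothetical names, not framework code.

```csharp
using System.Collections.Generic;
using System.Linq;

// Local copy of the HealthStatus values (hypothetical name, mirrors the framework enum)
public enum StatusValue { Unhealthy = 0, Degraded = 1, Healthy = 2 }

public static class OverallStatus
{
    // The overall status is the worst entry; with Unhealthy = 0 that is the minimum.
    // A report without entries counts as Healthy.
    public static StatusValue Aggregate(IEnumerable<StatusValue> entries) =>
        entries.DefaultIfEmpty(StatusValue.Healthy).Min();
}
```

With the statuses from the output above (ERP and Accounting healthy, Database unhealthy), Aggregate returns Unhealthy, which matches the 0 reported for OverAll.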

With this work done, we can start working on the data collector.

Sending health checks to InfluxDB

As I’m still a newbie with the TIG stack and don’t know much about the internals of Telegraf, I decided to write a simple data collector in C#. I can run it with Windows Task Scheduler, for example. Reporting data to InfluxDB is also easy: it’s just an HTTP POST request with a body in InfluxDB line protocol format.

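using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;
using Newtonsoft.Json;

class Program
{
    private const string HealthCheckUrl = "http://localhost:52494/hc";
    private const string InfluxdbWriteUrl = "http://192.168.10.117:8086/write?db=telegraf";
    private const string WebHostName = "gpf1";

    // One HttpClient instance is reused for all requests
    private static readonly HttpClient Client = new HttpClient();

    // Mirrors the ServiceStatus DTO returned by the /hc endpoint
    private class HealthCheckResult
    {
        public string Service { get; set; }
        public int Status { get; set; }
    }

    static async Task Main(string[] args)
    {
        var statuses = await GetHealthStatus();

        await PostToInfluxDb(statuses);
    }

    private static async Task<List<HealthCheckResult>> GetHealthStatus()
    {
        // The endpoint answers 503 when the overall status is Unhealthy, but the
        // JSON body is still written, so it is read regardless of the status code
        var response = await Client.GetAsync(HealthCheckUrl);
        var json = await response.Content.ReadAsStringAsync();

        return JsonConvert.DeserializeObject<List<HealthCheckResult>>(json);
    }

    private static async Task PostToInfluxDb(List<HealthCheckResult> statuses)
    {
        // One line protocol point per service: measurement,tags fields
        var lines = statuses.Select(s =>
            $"health,host={WebHostName},service={s.Service} value={s.Status}");

        // InfluxDB accepts several newline-separated points in a single write
        using (var content = new StringContent(string.Join("\n", lines), Encoding.UTF8))
        {
            var response = await Client.PostAsync(InfluxdbWriteUrl, content);
            response.EnsureSuccessStatusCode();
        }
    }
}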
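Each point the collector writes is one line of InfluxDB line protocol: the measurement (health), comma-separated tags (host and service), a space, and the fields (value=2). Tag values that contain spaces, commas or equals signs must be escaped with a backslash, otherwise the write fails or the point is parsed incorrectly. A minimal sketch of a helper that does this escaping follows; LineProtocol and its methods are hypothetical names, not part of any InfluxDB client library.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helper for building InfluxDB line protocol points:
//   measurement,tag1=value1,tag2=value2 field=value
public static class LineProtocol
{
    // Tag keys and values escape commas, equals signs and spaces with a backslash
    public static string EscapeTag(string text) =>
        text.Replace(",", "\\,").Replace("=", "\\=").Replace(" ", "\\ ");

    // Builds one point with a single numeric field (InfluxDB stores it as a float).
    // Tags are sorted by key, as the line protocol documentation recommends.
    public static string Point(string measurement, IDictionary<string, string> tags, string field, double value)
    {
        var tagSet = string.Join(",", tags
            .OrderBy(t => t.Key, StringComparer.Ordinal)
            .Select(t => $"{EscapeTag(t.Key)}={EscapeTag(t.Value)}"));

        return $"{measurement},{tagSet} {field}={value.ToString(CultureInfo.InvariantCulture)}";
    }

    // Several points can be sent in one /write request, one point per line
    public static string Batch(IEnumerable<string> points) => string.Join("\n", points);
}
```

Numbers without a suffix are stored as floats, which is what the collector above writes. Appending i to the value would store integers instead, but InfluxDB 1.x rejects writes whose field type conflicts with data already in the same shard, so it is safer not to switch types for an existing measurement.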

In a real application, settings should come from a configuration file, but let’s keep things simple until everything works as expected.
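When it’s time to move the constants out of the code, a minimal sketch could look like this. It assumes System.Text.Json (.NET Core 3.0 or later) and a hypothetical collectorsettings.json file whose property names match the constants in the collector; CollectorSettings is an illustrative name.

```csharp
using System.IO;
using System.Text.Json;

// Hypothetical settings class; property names mirror the collector's constants
public class CollectorSettings
{
    public string HealthCheckUrl { get; set; }
    public string InfluxdbWriteUrl { get; set; }
    public string WebHostName { get; set; }

    // Parses settings from JSON text (property names are matched case-sensitively by default)
    public static CollectorSettings FromJson(string json) =>
        JsonSerializer.Deserialize<CollectorSettings>(json);

    // Reads settings from a file, e.g. a hypothetical collectorsettings.json next to the executable
    public static CollectorSettings Load(string path) =>
        FromJson(File.ReadAllText(path));
}
```

Main could then call CollectorSettings.Load("collectorsettings.json") and pass the values on instead of using the constants. If the JSON keys use different casing, pass JsonSerializerOptions with PropertyNameCaseInsensitive = true.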

Building health checks dashboard

Now that health check data is flowing into InfluxDB from our small data collector, it’s time to build a dashboard in Grafana. This is what my demo dashboard looks like.

Grafana dashboard with ASP.NET Core health checks

The Singlestat panel on top shows the overall health status of the system. The smaller ones below show the health status of specific components or services. This way it is easy to see which external dependencies or components are actually having problems or failing.

For every Singlestat panel we have to configure metrics, options and value mappings. The screenshots show the configuration of the overall health status panel. Configuring the other panels is similar: just change the service name.

On the Metrics tab we define the query that provides the data. We specify OverAll as the service and, in my case, gpf1 as the host.
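Behind the panel, Grafana’s query editor builds an InfluxQL query against the health measurement. The sketch below builds a comparable query, assuming the panel selects the last value, plus a URL for running it directly against the InfluxDB 1.x HTTP /query endpoint, which is handy for checking that the collector’s data actually arrived. HealthQuery is a hypothetical helper; Grafana adds its own time filter and grouping to the query it sends.

```csharp
using System;

// Hypothetical helper: builds an InfluxQL query for one health panel and a URL
// for running it against the InfluxDB 1.x HTTP /query endpoint
public static class HealthQuery
{
    // Identifiers are double-quoted, string literals single-quoted
    public static string Build(string host, string service) =>
        "SELECT last(\"value\") FROM \"health\" " +
        $"WHERE \"host\" = '{Escape(host)}' AND \"service\" = '{Escape(service)}'";

    // Escapes single quotes inside InfluxQL string literals
    private static string Escape(string text) => text.Replace("'", "\\'");

    // The db and q parameters are URL-encoded
    public static string Url(string influxBaseUrl, string database, string query) =>
        $"{influxBaseUrl}/query?db={Uri.EscapeDataString(database)}&q={Uri.EscapeDataString(query)}";
}
```

For host gpf1 and service OverAll, Build returns SELECT last("value") FROM "health" WHERE "host" = 'gpf1' AND "service" = 'OverAll'. Opening the URL from Url in a browser, or fetching it with HttpClient.GetStringAsync, returns InfluxDB’s JSON result for that query.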

Configuring singlestat metrics for health checks

Options are trickier. I inverted the coloring so that green is at the end of the scale. To get the graph filled in slices like on the screenshot above, I set the graph range from -1 to 2 and used 0.001 and 1.001 as thresholds.
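To see how these thresholds split the three status values into separate colors, here is a sketch of the color lookup as I understand Singlestat’s behavior: thresholds are checked from highest to lowest, the value gets the color of the first threshold it reaches (value >= threshold), and values below all thresholds get the first color. ThresholdColors is a hypothetical name; this is not Grafana’s source code.

```csharp
// Hypothetical sketch of threshold-based color selection
public static class ThresholdColors
{
    // Returns an index into the color list: 0 = below the first threshold
    public static int BucketFor(double value, double[] thresholds)
    {
        // Walk thresholds from the highest down and take the first one the value reaches
        for (var i = thresholds.Length; i > 0; i--)
        {
            if (value >= thresholds[i - 1])
            {
                return i;
            }
        }

        return 0;
    }
}
```

With thresholds 0.001 and 1.001, Unhealthy (0), Degraded (1) and Healthy (2) fall into buckets 0, 1 and 2, so each status gets its own color, and with the inverted order green lands on the Healthy bucket.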

Configuring singlestat options for health checks

Value mappings let us display other values instead of the ones that come in with the query. Remember that our health statuses come in as integer values, but on the panel we want to show status names instead of numbers.
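The mappings needed are 0 → Unhealthy, 1 → Degraded and 2 → Healthy. A tiny sketch can list them from a local copy of the HealthStatus values shown earlier, which is convenient when entering them on the Value mappings tab; HealthStatusMirror and ValueMappings are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Local copy of the HealthStatus values (hypothetical name, mirrors the framework enum)
public enum HealthStatusMirror { Unhealthy = 0, Degraded = 1, Healthy = 2 }

public static class ValueMappings
{
    // Lists "value -> text" pairs for Grafana's value mappings
    public static List<string> Build()
    {
        var result = new List<string>();

        // Enum.GetValues returns the values sorted, so the list runs 0, 1, 2
        foreach (HealthStatusMirror status in Enum.GetValues(typeof(HealthStatusMirror)))
        {
            result.Add($"{(int)status} -> {status}");
        }

        return result;
    }
}
```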

Configuring singlestat value mappings for health checks

After saving the Singlestat panels and the dashboard, click the Refresh icon to refresh the dashboard. The dashboard can also refresh itself automatically, so a dashboard like this can be shown on big screens on the wall.

Wrapping up

It’s not hard to get ASP.NET Core health check data into a Grafana dashboard. Instead of Telegraf, we built our own small data collector between the ASP.NET Core web application and InfluxDB. Building the health checks dashboard in Grafana was easy, and configuring a few settings made it look nice. Now we have a nice dashboard of health checks to show on the office wall.

Gunnar Peipman

Gunnar Peipman is an ASP.NET, Azure and SharePoint fan, Estonian Microsoft user group leader, blogger, conference speaker, teacher, and tech maniac. He has been a Microsoft MVP specializing in ASP.NET since 2008.
