Avoiding ping flood in ASP.NET Core health checks
One thing I left out of my post about ASP.NET Core health checks was the old legacy system we all know. It works, and nobody wants to touch it. Other systems must be very careful with it because it is easy to bring down with load. Of course, there’s no way for us to replace or fix the elder monster. Here’s how to make sure we don’t accidentally take it down with too frequent ping checks, that is, with a ping flood.
Quick jump to ASP.NET Core health checks
My previous health checks blog post introduced a configurable ping health check. When called, it pings the given address with a ping timeout, so that health check requests don’t pile up when the other end is sometimes too slow to answer.
using System.Net.NetworkInformation;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

public class PingHealthCheck : IHealthCheck
{
    private readonly string _host;
    private readonly int _timeout; // ping timeout in milliseconds

    public PingHealthCheck(string host, int timeout)
    {
        _host = host;
        _timeout = timeout;
    }

    public async Task<HealthCheckResult> CheckHealthAsync(HealthCheckContext context, CancellationToken cancellationToken = default)
    {
        try
        {
            using (var ping = new Ping())
            {
                var reply = await ping.SendPingAsync(_host, _timeout);

                if (reply.Status != IPStatus.Success)
                {
                    return HealthCheckResult.Unhealthy();
                }

                if (reply.RoundtripTime >= _timeout)
                {
                    return HealthCheckResult.Degraded();
                }

                return HealthCheckResult.Healthy();
            }
        }
        catch
        {
            // Any failure to ping (for example an unresolvable host) counts as unhealthy.
            return HealthCheckResult.Unhealthy();
        }
    }
}
Here is how I added ping checks to the application Startup class.
public void ConfigureServices(IServiceCollection services)
{
    // ...
    services.AddHealthChecks()
            .AddCheck("ping1", new PingHealthCheck("www.google.com", 100))
            .AddCheck("ping2", new PingHealthCheck("www.bing.com", 100));
    // ...
}
In the Configure method I told ASP.NET Core to use health checks with the /hc endpoint.
public void Configure(IApplicationBuilder app, IWebHostEnvironment env)
{
    // ...
    app.UseHealthChecks("/hc");
    // ...
}
To try out a health check with failed status, change the host name in one of the ping checks to something that doesn’t exist. If one of the checks fails, the whole health check is considered failed. The same goes for degraded status.
Avoiding ping flood by caching ping check results
Let’s come back to our old system that cannot handle too much load and that must be handled with care. We have to make sure that our elder monster is not pinged too often. At the same time, we may have other checks that can run more frequently, and we don’t want to stop effective monitoring just because of this one legacy system. What can we do?
We can use a local cache in the ping health check. If the cache has not expired, we serve the last status from the cache when health checks are requested. For this I added private fields for the last ping time, the last ping result and the ping interval.
public class PingHealthCheck : IHealthCheck
{
    private readonly string _host;
    private readonly int _timeout;      // ping timeout in milliseconds
    private readonly int _pingInterval; // cache duration in seconds, 0 turns caching off
    private DateTime _lastPingTime = DateTime.MinValue;
    private HealthCheckResult _lastPingResult = HealthCheckResult.Healthy();

    public PingHealthCheck(string host, int timeout, int pingInterval = 0)
    {
        _host = host;
        _timeout = timeout;
        _pingInterval = pingInterval;
    }

    public async Task<HealthCheckResult> CheckHealthAsync(HealthCheckContext context, CancellationToken cancellationToken = default)
    {
        // Serve the cached result while the ping interval has not passed yet.
        if (_pingInterval != 0 && _lastPingTime.AddSeconds(_pingInterval) > DateTime.Now)
        {
            return _lastPingResult;
        }

        try
        {
            using (var ping = new Ping())
            {
                _lastPingTime = DateTime.Now;
                var reply = await ping.SendPingAsync(_host, _timeout);

                if (reply.Status != IPStatus.Success)
                {
                    _lastPingResult = HealthCheckResult.Unhealthy();
                }
                else if (reply.RoundtripTime >= _timeout)
                {
                    _lastPingResult = HealthCheckResult.Degraded();
                }
                else
                {
                    _lastPingResult = HealthCheckResult.Healthy();
                }
            }
        }
        catch
        {
            _lastPingResult = HealthCheckResult.Unhealthy();
        }

        return _lastPingResult;
    }
}
Now we can configure the ping check for the legacy monster to use the cache for a given number of seconds. We specify the cache duration in the ConfigureServices() method of the Startup class.
public void ConfigureServices(IServiceCollection services)
{
    // ...
    services.AddHealthChecks()
            .AddCheck("ping1", new PingHealthCheck("www.google.com", 100))
            .AddCheck("ping2", new PingHealthCheck("www.bing.com", 100, 30));
    // ...
}
For simple cases where only one client asks for health status, this is enough.
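The expiry rule used by the cached check can be tried out without pinging anything. The following is a minimal sketch and not part of the original class: the PingCachePolicy name and its IsExpired method are hypothetical, and the current time is passed in as a parameter so the rule can be exercised with fixed timestamps.

```csharp
using System;

// Hypothetical helper, not part of the original PingHealthCheck.
// It only reproduces the expiry rule so it can be checked in isolation.
public static class PingCachePolicy
{
    // Returns true when a new ping should be sent.
    // An interval of 0 turns caching off, as in the health check above.
    public static bool IsExpired(DateTime lastPingTime, int pingIntervalSeconds, DateTime now)
    {
        if (pingIntervalSeconds == 0)
        {
            return true;
        }

        return lastPingTime.AddSeconds(pingIntervalSeconds) <= now;
    }
}
```

With a 30 second interval, a request that arrives 10 seconds after the last ping gets the cached result, while a request that arrives 30 seconds or more after it triggers a new ping. A check that has never pinged (last ping time of DateTime.MinValue) always counts as expired.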
Supporting multiple health check clients
We are not quite ready to support multiple clients that check our system health. We have caching, but when the cache has expired it may happen that two health check requests come in at almost the same time, and then multiple ping checks run at the same time. We have to let through only one request that updates our cache. Meanwhile, the other requests must not start pings of their own.
It’s not a perfect solution, but I will go here with something I call sandwich caching. It comes with the price of locking, but for our scenario that’s okay. Remember how we worked with the ASP.NET Web Forms and MVC cache in a one-box scenario? Exactly the same thing comes into use here. The trick is simple: we check if the cache is valid. If it is, we just return the cached result. If the cache is outdated, we use a lock. The first request gets the lock and updates the cache. Any request that gets the lock after that must check cache validity again, because otherwise it would update the cache once more.
public class PingHealthCheck : IHealthCheck
{
    private readonly string _host;
    private readonly int _timeout;      // ping timeout in milliseconds
    private readonly int _pingInterval; // cache duration in seconds, 0 turns caching off
    private DateTime _lastPingTime = DateTime.MinValue;
    private HealthCheckResult _lastPingResult = HealthCheckResult.Healthy();

    // Static, so this lock is shared by all PingHealthCheck instances.
    private static readonly object _locker = new object();

    public PingHealthCheck(string host, int timeout, int pingInterval = 0)
    {
        _host = host;
        _timeout = timeout;
        _pingInterval = pingInterval;
    }

    private bool IsCacheExpired()
    {
        return (_pingInterval == 0 || _lastPingTime.AddSeconds(_pingInterval) <= DateTime.Now);
    }

    public async Task<HealthCheckResult> CheckHealthAsync(HealthCheckContext context, CancellationToken cancellationToken = default)
    {
        if (!IsCacheExpired())
        {
            return await Task.FromResult(_lastPingResult);
        }

        // Does not block: returns false right away if another request holds the lock.
        if (Monitor.TryEnter(_locker))
        {
            try
            {
                // Check again, another request may have refreshed the cache already.
                if (IsCacheExpired())
                {
                    PingService();
                }
            }
            finally
            {
                Monitor.Exit(_locker);
            }
        }

        return await Task.FromResult(_lastPingResult);
    }

    private void PingService()
    {
        try
        {
            using (var ping = new Ping())
            {
                _lastPingTime = DateTime.Now;
                var reply = ping.Send(_host, _timeout);

                if (reply.Status != IPStatus.Success)
                {
                    _lastPingResult = HealthCheckResult.Unhealthy();
                }
                else if (reply.RoundtripTime >= _timeout)
                {
                    _lastPingResult = HealthCheckResult.Degraded();
                }
                else
                {
                    _lastPingResult = HealthCheckResult.Healthy();
                }
            }
        }
        catch
        {
            _lastPingResult = HealthCheckResult.Unhealthy();
        }
    }
}
This is the fastest we can do. If we don’t give a timeout to Monitor.TryEnter(), the method returns immediately, whether it got the lock or not. A request that doesn’t get the lock simply returns the last cached result. We have an additional try-finally block inside the Monitor.TryEnter() check. The Monitor.TryEnter method documentation recommends it to make sure that acquired locks are always released.
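The non-blocking behavior of Monitor.TryEnter() is easy to see in isolation. Below is a small sketch with a hypothetical TryEnterDemo class: the calling thread holds a lock while a second thread tries to take it without a timeout. Two threads are needed because Monitor locks are reentrant, so the same thread would always succeed.

```csharp
using System;
using System.Threading;

// Hypothetical demo class, not part of the health check.
public static class TryEnterDemo
{
    // Returns whether a second thread could take the lock
    // while the calling thread was still holding it.
    public static bool SecondThreadGetsLock()
    {
        var locker = new object();
        var acquired = false;

        lock (locker)
        {
            var other = new Thread(() =>
            {
                // No timeout given: returns immediately with true or false.
                if (Monitor.TryEnter(locker))
                {
                    try
                    {
                        acquired = true;
                    }
                    finally
                    {
                        Monitor.Exit(locker);
                    }
                }
            });

            other.Start();
            other.Join(); // finishes right away because TryEnter does not wait
        }

        return acquired;
    }
}
```

SecondThreadGetsLock() returns false: the second thread is turned away instead of queuing up behind the lock, which is what keeps extra health check requests from waiting on a slow ping.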
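Why not an async call to ping? A Monitor lock is owned by the thread that acquired it, and after an await the rest of the method may continue on a different thread, which cannot release that lock. This is also why C# does not allow await inside a lock statement. If we want Monitor locks to work, we cannot use async inside them, so the ping is sent with the synchronous Ping.Send() method. You can find out more from the Monitor.TryEnter documentation.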
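If an awaited ping is preferred, one possible alternative is to replace the Monitor with a SemaphoreSlim, which is not tied to a thread and can be released after an await. The sketch below shows only the gating idea in a generic form. The AsyncResultCache class and its members are hypothetical names, and the refresh delegate stands in for the actual ping call.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: caches the result of an async operation and
// lets at most one caller refresh it at a time.
public class AsyncResultCache<T>
{
    // Value and timestamp are swapped together as one reference.
    private sealed class Entry
    {
        public readonly T Value;
        public readonly DateTime Time;
        public Entry(T value, DateTime time) { Value = value; Time = time; }
    }

    private readonly Func<Task<T>> _refresh;
    private readonly TimeSpan _interval;
    private readonly SemaphoreSlim _gate = new SemaphoreSlim(1, 1);
    private volatile Entry _entry;

    public AsyncResultCache(Func<Task<T>> refresh, TimeSpan interval, T initialValue)
    {
        _refresh = refresh;
        _interval = interval;
        _entry = new Entry(initialValue, DateTime.MinValue);
    }

    private bool IsExpired(Entry entry)
    {
        return _interval == TimeSpan.Zero || entry.Time + _interval <= DateTime.UtcNow;
    }

    public async Task<T> GetAsync()
    {
        var entry = _entry;
        if (!IsExpired(entry))
        {
            return entry.Value;
        }

        // WaitAsync(0) does not wait: it yields false right away
        // when another caller is already refreshing.
        if (await _gate.WaitAsync(0))
        {
            try
            {
                // Check again, the cache may have been refreshed meanwhile.
                if (IsExpired(_entry))
                {
                    var value = await _refresh();
                    _entry = new Entry(value, DateTime.UtcNow);
                }
            }
            finally
            {
                _gate.Release();
            }
        }

        return _entry.Value;
    }
}
```

In a health check, the refresh delegate would await ping.SendPingAsync() and map the reply to a HealthCheckResult, and CheckHealthAsync() would return the value from GetAsync(). Unlike the class above, this sketch records the refresh time after the operation completes, and it leaves exception handling to the delegate.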
Wrapping up
Keeping health checks fast and reliable isn’t always a simple thing to do. When checking external dependencies, we have to consider the performance and load characteristics of the external system, but we cannot forget our own system. Both must stay healthy and neither can take heavy hits. The ping check against an easy-to-destabilize legacy system was a good example. We applied internal caching with locking to make sure that clients cannot bomb down the legacy system through ping health checks.
Hi Gunnar,
I think you can await the call to ping if you switch to a SemaphoreSlim. Could this be an option?
Shouldn’t _locker be static?
Yes, of course. Thanks for pointing out. Made a fix to code.
I think the timeout parameter is not useful the way it is used for signaling Degradation, since the ping status will never be successful if RoundtripTime > timeout. The default timeout is 5 seconds, so it would make more sense to also cap to 5 here (or some other hard limit), or introduce another parameter something like “maxHealthyResponseTime” that can be used to recognize “degradation”.
Hi, thanks for the article :)
But as already mentioned by Luis Barbosa you can use SemaphoreSlim and .WaitAsync to make your checks async and still guarantee at most one execution at a time.
Thanks for the article :)
I have noticed that the health check constructor is always called when I ‘refresh’ the health check endpoint.
Meaning we have a new instance for every health check (is that assumption true?).
Therefore, the IsCacheExpired check will return ‘true’ in that case, and the result is recalculated with the ‘heavy-ping-logic’.
We can prevent this with a singleton cached result model that is injected into the health check constructor and gets updated with our latest results.
What do you think?