What is an Application Health Check?

A Health Check measures the health status of an application.

A check can be seen as a ping command: a healthy response means that the application can connect to the service and that the service is operational. A failing check won’t tell you what is wrong, but it can quickly point you in the right direction! [1]

The missing piece

Prometheus is a powerful monitoring system that gives you insight into your application and comes with several metric types.

Because of its simplicity (but also beauty), it doesn’t come with a Health Check system. The only information available is the automatically generated time series up, a gauge that monitors a job’s instance and can have two values:

  • 1 if the instance is healthy, i.e. reachable
  • or 0 if the scrape failed

What if…

…we want to monitor the health status of a service on which our Java application depends?

With the Prometheus JVM Client we could build a probe using a gauge and programmatically set its value to 1 or 0 depending on whether the system responds. This solution comes with a defect: the “probe” action runs if and only if an event triggers the gauge metric collection.
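To make the defect concrete, here is a minimal, dependency-free sketch of the manual approach. ManualProbe and its AtomicInteger field are hypothetical stand-ins for a hand-set gauge, not the Prometheus JVM Client API. The point is only that the stored value changes when application code calls probe(), so a scrape may read a stale status.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

/** Hypothetical stand-in for a gauge that is set by hand (not the Prometheus client API). */
final class ManualProbe {
  // Last recorded status: 1 = healthy, 0 = unhealthy. Stays 0 until the first probe.
  private final AtomicInteger gauge = new AtomicInteger(0);
  private final BooleanSupplier dependency;

  ManualProbe(BooleanSupplier dependency) {
    this.dependency = dependency;
  }

  /** Runs the check and stores the result; nothing calls this automatically. */
  void probe() {
    boolean up;
    try {
      up = dependency.getAsBoolean();
    } catch (RuntimeException e) {
      up = false; // a throwing check counts as unhealthy
    }
    gauge.set(up ? 1 : 0);
  }

  /** What a scrape would read: the last stored value, possibly stale. */
  int scrape() {
    return gauge.get();
  }
}
```

If the dependency goes down after the last probe() call, scrape() keeps returning 1 until some event calls probe() again.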

But what if we could not only abstract this mechanism but also automate (schedule?!) the collection of health check metrics?

…Custom Collector

This task requires a “Custom Collector”.

By implementing the io.prometheus.client.Collector abstract class, I’ve written the HealthChecksCollector. Its duty is to keep a reference to every Health Check object and report the results to the CollectorRegistry using a gauge metric.

Every Health Check object must extend the HealthCheck abstract class. Its HealthStatus check() method must perform the health check logic and return the result using the HealthStatus enum (it can also throw an exception, which results in a failed health check).

You can find more details in the API documentation.
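The mechanism can be modeled in a few lines of plain Java. This is a simplified sketch under stated assumptions, not the library's code: SketchCollector is a hypothetical name, checks are modeled as Callable<Boolean> instead of the HealthCheck class, and it assumes every registered check runs each time collect() is called.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;

/** Simplified, hypothetical model of a health check collector (not the library's classes). */
final class SketchCollector {
  // Registered checks by name; insertion order is kept so the output is stable.
  private final Map<String, Callable<Boolean>> checks = new LinkedHashMap<>();

  /** Registers a named check and returns this collector so calls can be chained. */
  SketchCollector addHealthCheck(String name, Callable<Boolean> check) {
    checks.put(name, check);
    return this;
  }

  /** Runs every check at collection time and maps each result to a gauge value. */
  Map<String, Double> collect() {
    Map<String, Double> samples = new LinkedHashMap<>();
    for (Map.Entry<String, Callable<Boolean>> entry : checks.entrySet()) {
      double value;
      try {
        value = Boolean.TRUE.equals(entry.getValue().call()) ? 1.0 : 0.0;
      } catch (Exception e) {
        value = 0.0; // an exception is reported as a failed health check
      }
      samples.put(entry.getKey(), value);
    }
    return samples;
  }
}
```

Because the checks run inside collect(), the reported status is refreshed on every collection rather than only when application code remembers to update a gauge.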

Here is a simple example:

class DbHealthCheck extends HealthCheck {
  @Override
  public HealthStatus check() throws Exception {
    return checkDbConnection() ? HealthStatus.HEALTHY : HealthStatus.UNHEALTHY;
  }
  // Placeholder body: replace with a real connectivity probe.
  boolean checkDbConnection() {
    return true;
  }
}

HealthChecksCollector healthchecksMetrics =
    HealthChecksCollector.newInstance().register();

DbHealthCheck dbHealthCheck = new DbHealthCheck();
FsHealthCheck fsHealthCheck = // ...

healthchecksMetrics.addHealthCheck("database", dbHealthCheck)
    .addHealthCheck("filesystem", fsHealthCheck);

Resulting in these samples:

# HELP health_check_status Health check status results
# TYPE health_check_status gauge
health_check_status{system="database",} 1.0
health_check_status{system="filesystem",} 0.0
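To show how a check result maps to one of these sample lines, here is a small sketch that renders a map of results in the same text layout. SampleRenderer is a hypothetical helper written for illustration; it is not part of the library or of the Prometheus client, and it assumes check names need no label escaping.

```java
import java.util.Map;

/** Hypothetical helper that renders health check results like the samples above. */
final class SampleRenderer {
  static String render(Map<String, Double> results) {
    StringBuilder out = new StringBuilder();
    out.append("# HELP health_check_status Health check status results\n");
    out.append("# TYPE health_check_status gauge\n");
    for (Map.Entry<String, Double> e : results.entrySet()) {
      // Assumes names contain no quotes, backslashes, or newlines (no escaping is done).
      out.append("health_check_status{system=\"")
          .append(e.getKey())
          .append("\",} ")
          .append(e.getValue())
          .append('\n');
    }
    return out.toString();
  }
}
```

Each registered check becomes one sample of the same gauge, distinguished only by its system label.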

Where can I find it?

Strengthened Projects is a GitHub organization where I’m going to put all my open-source software.

You can find more details about the project and other examples on its site. The artifacts are available in the Maven Central repository using this dependency:

<dependency>
  <groupId>com.github.strengthened</groupId>
  <artifactId>prometheus-healthchecks</artifactId>
  <version>LATEST</version>
</dependency>

Let’s keep in touch

What do you think about this project?

Do you have any questions, suggestions, criticism, or bug reports? Feel free to submit an issue or contact me through one of my social network accounts. I’ll be happy to hear your feedback!

  1. This is why health checks are recommended for simple up/down checks and not for variable or metric-related checks.