HTTPS Certificate Expiry Monitoring Script

1. Introduction

TLS certificates are a fundamental security layer for any HTTPS-enabled service. When a certificate expires—or worse, is revoked—clients will fail to connect, user-facing applications will break, and integrations may stop working unexpectedly. Monitoring certificates proactively helps avoid outages by alerting operations teams before a certificate becomes a problem.

This custom command script for NetBeez agents automatically checks multiple HTTPS endpoints, validates their certificates, determines time remaining before expiration, and detects critical states including:

  • Expiring soon (within a configurable number of days)
  • Expired (already invalid)
  • Revoked (detected via the server’s stapled OCSP response, using OpenSSL)
  • Errors (network failures, TLS errors, parsing issues)

When any endpoint is in a bad state, the script exits with status 1 and prints a readable ERROR: summary listing the affected hosts. Numeric metrics are also printed for ingestion into NetBeez dashboards.
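
The summary line can be reproduced with a small helper. This is a minimal sketch: build_error_line is a hypothetical name, but the sentence labels and their order (expired, expiring soon, revoked, error) are taken from the script’s main(). The exit code follows the script’s rule: 1 when a summary is printed, 0 otherwise.

```python
def build_error_line(expired, expiring, revoked, errors):
    """Return the 'ERROR: ...' summary, or None when no host has a problem."""
    groups = [
        ("These hosts have expired certificates: ", expired),
        ("These hosts have certificates that will expire soon: ", expiring),
        ("These hosts have revoked certificates: ", revoked),
        ("These hosts responded with an error: ", errors),
    ]
    # Only non-empty groups contribute a sentence; sentences are joined with ". "
    parts = [label + ", ".join(hosts) for label, hosts in groups if hosts]
    return "ERROR: " + ". ".join(parts) if parts else None


line = build_error_line(
    expired=["expired-rsa-dv.ssl.com"],
    expiring=[],
    revoked=["revoked-rsa-dv.ssl.com"],
    errors=[],
)
# Same rule as the script: exit 1 whenever a summary is printed
exit_code = 1 if line else 0
print(line)
print("exit code:", exit_code)
```

With one expired and one revoked host, this produces the same ERROR: line shown in the example output section.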

This makes it easy to continuously monitor certificate status across internal services, public APIs, VIPs, and load balancers—especially in distributed or zero-trust environments where agents run across multiple network segments.

2. Use Case

Modern networks rely heavily on HTTPS for application traffic, APIs, microservices, and administrative interfaces. Certificates must be renewed before they expire and must be validated regularly to ensure they haven’t been revoked.

Common scenarios where this script is valuable:

  • Monitoring certificates on internal services (e.g., Jira, Confluence, Kubernetes dashboards)
  • Catching renewals that failed or did not propagate to every node behind a load balancer
  • Detecting revoked certificates that would break production services
  • Auditing certificate lifecycles for compliance-sensitive environments
  • Proactively alerting before customer‑facing applications go down

By running this script at regular intervals from NetBeez agents, engineering and SRE teams get continuous, distributed verification of certificate validity.

3. The Script

How the Script Works (Summary)

This script processes each HTTPS endpoint in four steps:

  1. Connects to the host using TLS and retrieves the certificate.
  2. Parses the certificate’s expiration date to calculate how many days remain.
  3. Uses OpenSSL to request the server’s stapled OCSP response and detect whether the certificate has been revoked.
  4. Classifies each host as ok, expiring soon, expired, revoked, or error.
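
Steps 2 and 4 can be sketched as two small pure functions, which makes the classification easy to test in isolation. This is a minimal sketch, not the script’s exact code: it swaps the script’s strptime call for the standard-library helper ssl.cert_time_to_seconds, and the names days_until_expiry and classify are illustrative. The precedence matches check_cert(): revocation first, then the verification error text, then the day count.

```python
import ssl
from datetime import datetime, timezone

EXPIRY_THRESHOLD_DAYS = 7  # same default as the script


def days_until_expiry(not_after, now=None):
    """Whole days from `now` until a notAfter string like "Jun  1 12:00:00 2030 GMT".

    The result is negative once the certificate has expired.
    """
    expires = ssl.cert_time_to_seconds(not_after)  # epoch seconds, parsed as UTC
    if now is None:
        now = datetime.now(timezone.utc)
    # Floor division rounds down, so a certificate that expired an hour ago gives -1
    return int((expires - now.timestamp()) // 86400)


def classify(days, revoked=False, verify_error=None, threshold=EXPIRY_THRESHOLD_DAYS):
    """Map what is known about one certificate to ok/expiring/expired/revoked/error."""
    if revoked:
        return "revoked"
    reason = (verify_error or "").lower()
    if "revoked" in reason:
        return "revoked"
    if "expired" in reason:
        return "expired"
    if days is None:
        return "error"  # the expiry date could not be read
    if days < 0:
        return "expired"
    if days <= threshold:
        return "expiring"
    return "ok"


now = datetime(2030, 5, 1, 12, 0, tzinfo=timezone.utc)
for not_after in ("Jun  1 12:00:00 2030 GMT", "May  5 12:00:00 2030 GMT", "May  1 11:00:00 2030 GMT"):
    d = days_until_expiry(not_after, now=now)
    print(not_after, d, classify(d))
```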

For every host, it prints a key=value metric (the number of days until certificate expiration) and then computes summary totals. If any host is in a bad state, it prints a final ERROR: line identifying the affected endpoints.
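
The per-host metric line can be sketched as follows. It mirrors the script’s sanitize_key() and sentinel values; the metric_line name and the SENTINELS table are illustrative, and the numeric suffix the script appends to disambiguate duplicate keys is omitted.

```python
# Sentinels the script prints when the day count is unknown
SENTINELS = {"expired": -2, "revoked": -3}  # any other unknown state -> -1


def sanitize_key(s):
    # Keep only letters and digits, as the script does
    return "".join(ch for ch in s if ch.isalnum()) or "host"


def metric_line(host, port, status, days):
    key = sanitize_key(f"{host}{port}days")
    value = days if days is not None else SENTINELS.get(status, -1)
    return f"{key}={value}"


print(metric_line("jira.netbeez.net", 8443, "ok", 48))
print(metric_line("expired-rsa-dv.ssl.com", 443, "expired", None))
print(metric_line("unreachable.example", 443, "error", None))
```

One consequence of this encoding: a real day count can also be negative, so a value of -2 means either "expired, days unknown" or "expired two days ago". The summary counters (totalexpired, totalrevoked, and so on) are the unambiguous signal.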

Below is the Python script that checks multiple HTTPS endpoints and outputs standardized metrics.

It provides:

  • Days until expiry (or a negative sentinel value when the date is unavailable: -2 for expired, -3 for revoked, -1 for other errors)
  • Summary counts: totalchecked, totalexpiring, totalexpired, totalrevoked, totalerror
  • A human-readable ERROR: message when any issue is detected

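Revocation detection is best-effort: the script asks OpenSSL (openssl s_client -status) for the server’s stapled OCSP response, so a revoked certificate is detected only if the server staples OCSP.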

#!/usr/bin/env python3
# NetBeez Custom Command: HTTPS certificate expiry check for multiple URLs
# - Prints numeric metrics as key=value (one per line)
# - If any host is expiring/expired/revoked or errors, prints a final ERROR message
# - Exit 1 on expiry/error; Exit 0 otherwise

import socket
import ssl
import sys
import subprocess
from urllib.parse import urlparse
from datetime import datetime, timezone

# === Define your HTTPS URLs here ===
URLS = [
    "https://jira.netbeez.net:8443",
    "https://ims.netbeez.net",
    "https://www.netbeez.net",
    "https://example.com",
    "test-ev-rsa.ssl.com",
    "expired-rsa-dv.ssl.com",
    "revoked-rsa-dv.ssl.com",
]

TIMEOUT_SECS = 8
EXPIRY_THRESHOLD_DAYS = 7

def sanitize_key(s):
    return "".join(ch for ch in s if ch.isalnum()) or "host"

def host_port_from_url(u):
    # Support both full URLs and plain host[:port]
    if "://" not in u:
        host, _, port_str = u.partition(":")
        host = host.strip()
        port = int(port_str) if port_str else 443
        return host, port
    p = urlparse(u)
    host = p.hostname
    port = p.port or 443
    return host, port

def is_cert_revoked_openssl(host, port):
    """
    Best-effort revocation check using OpenSSL OCSP stapling:
    returns True if output suggests the cert is revoked, False otherwise.
    """
    try:
        cmd = [
            "openssl",
            "s_client",
            "-connect",
            f"{host}:{port}",
            "-servername",
            host,
            "-status",
            "-verify_return_error",
        ]
        proc = subprocess.run(
            cmd,
            input=b"Q",
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=TIMEOUT_SECS,
        )
        out = proc.stdout.decode(errors="ignore").lower()
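        # A stapled OCSP response contains a line such as "Cert Status: revoked".
        # Match only that line: a bare "revoked" substring would also match
        # hostnames such as revoked-rsa-dv.ssl.com elsewhere in the output.
        if "cert status: revoked" in out:
            return True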
    except Exception:
        # On any failure, just say "not detected as revoked"
        pass
    return False

def check_cert(host, port):
    """
    Return dict with:
      status: ok | expiring | expired | revoked | error
      days:   int or None (days until expiry; negative if already expired)
    """
    verify_error_reason = None
    cert = None

    try:
        ctx = ssl.create_default_context()
        with socket.create_connection((host, port), timeout=TIMEOUT_SECS) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as ssock:
                cert = ssock.getpeercert()
    except ssl.SSLCertVerificationError as e:
        verify_error_reason = str(e).lower()
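        # Try again without verification to at least read the expiry date.
        # With verification disabled, getpeercert() returns an empty dict, so
        # fetch the raw DER certificate and let openssl decode notAfter instead.
        try:
            unverified_ctx = ssl._create_unverified_context()
            with socket.create_connection((host, port), timeout=TIMEOUT_SECS) as sock:
                with unverified_ctx.wrap_socket(sock, server_hostname=host) as ssock:
                    der = ssock.getpeercert(binary_form=True)
            proc = subprocess.run(
                ["openssl", "x509", "-noout", "-enddate"],
                input=ssl.DER_cert_to_PEM_cert(der).encode(),
                stdout=subprocess.PIPE,
                stderr=subprocess.DEVNULL,
                timeout=TIMEOUT_SECS,
            )
            # openssl prints a single line of the form "notAfter=<date> GMT"
            line = proc.stdout.decode(errors="ignore").strip()
            if line.startswith("notAfter="):
                cert = {"notAfter": line[len("notAfter="):]}
        except Exception:
            cert = None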
    except Exception:
        return {"status": "error", "days": None}

    days = None
    if cert:
        not_after = cert.get("notAfter")
        if not not_after:
            return {"status": "error", "days": None}
        try:
            expires = datetime.strptime(
                not_after, "%b %d %H:%M:%S %Y %Z"
            ).replace(tzinfo=timezone.utc)
            now = datetime.now(timezone.utc)
            days = int((expires - now).total_seconds() // 86400)
        except Exception:
            return {"status": "error", "days": None}

    # First, try to detect revocation explicitly via OpenSSL
    revoked = is_cert_revoked_openssl(host, port)

    if revoked:
        status = "revoked"
        return {"status": status, "days": days}

    # Then classify based on verification error / days
    if verify_error_reason:
        if "revoked" in verify_error_reason:
            status = "revoked"
        elif "expired" in verify_error_reason:
            status = "expired"
        else:
            if days is not None and days < 0:
                status = "expired"
            elif days is not None and days <= EXPIRY_THRESHOLD_DAYS:
                status = "expiring"
            elif days is not None:
                status = "ok"
            else:
                status = "error"
    else:
        if days is None:
            status = "error"
        elif days < 0:
            status = "expired"
        elif days <= EXPIRY_THRESHOLD_DAYS:
            status = "expiring"
        else:
            status = "ok"

    return {"status": status, "days": days}

def main():
    seen_keys = set()

    expiring_hosts = []
    expired_hosts = []
    revoked_hosts = []
    error_hosts = []

    totalchecked = 0
    totalexpiring = 0
    totalexpired = 0
    totalrevoked = 0
    totalerror = 0

    for url in URLS:
        host, port = host_port_from_url(url)
        key_base = sanitize_key(f"{host}{port}days")
        key = key_base
        i = 2
        while key in seen_keys:
            key = f"{key_base}{i}"
            i += 1
        seen_keys.add(key)

        result = check_cert(host, port)
        status = result["status"]
        days = result["days"]

        totalchecked += 1

        # Per-host numeric metric:
        #   days until expiry when known
        #   -2 for expired when days is unknown
        #   -3 for revoked when days is unknown
        #   -1 for generic error / unknown
        if days is not None:
            print(f"{key}={days}")
        else:
            if status == "expired":
                print(f"{key}={-2}")
            elif status == "revoked":
                print(f"{key}={-3}")
            else:
                print(f"{key}={-1}")

        if status == "expiring":
            totalexpiring += 1
            expiring_hosts.append(host)
        elif status == "expired":
            totalexpired += 1
            expired_hosts.append(host)
        elif status == "revoked":
            totalrevoked += 1
            revoked_hosts.append(host)
        elif status == "error":
            totalerror += 1
            error_hosts.append(host)

    # Summary metrics
    print(f"totalchecked={totalchecked}")
    print(f"totalexpiring={totalexpiring}")
    print(f"totalexpired={totalexpired}")
    print(f"totalrevoked={totalrevoked}")
    print(f"totalerror={totalerror}")

    # Build ERROR message if needed
    if totalexpiring > 0 or totalexpired > 0 or totalrevoked > 0 or totalerror > 0:
        parts = []

        if expired_hosts:
            parts.append(
                "These hosts have expired certificates: " +
                ", ".join(expired_hosts)
            )

        if expiring_hosts:
            parts.append(
                "These hosts have certificates that will expire soon: " +
                ", ".join(expiring_hosts)
            )

        if revoked_hosts:
            parts.append(
                "These hosts have revoked certificates: " +
                ", ".join(revoked_hosts)
            )

        if error_hosts:
            parts.append(
                "These hosts responded with an error: " +
                ", ".join(error_hosts)
            )

        print("ERROR: " + ". ".join(parts))
        sys.exit(1)

    sys.exit(0)

if __name__ == "__main__":
    main()
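
The revocation check in is_cert_revoked_openssl() only has something to read when the server staples an OCSP response; openssl s_client -status then prints a "Cert Status:" line. A parser that reads only that line, so that a hostname such as revoked-rsa-dv.ssl.com appearing elsewhere in the output cannot trigger a false positive, could look like the sketch below. The function name is hypothetical, and the sample strings are trimmed, illustrative excerpts rather than captured openssl output.

```python
import re

# Anchored to the "Cert Status:" line of a stapled OCSP response
CERT_STATUS_RE = re.compile(r"^\s*Cert Status:\s*(\w+)", re.MULTILINE | re.IGNORECASE)


def stapled_ocsp_status(s_client_output):
    """Return 'good', 'revoked', 'unknown', or None when no status line is present."""
    match = CERT_STATUS_RE.search(s_client_output)
    return match.group(1).lower() if match else None


# Trimmed, illustrative excerpts (not complete openssl output)
stapled = "OCSP Response Data:\n    Cert Status: revoked\n"
not_stapled = "OCSP response: no response sent\nsubject=CN = revoked-rsa-dv.ssl.com\n"

print(stapled_ocsp_status(stapled))
print(stapled_ocsp_status(not_stapled))
```

A None result means the revocation status is simply unknown for that host, which is why the check is best-effort.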

4. Example Output

A typical execution produces output like:

jiranetbeeznet8443days=48
imsnetbeeznet443days=31
wwwnetbeeznet443days=73
examplecom443days=60
testevrsasslcom443days=230
expiredrsadvsslcom443days=-2
revokedrsadvsslcom443days=-3
totalchecked=7
totalexpiring=0
totalexpired=1
totalrevoked=1
totalerror=0
ERROR: These hosts have expired certificates: expired-rsa-dv.ssl.com. These hosts have revoked certificates: revoked-rsa-dv.ssl.com

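Each metric line follows the NetBeez parsing requirement of key=value with numeric values. The final ERROR: line is a human-readable summary and appears only when at least one host has a problem.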
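
Before adding the script to NetBeez, it can help to check a manual run’s output locally. The sketch below splits the output into integer metrics and the optional ERROR: summary and rejects anything else. It assumes, based on what this script prints, that an accepted metric line is an alphanumeric key, "=", and an integer; the exact rules NetBeez applies are not stated here, and parse_output is a hypothetical helper.

```python
import re

# Assumed shape of a metric line: alphanumeric key, "=", optional minus sign, digits
METRIC_RE = re.compile(r"^([A-Za-z0-9]+)=(-?\d+)$")


def parse_output(text):
    """Return (metrics dict, error summary or None); raise on any malformed line."""
    metrics, error = {}, None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("ERROR: "):
            error = line[len("ERROR: "):]
            continue
        match = METRIC_RE.match(line)
        if match is None:
            raise ValueError(f"not a key=integer line: {line!r}")
        metrics[match.group(1)] = int(match.group(2))
    return metrics, error


sample = """expiredrsadvsslcom443days=-2
totalchecked=7
totalexpired=1
ERROR: These hosts have expired certificates: expired-rsa-dv.ssl.com"""

metrics, error = parse_output(sample)
print(metrics)
print(error)
```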

5. How to Use

Recommended schedule: Running this script once per day is sufficient for most environments, since certificate expiration is a slow-moving process. Daily checks provide ample time to renew certificates and react to revocation events.

To ensure visibility and timely notifications, users should attach an Alert Profile to the test. This will automatically notify the team whenever a certificate is expiring soon, already expired, revoked, or if the script encounters errors.

  1. Edit the URLS list in the script to include the HTTPS endpoints you want to monitor, then copy the script to a NetBeez agent (for example, to /tmp/). To paste it through the agent’s interactive console, you can use a heredoc:

    cat > /tmp/cert_check.py << 'EOF'
    # Paste the full script here
    EOF
    
  2. Make it executable:

    chmod +x /tmp/cert_check.py
    
  3. Run it manually to validate your settings and check the output:

    /tmp/cert_check.py
    
  4. Add it as a Custom Command in the NetBeez dashboard.

  5. Schedule it to run periodically (e.g., once per day, as recommended above, or more often if you want faster detection of revocations).

  6. Alerts are automatically triggered based on the errors reported by this test once an Alert Profile is attached (for example, when any of totalexpired, totalrevoked, totalexpiring, or totalerror is non-zero).

This ensures NetBeez notifies your team before certificates become service-impacting.

6. Closing Remarks

Certificate-related outages are largely preventable with proper monitoring. This script, combined with NetBeez’s distributed agent architecture, provides continuous visibility into certificate health across multiple networks and data centers.

By integrating certificate checks into your monitoring pipeline, you reduce operational risk, prevent downtime, and maintain strong security hygiene across your infrastructure.

Let me know what you think in the comments! Any adjustments we could make?
