Introduction
As your network evolves and your needs change, one of the most common questions we are asked is: what should we monitor? This overview offers some insight, explains the thought process involved, and provides high-level examples of targets and tests.
Targets vs Tests (Terminology)
When we refer to a target, we mean every resource and test configured under it. A target (for example, one named Google) contains one or more resources (e.g. mail.google.com and/or 8.8.8.8), and each resource can contain one or more tests (PING, DNS, HTTP, Traceroute & Path Analysis, abbreviated PA). So what you see here is 1 target, 4 resources (mail.google.com, 8.8.8.8, 8.8.4.4 & www.google.com) and 11 tests (4+2+2+3).
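The hierarchy can be pictured as nested data. The following is a minimal Python sketch for illustration only; it is not NetBeez's data model or API, and the assignment of tests to each resource is an assumption chosen simply to reproduce the 4+2+2+3 count.

```python
# Hypothetical illustration of the target > resource > test hierarchy.
# Which tests run against which resource is assumed, not taken from NetBeez.
target = {
    "name": "Google",
    "resources": {
        "mail.google.com": ["PING", "DNS", "HTTP", "PA"],
        "8.8.8.8": ["PING", "PA"],
        "8.8.4.4": ["PING", "PA"],
        "www.google.com": ["PING", "DNS", "HTTP"],
    },
}


def count_tests(target):
    """Return the total number of tests across all resources of a target."""
    return sum(len(tests) for tests in target["resources"].values())


resource_count = len(target["resources"])
test_count = count_tests(target)
print(f"1 target, {resource_count} resources, {test_count} tests")
```

Counting this way makes it clear that a single target can account for a sizable share of an agent's tests.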
Types of Targets
Targets can typically be broken down into 4 categories: SaaS, Internal, External & Cloud. Every network differs somewhat in what the data looks like, depending on factors such as VPN/tunneling and SD-WAN deployment, but here is a general overview of each:
- SaaS: Targets for popular applications such as Google, Zoom, Microsoft, Azure, AWS, etc.
- Internal: Monitoring switches or other networking devices, including other NetBeez Agents (mesh test)
- External: Usually a data center or other external site
- Cloud: AWS, Azure, Oracle, Google, etc.
Mistakes to Avoid
One of the biggest mistakes people make is to set up everything as a target. There are a few reasons why this is counterproductive:
- Information overload
- Excessive noise
- Poor experience
- Difficult to manage
NetBeez is an incredibly powerful tool that can spot even small blips in a network in as little as 25 seconds out of the box. Most applications, especially when monitored over WiFi, will not have 100% availability/uptime, and in many cases the failures have nothing to do with your network. By default, a NetBeez agent conducts 17,280 pings per test, per agent, per day. Even at 99.9% uptime, that is still about 17 failures per agent, per test, per day. If you have 20 agents running 20 ping tests each, that is almost 7 million data points per day.
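The arithmetic behind these figures can be checked with a short Python sketch. Only the 17,280 figure, the 99.9% uptime, and the 20 x 20 example come from the text above; the function names are hypothetical, and the ping interval is derived from the daily count rather than quoted from NetBeez documentation.

```python
SECONDS_PER_DAY = 24 * 60 * 60
PINGS_PER_TEST_PER_DAY = 17_280  # default figure quoted in the text

# Interval implied by the daily count (derived, not a documented setting).
interval_s = SECONDS_PER_DAY / PINGS_PER_TEST_PER_DAY


def expected_failures(pings, availability):
    """Expected failed pings per day for a given availability (0.0 to 1.0)."""
    return pings * (1 - availability)


def daily_data_points(agents, tests_per_agent, pings=PINGS_PER_TEST_PER_DAY):
    """Total ping results collected per day across all agents and tests."""
    return agents * tests_per_agent * pings


print(f"Implied ping interval: {interval_s:.0f} s")
print(f"Failures per test per day at 99.9%: {expected_failures(PINGS_PER_TEST_PER_DAY, 0.999):.2f}")
print(f"Data points per day (20 agents x 20 tests): {daily_data_points(20, 20):,}")
```

With these inputs the sketch reports roughly 17 failures per test per day and 6,912,000 data points per day, which is where the "almost 7 million" figure comes from.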
So the goal is to focus on key applications and on what is important to your users. Ask yourself: if this resource is not reachable within 25s (we recommend 60s for noncritical tests), would I want to be notified? If the answer is no across the board, then it may not be the right target to monitor. There are some cases where you may configure a target without being notified, for example to have the data on hand when troubleshooting user tickets, but we recommend keeping these to a minimum.
What Should I Target?
While there is no one-size-fits-all approach, below are some suggestions based on extensive testing & feedback.
- Monitor Google or any extremely high uptime/availability resource
- VoIP applications
- Gateway test (especially for WiFi Agents/RWA)
- Your Cloud (You can also deploy an agent in the cloud and set it as a target too)
- Major SaaS applications used by your users
- Your datacenter
Here is an example of one of our dashboards that we use every day. We have several duplicates, such as Genesys and Ring Central. You’ll notice that targets like the Gateway & VPN tests do not have DNS & HTTP (see more below). Leaving out the test examples, these are the targets we would use: Google, Salesforce (or any CRM tool), VPN, Gateway test, Zoom, NetBeez (website) and our ticketing system.
One thing to remember: do not blindly assign PING, DNS, HTTP & PA to every target. For optimal performance, we recommend 50 tests per network agent and 25 per remote worker agent, though each agent can run a different set of tests. With that in mind, running an HTTP test against an FQDN or IP that has nothing to load, such as a web page, is a wasted test. This is what we typically see:
- SaaS: All 4
- VoIP applications: Ping & PA
- Cloud: Ping, DNS & PA (HTTP if hosting SaaS Apps in your cloud)
- High Availability tests: All 4
- Datacenter: Ping & PA
There are exceptions, but these are good guiding principles to start with. Feel free to make changes along the way.
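One way to keep an eye on the per-agent recommendation is to tally a planned configuration before building it. The Python sketch below is a rough planning aid, not a NetBeez feature: the category keys, function names, and sample plan are hypothetical, and it assumes every resource in a target runs all of the test types typical for its category.

```python
# Typical test types per category, taken from the list above.
ALL_FOUR = ("PING", "DNS", "HTTP", "PA")
TESTS_BY_CATEGORY = {
    "saas": ALL_FOUR,
    "voip": ("PING", "PA"),
    "cloud": ("PING", "DNS", "PA"),  # add HTTP if hosting SaaS apps there
    "high_availability": ALL_FOUR,
    "datacenter": ("PING", "PA"),
}

# Recommended tests per agent, from the text above.
TEST_BUDGET = {"network": 50, "remote_worker": 25}


def planned_tests(targets):
    """Sum tests for a list of (name, category, resource_count) tuples."""
    return sum(len(TESTS_BY_CATEGORY[cat]) * n for _, cat, n in targets)


def within_budget(targets, agent_type):
    """True if the plan stays at or under the recommendation for the agent type."""
    return planned_tests(targets) <= TEST_BUDGET[agent_type]


# Hypothetical plan: (target name, category, number of resources)
plan = [
    ("Google", "high_availability", 2),
    ("CRM", "saas", 2),
    ("VoIP", "voip", 2),
    ("Datacenter", "datacenter", 3),
]

print(planned_tests(plan), within_budget(plan, "network"), within_budget(plan, "remote_worker"))
```

In this sample the plan totals 26 tests, which fits a network agent but is one over the recommendation for a remote worker agent, so a resource or test type would need to be trimmed there.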
Traceroute vs Path Analysis
Traceroute is a great tool, but most of our customers prefer Path Analysis. Traceroute is more often used for troubleshooting than for proactive monitoring, because a single traceroute occasionally having issues is often not an indication of any actionable problem. Path Analysis gives you a much broader view of pathing and potential issues. We also recommend against alerting on Traceroute, but we will cover that in another post.
Conclusion
Less is more, and that is definitely true of NetBeez. NetBeez offers precision metrics that can alert in a matter of seconds and provide extremely detailed data for analysis. But it’s important to have NetBeez watching the right things and not to add unneeded complexity to your monitoring toolkit.