I’m frequently asked about the best practice for monitoring. Partners often want to know if they should monitor a few things or if they should monitor everything. From talking with them, I’ve found partners fall into a few distinct categories when it comes to how they treat monitoring and alerting. Partners either:
- Monitor everything and alert on everything (which makes things “noisy”)
- Monitor everything but ignore 90% of it and only alert on the important things
- Monitor everything but tweak thresholds so only things they care about show as failed—and use the rest for data collection
- Monitor what they want and remove everything they find unnecessary or noisy
- Monitor the least amount possible
- Don’t alert on anything
What do you want to achieve with monitoring?
So what should you do? As usual, there is no one-size-fits-all approach. Instead, you should decide what’s important to you, what you want to react to, and what data you want to collect. Your RMM platform will inform those decisions, because each platform has different monitoring and alerting capabilities, as well as different pre-built and custom monitoring features you can use or add.
You should review your platform’s monitoring and alerting capabilities, as well as any additional custom monitoring. For example, if you use SolarWinds® RMM or N-central®, check out the Automation Cookbook to see what’s available.
After seeing what you can monitor, determine what’s important to you. Let’s look at SQL Server monitoring as an example. Partners often tell me they get alerts for SQL monitoring but they turn it off because they don’t know how to act on it. If you don’t use SQL much, getting an alert stating the Buffer Cache Hit Ratio is below 70% may not mean much to you.
If the information isn’t important to you, decide whether you should keep it for reporting purposes in case a customer calls, have it show as failed, or alert on it. Typically, monitoring more isn’t a bad thing. However, if you don’t plan to act on a check, you should probably disable alerting for it and/or tweak its thresholds so you aren’t alerted.
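To make those three choices concrete, here is a minimal Python sketch. It is not tied to any RMM product: the `Check` and `Disposition` names and the `evaluate` function are made up for illustration, and a real platform would expose these settings through its own dashboard or API. The only figure taken from the example above is the 70% Buffer Cache Hit Ratio threshold.

```python
from dataclasses import dataclass
from enum import Enum


class Disposition(Enum):
    # Hypothetical names for the three choices described above.
    COLLECT_ONLY = "collect only"   # keep the data for reporting, nothing more
    SHOW_FAILED = "show as failed"  # visible on the dashboard, no notification
    ALERT = "alert"                 # notify a technician or open a ticket


@dataclass
class Check:
    name: str
    threshold: float          # readings below this value count as failed
    disposition: Disposition  # what to do when a reading fails


def evaluate(check: Check, value: float) -> str:
    """Return the action to take for one reading of one check."""
    if value >= check.threshold:
        return "ok"
    if check.disposition is Disposition.COLLECT_ONLY:
        return "record"
    if check.disposition is Disposition.SHOW_FAILED:
        return "mark failed"
    return "alert"


# A team that does not work with SQL much might keep this check for data only.
buffer_cache = Check("Buffer Cache Hit Ratio", 70.0, Disposition.COLLECT_ONLY)
print(evaluate(buffer_cache, 62.5))
```

The point of the sketch is that the disposition is a separate decision from the threshold: the same failing reading can be quietly recorded, shown as failed, or escalated, depending on whether you intend to act on it.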
How to decide what to monitor
To determine what you should monitor, look at the list of alerts in your RMM dashboard and figure out which ones are important to you. Then, look at the various services and service templates. See what is available and what you’re not using that you may want to use. Over the years, lots of MSPs have removed/disabled templates and services because they felt they were noisy or didn’t know what to do with them—so it’s good to revisit and see what you may be missing.
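If your platform lets you export its alert history, one rough way to run this review is to tally alerts per check and see which ones nobody ever acted on. The sketch below assumes a made-up export format (a list of check names paired with an "acted on" flag) and made-up sample data. It is only an illustration of the idea, not a feature of any particular RMM.

```python
from collections import Counter

# Hypothetical export: (check name, whether someone acted on the alert).
alerts = [
    ("Disk Space", True),
    ("Buffer Cache Hit Ratio", False),
    ("Buffer Cache Hit Ratio", False),
    ("Disk Space", True),
    ("Buffer Cache Hit Ratio", False),
    ("Backup Status", True),
]


def never_acted_on(alerts):
    """Return (check, alert count) pairs for checks nobody acted on, noisiest first."""
    total = Counter(name for name, _ in alerts)
    acted = Counter(name for name, was_acted_on in alerts if was_acted_on)
    # Counter returns 0 for missing keys, so checks with no acted-on alerts pass the filter.
    return [(name, count) for name, count in total.most_common() if acted[name] == 0]


print(never_acted_on(alerts))
```

Checks that show up in this list are candidates for data collection only, or for a threshold change, rather than for removal.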
So where does that leave you? Personally, I prefer monitoring more rather than less, provided you’ve already paid for the license to manage the device with your RMM. If that’s the case, adding more monitoring isn’t a cost, it’s a feature.
The biggest mistake I see partners make is leaving default thresholds in place or turning on alerting for everything. When they do that, monitoring becomes too noisy and they tend to switch everything off. Most RMM and PSA integrations give you the flexibility to choose which monitoring will trigger alerts. I recommend alerting only on what you care about while continuing to monitor the rest, so you have data on hand for that phone call from a customer asking for help troubleshooting some obscure error that nobody knows how to fix.
In short, in my opinion, more is better—and if your RMM can do it, why not use it?
If you have suggestions for other things to add to this, please reach out to me directly at [email protected] or on Twitter at @automation_nerd.
If you have created an automation policy and would like to share it with the community, please feel free to email me at [email protected].
As always, don’t forget to check the Automation Cookbook at www.solarwindsmsp.com/cookbook if you are interested in other automation policies, script checks, and custom services.
Marc-Andre Tanguay is head automation nerd. You can follow him on Twitter at @automation_nerd.