There are numerous network monitoring tools to choose from. The space includes historically popular tools such as SolarWinds as well as custom-built in-house tools tailored to specific business uses. Each tool has typically mastered a distinct domain: some excel in the variety of data they collect, some in documentation, and others in making troubleshooting easier.
Extensive monitoring alone is not enough for today’s complex network environment. Network monitoring should also complement network automation. After numerous interactions with customers and prospects, I have identified seven essential requirements designed to deliver a comprehensive network monitoring strategy.
1. Extensive Collection Capabilities
Most of today’s networks, large and small, are filled with devices from multiple vendors. Routers from Cisco and Juniper, firewalls from Palo Alto and A10 Networks, and load balancers from F5 are common. While SNMP, SNMP Traps and Syslog have been the traditional go-to’s for data collection, streaming telemetry is the future. Unlike SNMP polling, streaming telemetry doesn’t require the collector to keep requesting information. The device pushes the subscribed data continuously at the configured interval. Thus, massive amounts of data can be collected in a short span of time, which increases the efficiency of the monitoring solution. Consequently, every network monitoring strategy should include an advanced monitoring tool that can collect data from all of these sources.
2. Detailed Visualization
Collected and stored data is useless unless accompanied by simple and customizable visualization. Data has to be collated and interpreted to yield actionable insights, and presented in formats such as pie and bar charts, heatmaps, and stacked graphs. The metrics to be monitored will vary with every network and every business based on need, so any visualization should be easily customizable to suit the specific business use case. Converting present and historical data into downloadable reports can reveal hidden issues in the network. Generating and emailing customizable reports automatically also helps administrators keep a close watch on their network.
3. Baselining & Alert Routing
Identifying anomalies is vital to any administrator, but identifying them accurately requires baselining the network. Consider an alert threshold of 70% for a router CPU. This threshold usually corresponds to the maximum CPU utilization allowed during peak hours, so it fails to detect CPU spikes during off-peak hours. If the average off-peak CPU is 40%, then a spike to 60% should be considered an anomaly and trigger alerts even though it is within the prescribed limit. This kind of advanced alerting is possible only after studying and effectively baselining the network across various parameters. Routing alerts across various channels is equally important. Traditionally, administrators preferred email, but today Slack and other collaboration tools have gained popularity. Network monitoring solutions should therefore be able to publish alerts through multiple notification channels.
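To make the CPU example concrete, here is a minimal Python sketch of baseline-aware alerting. It is an illustration only, not how any particular product works: the per-hour mean and standard deviation baseline, the three-sigma rule, and the names `build_baseline` and `is_anomaly` are all assumptions chosen for the example.

```python
from statistics import mean, pstdev


def build_baseline(samples):
    """Group (hour_of_day, cpu_percent) samples by hour and
    return {hour: (mean, population_stdev)}."""
    by_hour = {}
    for hour, cpu in samples:
        by_hour.setdefault(hour, []).append(cpu)
    return {hour: (mean(vals), pstdev(vals)) for hour, vals in by_hour.items()}


def is_anomaly(baseline, hour, cpu, static_limit=70.0, sigmas=3.0, min_stdev=1.0):
    """Flag a reading that breaches the static limit OR deviates
    strongly from what is normal for that hour of the day."""
    if cpu >= static_limit:
        return True
    if hour not in baseline:
        # No history for this hour: fall back to the static limit only.
        return False
    avg, stdev = baseline[hour]
    # min_stdev avoids a zero-width band when history is perfectly flat.
    return cpu > avg + sigmas * max(stdev, min_stdev)


# Hypothetical history: 02:00 is off-peak (about 40%), 14:00 is peak (about 65%).
history = [(2, v) for v in (38, 40, 42, 39, 41)]
history += [(14, v) for v in (62, 66, 68, 64, 65)]
baseline = build_baseline(history)

off_peak_spike = is_anomaly(baseline, hour=2, cpu=60.0)   # below 70%, far above baseline
peak_normal = is_anomaly(baseline, hour=14, cpu=60.0)     # ordinary for peak hours
```

With this sample history, a 60% reading at 02:00 is flagged while the same reading at 14:00 is not, which is exactly the case a single static threshold misses.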
4. Event Correlation and Alert Grouping
Noise and incessant chatter within the network distract administrators from troubleshooting. A route failure may trigger alerts from all surrounding devices, and finding the one alert that matters is like finding a needle in a haystack. Grouping alerts by time or by similarity is important to reduce this clutter. Identifying and consolidating related events across the network also helps administrators narrow down the root cause faster, thereby reducing MTTR. Consequently, it’s critical to identify and adopt solutions with comprehensive event correlation and alert grouping capabilities.
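Time-based grouping, the simpler of the two approaches mentioned above, can be sketched in a few lines of Python. The `Alert` fields, the 60-second window, and the sample device names are assumptions made for illustration; real correlation engines typically also use topology and alert similarity.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    timestamp: float  # seconds since some reference point
    device: str
    message: str


def group_by_time(alerts, window_seconds=60.0):
    """Put an alert in the current group if it arrives within
    window_seconds of the previous alert; otherwise start a new group."""
    groups = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        if groups and alert.timestamp - groups[-1][-1].timestamp <= window_seconds:
            groups[-1].append(alert)
        else:
            groups.append([alert])
    return groups


# Hypothetical burst: one route failure followed by alerts from neighbors,
# then an unrelated alert ten minutes later.
alerts = [
    Alert(0.0, "core-rtr1", "route withdrawn"),
    Alert(5.0, "edge-rtr2", "BGP neighbor down"),
    Alert(12.0, "edge-rtr3", "BGP neighbor down"),
    Alert(600.0, "fw-1", "high session count"),
]
groups = group_by_time(alerts)
group_sizes = [len(g) for g in groups]
```

Here the three alerts from the route failure collapse into one group and the later firewall alert stands alone. The earliest alert in a group is only a candidate for the root cause, not proof of it.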
5. Monitored and Automated Troubleshooting
Open-loop systems have become a relic of the past. Any network monitoring strategy should evolve into closed-loop automation. Monitoring thresholds and identifying network and device issues alone do not adequately address administrators’ concerns in today’s networks. Monitoring solutions are expected to resolve frequently occurring and known configuration, compliance and network problems. Network administrators need a monitoring solution that not only identifies configuration drift or a non-compliant configuration, but also raises alerts, recommends fixes, seeks approvals and generates comprehensive reports automatically.
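As a rough illustration of the drift-detection step, the Python sketch below compares an intended configuration with the running one and proposes a fix that still requires approval. It treats a configuration as a flat list of lines, which is a simplifying assumption (real device configurations are hierarchical), and the function names and sample configuration are hypothetical.

```python
import difflib


def detect_drift(intended, running):
    """Return a unified diff between the intended and running configs.
    An empty list means no drift."""
    return list(
        difflib.unified_diff(
            intended.splitlines(),
            running.splitlines(),
            fromfile="intended",
            tofile="running",
            lineterm="",
        )
    )


def remediation_plan(intended, running):
    """Recommend lines to add and remove; the plan is only a proposal
    and is marked as needing approval whenever it changes anything."""
    intended_lines = intended.splitlines()
    running_lines = running.splitlines()
    missing = [line for line in intended_lines if line not in running_lines]
    unexpected = [line for line in running_lines if line not in intended_lines]
    return {
        "add": missing,
        "remove": unexpected,
        "needs_approval": bool(missing or unexpected),
    }


# Hypothetical configs: someone changed the NTP server on the device.
intended = "hostname edge-1\nntp server 10.0.0.1\nno ip http server"
running = "hostname edge-1\nntp server 10.0.0.9\nno ip http server"

drift = detect_drift(intended, running)
plan = remediation_plan(intended, running)
```

The diff gives the administrator a readable report, while the plan captures the recommended fix. Applying it is deliberately left out so that an approval step can sit between detection and remediation.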
6. Integration with the Network Ecosystem
Any monitoring strategy should encompass the entire network. Network monitoring solutions have to integrate with business process systems for a seamless end-to-end operation. Identifying an anomaly should also trigger the monitoring solution to raise tickets automatically in ServiceNow, Jira, BMC Remedy or any other business process system. Similarly, on resolving a known issue, the solution should close the ticket automatically. Network operation workflows and business process workflows must also be unified to deliver services rapidly and provide an enhanced user experience.
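The open-on-anomaly, close-on-resolution flow described above might look like the following Python sketch. `InMemoryTicketSystem` is a hypothetical stand-in for a real ticketing integration and does not reflect the API of ServiceNow, Jira or BMC Remedy; the anomaly key format is likewise an assumption.

```python
import itertools


class InMemoryTicketSystem:
    """Hypothetical stand-in for a ticketing tool; stores tickets in a dict."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.tickets = {}

    def open(self, summary):
        ticket_id = f"TCK-{next(self._ids)}"
        self.tickets[ticket_id] = {"summary": summary, "status": "open"}
        return ticket_id

    def close(self, ticket_id):
        self.tickets[ticket_id]["status"] = "closed"


class AnomalyTicketBridge:
    """Opens one ticket per anomaly key and closes it on resolution."""

    def __init__(self, ticket_system):
        self.ticket_system = ticket_system
        self.open_by_key = {}

    def on_anomaly(self, key, summary):
        # Reuse the existing ticket so repeated alerts do not create duplicates.
        if key not in self.open_by_key:
            self.open_by_key[key] = self.ticket_system.open(summary)
        return self.open_by_key[key]

    def on_resolved(self, key):
        ticket_id = self.open_by_key.pop(key, None)
        if ticket_id is not None:
            self.ticket_system.close(ticket_id)
        return ticket_id


system = InMemoryTicketSystem()
bridge = AnomalyTicketBridge(system)

first = bridge.on_anomaly("edge-1/cpu", "CPU anomaly on edge-1")
repeat = bridge.on_anomaly("edge-1/cpu", "CPU anomaly on edge-1")  # same ticket
closed = bridge.on_resolved("edge-1/cpu")
```

Keying tickets by anomaly keeps a flapping condition from flooding the ticket queue, and closing through the same key ties the resolution back to the ticket that was raised.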
Today’s complex networks demand a comprehensive end-to-end network monitoring and closed-loop automation platform. Unlike in the past, a network monitoring strategy cannot be limited to visualization alone. Monitoring should also include notification capabilities, reduction of alert clutter and auto-remediation of simple and complex troubleshooting scenarios.
7. Scalability for Hybrid Multi-Cloud Deployments
Most organizations have a hybrid multi-cloud strategy. Given security concerns, part of the data center footprint still resides on premises, but most of it resides in multiple public clouds for operational efficiency. In a hybrid multi-cloud environment, monitoring and effective visibility of the network directly impact billing and operating expenditures. A single advanced monitoring solution that covers on-premises as well as multi-cloud deployments is essential for complete network visibility.
By applying these key strategies, we have helped our customers transition from a simple point-to-point monitoring framework to an advanced automation platform. The increasing complexity of networks is driving network administrators towards data consolidation and tool footprint reduction. Monitoring is no longer a standalone deliverable. Rather, it goes hand-in-hand with automation. A forward-looking monitoring strategy must focus on unifying isolated monitoring and automation functions to provide a complete end-to-end closed-loop automation framework.