
ATOM is a NetDevOps Game Changer – Part II 

Introduction

When it comes to management systems, most enterprises and communication service providers deploy a mix of tools, some procured externally and some developed in-house. More often than not, the core tools that store network data are internally developed. There are several reasons for that. The first is the need for control over the development process. Other considerations include time to market, time to revenue, a better understanding of the problems that need to be solved, and a lack of off-the-shelf software that adequately addresses them.

However, this practice has its downsides. One is the use of legacy technologies to collect network configuration and metrics. After all, network operators can’t become software companies overnight. The underlying technology is generally old, not just within the tool itself but also in how the tool interacts with network components. For example, CLI, developed mainly for human interaction with devices, is still the primary way of collecting information from network devices, even though technologies like NETCONF and RESTCONF are more efficient and have existed for a decade. Today’s devices can also act like web servers and exchange structured payloads such as XML and JSON to transfer configuration and operational data. In short, many enhancements are readily available, but adoption seems very slow. As a case in point, Cisco enabled NETCONF and RESTCONF in its complete range of enterprise routers via the earliest versions of IOS-XE years ago, but it is very rare to see them enabled and used even today.

Some of the lack of adoption can be attributed to lower general awareness and the disconnect between network engineering and software engineering. Almost every vendor has attempted to create awareness, but there are very few takers despite a strong network engineering staff.

So what’s the solution? Network automation from companies such as Anuta Networks. To illustrate, I would like to share a personal experience with configuration compliance.

Configuration Compliance

Many enterprise customers opt for a CSP-installed Layer 3 CE router for last-mile connectivity, given the simplicity that comes with a managed service.

These routers are often configured by team members and groups using Notepad or Notepad++. Config templates are usually textual commands and can easily range from 700 to 2000+ lines, often doubling if the site has dual connectivity with 2 CE and 2 PE combinations.

Consequently, the probability of missing lines or getting a few variable inputs wrong is very high in such a scenario. Other details can also create failure points, such as using the wrong letter case in names for descriptions, converting bandwidth units for commands and interface descriptions, and more. Even the most experienced team members are likely to get one or two of these things wrong.
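As a minimal sketch of how the casing and unit-conversion mistakes described above could be taken out of human hands, the Python below normalizes both in one place. The function names, the description format, and the choice of kbit/s as the target unit are assumptions made for illustration; they are not taken from any template used in the scenario above.

```python
import re

# Multipliers to kbit/s. Decimal (SI) units are assumed here.
_UNIT_TO_KBPS = {"k": 1, "m": 1_000, "g": 1_000_000}


def bandwidth_to_kbps(text: str) -> int:
    """Convert strings such as '10M', '10 Mbps' or '1.5g' to kbit/s."""
    match = re.fullmatch(
        r"\s*(\d+(?:\.\d+)?)\s*([kmg])(?:bps|b)?\s*", text, re.IGNORECASE
    )
    if match is None:
        raise ValueError(f"unrecognized bandwidth: {text!r}")
    value, unit = match.groups()
    return round(float(value) * _UNIT_TO_KBPS[unit.lower()])


def interface_description(customer: str, site: str, bandwidth: str) -> str:
    """Build an upper-case description in one hypothetical house format."""
    kbps = bandwidth_to_kbps(bandwidth)
    return f"{customer.strip().upper()}:{site.strip().upper()}:{kbps}K"


print(bandwidth_to_kbps("10 Mbps"))
print(interface_description(" acme ", "pune-01", "10M"))
```

Because every description and bandwidth value passes through the same two functions, a typo in units raises an error at generation time instead of surfacing during a migration window.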

The Jinja2 templating that I referenced previously did help, but that was a one-off function for the customers that I was handling. It was not a company-wide solution. These errors are sometimes caught while applying device configs, but even then they waste time on troubleshooting during migration windows. The worst-case scenario is when someone else in the team must troubleshoot a config error several weeks or months down the line, when an anomaly occurs. The config error prevents proper failover, and the quality of service can degrade. After hours of troubleshooting in a severely escalated case, the cause turns out to be an error in the config, and your team or company is put in face-saving mode in front of the customer.

My company had a workaround solution, but it was flawed for many reasons. It involved pulling the running-config of the device and analyzing it against a set of rules written in bash scripts. These rules were managed by a central team, and this was the root cause of some of the shortcomings in the solution. Let me explain:
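To make the idea of checking a running-config against a set of rules concrete, here is a minimal sketch. The original rules were bash scripts; Python is used here instead, and the rule names, patterns, and sample config are hypothetical (the addresses come from the documentation range 192.0.2.0/24).

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    pattern: str             # regular expression, matched per line
    must_match: bool = True  # False means the line must be absent


def audit(running_config: str, rules: list[Rule]) -> list[str]:
    """Return the names of the rules that the config violates."""
    violations = []
    for rule in rules:
        found = re.search(rule.pattern, running_config, re.MULTILINE) is not None
        if found != rule.must_match:
            violations.append(rule.name)
    return violations


# Hypothetical fragment of a CE router running-config.
SAMPLE_CONFIG = """\
hostname CE-PUNE-01
ntp server 192.0.2.10
snmp-server community public RO
"""

RULES = [
    Rule("ntp-server-present", r"^ntp server 192\.0\.2\.10$"),
    Rule("tacacs-present", r"^tacacs server \S+$"),
    Rule("no-default-snmp-community", r"^snmp-server community public\b",
         must_match=False),
]

print(audit(SAMPLE_CONFIG, RULES))
# ['tacacs-present', 'no-default-snmp-community']
```

A flat rule list like this works well for the common config portion, which is the part the central team knew best. It says nothing about who owns the customer-specific rules, which is where the approach described above ran into trouble.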

If you examine a config for a CE Router, it has three components at a high level. The common config portion includes SNMP, NTP, TACACS, Banner, COS Policies, and more. The WAN config portion has interface and routing protocol level config information vital for communication with the PE Side. The LAN config portion includes the interface and protocol level config details for the customer connection with the CE Router. The central team had excellent knowledge of the first portion as it was part of the company-wide policy and applied to all customer CE Routers. 

The second portion (WAN side) tended to be similar but, in some cases, looked very different between enterprise customers. For example, some designers applied a prefix-list directly to the WAN BGP neighbor, whereas others applied route-maps to the BGP neighbor, with prefix-lists, tags, and ACLs inside them to map various routes. Even so, this was a portion where the central team’s policies performed reasonably well. However, the central team performed poorly on the LAN portion of the config. This portion was highly customized: some customers connected routers on the LAN side, others had switches (in both L2 and L3 capacity), and others even had firewalls. This caused so many permutations and combinations of protocols and config sections that the central team could not comprehend the entire chain.

Fig: CE LAN Setups

In the situation above, it would be great if the team that prepares a different LAN config for every customer and the central team could work together much more closely in a single tool with an easy-to-use interface. This would eliminate the need for everyone to go through a central team to build policies for every customer. The teams responsible for a particular customer could instead focus on building compliance policies specific to that customer. They could also import the policies built by the central team for the common config portion much more easily. The end result would be every device in the ideal state of configuration with much less effort and time.

In contrast, Anuta Networks ATOM provides an optimal solution:

At a high level, the config compliance feature in ATOM has five steps to accomplish things more efficiently.

1. Compliance Policies: Policies are built around different config components like NTP, SNMP, route policies, routing protocol config, and more. Besides building compliance rules using regular expressions and Jinja templates, one can define the severity of the alert to be raised (CRITICAL, MAJOR, and so on), and the fix, as a CLI or NETCONF payload, can also be generated using Jinja templating.

2. Compliance Profiles and Execution: ATOM allows selecting a group of policies in a profile so that the complete config of a device in a particular role within the network can be audited.

3. Compliance Reporting: Once all the policies that are part of a profile are matched against a device’s config, reports are generated indicating the state of config compliance for the device.

4. Compliance Remediation: Once reports are generated, devices can be brought into compliance with policy rules either manually or automatically. Remediation can also be scheduled.

5. Config Compliance Dashboard: This view makes the holistic information related to config compliance for a particular network much more understandable and actionable with the help of different graphs and tables.
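The five steps above can be sketched as a small data model. This is an illustration of the concepts only and not ATOM's API: every class, field, and function name below is hypothetical, and plain strings stand in for the Jinja-generated fix payloads.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:            # step 1: one rule with a severity and a fix
    name: str
    pattern: str         # regex the config must contain
    severity: str        # e.g. "CRITICAL" or "MAJOR"
    fix: str             # CLI line that would restore compliance


@dataclass(frozen=True)
class Profile:           # step 2: a group of policies for one device role
    name: str
    policies: tuple[Policy, ...]


def run_profile(profile: Profile, config: str) -> list[dict]:
    """Step 3: produce one report row per policy in the profile."""
    report = []
    for policy in profile.policies:
        ok = re.search(policy.pattern, config, re.MULTILINE) is not None
        report.append({"policy": policy.name, "compliant": ok,
                       "severity": None if ok else policy.severity})
    return report


def remediation(profile: Profile, report: list[dict]) -> list[str]:
    """Step 4: collect the fix lines for every failed policy."""
    failed = {row["policy"] for row in report if not row["compliant"]}
    return [p.fix for p in profile.policies if p.name in failed]


def summary(report: list[dict]) -> dict[str, int]:
    """Step 5: counts that a dashboard could chart."""
    counts = {"compliant": 0}
    for row in report:
        key = "compliant" if row["compliant"] else row["severity"]
        counts[key] = counts.get(key, 0) + 1
    return counts


NTP = Policy("ntp", r"^ntp server 192\.0\.2\.10$", "MAJOR", "ntp server 192.0.2.10")
AAA = Policy("aaa", r"^aaa new-model$", "CRITICAL", "aaa new-model")
CE_PROFILE = Profile("ce-router", (NTP, AAA))

report = run_profile(CE_PROFILE, "hostname CE1\nntp server 192.0.2.10\n")
print(remediation(CE_PROFILE, report))  # ['aaa new-model']
print(summary(report))                  # {'compliant': 1, 'CRITICAL': 1}
```

Keeping policies separate from profiles is what lets a customer team combine its own rules with the ones a central team maintains.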

Let’s try to understand these capabilities with an example. Let’s say you had a site with dual WAN connectivity, i.e., two routers, one with an MPLS connection and the other with an Internet connection.

The connectivity and protocols are shown in the diagram above. These routers will have the same three config portions described above, with varying degrees of variability between them. The compliance policies created for the common config portion will be the same for both routers, and thus the policies created for things like NTP, SNMP, TACACS, and AAA can be used as is for both routers.

Now, in the WAN config portion, the internet-connected router has some extra config related to the tunnel interface and local internet breakout, so except for policies pertaining to these elements, policies should be more or less the same.

Depending on the LAN setup at the site, as explained earlier, the policies created for the LAN config portion should also be the same for both routers, because minor config differences arising from the primary and secondary roles of the two devices can be handled within the same policy using regular expressions and Jinja templating. Thus, the config compliance structure is hierarchical and allows policies to be re-used with minor or no modifications, as shown in the diagram. ATOM’s intuitive GUI also allows policies and profiles to be created quickly and easily.
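As a rough illustration of handling primary and secondary differences inside a single policy, the sketch below renders one pattern per device role. Python's built-in str.format stands in for Jinja templating here, and the HSRP priority values, interface names, and variable names are assumptions chosen for the example rather than details from the site described above.

```python
import re

# One policy shared by both routers; {priority} is filled in per role.
LAN_POLICY_TEMPLATE = r"^\s*standby \d+ priority {priority}$"

# Hypothetical per-role variables.
ROLE_VARS = {
    "primary": {"priority": "110"},
    "secondary": {"priority": "90"},
}


def is_compliant(config: str, role: str) -> bool:
    """Render the shared policy for the given role and test the config."""
    pattern = LAN_POLICY_TEMPLATE.format(**ROLE_VARS[role])
    return re.search(pattern, config, re.MULTILINE) is not None


PRIMARY_CFG = "interface GigabitEthernet0/1\n standby 1 priority 110\n"
SECONDARY_CFG = "interface GigabitEthernet0/1\n standby 1 priority 90\n"

print(is_compliant(PRIMARY_CFG, "primary"))      # True
print(is_compliant(SECONDARY_CFG, "secondary"))  # True
print(is_compliant(SECONDARY_CFG, "primary"))    # False
```

The regular expression absorbs what may legitimately vary (the group number and indentation), while the template variable pins down what must differ between the two devices.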

I have used examples of CLI command-based config compliance, but every feature shown above can also be run against XML config payloads for devices that support the NETCONF or RESTCONF protocols.
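For comparison with the CLI examples, here is a minimal sketch of the same kind of check against an XML payload, using only Python's standard library. The payload structure is hypothetical and deliberately simplified; real NETCONF data is modeled in YANG and carries XML namespaces, which the element paths would need to include.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free config payload.
PAYLOAD = """\
<config>
  <ntp>
    <server><address>192.0.2.10</address></server>
    <server><address>192.0.2.11</address></server>
  </ntp>
</config>
"""


def ntp_servers(xml_text: str) -> set[str]:
    """Extract the configured NTP server addresses from the payload."""
    root = ET.fromstring(xml_text)
    return {el.text.strip()
            for el in root.findall("./ntp/server/address") if el.text}


def missing_servers(xml_text: str, required: list[str]) -> list[str]:
    """Return the required servers that the payload does not configure."""
    return sorted(set(required) - ntp_servers(xml_text))


print(missing_servers(PAYLOAD, ["192.0.2.10", "192.0.2.12"]))
# ['192.0.2.12']
```

With structured data, the check is a set comparison on parsed values and no longer depends on the spacing or ordering of lines in CLI output.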

Conclusion

I hope this blog demonstrates how ATOM could have helped my team when reports in CSV format were dumped weekly for hundreds of routers with thousands of config policies that required manual remediation. This job was so dull and mundane that we had to divide it amongst ourselves so that every team member had a fair share of exciting and tedious work. In contrast, ATOM provides a better approach. To see it in action, please watch the following video and visit www.anutanetworks.com to learn more.
