Network OS upgrades are imperative for every network. Learn how to do them well
IT stacks, from networks to servers, are always evolving; they cannot afford to stay the same. Dynamic customer needs, a rapidly growing number of applications, and new threats all add to the complexity that networks must handle. Software Image Management is therefore a necessary exercise: every organization has to upgrade the network OS on its devices from time to time. For large organizations, this can be an ongoing activity that spans thousands of devices and keeps many dedicated teams busy. So, what’s the challenge?
Network OS Upgrade – Why, after all?
All software is subject to change and develops vulnerabilities over time. Whether the driver is new technology features or fresh ways of exploiting gaps in the software, regular upgrades become critical. They help preserve the software’s robustness and relevance. The same applies to network devices, and there are various reasons to upgrade the software on them.
For example,
- EoS or EoL of software releases
- Bug fixes
- New features
- The base image on new devices is not the company-approved image
- Software compliance, keeping the version consistent across the network
That said, an upgrade is not easy. It demands careful planning and close attention to detail.
Challenges with Network OS upgrade
The challenges that emerge during an upgrade fall broadly into two categories: device-specific and organization-specific.
Device-Specific
These challenges manifest at the device level:
- No two vendors are the same. Each vendor has its own procedures for upgrading its devices
- Even within a single vendor, upgrade procedures can differ from one device platform to another
- Device access methods differ: SSH, Telnet, APIs, and others
- The variety of OS image transfer protocols, such as FTP, TFTP, SCP, SFTP, and USB, adds further complexity
- On some platforms, the upgrade is executed from a management station rather than on the device itself
- Multi-stage upgrade: for certain vendors, upgrading between two versions might require a multi-stage upgrade through one or more intermediate versions
- For some features, the command syntax might change, and additional configuration could be needed
- Network outages might trigger a roll-back of the upgrade. In some cases, the roll-back procedures are different from the upgrade procedures.
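Finding a multi-stage upgrade path is essentially a path-finding problem. The Python sketch below is not any vendor's tool. It assumes a hypothetical table of supported direct upgrade hops (real paths come from vendor release notes) and uses breadth-first search to find the shortest chain of intermediate versions.

```python
from collections import deque

# Hypothetical table: the versions each release can upgrade to directly.
# Real upgrade paths must come from the vendor's release notes.
DIRECT_HOPS = {
    "1.0": ["2.0"],
    "2.0": ["3.0", "2.5"],
    "2.5": ["3.0"],
    "3.0": ["4.0"],
    "4.0": [],
}


def upgrade_path(current, target, hops=DIRECT_HOPS):
    """Return the shortest list of versions from current to target, or None."""
    queue = deque([[current]])
    seen = {current}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in hops.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no supported chain of upgrades exists


print(upgrade_path("1.0", "4.0"))  # ['1.0', '2.0', '3.0', '4.0']
```

Breadth-first search returns a path with the fewest hops, which keeps the number of intermediate reloads as small as the hop table allows.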
Organization-Specific
Beyond the device-level hurdles outlined above, organizations face challenges of their own:
- Every organization has its own best practices or Methods of Procedure (MoPs), which consist of:
  - Pre-checks
  - Post-checks
  - Roll-backs on error
  - Opening tickets in the ticketing system, especially when errors occur
  - User intervention at critical stages, such as a review before the device reload or when errors occur
- A device’s role (core, distribution, security gateway, server farm, etc.) and the business impact of its outage also shape the MoPs.
- The two devices in a redundant pair must not be upgraded simultaneously.
- MoPs for critical devices with high business impact might have to be broken down into low-impact and high-impact activities. For example, the image upload, which is a low-impact activity, can be done ahead of a high-impact activity such as the device reload.
- Devices may need to be upgraded in parallel, especially when organizations are working to tight timelines.
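Two of these practices can be expressed as a simple plan: keeping redundant pairs apart, and running low-impact work ahead of high-impact work. This is a minimal sketch that assumes each device belongs to at most one redundant pair. The device names and the step labels `stage-image` and `reload` are hypothetical.

```python
def plan_waves(devices, redundant_pairs):
    """Split devices into two reload waves so no redundant pair shares a wave.

    Assumes each device appears in at most one pair.
    """
    wave_a, wave_b = [], []
    paired = set()
    for first, second in redundant_pairs:
        wave_a.append(first)
        wave_b.append(second)
        paired.update((first, second))
    # Unpaired devices have no partner to protect, so they join the first wave.
    wave_a.extend(d for d in devices if d not in paired)
    return wave_a, wave_b


def build_plan(devices, redundant_pairs):
    """Low-impact image staging for every device, then high-impact reloads per wave."""
    wave_a, wave_b = plan_waves(devices, redundant_pairs)
    return [
        ("stage-image", sorted(devices)),  # low impact: can run before the change window
        ("reload", wave_a),                # high impact: first member of each pair
        ("reload", wave_b),                # high impact: partners, run after wave A
    ]


devices = ["core1", "core2", "dist1", "dist2", "access1"]
pairs = [("core1", "core2"), ("dist1", "dist2")]
for action, targets in build_plan(devices, pairs):
    print(action, targets)
```

Because each pair is split across the two reload waves, one member of every pair stays in service while its partner reloads.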
Does that mean enterprises struggle with these issues whenever they embrace an upgrade? Do they have a choice?
Can we automate? How?
As the vast array of challenges above makes clear, it is futile to expect one solution that fits every requirement. At best, one can build vendor- or device-specific automation suites that adapt to each organization’s requirements.
What is required here?
Easy-to-code and simple-to-maintain
Enterprises can find relief in an automation platform that provides a framework for coding MoPs easily, in commonly used programming or scripting languages. Maintaining these MoPs should be just as easy.
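One way to model a MoP in code is as an ordered list of steps, each with an optional rollback, plus a hook that opens a ticket on failure. The `Step` class and `run_mop` function below are hypothetical illustrations. They are not any platform's API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    """One MoP step. `run` returns True on success; `rollback` undoes the step."""
    name: str
    run: Callable[[dict], bool]
    rollback: Optional[Callable[[dict], None]] = None


def run_mop(steps, ctx, open_ticket):
    """Run steps in order. On failure, roll back the failed step and every
    completed step in reverse order, open a ticket, and return False."""
    attempted = []
    for step in steps:
        attempted.append(step)
        if step.run(ctx):
            continue
        for prev in reversed(attempted):
            if prev.rollback is not None:
                prev.rollback(ctx)
        open_ticket(f"MoP failed at step '{step.name}'")
        return False
    return True


tickets = []
mop = [
    Step("pre-check", lambda ctx: ctx["free_mb"] >= ctx["image_mb"]),
    Step("upload-image", lambda ctx: True, lambda ctx: print("delete staged image")),
    Step("reload", lambda ctx: False),  # simulate a failed reload
]
print(run_mop(mop, {"free_mb": 2048, "image_mb": 900}, tickets.append))
print(tickets)
```

Passing `open_ticket` in as a plain callable keeps the runner independent of any specific ticketing system.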
Scale and Stability
The platform must be highly scalable, supporting both sequential and parallel upgrades of hundreds of devices when required. It must also be robust enough to handle errors and unexpected scenarios gracefully.
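Sequential and parallel execution can share one code path by bounding concurrency. In this sketch, `upgrade_fleet` and its `upgrade_fn` callback are hypothetical. Setting `max_parallel=1` gives a sequential rollout. A device whose upgrade raises an exception is recorded as failed, and the rest of the batch continues.

```python
from concurrent.futures import ThreadPoolExecutor


def upgrade_fleet(devices, upgrade_fn, max_parallel=1):
    """Run upgrade_fn on every device, at most max_parallel at a time.

    max_parallel=1 gives a strictly sequential rollout. A device whose
    upgrade raises is recorded as failed instead of aborting the batch.
    """
    def safe(device):
        try:
            return upgrade_fn(device)
        except Exception as exc:
            return f"failed: {exc}"

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # pool.map yields results in input order, so zip pairs them correctly.
        return dict(zip(devices, pool.map(safe, devices)))


print(upgrade_fleet(["r1", "r2", "r3"], lambda d: f"{d} upgraded", max_parallel=2))
```

Threads suit this kind of I/O-bound work, such as waiting on SSH sessions or file transfers.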
Integrates with 3rd-party systems
It should be easy to integrate a solution with 3rd-party systems; a good platform must provide options to invoke external system APIs (Application Programming Interfaces).
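As an example of invoking an external system API, the sketch below builds a request to ServiceNow's Table API that would create an incident. The instance name and credentials are placeholders. The sketch assumes Basic authentication, although production setups often use OAuth. The request is built but not sent.

```python
import base64
import json
import urllib.request


def build_incident_request(instance, user, password, short_description, description=""):
    """Build (but do not send) a ServiceNow Table API request that creates an incident."""
    url = f"https://{instance}.service-now.com/api/now/table/incident"
    body = json.dumps({
        "short_description": short_description,
        "description": description,
    }).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Basic {token}",  # assumes Basic auth is enabled
        },
    )


req = build_incident_request("example", "admin", "secret", "Upgrade failed on r1")
print(req.full_url, req.get_method())
# To actually send it: urllib.request.urlopen(req, timeout=30)
```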
Configuration compliance and auto-remediation
It would be a big bonus if the platform also supported configuration compliance checks on the upgraded devices. Such a capability is extremely useful for reporting violations, and the platform should also be able to remediate them when necessary.
Time for the Deep-Dive: Network OS Upgrades using Anuta ATOM
It must be clear by now that the network OS upgrade process is fraught with challenges. That’s why enterprises need a platform that gives them complete confidence in navigating these challenges: a platform like Anuta ATOM.
Workflow automation in Anuta ATOM lets teams design network OS upgrade MoPs efficiently, and ATOM offers the capabilities that a good automation platform requires.
For instance, consider this real-world scenario. ATOM is deployed at a major entertainment company to perform network OS upgrades. The customer was looking for a solution that supports multi-vendor network OS upgrades (Cisco, Juniper, F5, Palo Alto, and others) across devices that use different transport protocols, such as SSH and APIs. It needed a platform that made its software upgrade MoPs easy to automate, let its own teams create and maintain the upgrade scripts independently, and reduced its reliance on the vendor.
ATOM easily converted the customer’s MoPs into network OS upgrade workflows. It handled pre-checks, including redundancy and memory requirements. The platform also cleaned up unused software packages on the device when required, and it backed up device configurations. In addition, it could capture the state of the device using commands that report, for example, interface status and neighboring peers.
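Clean-up before an upload can be sketched as a space calculation. This is a minimal sketch, not ATOM's logic. It assumes the workflow already knows the free flash space, the new image's size, and the sizes of candidate software packages on flash. The function name and its inputs are hypothetical.

```python
def plan_cleanup(free_bytes, image_bytes, packages, running_image, headroom_bytes=0):
    """Pick unused software packages to delete so a new image fits on flash.

    packages: hypothetical mapping of filename -> size in bytes. It should list
    candidate software packages only; configs and licenses must be excluded.
    running_image: the currently booted image, which is never deleted.
    Returns the files to delete ([] if none are needed), or None if the image
    cannot fit even after deleting every candidate.
    """
    needed = image_bytes + headroom_bytes
    to_delete = []
    # Largest packages first, so the space is freed with as few deletions as possible.
    for name, size in sorted(packages.items(), key=lambda item: item[1], reverse=True):
        if free_bytes >= needed:
            break
        if name == running_image:
            continue
        to_delete.append(name)
        free_bytes += size
    return to_delete if free_bytes >= needed else None


flash = {"old-a.bin": 150, "cur.bin": 300, "old-b.bin": 50}
print(plan_cleanup(100, 200, flash, "cur.bin", headroom_bytes=20))  # ['old-a.bin']
```

Returning `None` rather than a partial list lets the workflow fail the pre-check cleanly instead of deleting files and still running out of space.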
The image was then loaded onto the device using various file transfer protocols, including FTP, SCP, and TFTP. ATOM also allowed timeouts to be configured in the workflow. This helped accommodate delays in file uploads, which are often among the most time-consuming steps of a network OS upgrade.
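After an image is transferred, its integrity is usually checked by comparing MD5 checksums. The minimal sketch below uses Python's `hashlib` to hash the staged copy and compares the result with the vendor-published value and the value the device reports. How the device reports its own hash is vendor-specific (for example, `verify /md5` on Cisco IOS), and that step is not shown. The helper names are hypothetical.

```python
import hashlib


def md5_hex(chunks):
    """MD5 of an iterable of byte chunks, so large images never sit fully in memory."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path, chunk_size=1 << 20):
    """MD5 of a local file, read in 1 MiB chunks by default."""
    with open(path, "rb") as fh:
        return md5_hex(iter(lambda: fh.read(chunk_size), b""))


def checksums_match(expected, reported):
    """Compare two hex digests, ignoring case and surrounding whitespace."""
    return expected.strip().lower() == reported.strip().lower()


def image_verified(published_md5, staged_md5, device_md5):
    """True only if both the staged copy and the device's copy match the published hash."""
    return checksums_match(published_md5, staged_md5) and checksums_match(published_md5, device_md5)
```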
Once the image was loaded, the workflow verified its MD5 checksum and updated the boot statement as required. Next, it proceeded to reload the device, with an added step that sought network operator approval before the reload. ATOM then periodically checked whether the device had become reachable, based on a user-defined timer. The workflow could also listen for notifications, for instance the SNMP trap the device sends when it boots up, and use them to trigger post-checks on the device. After the post-checks, the solution compared the state of the device before and after the upgrade and notified the user of the upgrade status. In case of an undesirable outcome, the workflow could open a trouble ticket in a ticketing system such as ServiceNow. The workflow also automated the rollback procedures.
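Two of the steps above can be sketched briefly: polling for reachability on a user-defined timer, and comparing device state before and after the upgrade. Both functions are hypothetical helpers. The default TCP probe on port 22 assumes SSH management access. The state dictionaries stand in for parsed command output, such as interface status or peer state.

```python
import socket
import time


def wait_until_reachable(host, port=22, timeout_s=900, interval_s=30, probe=None):
    """Poll every interval_s seconds until the device answers, or give up after timeout_s.

    By default the probe is a TCP connection to `port` (22 assumes SSH);
    pass `probe` to use a different check, such as an API call.
    """
    def tcp_probe():
        try:
            with socket.create_connection((host, port), timeout=5):
                return True
        except OSError:
            return False

    check = probe or tcp_probe
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        # Stop if the next wait would run past the user-defined deadline.
        if time.monotonic() + interval_s > deadline:
            return False
        time.sleep(interval_s)


def diff_state(before, after):
    """Return {check: (before, after)} for every check whose value changed."""
    return {
        key: (before.get(key), after.get(key))
        for key in sorted(before.keys() | after.keys())
        if before.get(key) != after.get(key)
    }


before = {"Gi0/1 status": "up", "BGP 192.0.2.1": "Established"}
after = {"Gi0/1 status": "up", "BGP 192.0.2.1": "Idle"}
print(diff_state(before, after))  # {'BGP 192.0.2.1': ('Established', 'Idle')}
```

A non-empty diff is a natural trigger for the trouble-ticket and rollback branches of the workflow.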
ATOM turns the upgrade into a detailed, comprehensive, and watertight process. ATOM workflows support both parallel and sequential upgrades, and network administrators can use the workflow diagrams as documentation for their internal change-approval processes.
ATOM also supports configuration compliance. After the upgrade, the platform can run compliance policies on the devices and, if it finds any issues, carry out remediation actions.
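A compliance policy can be as simple as a set of lines that must appear and patterns that must not. The policy below is a hypothetical example in Cisco IOS-style syntax. The remediation step assumes the IOS convention of negating a command with `no`.

```python
import re

# Hypothetical example policy in Cisco IOS-style syntax.
REQUIRED_LINES = ["service password-encryption", "ntp server 192.0.2.10"]
FORBIDDEN_PATTERNS = [r"^ip http server$", r"^snmp-server community public\b"]


def check_compliance(running_config):
    """Return (violations, remediation_commands) for one device's running config."""
    lines = [line.strip() for line in running_config.splitlines()]
    violations, remediation = [], []
    for required in REQUIRED_LINES:
        if required not in lines:
            violations.append(f"missing: {required}")
            remediation.append(required)  # add the missing line
    for pattern in FORBIDDEN_PATTERNS:
        for line in lines:
            if re.search(pattern, line):
                violations.append(f"forbidden: {line}")
                remediation.append(f"no {line}")  # IOS-style negation (assumed)
    return violations, remediation


config = "hostname r1\nip http server\nservice password-encryption\n"
print(check_compliance(config))
```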
Workflows created in the ATOM platform are BPMN compliant. They support a variety of scripting languages, such as Python, Groovy, and JavaScript, and can also invoke external scripts.
ATOM can deliver transformative results in the area of network OS upgrades.