Network Redundancy and Why It Matters (2024)

What is network redundancy?

Network redundancy is the practice of providing multiple paths for traffic so that data can keep flowing even when a component fails. Put simply: more redundancy generally means more reliability. It also helps with managing distributed sites. The idea is that if one device fails, another can automatically take over. By adding a little complexity, we reduce the probability that a single failure will take the network down.

But complexity is also an enemy of reliability. The more complex something is, the harder it is to understand, the greater the chance of human error, and the greater the chance of a software bug causing a new failure mode. So, when designing a network, it’s important to balance redundancy against complexity.
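To put rough numbers on the redundancy side of that balance, here is a minimal Python sketch. It assumes failures are independent and that failover itself always works, which real networks only approximate. The 99% per-device availability is a made-up figure for illustration.

```python
def parallel_availability(a: float, n: int) -> float:
    """Availability of n redundant devices when any one of them is enough.

    Assumes independent failures and perfect failover (both optimistic).
    """
    if not 0.0 <= a <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if n < 1:
        raise ValueError("need at least one device")
    # The service is down only if every device is down at the same time.
    return 1.0 - (1.0 - a) ** n


def downtime_hours_per_year(availability: float) -> float:
    # Expected unavailable fraction of a 365-day year, in hours.
    return (1.0 - availability) * 365 * 24


if __name__ == "__main__":
    for n in (1, 2, 3):
        a = parallel_availability(0.99, n)  # hypothetical 99% per device
        print(f"{n} device(s): {a:.6f} available, "
              f"about {downtime_hours_per_year(a):.2f} hours down per year")
```

Under these assumptions the second device cuts expected downtime from roughly 88 hours to under an hour a year, while the third saves less than one more hour. Every extra device still adds configuration, and correlated failures (shared power, a shared software bug) break the independence assumption, which is exactly the complexity trade-off described above.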

What are the different types of network redundancy?

There are two main forms that network redundancy can take. The first is fault tolerance, which uses full hardware redundancy: at least one complete duplicate of the system hardware runs side by side with the primary system. Should one system fail, the other takes over immediately, with no loss of service.

The second type of network redundancy is high availability. In this structure, rather than duplicating all of the physical hardware, a cluster of servers is run together. The servers monitor each other and have failover capabilities, so if there is a problem on one server, a backup can take over its workload.
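The mutual monitoring described above is usually some form of heartbeat. The sketch below is a toy model, not any vendor's clustering protocol: the node names, the three-second timeout, and the simulated clock are all assumptions chosen to keep the example deterministic.

```python
class Cluster:
    """Toy active/standby cluster driven by heartbeats.

    A node counts as healthy if its last heartbeat is no older than
    `timeout` seconds. Time is passed in explicitly instead of read
    from a clock, so the behaviour is easy to test.
    """

    def __init__(self, names, timeout=3.0):
        if not names:
            raise ValueError("a cluster needs at least one node")
        self.last_heartbeat = {name: 0.0 for name in names}
        self.timeout = timeout
        self.active = names[0]  # the first node starts as active

    def heartbeat(self, name, now):
        self.last_heartbeat[name] = now

    def check(self, now):
        healthy = [name for name, seen in self.last_heartbeat.items()
                   if now - seen <= self.timeout]
        # Fail over only if the active node has gone silent and a healthy peer exists.
        if self.active not in healthy and healthy:
            self.active = healthy[0]
        return self.active


if __name__ == "__main__":
    cluster = Cluster(["server-a", "server-b"])
    cluster.heartbeat("server-a", now=1.0)
    cluster.heartbeat("server-b", now=1.0)
    print(cluster.check(now=2.0))  # server-a is still active
    cluster.heartbeat("server-b", now=5.0)  # server-a has stopped sending
    print(cluster.check(now=5.0))  # server-b takes over
```

Note that this sketch does not move the active role back when the original node recovers; whether to do that is a design choice each clustering product makes differently.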

If you’re curious about the benefits and drawbacks of each, consider this: fault-tolerant systems deliver next to zero downtime but are very expensive to implement, while high availability infrastructure is less expensive but may cause brief service interruptions while a failover takes place.

Designing for redundancy

There are useful network redundancy protocols at many different OSI layers. The first thing to think about is what happens at each layer if you lose any individual link or piece of equipment.

If you’re new to this, I suggest creating detailed Layer 1, Layer 2, and Layer 3 network diagrams showing every box and every link. Put your pencil or your mouse on each line or box in succession and ask these questions for each element:

What happens at Layer 1 if this box or link goes down? Do you still have connectivity?
What happens at Layer 2? Do you still have continuity of all VLANs throughout the network?
What happens at Layer 3? Do you still have a default gateway on each segment?
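The pencil exercise can be partly automated for the Layer 1 question. This is a minimal Python sketch that treats the diagram as a graph and removes each box and each link in turn. The device names are hypothetical, and it says nothing about VLAN continuity or default gateways, which still need the Layer 2 and Layer 3 checks.

```python
from collections import deque


def is_connected(nodes, links):
    """Breadth-first search: can every node still reach every other node?"""
    nodes = set(nodes)
    if not nodes:
        return True
    adjacency = {n: set() for n in nodes}
    for a, b in links:
        if a in nodes and b in nodes:  # ignore links to removed boxes
            adjacency[a].add(b)
            adjacency[b].add(a)
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        for neighbour in adjacency[queue.popleft()]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen == nodes


def single_points_of_failure(nodes, links):
    """Return every box or link whose loss splits the remaining network."""
    failures = []
    for node in nodes:
        if not is_connected([n for n in nodes if n != node], links):
            failures.append(node)
    for i, link in enumerate(links):
        # Remove by position so parallel links between the same pair count separately.
        if not is_connected(nodes, links[:i] + links[i + 1:]):
            failures.append(link)
    return failures


if __name__ == "__main__":
    nodes = ["server", "access1", "core1", "core2"]
    links = [("server", "access1"), ("access1", "core1"),
             ("access1", "core2"), ("core1", "core2")]
    print(single_points_of_failure(nodes, links))
```

In this toy diagram the check flags access1 and the server's only link: the core pair is redundant, but the single-homed access layer is not.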

There are a lot of different redundancy protocols around, not all of which are equally robust. You’ll need to choose appropriate protocols for your equipment and network, but here are the ones I generally use.

At Layers 1 and 2, I like to use Link Aggregation Control Protocol (LACP) for link redundancy. This includes multi-chassis LACP variations like Cisco’s Virtual Port Channel (vPC) technology on Nexus switches. Note, however, that most multi-chassis link aggregation protocols have serious limitations. HP’s Distributed Trunking, for example, is best used for providing redundant connectivity for servers and can behave strangely when interconnecting pairs of switches.
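LACP negotiates which physical links belong to a bundle; the switch then spreads traffic across the members, typically by hashing header fields so that each flow stays on one link. The hash inputs and algorithm are vendor-specific and often configurable, so the CRC32 hash and interface names below are assumptions used only to illustrate the idea.

```python
import zlib


def pick_member(flow, members):
    """Map a flow to one active member link of an aggregated bundle.

    `flow` is a (src_ip, dst_ip, src_port, dst_port) tuple. Hashing the
    whole flow keeps its packets on a single link, which avoids reordering.
    """
    if not members:
        raise RuntimeError("no active member links left in the bundle")
    key = "|".join(str(part) for part in flow).encode()
    return members[zlib.crc32(key) % len(members)]


if __name__ == "__main__":
    flows = [("10.0.0.1", "10.0.1.9", 40000 + i, 443) for i in range(4)]
    for flow in flows:
        # If eth1 fails it leaves the bundle, and its flows rehash onto eth2.
        print(flow[2], pick_member(flow, ["eth1", "eth2"]), "->",
              pick_member(flow, ["eth2"]))
```

Because the choice is per flow rather than per packet, a single large flow never uses more than one member link's bandwidth; the gain is in redundancy and aggregate capacity across many flows.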

The other important Layer 2 protocol to use is Spanning Tree Protocol (STP). I prefer the modern fast-converging STP variants, MSTP and RSTP. (I’ve written about spanning tree protocol before.)
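Every spanning tree variant starts the same way: the bridge with the lowest bridge ID (priority first, then MAC address) becomes root, each other bridge keeps its best path toward the root, and the remaining redundant links are blocked. The Python sketch below is a simplification. It assumes equal link costs and no parallel links between the same pair of switches, and it ignores RSTP’s faster handshake and MSTP’s per-instance trees. The switch names, priorities, and MAC addresses are made up.

```python
from collections import deque


def bridge_id(priority, mac):
    # Lower priority wins; the MAC address breaks ties.
    return (priority, int(mac.replace(":", ""), 16))


def spanning_tree(bridges, links):
    """bridges maps name -> (priority, mac); links is a list of (a, b) pairs.

    Returns (root, blocked_links). Path cost is hop count (equal link costs).
    """
    ids = {name: bridge_id(*bridges[name]) for name in bridges}
    root = min(bridges, key=ids.get)
    neighbours = {name: set() for name in bridges}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    # Breadth-first search gives each bridge's hop count to the root.
    distance, queue = {root: 0}, deque([root])
    while queue:
        current = queue.popleft()
        for n in neighbours[current]:
            if n not in distance:
                distance[n] = distance[current] + 1
                queue.append(n)
    forwarding = set()
    for name in distance:
        if name == root:
            continue
        # Root port: toward the neighbour closest to the root, lowest bridge ID on a tie.
        upstream = min(neighbours[name],
                       key=lambda n: (distance.get(n, float("inf")), ids[n]))
        forwarding.add(frozenset((name, upstream)))
    blocked = [link for link in links if frozenset(link) not in forwarding]
    return root, blocked


if __name__ == "__main__":
    bridges = {
        "sw1": (32768, "00:00:00:00:00:01"),
        "sw2": (32768, "00:00:00:00:00:02"),
        "sw3": (4096, "00:00:00:00:00:03"),  # priority lowered on purpose
    }
    links = [("sw1", "sw2"), ("sw2", "sw3"), ("sw1", "sw3")]
    print(spanning_tree(bridges, links))
```

Here sw3 wins on priority despite having the highest MAC address, and the sw1 to sw2 link is blocked. Setting priorities deliberately, rather than letting the lowest MAC address win by accident, is what keeps the root where you expect it.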

DDoS attacks and network redundancy

Distributed Denial of Service (DDoS) attacks are cyberattacks no network admin wants to deal with. The goal of an attack like this is to render a network or service inoperable. Luckily, network redundancy can help to mitigate the impact of DDoS attacks, because it gives traffic an alternate path when one link or provider is overwhelmed.

By using multiple ISPs, data centers can reroute network services in the event of an attempted DDoS attack. That’s why it is crucial to have redundant networks with flexible internet access. Businesses can’t operate if the network is down—continuous internet connections and functioning technology are essential these days. If your network lacks redundancy, especially a redundant internet connection, the failure of a single device could result in hours of downtime for the entire network.
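The rerouting decision itself can be very simple. This hedged Python sketch assumes some health check (for example, a probe toward each provider) has already marked each uplink up or down. The provider names and preference values are hypothetical, with a lower value winning in the spirit of a floating static route’s administrative distance.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Uplink:
    name: str
    preference: int       # lower is preferred
    healthy: bool = True  # set by an external health check (assumed)


def active_uplink(uplinks: List[Uplink]) -> Optional[str]:
    """Return the preferred healthy uplink, or None if every ISP is down."""
    candidates = [u for u in uplinks if u.healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda u: u.preference).name


if __name__ == "__main__":
    isp_a = Uplink("isp-a", preference=1)
    isp_b = Uplink("isp-b", preference=10)
    print(active_uplink([isp_a, isp_b]))  # isp-a
    isp_a.healthy = False  # e.g. isp-a is saturated by an attack
    print(active_uplink([isp_a, isp_b]))  # isp-b
```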

Network redundancy and infrastructure

The terms ‘fail’ and ‘failure’ have come up a lot so far, because when planning network redundancy it’s important to think about all of the ways a network can fail. Beyond the software issues, think about the physical and environmental factors that can impact the performance of a device.

The enemies of device uptime like heat, water, and power are all things to think about when planning network redundancy. Ensure that you’re using redundant electrical supplies including UPSs and possibly even backup generators, have redundant cooling systems in place, and have redundant environmental sensors to warn you if the physical environment is becoming less-than-ideal for network devices.
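For power in particular, a quick sanity check is whether the remaining supplies can still carry the load after any one of them fails (often called N+1). This minimal sketch assumes the wattages come from the device datasheets; the figures below are hypothetical. It does not check whether the supplies sit on separate circuits or UPSs, which matters just as much.

```python
def survives_single_psu_failure(psu_watts, load_watts):
    """True if the load is still covered after losing any one supply."""
    if len(psu_watts) < 2:
        return False  # a single supply is itself a single point of failure
    # The worst case is losing the largest supply.
    return sum(psu_watts) - max(psu_watts) >= load_watts


if __name__ == "__main__":
    print(survives_single_psu_failure([750, 750], load_watts=600))  # True
    print(survives_single_psu_failure([750, 750], load_watts=900))  # False
```

The second case is the trap: two supplies are installed, but the load needs both of them, so losing either one still takes the device down.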

Tips for achieving minimal complexity in network redundancy

Here’s a list of things to keep in mind for implementing network redundancy while minimizing complexity. Consider them network design best practices.

1. Identical systems with identical connections

I like to provide redundancy by implementing exact duplicate systems in key spots in the network. For example, a core switch will be two identical switches. When I say identical, I mean they should be the same model, running the same software, and they should have the same connections, as much as possible. The easiest way to do this with switches is to use stackable switches. Then there’s really nothing to do—connect up the stacking cable and you have redundancy out of the box.
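Drift between supposedly identical pairs is easy to miss. Here is a hedged Python sketch of a check, assuming you can export each device’s inventory as a dictionary. The field names, model string, and version numbers are hypothetical, and the ignored keys are the ones expected to differ between the two members of a pair.

```python
def pair_differences(a, b, ignore=("hostname", "mgmt_ip")):
    """List the keys whose values differ between two device records."""
    keys = sorted(set(a) | set(b))
    return [k for k in keys if k not in ignore and a.get(k) != b.get(k)]


if __name__ == "__main__":
    core_a = {"hostname": "core-a", "mgmt_ip": "192.0.2.11", "model": "model-x",
              "software": "9.3.1", "ports": {"1": "uplink", "48": "stack"}}
    core_b = {"hostname": "core-b", "mgmt_ip": "192.0.2.12", "model": "model-x",
              "software": "9.3.2", "ports": {"1": "uplink", "48": "stack"}}
    print(pair_differences(core_a, core_b))  # ['software']
```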

2. Simple redundancy protocols

There are a lot of ways to implement network redundancy. The most reliable ones involve the simplest configuration on the fewest devices. For example, if I need a highly available firewall, I’ll implement a pair of devices, and I’ll always use the vendor’s failover mechanism. Then I don’t need to worry about making the firewall take part in any routing protocols. Unless there’s a compelling reason for the firewall to run a routing protocol, doing so only introduces unnecessary complexity. Always use the simplest configuration that meets the requirements!

3. Keep everything parallel

One thing that often trips people up is how to connect successive layers of redundant devices. The trick is to keep it all parallel. Create an A path and a B path with a cross-over connection at each layer. The idea is that any one device can fail completely without disrupting the end-to-end path.

For example, suppose I have a pair of access switches, a pair of core switches, and a pair of firewalls. I’d connect access switch A to core switch A, which in turn connects to firewall A. Similarly, access switch B connects to core switch B, which connects to firewall B. I’d also connect the two access switches to one another and the two core switches to one another.
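The claim that any one device can fail without breaking the end-to-end path is easy to check mechanically for the topology just described. In this Python sketch, the server (attached to both access switches) and the internet (reached through both firewalls) are hypothetical endpoints added for the test. It checks physical paths only, not spanning tree state or which firewall is active.

```python
from collections import deque

# The parallel design described above, plus two hypothetical test endpoints.
LINKS = [
    ("access-a", "core-a"), ("core-a", "fw-a"),
    ("access-b", "core-b"), ("core-b", "fw-b"),
    ("access-a", "access-b"),  # cross-over at the access layer
    ("core-a", "core-b"),      # cross-over at the core layer
    ("server", "access-a"), ("server", "access-b"),
    ("fw-a", "internet"), ("fw-b", "internet"),
]
DEVICES = ["access-a", "access-b", "core-a", "core-b", "fw-a", "fw-b"]


def reachable(src, dst, links, failed=None):
    """Breadth-first search from src to dst with one device powered off."""
    adjacency = {}
    for a, b in links:
        if failed not in (a, b):
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return False


if __name__ == "__main__":
    for device in DEVICES:
        ok = reachable("server", "internet", LINKS, failed=device)
        print(f"{device} down: {'path survives' if ok else 'PATH LOST'}")
```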

In this example, you may be tempted to further connect access switch A to core switch B and access switch B to core switch A. It’s certainly a common configuration, but as soon as you do this, you need to know what you’re doing in terms of link aggregation and spanning tree. That could add considerable complexity if you’re new to network design.
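One way to see the extra complexity is to count Layer 2 loops, each of which spanning tree must block or link aggregation must absorb. Assuming every switch-to-switch link carries the same VLANs, a graph with E links, V switches, and C connected pieces has E - V + C independent loops. The sketch below counts them for the four switches in the example, with and without the extra cross connections; the firewalls are left out on the assumption that they do not bridge.

```python
def independent_loops(nodes, links):
    """Cycle rank of a graph: E - V + C, with C the number of connected pieces."""
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent pointers to the representative of n's piece.
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving keeps trees shallow
            n = parent[n]
        return n

    components = len(nodes)
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # this link joins two separate pieces
            parent[root_a] = root_b
            components -= 1
    return len(links) - len(nodes) + components


if __name__ == "__main__":
    switches = ["access-a", "access-b", "core-a", "core-b"]
    parallel = [("access-a", "core-a"), ("access-b", "core-b"),
                ("access-a", "access-b"), ("core-a", "core-b")]
    meshed = parallel + [("access-a", "core-b"), ("access-b", "core-a")]
    print(independent_loops(switches, parallel))  # 1
    print(independent_loops(switches, meshed))    # 3
```

Even the plain parallel design has one loop, so spanning tree still matters there; the two extra cross connections triple the count.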

4. Never do more than you need to

As the previous example suggests, it’s easy to go further in implementing redundancy than is absolutely required. In many cases, the extra redundancy is warranted and could provide additional functionality. But carefully consider every piece of equipment, every link, and every protocol. For each one, ask whether it’s providing enough additional functionality to warrant the additional complexity.

5. Cookie cutters

Finally, it’s extremely useful to follow a standard model when implementing your networks. If you have multiple data centers, make them as nearly identical as possible in terms of topology.

Similarly, make your access switches as nearly identical as possible. Use common VLAN assignments everywhere, and have a common IP addressing scheme that works everywhere. Make the default gateway on every segment follow a common rule such as the first or the last IP address. If you use redundancy protocols like HSRP, use them everywhere, and configure them the same way everywhere.
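A rule is only useful if it can be applied mechanically. This Python sketch uses the standard `ipaddress` module. The 10.<site>.<vlan>.0/24 layout is a hypothetical scheme for illustration, not a recommendation from this article, and it only works while site and VLAN numbers fit in one octet.

```python
import ipaddress


def segment_subnet(site: int, vlan: int) -> str:
    """Hypothetical site-wide scheme: 10.<site>.<vlan>.0/24."""
    if not (0 <= site <= 255 and 1 <= vlan <= 255):
        raise ValueError("site and VLAN must fit in one octet for this scheme")
    return f"10.{site}.{vlan}.0/24"


def default_gateway(subnet: str, rule: str = "first") -> str:
    """Apply one gateway rule everywhere: first or last usable address."""
    network = ipaddress.ip_network(subnet)
    if network.num_addresses < 4:
        raise ValueError("subnet too small for hosts plus a gateway")
    if rule == "first":
        return str(network.network_address + 1)
    if rule == "last":
        return str(network.broadcast_address - 1)
    raise ValueError(f"unknown rule: {rule!r}")


if __name__ == "__main__":
    subnet = segment_subnet(site=20, vlan=30)
    print(subnet, default_gateway(subnet))  # 10.20.30.0/24 10.20.30.1
```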

All this similarity helps limit the possibility of human error. Maybe the new engineer has never looked at this particular device before. But if it’s exactly the same as every other device performing a similar function, then it’s much less likely that he or she will miss some obscure bit of protocol magic that was implemented on this device and only this device.

Maximum availability with minimum complexity

The goal is maximum availability with minimum complexity. So it’s vitally important to keep the configuration simple. Don’t implement multiple redundancy mechanisms that try to accomplish the same logical function, or troubleshooting will become very difficult.

When it comes to routing protocols in particular, think about whether you can get away with a static route pointing to an HSRP virtual gateway address. Routing protocols have to distribute a lot of information among a lot of devices, and that always takes time. First-hop redundancy protocols like HSRP and VRRP are simpler and typically fail over faster, so use them if you can.
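A static route to an HSRP gateway stays valid because it points at a virtual address, and HSRP only decides which router currently answers for it. In an HSRP election the highest priority wins, and a tie goes to the highest interface IP address. This Python sketch models only that election, not timers or preemption; the router names, priorities, and addresses are hypothetical.

```python
import ipaddress

VIRTUAL_GATEWAY = "10.20.30.1"  # what hosts and static routes point at (hypothetical)


def hsrp_active(routers):
    """routers: list of (name, priority, interface_ip) for routers that are up.

    Highest priority wins; equal priorities fall back to the highest IP.
    """
    if not routers:
        raise ValueError("no router is available to own the virtual gateway")
    return max(routers, key=lambda r: (r[1], ipaddress.ip_address(r[2])))[0]


if __name__ == "__main__":
    core_a = ("core-a", 110, "10.20.30.2")
    core_b = ("core-b", 100, "10.20.30.3")
    print(VIRTUAL_GATEWAY, "owned by", hsrp_active([core_a, core_b]))  # core-a
    print(VIRTUAL_GATEWAY, "owned by", hsrp_active([core_b]))  # core-a down: core-b
```

The hosts’ gateway address never changes; only its owner does, which is why the static route keeps working through a failover.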

If you have stacked switches, think about what happens to upstream and downstream connections if one stack member fails. Where possible, you should distribute these links among the various stack members.
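Spreading links across stack members can be expressed as a round-robin assignment plus a check that no single member holds every uplink. A minimal Python sketch follows; the uplink and member names are hypothetical.

```python
from itertools import cycle


def distribute_uplinks(uplinks, members):
    """Assign uplinks to stack members round-robin."""
    if not members:
        raise ValueError("a stack needs at least one member")
    assignment = {member: [] for member in members}
    for uplink, member in zip(uplinks, cycle(members)):
        assignment[member].append(uplink)
    return assignment


def survives_member_failure(assignment):
    """True if losing any one stack member still leaves at least one uplink."""
    total = sum(len(links) for links in assignment.values())
    return all(total - len(links) > 0 for links in assignment.values())


if __name__ == "__main__":
    plan = distribute_uplinks(["to-core-a", "to-core-b"],
                              ["member-1", "member-2", "member-3"])
    print(plan)
    print(survives_member_failure(plan))  # True
    print(survives_member_failure({"member-1": ["to-core-a", "to-core-b"],
                                   "member-2": []}))  # False
```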

Above all, remember that building a real-world network is not an exam where you have to demonstrate a deep understanding of network redundancy. Points won’t be deducted for using static routes and trivial default configurations. Keep it simple.
