On Monday, several Internet service providers suffered an outage. This may be acceptable for small companies with limited online resources, but for major ISPs the situation is different. Level3 was one of the companies affected by the problem, which was caused by a software update on Juniper Networks routers (from version 10.2 to 10.3).
This equipment failure affected not only the ISP but also many of its customers around the world. Applying patches blindly can lead to serious problems, so in my opinion it is very important to test any update on a similar platform before applying it, for several reasons:
- Make sure that the update really fixes what it is intended to fix.
- Check whether we really need the update from a security perspective (sometimes an update is intended to fix protocols or services that we do not use).
- Most importantly, test the update in a similar environment to check that there are no conflicts or compatibility problems with the existing environment.
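The three checks above can be written down as a simple go/no-go gate that must pass before an update reaches production. The following is only a minimal sketch: the `UpdateReview` record, its field names, and the `should_deploy` function are hypothetical names chosen for illustration, not part of any vendor tool.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class UpdateReview:
    """Hypothetical record of what was learned about an update in the lab."""
    fixes_verified: bool          # the update fixes what it is intended to fix
    affects_used_services: bool   # it touches protocols or services we actually run
    staging_test_passed: bool     # no conflicts found in a similar test environment


def should_deploy(review: UpdateReview) -> Tuple[bool, str]:
    """Return (decision, reason) for a reviewed update."""
    if not review.affects_used_services:
        return False, "skipped: update targets services we do not use"
    if not review.fixes_verified:
        return False, "blocked: fix not verified"
    if not review.staging_test_passed:
        return False, "blocked: conflict or compatibility problem in staging"
    return True, "approved"


if __name__ == "__main__":
    review = UpdateReview(fixes_verified=True,
                          affects_used_services=True,
                          staging_test_passed=False)
    print(should_deploy(review))
```

The order of the checks is a design choice: an update we do not need is rejected first, so no lab time is spent on it.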
In this kind of incident, one important step is to keep customers aware of the situation by sending a notice to all of them (a blog post or an email) with details about the problem; this should at least explain what happened. But it is always important to have clustered systems that can be used as a failover to guarantee network and system availability.
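The failover idea can be sketched in a few lines: keep an ordered list of cluster nodes and send traffic to the first one that passes a health check. This is an illustrative sketch only; the node names and the `pick_active_node` function are assumptions, and the health check is passed in as a function because the real probe (ping, routing session state, and so on) depends on the environment.

```python
from typing import Callable, Optional, Sequence


def pick_active_node(nodes: Sequence[str],
                     is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first healthy node in priority order, or None if all are down."""
    for node in nodes:
        if is_healthy(node):
            return node
    return None


if __name__ == "__main__":
    # Hypothetical cluster: the primary router failed after an update.
    status = {"router-primary": False, "router-standby": True}
    active = pick_active_node(["router-primary", "router-standby"],
                              lambda name: status[name])
    print(active)
```

A failover like this only helps if the standby is not updated at the same time as the primary; otherwise both nodes can fail for the same reason.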