We all know that our IT infrastructure is ruled by one major imperative: Overlays! No more than ten years ago we were so afraid of overlays. Nowadays we simply do not feel satisfied unless we add another layer for a technology, a service or a policy. Yes, of course we know these overlays make monitoring and troubleshooting hell, but that is OK, we will deal with that later. For now we focus on the revenue opportunity; we deal with costs later.
In preparation for that storm, let us talk about monitoring and troubleshooting this layered cake full of surprises. What options do we have here? And PLEASE don’t blurt out: “Big Data!” Between the cost of the tools to manage all the data streams, the cost of managing the heuristics specific to your environment and the cost of HAL, you will have spent your budget in the first 2 months of operations. Either way, you will need visibility within each layer and correlation across layers. You will need “X-ray” vision so you can peel the layers. One practical approach is to simulate users for each overlay. On one hand, this gives you an idea of what is happening in each layer and lets you address issues proactively; on the other, you can test any hypothesis that HAL cooks up through correlations.
Of course, IPv6 is going to be an overlay in your existing environment. Well, at least for a while, until IPv4 becomes the overlay and then disappears. So even for something as simple as IPv6 enablement you will need the superpower of X-ray vision. Here is an example. The first image shows response time for a service as seen by multiple v6Sonar agents.
Clearly there is an event, but it is hard to tell what is going on. Is it specific to a layer, or is it just an overall path issue? Now let us peel off the IPv4 layer.
Oops, IPv6 really had a bad day, a really bad day. At 5x the usual average latency, somebody really needs to look into it. Now, assuming all things are equal and Happy Eyeballs is working well for everyone, users will have a backup, the old IPv4. Unfortunately, things are not always equal, and the issue will show up on someone’s radar. And as Murphy would have it, it will probably be your IPv6-loving CEO.
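If you want to try the "peel the layers" idea yourself, here is a minimal Python sketch. It is not how v6Sonar does it; it simply assumes that TCP connect time is a good enough stand-in for response time, and the function names and the 5x threshold parameter are made up for illustration. The trick is to force the probe onto one address family at a time so that IPv4 and IPv6 are measured separately.

```python
import socket
import time
from statistics import mean


def connect_time_ms(host, port, family, timeout=5.0):
    """Time a single TCP connect to host:port over one address family.

    Pass socket.AF_INET for IPv4 or socket.AF_INET6 for IPv6.
    Raises socket.gaierror if the host has no address in that family.
    """
    # Restricting name resolution to one family is what separates the layers.
    infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    fam, socktype, proto, _canonname, sockaddr = infos[0]
    with socket.socket(fam, socktype, proto) as sock:
        sock.settimeout(timeout)
        start = time.perf_counter()
        sock.connect(sockaddr)
        return (time.perf_counter() - start) * 1000.0


def latency_ratio(baseline_ms, current_ms):
    """How many times the usual average latency the current sample is."""
    return current_ms / mean(baseline_ms)


def is_degraded(baseline_ms, current_ms, factor=5.0):
    """True when the current sample is at least `factor` times the baseline average."""
    return latency_ratio(baseline_ms, current_ms) >= factor
```

Run `connect_time_ms("www.example.com", 443, socket.AF_INET6)` and the same call with `socket.AF_INET` on a schedule (the host here is just a placeholder), keep a history per family, and feed each new sample to `is_degraded`. A spike that shows up only in the IPv6 history is a layer problem, not a path problem.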
By the way, did you see that one of the agents showed no issues? That was the agent in the same hosting environment as the service. Now you can do quick fault domain isolation. This is the power of instrumentation.
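The same reasoning can be written down as a rough Python sketch. Everything here is an assumption for illustration: the agent names, the site labels, the 2x threshold and the wording of the verdicts are invented, and a real tool would look at far more than one sample per agent.

```python
def isolate_fault_domain(agents, service_site, factor=2.0):
    """Guess where a fault lives from which agents see it.

    agents maps an agent name to (site, baseline_ms, current_ms).
    service_site is the site label of the hosting environment of the service.
    Returns (affected_agents, healthy_agents, verdict).
    """
    affected = sorted(
        name for name, (_site, base, cur) in agents.items() if cur >= factor * base
    )
    healthy = sorted(name for name in agents if name not in affected)
    healthy_sites = {agents[name][0] for name in healthy}

    if not affected:
        verdict = "no fault observed"
    elif not healthy:
        # Everyone suffers, including any agent next to the service.
        verdict = "service or hosting environment"
    elif healthy_sites == {service_site}:
        # Only agents sitting next to the service are fine, so the service
        # itself is healthy and the trouble is on the way to it.
        verdict = "path outside the hosting environment"
    else:
        verdict = "inconclusive"
    return affected, healthy, verdict


# Hypothetical readings: three remote agents and one in the service's own site.
readings = {
    "agent-a": ("remote-1", 40.0, 210.0),
    "agent-b": ("remote-2", 55.0, 260.0),
    "agent-c": ("remote-3", 60.0, 310.0),
    "agent-d": ("hosting", 5.0, 6.0),
}
affected, healthy, verdict = isolate_fault_domain(readings, "hosting")
```

With these made-up readings the only healthy agent is the one co-located with the service, which points at the path into the hosting environment and away from the service itself.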