I put this on another forum, but I figured you guys would like to read this too:
For those interested, Our network operates in 4 AWS regions that we keep siloed from one another (meaning no region knows about the existence of another one). When the connectivity issues started, we disabled the network connectivity for all Skyetel assets in the two impacted AWS regions which caused our network to fully failover. (Because the impacted regions had partial connectivity, our network did not fully fail over and tried to limp along with all 4. This is by design; we don't want to automate disabling network routers of our network for obvious reasons... so an engineer needed to click the buttons).
The impact of this was some calls failed to establish, but if they did establish, they would work normally. This is because we are not in the audiopath of the calls. Once the distressed regions were fully down, our network could fully fail over and 100% of all calls completed normally.
The total impact time was 19 minutes, and we estimate about 7% of our calls failed to establish during that period. Sorry for the inconvenience 🙂
Did I mention @Skyetel is awesome lately? Being fixed almost before the customers noticed was insanely awesome.