Performance-related issues are among the hardest IT problems to solve, period. When something is broken, alarm bells sound (metaphorically, in most cases) and alerts are sent to let IT ops know there’s an issue. But when performance slows for end-user applications, there’s often no notification until someone calls to complain. Yet losing 10 or 15 seconds of every minute for an hour is at least as bad as an hour of downtime in an 8-hour day. At least as bad, and maybe worse: beyond the productivity hit there’s also significant frustration. When the system is truly down, users can at least turn their attention to other tasks while the fixes are applied.
One reason this remains an ongoing issue for many IT organizations is that few management tools provide an overall view of the entire IT infrastructure with the ability to correlate across all of its different components. It’s not that there aren’t individual tools to monitor and manage everything in the system; it’s that coordinating results from these different tools is time-consuming and difficult. There are plenty of choices for server monitoring, desktop monitoring, application monitoring, network monitoring, cloud monitoring and so on, and there are suites of products that cover many of the bases. The challenge is that in most cases these management components never get fully integrated to the point where the overall solution can find a problem and quickly identify its root cause.
If IT were a static science, it’s a good bet this problem would have been solved long ago. But as we know, IT is a hotbed of innovation. New services, capabilities and approaches are released regularly, and the immense variety of infrastructure components supporting today’s IT environments makes it difficult for monitoring and management vendors to keep up. New management tools appear frequently too, but the effort of addressing existing infrastructures is often prohibitive for start-ups trying to get their new management capabilities to market quickly.
The complexity and pace of change lead some IT organizations to rely on open source technologies and “freeware,” with the benefit that capital and operational expenses are kept to a minimum. Yet the results of using such tools are often less than satisfactory. While users can benefit greatly from the community of developers, it’s often hard to get a comprehensive product without buying a commercially supported version. Another issue for open source IT management solutions is that they’re generally not architected to support a large and increasingly complex IT infrastructure – at least not in a way that makes it possible to quickly perform sophisticated root-cause analysis. The result is that while the tools may be inexpensive, the time and resources needed to use them can be much greater, and their impact disappointing.
IT management is its own “big data” problem.
As IT infrastructure continues to become ever more complex, IT management is becoming its own big data problem. Querying an individual device or server to check status and performance may retrieve only a relatively small amount of data to send to the management or monitoring system – a small volume, but likely a diverse set of information covering numerous parameters and configuration items. Polling mobile devices and desktops, servers, applications, cloud services, hypervisors, routers, switches, firewalls and more generates a great deal of data, with each item type having its own unique set of parameters and configurations to retrieve.

Polling hundreds, thousands or even tens of thousands of devices every few minutes (so the management system stays current with device status) can create significant network traffic that must be carried without impacting business applications. On top of that, the volume of data, the polling frequency and the resulting velocity of traffic must be accommodated to support storage, trend analysis and real-time processing. System management information is usually stored for months or even years so that historical trends can be analyzed. Most importantly, the management system needs to process information rapidly in real time to correlate cause and effect, suppress downstream alert and alarm conditions, and perform predictive analysis so that valid messages can be proactively sent to alert IT ops. At this point, system management architecture becomes important. Add the need for flexibility to accommodate the ever-changing IT landscape, and management system design to support this “big data application” becomes a critical issue.
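The scale of the problem is easy to see with a back-of-envelope calculation. The sketch below uses purely illustrative assumptions (device counts, parameters per device, payload sizes – none of these figures come from any particular vendor or benchmark) to estimate polling traffic and retained storage:

```python
# Back-of-envelope estimate of monitoring data volume.
# All input figures are illustrative assumptions, not measurements.

devices = 10_000          # devices under management
params_per_device = 200   # parameters/configuration items polled per device
bytes_per_param = 64      # assumed average payload per parameter, incl. overhead
poll_interval_s = 300     # poll every 5 minutes
retention_days = 365      # keep history for a year

bytes_per_poll = devices * params_per_device * bytes_per_param
polls_per_day = 86_400 // poll_interval_s
bytes_per_day = bytes_per_poll * polls_per_day
avg_throughput_mbps = bytes_per_poll * 8 / poll_interval_s / 1e6
retained_tb = bytes_per_day * retention_days / 1e12

print(f"per poll cycle: {bytes_per_poll / 1e6:.0f} MB")   # 128 MB
print(f"per day:        {bytes_per_day / 1e9:.1f} GB")    # 36.9 GB
print(f"avg throughput: {avg_throughput_mbps:.1f} Mbit/s")
print(f"1-year store:   {retained_tb:.1f} TB")
```

Even with these modest assumptions, a year of history runs into the tens of terabytes – and the real-time correlation workload sits on top of that, which is exactly why the underlying architecture matters.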
This, in part, is why IT management vendors are migrating their solutions to the cloud. As IT infrastructures continue to expand in size, complexity and diversity, supporting the volumes of data generated – and delivering the performance needed for comprehensive root-cause analysis – becomes more and more challenging for purely on-premises solutions. In addition, cloud solutions offer the promise of “best practice” advice that can only be derived from a shared-service model, with the caveat that security and privacy remain paramount.
Of course, cloud solutions, with their pay-as-you-use pricing and installations that require no on-premises infrastructure, are also becoming far more attractive than perpetual licensing arrangements. The bottom line, however, is that cloud-architected solutions are extremely extensible and can more quickly and easily accommodate new functionality to the benefit of all users – not least the ability to deploy better diagnostic tools and capabilities to meet the needs of today’s diverse IT user communities for high levels of systems availability AND performance.