Over the past decade at Login Consultants, I have come across many large-scale RDSH and VDI environments. From my point of view, a lot of these environments suffer from similar structural issues.
Over and over again, I see RDSH and VDI projects that do not live up to the sky-high expectations of the IT organization. Projects run over time and over budget, and the resulting environments are far more complex and time-consuming to manage than expected.
It is peculiar that this still happens so often, even though the technologies have matured significantly in the past 10 years. Even more so since plenty of impressive innovations, white papers, blog posts and blueprints have become available from vendors, consulting companies and community leaders on how to do it right… Right?!
So what, in my opinion, makes these larger enterprise-scale hosted desktop infrastructure projects falter?
The reality in today’s enterprise datacenters is that many Windows systems, including the hosted desktop environments, are still largely deployed and maintained manually. The truth is that this happens in far more ‘high-end’ enterprise datacenters than we would like to admit.
Furthermore, with so many imaging, cloning, streaming and layering technologies available nowadays, it almost seems to have become the standard to simply create or update a desktop or server image manually and then just replicate it. My point is that manual administration seems to be promoted as the ultimate ‘easy’ button, while in many ways it complicates management in the long term.
So what are my issues with manual change?
- Manual changes cannot be repeated 100% identically, period! The fact is, we humans are not as consistent as we would like to be. It doesn’t matter whether we are performing seemingly simple tasks on multiple targets or executing a long checklist on a single target; we are simply not good at it. The only consistency in manual work is inconsistency.
- To apply manual changes properly, one needs to understand the context and the underlying technology, especially in a VDI/RDSH context. As a result, the chances that you ‘screw up’ a manual update increase considerably when you lack the required expertise.
- Manual changes are typically not properly documented. It is therefore very likely that over time, you can’t remember exactly how you applied your changes or even what those changes were all about. Documentation and actual configuration drift further apart and your systems start behaving in an increasingly unpredictable manner. At this stage, reproducing your systems for testing purposes becomes next to impossible. The devil is in those details!
- It is tricky and sometimes very difficult to (fully) revert manual changes.
- You would be amazed how much time goes into seemingly simple tasks like importing configuration settings or updating an operating system image with the latest Windows security fixes. And when you do this manually, I can assure you it will never stop. The work only grows with each image, persona, department and datacenter you add to your environment.
- Manual changes (even small ones) drive work outside business hours, since you have to be present while you perform them. Users often cannot be disturbed during office hours, so there you go… This can become quite costly when overtime becomes the norm.
- Most organizations have invested heavily in automation solutions, but the reality is that operating those solutions often requires a lot of manual input for (re)configuration.
I have to admit: it feels weird writing about manual administration anno 2014. But it is still a reality today, especially in larger and more complicated organizations. This stands in stark contrast with how modern infrastructures and applications are managed and operated by cloud providers.
I strongly feel that extensive automation of the deployment and maintenance of these environments is key. Why not take all the time and effort you would normally put into documenting and manually executing changes, and invest it in automating them instead?
Automation workflows are generally not open for interpretation; they either work or they don’t. Computers are, unlike people, extremely good at performing the same task over and over again or executing a huge list of instructions without skipping any or executing them ‘in their own way’.
So why aren’t all these environments automating the crap out of their installations? And why is it that, even when IT departments do automate, the way they do it is often not very efficient?
Let me explain what I mean by ‘not very efficient’. Automating the installation of an infrastructure component or an MSI-based application package is not overly complicated. However, automation scripts and tooling are often configured for one specific environment, like ‘production’, ‘department A’ or ‘customer 9’.
Scripts and packages often lack the parameters essential for reuse in different environments, or for ‘flowing’ from one environment to the next as part of the change process.
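To make the contrast concrete, here is a minimal sketch of the parameterized approach. Everything specific in it (the config keys, the MSI property names) is an invented illustration, not taken from any real package:

```python
# A sketch of keeping environment-specific values OUT of the script itself
# and passing them in as parameters. The property names (APPSERVER,
# LICENSEKEY) are hypothetical examples of per-environment settings.

def build_install_command(msi_path, env_config):
    """Build an msiexec command line from environment-specific parameters."""
    return [
        "msiexec", "/i", msi_path, "/qn",
        f"APPSERVER={env_config['server']}",        # differs per environment
        f"LICENSEKEY={env_config['license_key']}",  # differs per customer
    ]

# The same script now serves every environment; only the parameters change:
ENVIRONMENTS = {
    "test":       {"server": "app-test.corp.local", "license_key": "TEST-123"},
    "production": {"server": "app-prod.corp.local", "license_key": "PROD-456"},
}

cmd = build_install_command("acmeapp.msi", ENVIRONMENTS["test"])
```

The hardcoded variant would bake `app-prod.corp.local` straight into the script, which is exactly what stops it from ‘flowing’ from test to production.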
Another problem with most automation efforts is that the products and methodologies being used require a considerable amount of effort and back-end infrastructure themselves. This is a typical chicken-and-egg problem. For example, if you use System Center Configuration Manager, you need a SQL Server before you can even install SCCM, and only then can you actually start automating. Once you have finally built a functional infrastructure, it is difficult (if not impossible) to extract the configuration data and reproduce your work somewhere else.
Let’s continue with the SCCM example. Sure, you can export a package. But all the parameters we discussed before are locked up in the package or hardcoded in scripts. So if you create an application package with specific settings and variables in a test environment and import it into the next SCCM environment, they will not be transformed automatically. You now have an automation solution, but you are basically still configuring your work manually.
Yep, this is the reality I see too many times today.
So how can we fix all this?
Maybe we can learn from another movement that is happening in an area with similar problems: software development. This movement is called DevOps! The concept of DevOps is to bring two teams together: the development team and the operations team. Very short development cycles enable constant validation that the developed code is usable and can be operated. In general, this is achieved by completely automating the whole operations process from developer to end-user. From an organizational standpoint, it is also about bringing the responsibility for the whole chain together. It is not one team being responsible for steps a, b and c and the next team for steps d and e; both teams are responsible for the end result.
If we want to improve our quality and agility, what are the main concepts we could borrow from DevOps? To me, these are:
- Cross silo automation (and teams)
- Reusable automation
- Incremental updates
- Automated testing
They may not all be easy to implement, and most people I talk to are quite skeptical about the feasibility, but I have seen this work and I have seen the results! With fairly straightforward tools and ideas in the hands of the right people, you can achieve something great.
Let’s dive deeper into these concepts:
By ‘silo’ in this context I mean that organizations are still too often divided into departmental silos, often called towers. For example, one department is responsible for Active Directory, one for virtual machines, one for the OS, one for Terminal Server, and so forth. What I often see is that each silo has a certain maturity of automation, but the silos are isolated. This isolation leads to many issues during application provisioning. Imagine a standard change for an application rollout in a hosted desktop environment: one team is responsible for installing the application, another for the Active Directory group, another for the publishing. Now imagine that the change has to happen during the weekend. These kinds of changes often fail, because if just one team makes a small mistake, the whole change fails – sounds familiar? With cross-silo automation, all aspects of the change are in one package and handled by one team. This dramatically increases the success rate of such changes.
But having this package alone, without cross-environment automation, gives you another problem. In this context, an environment is a stage such as a test or production environment*. How do you know whether the package really works and what effects it might have in your production environment? As described above, I have seen examples where you export a package from one SCCM environment and import it into the next. But because the variables of the package are not visible, you might forget to define the collection variables, and the installation of the package fails. This leads to frustration, and people switch back to manual changes.
You need a smart automation solution that can handle multiple environments. Optimally, that solution lets you easily create multiple separate environments that are not linked to each other in any way, to prevent accidental changes to the production environment while you are working in your test environment. The export/import of a package should be extremely easy, and it should be possible to automate it. The reality is that most automation products are great at automating a change in a single environment, but truly supporting the flow of a change through multiple environments during the staging process is lacking.
* I believe four stages work best: development, technical acceptance test, user acceptance test and production.
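One way to avoid the ‘forgotten collection variable’ failure described above is a package format that declares its required variables explicitly, so an import into another environment fails loudly instead of silently. This is a made-up sketch of that idea, not any product’s actual format:

```python
# A hypothetical package format: the variables it needs are declared as data,
# so an importing environment can validate them up front instead of failing
# halfway through an installation.
import json

PACKAGE = {
    "name": "AcmeApp 2.1",
    "required_variables": ["APPSERVER", "LICENSEKEY"],  # declared, not hidden
    "install_script": "install.py",
}

def import_package(package, environment_variables):
    """Refuse the import unless every declared variable is defined."""
    missing = [v for v in package["required_variables"]
               if v not in environment_variables]
    if missing:
        raise ValueError(f"Cannot import {package['name']}: "
                         f"undefined variables {missing}")
    return {**package, "variables": dict(environment_variables)}

# Export is then trivial, because the package is just data:
exported = json.dumps(PACKAGE)
```

The point of the design is that the contract between package and environment is visible, so flowing a change from test to production becomes a validation step rather than a guessing game.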
In current times of app stores, cloud infrastructures and continuously updating applications, the business in general no longer accepts release changes only once every three months. So why don’t we continuously apply changes to terminal servers or VDI master images? Is it because we are afraid they might break the environment and fixing them simply takes too long? Or is it because testing our releases and involving the business in user acceptance tests takes a lot of time? Imagine you have automated the entire process of building and deploying changes, and you have broken your work down into manageable and measurable chunks, so you can make them available to any user in any environment automatically!
Of course, you still need the base engineering and automation of the changes. But the next steps can be more dynamic. Take this real-life example: a new application is automatically installed in a technical acceptance test (TAT) environment, where its deployment and configuration are tested. Next, the application is automatically transferred to a UAT environment, while the business owner of that application receives an email stating that they can log on to a portal and test the application. Once the business confirms, the application is scheduled for deployment to production, fully automated.
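The flow above can be sketched as a simple promotion pipeline. The stage names follow the four stages mentioned earlier; the notification and approval hooks are placeholders for whatever tooling you would actually use:

```python
# A rough sketch of the promotion flow: a change moves through fixed stages,
# and promotion into production is gated on business approval. The email
# "hook" is a stand-in for a real notification mechanism.

STAGES = ["development", "TAT", "UAT", "production"]

class Change:
    def __init__(self, name):
        self.name = name
        self.stage = "development"
        self.approved_by_business = False

    def promote(self):
        nxt = STAGES[STAGES.index(self.stage) + 1]
        if nxt == "production" and not self.approved_by_business:
            raise RuntimeError(f"{self.name}: waiting for business approval")
        self.stage = nxt
        if self.stage == "UAT":
            # hook: notify the business owner that the app is ready to test
            print(f"Mail to business owner: please test {self.name}")

change = Change("AcmeApp 2.1")
change.promote()   # development -> TAT: technical deployment tests run here
change.promote()   # TAT -> UAT: business owner is notified by email
change.approved_by_business = True
change.promote()   # UAT -> production, fully automated
```

Note that the gate is part of the pipeline itself: nobody can push a change into production by skipping the UAT confirmation, because the code simply refuses the promotion.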
With this approach you can move much faster and employ much shorter release cycles for changes like new applications. What about weekly? Sounds great, right?
I have customers who are taking this route. They have even enabled the business to change minor things during the release process, like a registry key. This leads to a dramatic reduction in time and material for the whole release process and allows the business to become much more agile.
Change should be a routine!
A lot of the tests in the TAT can be automated.* For starters: setting up your test environment and testing the availability of basic functionality after deploying one or more changes. Are all services still running? Can I still log in to the environment? Is that new application appearing on the website? Over time, automated testing will greatly reduce the amount of manual testing, and it will also help you catch and fix problems early.
When you are setting up automated testing, make sure the testing framework is easily extensible, because building tests like this is a lot like implementing a good monitoring solution. First make sure you cover the big picture and don’t trip over the details. Start by automating the build of your test environment, then verify that you can log on, that your applications are running and that performance is up to par. As time progresses, you can dive into more detailed checks.
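The big-picture checks above could look something like this. The three probe functions are stubs; in practice each would call your own tooling (querying the service control manager, scripting a test log-on, requesting the application’s page), and the names and hosts are invented for illustration:

```python
# A sketch of an extensible post-deployment smoke test: each check is a named
# entry, so adding a new check is one line. All probes here are stubs.

def service_is_running(name):
    return True  # stub: e.g. query the service control manager remotely

def can_log_on(host):
    return True  # stub: e.g. script a test log-on against the environment

def app_is_published(app):
    return True  # stub: e.g. request the application's page on the portal

def smoke_test():
    checks = {
        "Spooler service running": service_is_running("Spooler"),
        "log-on to TAT works":     can_log_on("tat-host-01"),
        "AcmeApp published":       app_is_published("AcmeApp"),
    }
    # Return the names of the failed checks; an empty list means all passed.
    return [name for name, ok in checks.items() if not ok]

failures = smoke_test()
```

The design choice is deliberate: start with a handful of coarse checks that gate the promotion of a change, and grow the dictionary of checks over time, just as you would grow a monitoring solution.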
* This is the point where people normally start listing the things that cannot be automated.
Now for what may very well be the biggest productivity killer in enterprise IT – a lack of transparency!
Reality is that business and end-users often have no clue what is happening on the IT side. What is the status of my request? Why can’t I have application ABC? Who’s working on my changes and what is taking them so long? How long should I wait until they are available in production?
I have been on major incident calls with enterprise customers where the back-end system teams were blamed for releases not being ready in time, while it was totally unclear what was expected from IT. The same story goes for the business: why didn’t they hear about our issues until the very last minute?
Imagine you have automated your delivery process and turned ‘change’ into something you routinely do. What if you had, for example, a number of staging environments like development, TAT, UAT and production, and you could easily see where a specific change currently resides and even predict when it will become available?
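Once each change’s stage is recorded as data, that kind of forecast is almost free. A tiny sketch, where the average stage durations are invented numbers purely for illustration (in practice you would measure them, as the next paragraph suggests):

```python
# A sketch of change-status transparency: track the stage of every change and
# forecast availability from measured average stage durations. The durations
# below are made-up placeholder values.

STAGE_ORDER = ["development", "TAT", "UAT", "production"]
AVG_DAYS_PER_STAGE = {"development": 5, "TAT": 2, "UAT": 3, "production": 0}

def days_until_production(current_stage):
    """Rough forecast: sum the average duration of the remaining stages."""
    idx = STAGE_ORDER.index(current_stage)
    return sum(AVG_DAYS_PER_STAGE[s] for s in STAGE_ORDER[idx:])

# The status board the business could see for itself:
status = {"AcmeApp 2.1": "UAT", "Reader update": "TAT"}
for name, stage in status.items():
    print(f"{name}: in {stage}, ~{days_until_production(stage)} days to production")
```

Even a crude board like this answers the questions above – what is the status of my request, and how long should I wait – without a single phone call to IT.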
Communication and transparency make all the difference. Involve the business in your process and let them help you decide what has priority. Make your progress measurable and visible. After a couple of weeks you’ll be able to improve your forecast of which changes will make it into the next release.
I believe enterprise IT is missing out, and that it should take on an approach of continuous deployment and far better transparency. These are core principles of DevOps, and we enterprise administrators can learn a lot from them. My story is about smart automation and about bringing teams and responsibilities together to foster a predictable stream of change that is much more closely aligned with immediate business needs.
What do you think? Should enterprise IT take the next step in optimizing its delivery process? Is DevOps, or a derivative of it, the answer to these age-old issues?