The Hidden Risks of Virtualization

November 10th, 2008 by Michael Lohr

Like so many bloggers, I am here to talk about lurking issues with virtualization.  However, I’m not here to talk about technical details like promiscuous mode on a vSwitch or setting proper permissions in Virtual Center (both of which, by the way, are important).  I’m here to talk about a more fundamental issue that plagues many IT shops–proper process around change management.

I talk with many customers today about their change management process, and the vast majority still struggle to manage change effectively.  They all have a change management process, but they have varying degrees of success with it.  Most customers enter change information into a change management system, but for the most part the data entered is inconsistent.  If I ask a few probing questions, like “Can you tell me about all the changes that took place on a particular asset?” or “How do you know a ticket was completed successfully?”, they generally do not have this information.

One of my favorite examples of a poor process was a very large customer who used an enterprise ticketing system yet could not really track anything of relevance with it.  The change management system was not the issue, because it had all the bells and whistles.  It was the customer who had put a terrible workflow around the system.  I sat with the customer to understand their existing procedure for submitting a ticket and was astounded to see that, when it was time to fill in the asset name/group receiving the change, they used a drop-down to select “Production.”  This is an organization that had 2000+ production servers, so their only filter was to say production versus staging or QA.  Suffice it to say this left some room for improvement.

I’d like to say this was an anomaly but it really is not.  In many cases when I sit with a customer to understand their change process I see similar results.  They do not enter the asset’s name or instead put it in a giant memo field so this obviously makes it impossible to report upon.  They may or may not document what exactly is going to change or keep it very generic like “Updating the Firewall Rules.”  Well, if I need to roll back a change that definitely tells me what I need to know!
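To make the reporting problem concrete, here is a minimal sketch of what a reportable change record could look like. The field names and the `changes_for_asset` helper are my own assumptions for illustration, not the schema of any particular ticketing product. The point is only that the asset name lives in its own field rather than in a memo or a "Production" drop-down.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ChangeTicket:
    """Hypothetical change record; every field name here is an assumption."""
    ticket_id: str
    asset: str          # a specific host or device name, not just "Production"
    environment: str    # e.g. "production", "staging", "qa"
    summary: str        # what exactly is changing
    rollback: str       # how to undo the change
    completed: bool = False

def changes_for_asset(tickets, asset):
    """Return every ticket recorded against one named asset."""
    return [t for t in tickets if t.asset == asset]

tickets = [
    ChangeTicket("CHG-1", "fw-edge-01", "production",
                 "Allow TCP 8443 from the DMZ to the app tier",
                 "Remove the TCP 8443 rule", completed=True),
    ChangeTicket("CHG-2", "web-017", "production",
                 "Raise worker thread count from 50 to 100",
                 "Set worker thread count back to 50"),
]

for ticket in changes_for_asset(tickets, "fw-edge-01"):
    print(ticket.ticket_id, ticket.summary)
```

With the asset in a structured field, "show me every change to this firewall" becomes a one-line filter, and the rollback field answers the question that "Updating the Firewall Rules" never could.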

So why is this relevant to virtualization?  Obviously, this is a process framework that should govern not only the physical world but also the virtual world.  So, if an organization manages change poorly today, the problem will only multiply once it virtualizes.  It is so much easier to make changes in the virtual world, and these changes often go under the radar.  Since you can add a brand new machine or even a new network with a few mouse clicks, most people do this without the proper authorization.  This leads to undocumented changes, and even when people do document the changes they put in the bare minimum that the change system requires.

One of the greatest features of virtualization is that it runs at lightning speed and you can react to the business’ needs faster than was ever possible.  However, in this case its greatest strength could become its greatest weakness if it is paired with an organization that does not have a well-defined and mature change process.

Virtualization: Conflicting signals

October 27th, 2008 by Dwayne Melancon

Just catching up on my virtualization news feeds after a week of heavy travel, and I noticed that there are a lot of interesting things going on - mostly involving mixed messages:

  1. A well-known research firm, IDC, is apparently providing some numbers around virtualization adoption that aren’t sitting well with VMware.  VMware thinks IDC is giving Microsoft too much credit for gaining adoption / market share.  IDC says Microsoft’s Hyper-V accounted for 23 percent of new shipments.
  2. VMware announced strong earnings for the 3rd quarter, and is forecasting heavy growth in 2009.  However, VMware is freezing hiring for the foreseeable future in spite of that expected growth.  The article also says they are doing some sort of “realignment” (aka “reorg”).

I think this will be very interesting as it unfolds.  Some things I think will make it more interesting:

  • Many IT organizations are deploying (and will continue to deploy) hypervisors from multiple vendors (VMware, Microsoft, and Xen most commonly, but virtualization from Sun and IBM is already common in our customer base).  This will make the counting more difficult, and will make for some interesting posturing from the virtualization vendors.
  • The economy is anybody’s guess, but with virtualization technology it’s trickier to figure out what to do.  Companies will be more frugal with their dollars, but the ROI on virtualization may make it less likely to get cut.
  • To make it even trickier, organizations seem to be deploying virtual infrastructure ahead of their ability to manage it effectively.  That may mean they need to spend money on management tools to attain the visibility, capability, and integration required to realize virtualization’s ROI.  So, how good are you at convincing management they need to spend money to save money?

Watching the vendor landscape will be interesting, as well.  Obviously, the big virtualization vendors have plenty of cash in the bank and will make it through the storm in some fashion.  However, the “upstarts & startups” in the field will have more of a challenge to deal with.  I fully expect many small vendors to fail or get bought up for pennies on the dollar.   It’s times like these when I’m glad I’m in an established, growing, and profitable company.

What do you think?  How (if at all) has your outlook, strategy, budget, etc. changed in recent weeks?

Smoking More, Enjoying It Less, Vol. 5…

October 15th, 2008 by Gene Kim

I’ve heard this twice in the last three weeks!  The IT group could return millions of dollars to the bottom line, if only they could win the argument with Finance to buy new servers and allocate the four man-months to complete the infrastructure replacement project.

  • IT guy: “You know that virtualization project?  Well, we need to replace 120 servers which are about seven years old, and incidentally, no longer under maintenance.  And we’re four versions behind on the operating system.”
  • Finance guy: “Are those servers paid for?”
  • IT guy: “Yes.”
  • Finance guy: “And they’re fully depreciated?”
  • IT guy: “Yes.”
  • Finance guy: “So, they’re free.  Request denied.  Keep running them until they fail!”

Of course, they already are failing (at random times), causing urgent firefighting, taking even more time away from other urgent infrastructure improvement projects.

And thus, we keep swirling down…  Even though we purchased the VMware licenses.

Does this resonate with anyone else out there?

Image courtesy: John angry/Oboyah Za3lan

Tripwire Enterprise 7.5V Part One

October 3rd, 2008 by Gavin Millard

With the announcement of Tripwire Enterprise 7.5V, like new parents, we are eager to show it off and smile lovingly at our new little baby. With this in mind I thought I’d spend a few posts showing some of the new functionality that makes TE 7.5V so great. 

My first post is to give some detail on our new role-based Home Pages. This functionality allows customers to create their own widget-based pages that give them the information that is important to them without having to see or understand the underlying technology. With this being a virtualization security blog, I’ve taken an example of what a VI admin or a member of the security team might want to have as a Home Page. Each widget is customizable to show the key pieces of information you are interested in. In my example I have four widgets that I’ve created for a “Virtual Infrastructure” Home Page.

The first widget, “Virtual Infrastructure Security,” contains scorecards for configuration assessment against security best practices for the Hypervisors, vSwitches and workloads contained in the monitored Virtual Center. These reports are clickable to get more information. If I click into the vSwitch VMware Hardening report I can see the different vSwitches I am monitoring and whether or not they are compliant with the best practices laid down by VMware.

In the vSwitch VMware Hardening report, we can see that vSwitch1 is failing some of the tests that we’ve designated to run on that switch. These failures are due to both the MAC Address Changes and Forged Transmits settings not being set to Reject. Drilling further allows us to see the actual settings and also how to remedy the issue if desired. This remediation advice is included in the policies to help customers not only find out where they are failing on security parameters but also how to fix it. The advice can be edited to suit different environments or processes, and could even be used to create a change ticket and be included as a run book of actions for other admin staff to carry out.
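For readers who want to picture the kind of test involved, here is a minimal sketch of a check on those two settings. It is not how Tripwire Enterprise implements its policies, and it does not call any VMware API. The dictionary keys and the `failed_settings` helper are assumptions made up for illustration, with each vSwitch's security policy represented as plain data.

```python
# Hypothetical hardening rule: both settings must be "reject".
REQUIRED_POLICY = {
    "mac_address_changes": "reject",
    "forged_transmits": "reject",
}

def failed_settings(policy):
    """Return the sorted names of settings that do not match the rule.

    A setting that is missing from the policy counts as a failure.
    """
    return sorted(
        name
        for name, wanted in REQUIRED_POLICY.items()
        if str(policy.get(name, "")).lower() != wanted
    )

# Illustrative data only; a real report would read these from the environment.
vswitches = {
    "vSwitch0": {"mac_address_changes": "Reject", "forged_transmits": "Reject"},
    "vSwitch1": {"mac_address_changes": "Accept", "forged_transmits": "Accept"},
}

for name, policy in vswitches.items():
    failures = failed_settings(policy)
    status = "FAIL: " + ", ".join(failures) if failures else "PASS"
    print(name, status)
```

The list of failed setting names is also a natural starting point for the remediation text or change ticket mentioned above.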

Moving to the right of the home page we have another dashboard widget called “VI PCI Compliance” which allows us to see how well the environment is configured against VISA’s PCI policies. Read this post to see why, as Mike Lohr so eloquently pointed out, the virtual infrastructure should be compliant with PCI. Any graphical report could be added to these dashboard widgets, for example guest OS compliance to CIS, NIST, DISA or ISO27001.

Directly below that widget we have an alert widget that is configured to show any alerts on the virtual infrastructure, including new guests or hosts being added and changes to the infrastructure that reduce or increase compliance with the security policies. This is a great alert for a VI admin to use to see who is adding new guests to the Hypervisors and so control VM sprawl.

Below that alert widget we also have a “Report Repository” widget that allows us to add reports an admin might want front and center. A very useful report, which I’ll discuss in a future post, is the unmonitored guest systems and hypervisors report, which shows any guest or hypervisor that has been deployed without security monitoring.
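Conceptually, an unmonitored-systems report is a set difference: everything discovered in the virtual infrastructure minus everything under monitoring. The sketch below shows only that idea. The `unmonitored` function and the sample names are assumptions for illustration, not the product's actual report logic.

```python
def unmonitored(discovered, monitored):
    """Return the sorted names of systems that exist but are not monitored."""
    return sorted(set(discovered) - set(monitored))

# Illustrative inventories; in practice these would come from discovery
# and from the monitoring tool's own node list.
discovered = ["esx-01", "esx-02", "guest-web-01", "guest-db-01", "guest-test-09"]
monitored = ["esx-01", "esx-02", "guest-web-01", "guest-db-01"]

print(unmonitored(discovered, monitored))
```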

As you can hopefully see, this new piece of functionality will really help staff gain control of their virtualized environment. My next post will be about how Tripwire can automate discovery of all virtual assets.

Smoking More, Enjoying It Less, Vol. 4…

October 1st, 2008 by Gene Kim

In today’s tougher spending climate, I’ve found that virtualization is one of the few IT projects unrelated to compliance that are still being funded.  These projects often hinge upon reducing operating expenses and increasing computing densities, but I wonder how many of them will deliver on the goal of a real reduction in operating costs…

Image courtesy: John angry/Oboyah Za3lan

Tearing down walls and bringing people together

September 30th, 2008 by Dwayne Melancon

I got a lot of input on my last post about moving live VMs around, so I thought I’d do two things:

  1. Clarify the point I was making;
  2. Give an example of what I mean by a technical feature that has - in itself - business value.

Business value: It’s about results, not just activity

Some people thought I was dissing VMware’s VMotion in my previous post.  I wasn’t - in fact, we use it in our lab environment and it is very cool and powerful.  Rather, the point of my post was that faster movement of VMs, in itself, may not solve any high-level business problems for some customers.  Therefore it may not even make the list of features that drive “the decision” when choosing a virtualization platform for the business.

Organizations should start with a top-down analysis of the business problem they wish to solve, detail the business requirements / expectations associated with the business problem, then implement processes and technologies that best support their requirements and measure the results vs. their expectations. 

The trouble with this approach is that organizations often shy away from dealing with the process changes required to really solve their problems, since that requires getting people to change how they work. 

Getting people to change can be hard and painful, so organizations often avoid it.  Capital expenditures are much easier to justify than organizational change initiatives, aren’t they?

What’s the consequence?  Organizations often turn to a hot feature in a technology, hoping that it will solve or mask their business problem.

Here’s a story about that:  A couple of years ago I worked with a large financial institution whose VP of Ops told me, “I can hide most of my unplanned downtime just by overinvesting in capacity and failover.”

He still had availability problems - it was just harder for people to notice because he was masking the symptoms.

So be careful when you’re selecting technology, and don’t get distracted by features for features’ sake - make sure the feature really solves the business’s problem.

And, of course: if VMware’s VMotion and/or DRS solve or reduce your business problem, then that’s a good thing.

Technology can tear down walls

Now, let me give you an example of a technical capability I learned more about at the recent VMworld 2008 conference that I believe does add business value, all on its own:  the Cisco Nexus 1000V distributed virtual switch.  What is that, and why do I think it delivers business value?

This technology will provide a seamless integration of Cisco’s security, policy enforcement, automated provisioning and diagnostics features into dynamic VMware environments.  More importantly, it will allow an enterprise’s certified Cisco experts to manage virtual network infrastructure using the same commands, tools, concepts, etc. they already use for physical Cisco switches.

Where is the business value in this?  It provides a way to get Cisco network engineers engaged in the management of virtual networking without forcing them to get new training, use new tools, or significantly alter their workflow.  More buy-in with lower effort == better business value.

This kind of integration is a huge step in enabling faster, more secure scaling of virtual infrastructure because it eliminates a lot of the “hard and painful” aspects of this kind of business change.

Bravo, VMware and Cisco, for making it easier for more people to get more business value out of your products, and to capitalize on their past investment in skills development!

My favorite talk at VMworld was…

September 29th, 2008 by Gene Kim

I guess I really am a blue-collar type of guy…  Certainly, there were many interesting technologies previewed at VMworld, including some mind-blowing features and capabilities that allow transparent failover and transactional integrity using pedestrian hardware and software. Pretty astounding stuff!

But, to be honest, my favorite talk was given not by the technologists and marketers, but by two folks from the IT Operations group from VMware. Tayloe Stansbury and Drew Kramer gave a fantastic talk called, “How VMWare Virtualizes Its Own IT Environment.”

Tayloe and Drew talked about delivering some of VMware’s most mission-critical IT services, including the financial reporting systems and email, through periods of incredible growth that created pressure to increase capacity and provide an infrastructure more resilient to failures.

I wish they had talked more about the change and configuration processes required to achieve their stabilization of service levels, which most likely had far more to do with their results than virtualization did. Some of their interesting lessons learned included (Gene’s commentary in parentheses):

  • The real lurking problem of virtualization sprawl (in the absence of good preventive and detection controls, beyond “hey, why is my ESX cluster running slowly”)
  • The not-insignificant upfront investment required to really architect an IT service correctly for virtualization (they gave an example of their Microsoft Exchange environment: 1 quarter to architect, 1 quarter to implement)
  • That the uptime risks aren’t from the technology, but from the daily operational tasks (only 1 ESX server crash in the last two years, but I’m sure they’ve had plenty of IT service interruptions not caused by ESX)
  • Monitoring VMs for low usage, which makes them candidates for decommissioning (apparently, like reclaiming bodies for food in the movie Soylent Green.)
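The last lesson, flagging low-usage VMs as decommissioning candidates, can be sketched as a simple threshold filter. This is my own illustration, not anything shown in the talk. The 5 percent cutoff, the use of average CPU as the only signal, and the `decommission_candidates` name are all assumptions.

```python
def decommission_candidates(avg_cpu_percent, threshold=5.0):
    """Return the sorted names of VMs whose average CPU use is below threshold.

    avg_cpu_percent maps VM name -> average CPU utilization in percent.
    The default threshold is an arbitrary illustrative value.
    """
    return sorted(
        vm for vm, cpu in avg_cpu_percent.items() if cpu < threshold
    )

# Illustrative numbers only.
usage = {"build-01": 1.2, "mail-01": 42.0, "test-07": 0.4}

print(decommission_candidates(usage))
```

A low number only nominates a VM for review; someone still has to confirm with the owner that it is really unused before reclaiming it.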

For those of you who want to study their slides, as I do, look out for presentation TA3808, which will hopefully be posted on the VMworld website soon.

Great job, Tayloe and Drew!  You make hard working IT operations folks everywhere proud!  :-)

Okay, guys, what was your favorite VMworld session?

‘Motional reactions

September 25th, 2008 by Dwayne Melancon

A few days ago, I wrote about how Microsoft and VMware were reaching feature parity on the ability to move “live” VMs from one host to another, and that the real test for virtualization was who could deliver the best value and return to the business.

Well, it turns out Microsoft’s Live Migration feature won’t be available until 2010, so we have a while to wait.  My first reaction was, “well, that sucks for Microsoft.”  However, as I thought more about it, I don’t think it’s such a big deal after all.

Motion, motion, who’s got the motion?

From a feature perspective, VMware can move a VM from one host to another in a few milliseconds - if you satisfy a lot of very specific technical requirements for near-identical hardware. 

Microsoft’s current “Quick Migration” capability suspends a VM, moves it to a new host, and resumes the session about 6 seconds later.  The underlying hardware doesn’t have to be near-identical.

Is Microsoft’s Quick Migration fast enough?  Depends on the application.  In some use cases, 6 seconds is too slow, but I’d venture that 6 seconds is fast enough for most of the apps the typical IT department manages.  And, in any case, it is significantly better than the physical world. 

The bottom line:  I don’t doubt this (very) technical feature will be a deciding factor for certain availability-critical applications & systems, but I think Quick Migration will be sufficient for most circumstances.

Business value trumps technical coolness

So - that means we’re back to a battle over delivering strong business value.  Today, there are a lot of arguments floating around about whose virtualization platform is cheaper, better, etc.

The cost of the platform is not the primary factor in the equation, in my opinion - it’s the ease of incorporating the platform into your IT operations in a beneficial, effective way.  And when it comes to costs, management and operational costs will trump the cost of your virtualization platform in no time at all.

In this area, I (currently) give Microsoft the edge because they can (already) manage physical and virtual infrastructure, and have broader support for cross-silo IT operations and application management than any other virtualization vendor.

What do you think?  Is a “one stop” solution like Microsoft better for your business than what you get from a virtualization specialist like VMware?  Would love to hear your perspective.

Breakout

September 24th, 2008 by Gavin Millard

I got into a bit of a heated debate last night concerning whether the threat of hackers exploiting ESX flaws to break out from one host system to exploit another was a real issue or just fearmongering. Now, I’m always up for a bit of mongering, especially when fear or scare are added to it, so I was on the side of hackers utilizing flaws to attack the wider infrastructure.

My esteemed colleague Steve* was rather determined that these kinds of breaches, whilst theoretically possible, were something that would not really happen. Kind of like everyone waiting for the end of the World because some Swiss scientists built a big circular pipe with a few atom guns in it. Something could happen, but it’s not worth buying the local supermarket out of their tinned goods yet.

The thing is, he’s right on the money here. Whilst it has been shown as something possible and an exploit released (although on the cheaper desktop versions of VMware) nothing has really been seen out in the wider World. So why was I on the side of mongering you may ask?

Because although breaking out from one VM to another via the underlying hypervisor is really rather difficult, why would a hacker bother doing that? They would happily look for those hundreds of machines that have sprung up from nowhere without consent from Ops or Security, that are never patched or checked for security parameters, and that probably all have the same admin credentials. They would break into those, rob all the good stuff and then use the IP network to look for other targets, irrespective of whether they are physical or virtual, and break out anyway. They don’t need to do all the clever stuff to break out; they’ll use the network and a myriad of poorly configured systems to springboard from one machine to another.

I’ve got a great example of this. I was working onsite with a rather large vendor partner, getting our solution up and running in their demo lab so that they could show their clients how lovely both of our technologies are. As I was clicking through the install, pretending I knew what I was doing, my RDP connection to the demo lab system I was working on got dropped. Spinning around on my chair, I asked what was going on. One of the engineers from a few desks down rather sheepishly told me, “We’ve just realised that none of the demo VMware images have AV installed and we’ve picked up a virus.” True story; if you buy me a pint I’ll even name and shame.

My point is, why worry about something theoretical when you are missing so much anyway? Good security is not about trying to figure out exotic ways to protect against that 1% chance; it’s about ensuring that you have a good level of security for the other 99%. Whilst VM breakouts are a theoretical threat, poorly configured systems make that approach pretty pointless anyway: attackers will simply use the flaws you’ve introduced by not having control of your ever-increasing VM library.

*Names not changed to belittle the guilty.

Compliance is not a river in Egypt

September 22nd, 2008 by Dwayne Melancon

I was on a compliance panel at VMworld last week.  It was well-attended and followed by a bunch of good questions from the audience.  One of the most common questions was “Is VMware PCI compliant?”

The answer is “No.  VMware is not inherently ‘compliant’ with PCI or any other regulation.”   Why?  Because technology is not a silver bullet when it comes to compliance, and virtualization is no exception.

Compliance is about your organization’s policies and controls - and whether you can demonstrate that you are adhering to them consistently.  Therefore, the biggest obstacle to compliance is generally not the technology you use; it’s the people involved in defining, implementing, monitoring, and enforcing your controls.

Sure, some technologies make it easier to achieve and demonstrate compliance.  Virtualization, for example, can make it easier for service providers to segregate customer data so they don’t mingle data from different clients.  But you still need to understand your risks, and create controls that enable you to manage and mitigate those risks.

Someone asked me what the biggest obstacles to compliance were.  In my experience, it usually boils down to one of the following problems:

  • inadequate “tone at the top” - no management-level commitment;
  • lack of clearly defined processes, roles, and risks;
  • insufficient communication of expectations to staff and stakeholders;
  • lack of documentation (policies, processes, roles, etc.);
  • inadequate detective controls, meaning you can’t systematically detect when policies and processes are circumvented;
  • lack of defined and enforced consequences (i.e. people break the rules, but nothing happens).

If you are subject to compliance and you’re adopting virtualization, I’ve got good news - there is now a VMware Compliance Center available to you, and it’s filled with some excellent materials to help you get there faster.  Click here to check it out - you’ll be glad you did.
