Oops! VMware's VI 3.5 Update 2 locks out customers, pulls Update 2 download.

Written on Aug 12, 2008 | 12,195 views | 31 comments


by Gabe Knuth

Word on the street is that VMware Infrastructure 3.5 Update 2 and ESXi 3.5 Update 2 are affected by a licensing bug left over from development. The bug prevents powered-off virtual machines from being powered back on, suspended machines from being resumed, and admins from migrating machines with VMotion, with only a "general error" reported. Searching through the logs unearths the real problem:

Aug 12 10:40:10.792: vmx| http://msg.License.product.expired This product has expired.
Aug 12 10:40:10.792: vmx| Be sure that your host machine's date and time are set correctly.
Aug 12 10:40:10.792: vmx| There is a more recent version available at the VMware Web site: "http://www.vmware.com/info?id=4".
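
If you want to see which virtual machines on a host have already tripped the bug, grepping each VM's vmware.log for that message is a quick check. What follows is only a rough sketch of that idea, not anything from VMware; the /vmfs/volumes search root and the log file naming are assumptions, so adjust them for your environment.

import os

SIGNATURE = "msg.License.product.expired"     # the message shown in the log excerpt above
SEARCH_ROOT = "/vmfs/volumes"                 # typical datastore mount point (assumption)

def affected_vm_logs(root=SEARCH_ROOT):
    """Yield paths of vmware*.log files that contain the license-expired message."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.startswith("vmware") and name.endswith(".log"):
                path = os.path.join(dirpath, name)
                try:
                    with open(path, "r", errors="ignore") as handle:
                        if any(SIGNATURE in line for line in handle):
                            yield path
                except OSError:
                    continue  # unreadable file, skip it

if __name__ == "__main__":
    for hit in affected_vm_logs():
        print("possible Update 2 license bug:", hit)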

The workaround involves disabling NTP and setting your clock back to August 10, and can be found in full detail here.
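
For what it's worth, the stopgap boils down to those two steps. Below is a minimal sketch of how you might script it on an ESX service console; it assumes the standard Linux tools (service, chkconfig, date, hwclock) are present, and it is not a substitute for VMware's official KB procedure.

import subprocess

TARGET = "08/10/2008 12:00:00"  # any date before the expiry works; August 10 per the workaround

def roll_clock_back(target=TARGET):
    """Stop NTP so it can't correct the time, then set the clock back to August 10."""
    subprocess.check_call(["service", "ntpd", "stop"])    # stop time synchronization
    subprocess.check_call(["chkconfig", "ntpd", "off"])   # keep it off across reboots
    subprocess.check_call(["date", "-s", target])         # set the system clock back
    subprocess.check_call(["hwclock", "--systohc"])       # sync the hardware clock too

if __name__ == "__main__":
    roll_clock_back()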

According to the KB article at VMware.com, engineering "will reissue the various upgrade media including the ESX 3.5 Update 2 ISO, ESXi 3.5 Update 2 ISO, ESX 3.5 Update 2 upgrade tar and zip files by noon, PST on August 13. These will be available from the page: http://www.vmware.com/download/vi. Until then, VMware advises against upgrading to ESX/ESXi 3.5 Update 2."

Let's hope that it actually is fixed in that amount of time, otherwise people will be resetting their VI 3.5 servers' clocks as nightly maintenance.

Update 2 was released on July 28th, and I can't begin to guess how many organizations this has affected. Update 2 contained many important and popular feature additions, like Windows Server 2008 support, enhanced VMotion compatibility, live cloning of virtual machines, hot virtual disk extension, 10GbE NFS and iSCSI support, and much more. All these features may have led admins to deploy the update a little earlier than normal (I'm speaking of the wait-and-see folks like myself).

The people who truly utilize all the functionality of virtualization are probably the ones losing the most here. Imagine having your systems dynamically reallocate virtualized hardware resources based on the time of day or day of the week. Maybe the weekends are spent running batch jobs on 80% of your servers, while during the week it's just 20%. In that case, those extra servers fired up Friday night and shut down early Monday morning just like normal. But with the licensing bug, the servers that were shut down (a full 60% of your capacity!) can't come back up when they're needed again.

VMware really messed up here, but they don't need me to tell them that; I'm pretty sure the ringers on their phones wore out a few hours ago. It will be interesting to see how this pans out and how people running Update 2 in production were affected. It's sure to be a humbling experience for VMware, and how they handle it should speak volumes about the state of the company and the responsibility it has to its customers.

 

 
 






Comments

Guest wrote Huge issue
on Tue, Aug 12 2008 11:05 AM
Many of our customers were hit by this today. It is a huge issue, and one that is not going away quickly enough. The problem is that once VMware releases the hotfix, you can't VMotion the VMs, since the hotfix hasn't been applied yet. One possible workaround is to have a swing server: patch that server, VMotion the VMs over to it, then go from there. Still, you are talking about an outage. VMware royally screwed up on this one.
Guest wrote Conspiracy theory...
on Tue, Aug 12 2008 12:19 PM

Where did the guy who's running VMware use to work?... I'm only joking... VMware's rivals would never do something like that... would they? :)

BTW, the bug also exists in the free standalone ESX product they have just released... shoot yourself in the foot, why don't you.

Guest wrote August 12th, Diane Greene's Birthday ?? maybe?
on Tue, Aug 12 2008 12:45 PM

 

Guest wrote To Make things worse....
on Tue, Aug 12 2008 12:47 PM
Today is Patch Tuesday, when many Windows machines get patched and rebooted... huh, wonder how that's going to go...
Guest wrote Testing Results
on Tue, Aug 12 2008 1:17 PM
Guess that will further delay the testing results we are waiting on in the Qumranet, Citrix, VMware "not a bakeoff" :-)
Guest wrote Enterprise Virtualisation?
on Tue, Aug 12 2008 2:27 PM

I think it is the bug of the year. There is no information from VMware because the site is down (is it running on 3.5 U2? :-) ). If you call support they tell you they will fix it in 36 hours (or so).

If this is VMware's statement on enterprise virtualisation, I will look more at XenServer!

Gabe Knuth wrote Re: To Make things worse....
on Tue, Aug 12 2008 2:51 PM
I came across that very thing when I was looking into the issue.  It appears that rebooting a VM is OK; it's just the act of actually powering on a machine that triggers it.
Guest wrote Re: Enterprise Virtualisation?
on Tue, Aug 12 2008 3:45 PM

Wondered how long before the first Citrix fanboy raised their head. If you think Citrix is gonna be such a huge improvement, I suggest you haven't suffered the pain of Presentation Server / Access Gateway / AAC over the past few years.

Guest wrote Re: Enterprise Virtualisation?
on Tue, Aug 12 2008 3:46 PM

To be fair, I don't remember something of this magnitude happening to VMware, and I still feel this product is pretty solid. You can bet at least this won't happen again. Let's not be too critical and start tossing stones, unless you can tell me of another time when a foul-up like this has happened?

 ~Gabriel Medrano

Shanetech wrote Re: Enterprise Virtualisation?
on Tue, Aug 12 2008 4:15 PM

Agreed. We had our VMware rep call us first thing this morning and let us know about this issue so that we could get the information out to our global customer base. Are we happy? Of course not. However, it is software, and like someone else said, it's not like Citrix has not had something like this happen, or even Microsoft. Geez, do any of you guys remember NT 4 service pack hell?

Guest wrote Zero-tolerance policy with VMware ?
on Tue, Aug 12 2008 9:56 PM

Citrix had problems with hotfixes, and constantly retires hotfixes when they have problems. Not sure why you guys tend to be so critical of VMware, and so kind to Citrix, which is a newbie in the virtualization world.

MS Exchange 5.5 SP3 messed up the Exchange database. This is the only time we have heard of something like this from VMware.

Also, how many people are imprudent enough to patch ALL hosts at the same time without going through a validation process first?

Why this "zero-tolerance" policy with VMware?
Jason Boche wrote Re: Enterprise Virtualisation?
on Tue, Aug 12 2008 10:17 PM
Truer words were never spoken.
Jason Boche wrote Re: Zero-tolerance policy with VMware ?
on Tue, Aug 12 2008 10:18 PM
Also very good points.  Nobody from the Citrix world should be thumping their chests about their track record of QA and software updates.
Jason Conomos wrote Perception
on Tue, Aug 12 2008 11:09 PM

I think part of the issue here is that this is not a Microsoft patch that takes out a server or a few, nor a Citrix patch that screws up a server, but a virtualisation platform update that takes out whole groups of servers, and in some instances there may not be a feasible workaround.

So why are people taking this zero-tolerance approach? Well, quite simply, it is because people have realised that even if you implement HA and DR with a virtualisation platform, you still have a SPOF, which is... your virtualisation platform vendor.

So the question now is: what to do about it? Should companies consider running a blend of hypervisors to ensure that if this does occur again (and it probably will, let's be honest) they can perform a manual failover and conversion to another virtualisation platform?

Food for thought.

Guest wrote Re: Perception
on Wed, Aug 13 2008 12:26 AM

The answer to this is to have a good BC/DR plan. You cannot account for everything, so you prepare for as many possibilities as you can. I think having a blend of hypervisors is the totally wrong approach; you need a good testing methodology that will sift out issues like this.

 

~Gabe M.

Guest wrote what happened to having a SEPARATE *test* small farm to test updates before applying?
on Wed, Aug 13 2008 4:22 AM

 

Guest wrote Re: Zero-tolerance policy with VMware ?
on Wed, Aug 13 2008 8:23 AM

Because they made the claim to be the only enterprise-ready virtualization platform. They bragged about their stability, pointing to the number of years and the number of systems they have been deployed on. And they charge twice as much as the nearest competitor. Exchange was hardly the most expensive e-mail solution at the time of SP3; it was not cheap, but it was not the most expensive. When you pay good money for something, you do so because you expect a better product. How many of Citrix's buggy hotfixes shut down entire systems or crippled all of their advanced (pricey) functionality?

Guest wrote Re: Perception
on Wed, Aug 13 2008 8:30 AM
So every customer should have been expected to test for something nobody could have expected? They don't have access to the code, so they could not have known there was going to be a time-based kill switch in production code they had a license for. It would be like buying Windows Server 2003 for your system, running it for months, and then one day it won't turn on because a date has passed. No admin would ever have thought of testing for that. Now, if it were one of those 180-day trial copies they had installed, then they would know the day would come when their system would cease to function.

Putting the blame on the customer for lack of testing is pure bull@hit. VMware shoulders all the blame for not catching this bug; surely someone at the company intentionally placed this code in the product, and then everyone forgot about it before they released it to the entire world. And not as a beta, but as production-ready. Yeah, if everyone had waited a month before deploying it to all their systems, then it would not have impacted anyone. Then again, if nobody had deployed it, who would have uncovered the problem? Or what if it had been September 12th, or October 12th, when it took effect?
Guest wrote Re: what happened to having a SEPARATE *test* small farm to test updates before applying?
on Wed, Aug 13 2008 8:38 AM
Don't be ridiculous. We're not talking about a buggy NIC driver, system instability at high load, or an advanced feature that screwed up a vdisk here. If I had had a test farm and put this code on it for a week and hammered away at it, I would not have seen this issue as long as I completed my testing before August 11th. So no testing plan or test environment would have changed the results. VMware needs to make sure they include an uninstall feature in their ESX software. If I install an upgrade and it totally tanks my environment, let me uninstall it and roll back to the previous version. That ability would have saved everybody in this situation, except for those that deployed this new version right off the bat.
Matt Dean wrote Re: what happened to having a SEPARATE *test* small farm to test updates before applying?
on Wed, Aug 13 2008 8:41 AM

Yeah, small test environment....

The software is released on July 28th, I install it in my test farm on August 2nd, test for a full week, and then push it into production on August 9th. Then on August 12th it stops working... all because I wasn't savvy enough to roll my clocks into the future to test for time-based licensing code errors.

Good call.

Fortunately, this didn't happen to me, but it's a perfectly reasonable timeline for someone who was waiting on those new features.

Shanetech wrote Re: what happened to having a SEPARATE *test* small farm to test updates before applying?
on Wed, Aug 13 2008 8:48 AM

I would hope anyone in an enterprise environment would allow for more than 30 days of testing before deployment into production; hell, we can't even get out of qualification in 45 days... I think this probably had a bigger impact on those who rolled Update 2 out of the gate, and on smaller shops with point-and-click-happy update-and-defrag engineers.

Guest wrote Wow
on Wed, Aug 13 2008 8:49 AM

Seriously, who applies patches immediately without testing? Especially a large update to the core of your infrastructure. Basic administration, folks....

For most engineers, you wait for the user community to test non-critical patches. Thanks, everyone, for keeping me from installing Update 2!

 

Guest wrote Re: Perception
on Wed, Aug 13 2008 9:22 AM

Perhaps what I was trying to say was not clear. There is no way you can catch everything, even with "proper" testing, which is why you need a good DR plan. I'm not trying to take away or shift responsibility, but I find it interesting that people are acting like this was intentional. I am sure there will be opportunists who will use this time to spin how incompetent VMware is, even though history dictates otherwise.

 ~Gabe M.

Guest wrote Re: Wow
on Wed, Aug 13 2008 12:35 PM

I can't believe the number of people who think this is the "administrators' fault" for not testing appropriately?!? This is a royal f-up on VMware's part. Not every admin works in a perfect environment where management gives them 30 days to test patches and updates before rolling them out... does that mean you're a month behind on all your MS security patches? In that case, my friend, you're out of compliance with just about any validation organization in the world.

As the previous poster mentioned, this isn't a simple patch that affected printing on a single published application, or an OS hotfix that causes one server to blue screen. We're talking about hypervisors and multiple instances. VMware should be ashamed of their QA and regression testing. From the looks of it, they do all their testing on eval-licensed software... maybe they're the ones that need a lab with a small farm in it!!!!

 

Guest wrote time issue
on Wed, Aug 13 2008 12:46 PM
I think one thing to take away from this: perhaps we should add "adjust system clock to a future date" to our testing routines. It sounds like a stupid thing to have to do, but it would take very little time to test and may save some major headaches.
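
A minimal sketch of what such a step might look like on a disposable lab box, assuming the usual Linux tools are available; the smoke-test command is a placeholder for whatever power-on/VMotion check you already run, not anything VMware ships:

import subprocess
import time

FUTURE_DATES = ["2008-09-12 09:00:00", "2008-10-12 09:00:00", "2009-08-12 09:00:00"]
SMOKE_TEST = ["true"]  # placeholder: swap in your own power-on / VMotion check script

def clock_rollforward_test():
    """Jump the lab host's clock forward and run a smoke test at each date."""
    original = time.strftime("%Y-%m-%d %H:%M:%S")
    subprocess.check_call(["service", "ntpd", "stop"])    # keep NTP from undoing the jump
    try:
        for when in FUTURE_DATES:
            subprocess.check_call(["date", "-s", when])   # move the clock forward
            subprocess.check_call(SMOKE_TEST)             # raises if the check fails
    finally:
        subprocess.check_call(["date", "-s", original])   # put the clock back
        subprocess.check_call(["service", "ntpd", "start"])

if __name__ == "__main__":
    clock_rollforward_test()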
Shanetech wrote Re: Wow
on Wed, Aug 13 2008 3:24 PM
"We're talking about hypervisors and multiple instances"... exactly why I would hope administrators would have a defined, documented, and approved qualification plan for existing implementations. Because, as you said, this isn't just a hotfix that causes a blue screen, there is no reason it should be in lock step with the other technologies... Let's face it, there has no doubt been a paradigm shift with regard to MS patching and anti-virus installation. Eight years ago, only a small percentage of servers had A/V installed and were patched regularly (especially in the enterprise); now, speaking from our environment, we are about 30 to 45 days out from the Black Tuesday release. Zero-day exploits are certainly handled differently... I feel we have been lucky (where I work); MS05-019 was just one case of what it could have been. From my perspective, the remediation of 80,000 PCs is just as much of an impact, if not more, than losing an ESX host farm (or several). In the enterprise I would hope one would look at the host as a disposable commodity; I know we certainly do. Yes, this could have impacted us, however "if" it did (mind you, we can barely get 3.5 Update 1 to press), we also would have had a sound DR methodology.

Your point is valid. This is no doubt a shot to VMware's reputation and it definitely exposed some shoddy QC/QA processes. However, this should also be a lesson to us that trusting our vendors and upgrading to the latest and greatest "just because" will always set you up for a fall. It's just like drinking and driving: it's not a matter of if you get caught, it's when...... :-)

 

Jason Conomos wrote Re: what happened to having a SEPARATE *test* small farm to test updates before applying?
on Wed, Aug 13 2008 6:16 PM

I like this point especially. Many other pieces of major software in the enterprise have a method of rolling back a patch/update in case something unforeseen occurs. VMware does not include this functionality, which makes this problem much larger. Why is this not included?

Guest wrote Expect more to come
on Wed, Aug 13 2008 6:57 PM

Expect VMware to have more issues like this. Up until now they have been free to move at their own pace, because nobody else in the marketplace was close to them and they felt no pressure to push the envelope faster than they were comfortable with. Now, with increased competition from Citrix and Microsoft, they are trying to sustain their products' competitive advantage. The at-your-own-pace approach to software development reduces the chance of these kinds of bugs making it into release code, but rushing to add more and more features faster than you are accustomed to will cause issues like this bug to be missed until it is too late.

I would suggest that their development methodologies, and the way their developers work together, need to adjust to cope with the new pressure to release updates in a more competitive environment.

Guest wrote Re: Perception
on Thu, Aug 14 2008 1:58 PM

Dear Mr. bull@hit,

I've tried emailing you at the above address. Please check your mail server at the .hit domain.

Guest wrote not a perfect company
on Fri, Aug 15 2008 8:26 AM

Lots of VMware haters were waiting for something like this to happen. This was a royal screw-up on VMware's part, but you have to remember there were several workarounds to this issue. Not a preferred method, but yes, a workaround for keeping your VMs up and running. Like someone else posted, I also received both a call and an email the morning of the incident. Long term, VMware may have some additional mess-ups, but so does every other software company.

To reply to another post, I would also like to point out that through our TCO/ROI analysis we did find Citrix/Xen cheaper at face value, but with the higher VM density we saw with VMware, it ended up being cheaper after all. MS Hyper-V was a joke: fewer features and additional cost to come close to what VMware does. I can foresee a lot of bullets flying through the security of that hypervisor as it gets implemented more. I don't even want to get into the advanced features and the management suite offered by VMware. We are also going to purchase Site Recovery Manager; no other hypervisor vendor offers anything close to it. For me, VMware is the undisputed leader in the industry right now, and the non-influenced vision they carry is taking them in the right direction. I just hope they follow through on some of the next-generation features of VI.

Guest wrote Re: not a perfect company
on Fri, Aug 15 2008 2:25 PM
Very well stated. Our company found the same thing: lots of fancy talk from the Citrix/XenApp folks about the entire suite and its options, but WAY too complicated to implement with any unique variables, and not nearly as scalable as VMware's solution. We didn't get hit by the bug since we didn't apply the patch, but we watched the Communities thread as things developed. Funny how, when it comes to hypervisors and virtualization in general, folks want to treat Microsoft and Citrix as the poor disadvantaged underdogs....
