Wednesday, November 16, 2011

Recover from a missing kernel : The Problem

You read that right: a missing kernel. Although this sounds terminal, the fix was fortunately simple enough. If you are in a jam, the solution is here. But the story of how it got to this point is a cautionary tale of "a little knowledge is a dangerous thing". This is a long post.
A novice sysadmin in a small company had problems with CentOS VMs on VMware ESX version 3. I had set it up for them a few years earlier and had maintained it for a while after that. I recommended that management send their sysadmin for training on VMware administration, even if it wasn't for certification. They agreed in principle but never did anything about it. Don't get me wrong. The guy was smart. He shadowed my work, understood what I was doing and knew to ask questions when he didn't. He was not formally trained but was experienced in administering services (e.g. Samba, printing), so I think it was a normal progression for him to take on work more closely related to installation and configuration.

The VMware config - Linux Kernel Dance
I had not heard from them for a while when I began getting calls about "network problems". A quick look showed that the VMs running their DHCP and DNS servers had frozen up (if you are wondering, the Magic SysRq key, i.e. Alt-SysRq plus the REISUB sequence, "BUSIER" spelled backwards, works in the VMI console). Apparently the VMs were running out of resources, with the CPU hitting and sustaining 100% average utilization. There weren't many VMs on the server and, this being Linux, I knew I could cram in more than they were currently running. A closer look revealed that the cause was vmware-tools not being loaded. It wasn't being loaded because the sysadmin had updated the kernel but not reconfigured vmware-tools. This had been going on for some time, despite the message during bootup warning him about it.
I call this the Linux vmware-config dance. For reasons known only to VMware, Linux is a second-class citizen. Even though VMware ESX and ESXi, and their flagship product vSphere, run on a Linux kernel, Linux support comes second in everything. The all-powerful VMware Machine Interface (VMI) client is Windows only. Don't point to that pathetic web-based management system. On Linux, we could start and stop servers, but console access is broken or, at best, works sporadically. We can't even create a VM using it. It's better with VMware Server Free (previously GSX). The web interface provides full access, but it requires SSL 1.0 support, which is insecure and needs manual parameter configuration in Firefox to work.
The vmware-tools service gives the kernel optimized access to memory, disk and network. If it's not running, the VM can't do things like share memory with other VMs. Basically, it'll run slower and eat up more memory. And apparently, if you run it that way long enough, some resource gets gobbled up bit by bit without being properly released. The kicker is that since reconfiguring vmware-tools affects network access, you can't do it remotely via ssh. It must be done via the console, either through the web interface (VMware Server Free only) or the VMI for the paid stuff.
VMware requires vmware-tools to be reconfigured every time there is a kernel update. The updated kernel needs to be loaded first, so a reboot is required. If you use a Red Hat or SUSE kernel, the matching prebuilt vmware-tools modules will usually load fine. If they don't, the reconfiguration will recompile the modules, so you will need glibc and at least the kernel headers. Depending on the distro, you may need to install the kernel sources to get the headers. It's also good to restart the server after reconfiguring and reloading vmware-tools to test whether there are knock-on effects on other modules. So to recap: restart the machine to load the new kernel, reconfigure vmware-tools, then restart again to cleanly load and test vmware-tools and other modules. Now multiply that by the number of VMs you have and you are looking at spending a lot of time doing this.
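The check-then-reconfigure step in that recap can be sketched in shell. This is only a minimal sketch under stated assumptions, not VMware's documented procedure: the function name `tools_modules_present`, its optional root argument and the choice of `vmmemctl` as the marker module are all made up for illustration. `vmware-config-tools.pl` is the reconfiguration script that ships with vmware-tools.

```shell
#!/bin/sh
# Sketch: report whether a vmware-tools kernel module exists for a given kernel.
# Assumptions (not from the post): vmmemctl is used as the marker module, and
# the function name and its arguments are hypothetical.

# tools_modules_present KERNEL_VERSION [ROOT]
# Returns 0 if a vmmemctl module file exists under ROOT/lib/modules/KERNEL_VERSION.
tools_modules_present() {
    kver=$1
    root=${2:-}
    moddir="$root/lib/modules/$kver"
    [ -d "$moddir" ] || return 1
    # find exits 0 even when nothing matches, so test its output instead
    [ -n "$(find "$moddir" -name 'vmmemctl.ko*' 2>/dev/null | head -n 1)" ]
}

# Typical use from the VM console after rebooting into the new kernel:
#   tools_modules_present "$(uname -r)" || vmware-config-tools.pl
```

Run it from the console rather than over ssh, since the reconfiguration takes the network down while it works.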

Recover from a missing kernel : The Solution

This is part two of two parts. You can read about the problem here.
The Solution
The solution was simple. I needed to install a new kernel.
I found that the sysadmin had an ISO of the CentOS installation DVD on the VMware server's datastore. The beauty of most modern distros is that their installation CDs or DVDs come with a Repair Mode boot option. I modified the VM's settings to mount the ISO as the VM's CD-ROM. You may also have to change the VM's BIOS boot order to boot from the CD-ROM drive before the hard disk. The VM's settings under Boot have an option to boot straight into the VM's BIOS setup, which helps because, by default, the wait is too short for you to press the F2 key to enter the BIOS.
So I booted into the installation DVD's repair mode. It was all automatic. That is one of the nice things about using a VM environment: no hardware issues. Either your distro supports the virtual hardware at bootup or it doesn't (the usual culprit is the network interface driver). CentOS found the network interface and configured it, found and mounted the volumes and offered advice on how to chroot to the mounted disks, which I took. This makes the system think the root directory is the mounted one and not the DVD. Basically, it boots into your system from the DVD and then makes the system think that it booted from the hard disk. Other than the running kernel, everything else is loaded from the hard disk. /lib, /usr and /etc were where they should be. If there is no major incompatibility with the kernel, the existing utilities should run fine. I found yum was running OK. Why not? All the rpm databases and config files were right where it expected them to be. I installed the latest kernel with yum. There were no problems because the network card had been detected and was up. Once the kernel was installed, I shut down the VM, removed the ISO from the CD-ROM settings and restarted a-okay.
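The repair-mode session above boils down to a handful of commands. What follows is a hedged sketch rather than a transcript of what I typed: `/mnt/sysimage` is where the CentOS rescue environment normally mounts the installed system, and `has_kernel` is a hypothetical helper added only to confirm that a kernel image is present before rebooting.

```shell
#!/bin/sh
# Sketch of the repair-mode session. has_kernel is a hypothetical helper,
# not part of CentOS or the rescue environment.

# has_kernel [ROOT]
# Returns 0 if at least one vmlinuz-* image exists under ROOT/boot.
has_kernel() {
    root=${1:-}
    for image in "$root"/boot/vmlinuz-*; do
        # An unmatched glob stays literal, so confirm the file really exists
        [ -e "$image" ] && return 0
    done
    return 1
}

# Inside the rescue shell (assumes the installed system is at /mnt/sysimage):
#   has_kernel /mnt/sysimage || echo "no kernel image found in /boot"
#   chroot /mnt/sysimage
#   yum install kernel
#   has_kernel && echo "kernel installed"
#   exit
```

The check before and after is optional, but it is a cheap way to see that yum actually put a kernel image in /boot before you detach the ISO.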

Tuesday, November 01, 2011

The introduction to the comments section on http://www.ritholtz.com :
Please use the comments to demonstrate your own ignorance, unfamiliarity with empirical data, ability to repeat discredited memes, and lack of respect for scientific knowledge. Also, be sure to create straw men and argue against things I have neither said nor even implied. Any irrelevancies you can mention will also be appreciated. Lastly, kindly forgo all civility in your discourse . . . you are, after all, anonymous.
It goes without saying..
