Returning to investigations

Returning to last week’s investigations into VMware virtual machines and recent Linux kernels.

Back from Munich

Back from old Munich. This is a beautiful city that I discovered for the first time (I only knew the airport before). Nice people too. And I come back with a Red Hat 🙂 and some other nice photos.

Back to trying to figure out the various issues with VMware and recent Linux kernels. I’ll probably need to check with my management whether it’s OK for me to spend so much time on this. But for now, I would really appreciate it if things just worked. I made some progress this weekend, and got my internal partition to boot.

Reminder of the current state

As of late last week, I had the following issues:

  1. Booting a regular VM image file with a recent kernel hangs. I filed a Bugzilla. I bisected the problem to a specific commit, but later realized that the same version could boot or not boot based on “something else” which I have not identified yet. Specifically, a version as old as 4.9 can sometimes fail to boot, and more recent versions that I had marked as “good” also later failed to boot. Something else is at play there.
  2. Booting a physical partition in VMware proved a bit complicated. I added a physical disk with the relevant partitions, but that would not work.

Progress on the two fronts has been a bit slow, but steady.

Physical partition finally boots

After a lot of trial and error, I finally got the physical partition to boot under the following conditions:

  1. Set the firmware to EFI by adding the following line to the .vmx file:
    firmware = "efi"
  2. Use scsi and not sata for the disk interface, so copying the macOS VM was not such a good idea after all:
    scsi0:0.present = "TRUE"
    scsi0:0.fileName = "PhysicalLinuxPartitions.vmdk"
    sata0:0.present = "FALSE"
    
  3. Wait until network boot fails. Then, and only then, will it attempt to boot from hard disk. I’ve tried to force boot ordering with the following, but it does not seem to help:
    bios.bootOrder = "CDROM,hdd,ethernet1"
    bios.hddOrder = "scsi0:0"
  4. Boot the recovery mode image. The other images (including some I built myself) fail to find the hard disk to boot from, and end up in the dracut emergency shell. I’m a bit puzzled by that, and I want to figure it out. But at least, that gives me a workable physical partition.
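
To avoid re-checking the .vmx file by eye after each experiment, the settings from the list above can be verified with a few lines of Python. This is only a minimal sketch: the function names `parse_vmx` and `check_vmx` are made up for illustration, and it assumes the file uses the plain `key = "value"` form shown above.

```python
import re

# Settings taken from the list above; values are compared case-insensitively.
EXPECTED = {
    "firmware": "efi",
    "scsi0:0.present": "TRUE",
    "sata0:0.present": "FALSE",
}

# Matches lines of the form: key = "value"
_LINE = re.compile(r'^\s*([^#\s=]+)\s*=\s*"(.*)"\s*$')


def parse_vmx(text):
    """Parse `key = "value"` lines into a dict, skipping anything else."""
    settings = {}
    for line in text.splitlines():
        match = _LINE.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def check_vmx(text, expected=EXPECTED):
    """Return {key: (expected, found)} for each setting that differs.

    A key that is absent from the file is reported with found = None.
    """
    found = parse_vmx(text)
    problems = {}
    for key, want in expected.items():
        have = found.get(key)
        if have is None or have.lower() != want.lower():
            problems[key] = (want, have)
    return problems


if __name__ == "__main__":
    sample = (
        'firmware = "efi"\n'
        'scsi0:0.present = "TRUE"\n'
        'scsi0:0.fileName = "PhysicalLinuxPartitions.vmdk"\n'
        'sata0:0.present = "TRUE"\n'
    )
    print(check_vmx(sample))
```

An empty result means the three settings match. Anything else lists what to fix, here the leftover sata disk.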

Bisecting again

I am not sure yet what prevents graphical mode from booting under VMware. One thing I noticed by diff-ing a case that boots OK against a case that does not boot is the following line that only shows when it works:

[ 0.000000] Hypervisor detected: VMware
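
A plain diff of two boot logs is noisy because the timestamps rarely match. The sketch below shows one way to compare only the message text. The names `strip_timestamp` and `only_in_first` are hypothetical, and it assumes the logs use the usual `[  seconds.micros]` prefix seen in the excerpts here.

```python
import re

# Leading "[    0.000000] " style timestamp printed by the kernel.
_TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s?")


def strip_timestamp(line):
    """Remove the leading kernel timestamp, if any."""
    return _TIMESTAMP.sub("", line.rstrip("\n"))


def only_in_first(first_log, second_log):
    """Messages present in first_log but absent from second_log, in order."""
    seen = {strip_timestamp(line) for line in second_log.splitlines()}
    result = []
    for line in first_log.splitlines():
        msg = strip_timestamp(line)
        # Skip empty lines and do not report the same message twice.
        if msg and msg not in seen and msg not in result:
            result.append(msg)
    return result


if __name__ == "__main__":
    good = (
        "[    0.000000] Hypervisor detected: VMware\n"
        "[    0.000000] Modules linked in:\n"
    )
    bad = "[    0.000123] Modules linked in:\n"
    for msg in only_in_first(good, bad):
        print(msg)
```

Swapping the two arguments gives the messages that only show up in the failing boot.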

There are more oddities, e.g. what looks like a bad number of CPUs in the case that does not work:

[    0.000000] ------------[ cut here ]------------
[    0.000000] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/apic/apic.c:2065 __generic_processor_info+0x28c/0x370
[    0.000000] Only 63 processors supported.Processor 64/0x80 and the rest are ignored.
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.9.0-rc4+ #24
[    0.000000] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014
[    0.000000]  ffffffff81e03cb0 ffffffff8132bb47 ffffffff81e03d00 0000000000000000
[    0.000000]  ffffffff81e03cf0 ffffffff81059c26 0000081100001000 0000000000000040
[    0.000000]  0000000000000015 0000000000000000 0000000000000000 0000000000000080
[    0.000000] Call Trace:
[    0.000000]  [] dump_stack+0x4d/0x66
[    0.000000]  [] __warn+0xc6/0xe0
[    0.000000]  [] warn_slowpath_fmt+0x4a/0x50
[    0.000000]  [] __generic_processor_info+0x28c/0x370
[    0.000000]  [] acpi_register_lapic+0x32/0x80
[    0.000000]  [] acpi_parse_lapic+0x46/0x4e
[    0.000000]  [] acpi_parse_entries_array+0xf2/0x14d
[    0.000000]  [] acpi_table_parse_entries_array+0xae/0xd0
[    0.000000]  [] acpi_boot_init+0xdf/0x4a7
[    0.000000]  [] ? acpi_parse_x2apic_nmi+0x46/0x46
[    0.000000]  [] ? dmi_ignore_irq0_timer_override+0x2e/0x2e
[    0.000000]  [] setup_arch+0xafa/0xc00
[    0.000000]  [] ? printk+0x43/0x4b
[    0.000000]  [] start_kernel+0x59/0x3c7
[    0.000000]  [] x86_64_start_reservations+0x2a/0x2c
[    0.000000]  [] x86_64_start_kernel+0x178/0x18b
[    0.000000] ---[ end trace 01d8505d0b85a6ae ]---

Indeed, the default kernel configuration limits the number of CPUs to 64, which is good enough for most people, and should not be a problem under VMware. What this seems to tell me is that the ACPI description coming from VMware reports more than 64 CPUs, which is odd (I understand that you reserve a few spares for CPU hotplug, but not zillions). Maybe this is related to Linux not detecting it’s running within a hypervisor.

With 4.9, where I had both “good” and “bad” boots, I noticed that a build after make distclean and the default config booted again. So I’m attempting a new bisect, building like this at every step, and we’ll see where it leads me, if anywhere.
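
For reference, a clean rebuild at each bisect step could be scripted roughly as follows. This is a sketch under assumptions: it takes “the default config” to mean `make defconfig`, which the text does not confirm, and the helper names and the kernel tree path are placeholders.

```python
import os
import subprocess


def clean_build_commands(jobs=None):
    """Commands for one from-scratch build with the default configuration."""
    jobs = jobs or os.cpu_count() or 1
    return [
        ["make", "distclean"],   # remove build products and any old .config
        ["make", "defconfig"],   # assumption: "default config" means defconfig
        ["make", f"-j{jobs}"],   # parallel build
    ]


def clean_build(tree, jobs=None):
    """Run the commands in the kernel tree; raises CalledProcessError on failure."""
    for cmd in clean_build_commands(jobs):
        subprocess.run(cmd, cwd=tree, check=True)


if __name__ == "__main__":
    # Only print the plan here; call clean_build("/path/to/linux") to run it.
    for cmd in clean_build_commands():
        print(" ".join(cmd))
```

After each build, the kernel still has to be installed and booted in the VM by hand, and the outcome reported with `git bisect good` or `git bisect bad`.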

VMware performance

Something is badly affecting my VMware performance. It may be when I have more than one VM running, and the two VMs are busy. Or something I did not identify yet. But it really makes bisecting a bit painful.

Shuttle host

I’m going to try to set up the Shuttle host for comfortable remote and local use. It’s a nice machine, but if I want it to be connected over 1Gb ethernet to the file server, then it has to be in a spot that does not make it very convenient to use as a primary machine, for lack of screen real estate.

I opened the machine this morning to check whether I could transfer RAM from the tower PC. Unfortunately, at 16G, the Shuttle is at its max, so I’ll have to leave some VMs on the tower, using Windows as a host (24G).

System-wide update while I’m reading mail and bisecting my kernel…

Built-in remote access with VNC

The built-in remote access I get using the default Gnome settings is not compatible with the macOS Screen Sharing application. According to this page, it’s because the encryption used by Screen Sharing is not supported.

I tried to set it to ['none'] using dconf-editor as explained on the page, but neither Screen Sharing nor Chicken of the VNC is happy with it. The first one says that this version of the software is not supported by Screen Sharing. The second one complains about unknown authType 18. This is about as user-friendly as it can get 😉

Another option I then changed was “require authentication”. With that off, Screen Sharing no longer complains, but it spins forever, and I never actually see the screen, although the server states that the desktop is controlled by another machine.

OK, finally found the combination that works with Screen Sharing: switch “require encryption” to off, use ['vnc'] as the allowed method of authentication, and provide the VNC password to Screen Sharing. Weird, but it works. Time to stuff this machine chock full of VMs.
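
The same combination can be written down as gsettings commands instead of clicking through dconf-editor. The sketch below only builds the command lines. It assumes the built-in sharing is provided by Vino, with its `org.gnome.Vino` schema and a base64-encoded `vnc-password` key, which is not stated above. The helper name `vino_commands` is made up.

```python
import base64

# Assumption: Gnome's built-in screen sharing is Vino on this machine.
SCHEMA = "org.gnome.Vino"


def vino_commands(password):
    """gsettings commands for: no encryption, 'vnc' authentication, a password."""
    # Vino stores the VNC password base64-encoded.
    encoded = base64.b64encode(password.encode("utf-8")).decode("ascii")
    return [
        ["gsettings", "set", SCHEMA, "require-encryption", "false"],
        ["gsettings", "set", SCHEMA, "authentication-methods", "['vnc']"],
        ["gsettings", "set", SCHEMA, "vnc-password", encoded],
    ]


if __name__ == "__main__":
    # "secret" is a placeholder password.
    for cmd in vino_commands("secret"):
        print(" ".join(cmd))
```

Each list can be passed to `subprocess.run` on the machine being shared. Keep in mind that turning encryption off only makes sense on a trusted network.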

First Shuttle hang…

Ouch, the Shuttle PC did a hard hang while I was attempting to drag a window around. That’s a bit annoying. That PC has always been a bit fussy, but I did not expect it to start acting up so quickly. This is typical of my personal experience with PCs, which are often quite fast and inexpensive, but supremely unstable.

Linux is not at fault; it’s generally things like weak connectors or PCI cards that move inside, and I’ve seen the same stability issues with Windows as well. It’s still pretty annoying. I should not have opened that box this morning 🙂

Trying to install NFS-backed VMs (again)

Since the Shuttle PC is on the same 1Gb loop as the disk server, I’m tempted to try hosting the VMs on remote storage again. Testing with Virtual Machine Manager.

Attempt 1: I use a manual mount to the VM disk file. It fails with an access permission error from Qemu, stating that it cannot modify the disk as user 107 (qemu). I added the correct permission, but still no go. This may again be related to the SELinux warnings I got when trying to mount a disk over NFS with Boxes. But I ran the SELinux command to enable it.
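
One way to rule out plain Unix permissions before blaming SELinux is to test the mode bits of the disk file against the qemu user. The sketch below does only that. It ignores ACLs, SELinux, directory traversal rights and NFS identity mapping, and the group id 107 is an assumption (the error message only gives user 107). The function names are made up.

```python
import os
import stat


def can_read_write(st_mode, st_uid, st_gid, uid, gids):
    """True if the classic owner/group/other bits grant read and write."""
    if uid == st_uid:
        bits = (stat.S_IRUSR, stat.S_IWUSR)
    elif st_gid in gids:
        bits = (stat.S_IRGRP, stat.S_IWGRP)
    else:
        bits = (stat.S_IROTH, stat.S_IWOTH)
    return all(st_mode & bit for bit in bits)


def qemu_can_use(path, uid=107, gids=(107,)):
    """Check a file on disk for the qemu user (gid 107 is an assumption)."""
    st = os.stat(path)
    return can_read_write(st.st_mode, st.st_uid, st.st_gid, uid, set(gids))


if __name__ == "__main__":
    # A root-owned file with mode 0644 is not writable by user 107.
    print(can_read_write(0o100644, 0, 0, 107, {107}))
```

If this returns True and Qemu still reports a permission error, the cause is more likely on the SELinux or NFS export side.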

If I instead select the same location as an NFS mount point using the storage manager, this works a bit better. However, this time, when starting the VM, I get:

Unable to complete install: ‘Failed to connect socket to ‘/var/run/libvirt/virtlogd-sock’: No such file or directory’

Two nested single quotes :-O

Trying systemctl start virtlogd, which works, and allows me to go one step further, to a rather scary looking error message:

[Screenshot of the error message, taken 2017-01-16 at 16:25:01]

Switching the virtual disk back from SCSI to the original IDE gives me a message complaining about “Permission denied” on the ISO file. So clearly, Virtual Machine Manager / Qemu and Boxes do not work the same way here. Boxes is subject to SELinux permissions, VMM/Qemu has some other set of rules. Another little recursive chmod on the server, and I finally get a VM that looks like it boots.

Temporary conclusion: Can’t add a SCSI disk over NFS for now.

But then, it’s really fast! Until it dies, that is. Gnome shell crashes in the middle of the installation. Can’t get through. Tried to report the bug, but Gnome shell crashes again before I even get a chance to fill in enough information to report it 😦

So far, I’ve not been very lucky. I’m still trying to find some VM configuration that actually works… VMware won’t boot and is quite slow, Boxes dies on me for no reason and does not seem very happy with NFS, Qemu + KVM dies during installation… OK, every time, I’m trying configurations that are just a little bit off-road, but still…

Remote access using Spice

Of course, I’m more interested in having remote access using Spice. It would be nice if I could use the OSX work Christophe Fergeau did a while back. The steps are documented here.

Author: Christophe de Dinechin

I try to change the world, but that's work in progress. If you want to know me, google "Christophe de Dinechin". Keywords: concept programming, virtualization, OS design, programming languages, video games, 3D, modern physics. Some stuff I did that I'm proud of: the first "true" 3D game for the PC, HP's big iron virtualization, real-time test systems for car electronics, some of the best games for the HP48 calculator, a theory of physics that makes sense (at least to me).
