Virtual Machines

In the 1960s the idea emerged that different users should be able to access a computer as if each had total control of the physical hardware. In 1972 IBM introduced the VM/370 virtual machine operating system for its System/370 computers. This operating system, which we would now call a Virtual Machine Monitor [VMM] or Hypervisor, runs on the physical machine and provides the programs (operating systems) that run on it with the same environment as if they were running on the physical hardware. That means that the VMM creates for each guest OS, in virtual form, all the facilities of the underlying hardware. In this discussion we consider only virtual machines that are identical or similar to the underlying hardware, not virtual machines in the sense of the "Java virtual machine".
Virtual machines remained limited to high-end hardware and special needs until 1998, when VMware came up with a different strategy for building virtual machines. We will discuss VMware later.

Why Virtual Machines?

Here are some reasons that make VMs desirable:

Principal Obstacle to Virtual Machines

The VMM has to run in kernel mode so that it can execute privileged instructions and be protected from all other software. The x86 architecture provides four levels of protection, 0, 1, 2, 3, with 0 used by the kernel, 3 by application software, and 1 and 2 typically unused. The kernels of the guest operating systems are, by their nature, meant to run in kernel mode, but on a VM, so as not to interfere with the VMM, they must run in user mode (or at least in level 1 or 2). Yet they still execute privileged instructions. Such instructions, intended for kernel mode but actually executed in user mode, cause a trap to kernel mode and thus to the VMM. This switch from user mode to kernel mode on each privileged instruction executed by the guest OS is expensive, but it is not the greatest problem. The greatest problem is that there may be instructions that do not trap, yet whose behavior depends on the mode of the hardware (this is the case for the x86 architecture, up to and excluding the most recent models), or that allow the guest operating system to modify system state or to recognize that it (the guest OS) is not executing at its intended mode.
The difficulties of virtualization can be reduced in the case of OS Virtualization, that is, when we use only guest operating systems that are identical to the host operating system.
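
The trap path described above can be sketched as a toy simulation. This is a minimal illustration, not how any real VMM is written: the names `PRIVILEGED`, `Trap`, `VirtualCPU`, `hardware_execute` and `Vmm` are hypothetical, and the instruction set is reduced to a few opcodes. The guest runs at level 3, so each privileged instruction raises a simulated trap that the VMM handles by updating the guest's virtual state.

```python
# Toy model of trap-and-emulate. All names are hypothetical; the opcode set
# is a tiny illustrative subset of x86 privileged instructions.
PRIVILEGED = {"LGDT", "LIDT", "HLT"}


class Trap(Exception):
    """Raised by the simulated hardware for a privileged opcode outside level 0."""


class VirtualCPU:
    """Per-guest virtual state that the VMM maintains on behalf of the guest."""

    def __init__(self):
        self.gdtr = None
        self.idtr = None
        self.halted = False


def hardware_execute(opcode, level):
    """Simulated hardware: privileged opcodes are legal only at level 0."""
    if opcode in PRIVILEGED and level != 0:
        raise Trap(opcode)
    return f"{opcode} ran directly"


class Vmm:
    def __init__(self):
        self.trap_count = 0

    def run_guest(self, vcpu, program, guest_level=3):
        """Run (opcode, operand) pairs; emulate whatever traps."""
        log = []
        for opcode, operand in program:
            try:
                log.append(hardware_execute(opcode, guest_level))
            except Trap:
                self.trap_count += 1
                self.emulate(vcpu, opcode, operand)
                log.append(f"{opcode} emulated by VMM")
        return log

    def emulate(self, vcpu, opcode, operand):
        """Apply the instruction's effect to the virtual CPU, not the real one."""
        if opcode == "LGDT":
            vcpu.gdtr = operand
        elif opcode == "LIDT":
            vcpu.idtr = operand
        elif opcode == "HLT":
            vcpu.halted = True


vmm = Vmm()
vcpu = VirtualCPU()
log = vmm.run_guest(vcpu, [("ADD", None), ("LGDT", 0x1000), ("HLT", None)])
```

In this run the ordinary `ADD` executes without VMM involvement, while `LGDT` and `HLT` each cost one trap. The sketch only covers instructions that do trap; it says nothing about the harder case of sensitive instructions that silently behave differently in user mode.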

Popek and Goldberg Virtualization Requirements

Popek and Goldberg in 1974 defined the requirements for an architecture to efficiently support virtualization. They enunciated three requirements for the VMs provided by a VMM:
  • Equivalence: A program running on a VM should behave essentially identically to a program executing directly on the hardware.
  • Resource Control: The VMM must be in total control of the virtualized resources.
  • Efficiency: The great majority of the machine instructions must be executed without intervention of the VMM (i.e. without trap).

    They classified instructions into 4 classes:

    1. Privileged Instructions: instructions that if executed in user mode trap to kernel mode, but if executed in kernel mode they do not trap.
    2. Control Sensitive Instructions: instructions that read or modify the system registers. Examples of such instructions in x86 are: PUSHF, POPF, SGDT, SIDT, SLDT, SMSW.
    3. Behavior Sensitive Instructions: instructions whose behavior depends on the mode or configuration of the hardware. Examples of such instructions on x86 are POP, PUSH, CALL, JMP, INT n, RET, LAR, LSL, VERR, VERW, MOV.
    4. Normal Instructions: The remaining instructions.
    They then stated the following theorem:

    Theorem: An architecture is virtualizable if its sensitive instructions are a subset of its privileged instructions.

    The IBM System/370 is a directly virtualizable architecture; x86 is not. Nevertheless, our discussion will focus mainly on x86.
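
The condition in the theorem is a plain subset test, which can be sketched as follows. The sensitive sets reuse the x86 examples listed above. `PRIVILEGED_X86` is an assumption added for illustration: a small, incomplete sample of privileged x86 instructions. The function names are hypothetical.

```python
# Sensitive instruction examples copied from the classification above.
CONTROL_SENSITIVE = {"PUSHF", "POPF", "SGDT", "SIDT", "SLDT", "SMSW"}
BEHAVIOR_SENSITIVE = {
    "POP", "PUSH", "CALL", "JMP", "INT n", "RET",
    "LAR", "LSL", "VERR", "VERW", "MOV",
}

# Assumption: an illustrative, incomplete sample of privileged x86 instructions.
PRIVILEGED_X86 = {"LGDT", "LIDT", "LMSW", "CLTS", "HLT"}


def satisfies_theorem(sensitive, privileged):
    """True when every sensitive instruction is also privileged."""
    return sensitive <= privileged


def problem_instructions(sensitive, privileged):
    """Sensitive instructions that would run in user mode without trapping."""
    return sensitive - privileged


x86_sensitive = CONTROL_SENSITIVE | BEHAVIOR_SENSITIVE
x86_ok = satisfies_theorem(x86_sensitive, PRIVILEGED_X86)
x86_problems = problem_instructions(x86_sensitive, PRIVILEGED_X86)

# A made-up architecture in which every sensitive instruction is privileged.
toy_ok = satisfies_theorem({"SETMODE"}, {"SETMODE", "HALT"})
```

With these sets, `x86_ok` is false and `x86_problems` contains instructions such as POPF and SGDT, which is exactly the group that binary translation or paravirtualization has to deal with.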

    Hardware Virtualization

    To virtualize a machine we need to worry about virtualizing the CPU, the memory, and the devices and I/O.

    CPU Virtualization

    We have seen that privileged instructions result in a trap to the VMM, where their effect has to be simulated. The greatest problem is the sensitive instructions that are not privileged. The solution for them, and for some privileged instructions, is Binary Translation (see below), as introduced for example by VMware, or Para Virtualization (see below), as introduced for example by Xen.

    Memory Virtualization

    Most modern architectures have Memory Management Units (MMU) and Translation Lookaside Buffers (TLB). Memory is divided into pages of equal size that are mapped to physical memory frames of the same size. The mapping from page id to frame id is done by the TLB, if that page id is in the TLB, or otherwise by a slower translation involving page tables [in 32-bit x86, two levels of page tables]. If the page is not in physical memory we have a page fault. When virtualizing memory the guest operating system cannot manage the page frames; that has to be done by the VMM. The guest OS can still use the page tables in address translations, it just cannot itself manage the frames associated with pages. As with a switch between address spaces in a single OS, the TLB needs to be flushed when moving from one space to another, or else the virtual addresses need to be tagged with ids for the address spaces. The same is true for switches between VMs. x86 is moving to tagged virtual addresses in the TLB.
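
The translation path described above (TLB first, then a two-level page table walk, then a page fault if no mapping exists) can be sketched as follows. This is a simplified model that assumes the classic 32-bit split of 10 directory bits, 10 table bits and 12 offset bits. The names `TaggedTlb`, `walk`, `translate` and `PageFault` are hypothetical, and nested dictionaries stand in for the page tables. TLB entries are tagged with an address space id, so two VMs can use the same virtual address without a flush.

```python
PAGE_SHIFT = 12                      # 4 KiB pages
INDEX_BITS = 10                      # bits per page table level
INDEX_MASK = (1 << INDEX_BITS) - 1
OFFSET_MASK = (1 << PAGE_SHIFT) - 1


class PageFault(Exception):
    """No valid mapping exists for the requested virtual address."""


class TaggedTlb:
    """TLB whose entries are keyed by (address space id, page number)."""

    def __init__(self):
        self.entries = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, asid, page):
        frame = self.entries.get((asid, page))
        if frame is None:
            self.misses += 1
        else:
            self.hits += 1
        return frame

    def insert(self, asid, page, frame):
        self.entries[(asid, page)] = frame


def walk(page_directory, vaddr):
    """Two-level walk: directory index, then table index, giving a frame number."""
    dir_index = (vaddr >> (PAGE_SHIFT + INDEX_BITS)) & INDEX_MASK
    table_index = (vaddr >> PAGE_SHIFT) & INDEX_MASK
    page_table = page_directory.get(dir_index)
    if page_table is None or table_index not in page_table:
        raise PageFault(hex(vaddr))
    return page_table[table_index]


def translate(tlb, asid, page_directory, vaddr):
    """Translate a virtual address, consulting the TLB before the page tables."""
    page = vaddr >> PAGE_SHIFT
    frame = tlb.lookup(asid, page)
    if frame is None:
        frame = walk(page_directory, vaddr)
        tlb.insert(asid, page, frame)
    return (frame << PAGE_SHIFT) | (vaddr & OFFSET_MASK)


# Two VMs map the same virtual page to different frames chosen by the VMM.
vm1_tables = {1: {1: 0x10}}
vm2_tables = {1: {1: 0x20}}
tlb = TaggedTlb()
first = translate(tlb, 1, vm1_tables, 0x00401234)   # miss, then page walk
other = translate(tlb, 2, vm2_tables, 0x00401234)   # miss: different tag
again = translate(tlb, 1, vm1_tables, 0x00401234)   # hit, no flush needed
```

Because the tag is part of the key, switching from VM 1 to VM 2 and back leaves VM 1's entry usable. Without the tag, the TLB would have to be emptied at every switch to keep VM 2 from seeing VM 1's mapping.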

    Device and I/O Virtualization

    The VMM usually provides the VMs with a set of virtual devices (virtual device drivers) corresponding to the usual devices but allowing the VMM to schedule their use and manage interrupts. For example, a virtual Network Interface Card (NIC) may allow communication between co-residing VMs at memory speed, without going through a physical network. The management of interrupts is particularly complex, since guest OSs may enable and disable interrupts frequently and the VMM, to reduce latency, may wish to map interrupts directly to guest OS handlers.

    Binary Translation

    Binary translation can be static or dynamic. Static binary translation means using a processor to translate an image from one architecture to another [or possibly to the same architecture, but with constraints on the instructions used or the way they are used] before execution. It is not relevant to our current discussion, which centers on dynamic binary translation. In dynamic binary translation individual instructions or groups of instructions are translated on the fly, and the translation is cached so that it can be reused in later iterations without repeated translation. To be more precise, a segment of the original code is first executed in an interpreted mode and collected as a segment. Each time the segment is reached, it is determined whether it has been executed at least N times. If so, the segment is translated (unless it already has been) and the translation is executed; from then on the translated code is executed.
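
The interpret, count, translate and cache cycle described above can be sketched with a toy instruction set. Everything here is an assumption made for illustration: the two-opcode "guest" language, the class name `DynamicTranslator`, and the choice to "translate" by generating a Python function. A real translator emits machine code, but the control flow around the translation cache has the same shape.

```python
HOT_THRESHOLD = 3  # the "N" of the text; the value is arbitrary


class DynamicTranslator:
    def __init__(self, threshold=HOT_THRESHOLD):
        self.threshold = threshold
        self.exec_counts = {}        # segment id -> times reached so far
        self.translation_cache = {}  # segment id -> translated function
        self.translations_done = 0

    def interpret(self, segment, state):
        """Slow path: decode and execute one instruction at a time."""
        for op, arg in segment:
            if op == "add":
                state["acc"] += arg
            elif op == "mul":
                state["acc"] *= arg
            else:
                raise ValueError(f"unknown opcode {op!r}")

    def translate(self, segment):
        """Generate a Python function equivalent to the whole segment."""
        self.translations_done += 1
        lines = ["def translated(state):", "    acc = state['acc']"]
        for op, arg in segment:
            if op == "add":
                lines.append(f"    acc += {arg!r}")
            elif op == "mul":
                lines.append(f"    acc *= {arg!r}")
            else:
                raise ValueError(f"unknown opcode {op!r}")
        lines.append("    state['acc'] = acc")
        namespace = {}
        exec("\n".join(lines), namespace)
        return namespace["translated"]

    def run(self, segment_id, segment, state):
        """Execute a segment once and report which path was taken."""
        cached = self.translation_cache.get(segment_id)
        if cached is not None:
            cached(state)
            return "translated"
        count = self.exec_counts.get(segment_id, 0) + 1
        self.exec_counts[segment_id] = count
        if count >= self.threshold:
            cached = self.translate(segment)
            self.translation_cache[segment_id] = cached
            cached(state)
            return "translated"
        self.interpret(segment, state)
        return "interpreted"


translator = DynamicTranslator()
segment = [("add", 2), ("mul", 3)]
state = {"acc": 1}
modes = [translator.run("loop_body", segment, state) for _ in range(4)]
```

The segment is interpreted on its first two executions, translated on the third, and served from the cache on the fourth, so the translation cost is paid once. The threshold keeps code that runs only once or twice from being translated at all.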

    Para Virtualization

    Whereas the original intent for a VMM and VM is to run an unmodified guest OS on the VM as if it were executing directly on the hardware, an alternative is to modify the guest OS, replacing hard-to-virtualize instructions and/or providing more convenient abstractions, especially of I/O devices. Paravirtualization often results in more efficient code, but of course it limits the range of usable guest OSs and the convenience of their use.
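
The idea of replacing a hard-to-virtualize instruction with an explicit request to the VMM can be sketched as follows. This is a toy model under stated assumptions: the classes `Hypervisor` and `ParavirtualizedKernel` and the call names `set_interrupts` and `get_interrupts` are hypothetical and do not correspond to the interface of Xen or any other real hypervisor.

```python
class Hypervisor:
    """Toy VMM exposing explicit entry points instead of relying on traps."""

    def __init__(self):
        self.virtual_interrupt_flag = {}  # VM id -> virtual interrupt-enable flag
        self.hypercalls = 0

    def hypercall(self, vm_id, name, *args):
        self.hypercalls += 1
        if name == "set_interrupts":
            (enabled,) = args
            self.virtual_interrupt_flag[vm_id] = bool(enabled)
            return None
        if name == "get_interrupts":
            return self.virtual_interrupt_flag.get(vm_id, True)
        raise ValueError(f"unknown hypercall {name!r}")


class ParavirtualizedKernel:
    """Guest kernel whose sensitive operations were rewritten as hypercalls."""

    def __init__(self, vm_id, hypervisor):
        self.vm_id = vm_id
        self.hypervisor = hypervisor

    def disable_interrupts(self):
        # An unmodified kernel would issue an instruction here; this one asks the VMM.
        self.hypervisor.hypercall(self.vm_id, "set_interrupts", False)

    def enable_interrupts(self):
        self.hypervisor.hypercall(self.vm_id, "set_interrupts", True)

    def interrupts_enabled(self):
        return self.hypervisor.hypercall(self.vm_id, "get_interrupts")


hv = Hypervisor()
guest_a = ParavirtualizedKernel("vm-a", hv)
guest_b = ParavirtualizedKernel("vm-b", hv)
guest_a.disable_interrupts()
a_enabled = guest_a.interrupts_enabled()
b_enabled = guest_b.interrupts_enabled()
```

Each guest sees only its own virtual flag, and the VMM never has to detect or decode a problematic instruction. The cost is visible in the sketch too: the guest kernel had to be changed to know about the hypervisor.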

    Evolution of x86 Architecture to support Virtualization

    Recent x86 systems try to facilitate virtualization by dealing with the sensitive instructions that are not privileged and by supporting memory management in hardware. They have introduced a new mode, root mode. The processor mode now consists of one of the usual four levels paired with a normal/root mode. All sensitive instructions executed in normal mode cause a transfer to root mode, where the VMM runs.

    Hosted Virtualization

    Up to now we have talked of virtual machines as executing guest OSs on top of a VMM on top of the hardware (bare-metal virtualization). Alternatively, a VM and its guest OS can execute on top of a host OS (hosted virtualization). That is the case, for example, with the VMware Player. One of the images that can be run on a host OS with the VMware Player is Hadoop, an open source implementation of Google's MapReduce framework.