Efficiency: The great majority of the machine instructions must be
executed without intervention of the VMM (i.e. without trap).
They classified instructions into 4 classes:
- Privileged Instructions: instructions that trap to kernel mode when executed
in user mode, but do not trap when executed in kernel mode.
- Control Sensitive Instructions: instructions that modify, or (on x86) expose,
the system registers. Examples of such instructions in x86 are:
PUSHF, POPF, SGDT, SIDT, SLDT, SMSW (only POPF writes a register; the others read one).
- Behavior Sensitive Instructions: instructions whose behavior
depends on the mode or configuration of the hardware. Examples of such instructions on
x86 are POP, PUSH, CALL, JMP, INT n, RET, LAR, LSL, VERR, VERW, MOV.
- Normal Instructions: all remaining instructions.
They then stated the following theorem:
Theorem: An architecture is virtualizable if its sensitive (control sensitive
or behavior sensitive) instructions are a subset of its privileged instructions.
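The IBM System/370 is a directly virtualizable architecture; x86 is not, because
the x86 instructions listed above are sensitive but not privileged, so they do
not trap. Yet our discussion will focus mainly on x86.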
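The theorem's condition is a simple set inclusion, so it can be checked mechanically. Below is a small Python sketch using the x86 examples from the lists above. The PRIVILEGED set is an assumption: a partial list of x86 instructions (LGDT, LIDT, LLDT, LMSW, CLTS, HLT) that do trap outside ring 0. Classifying by mnemonic alone is coarse (some forms of MOV, e.g. writes to control registers, are privileged), so treat this only as an illustration; the function name is hypothetical.

```python
# Instruction classes from the notes (x86 examples), as mnemonic strings.
CONTROL_SENSITIVE = {"PUSHF", "POPF", "SGDT", "SIDT", "SLDT", "SMSW"}
BEHAVIOR_SENSITIVE = {"POP", "PUSH", "CALL", "JMP", "INT n", "RET",
                      "LAR", "LSL", "VERR", "VERW", "MOV"}
SENSITIVE = CONTROL_SENSITIVE | BEHAVIOR_SENSITIVE

# Assumption: a partial list of x86 instructions that trap outside ring 0.
PRIVILEGED = {"LGDT", "LIDT", "LLDT", "LMSW", "CLTS", "HLT"}


def popek_goldberg_holds(sensitive: set[str],
                         privileged: set[str]) -> tuple[bool, set[str]]:
    """Check the sufficient condition 'sensitive is a subset of privileged'.

    Returns whether it holds, plus the sensitive instructions that would
    not trap (the ones a trap-and-emulate VMM never gets to see).
    """
    not_trapping = sensitive - privileged
    return len(not_trapping) == 0, not_trapping


holds, offenders = popek_goldberg_holds(SENSITIVE, PRIVILEGED)
print(holds)           # False
print(len(offenders))  # 17
```

All 17 listed x86 instructions end up in the offending set, which is exactly why classic x86 fails the test.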
Hardware Virtualization
To virtualize a machine we need to:
- Virtualize the processor control functions and the privileged instructions
- Virtualize memory and virtual memory
- Virtualize devices and I/O
CPU Virtualization
We have seen that privileged instructions executed by a guest OS result in a trap to
the VMM, where their effect has to be emulated. The greatest problem is the sensitive
instructions that are not privileged, since they do not trap at all. The solutions for
them, and for some privileged instructions, are Binary Translation (see below), as
introduced for example by VMware, and Paravirtualization (see below), as introduced
for example by Xen.
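To see why sensitive but unprivileged instructions break trap-and-emulate, here is a toy Python model. It assumes the guest kernel runs deprivileged in user mode with IOPL 0, so that CLI and STI trap; all names (VirtualCPU, hardware_execute, run_guest) are hypothetical. CLI traps and the VMM updates the guest's virtual interrupt flag. POPF, which on real x86 silently leaves the interrupt flag unchanged in this setting, never reaches the VMM.

```python
from dataclasses import dataclass


class PrivilegeTrap(Exception):
    """Toy hardware signal: a privileged instruction ran in user mode."""


@dataclass
class VirtualCPU:
    """Per-guest state kept by the (hypothetical) VMM."""
    interrupts_enabled: bool = True  # the guest's virtual IF flag


def hardware_execute(instr: str, arg: int = 0) -> None:
    """Toy model of executing one instruction in user mode, where the
    deprivileged guest kernel runs. Assumes IOPL=0, so CLI/STI trap."""
    if instr in ("CLI", "STI"):
        raise PrivilegeTrap(instr)  # privileged: control passes to the VMM
    if instr == "POPF":
        # Sensitive but unprivileged: in user mode the IF bit in `arg` is
        # silently ignored. No trap, so the VMM never sees it.
        return
    # Normal instructions just execute directly; nothing to model here.


def run_guest(program: list[tuple[str, int]], vcpu: VirtualCPU) -> None:
    """Trap-and-emulate loop: run each instruction directly and emulate
    its effect on the virtual CPU only when the hardware traps."""
    for instr, arg in program:
        try:
            hardware_execute(instr, arg)
        except PrivilegeTrap:
            # VMM trap handler: apply the instruction's effect to virtual state.
            vcpu.interrupts_enabled = (instr == "STI")


vcpu = VirtualCPU()
run_guest([("CLI", 0)], vcpu)
print(vcpu.interrupts_enabled)  # False: CLI trapped and was emulated

vcpu = VirtualCPU()
run_guest([("POPF", 0)], vcpu)  # guest intends to clear IF via POPF
print(vcpu.interrupts_enabled)  # True: POPF never trapped, the VMM missed it
```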
Memory Virtualization
Most modern architectures have Memory Management Units (MMU) and
Translation Lookaside Buffers (TLB). Memory is divided in pages of equal
size that are mapped to physical memory frames of that same size. The
mapping from page id to frame id is done by the TLB if that page id is
in the TLB, or otherwise by a slower translation involving page tables [two
levels of page tables in classic 32-bit x86; x86-64 normally uses four]. If
the page is not in physical memory, a page fault occurs. When virtualizing
memory, the guest operating system cannot manage the page frames; that has to
be done by the VMM. The guest OS can still use its own page tables in address
translations; it just cannot itself manage the frames associated with pages.
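The translation path described above, combined with the VMM owning the frames, can be sketched in Python. The sketch assumes classic 32-bit x86 paging with 4 KiB pages (10-bit directory index, 10-bit table index, 12-bit offset) and models the guest page tables, the TLB, and the VMM's guest-frame-to-machine-frame map as plain dicts; the function and exception names are hypothetical. Real VMMs implement the second mapping with shadow page tables or hardware nested paging rather than a dict lookup.

```python
PAGE_SHIFT = 12                      # 4 KiB pages
INDEX_BITS = 10                      # 32-bit x86 without PAE: 10 + 10 + 12 bits
INDEX_MASK = (1 << INDEX_BITS) - 1
OFFSET_MASK = (1 << PAGE_SHIFT) - 1


class GuestPageFault(Exception):
    """Page not mapped by the guest's page tables: the guest OS handles it."""


class VMMFault(Exception):
    """Guest frame not backed by a machine frame: the VMM handles it."""


def translate(vaddr: int,
              guest_page_dir: dict[int, dict[int, int]],
              vmm_frames: dict[int, int],
              tlb: dict[int, int]) -> int:
    """Map a guest virtual address to a machine address."""
    vpn = vaddr >> PAGE_SHIFT                 # virtual page number
    offset = vaddr & OFFSET_MASK
    if vpn in tlb:                            # fast path: TLB hit
        return (tlb[vpn] << PAGE_SHIFT) | offset

    # Slow path: two-level walk of the guest's own page tables.
    dir_index = (vaddr >> (PAGE_SHIFT + INDEX_BITS)) & INDEX_MASK
    table_index = vpn & INDEX_MASK
    page_table = guest_page_dir.get(dir_index, {})
    if table_index not in page_table:
        raise GuestPageFault(hex(vaddr))
    guest_frame = page_table[table_index]

    # The guest only names a frame; the VMM decides which real frame backs it.
    if guest_frame not in vmm_frames:
        raise VMMFault(guest_frame)
    machine_frame = vmm_frames[guest_frame]

    tlb[vpn] = machine_frame                  # cache the full translation
    return (machine_frame << PAGE_SHIFT) | offset


tlb: dict[int, int] = {}
print(hex(translate(0x00403123, {1: {3: 7}}, {7: 0x2A}, tlb)))  # 0x2a123
```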
As with switches between address spaces within a single OS, the TLB needs to
be flushed when moving from one address space to another, unless the TLB
entries are tagged with address-space ids. The same is true for switches
between VMs. Recent x86 processors support tagged TLB entries (Intel's VPIDs,
AMD's ASIDs).
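A toy Python comparison of flushing versus tagging follows, with hypothetical class names and a dict standing in for the TLB. Two VMs map the same virtual page to different frames, and the demo checks whether VM 1's entry is still cached after switching away and back.

```python
from typing import Optional


class UntaggedTLB:
    """Entries keyed by virtual page only, so they must be flushed on a switch."""

    def __init__(self) -> None:
        self.entries: dict[int, int] = {}

    def switch_to(self, asid: int) -> None:
        self.entries.clear()  # entries of the old space would be wrong here

    def insert(self, vpn: int, frame: int) -> None:
        self.entries[vpn] = frame

    def lookup(self, vpn: int) -> Optional[int]:
        return self.entries.get(vpn)


class TaggedTLB:
    """Entries keyed by (address-space id, virtual page): no flush needed."""

    def __init__(self) -> None:
        self.entries: dict[tuple[int, int], int] = {}
        self.asid = 0

    def switch_to(self, asid: int) -> None:
        self.asid = asid  # entries of other spaces stay cached

    def insert(self, vpn: int, frame: int) -> None:
        self.entries[(self.asid, vpn)] = frame

    def lookup(self, vpn: int) -> Optional[int]:
        return self.entries.get((self.asid, vpn))


def demo(tlb) -> Optional[int]:
    """VM 1 and VM 2 both use virtual page 5; return VM 1's entry after switching back."""
    tlb.switch_to(1)
    tlb.insert(5, 10)
    tlb.switch_to(2)
    tlb.insert(5, 20)
    tlb.switch_to(1)
    return tlb.lookup(5)


print(demo(UntaggedTLB()))  # None: flushed, so a slow page-table walk is needed
print(demo(TaggedTLB()))    # 10: VM 1's entry survived the switches
```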
Device and I/O Virtualization
The VMM usually provides the VMs with a set of virtual devices
(virtual device drivers)
corresponding to the usual devices, but allowing the VMM to schedule their use
and manage interrupts. For example, a virtual Network Interface Card
(NIC) may allow co-resident VMs to communicate at memory speed, without
going through a physical network. The management of interrupts is particularly
complex, since guest OSs may enable and disable interrupts frequently,
and the VMM, to reduce latency, may want to deliver interrupts directly to
guest OS handlers.
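A minimal Python sketch, assuming a VMM that connects co-resident VMs through in-memory queues; all class and method names are hypothetical. It also shows one way the VMM can respect a guest's interrupt masking: the virtual interrupt stays pending while the guest has interrupts disabled and is delivered when they are re-enabled.

```python
from collections import deque


class GuestVM:
    """Hypothetical guest with one virtual NIC."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.rx_queue = deque()          # receive ring, lives in host memory
        self.interrupts_enabled = True   # guest's virtual interrupt flag
        self.irq_pending = False         # virtual interrupt not yet delivered
        self.received: list[bytes] = []  # frames consumed by the guest's handler

    def nic_interrupt_handler(self) -> None:
        """Guest driver: drain the receive queue."""
        while self.rx_queue:
            self.received.append(self.rx_queue.popleft())


class VMM:
    """Hypothetical VMM that switches frames between co-resident VMs."""

    def __init__(self) -> None:
        self.vms: dict[str, GuestVM] = {}

    def attach(self, vm: GuestVM) -> None:
        self.vms[vm.name] = vm

    def send(self, dst: str, frame: bytes) -> None:
        """Virtual NIC transmit: a memory copy into the peer's queue, no real network."""
        vm = self.vms[dst]
        vm.rx_queue.append(frame)
        vm.irq_pending = True
        self._deliver(vm)

    def set_interrupts(self, vm: GuestVM, enabled: bool) -> None:
        """Emulated enable/disable; re-enabling may release a held interrupt."""
        vm.interrupts_enabled = enabled
        self._deliver(vm)

    def _deliver(self, vm: GuestVM) -> None:
        # Inject the virtual interrupt only while the guest accepts interrupts.
        if vm.irq_pending and vm.interrupts_enabled:
            vm.irq_pending = False
            vm.nic_interrupt_handler()


vmm = VMM()
a, b = GuestVM("a"), GuestVM("b")
vmm.attach(a)
vmm.attach(b)
vmm.set_interrupts(b, False)  # guest b disables interrupts
vmm.send("b", b"hello")       # frame is queued, interrupt is held
print(b.received)             # []
vmm.set_interrupts(b, True)   # pending interrupt is delivered now
print(b.received)             # [b'hello']
```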
Binary Translation
Binary translation can be static or dynamic. Static binary translation means using a
translator to convert an executable image from one architecture to another [or to the
same architecture, but with constraints on which instructions are used or how they
are used] before execution. It
is not relevant to our current discussion, which centers on dynamic binary translation.
In dynamic binary translation, individual instructions or groups of instructions are
translated on the fly, and the translation is cached so that it can be reused in later
iterations without being translated again. More precisely, a segment of the original code
is first executed in interpreted mode and recorded as a segment. Each time it runs, the
translator checks whether the segment has been executed at least N times; if so, the
segment is translated (unless it already has been) and executed, and from then on the
translated code is executed.
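The interpret, count, translate, and cache cycle can be sketched in Python as follows. The threshold value, the segment ids, and the "translation" itself (replacing each control-sensitive instruction with a call to a hypothetical vmm_emulate_* routine) are assumptions for illustration, not VMware's actual translator; execute only reports which path a run would take.

```python
# Assumption: only the control-sensitive x86 examples from the notes, for brevity.
SENSITIVE = {"PUSHF", "POPF", "SGDT", "SIDT", "SLDT", "SMSW"}


class DynamicTranslator:
    """Interpret a code segment until it has run N times, then translate and cache it."""

    def __init__(self, hot_threshold: int = 3) -> None:
        self.n = hot_threshold                 # the N from the notes (value assumed)
        self.exec_counts: dict[str, int] = {}  # segment id -> executions so far
        self.cache: dict[str, list[str]] = {}  # segment id -> translated code
        self.translations = 0

    def translate(self, segment: list[str]) -> list[str]:
        """Toy translation: replace each sensitive instruction with a VMM call."""
        self.translations += 1
        return [f"CALL vmm_emulate_{op}" if op in SENSITIVE else op
                for op in segment]

    def execute(self, seg_id: str, segment: list[str]) -> str:
        """Run one execution of the segment and report which path was taken."""
        if seg_id in self.cache:                   # already translated: reuse it
            return "translated"
        self.exec_counts[seg_id] = self.exec_counts.get(seg_id, 0) + 1
        if self.exec_counts[seg_id] >= self.n:     # hot: translate once, run it
            self.cache[seg_id] = self.translate(segment)
            return "translated"
        return "interpreted"                       # still cold: keep interpreting


dbt = DynamicTranslator(hot_threshold=3)
loop_body = ["ADD", "POPF", "SUB"]
print([dbt.execute("loop", loop_body) for _ in range(5)])
# ['interpreted', 'interpreted', 'translated', 'translated', 'translated']
print(dbt.cache["loop"])  # ['ADD', 'CALL vmm_emulate_POPF', 'SUB']
print(dbt.translations)   # 1
```

Counting before translating keeps the one-time translation cost off code that runs only a few times.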
Paravirtualization
Whereas the original intent of a VMM and VM is to run an unmodified guest OS on the VM
as if it were executing directly on the hardware, an alternative is to modify the
guest OS, replacing hard-to-virtualize instructions and/or providing more convenient
abstractions, especially of I/O devices. Paravirtualization often results in more
efficient code, but of course it limits the range of usable guest OSs and the convenience
of their use.
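As a sketch of the idea, assuming a hypothetical hypercall interface (not Xen's actual API), a paravirtualized guest kernel can replace the instructions it uses to mask interrupts with an explicit call into the hypervisor, so the change can never be silently lost. Hypervisor, ParavirtKernel, and the "set_interrupts" hypercall name are all illustrative.

```python
class Hypervisor:
    """Hypothetical hypervisor exposing a tiny hypercall table."""

    def __init__(self) -> None:
        self.virtual_if: dict[int, bool] = {}  # guest id -> virtual interrupt flag
        self.log: list[tuple[int, str]] = []   # hypercalls received
        self._handlers = {"set_interrupts": self._set_interrupts}

    def hypercall(self, guest_id: int, name: str, *args) -> None:
        """Single, explicit entry point from guest kernels into the hypervisor."""
        self.log.append((guest_id, name))
        self._handlers[name](guest_id, *args)

    def _set_interrupts(self, guest_id: int, enabled: bool) -> None:
        self.virtual_if[guest_id] = enabled


class ParavirtKernel:
    """Guest kernel source modified to use hypercalls instead of CLI/STI/POPF."""

    def __init__(self, guest_id: int, hv: Hypervisor) -> None:
        self.guest_id = guest_id
        self.hv = hv

    def disable_interrupts(self) -> None:
        # Unmodified kernel: CLI, or POPF with IF clear (silently ignored when
        # the kernel runs deprivileged). Modified kernel: an explicit hypercall.
        self.hv.hypercall(self.guest_id, "set_interrupts", False)

    def enable_interrupts(self) -> None:
        self.hv.hypercall(self.guest_id, "set_interrupts", True)


hv = Hypervisor()
kernel = ParavirtKernel(guest_id=1, hv=hv)
kernel.disable_interrupts()
print(hv.virtual_if[1])  # False: the hypervisor saw the change
```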
Evolution of x86 Architecture to support Virtualization
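Recent x86 systems facilitate virtualization by dealing with the sensitive
instructions that are not privileged and by supporting memory virtualization
in hardware (e.g., Intel's Extended Page Tables and AMD's Nested Page Tables).
Intel VT-x, for example, introduces a new mode, root mode, in which the VMM
runs, while guests run in non-root mode (AMD-V provides an equivalent
host/guest mode). The processor mode now consists of the usual four privilege
levels (rings 0-3) paired with a root/non-root mode. A guest kernel can run in
ring 0 of non-root mode, while the sensitive instructions and events the VMM
must control (some always, others as configured by the VMM) cause a transfer,
called a VM exit, to root mode.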
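A toy Python model of the paired mode, loosely following Intel VT-x. The Mode class and step function are hypothetical; ALWAYS_EXIT reflects that CPUID always causes a VM exit in VT-x non-root mode, and vmm_exit_controls stands in for the exits a VMM can choose to enable (for example HLT).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    root: bool  # True: VMM (root mode); False: guest (non-root mode)
    ring: int   # the usual x86 privilege level, 0..3


ALWAYS_EXIT = {"CPUID"}  # on Intel VT-x, CPUID in non-root mode always exits


def step(mode: Mode, instr: str, vmm_exit_controls: set[str]) -> tuple[Mode, str]:
    """Execute one instruction; return (mode afterwards, what happened)."""
    if not mode.root and (instr in ALWAYS_EXIT or instr in vmm_exit_controls):
        # VM exit: control transfers to the VMM in root mode, ring 0.
        return Mode(root=True, ring=0), f"VM exit: {instr}"
    # Otherwise the instruction runs natively at the guest's own ring, so a
    # guest kernel in non-root ring 0 gets ring-0 behavior (e.g. for POPF).
    return mode, f"native: {instr}"


guest_kernel = Mode(root=False, ring=0)
print(step(guest_kernel, "POPF", {"HLT"})[1])  # native: POPF
print(step(guest_kernel, "HLT", {"HLT"})[1])   # VM exit: HLT
print(step(guest_kernel, "CPUID", set())[1])   # VM exit: CPUID
```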
Hosted Virtualization
Up to now we have talked of virtual machines as executing guest OSs
on top of a VMM on top of the hardware (bare-metal virtualization).
Alternatively, a VM and its guest OS can execute on top of a host OS
(hosted virtualization). That is the case, for example, with VMware
Player. One of the VM images that can run under VMware Player on a host
OS is a Hadoop image; Hadoop is an open-source implementation of Google's
MapReduce framework.