
Memory ballooning
Memory ballooning tells the guest operating system that it does not have enough memory from the host so that the guest operating system could free some of its memory. When there is a memory crisis, the hypervisor tells the balloon driver to request some number of megabytes from the guest operating system. The hypervisor knows that pages occupied by the balloon driver will never store data, so the pages of pRAM requested by the balloon driver can then be reallocated safely to other VMs. It is the guest operating system's call to decide which pages of vRAM it should allocate to the balloon driver, and it will start with free pages. If it has plenty of free or idle guest physical memory, inflating the balloon will induce no guest-level paging and thus it will not affect guest performance. However, in the case of memory contention within the guest, the VM OS decides which guest physical pages are to be paged out to the virtual swap device in order to satisfy the balloon driver's allocation requests.
The balloon driver reclaims the guest operating system's allocated memory using Idle Memory Tax (IMT). IMT may reclaim up to 75 percent of idle memory. A guest operating system page file is necessary in order to prevent guest operating system kernel starvation. The vmmemctl driver should aggressively reclaim memory due to severe host contention (make sure that the guest operating system page file is at least 65 percent of the configured vRAM). Even here, the guest operating system can make intelligent guesses about which pages of data are least likely to be requested in future. (You'll see this in contrast with hypervisor-level swapping, which is discussed next.) Look at the following pictorial representation of memory page mapping to host memory:

Host-level swapping in ESXi is not sufficient to reclaim memory during TPS and ballooning. To support this, when you start a VM, the hypervisor creates a separate swap file for the VM. This is primarily because if it frees pRAM for other VMs, the hypervisor can directly swap out vRAM with the swap file.
Swapping is a guaranteed technique to reclaim a specific amount of memory within a specific amount of time. However, you should be concerned about host-level swapping because it can severely penalize guest performance. This occurs when the hypervisor has no knowledge about which guest physical pages should be swapped, and the swapping might cause unintended interactions with the native memory management policies in the guest operating system. The following is a pictorial representation of host-level memory page swapping:
