Planning for existing systems
If you will be moving existing software loads into your new Hyper-V Server cluster, you'll need to determine what resources they will require. If you followed the earlier chapter, you've already determined which workloads cannot be virtualized. If not, make that determination before you start gathering metrics.
Deciding how you will virtualize physical systems
There are two basic approaches to moving physical systems into a Hyper-V Server cluster. The first is to perform a physical-to-virtual (P2V) conversion. The P2V process copies an operating system environment that is installed directly on hardware into a virtual equivalent. Aside from the hardware components, the server and its installed applications do not change. The second approach is to create an empty virtual operating system environment and migrate the applications and data into it.
Both the P2V and migration approaches have their merits and drawbacks. P2V has a much higher failure rate, but the process is usually straightforward. Microsoft does not provide any free method to perform a complete P2V conversion. The feature is available in System Center Virtual Machine Manager 2012 but was removed in the 2012 R2 release. You can use Disk2Vhd, part of Microsoft's Sysinternals suite, to create a VHD from the disks of a physical system, although it has a number of limitations. You can then create a new virtual machine and attach the VHD, as sketched below. You can find this tool at http://www.sysinternals.com. Note that P2V conversions may have operating system and application licensing implications.
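As a rough sketch of the Disk2Vhd route, the following PowerShell creates a virtual machine and attaches a VHDX that Disk2Vhd has already produced. All names, paths, and sizes here are placeholders, not a prescribed configuration:

```powershell
# Assumes Disk2Vhd has already been run against the physical system
# and produced C:\Converted\oldserver.vhdx.

# Create a new VM and attach the converted disk. Size the VM to the
# workload, not to the old hardware.
New-VM -Name 'OldServer-Converted' `
       -MemoryStartupBytes 4GB `
       -VHDPath 'C:\Converted\oldserver.vhdx'

# Boot the VM and verify the guest before decommissioning the source.
Start-VM -Name 'OldServer-Converted'
```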
Migration is usually more involved but results in a clean system. The basic process is to build a completely new virtual machine or copy from an empty template. Applications are migrated according to their manufacturer's instructions. You can contact any relevant software or systems support teams for assistance.
This book will not directly cover either approach, as its focus is on the operation of Hyper-V Server in a cluster.
Determining requirements for existing systems
For many software applications and servers, you can work with the vendor to come up with a virtual machine configuration that will handle the load. Unfortunately, few environments are that simple. It is highly likely that at least a few systems are handling multiple applications simultaneously. Other systems may host applications that were designed in-house and, as such, have no formal requirements list. It is tempting to simply duplicate the parameters of existing deployments in their virtual counterparts. This will certainly address the issue, but because it is common to over-provision standalone systems, it is almost as certain to waste resources. Since it is fairly trivial to add resources to a virtualized operating system, it is best to size individual virtual machines to their anticipated load and then build your hosts to match those expectations with some extra capacity.
There are two basic approaches to sizing virtualization hosts for existing physical workloads. The first, and easiest, is to use the Microsoft Assessment and Planning Toolkit. This is a largely automated system that measures your existing workload and then compares it to a defined hardware set. The second method, which requires more effort on your part but provides more detailed information, uses Performance Monitor to gather usage statistics. Either choice has its own benefits and drawbacks, so you may select a hybrid approach.
Microsoft Assessment and Planning Toolkit
The Microsoft Assessment and Planning Toolkit (MAP) is a free solution accelerator that is intended to aid you in planning for several scenarios. Two of these are server virtualization and desktop virtualization.
The toolkit is an installable application. It is periodically updated, so the best place to look for installation requirements and downloads is on its primary TechNet page: http://aka.ms/map. The tool will run on a single computer and remotely scan all computers that you ask it to. It will store its results in a database; if you don't indicate otherwise, it will create a local instance of Microsoft SQL Server 2008 Express for the purpose. For ease of use and to ensure that the toolkit itself does not interfere with results, it is recommended that you run it from a management workstation and allow it to create the local database.
The following instructions and images were taken from Version 8.5. If you are using a later version, don't worry; the tool is well designed, and you should have little trouble finding what you need. If you are using an older version, it may not support current products, so you are encouraged to update prior to continuing.
To use MAP to prepare physical machines for deployment as Hyper-V Server guests:
- Download and run the installation package. If any prerequisites are missing, the installer will stop and provide links. You'll need to meet those prerequisites and then restart the installer.
- On the tool's first run, it will ask you to create a database or connect to an existing one, as shown in the following screenshot. Data that is collected about your environment will be stored in this database and can be referred to or extended at any time. While not covered in this book, the tool can be used for other purposes besides physical-to-virtual planning, so you may wish to use a generic database name. Upon giving your database a name and optionally a description, click on OK.
- After the database has been created (or populated if you connected to an existing one), you will be left at the primary screen. Feel free to explore the application as much as you wish. This book only discusses server virtualization, so when you are ready to proceed, switch to that tab. The Server Virtualization tab is partially shown in the following screenshot:
- There are five possible steps to perform. You're likely to only run four; you probably won't need both steps 4 and 5. Begin by clicking on 1 Collect inventory data.
- The inventory screen, partially shown in the following screenshot (the missing portion includes only the Previous, Next, Cancel, and Finish buttons), displays all possible inventory scenarios for the application. For the purposes of server virtualization, the first three are the only ones that will be used in this process, although you can include any in your inventory that you wish. Once you have made your selections, click on Next.
- The next screen, not shown, lists the methods that MAP can use to discover and connect to network computers. These should be fairly self-explanatory. In general, you'll want to choose AD DS (Active Directory) for domain computers, Windows protocols for non-domain-joined Windows computers, and IP ranges to find non-Windows computers. Firewalls will need to be opened on the target machine(s); a connectivity sketch follows this list. Refer to the help files if you need further assistance. The subsequent screens will ask you for the necessary information to satisfy the discovery methods that you chose. Windows targets will use the credentials you specify for WMI. Linux and UNIX targets will use the credentials you specify for SSH. Ensure that an SSH server is available on those units and that it accepts password authentication rather than requiring SSH keys; consult the documentation for the SSH server in use on your distribution for more information. Upon clicking on Finish, the discovery will run.
- Once the inventory process has completed, close that window to return to the main screen, which should still be on the Server Virtualization tab. Click on 2 Collect Performance Data.
- This screen, also not shown, asks you to choose whether to run performance metrics for Windows machines, Linux machines, or both. You'll also need to select the amount of time for the performance trace to run. The longer the gatherers are allowed to run, the more thorough the results will be. It is recommended that you allow them to run for a week or two over a time when standard workloads will run and at least one expected heavy workload will occur. The following screens will look similar to those from the inventory screen, although now you'll be able to choose from computers that were already discovered. Continue through to the last screen.
- The final screen of the performance scan should simply indicate that it is running on the designated machines. However, it will also notify you if there are any problems connecting to or operating on any of the targets. You can close this window and the scan will continue running. However, you must leave the main application open for the duration of the scan.
- Once the scan completes, you can run through 3 Create Hardware Configuration, where you can edit or define the hardware that you'll be running your comparison against. You can choose to edit or create single hosts or infrastructures, which adds in shared storage. The problem with the infrastructure option is that you must set a minimum of four hosts. If you'll be creating a two-node cluster, do not select infrastructure; a failover event will reduce your cluster to one operative node, so you need to ensure that a single node can handle your projected workload.
- When prepared, click on 4 Run the Server Consolidation Wizard. On the first screen, select the option for Windows Server 2012 Hyper-V and click on Next. You can edit hardware on the next screen, much as you did in step 3.
- After the hardware screens, you'll be presented with the Utilization Settings page, shown in the following screenshot. Select your target utilization percentages; they default to 100%, but you'll want to reduce them at least to a percentage that will allow for one or more host failures, especially if you intend to add virtual machines later.
- On the Computer List screen, as seen in the following screenshot, you'll be given a choice between selecting computers from a previous scan or importing from a file. Accept the default choice and continue to the next screen. Notice here that the tool can identify some special load types. Choose the computers that you want to include in the plan and click on Next.
- The assessment will run. Click on the Close button when it completes, and you will be returned to the main window. Underneath the Scenarios section, shown as follows, you'll see the number of hosts you'll need to handle the indicated workload with the hardware that you've specified.
- If you click on the panel, you'll be taken to a screen with more details about the plan. Underneath the synopsis is a Host Summary section that will show you the projected utilization percentages. An example image is shown as follows:
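As noted in the discovery step, Windows targets must allow WMI through their firewalls before MAP can reach them. The following sketch is one way to prepare and verify a target; the rule group is the built-in Windows Firewall WMI group, and the computer name is a placeholder:

```powershell
# On each Windows target: open the built-in WMI firewall rule group so
# that MAP's inventory and performance collectors can connect.
Enable-NetFirewallRule -DisplayGroup 'Windows Management Instrumentation (WMI)'

# From the MAP workstation: confirm that a basic WMI query succeeds
# against a target before committing to a long discovery run.
Get-WmiObject -Class Win32_OperatingSystem -ComputerName 'TARGET01' |
    Select-Object CSName, Caption, TotalVisibleMemorySize
```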
Performance Monitor
The most comprehensive way to plan for the existing physical loads is to track their performance in Performance Monitor. If you're not familiar with this tool, turn to the chapter on performance monitoring for usage instructions. As you would with MAP, use a fairly idle computer to collect performance metrics from the systems you plan to virtualize and have it track all systems for a representative amount of time. One to two weeks usually provides a fairly accurate performance picture. Unlike MAP, Performance Monitor only works on Windows systems. Third-party monitoring tools are also available if Performance Monitor cannot provide the results you need.
The Performance Monitor metrics that are of most value are as follows:
- Memory: Available MBytes
- Memory: Pages/sec
- Network Interface: Bytes Total/sec
- PhysicalDisk: % Disk Time
- PhysicalDisk: Avg. Disk Queue Length
- PhysicalDisk: Disk Bytes/sec
- PhysicalDisk: Disk Transfers/sec
- Processor: % Processor Time
- Server Work Queues: Active Threads
The above metrics were taken from a Windows Server 2012 computer; other Windows versions may be somewhat different, but they will all have similar objects. Also, there may be value in getting separate read and write measurements from disk objects and separate send and receive measurements from network objects. This is because there are ways to balance disk and network configurations that favor one over the other. Balance is usually preferred in a virtualization environment due to load variety, but if your workloads seem to favor a particular direction, you may choose to architect to accommodate that.
Of course, these metrics are generic in nature. If you anticipate a certain need that these metrics won't capture, feel free to design any tracking method that monitors the objects you are interested in. For instance, if you are considering the migration of a specific application and not its entire server, you can set monitors on that application's activity. Additionally, applications can create and register their own counters that Performance Monitor can track. Active Directory and Exchange Server are among these.
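If you'd rather script the collection than build a data collector set by hand, Get-Counter can sample the counters listed above from a remote system and save the results in a log that Performance Monitor can open. A minimal sketch; the computer name, interval, and output path are placeholders:

```powershell
# Counter paths corresponding to the list above.
$counters = @(
    '\Memory\Available MBytes',
    '\Memory\Pages/sec',
    '\Network Interface(*)\Bytes Total/sec',
    '\PhysicalDisk(*)\% Disk Time',
    '\PhysicalDisk(*)\Avg. Disk Queue Length',
    '\PhysicalDisk(*)\Disk Bytes/sec',
    '\PhysicalDisk(*)\Disk Transfers/sec',
    '\Processor(_Total)\% Processor Time',
    '\Server Work Queues(*)\Active Threads'
)

# Sample every five minutes for a day and write a standard .blg log;
# extend -MaxSamples for a longer trace.
Get-Counter -ComputerName 'SRV01' -Counter $counters `
            -SampleInterval 300 -MaxSamples 288 |
    Export-Counter -Path 'C:\PerfLogs\SRV01.blg' -FileFormat BLG
```

For multi-week traces, a data collector set defined in Performance Monitor may be more resilient, as it does not depend on the collecting session staying open.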
General approaches to reading the metrics
What you're mostly interested in is how the machines behave on average. Peak activity periods will come and go, but unless they are sustained you may not want to consider them too strongly. For instance, if a CPU spikes to 100% for three or four seconds ten times a day and is otherwise idle, there is no need to consider that a heavy load. If it hovers above 80% for an hour, that is probably substantial.
If there are periods of high activity, ensure that you understand what they are before adjusting your plan to accommodate them. As an example, you may find regular intervals of very high disk activity. If you simply size a disk subsystem to handle the load, you might find that you have spent extra money on a high-powered SAN just to speed up backups that occur when no one is using the system anyway.
Also, remember that your cluster will be a shared resource environment. If you determine that twenty separate systems need a combined 2,000 disk transfers per second on average but fail to note that their peak workloads never occur simultaneously, you may inadvertently oversize your system.
Memory measurements
Your primary goal with memory metrics is to determine how much memory to allocate to a system once it is virtualized. The easiest way seems to be to compare how much physical memory it has with how much it uses. However, this simple comparison may hide a memory starvation issue: if the Pages/sec counter is high and available memory is low, the machine may not have enough physical memory. You could also track page file usage, but unfortunately, this isn't a great metric for determining just how much extra memory the system needs. If possible, try to resolve such a shortage prior to virtualization. If that is not possible, make a note that this load will probably benefit from additional memory once virtualized.
Conversely, if available memory is high and remains high, the system has more memory than it needs to perform its role. You may consider reducing the memory assigned to it once it is virtualized. Chapter 7, Memory Planning and Management, contains many additional details about memory, including how to use Hyper-V Server's Dynamic Memory feature for workloads with varying demands.
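As an illustration of right-sizing, the following sketch reduces the memory of a guest whose physical counterpart consistently showed high available memory, and enables Dynamic Memory so the guest can grow if demand appears. The VM name and sizes are placeholders:

```powershell
# The VM must be off to change its memory configuration.
Stop-VM -Name 'FileServer01'

# Start the guest with less memory than the over-provisioned physical
# box had, but allow Dynamic Memory to expand it under load.
Set-VMMemory -VMName 'FileServer01' `
             -DynamicMemoryEnabled $true `
             -StartupBytes 2GB `
             -MinimumBytes 1GB `
             -MaximumBytes 8GB

Start-VM -Name 'FileServer01'
```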
Network measurements
These are fairly straightforward and easy to understand. Systems with high network utilization need to be carefully considered, as they may not make the best candidates for virtualization. Hyper-V Server 2012 can employ quality of service controls to keep them from choking out other systems, but they may be best left on their own hardware. However, if you have one high-utilization system among many low-utilization systems, adapter teaming or ten-gigabit (or faster) networking will probably be sufficient. Also, keep in mind that you are likely to encounter systems with 100 Mb adapters and possibly even 10 Mb adapters; their usage will change once they are placed on a gigabit-capable system.
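The quality of service controls mentioned above are applied per virtual network adapter. A minimal sketch, assuming a virtual switch created in weight-based bandwidth mode; the switch, physical adapter, and VM names are placeholders:

```powershell
# Create a virtual switch that allocates bandwidth by relative weight.
New-VMSwitch -Name 'ConvergedSwitch' -NetAdapterName 'Ethernet' `
             -MinimumBandwidthMode Weight

# Give the high-utilization guest a larger share without letting it
# starve its neighbors; weights are relative, not percentages.
Set-VMNetworkAdapter -VMName 'BusyWebServer' -MinimumBandwidthWeight 50
Set-VMNetworkAdapter -VMName 'QuietUtilityVM' -MinimumBandwidthWeight 10
```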
Disk measurements
Disk space consumption is usually the largest concern, but you won't use Performance Monitor for that. The next things you'll want to determine are how much data your disks are moving and how many read/write requests they are performing. These two metrics are not necessarily tied together; it is certainly possible to have a high number of read or write requests that move very little data.
If disk activity seems very high, first ensure that it is not due to memory paging. Heavy paging loads place the burden on the disk.
% Disk Time isn't normally useful by itself. If it is a very low number, you probably don't need to spend a lot of time digging into the other metrics. The Avg. Disk Queue Length metric tells you whether your disk system is able to keep up with the load. Usually, if it is consistently above two, there is a problem that needs to be addressed; in many cases, it is actually a sign of insufficient memory. The remaining two, Disk Bytes/sec and Disk Transfers/sec, tell you how much data is being moved and how many moves are occurring, respectively. The latter is a rough estimate of input/output operations per second (IOPS).
Translating disk numbers from disparate systems into a single set of numbers to size shared storage is extremely difficult. Disk activity usually occurs in spikes, and it is very rare for multiple systems to spike simultaneously. Do not simply add up all the transfers per second and bytes per second and attempt to purchase a system that meets it; such a system may be prohibitively expensive and it will almost always be idle. If you are considering a SAN, many vendors have tools that can help you size for this purpose. Be aware that even the best of those tools are factoring in some guesses.
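One way to check whether peaks overlap is to post-process the logs you collected earlier. A sketch, assuming a .blg file gathered as shown in the Performance Monitor section; the path and instance names are placeholders:

```powershell
# Pull the disk transfer counter out of a saved log and summarize it.
$log = Import-Counter -Path 'C:\PerfLogs\SRV01.blg'
$iops = $log.CounterSamples |
    Where-Object { $_.Path -like '*\PhysicalDisk(_Total)\Disk Transfers/sec' }

# Compare the average against the maximum; a large gap means the
# system is spiky, not consistently busy.
$iops | Measure-Object -Property CookedValue -Average -Maximum |
    Select-Object Average, Maximum
```

Comparing each system's average with the timing of its maximum is what tells you whether the peaks actually coincide.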
Processor measurements
The processor measurements are more straightforward but potentially misleading. Sustained high CPU time is what you want to watch for. A high load does not indicate that assigning additional virtual CPUs will be beneficial, but it does indicate that a virtualized instance will probably consume a noticeable amount of CPU time. A high number of active threads may indicate that the system will benefit from more vCPUs when virtualized. Not all applications benefit from having multiple virtual CPUs available; some perform best with only one. You may need to consult your application vendor's support team or other organizations that use the product to learn how any given software will behave.
Some applications lean the other way, benefiting not only from additional vCPUs, but from Hyper-V's NUMA-awareness to keep a virtual machine's memory near the physical CPUs that are accessing it. NUMA will be explained in Chapter 7, Memory Planning and Management.
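When you do assign vCPUs, it is a single setting per virtual machine. A minimal sketch with placeholder names and counts:

```powershell
# The VM must be off to change its virtual processor count.
Stop-VM -Name 'AppServer01'

# Start modestly; add vCPUs later only if the counters justify it.
Set-VMProcessor -VMName 'AppServer01' -Count 2

Start-VM -Name 'AppServer01'
```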