We manage a large number of Hyper-V servers for our clients, ranging from large clusters to smaller single-server solutions. This makes it difficult to make sure that everyone creates VMs as they should, and sometimes mistakes made by engineers or backup software cause a checkpoint to be left behind on a production server.
To make sure we don’t run into problems along the way, we use the following monitoring sets.
Monitoring checkpoints, snapshots, and AVHDs
We monitor each VM for lingering checkpoints and snapshots by running the following script. It checks whether a snapshot is older than 24 hours and creates an alert if so. If no snapshots older than 24 hours are found, it reports that the snapshot state is healthy.
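# Start from an empty string so results from an earlier run in the same session do not carry over
$SnapshotState = ""
# Find all snapshots that are older than 24 hours
$Snapshots = Get-VM | Get-VMSnapshot | Where-Object { $_.CreationTime -lt (Get-Date).AddDays(-1) }
foreach ($Snapshot in $Snapshots) {
    $SnapshotState += "A snapshot has been found for VM $($Snapshot.VMName). The snapshot was created at $($Snapshot.CreationTime) `n"
}
if (!$SnapshotState) { $SnapshotState = "Healthy" }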
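The age check above can also be wrapped in a small function, so the threshold is configurable and the logic can be tried out on a machine without Hyper-V. This is a minimal sketch and not part of our monitoring set: the function name `Get-StaleSnapshotReport`, its parameters, and the sample data are hypothetical. It only assumes input objects with `VMName` and `CreationTime` properties, as used in the script above.

```powershell
# Hypothetical helper: reports snapshots older than a given number of hours.
function Get-StaleSnapshotReport {
    param(
        [object[]]$Snapshots = @(),   # objects with VMName and CreationTime properties
        [int]$MaxAgeHours = 24        # assumed default, matching the 24-hour check above
    )
    $cutoff = (Get-Date).AddHours(-$MaxAgeHours)
    $report = ""
    foreach ($snapshot in $Snapshots) {
        if ($snapshot.CreationTime -lt $cutoff) {
            $report += "A snapshot has been found for VM $($snapshot.VMName). The snapshot was created at $($snapshot.CreationTime)`n"
        }
    }
    if (!$report) { $report = "Healthy" }
    return $report
}

# Made-up sample data standing in for the output of: Get-VM | Get-VMSnapshot
$sample = @(
    [pscustomobject]@{ VMName = "VM01"; CreationTime = (Get-Date).AddDays(-3) },
    [pscustomobject]@{ VMName = "VM02"; CreationTime = (Get-Date).AddHours(-2) }
)
$staleReport = Get-StaleSnapshotReport -Snapshots $sample   # mentions VM01 only
$emptyReport = Get-StaleSnapshotReport -Snapshots @()       # "Healthy"
```

On a Hyper-V host the same function could be fed with `Get-StaleSnapshotReport -Snapshots (Get-VM | Get-VMSnapshot)`.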
We used this monitoring set for a while, but then found that some servers that had been restored from a backup had no snapshot available, yet were still running on an AVHDX. That can cause issues, because the AVHDX can grow without you noticing when no complete snapshot is attached to it. To also monitor AVHDXs, we’re using the following set:
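# Start from an empty string so results from an earlier run in the same session do not carry over
$AVHD = ""
$VHDs = Get-VM | Get-VMHardDiskDrive
foreach ($VHD in $VHDs) {
    # A path containing "avhd" (.avhd or .avhdx) means the VM is running on a differencing disk
    if ($VHD.Path -match "avhd") { $AVHD += "$($VHD.VMName) is running on AVHD: $($VHD.Path) `n" }
}
if (!$AVHD) { $AVHD = "Healthy" }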
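Note that `-match "avhd"` matches anywhere in the path, so a base disk stored in a folder that happens to contain "avhd" in its name would also be flagged. A stricter variation is to test only the file extension. The sketch below is an assumption-laden example, not part of our monitoring set: the function name `Test-DifferencingDiskPath` and the sample paths are made up for illustration.

```powershell
# Hypothetical helper: returns $true only when the file extension is .avhd or .avhdx.
function Test-DifferencingDiskPath {
    param([string]$Path)
    # \.avhdx?$ anchors the match to the end of the path; -match is case-insensitive by default.
    return $Path -match '\.avhdx?$'
}

# Made-up sample paths
$isCheckpointDisk = Test-DifferencingDiskPath -Path 'D:\VMs\web01\web01_1A2B3C.avhdx'
$isLegacyDisk     = Test-DifferencingDiskPath -Path 'D:\VMs\old01\old01.AVHD'
$isBaseDisk       = Test-DifferencingDiskPath -Path 'D:\avhd-archive\web01.vhdx'
```

Only the first two paths end in a differencing-disk extension; the third one contains "avhd" in a folder name and would be flagged by the looser match.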
Version of Integration Services
Monitoring Integration Services on older or migrated versions of Hyper-V is quite important, as the Hyper-V Integration Services also provide driver interfaces to the client VMs. To keep an eye on this, we use the following monitoring script:
# Start from an empty string so results from an earlier run in the same session do not carry over
$ISState = ""
$VMMS = Get-WmiObject -Namespace root\virtualization\v2 -Class Msvm_VirtualSystemManagementService
# 1 == VM friendly name. 123 == Integration State
$RequestedSummaryInformationArray = 1,123
$vmSummaryInformationArray = $VMMS.GetSummaryInformation($null, $RequestedSummaryInformationArray).SummaryInformation
$outputArray = @()
foreach ($vmSummaryInformation in [array] $vmSummaryInformationArray)
{
    # Translate the numeric state into a readable label
    switch ($vmSummaryInformation.IntegrationServicesVersionState)
    {
        1 {$vmIntegrationServicesVersionState = "Up-to-date"}
        2 {$vmIntegrationServicesVersionState = "Version Mismatch"}
        default {$vmIntegrationServicesVersionState = "Unknown"}
    }
    $vmIntegrationServicesVersion = (Get-VM $vmSummaryInformation.ElementName).IntegrationServicesVersion
    if ($null -eq $vmIntegrationServicesVersion) {$vmIntegrationServicesVersion = "Unknown"}
    $output = New-Object psobject
    $output | Add-Member noteproperty "VM Name" $vmSummaryInformation.ElementName
    $output | Add-Member noteproperty "Integration Services Version" $vmIntegrationServicesVersion
    $output | Add-Member noteproperty "Integration Services State" $vmIntegrationServicesVersionState
    # Add the PSObject to the output array
    $outputArray += $output
}
# Collect every VM whose Integration Services version does not match the host
foreach ($VM in $outputArray) {
    if ($VM.'Integration Services State' -eq "Version Mismatch") {
        $ISState += "$($VM.'VM Name') Integration Services state is $($VM.'Integration Services State')`n"
    }
}
if (!$ISState) { $ISState = "Healthy" }
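The state mapping in the script above can be tried out in isolation with sample data. The following is a minimal sketch, not part of our monitoring set: the function name `ConvertTo-IntegrationStateName` and the sample objects are hypothetical, and the codes (1 and 2, anything else treated as unknown) are simply the ones the script above already uses.

```powershell
# Hypothetical helper: maps the numeric IntegrationServicesVersionState to a label.
function ConvertTo-IntegrationStateName {
    param([int]$StateCode)
    switch ($StateCode) {
        1       { return "Up-to-date" }
        2       { return "Version Mismatch" }
        default { return "Unknown" }
    }
}

# Made-up sample data standing in for the SummaryInformation objects
$sampleVMs = @(
    [pscustomobject]@{ ElementName = "VM01"; IntegrationServicesVersionState = 1 },
    [pscustomobject]@{ ElementName = "VM02"; IntegrationServicesVersionState = 2 },
    [pscustomobject]@{ ElementName = "VM03"; IntegrationServicesVersionState = 0 }
)

# Keep only the VMs that report a version mismatch
$mismatched = @($sampleVMs | Where-Object {
    (ConvertTo-IntegrationStateName -StateCode $_.IntegrationServicesVersionState) -eq "Version Mismatch"
})
```

With this sample data, only VM02 ends up in `$mismatched`; VM03 maps to "Unknown" and is not alerted on, which mirrors the behaviour of the script above.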
NUMA spanning
The next script monitors the NUMA span of each virtual machine. You might notice a decrease in performance when NUMA spanning is configured incorrectly, not just in assigned memory but a general performance degradation of up to 80%. For more information, you can check this link and this link.
# Start from an empty string so results from an earlier run in the same session do not carry over
$vCPUOutput = ""
# Query the host CPUs once and sum the logical processors across all sockets
$CPU = Get-WmiObject Win32_Processor
$totalCPU = ($CPU | Measure-Object -Property NumberOfLogicalProcessors -Sum).Sum
$VMs = Get-VM
foreach ($VM in $VMs) {
    if ($VM.NumaAligned -eq $false) {
        $vCPUOutput += "NUMA not aligned for: $($VM.Name). vCPU assigned: $($VM.ProcessorCount) of $totalCPU available`n"
    }
}
if (!$vCPUOutput) { $vCPUOutput = "Healthy" }