Hi All,
As I explained in my previous post, the series is taking a bit of a turn here: we're going to start blogging about remediation instead of just monitoring. I'll link back to a previous blog and explain how we automatically react to these issues within our RMM. If you do not have an RMM, don't worry! We'll include the monitoring and remediation scripts so you can combine them any way you'd like.
The first monitoring and remediation topic we're tackling is Hyper-V replication: we will try to resume replication if the replica server was unreachable for some time, force a resync, and clear the statistics.
We'll start by monitoring the current Hyper-V replication state. To do this we use the following small script:
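# Collect the replication status of every VM on this host (requires the Hyper-V module).
$VMList = Get-VMReplication -ErrorAction SilentlyContinue
# Start both counters at zero so earlier runs in the same session cannot leak in.
$critical = 0
$warning = 0
ForEach ($VM in $VMList) {
    if ($VM.Health -eq "Critical") { $critical++ }
    if ($VM.Health -eq "Warning") { $warning++ }
    # Any state other than "Replicating" also counts as critical.
    if ($VM.State -ne "Replicating") { $critical++ }
}
# Convert the counters into the status strings the monitoring reads.
if ($critical -gt 0) { $critical = "unhealthy" } else { $critical = "Healthy" }
if ($warning -gt 0) { $warning = "unhealthy" } else { $warning = "Healthy" }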
This script gives us the result we need to start our automatic remediation path. We know that VMs are not replicating, or that replication health is critical, if $Critical is set to unhealthy, and we know that we have replication warnings (most likely related to statistics) when $warning is set to unhealthy. At this point we reach a fork in the road: do we want our monitoring script to also be our remediation script, or do we want separate scripts that are launched based on the condition of the replication?
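If your RMM alerts on a single value or an exit code, the two results can be folded into one summary. The following is only a minimal sketch and not part of the original scripts: the function name Get-ReplicationSummary, the ExitCode property, and its values (0, 1, 2) are assumptions made for illustration. The replication objects are passed in as a parameter so the logic can be tried against test data without a Hyper-V host.

```powershell
# Hypothetical helper: summarises replication objects into the two status strings
# used above, plus an assumed exit code (0 = healthy, 1 = warning, 2 = critical).
function Get-ReplicationSummary {
    param(
        # Objects with Health and State properties, e.g. the output of Get-VMReplication.
        [object[]]$Replication = @()
    )

    $criticalCount = 0
    $warningCount  = 0
    foreach ($vm in $Replication) {
        # Cast to string so enum values and plain strings compare the same way.
        if ("$($vm.Health)" -eq 'Critical')    { $criticalCount++ }
        if ("$($vm.Health)" -eq 'Warning')     { $warningCount++ }
        if ("$($vm.State)"  -ne 'Replicating') { $criticalCount++ }
    }

    # Critical outranks warning when both are present.
    $exitCode = 0
    if ($warningCount -gt 0)  { $exitCode = 1 }
    if ($criticalCount -gt 0) { $exitCode = 2 }

    [pscustomobject]@{
        Critical = if ($criticalCount -gt 0) { 'unhealthy' } else { 'Healthy' }
        Warning  = if ($warningCount  -gt 0) { 'unhealthy' } else { 'Healthy' }
        ExitCode = $exitCode
    }
}
```

On a Hyper-V host you would call it as Get-ReplicationSummary -Replication (Get-VMReplication -ErrorAction SilentlyContinue) and hand the ExitCode to whatever your RMM expects. Check your RMM's own convention before relying on these values.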
If you want a single script that both monitors and remediates, it only needs a couple of extra lines, and it monitors AND resolves the issues immediately when found. I do not like this approach very much, as it can give you skewed results and you do not control when or how the remediation runs.
$VMList = Get-VMReplication -ErrorAction SilentlyContinue
# Start both counters at zero so earlier runs in the same session cannot leak in.
$critical = 0
$warning = 0
ForEach ($VM in $VMList) {
    # Remediation: force a resync for any VM that is waiting for one.
    if ($VM.State -match "WaitingForStartResynchronize") { Resume-VMReplication -VMName $VM.Name -Resynchronize -Verbose }
    # Monitoring: same checks as the first script.
    if ($VM.Health -eq "Critical") { $critical++ }
    if ($VM.Health -eq "Warning") { $warning++ }
    if ($VM.State -ne "Replicating") { $critical++ }
}
if ($critical -gt 0) { $critical = "unhealthy" } else { $critical = "Healthy" }
if ($warning -gt 0) { $warning = "unhealthy" } else { $warning = "Healthy" }
After running this script you will most likely notice that it's still giving you a warning. You can wait for replication to run long enough to resolve this on its own, or you can reset the statistics right before self-healing:
$VMList = Get-VMReplication -ErrorAction SilentlyContinue
# Start both counters at zero so earlier runs in the same session cannot leak in.
$critical = 0
$warning = 0
ForEach ($VM in $VMList) {
    Write-Host "$($VM.Name) is $($VM.Health)"
    if ($VM.State -match "WaitingForStartResynchronize") {
        # Clear the statistics first so the old failures stop raising warnings, then resync.
        Reset-VMReplicationStatistics -VMName $VM.Name
        Resume-VMReplication -VMName $VM.Name -Resynchronize -Verbose
    }
    if ($VM.Health -eq "Critical") { $critical++ }
    if ($VM.Health -eq "Warning") { $warning++ }
    if ($VM.State -ne "Replicating") { $critical++ }
}
# Convert the counters only after the loop has finished counting.
if ($critical -gt 0) { $critical = "unhealthy" } else { $critical = "Healthy" }
if ($warning -gt 0) { $warning = "unhealthy" } else { $warning = "Healthy" }
If you decided to split the scripts as per my advice, you can use the first script in this blog post as your monitoring set and the following script as your automated resolution script. Remember that this only resolves the case where replication failed because the other side was unreachable; it does not resolve certificate errors or other errors.
$ReplState = Get-VMReplication
ForEach ($State in $ReplState) {
    # -Quiet returns $true/$false instead of ping result objects; one echo request is enough here.
    $ping = Test-Connection -ComputerName $State.ReplicaServerName -Count 1 -Quiet
    if (-not $ping) { Write-Host "Cannot Ping $($State.ReplicaServerName)" }
    else {
        $Name = $State.Name
        $CurrentState = $State.State
        # Compare the State property, not the whole replication object.
        if ($CurrentState -match "WaitingForStartResynchronize") {
            Reset-VMReplicationStatistics -VMName $Name
            Resume-VMReplication -VMName $Name -Resynchronize -Verbose
        }
    }
}
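The decision the resolution script makes for each VM can also be written as a small function that only reports what it would do, which is handy for a dry run before you let the RMM execute anything. This is a sketch under assumptions: Get-RemediationAction and its action names are hypothetical, and reachability is passed in as a boolean instead of being tested with Test-Connection, so the logic can be exercised without a Hyper-V host.

```powershell
# Hypothetical dry-run helper: returns the action the resolution script would take
# for one replication object, without changing anything on the host.
function Get-RemediationAction {
    param(
        # Object with Name and State properties, e.g. one item from Get-VMReplication.
        [Parameter(Mandatory)] $Replication,
        # Result of the reachability check against the replica server.
        [Parameter(Mandatory)] [bool]$ReplicaReachable
    )

    if (-not $ReplicaReachable) {
        # Resuming is pointless while the other side is still down.
        $action = 'SkipUnreachable'
    }
    elseif ("$($Replication.State)" -match 'WaitingForStartResynchronize') {
        $action = 'ResetStatisticsAndResync'
    }
    else {
        $action = 'None'
    }

    [pscustomobject]@{ Name = $Replication.Name; Action = $action }
}

# Possible use on a Hyper-V host (not executed here):
# Get-VMReplication | ForEach-Object {
#     $up = Test-Connection -ComputerName $_.ReplicaServerName -Count 1 -Quiet
#     Get-RemediationAction -Replication $_ -ReplicaReachable $up
# }
```

Keeping the decision separate from the cmdlets that change state makes it easy to log the plan first and only act once you trust it.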
And that's it! Have fun!