Blog Series: Monitoring using PowerShell: Part Eight – Monitoring health with PowerShell

Hi All,

My next couple of blogs will be a series in which I explain how to use PowerShell to monitor critical infrastructure. I will be releasing a blog every day that touches on how to monitor specific software components, as well as network devices from Ubiquiti, third-party APIs and Office 365. I will also show how you can integrate this monitoring into current RMM packages such as Solarwinds N-Central and Solarwinds RMM MSP, and even include the required files to import the monitoring set directly into your system.

Requirements:

  • PowerShell v3 or higher

Creating the monitoring sets:

This blog will contain multiple monitoring sets for checking the health state of the OS. The first set leverages the event log, as we've done in previous blogs, but only to check for an unclean shutdown or BSOD. In the second part we check the actual performance of the disk queue and CPU by using the cooked value that is presented by the Windows performance counters. As always I'll supply some RMM packages at the bottom of the blog. For performance monitoring I do want you to understand that most RMM packages already have this integrated and this is just an alternative method.

Monitoring BSODs and unclean shutdowns

BSODs and unclean shutdowns aren’t really presented to the OS in any other way than event log entries. So to get this information we try getting the related logs:

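# Look back 7 days in the System log
$Since = (Get-Date).AddDays(-7)

# Event 6008 is logged at boot when the previous shutdown was unexpected.
# @() forces an array so .Count is also correct when exactly one event is found.
$EventShutdown = @(Get-EventLog -LogName System | Where-Object { $_.EventID -eq 6008 -and $_.TimeGenerated -gt $Since } | Select-Object Message)
$EventBSOD = @(Get-EventLog -LogName System | Where-Object { $_.EventID -eq 1005 -and $_.TimeGenerated -gt $Since } | Select-Object Message)

# Default to healthy so the monitor always has a value to report
$CleanShutdown = "Healthy"
$CleanBSOD = "Healthy"

if ($EventShutdown.Count -ge 1) {
    $CleanShutdown = "Unclean shutdown has occurred in the past 7 days."
}

if ($EventBSOD.Count -ge 1) {
    $CleanBSOD = "BSOD shutdown has occurred in the past 7 days."
}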

Pretty simple script, but effective 🙂
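
As a variation on the script above, the same check can be sketched with Get-WinEvent, which filters on the event ID and start time inside the query instead of piping the whole System log through Where-Object. This is only a rough sketch and not part of the downloadable monitoring set; the function name Get-RecentEventCount is made up for illustration, and the event ID is simply the one used above.

```powershell
# Hypothetical helper (not part of the original monitoring set): counts events
# with a given ID in the System log over the last $Days days.
function Get-RecentEventCount {
    param(
        [Parameter(Mandatory = $true)][int]$EventId,
        [int]$Days = 7
    )

    $Filter = @{
        LogName   = 'System'
        Id        = $EventId
        StartTime = (Get-Date).AddDays(-$Days)
    }

    # Get-WinEvent writes an error when nothing matches, so silence it and
    # wrap the result in @() to always get a usable count.
    @(Get-WinEvent -FilterHashtable $Filter -ErrorAction SilentlyContinue).Count
}

# Same message as the script above, with "Healthy" as the fallback value.
$CleanShutdown = if ((Get-RecentEventCount -EventId 6008) -ge 1) {
    "Unclean shutdown has occurred in the past 7 days."
} else {
    "Healthy"
}
```

On machines with a large System log this should return noticeably faster, because only the matching events are read.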

Performance monitoring with PowerShell:
For this script we're grabbing Windows performance counters using Get-Counter. Every sample that Get-Counter returns carries a raw value, a second value, and a cooked value. The raw values aren't a lot of use to us on their own, so we read the CookedValue property instead. The cooked value is calculated from the raw and second values and is human readable, instead of just a random string of numbers.
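
To make the difference concrete, the small sketch below prints the three values side by side for one counter. The counter path is just an example picked for illustration, and like the other scripts in this post it assumes an English-language Windows install.

```powershell
# Take a single sample of one counter.
$Sample = (Get-Counter -Counter "\Processor(_Total)\% Processor Time").CounterSamples

# Show the raw, second and cooked values of that sample together.
$Sample | Select-Object Path, RawValue, SecondValue, CookedValue | Format-List
```

The CookedValue line is the only one you would normally put in front of a technician.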

When using this type of monitoring I advise a very short polling interval, e.g. have the script run every minute, to make sure you capture the spiky load that is sometimes associated with performance issues.
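
If your RMM cannot run the script that often, one alternative is to let a single run collect several samples and look at both the average and the peak. A rough sketch using Get-Counter's -SampleInterval and -MaxSamples parameters; the 5 samples at 2-second intervals are example numbers only, not a recommendation.

```powershell
# Collect 5 samples, 2 seconds apart, of the total disk queue length.
$Samples = Get-Counter -Counter "\PhysicalDisk(_Total)\Current Disk Queue Length" -SampleInterval 2 -MaxSamples 5

# Each returned sample set holds one counter sample here, so flatten to the cooked values.
$Values = $Samples | ForEach-Object { $_.CounterSamples.CookedValue }

# Summarise the run: the average smooths out noise, the maximum shows the worst spike.
$Stats = $Values | Measure-Object -Average -Maximum
"Average queue length: $($Stats.Average), peak: $($Stats.Maximum)"
```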

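# Take one sample of the current queue length for every physical disk instance (the * also returns _Total)
$DiskQueueLength = (Get-Counter -Counter "\PhysicalDisk(*)\Current Disk Queue Length").CounterSamples.CookedValue

# Collect a message for every disk above the threshold, so a healthy disk cannot overwrite an earlier alert
$DiskQueueHealth = foreach ($Disk in $DiskQueueLength) {
    if ($Disk -gt 5) {
        "The Disk Queue Length was $Disk for 30 minutes"
    }
}

# No messages means every disk was at or below the threshold
if (-not $DiskQueueHealth) {
    $DiskQueueHealth = "Healthy"
}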

Monitoring CPU Health

So to monitor CPU health we have a couple of options. Monitoring the percentage used is of course very important, but we're also going to monitor the processor queue, which is the number of threads that are ready to run and waiting for CPU time. When this rises you'll notice those terrible slowdowns and even the dreaded “Not responding” pop-ups.

# Sample the processor queue and the user/privileged time per processor instance (the * also returns _Total)
$CPUQueueLength = (Get-Counter -Counter "\System\Processor Queue Length").CounterSamples.CookedValue
$CPUUserTime = (Get-Counter -Counter "\Processor(*)\% User Time").CounterSamples.CookedValue
$CPUPrivTime = (Get-Counter -Counter "\Processor(*)\% Privileged Time").CounterSamples.CookedValue

if ($CPUQueueLength -le 20) {
    $CPUQueueHealth = "Healthy"
} else {
    $CPUQueueHealth = "The CPU Queue Length was $CPUQueueLength for 30 minutes"
}

# The time counters return one value per instance, so alert when any instance is above the threshold
if (@($CPUUserTime | Where-Object { $_ -gt 98 }).Count -eq 0) {
    $CPUUserTimeHealth = "Healthy"
} else {
    $CPUUserTimeHealth = "The CPU User Time was $CPUUserTime for 30 minutes"
}
if (@($CPUPrivTime | Where-Object { $_ -gt 98 }).Count -eq 0) {
    $CPUPrivTimeHealth = "Healthy"
} else {
    $CPUPrivTimeHealth = "The CPU Privileged Time was $CPUPrivTime for 30 minutes"
}
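
The three checks above all follow the same pattern: compare one or more cooked values against a threshold and return either "Healthy" or a message. If you end up adding more counters, that pattern could be wrapped in a small function. The sketch below is one possible way to do it; Get-ThresholdHealth is a made-up name and is not part of the RMM packages, and the example calls use invented numbers instead of live counters.

```powershell
# Hypothetical helper, not part of the original monitoring set: returns "Healthy"
# when every value is at or below the threshold, otherwise a message listing the
# values that exceeded it.
function Get-ThresholdHealth {
    param(
        [Parameter(Mandatory = $true)][string]$Name,
        [Parameter(Mandatory = $true)][double[]]$Values,
        [Parameter(Mandatory = $true)][double]$Threshold
    )

    # @() keeps the result an array even when zero or one value matches.
    $Exceeded = @($Values | Where-Object { $_ -gt $Threshold })

    if ($Exceeded.Count -eq 0) {
        "Healthy"
    } else {
        "The $Name was $($Exceeded -join ', ') (threshold $Threshold)"
    }
}

# Example calls with made-up values instead of live counters:
Get-ThresholdHealth -Name "CPU User Time" -Values 12.5, 99.1, 40 -Threshold 98
Get-ThresholdHealth -Name "CPU Queue Length" -Values 3 -Threshold 20
```

In a real monitor you would pass the CookedValue results from Get-Counter as -Values.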

So, that's the end of this series. I've actually decided to make this more of a regular thing, so join me again next month for a new PowerShell monitoring series where I will be touching on reporting and auto-recovery of issues without human intervention.

Downloads for RMM packages:

N-Central 11.0+ – BSOD Monitoring

N-Central 11.0+ – DiskIO Monitoring

All blogs are posted under AGPL3.0 unless stated otherwise