PWM fan control in Linux, using Supermicro motherboards
New fans, new problem
My NAS uses a Supermicro X11SSH-F motherboard, which I love if for no other reason than it has IPMI.
One of my 120mm case fans started to die, so since I was opening the case, and all of the fans were the same age (~8 years old), I decided to replace all of the fans.
I bought some Noctua NF-P12 redux-1700 PWM 120mm and Noctua NF-A8 PWM 80mm fans and easily swapped them out.
When I booted up my NAS, the fans were so loud, it sounded like a plane getting ready to take off.
After logging into the IPMI interface, I switched the Fan Control from Full to Optimal.
The fans quickly revved down but, almost as quickly, revved back up. Rinse, repeat.
Looking in the IPMI logs, I saw this:
sudo ipmitool sel list
bf | 09/22/2021 | 19:43:05 | Fan #0x44 | Lower Critical going low | Asserted
c0 | 09/22/2021 | 19:43:05 | Fan #0x44 | Lower Non-recoverable going low | Asserted
This means the fan speed was dropping below the Lower Critical (lcr) and Lower Non-recoverable (lnr) thresholds, and the assertions made the motherboard spin the fans up to 100%.
This turned out to be caused by the fan threshold settings on the motherboard:
sudo ipmitool sensor | grep FAN # I manually added the header information
Name | Speed | | Status| lnr | lcr | lnc | unc | ucr | unr
FAN1 | 2200.000 | RPM | ok | 300.000 | 500.000 | 700.000 | 25300.000 | 25400.000 | 25500.000
FAN2 | na | | na | na | na | na | na | na | na
FAN3 | 1800.000 | RPM | ok | 100.000 | 200.000 | 300.000 | 25300.000 | 25400.000 | 25500.000
FAN4 | 1800.000 | RPM | ok | 300.000 | 500.000 | 700.000 | 25300.000 | 25400.000 | 25500.000
FANA | 1800.000 | RPM | ok | 200.000 | 300.000 | 400.000 | 25300.000 | 25400.000 | 25500.000
Threshold acronyms:
Acronym | Meaning |
---|---|
lnr | Lower Non-recoverable |
lcr | Lower Critical |
lnc | Lower Non-critical |
unc | Upper Non-critical |
ucr | Upper Critical |
unr | Upper Non-recoverable |
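To see why a slow-spinning fan trips these assertions, it can help to compare a fan's slowest speed against the three lower thresholds. The following is a minimal sketch, assuming the column layout of the `ipmitool sensor` listing above; the function name `crossed_thresholds` and the 450 RPM minimum are hypothetical and only there for illustration.

```shell
# Sketch: report which lower thresholds a fan's slowest speed would cross.
# $1 = one line of `ipmitool sensor` output, $2 = the fan's minimum RPM.
# The minimum RPM is an assumption; measure it or take it from the fan's spec sheet.
crossed_thresholds() {
    printf '%s\n' "$1" | awk -F'|' -v min="$2" '
        {
            # Columns 5, 6 and 7 of the listing hold lnr, lcr and lnc.
            names[5] = "lnr"; names[6] = "lcr"; names[7] = "lnc"
            out = ""
            for (i = 5; i <= 7; i++) {
                if (min + 0 < $i + 0) {
                    sep = (out == "") ? "" : " "
                    out = out sep names[i]
                }
            }
            if (out == "") out = "none"
            print out
        }'
}

# FAN1 line from the listing above, with a hypothetical 450 RPM minimum speed.
crossed_thresholds "FAN1 | 2200.000 | RPM | ok | 300.000 | 500.000 | 700.000 | 25300.000 | 25400.000 | 25500.000" 450
# prints: lcr lnc
```

Any threshold named in the output sits above the fan's slowest speed, so the BMC would assert it whenever the fan idles down that far.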
Attempted solution via motherboard control
I tried setting the lower thresholds to really low values (and the upper ones to really high values), but then the motherboard wouldn't keep the CPU cool enough when it did heat up:
sudo stress --cpu 8 --io 3 --vm 4 --vm-bytes 512M --timeout 120s
stress: info: [3544116] dispatching hogs: 8 cpu, 3 io, 4 vm, 0 hdd
sudo ipmitool sensor
CPU Temp | 59.000 | degrees C | ok | 0.000 | 0.000 | 0.000 | 95.000 | 100.000 | 100.000
The CPU fan was spinning at about 700 RPM with the CPU at 59 degrees C, so letting the motherboard control the fan speed wasn't going to work.
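The post doesn't show how the thresholds were changed. One common way is `ipmitool sensor thresh <id> lower <lnr> <lcr> <lnc>`. The sketch below only prints that command after a sanity check, rather than running it; the function name `lower_thresh_cmd` and the 100/200/300 values are hypothetical.

```shell
# Sketch: print (not run) the ipmitool command that sets a fan's lower thresholds.
# Argument order follows `ipmitool sensor thresh <id> lower <lnr> <lcr> <lnc>`.
lower_thresh_cmd() {
    fan=$1; lnr=$2; lcr=$3; lnc=$4
    # The thresholds only make sense when ordered lnr <= lcr <= lnc.
    if [ "$lnr" -gt "$lcr" ] || [ "$lcr" -gt "$lnc" ]; then
        echo "error: need lnr <= lcr <= lnc" >&2
        return 1
    fi
    echo "ipmitool sensor thresh $fan lower $lnr $lcr $lnc"
}

# Hypothetical values; choose ones below the slowest speed your fans actually reach.
lower_thresh_cmd FAN1 100 200 300
# prints: ipmitool sensor thresh FAN1 lower 100 200 300
```

Printing the command first makes it easy to review the values before running it with sudo.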
Possible pre-built Linux solution
Ubuntu has the fancontrol package, but it requires running pwmconfig first, and Supermicro and pwmconfig don't like each other:
sudo pwmconfig
/usr/sbin/pwmconfig: There are no pwm-capable sensor modules installed
Eventual scripted solution
Thankfully, Linux has the ipmitool command, which can send raw commands to the BMC to control the speed of the PWM fans.
With ipmitool, it is easy enough to script the fan control based on the temperature of the CPU, chassis, or hard drives.
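As a rough illustration of the raw commands involved, here is a minimal sketch that builds the duty-cycle command and decodes the hex value the BMC returns. The byte sequence `0x30 0x70 0x66 0x01 <zone> <duty>` comes from the comments in the script further down; the helper names `set_duty_cmd` and `hex_to_percent` are hypothetical.

```shell
# Sketch: build the Supermicro raw command that sets a fan zone's duty cycle.
# Zone 0 is FAN1.., zone 1 is FANA.., per the script's comments.
set_duty_cmd() {
    zone=$1; duty=$2
    case $zone in
        0|1) ;;
        *) echo "error: zone must be 0 or 1" >&2; return 1 ;;
    esac
    if [ "$duty" -lt 0 ] || [ "$duty" -gt 100 ]; then
        echo "error: duty must be between 0 and 100" >&2
        return 1
    fi
    printf 'ipmitool raw 0x30 0x70 0x66 0x01 0x%02x %d\n' "$zone" "$duty"
}

# Convert the hex byte returned by "ipmitool raw 0x30 0x70 0x66 0x00 <zone>" to a percentage.
hex_to_percent() {
    echo "$((0x$1))"
}

set_duty_cmd 1 50    # prints: ipmitool raw 0x30 0x70 0x66 0x01 0x01 50
hex_to_percent 32    # prints: 50
```

The script switches the BMC to Full fan mode before sending these commands, so that the BMC's own fan curve does not override the requested duty cycle.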
Coming from TrueNAS, I remembered seeing a heavily modified, well-commented script by Kevin Horton, so I took that and adapted it to my needs, replacing the BSD-specific commands with ones that work on Ubuntu/Linux.
I needed to move some of the variables around and change the logic that finds the hard drives. The original script used camcontrol, but I switched to lsblk, because camcontrol is BSD-specific.
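The drive-discovery step can be tried on its own. This is a minimal sketch of the same filter the modified script applies to `lsblk -o NAME,ROTA` output; it reads the listing from stdin so it can be tested on saved output. The function name `rotational_disks` and the sample listing are made up for illustration.

```shell
# Sketch: pick rotational (non-SSD) disks out of `lsblk -o NAME,ROTA` output on stdin.
rotational_disks() {
    # Keep names starting with "s" or "a" whose ROTA column is 1, mirroring the script's filter.
    awk '$1 ~ /^[sa]/ && $2 == 1 { print $1 }'
}

# Live use (assumes util-linux lsblk; -d hides partitions, -n drops the header):
#   lsblk -d -n -o NAME,ROTA | rotational_disks

# Illustrative sample listing, not taken from the original post.
printf 'NAME ROTA\nsda 1\nsdb 1\nsdc 0\nnvme0n1 0\n' | rotational_disks
# prints sda and sdb, one per line
```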
If it isn't already installed, you'll need the smartmontools package in order to use smartctl:
sudo apt install smartmontools
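Once smartctl is available, the temperature lookup the script relies on can be checked by hand. The sketch below takes the 10th field of the Temperature_Celsius attribute row, as the script does. The function name `hd_temp` and the sample row are hypothetical, and drives that report temperature under a different attribute would need a different pattern.

```shell
# Sketch: extract a drive temperature from `smartctl -A` output read on stdin.
# Like the script, it takes the 10th field of the Temperature_Celsius row.
hd_temp() {
    awk '/Temperature_Celsius/ { print $10; exit }'
}

# Live use (needs root):
#   smartctl -A /dev/sda | hd_temp

# Illustrative attribute row, not captured from the author's drives.
echo "194 Temperature_Celsius 0x0022 036 045 000 Old_age Always - 36 (0 18 0 0 0)" | hd_temp
# prints: 36
```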
The modified script (see link in the References section for the original script):
#!/usr/bin/perl
# Modified version of this script:
# https://github.com/khorton/nas_fan_control/blob/master/PID_fan_control.pl
#
# This script is based on the hybrid fan controller script created by @Stux, and posted at:
# https://forums.freenas.org/index.php?threads/script-hybrid-cpu-hd-fan-zone-controller.46159/
# The significant changes from @Stux's script are:
# 1. Replace HD fan control of several discrete duty cycles as a function of hottest HD temperature with a PID controller
# which controls duty cycle in 1% steps as a function of average HD temperature. As a protection, if any HD temperature
# exceeds a specified value, the HD fans are commanded to 100% duty cycle. This covers cases where one HD may be running
# hot, even if the average HD temperature is acceptable, or the PID loop control has gone awry.
# 2. Add optional setting to command CPU fans to 100% duty cycle if needed to assist with HD cooling, to cover scenarios
# where the CPU fan zone also controls chassis exit fans.
# 3. Add optional log of HD fan temperatures, PID loop values and commanded fan duty cycles. The log may optionally contain
# a record of each HD temperature, or only the coolest and warmest HD temperatures.
# 4. Added ability to specify the number of warmest disks to use when calculating the average temperature.
# 5. Added ability to put certain configuration values in a configuration file that is checked each time around the control loop.
# This script can be downloaded from :
# https://forums.freenas.org/index.php?threads/pid-fan-controller-perl-script.50908/
###############################################################################################
# This script is designed to control both the CPU and HD fans in a Supermicro X10 based system according to both
# the CPU and HD temperatures in order to minimize noise while providing sufficient cooling to deal with scrubs
# and CPU torture tests. It may work in X9 based systems, but this has not been tested. It has been found to work on at least
# the X11SSM-F.
# It relies on the motherboard having two fan zones, FAN1..FAN4 and FANA..FANC.
# To use this correctly, you should connect all your PWM HD fans, by splitters if necessary to the FANA..FANC headers, or to
# the numbered FAN1..FAN4 headers. The CPU, case and exhaust fans should then be connected to the other headers. This script
# will then control the HD fans in response to the HD temp, and the other fans in response to CPU temperature. When CPU
# temperature is high the HD fans will be used to provide additional cooling, if you specify cpu/hd shared cooling.
# If the fans should be high, and they are stuck low, or vice-versa, the BMC will be rebooted, thus it is critical to set the
# cpu/hd_max_fan_speed variables correctly.
# NOTE: It is highly likely the "get_hd_temp" function will not work as-is with your HDs. Until a better solution is provided
# you will need to modify this function to properly acquire the temperature. Setting debug=2 will help.
# Tested with a SuperMicro X10SRH-cF, Xeon E5-1650v4, Noctua 120, 90 and 80mm fans in a Norco RPC-4224 4U chassis, with 16 x 4 TB WD Red drives.
# More information on CPU/Peripheral Zone can be found in this post:
# https://forums.freenas.org/index.php?threads/thermal-and-accoustical-design-validation.28364/
# stux (+ editorial changes on Fan Zones from Kevin Horton)
###############################################################################################
# The IPMI fan lower and upper fan speed thresholds must be adjusted to be compatible with the fans used. Do not rely
# completely on manufacturer specs to determine the slowest and fastest possible fan speeds, as some fans have been found
# to run at speeds that differ somewhat from the official specs. See:
# https://forums.freenas.org/index.php?resources/how-to-change-ipmi-sensor-thresholds-using-ipmitool.35/
# The following ipmitool commands can be run when connected to the FreeNAS server via ssh. They are useful to set a desired fan duty cycle before
# checking the fan speeds.
# Set duty cycle in Zone 0 to 100%: ipmitool raw 0x30 0x70 0x66 0x01 0x00 100
# Set duty cycle in Zone 0 to 50%: ipmitool raw 0x30 0x70 0x66 0x01 0x00 50
# Set duty cycle in Zone 0 to 20%: ipmitool raw 0x30 0x70 0x66 0x01 0x00 20
# Set duty cycle in Zone 1 to 100%: ipmitool raw 0x30 0x70 0x66 0x01 0x01 100
# Set duty cycle in Zone 1 to 50%: ipmitool raw 0x30 0x70 0x66 0x01 0x01 50
# Set duty cycle in Zone 1 to 20%: ipmitool raw 0x30 0x70 0x66 0x01 0x01 20
# Check duty cycle in Zone 0: ipmitool raw 0x30 0x70 0x66 0x00 0x00
# result is hex, with 64 being 100% duty cycle. 32 is 50% duty cycle. 14 is 20% duty cycle.
# Check duty cycle in Zone 1: ipmitool raw 0x30 0x70 0x66 0x00 0x01
# result is hex, with 64 being 100% duty cycle. 32 is 50% duty cycle. 14 is 20% duty cycle.
# Check fan speeds using: ipmitool sdr
# Number of warmest disks to average
# Originally, the script would calculate an average temperature for all disks, and vary fan speed as required to achieve the target
# temperature. Later, the option was added to have the script only worry about the warmest X disks, and use the average of those
# disks as the target. This better accommodated the common situation where there are several disks that run several degrees warmer
# than the others, and it is desired to keep those warm disks from exceeding a specified temperature.
# If desired, certain settings may be defined in a configuration file that can be changed on the fly, while the script is running.
# The script will check the latest modification time of the config file each time it determines the new fan duty cycle, and reload
# the configuration data if it has changed. This is useful when testing the script, as the PID control gains, average disk target
# temperature and number of warmest disk temperatures to average can be changed without restarting the script.
# Kevin Horton
###############################################################################################
# VERSION HISTORY
#####################
# 2016-09-19 Initial Version
# 2016-09-19 Added cpu_hd_override_temp, to prevent HD fans cycling when CPU fans are sufficient for cooling CPU
# 2016-09-26 hd_list is now refreshed before checking HD temps so that we start/stop monitoring devices that
# have been hot inserted/removed.
# "Drives are warm, going to 75%" log message was missing an unless clause causing it to print
# every time
# 2016-10-07 Replaced get_cpu_temp() function with get_cpu_temp_sysctl() which queries the kernel, instead of
# IPMI. This is faster, more accurate and more compatible, hopefully allowing this to work on X9
# systems. The original function is still present and is now called get_cpu_temp_ipmi().
# Because this is a much faster method of reading the temps, and because it's actually the max core
# temp, I found that the previous cpu_hd_override_temp of 60 was too sensitive and caused the override
# too often. I've bumped it up to 62, which on my system seems good. This means that if a core gets to
# 62C the HD fans will kick in, and this will generally bring temps back down to around 60C... depending
# on the actual load. Your results will vary, and for best results you should tune controller with
# mprime testing at various thread levels. Updated the cpu thresholds to 35/45/55 because of the improved
# responsiveness of the get_cpu_temp function
#
# The following changes are by Kevin Horton
# 2017-01-14 Reworked get_hd_list() to exclude SSDs
# Added function to calculate maximum and average HD temperatures.
# Replaced original HD fan control scheme with a PID controller, controlling the average HD temp..
# Added safety override if any HD reaches a specified max temperature. If so, the PID loop is overridden,
# and HD fans are set to 100%
# Retain float value of fan duty cycle between loop cycles, so that small duty cycle corrections
# accumulate and eventually push the duty cycle to the next integer value.
# 2017-01-18 Added log file
# 2017-01-21 Refactored code to bump up CPU fan to help cool HD. Drop the variable CPU duty cycle, and just set to High,
# Added log file option without temps for every HD.
# 2017-01-29 Add header to log file every X hours
#
# 2018-08-24 v1.0 Version optimized for 1500 rpm Noctua NF-F12 fans
#
# 2018-08-25 Revised gains and thresholds for 3000 rpm Noctua NF-F12 iPPC fans
# Added 10s pause before checking fan speed, to allow time for fans to respond to latest gain change
#
# 2018-09-17 Revised HD temp average to only look at warmest X disks.
#
# 2018-09-27 Use config file to determine number of warmest disks to average, PID gains and target average temperature.
# The config file may be revised while the script is running, and the updated values will be read into the script
# each time around the control loop.
#
# 2020-01-01 Merged options for selectable number of disks to average and certain settings in config file to
# Master branch
#
# TO DO
# Do not change fan speed due to calculated Tave changes when switching config scripts
###############################################################################################
## CONFIGURATION
################
##CONFIG FILE
## Read following config file at start and every X minutes to determine number of warmest disks to average,
## target average temperature and PID gains. If file is not available, or corrupt, use defaults specified
## in this script.
$config_file = './PID_fan_control_config.ini';
##DEFAULT VALUES
## Use the values declared below if the config file is not present
$hd_ave_target = 38; # PID control loop will target this average temperature for the warmest N disks
$Kp = 16/3; # PID control loop proportional gain
$Ki = 0; # PID control loop integral gain
$Kd = 24; # PID control loop derivative gain
$hd_num_peak = 2; # Number of warmest HDs to use when calculating average temp
$hd_fan_duty_start = 60; # HD fan duty cycle when script starts
## DEBUG LEVEL
## 0 means no debugging. 1,2,3,4 provide more verbosity
## You should run this script in at least level 1 to verify it's working correctly on your system
$debug = 1;
$debug_log = '/root/Debug_PID_fan_control.log';
## LOG
$log = '/root/PID_fan_control.log';
$log_temp_summary_only = 0; # 1 if not logging individual HD temperatures. 0 if logging temp of each HD
$log_header_hourly_interval = 2; # number of hours between log headers. Valid options are 1, 2, 3, 4, 6 & 12.
# log headers will always appear at the start of a log, at midnight and any
# time the list of HDs changes (if individual HD temperatures are logged)
## CPU THRESHOLD TEMPS
## A modern CPU can heat up from 35C to 60C in a second or two. The fan duty cycle is set based on this
$high_cpu_temp = 55; # will go HIGH when we hit
$med_cpu_temp = 45; # will go MEDIUM when we hit, or drop below again
$low_cpu_temp = 35; # will go LOW when we fall below 35 again
## HD THRESHOLD TEMPS
## HDs change temperature slowly.
## This is the temperature that we regard as being uncomfortable. The higher this is the
## more silent your system.
## Note, it is possible for your HDs to go above this... but if your cooling is good, they shouldn't.
# $hd_ave_target = 38.0; # define this value in the DEFAULT VALUES block at top of script
$hd_max_allowed_temp = 40; # celsius. PID control aborts and fans set to 100% duty cycle when a HD hits this temp.
# This ensures that no matter how poorly chosen the PID gains are, or how much of a spread
# there is between the average HD temperature and the maximum HD temperature, the HD fans
# will be set to 100% if any drive reaches this temperature.
## NUMBER OF WARMEST HD TO AVERAGE
# $hd_num_peak = 4; # average the temperatures of this many warmest hard drives when calculating the average disk temperature
## CPU TEMP TO OVERRIDE HD FANS
## when the CPU climbs above this temperature, the HD fans will be overridden
## this prevents the HD fans from spinning up when the CPU fans are capable of providing
## sufficient cooling.
$cpu_hd_override_temp = 65;
## CPU/HD SHARED COOLING
## If your HD fans contribute to the cooling of your CPU you should set this value.
## It will mean when your CPU heats up your HD fans will be turned up to help cool the
## case/cpu. This would only not apply if your HDs and fans are in a separate thermal compartment.
$hd_fans_cool_cpu = 1; # 1 if the hd fans should spin up to cool the cpu, 0 otherwise
## HD FAN DUTY CYCLE TO OVERRIDE CPU FANS
$cpu_fans_cool_hd = 1; # 1 if the CPU fans should spin up to cool the HDs, when needed. 0 otherwise. This may be
# useful if the CPU fan zone also contains chassis exit fans, as an increase in chassis exit
# fan speed may increase the HD cooling air flow.
$hd_cpu_override_duty_cycle = 95; # when the HD duty cycle equals or exceeds this value, the CPU fans may be overridden to help cool HDs
## CPU TEMP CONTROL
$cpu_temp_control = 1; # 1 if the script will control a CPU fan to control CPU temperatures. 0 if the script only controls HD fans.
## PID CONTROL GAINS
## If you were using the spinpid.sh PID control script published by @Glorious1 at the link below, you will need to adjust the value of $Kp
## that you were using, as that script defined Kp in terms of the gain per one cycle around the loop, but this script defines it in terms
## of the gain per minute. Divide the Kp value from the spinpid.sh script by the time in minutes for checking hard drive temperatures.
## For example, if you used a gain of Kp = 8, and a T = 3 (3 minute interval), the new value is $Kp = 8/3.
## Kd values from the spinpid.sh script can be used directly here.
## https://forums.freenas.org/index.php?threads/script-to-control-fan-speed-in-response-to-hard-drive-temperatures.41294/page-4#post-285668
#$Kp = 8/3;
# $Kp = 16/3; # define this value in the DEFAULT VALUES block at top of script
# $Ki = 0; # define this value in the DEFAULT VALUES block at top of script
# $Kd = 96; # define this value in the DEFAULT VALUES block at top of script
#######################
## FAN CONFIGURATION
####################
## FAN SPEEDS
## You need to determine the actual max fan speeds that are achieved by the fans
## Connected to the cpu_fan_header and the hd_fan_header.
## These values are used to verify high/low fan speeds and trigger a BMC reset if necessary.
$cpu_max_fan_speed = 1800;
$hd_max_fan_speed = 3300;
## CPU FAN DUTY LEVELS
## These levels are used to control the CPU fans
$fan_duty_high = 100; # percentage on, ie 100% is full speed.
$fan_duty_med = 60;
$fan_duty_low = 30;
## HD FAN DUTY LEVELS
## These levels are used to control the HD fans
$hd_fan_duty_high = 100; # percentage on, ie 100% is full speed.
$hd_fan_duty_med_high = 80;
$hd_fan_duty_med_low = 50;
$hd_fan_duty_low = 25; # some 120mm fans stall below 30.
$hd_fan_duty_start = 60; # HD fan duty cycle when script starts - defined in config file
## FAN ZONES
# Your CPU/case fans should probably be connected to the main fan sockets, which are in fan zone zero
# Your HD fans should be connected to FANA which is in Zone 1
# You could switch the CPU/HD fans around, as long as you change the zones and fan header configurations.
#
# 0 = FAN1..5
# 1 = FANA..FANC
$cpu_fan_zone = 0;
$hd_fan_zone = 1;
## FAN HEADERS
## these are the fan headers which are used to verify the fan zone is high. FAN1+ are all in Zone 0, FANA is Zone 1.
## cpu_fan_header should be in the cpu_fan_zone
## hd_fan_header should be in the hd_fan_zone
$cpu_fan_header = "FAN1"; # used for printing to standard output for debugging
$hd_fan_header = "FAN3"; # used for printing to standard output for debugging
@hd_fan_list = ("FANA", "FAN4"); # used for logging to file
################
## MISC
#######
## IPMITOOL PATH
## The script needs to know where ipmitool is
$ipmitool = "/usr/bin/ipmitool";
## HD POLLING INTERVAL
## The controller will only poll the harddrives periodically. Since hard drives change temperature slowly
## this is a good thing. 180 seconds is a good value.
$hd_polling_interval = 90; # seconds
## FAN SPEED CHANGE DELAY TIME
## It takes the fans a few seconds to change speeds, we allow a grace before verifying. If we fail the verify
## we'll reset the BMC
$fan_speed_change_delay = 10; # seconds
## BMC REBOOT TIME
## It takes the BMC a number of seconds to reset and start providing sensible output. We'll only
## Reset the BMC if it's still providing rubbish after this time.
$bmc_reboot_grace_time = 120; # seconds
## BMC RETRIES BEFORE REBOOTING
## We verify high/low of fans, and if they're not where they should be we reboot the BMC after so many failures
$bmc_fail_threshold = 1; # will retry n times before rebooting
# edit nothing below this line
########################################################################################################################
# GLOBALS
@hd_list = ();
# massage fan speeds
$cpu_max_fan_speed *= 0.8;
$hd_max_fan_speed *= 0.8;
$hd_duty = $hd_fan_duty_start;
#fan/bmc verification globals/timers
$last_fan_level_change_time = 0; # the time when we changed a fan level last
$fan_unreadable_time = 0; # the time when a fan read failure started, 0 if there is none.
$bmc_fail_count = 0; # how many times the fans failed verification in the last period.
#this is the last cpu temp that was read
$last_cpu_temp = 0;
use POSIX qw(strftime);
use Time::Local;
#
# When this program terminates, or is killed, I want the fans returned to 'full', not 'optimal'
#$SIG{INT} = sub { print "\nCaught SIGINT: setting fan mode to optimal\n"; set_fan_mode("optimal"); exit(0); };
#$SIG{TERM} = sub { print "\nCaught SIGTERM: setting fan mode to optimal\n"; set_fan_mode("optimal"); exit(0); };
#$SIG{HUP} = sub { print "\nCaught SIGHUP: setting fan mode to optimal\n"; set_fan_mode("optimal"); exit(0); };
#
$SIG{INT} = sub { print "\nCaught SIGINT: setting fan mode to full\n"; set_fan_mode("full"); exit(0); };
$SIG{TERM} = sub { print "\nCaught SIGTERM: setting fan mode to full\n"; set_fan_mode("full"); exit(0); };
$SIG{HUP} = sub { print "\nCaught SIGHUP: setting fan mode to full\n"; set_fan_mode("full"); exit(0); };
# start the controller
main();
################################################ MAIN
sub main
{
open LOG, ">>", $log or die $!;
open DEBUG_LOG, ">>", $debug_log or die $!;
($hd_ave_target, $Kp, $Ki, $Kd, $hd_num_peak, $hd_fan_duty_start) = read_config();
# Print Log Header
@hd_list = get_hd_list();
print_log_header(@hd_list);
# current time
($sec,$min,$hour,$day,$month,$year,$wday,$yday,$isdst) = localtime(time);
$next_log_hour = ( int( $hour/$log_header_hourly_interval ) + 1 ) * $log_header_hourly_interval;
if ( $next_log_hour >= 24 )
{
# next log time is after midnight. Roll back to previous log time, calculate Unix epoch seconds, and add required seconds to get next log time
$next_log_hour -= $log_header_hourly_interval;
$next_log_time = timelocal(0,0,$next_log_hour,$day,$month,$year) + 3600 * $log_header_hourly_interval;
}
else
{
# next log time in seconds past Unix epoch
$next_log_time = timelocal(0,0,$next_log_hour,$day,$month,$year);
}
# need to go to Full mode so we have unfettered control of Fans
set_fan_mode("full");
my $cpu_fan_level = "";
my $old_cpu_fan_level = "";
my $override_hd_fan_level = 0;
my $last_hd_check_time = 0;
$temp_error = 0;
my $integral = 0;
$cpu_fan_override = 0;
$hd_fan_duty = $hd_fan_duty_start;
($hd_min_temp, $hd_max_temp, $hd_ave_temp_old, @hd_temps) = get_hd_temps();
($hd_ave_target, $Kp, $Ki, $Kd, $hd_num_peak, $hd_fan_duty_start, $config_time) = read_config();
while()
{
if ($cpu_temp_control)
{
$old_cpu_fan_level = $cpu_fan_level;
$cpu_fan_level = control_cpu_fan( $old_cpu_fan_level );
if( $old_cpu_fan_level ne $cpu_fan_level )
{
$last_fan_level_change_time = time;
}
if( $cpu_fan_level eq "high" )
{
if( $hd_fans_cool_cpu && !$override_hd_fan_level && ($last_cpu_temp >= $cpu_hd_override_temp || $last_cpu_temp == 0) )
{
#override hd fan zone level, once we override we won't backoff until the cpu drops to below "high"
$override_hd_fan_level = 1;
dprint( 0, "CPU Temp: $last_cpu_temp >= $cpu_hd_override_temp, Overriding HD fan zone to $hd_fan_duty_high%, \n" );
set_fan_zone_duty_cycle( $hd_fan_zone, $hd_fan_duty_high );
$last_fan_level_change_time = time;
}
}
elsif( $override_hd_fan_level )
{
#restore hd fan zone level;
$override_hd_fan_level = 0;
dprint( 0, "Restoring HD fan zone to $hd_fan_duty%\n" );
set_fan_zone_duty_cycle( $hd_fan_zone, $hd_fan_duty );
$last_fan_level_change_time = time;
}
}
# periodically determine hd fan zone level
my $check_time = time;
if( $check_time - $last_hd_check_time >= $hd_polling_interval )
{
$last_hd_check_time = $check_time;
@last_hd_list = @hd_list;
# check to see if config file has been updated. If so, update the config values and print a new log header
$config_time_new = (stat($config_file))[9];
if ($config_time_new > $config_time)
{
($hd_ave_target, $Kp, $Ki, $Kd, $hd_num_peak, $hd_fan_duty_start, $config_time) = read_config();
print_log_header(@hd_list);
}
# we refresh the hd_list from lsblk
# every time because if you're adding/removing HDs we want
# to start checking their temps too!
@hd_list = get_hd_list();
($hd_min_temp, $hd_max_temp, $hd_ave_temp, @hd_temps) = get_hd_temps();
$hd_fan_duty_old = $hd_fan_duty;
$hd_fan_duty = calculate_hd_fan_duty_cycle_PID( $hd_max_temp, $hd_ave_temp, $hd_fan_duty );
if( !$override_hd_fan_level )
{
set_fan_zone_duty_cycle( $hd_fan_zone, $hd_fan_duty );
$last_fan_level_change_time = time; # this resets every time, but it shouldn't matter since hd_polling_interval is large.
}
# print to log
if (@hd_list != @last_hd_list && $log_temp_summary_only == 0)
{
# print new disk iD header if it has changed (e.g. hot swap insert or remove)
@hd_list = print_log_header(@hd_list);
}
elsif ( $check_time >= $next_log_time )
{
# time to print a new log header
@hd_list = print_log_header(@hd_list);
$next_log_time += 3600 * $log_header_hourly_interval;
}
my $timestring = build_time_string();
# ($hd_min_temp, $hd_max_temp, $hd_ave_temp, @hd_temps) = get_hd_temps();
print LOG "$timestring";
if ($log_temp_summary_only)
{
printf(LOG " %2i", 0+@hd_list); # number of HDs, so it can be seen if a hot swap addition or removal was detected
printf(LOG " %2i", $hd_min_temp);
}
else
{
foreach my $item (@hd_temps)
{
printf(LOG "%5s", $item);
}
}
printf(LOG " ^%2i", $hd_max_temp);
printf(LOG "%7.2f", $hd_ave_temp);
printf(LOG "%6.2f", $hd_ave_temp - $hd_ave_target);
$hd_fan_mode = get_fan_mode();
printf(LOG "%6s", $hd_fan_mode);
sleep 10; # pause 10s to allow fans to change speed after setting it
$ave_fan_speed = get_fan_ave_speed(@hd_fan_list);
printf(LOG "%6s", $ave_fan_speed);
printf(LOG "%4i/%-3i", $hd_fan_duty_old, $hd_fan_duty);
$cput = get_cpu_temp_ipmi();
printf(LOG "%4i %6.2f %6.2f %6.2f %6.2f%%\n", $cput, $P, $I, $D, $hd_duty);
}
# verify_fan_speed_levels function is fairly complicated
if ($cpu_temp_control)
{
verify_fan_speed_levels( $cpu_fan_level, $override_hd_fan_level ? $hd_fan_duty_high : $hd_fan_duty );
}
else
{
verify_fan_speed_levels2( $hd_fan_duty );
}
# if ($cpu_temp_control)
# {
# # CPU temps can go from cool to hot in 2 seconds! so we only ever sleep for 1 second.
# sleep 1;
# }
# else
# {
# sleep $hd_polling_interval - 1;
# }
# CPU temps can go from cool to hot in 2 seconds! so we only ever sleep for 1 second.
sleep 1;
} # inf loop
}
################################################# SUBS
sub get_hd_list
{
#my $disk_list = `camcontrol devlist | grep -v "SSD" | grep -v "Verbatim" | grep -v "Kingston" | grep -v "Elements" | sed 's:.*(::;s:).*::;s:,pass[0-9]*::;s:pass[0-9]*,::' | egrep '^[a]*da[0-9]+\$' | tr '\012' ' '`;
#
# In Linux, `lsblk -o NAME,ROTA` will provide the drive name and rotation (0 for SSD)
# Filtering on s (SATA) and a (ATA) _should_ provide us any type of spinning drive
#
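# \$1 is escaped so that Perl passes a literal $1 to awk instead of interpolating its own (empty) $1
my $disk_list = `/usr/bin/lsblk -o NAME,ROTA | egrep "^s|^a" | grep "1\$" | awk '{ print \$1 }' | tr '\012' ' '`;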
dprint(3,"$disk_list\n");
my @vals = split(" ", $disk_list);
foreach my $item (@vals)
{
dprint(2,"$item\n");
}
return @vals;
}
sub get_hd_temp
{
my $max_temp = 0;
foreach my $item (@hd_list)
{
my $disk_dev = "/dev/$item";
my $command = "/usr/sbin/smartctl -A $disk_dev | grep Temperature_Celsius";
dprint( 3, "$command\n" );
my $output = `$command`;
dprint( 2, "$output");
my @vals = split(" ", $output);
# grab 10th item from the output, which is the hard drive temperature (on Seagate NAS HDs)
my $temp = "$vals[9]";
chomp $temp;
if( $temp )
{
dprint( 1, "$disk_dev: $temp\n");
$max_temp = $temp if $temp > $max_temp;
}
}
dprint(0, "Maximum HD Temperature: $max_temp\n");
return $max_temp;
}
sub get_hd_temps
# return minimum, maximum, average HD temperatures and array of individual temps
{
my $max_temp = 0;
my $min_temp = 1000;
my $temp_sum = 0;
my $HD_count = 0;
my @temp_list = ();
foreach my $item (@hd_list)
{
my $disk_dev = "/dev/$item";
my $command = "/usr/sbin/smartctl -A $disk_dev | grep Temperature_Celsius";
my $output = `$command`;
my @vals = split(" ", $output);
# grab 10th item from the output, which is the hard drive temperature (on Seagate NAS HDs)
my $temp = "$vals[9]";
chomp $temp;
if( $temp )
{
push(@temp_list, $temp);
$temp_sum += $temp;
$HD_count +=1;
$max_temp = $temp if $temp > $max_temp;
$min_temp = $temp if $temp < $min_temp;
}
}
my @temps_sorted = sort { $a <=> $b } @temp_list;
$temp_sum = 0;
for (my $n = $hd_num_peak; $n > 0; $n = $n -1) {
$temp_sum += pop(@temps_sorted);
}
my $ave_temp = $temp_sum / $hd_num_peak;
return ($min_temp, $max_temp, $ave_temp, @temp_list);
}
sub verify_fan_speed_levels
{
my( $cpu_fan_level, $hd_fan_duty ) = @_;
dprint( 4, "verify_fan_speed_levels: cpu_fan_level: $cpu_fan_level, hd_fan_duty: $hd_fan_duty\n");
my $extra_delay_before_next_check = 0;
my $temp_time = time - $last_fan_level_change_time;
dprint( 4, "Time since last verify : $temp_time, last change: $last_fan_level_change_time, delay: $fan_speed_change_delay\n");
if( $temp_time > $fan_speed_change_delay )
{
# we've waited for the speed change to take effect.
my $cpu_fan_speed = get_fan_speed("CPU");
if( $cpu_fan_speed < 0 )
{
dprint(1,"CPU Fan speed unavailable\n" );
$fan_unreadable_time = time if $fan_unreadable_time == 0;
}
my $hd_fan_speed = get_fan_speed("HD");
if( $hd_fan_speed < 0 )
{
dprint(1,"HD Fan speed unavailable\n" );
$fan_unreadable_time = time if $fan_unreadable_time == 0;
}
if( $hd_fan_speed < 0 || $cpu_fan_speed < 0 )
{
# one of the fans couldn't be reliably read
my $temp_time = time - $fan_unreadable_time;
if( $temp_time > $bmc_reboot_grace_time )
{
#we've waited, and we still can't read fan speed.
dprint(0, "Fan speeds are unreadable after $bmc_reboot_grace_time seconds, rebooting BMC\n");
reset_bmc();
$fan_unreadable_time = 0;
}
else
{
dprint(2, "Fan speeds are unreadable after $temp_time seconds, will try again\n");
}
}
else
{
# we have now been able to read the fan speeds
my $cpu_fan_is_wrong = 0;
my $hd_fan_is_wrong = 0;
#verify cpu fans
if( $cpu_fan_level eq "high" && $cpu_fan_speed < $cpu_max_fan_speed )
{
dprint(0, "CPU fan speed should be high, but $cpu_fan_speed < $cpu_max_fan_speed.\n");
$cpu_fan_is_wrong=1;
}
elsif( $cpu_fan_level eq "low" && $cpu_fan_speed > $cpu_max_fan_speed )
{
dprint(0, "CPU fan speed should be low, but $cpu_fan_speed > $cpu_max_fan_speed.\n");
$cpu_fan_is_wrong=1;
}
#verify hd fans
if( $hd_fan_duty >= $hd_fan_duty_high && $hd_fan_speed < $hd_max_fan_speed )
{
dprint(0, "HD fan speed should be high, but $hd_fan_speed < $hd_max_fan_speed.\n");
$hd_fan_is_wrong=1;
}
elsif( $hd_fan_duty <= $hd_fan_duty_low && $hd_fan_speed > $hd_max_fan_speed )
{
dprint(0, "HD fan speed should be low, but $hd_fan_speed > $hd_max_fan_speed.\n");
$hd_fan_is_wrong=1;
}
#verify both fans are good
if( $cpu_fan_is_wrong || $hd_fan_is_wrong )
{
$bmc_fail_count++;
dprint( 3, "bmc_fail_count: $bmc_fail_count, bmc_fail_threshold: $bmc_fail_threshold\n");
if( $bmc_fail_count <= $bmc_fail_threshold )
{
#we'll try setting the fan speeds, and giving it another attempt
dprint(1, "Fan speeds are not where they should be, will try again.\n");
set_fan_mode("full");
set_fan_zone_level( $cpu_fan_zone, $cpu_fan_level );
set_fan_zone_duty_cycle( $hd_fan_zone, $hd_fan_duty );
}
else
{
#time to reset the bmc
dprint(1, "Fan speeds are still not where they should be after $bmc_fail_count attempts, will reboot BMC.\n");
set_fan_mode("full");
reset_bmc();
$bmc_fail_count = 0;
}
}
else
{
#everything is good. We'll sit back for another minute.
dprint( 2, "Verified fan levels, CPU: $cpu_fan_speed, HD: $hd_fan_speed. All good.\n" );
$bmc_fail_count = 0; # we succeeded
$extra_delay_before_next_check = 60 - $fan_speed_change_delay; # let's give it a minute since it was good.
}
#reset our unreadable timer, since we read the fan speeds.
$fan_unreadable_time = 0;
}
#reset our timer, so that we'll wait before checking again.
$last_fan_level_change_time = time + $extra_delay_before_next_check; #another delay before checking please.
}
return;
}
sub verify_fan_speed_levels2
{
my( $hd_fan_duty ) = @_;
dprint( 4, "verify_fan_speed_level: hd_fan_duty: $hd_fan_duty\n");
my $extra_delay_before_next_check = 0;
my $temp_time = time - $last_fan_level_change_time;
dprint( 4, "Time since last verify : $temp_time, last change: $last_fan_level_change_time, delay: $fan_speed_change_delay\n");
if( $temp_time > $fan_speed_change_delay )
{
# we've waited for the speed change to take effect.
my $hd_fan_speed = get_fan_speed("HD");
if( $hd_fan_speed < 0 )
{
dprint(1,"HD Fan speed unavailable\n" );
$fan_unreadable_time = time if $fan_unreadable_time == 0;
}
if( $hd_fan_speed < 0 )
{
# one of the fans couldn't be reliably read
my $temp_time = time - $fan_unreadable_time;
if( $temp_time > $bmc_reboot_grace_time )
{
#we've waited, and we still can't read fan speed.
dprint(0, "Fan speeds are unreadable after $bmc_reboot_grace_time seconds, rebooting BMC\n");
reset_bmc();
$fan_unreadable_time = 0;
}
else
{
dprint(2, "Fan speeds are unreadable after $temp_time seconds, will try again\n");
}
}
else
{
# we have now been able to read the fan speeds
my $hd_fan_is_wrong = 0;
#verify hd fans
if( $hd_fan_duty >= $hd_fan_duty_high && $hd_fan_speed < $hd_max_fan_speed )
{
dprint(0, "HD fan speed should be high, but $hd_fan_speed < $hd_max_fan_speed.\n");
$hd_fan_is_wrong=1;
}
elsif( $hd_fan_duty <= $hd_fan_duty_low && $hd_fan_speed > $hd_max_fan_speed )
{
dprint(0, "HD fan speed should be low, but $hd_fan_speed > $hd_max_fan_speed.\n");
$hd_fan_is_wrong=1;
}
#verify HD fans are good
if( $hd_fan_is_wrong )
{
$bmc_fail_count++;
dprint( 3, "bmc_fail_count: $bmc_fail_count, bmc_fail_threshold: $bmc_fail_threshold\n");
if( $bmc_fail_count <= $bmc_fail_threshold )
{
#we'll try setting the fan speeds, and giving it another attempt
dprint(1, "Fan speeds are not where they should be, will try again.\n");
set_fan_mode("full");
set_fan_zone_duty_cycle( $hd_fan_zone, $hd_fan_duty );
}
else
{
#time to reset the bmc
dprint(1, "Fan speeds are still not where they should be after $bmc_fail_count attempts, will reboot BMC.\n");
set_fan_mode("full");
reset_bmc();
$bmc_fail_count = 0;
}
}
else
{
#everything is good. We'll sit back for another minute.
dprint( 2, "Verified fan levels, HD: $hd_fan_speed. All good.\n" );
$bmc_fail_count = 0; # we succeeded
$extra_delay_before_next_check = 60 - $fan_speed_change_delay; # let's give it a minute since it was good.
}
#reset our unreadable timer, since we read the fan speeds.
$fan_unreadable_time = 0;
}
#reset our timer, so that we'll wait before checking again.
$last_fan_level_change_time = time + $extra_delay_before_next_check; #another delay before checking please.
}
return;
}
# need to pass in last $cpu_fan
sub control_cpu_fan
{
my ($old_cpu_fan_level) = @_;
my $cpu_temp = get_cpu_temp_ipmi();
# my $cpu_temp = get_cpu_temp_sysctl(); # not available on Linux systems
my $cpu_fan_level = decide_cpu_fan_level( $cpu_temp, $old_cpu_fan_level );
if( $old_cpu_fan_level ne $cpu_fan_level )
{
dprint( 1, "CPU Fan changing... ($cpu_fan_level)\n");
set_fan_zone_level( $cpu_fan_zone, $cpu_fan_level );
}
return $cpu_fan_level;
}
sub calculate_hd_fan_duty_cycle_PID
{
my ($hd_max_temp, $hd_ave_temp, $old_hd_duty) = @_;
# my $hd_duty;
my $temp_error_old = $hd_ave_temp_old - $hd_ave_target;
my $temp_error = $hd_ave_temp - $hd_ave_target;
if ($hd_max_temp >= $hd_max_allowed_temp )
{
$hd_duty = $hd_fan_duty_high;
dprint(0, "Drives are too hot, going to $hd_fan_duty_high%\n") unless $old_hd_duty == $hd_duty;
}
elsif ($hd_max_temp >= 0 )
{
my $temp_error = $hd_ave_temp - $hd_ave_target;
$integral += $temp_error * $hd_polling_interval / 60;
my $derivative = ($temp_error - $temp_error_old) * 60 / $hd_polling_interval;
# my $P = $Kp * $temp_error * $hd_polling_interval / 60;
# my $I = $Ki * $integral;
# my $D = $Kd * $derivative;
$P = $Kp * $temp_error * $hd_polling_interval / 60;
$I = $Ki * $integral;
$D = $Kd * $derivative;
# $hd_duty = $old_hd_duty + $P + $I + $D;
$hd_duty = $hd_duty + $P + $I + $D;
if ($hd_duty > $hd_fan_duty_high)
{
$hd_duty = $hd_fan_duty_high;
}
elsif ($hd_duty < $hd_fan_duty_low)
{
$hd_duty = $hd_fan_duty_low;
}
dprint(0, "temperature error = $temp_error\n");
dprint(1, "PID corrections are P = $P, I = $I and D = $D\n");
dprint(0, "PID control new duty cycle is $hd_duty%\n") unless $old_hd_duty == $hd_duty;
}
else
{
$hd_duty = 100;
dprint( 0, "Drive temperature ($hd_max_temp) invalid. going to 100%\n");
}
$hd_ave_temp_old = $hd_ave_temp;
if ($cpu_fans_cool_hd == 1 && $hd_duty > $hd_cpu_override_duty_cycle)
{
$cpu_fan_override = 1;
}
else
{
$cpu_fan_override = 0;
}
# $hd_duty is retained as float between cycles, so any small incremental adjustments less
# than 1 will not be lost, but build up until they are large enough to cause a change
# after the value is truncated with int()
# add 0.5 before truncating with int() to approximate the behaviour of a proper round() function
return int($hd_duty + 0.5);
}
sub build_date_time_string
{
my $datetimestring = strftime "%F %H:%M:%S", localtime;
return $datetimestring;
}
sub build_date_string
{
my $datestring = strftime "%F", localtime;
return $datestring;
}
sub build_time_string
{
my $timestring = strftime "%H:%M:%S", localtime;
return $timestring;
}
sub print_log_header
{
@hd_list = @_;
my $timestring = build_time_string();
my $datestring = build_date_string();
printf(LOG "\n\nPID Fan Controller Log --- Target $hd_num_peak Disk HD Temperature = %5.2f deg C --- PID Control Gains: Kp = %6.3f, Ki = %6.3f, Kd = %5.1f\n ", $hd_ave_target, $Kp, $Ki, $Kd);
if ($log_temp_summary_only)
{
print LOG " HD Min";
}
else
{
foreach $item (@hd_list)
{
print LOG " ";
}
}
print LOG " Max Ave Temp Fan Fan Fan % CPU P I D Fan\n$datestring";
if ($log_temp_summary_only)
{
print LOG " Qty Temp ";
}
else
{
foreach $item (@hd_list)
{
printf(LOG "%4s ", $item);
}
}
print LOG "Temp Temp Err Mode RPM Old/New Temp Corr Corr Corr Duty\n";
return @hd_list;
}
sub get_fan_ave_speed
{
my $speed_sum = 0;
my $fan_count = 0;
foreach my $fan (@_)
{
$speed_sum += get_fan_speed2($fan);
$fan_count += 1;
}
my $ave_speed = sprintf("%i", $speed_sum / $fan_count);
return $ave_speed;
}
sub dprint
{
my ( $level,$output) = @_;
# print( "dprintf: debug = $debug, level = $level, output = \"$output\"\n" );
if( $debug > $level )
{
my $datestring = build_date_time_string();
print DEBUG_LOG "$datestring: $output";
}
return;
}
sub dprint_list
{
my ( $level,$name,@output) = @_;
if( $debug > $level )
{
dprint($level,"$name:\n");
foreach my $item (@output)
{
dprint( $level, " $item\n");
}
}
return;
}
sub bail_with_fans_full
{
dprint( 0, "Setting fans full before bailing!\n");
set_fan_mode("full");
die @_;
}
sub get_fan_mode
{
my $command = "$ipmitool raw 0x30 0x45 0";
my $fan_code = `$command`;
if ($fan_code == 1) { $hd_fan_mode = "Full"; }
elsif ($fan_code == 0) { $hd_fan_mode = " Std"; }
elsif ($fan_code == 2) { $hd_fan_mode = " Opt"; }
elsif ($fan_code == 4) { $hd_fan_mode = " Hvy"; }
return $hd_fan_mode;
}
sub get_fan_mode_code
{
my ( $fan_mode ) = @_;
my $m;
if( $fan_mode eq 'standard' ) { $m = 0; }
elsif( $fan_mode eq 'full' ) { $m = 1; }
elsif( $fan_mode eq 'optimal' ) { $m = 2; }
elsif( $fan_mode eq 'heavyio' ) { $m = 4; }
else { die "illegal fan mode: $fan_mode\n" }
dprint( 3, "fanmode: $fan_mode = $m\n");
return $m;
}
sub set_fan_mode
{
my ($fan_mode) = @_;
my $mode = get_fan_mode_code( $fan_mode );
dprint( 1, "Setting fan mode to $mode ($fan_mode)\n");
`$ipmitool raw 0x30 0x45 0x01 $mode`;
sleep 5; #need to give the BMC some breathing room
return;
}
# returns the maximum core temperature from the kernel to determine CPU temperature.
# in my testing I found that the max core temperature was pretty much the same as the IPMI 'CPU Temp'
# value, but it's much quicker to read, and doesn't require X10 IPMI. It also works when the IPMI is rebooting.
sub get_cpu_temp_sysctl
{
# significantly more efficient to filter to dev.cpu than to just grep the whole lot!
my $core_temps = `sysctl -a dev.cpu | egrep -E \"dev.cpu\.[0-9]+\.temperature\" | awk '{print \$2}' | sed 's/.\$//'`;
chomp($core_temps);
dprint(3,"core_temps:\n$core_temps\n");
my @core_temps_list = split(" ", $core_temps);
dprint_list( 4, "core_temps_list", @core_temps_list );
my $max_core_temp = 0;
foreach my $core_temp (@core_temps_list)
{
if( $core_temp )
{
dprint( 2, "core_temp = $core_temp C\n");
$max_core_temp = $core_temp if $core_temp > $max_core_temp;
}
}
dprint(1, "CPU Temp: $max_core_temp\n");
$last_cpu_temp = $max_core_temp; #possible that this is 0 if there was a fault reading the core temps
return $max_core_temp;
}
# reads the IPMI 'CPU Temp' field to determine overall CPU temperature
sub get_cpu_temp_ipmi
{
my $cpu_temp = `$ipmitool sensor get \"CPU Temp\" | awk '/Sensor Reading/{print \$4}'`;
chomp $cpu_temp;
dprint( 1, "CPU Temp: $cpu_temp\n");
$last_cpu_temp = $cpu_temp; # note, this hasn't been cleaned.
return $cpu_temp;
}
sub decide_cpu_fan_level
{
my ($cpu_temp, $cpu_fan) = @_;
if ($cpu_fan_override == 1)
{
if( $cpu_fan ne "high" )
{
$cpu_fan = "high";
dprint( 0, "CPU fan set to high to help cool HDs.\n");
}
}
else
{
#if cpu_temp evaluates as "0", it's most likely the reading returned rubbish.
if ($cpu_temp <= 0)
{
if( $cpu_temp eq "No") # "No reading"
{
dprint( 0, "CPU Temp has no reading.\n");
}
elsif( $cpu_temp eq "Disabled" )
{
dprint( 0, "CPU Temp reading disabled.\n");
}
else
{
dprint( 0, "Unexpected CPU Temp ($cpu_temp).\n");
}
dprint( 0, "Assuming worst-case and going high.\n");
$cpu_fan = "high";
}
else
{
if( $cpu_temp >= $high_cpu_temp )
{
if( $cpu_fan ne "high" )
{
dprint( 0, "CPU Temp: $cpu_temp >= $high_cpu_temp, CPU Fan going high.\n");
}
$cpu_fan = "high";
}
elsif( $cpu_temp >= $med_cpu_temp )
{
if( $cpu_fan ne "med" )
{
dprint( 0, "CPU Temp: $cpu_temp >= $med_cpu_temp, CPU Fan going med.\n");
}
$cpu_fan = "med";
}
elsif( $cpu_temp > $low_cpu_temp && ($cpu_fan eq "high" || $cpu_fan eq "" ) )
{
dprint( 0, "CPU Temp: $cpu_temp dropped below $med_cpu_temp, CPU Fan going med.\n");
$cpu_fan = "med";
}
elsif( $cpu_temp <= $low_cpu_temp )
{
if( $cpu_fan ne "low" )
{
dprint( 0, "CPU Temp: $cpu_temp <= $low_cpu_temp, CPU Fan going low.\n");
}
$cpu_fan = "low";
}
}
}
dprint( 1, "CPU Fan: $cpu_fan\n");
return $cpu_fan;
}
# zone,dutycycle%
sub set_fan_zone_duty_cycle
{
my ( $zone, $duty ) = @_;
if( $zone < 0 || $zone > 1 )
{
bail_with_fans_full( "Illegal Fan Zone" );
}
if( $duty < 0 || $duty > 100 )
{
dprint( 0, "illegal duty cycle, assuming 100%\n");
$duty = 100;
}
dprint( 1, "Setting Zone $zone duty cycle to $duty%\n");
`$ipmitool raw 0x30 0x70 0x66 0x01 $zone $duty`;
return;
}
sub set_fan_zone_level
{
my ( $fan_zone, $level) = @_;
my $duty = 0;
#assumes high if not low or med, for safety.
if( $level eq "low" )
{
$duty = $fan_duty_low;
}
elsif( $level eq "med" )
{
$duty = $fan_duty_med;
}
else
{
$duty = $fan_duty_high;
}
set_fan_zone_duty_cycle( $fan_zone, $duty );
}
sub get_fan_header_by_name
{
my ($fan_name) = @_;
if( $fan_name eq "CPU" )
{
return $cpu_fan_header;
}
elsif( $fan_name eq "HD" )
{
return $hd_fan_header;
}
else
{
bail_with_fans_full( "No such fan : $fan_name\n" );
}
}
sub get_fan_speed
{
my ($fan_name) = @_;
my $fan = get_fan_header_by_name( $fan_name );
my $command = "$ipmitool sdr | grep $fan";
dprint( 4, "get fan speed command = $command\n");
my $output = `$command`;
my @vals = split(" ", $output);
my $fan_speed = "$vals[2]";
dprint( 3, "fan_speed = $fan_speed\n");
if( $fan_speed eq "no" )
{
dprint( 0, "$fan_name Fan speed: No reading\n");
$fan_speed = -1;
}
elsif( $fan_speed eq "disabled" )
{
dprint( 0, "$fan_name Fan speed: Disabled\n");
$fan_speed = -1;
}
elsif( $fan_speed > 10000 || $fan_speed < 0 )
{
dprint( 0, "$fan_name Fan speed: $fan_speed RPM, is nonsensical\n");
$fan_speed = -1;
}
else
{
dprint( 1, "$fan_name Fan speed: $fan_speed RPM\n");
}
return $fan_speed;
}
sub get_fan_speed2
# get fan speed for specified fan header
{
my ($fan_name) = @_;
my $command = "$ipmitool sdr | grep $fan_name";
my $output = `$command`;
my @vals = split(" ", $output);
my $fan_speed = "$vals[2]";
return $fan_speed;
}
sub reset_bmc
{
#when the BMC reboots, it comes back up in its last fan mode... which should be FULL.
dprint( 0, "Resetting BMC\n");
`$ipmitool bmc reset cold`;
return;
}
sub read_config
{
# read config file, if present
if (do $config_file)
{
$hd_ave_target = $config_Ta // $default_hd_ave_target;
$Kp = $config_Kp // $default_Kp;
$Ki = $config_Ki // $default_Ki;
$Kd = $config_Kd // $default_Kd;
$hd_num_peak = $config_num_disks // $default_hd_num_peak;
$hd_fan_duty_start = $config_hd_fan_start // $default_hd_fan_duty_start;
$config_time = (stat($config_file))[9];
} else {
dprint( 0, "Config file not found. Using default values!\n");
print "config file not found\n";
}
return ($hd_ave_target, $Kp, $Ki, $Kd, $hd_num_peak, $hd_fan_duty_start, $config_time);
}
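The duty-cycle update in calculate_hd_fan_duty_cycle_PID is easier to follow with numbers plugged in. The sketch below repeats the same arithmetic as the listing (error, integral in degree-minutes, derivative in degrees per minute, then an incremental change to the previous duty, clamped to the allowed range) in a small shell function. The helper name `pid_step` and every value in it (target, gains, interval, duty limits) are made up for illustration and are not the settings used on my NAS. It also leaves out the script's "drives too hot, go straight to high" and "invalid reading, go to 100%" branches.

```shell
#!/bin/sh
# Illustrative values only; the real script takes these from its own settings.
TARGET=38        # target average drive temperature, deg C
KP=4
KI=0.1
KD=10
INTERVAL=60      # seconds between drive temperature polls
DUTY_LOW=30
DUTY_HIGH=100

# pid_step TEMP PREV_TEMP INTEGRAL DUTY
# Prints: new duty (float), new integral, and the rounded duty for the BMC.
pid_step() {
    awk -v t="$1" -v tp="$2" -v integ="$3" -v duty="$4" \
        -v target="$TARGET" -v kp="$KP" -v ki="$KI" -v kd="$KD" \
        -v dt="$INTERVAL" -v lo="$DUTY_LOW" -v hi="$DUTY_HIGH" '
    BEGIN {
        err     = t  - target                  # current temperature error
        err_old = tp - target                  # error at the previous poll
        integ  += err * dt / 60                # accumulate in degree-minutes
        deriv   = (err - err_old) * 60 / dt    # degrees per minute
        duty   += kp * err * dt / 60 + ki * integ + kd * deriv
        if (duty > hi) { duty = hi } else if (duty < lo) { duty = lo }
        # same rounding trick as the Perl code: int(x + 0.5)
        printf "%.2f %.2f %d\n", duty, integ, int(duty + 0.5)
    }'
}

# Drives at 39 C, previously 38.5 C, no accumulated integral, fans at 50%.
pid_step 39 38.5 0 50    # prints: 59.10 1.00 59
```

With these numbers the proportional term adds 4, the integral term 0.1, and the derivative term 5, so the duty moves from 50 to 59.1 and is sent to the BMC as 59%.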
Adding the pid_fan_control.pl script as a service
- Create a service file for the PID fan control:
  cat /etc/systemd/system/pidFanControl.service
  [Unit]
  Description=PID fan control
  [Service]
  ExecStart=/usr/sbin/pid_fan_control.pl
  [Install]
  WantedBy=multi-user.target
- Reload the systemd manager configuration, so it can read the new pidFanControl service:
  sudo systemctl daemon-reload
- Enable the pidFanControl service, so it loads on boot:
  sudo systemctl enable pidFanControl.service
  Created symlink /etc/systemd/system/multi-user.target.wants/pidFanControl.service → /etc/systemd/system/pidFanControl.service.
- Start the pidFanControl service:
  sudo systemctl start pidFanControl
- Check on the status of the pidFanControl service:
  sudo systemctl status pidFanControl
  ● pidFanControl.service - PID fan control
  Loaded: loaded (/etc/systemd/system/pidFanControl.service; enabled; vendor preset: enabled)
  Active: active (running) since Fri 2021-09-17 13:48:07 MDT; 5s ago
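The steps above can also be staged from a small script. This is only a sketch: it writes the unit file into a scratch directory and prints the install commands instead of running them, so nothing under /etc is touched until you copy the file yourself. The `STAGE_DIR` variable and `write_unit` function are hypothetical names, and the `Restart=on-failure` line is an assumption that is not part of the unit file shown above. I added it because the fan script exits with an error when it bails out, and restarting it seems like a reasonable default.

```shell
#!/bin/sh
set -eu

UNIT_NAME=pidFanControl.service
SCRIPT_PATH=/usr/sbin/pid_fan_control.pl      # same path as ExecStart above
STAGE_DIR=${STAGE_DIR:-$(mktemp -d)}          # hypothetical scratch location

# Write the unit file into the staging directory.
write_unit() {
    cat > "$STAGE_DIR/$UNIT_NAME" <<EOF
[Unit]
Description=PID fan control

[Service]
ExecStart=$SCRIPT_PATH
# Assumption, not in the original unit file: restart the script if it exits with an error.
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

write_unit

# Print the privileged steps rather than running them.
echo "Staged $STAGE_DIR/$UNIT_NAME. To install it:"
echo "  sudo cp $STAGE_DIR/$UNIT_NAME /etc/systemd/system/"
echo "  sudo systemctl daemon-reload"
echo "  sudo systemctl enable --now $UNIT_NAME"
```

`systemctl daemon-reload` is what makes systemd re-read unit files, and `enable --now` enables and starts the service in one step.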
I've been using this for a few months, and it's been working like a charm.
References
Github - khorton / nas_fan_control - PID_fan_control.pl https://github.com/khorton/nas_fan_control/blob/master/PID_fan_control.pl
Noctua - NF-P12 redux-1700 PWM https://noctua.at/en/nf-p12-redux-1700-pwm
Noctua - NF-A8 PWM https://noctua.at/en/nf-a8-pwm
Supermicro Intelligent Management https://www.supermicro.com/en/solutions/management-software/bmc-resources
Ubuntu manuals - fancontrol - automated software based fan speed regulation http://manpages.ubuntu.com/manpages/focal/man8/fancontrol.8.html
TrueNAS Resources - Fan Scripts for Supermicro Boards Using PID Logic https://www.truenas.com/community/resources/fan-scripts-for-supermicro-boards-using-pid-logic.24/
TrueNAS Resources - Script to control fan speed in response to hard drive temperatures https://www.truenas.com/community/threads/script-to-control-fan-speed-in-response-to-hard-drive-temperatures.41294/page-4
TrueNAS Resources - How To: Change IPMI Sensor Thresholds using ipmitool https://www.truenas.com/community/resources/how-to-change-ipmi-sensor-thresholds-using-ipmitool.35/
die.net - ipmitool(1) - Linux man page https://linux.die.net/man/1/ipmitool