[Solved / Archived] Firmware changes to enhance reliability

(@insideoutgrp)
Posts: 9
Active Member
Topic starter
 

We seem to have consistent reliability issues with the Witty Pi 4. I can't reliably reproduce them, but across our estate of 100 devices they are becoming regular.

The issues seem to be:

Systems not shutting down within the shutdown delay time (25 s), or TXD not going low for some reason, so the Witty Pi never cuts power.

We've tried using the new high reliability mode, but it doesn't help because of the state the device fails in (i.e. the Pi has not shut down, so power is still on).

Nothing in the logs seems to show anything, so we're looking at how best to fix this.

 

Could we simplify the timer1 overflow routine to immediately call the CutPower subroutine? Are there any issues this would cause, other than the obvious lack of a shutdown of the Pi, or other routines we would need to modify?

Alternatively, could the 'high reliability' or a similar mode be added or modified so that the Witty Pi schedule simply turns the power on and off, rather than waiting for boot, waiting for safe shutdown etc.? This seems to be where the issues come from.

The only other way I can think of to close these issues is holding TXD low all the time so the Witty cuts the power. But we're open to other suggestions that don't require flashing all of our HATs.

Or we could increase the shutdown timer in the firmware, but given the register size I didn't think this could be increased past 25?

Thanks

 

 
Posted : 12/09/2024 11:14 pm
(@admin)
Posts: 479
Member Admin
 

I guess you saw the "high reliability mode" in this topic. I would like to reiterate that we never used that name. The special mode we implemented in the firmware after that discussion is called "guaranteed wake mode", and it does exactly that and nothing more. Giving an extra chance to access the device remotely doesn't mean the system is more reliable. We do not use that name because we don't want to cause that kind of misunderstanding. There is no mode that can magically make the system more reliable; otherwise we would already be using it as the default mode.

The "holding TXD high to prevent power cut" issue only exists in firmware revisions 1 and 2. Since firmware revision 3, even if TXD stays high after the shutdown, or the system crashes during the shutdown, the power will still be cut once the timer expires. You can confirm this with some simple tests.

If your Witty Pi 4 units have firmware revision 1 or 2, then you need to upgrade their firmware.

You can also change how the firmware works by modifying it, so that it suits your needs.

 
Posted : 13/09/2024 1:05 pm
(@insideoutgrp)
Posts: 9
Active Member
Topic starter
 

Okay; can you help us better troubleshoot this issue then?

Like I said, these systems seem to shut down correctly according to the logs (see below), but every 10 to 30 days or so one will just not come back online. When we physically visit the device, the Pi LED is red (so it has power) but the Pi has shut down. The Witty has not cut the power, and the only way to get it out of this situation is to physically disconnect and reconnect the power.

The device then comes online fine (obviously showing a reconnection of power in the logs) and works fine again across multiple daily shutdown/startup cycles, before doing exactly the same thing again for no apparent reason.

The system is running firmware revision 7 and also had the guaranteed wake mode enabled (set to 12 hours), so even if the schedule had not run correctly, it should have turned on after 12 hours. It's as if it gets stuck in a state where, because it doesn't cut the power, it never boots the device properly and never sees pin 17 signal the boot.

[2024-09-01 07:44:59] Shutting down system because scheduled shutdown is due.
[2024-09-01 07:44:59] Halting all processes and then shutdown Raspberry Pi...
[xxxx-xx-xx xx:xx:xx] Witty Pi daemon (v4.21) is started.
[xxxx-xx-xx xx:xx:xx] System: Raspbian GNU/Linux 10 (buster), Kernel: Linux 5.10.17-v7+, Architecture: armhf
[xxxx-xx-xx xx:xx:xx] Running on Raspberry Pi 3 Model B Plus Rev 1.3
[xxxx-xx-xx xx:xx:xx] RTC offset register has value 0x00
[xxxx-xx-xx xx:xx:xx] Seems RTC has good time, write RTC time into system
[xxxx-xx-xx xx:xx:xx] Writing RTC time to system...
[2024-09-05 12:00:43] Done 🙂
[2024-09-05 12:00:43] Firmware ID: 0x26
[2024-09-05 12:00:43] Firmware Revison: 0x07
[2024-09-05 12:00:43] Current Vin=12.51V, Vout=5.05V, Iout=0.64A
[2024-09-05 12:00:43] System starts up because power supply is newly connected.
[2024-09-05 12:00:48] Send out the SYS_UP signal via GPIO-17 pin.
[2024-09-05 12:00:49] Pending for incoming shutdown command...
[2024-09-05 12:00:49] Schedule next shutdown at: 2024-09-06 07:45:00
[2024-09-05 12:00:49] Schedule next startup at: 2024-09-06 08:00:00

Could the guaranteed wake mode be amended so that before a 'guaranteed wake' event it calls CutPower and then powers back up again?
Thanks
Jay

 
Posted : 13/09/2024 4:39 pm
(@admin)
Posts: 479
Member Admin
 

The "guaranteed wake mode" cannot help you with this. The logic of this mode is implemented in the sleep() function in the firmware, but your Pi was still powered and the firmware was not in sleep mode. A device in that state cannot trigger the guaranteed wake, because it is not asleep yet.

We have not seen a similar issue since moving to firmware revision 3 or above.

It will be difficult for you to troubleshoot, because it only happens once every 10 days or more.

You may consider modifying the firmware: remove the logic that you do not need, and add some code to make sure the power will be cut.

A proper power cut should contain these steps:

systemIsUp = false;
cutPower();
sleep();

The scheduled shutdown is implemented in the processAlarmIfNeeded() function.

It is still recommended to have some delay before cutting the power, but you don't have to rely on timer1.

 
Posted : 16/09/2024 9:03 am
(@insideoutgrp)
Posts: 9
Active Member
Topic starter
 

I've looked through that function. Would this modification be best placed after the line below, replacing the emulateButtonClick() call? That way we can at least keep the register tracking of the reason for shutdown etc.

else if (canTrigger && !alarm2HasTriggered && overdue_alarm2 >= 0 && overdue_alarm2 < 2) { // Alarm 2: shutdown

 

Can you think of any other reason this might be happening that we could troubleshoot without physically modifying the firmware?

We had another one today. We visited to cut the power, and it came straight back online with the logs below.

To check that I understand the logic correctly: assuming the firmware is newer than revision 3 (these all are), on shutdown it no longer waits for TXD to go low, it just waits for the shutdown delay timer and then cuts power? Is there anywhere else it could be getting stuck?

If we set the shutdown timer to 0 seconds, would that help, or could it cause other issues?

 

[2024-09-10 20:00:00] Shutting down system because scheduled shutdown is due.
[2024-09-10 20:00:00] Halting all processes and then shutdown Raspberry Pi...
[xxxx-xx-xx xx:xx:xx] Witty Pi daemon (v4.21) is started.
[xxxx-xx-xx xx:xx:xx] System: Raspbian GNU/Linux 11 (bullseye), Kernel: Linux 6.1.21-v7+, Architecture: armhf
[xxxx-xx-xx xx:xx:xx] Running on Raspberry Pi 3 Model B Plus Rev 1.4
[xxxx-xx-xx xx:xx:xx] RTC offset register has value 0x00
[xxxx-xx-xx xx:xx:xx] Seems RTC has good time, write RTC time into system
[xxxx-xx-xx xx:xx:xx] Writing RTC time to system...
[2024-09-16 11:24:04] Done 🙂
[2024-09-16 11:24:04] Firmware ID: 0x26
[2024-09-16 11:24:04] Firmware Revison: 0x07
[2024-09-16 11:24:04] Current Vin=13.56V, Vout=5V, Iout=0.79A
[2024-09-16 11:24:04] System starts up because power supply is newly connected.
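
One way to gauge how often this pattern occurs across many devices is to scan each Witty Pi log for a scheduled shutdown whose next recorded startup reason is "power supply is newly connected". The sketch below assumes the log format shown above; the function name find_stuck_shutdowns is hypothetical, and the log file path is whatever your installation uses. It is only a heuristic, since a genuine mains outage while the Pi was off would produce the same pair of lines.

```shell
# Hypothetical helper: list scheduled shutdowns that were followed by a
# manual power reconnection instead of a normal startup.
# Reads the log files given as arguments, or stdin if none are given.
find_stuck_shutdowns() {
  awk '
    # Remember the timestamp of the most recent scheduled shutdown.
    /Shutting down system because scheduled shutdown is due\./ {
      pending = substr($0, 2, 19)   # text between "[" and "]"
      next
    }
    # The next startup reason decides whether that shutdown looks suspicious.
    /System starts up because/ {
      if (pending != "" && /power supply is newly connected/) {
        printf "shutdown %s -> power reconnected %s\n", pending, substr($0, 2, 19)
      }
      pending = ""
    }
  ' "$@"
}
```

Called with a copy of a device's log as its argument, it prints one line per suspicious shutdown/reconnect pair, which makes it easier to compare devices or spot a common time of day.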

 
Posted : 16/09/2024 11:32 pm
(@admin)
Posts: 479
Member Admin
 

@insideoutgrp I don't have a specific cause in mind. If we could imagine how it gets stuck, we would already have reproduced the issue in the lab and fixed it.

As I mentioned, the scheduled shutdown is exactly defined in this code block:
https://github.com/uugear/Witty-Pi-4/blob/main/Firmware/WittyPi4/WittyPi4.ino#L890-L898

I can't tell you a better way to modify it. We think we are already using the best way. We implemented this firmware with many functionalities; if you don't need all of them, it is indeed possible to simplify the code and make the logic you need more straightforward. Also, if you want to add some code to debug the firmware, it is almost impossible without (temporarily) removing some functionalities, because the current firmware already uses up the 8kB program space in the MCU.

If you remove the emulateButtonClick() function call, the software (daemon.sh) will not know the firmware is trying to shut down the Pi, and no "sudo shutdown -h now" command will be executed. In that case, no matter when you call the cutPower() function, it will not be a graceful shutdown. That is equivalent to unplugging the power supply directly. It is not healthy for your Pi, and it could even damage the SD card in use.

 

 

 

 
Posted : 17/09/2024 1:38 pm
(@admin)
Posts: 479
Member Admin
 

@insideoutgrp I want to add that if for some reason the SYS_UP signal does not reach the MCU, the firmware will not be in the correct state to handle the incoming shutdown event, and a similar issue might happen.

The SYS_UP signal is sent from the Raspberry Pi to the MCU via GPIO-17, and it is sent automatically by daemon.sh during boot.

If you look into daemon.sh, at lines 182-192, you can see how it is done:

# indicates system is up
log "Send out the SYS_UP signal via GPIO-$SYSUP_PIN pin."
gpio -g mode $SYSUP_PIN out
gpio -g write $SYSUP_PIN 1
sleep 0.1
gpio -g write $SYSUP_PIN 0
sleep 0.1
gpio -g write $SYSUP_PIN 1
sleep 0.1
gpio -g write $SYSUP_PIN 0
sleep 0.1
gpio -g mode $SYSUP_PIN in

In older versions of the software, GPIO-17 was toggled once to send one SYS_UP signal; now it is toggled twice, so two SYS_UP signals are sent in a row. We made this modification because we realized the importance of the SYS_UP signal and want to make sure it is received by the MCU (firmware).

You may want to make the SYS_UP signal even easier to receive by increasing the sleep time (e.g. from 0.1 to 0.3) between setting GPIO-17 high and low, or even by sending one more SYS_UP signal to be sure.
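
As a rough sketch of that suggestion, the pulse sequence could be wrapped in a small function whose pulse count and width are parameters. The function name send_sys_up, its arguments and the defaults of 3 pulses and 0.3 s are assumptions for illustration and are not part of the stock daemon.sh; the gpio commands are the same ones used in the snippet above.

```shell
# Hypothetical replacement for the SYS_UP block quoted from daemon.sh.
# The body runs in a subshell ( ... ) so its variables do not leak
# into the calling script.
send_sys_up() (
  pin="${1:-17}"      # BCM pin number (GPIO-17 in this thread)
  pulses="${2:-3}"    # number of SYS_UP pulses (the stock script sends 2)
  width="${3:-0.3}"   # seconds to hold each level (the stock script uses 0.1)

  gpio -g mode "$pin" out
  i=0
  while [ "$i" -lt "$pulses" ]; do
    gpio -g write "$pin" 1
    sleep "$width"
    gpio -g write "$pin" 0
    sleep "$width"
    i=$((i + 1))
  done
  # Release the pin again, as the original block does.
  gpio -g mode "$pin" in
)
```

In daemon.sh this would be called in place of the existing block, for example as send_sys_up "$SYSUP_PIN" 3 0.3. Whether longer or additional pulses actually cure the stuck state is untested here.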

When the SYS_UP signal is received by the firmware, the white LED blinks quickly. With the current software you will see it blink quickly twice during boot.

You may also want to double check the GPIO-17 connection.

 

 
Posted : 18/09/2024 8:29 am
(@insideoutgrp)
Posts: 9
Active Member
Topic starter
 

If the SYS_UP signal isn't received, would there be a way to tell from the logs etc.? The Witty otherwise seems to be communicating okay.

Would one way of proving it be to remove the shutdown -h now command from utilities.sh, so that unless the MCU cuts the power the system would remain on?

 
Posted : 18/09/2024 9:49 am
(@admin)
Posts: 479
Member Admin
 

I don't think that could prove it. 

Removing the shutdown command from the software only makes graceful shutdown impossible.

It may be possible to keep a bit in a register to indicate that SYS_UP has been received. However, I am not sure whether we still have the program space for this logic.

 
Posted : 18/09/2024 4:59 pm