I have the following Unifi networking setup:
- Unifi Secure Gateway (USG-3P) running 18.104.22.16877096 firmware
- 3 Unifi Access Points AP-AC-PRO running 22.214.171.12461 firmware
- 3 Unifi AP-AC-M running 126.96.36.19961 firmware
- Unifi Controller (6.2.25) running on Ubuntu
- The APs are connected to Unifi Switches running 188.8.131.5224 firmware
The APs are separated into two AP Groups. One AP Group contains the 3 AP-AC-PRO and one of the AP-AC-M; this was the problematic group. The other AP Group, which contains the remaining 2 AP-AC-M, continued to work flawlessly throughout the incident. From here on, when I reference an AP Group, I mean the problematic group that contains the 3 AP-AC-PRO and 1 AP-AC-M.
It all started yesterday when I noticed that my WiFi was a bit slow in the backyard and I wanted to change the radio configuration on one of the AP-AC-PRO and one AP-AC-M in the same AP Group. After provisioning, the controller showed the AP as disconnected.
I used ssh to log into the problematic AP-AC-PRO and discovered in /var/log/messages that there were many instances of the following log entry:
syswrapper: [state is locked] waiting for lock
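To get a feel for how often the device is stuck waiting, the log can be copied off the AP and scanned. This is a minimal sketch in Python: the pattern is assumed from the single log line quoted above, and the helper name count_lock_waits is made up for illustration.

```python
import re

# Pattern assumed from the one log line quoted in this post. Real entries
# usually carry a timestamp/host prefix, which search() tolerates.
LOCK_PATTERN = re.compile(r"syswrapper: \[state is locked\] waiting for lock")


def count_lock_waits(log_lines):
    """Return how many lines contain the 'waiting for lock' message."""
    return sum(1 for line in log_lines if LOCK_PATTERN.search(line))


if __name__ == "__main__":
    # Illustrative input only; in practice, read a copy of /var/log/messages.
    sample = [
        "syswrapper: [state is locked] waiting for lock",
        "some unrelated entry",
        "syswrapper: [state is locked] waiting for lock",
    ]
    print(count_lock_waits(sample))
```

A count that keeps growing between two copies of the log suggests the lock is never released, which matches what I saw.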
I attempted to reboot the device but the device remained in the same “locked” state.
Since it is inconvenient for me to physically reset the AP, I attempted to reset the device via ssh using the restore-default command.
Unfortunately, this did not always work because the device often immediately showed the same locking message:
syswrapper: [state is locked] waiting for lock
I had to reboot the device from the command line with reboot. As soon as I could ssh into the device after the reboot, I immediately executed the restore command above. This took a lot of trial and error because my timing was often off. When I missed the window, I got the lock message again. I found that my chances were higher if I first forgot the device on the controller.
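The timing problem could in principle be scripted from another machine instead of done by hand. The following is only a sketch under several assumptions: the helper names are made up, key-based ssh login to the AP is assumed so no password prompt blocks the command, and the remote command string is passed in by the caller because this post only names it as restore-default. An open port 22 does not guarantee the command beats the lock, so this may still need several attempts.

```python
import socket
import subprocess
import time


def port_is_open(host, port=22, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_ssh(host, probe=port_is_open, interval=0.5, max_wait=300.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll until probe(host) is True; return False if max_wait elapses."""
    deadline = clock() + max_wait
    while clock() < deadline:
        if probe(host):
            return True
        sleep(interval)
    return False


def restore_command(host, user, remote_cmd):
    """Build the ssh argument list; remote_cmd is supplied by the caller."""
    return ["ssh", f"{user}@{host}", remote_cmd]


def restore_after_reboot(host, user, remote_cmd):
    """Run remote_cmd as soon as the rebooted AP accepts SSH connections."""
    if not wait_for_ssh(host):
        raise TimeoutError(f"{host} did not come back within the wait window")
    return subprocess.run(restore_command(host, user, remote_cmd), check=False)
```

The idea is to issue reboot on the AP, then call restore_after_reboot with the AP's address, the ssh user, and the same restore command I typed manually.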
A quick suggestion to the Unifi team: it would be nice if the restore-default command, when it cannot restore immediately because of the lock, at least set a flag in the device's persistent flash memory so that the restore is performed on the next reboot. This feature would save me A LOT OF TIME!
Once the device was reset to factory default, I proceeded to reconfigure it for the WiFi networks that I had. Unfortunately, when I tried to provision the changes (adding the re-adopted device back into its original AP Group that is associated with the WiFi networks), it went into the disconnected state again. To make matters worse, the other APs in the same AP Group started to misbehave. Some would go into a provisioning state followed by a disconnected state, while others went into an adopting state. This was of course very unnerving and frustrating. However, this observation led me to remember a previous episode that I experienced a few weeks ago.
When I updated the controller to 6.2 and upgraded the APs' firmware, the AP exhibited a similar locking issue. The solution that I employed was to restore to factory default and re-adopt the device. However, after re-adopting, I assigned the AP to a brand new AP Group which I associated with the original WiFi networks. Simply adding the device to the original AP Group did not solve the issue.
When I tried this solution yesterday, it did not work. The device continued to go into a disconnected state immediately after provisioning when I added it to the new AP Group. After many hours and much experimentation, I decided to erase all the WiFi networks and the problematic AP Group. I recreated the WiFi networks, created a new AP Group, and proceeded to add each AP one by one (after a reset to factory default). In summary, here are the final steps that got me out of this pickle:
- Forget all APs in the affected AP Group.
- Remove all WiFi networks from the AP Group.
- Delete the AP Group.
- Delete all the WiFi networks that were associated with the above AP Group.
- Re-create all the WiFi networks and associate them with a brand new AP Group.
- For each AP, use ssh to reset it to factory default, adopt it, and add it to the AP Group using the controller web UI, one AP at a time.
- Since there were four APs (3 AC-Pro and 1 AC-M), I waited until each AP was fully connected and could service WiFi clients before continuing with the next one.
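The key point of the last two steps is the strict ordering: never touch the next AP until the current one is healthy. As a rough sketch of that control flow only, the loop below uses two hypothetical callables, reset_and_adopt and is_serving_clients, which stand in for the manual ssh and web UI work; they are not Unifi APIs.

```python
import time


def readd_one_at_a_time(aps, reset_and_adopt, is_serving_clients,
                        poll_interval=10.0, max_wait=600.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Process APs strictly in order, waiting for each before the next.

    reset_and_adopt and is_serving_clients are hypothetical callables the
    caller must supply. Returns the list of APs that completed.
    """
    done = []
    for ap in aps:
        reset_and_adopt(ap)
        deadline = clock() + max_wait
        while not is_serving_clients(ap):
            if clock() >= deadline:
                # Stop rather than touch the next AP while this one is stuck.
                return done
            sleep(poll_interval)
        done.append(ap)
    return done
```

Stopping at the first AP that fails to come up mirrors what I did by hand: a stuck AP was a signal to investigate before provisioning anything else in the group.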
I am documenting this so that I can share it with Unifi support. This has happened twice now, and each time I spent multiple hours trying to get my WiFi network working. In these pandemic times, WiFi is as important as electricity and plumbing. Since Tier-1 support was unable to resolve this issue, waiting for Tier-2 support (around 24 hours) is a bit “hard to swallow”.
Anyway, I am glad that I was able to resolve this and bring my WiFi networks back up and running with the 4 affected APs, FOR NOW. However, I must admit that these two episodes have made me apprehensive about making configuration changes to these APs, since the next provision might cost many more lost hours.
I hope the Unifi team can use this information to see whether there is an issue relating to AP Group provisioning, since this seems to have triggered the issue in both cases.