Microsoft confirmed that the Azure IoT disconnect incident began on 12/19 because of a faulty backend node. When the node was reset on 12/20, devices disconnected and then quickly tried to reconnect to Azure's Device Provisioning Service (DPS). The resulting connection rate triggered DPS message throttling, which prevented the devices from attaching to the Poly Lens service.
To mitigate the issue, Microsoft increased our DPS message quota on 12/21 in the hope that it would allow our devices to reconnect. Initially, we observed an improvement in the connection rate, but it then regressed. The IoT team identified that the issue was occurring after the DPS connection, where some IoT API responses were being throttled. The throttling prevented devices from clearing their registration check: when a device checked whether it had registered successfully, the API would time out, and the device would restart the registration process from the beginning.
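The retry loop described above can be illustrated with a small simulation. This is only a sketch of the behavior as described, not the actual Poly Lens client code and not a call into any Azure SDK: the function name, the `Counters` type, and the `"ok"` / `"timeout"` outcome strings are all hypothetical. It assumes one status check per registration attempt, so every timed-out status check costs one additional registration request against the DPS quota.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Counters:
    """Hypothetical tally of the requests one device sends."""
    register_calls: int = 0
    status_calls: int = 0


def simulate_device(status_results: Iterable[str],
                    max_register_calls: int = 5) -> Tuple[bool, Counters]:
    """Simulate one device following the loop described in the incident.

    status_results yields "ok" or "timeout" for each registration status
    check, in order. Once it is exhausted, every further check is treated
    as a timeout (an assumption made for this sketch).
    """
    results = iter(status_results)
    counters = Counters()
    while counters.register_calls < max_register_calls:
        # The device submits a new registration request to DPS.
        counters.register_calls += 1
        # The device then asks whether that registration completed.
        counters.status_calls += 1
        if next(results, "timeout") == "ok":
            return True, counters
        # The status check timed out (throttled), so the device starts
        # the registration process again on the next loop iteration.
    return False, counters


if __name__ == "__main__":
    # Three throttled status checks, then one that succeeds.
    print(simulate_device(["timeout", "timeout", "timeout", "ok"]))
    # Status checks that never succeed within the attempt limit.
    print(simulate_device([]))
```

In this model, a device whose status checks keep timing out never finishes registering and keeps generating new registration requests, which is consistent with the connection rate improving only briefly after the DPS message quota alone was raised.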
Today, the IoT team reduced the DPS registration quota and increased the IoT GetOperations API quota. This allowed the full device registration operation to complete on DPS, which in turn allowed our devices to connect to the service.
Before and throughout this issue, the Poly Lens engineering team worked closely with the Microsoft IoT team to identify potential improvements in our IoT client. We’ve already implemented an improved DPS registration process, and we expect to have this available in device software releases early in 2023.