As I’m sure you’re aware, SchoolCloud experienced an outage yesterday between 4pm and 5:04pm. Yesterday’s service did not meet the high standards schools have come to expect from us, nor those we set for ourselves, especially at this difficult time for schools. We’ve put together this email to explain what went wrong, summarise the lessons learned, and let you know what we’ve done to prevent any similar incident in the future.
Last year, in November and December, we successfully handled over 40% more simultaneous video appointments than were due to take place at 4pm yesterday. Nonetheless, the poor performance of a single page caused one of our servers to hit capacity. Traffic was then redirected to the other servers, and this additional traffic brought those servers down in turn, in a domino effect. Initial attempts to bring the servers back up were thwarted because each server, as it came online, was met by the level of traffic normally shared between multiple servers in our web tier, which knocked it offline again.
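For readers who would like to see the arithmetic behind this domino effect, here is a minimal Python sketch. The server count, capacity and load figures are invented purely for illustration and are not SchoolCloud's real numbers. The function name is hypothetical, and the model assumes traffic is spread evenly across whichever servers are up.

```python
def surviving_servers(total_load, capacity, healthy):
    """Count the servers still up once load is spread evenly over the healthy ones.

    All inputs are hypothetical figures in the same unit (e.g. requests per second).
    """
    # Each time a server is overloaded it drops out, and its share of the
    # traffic moves to the remaining servers, raising the load on each of them.
    while healthy > 0 and total_load / healthy > capacity:
        healthy -= 1
    return healthy


# Four servers comfortably share the load (225 each, under the 250 limit).
normal = surviving_servers(total_load=900, capacity=250, healthy=4)

# One server is lost: the other three now face 300 each and fail in turn.
after_one_failure = surviving_servers(total_load=900, capacity=250, healthy=3)

# A single restarted server meets the full load on its own and fails again.
restart_under_full_load = surviving_servers(total_load=900, capacity=250, healthy=1)

# With the heavy traffic removed, the same restarted server stays up.
restart_with_reduced_load = surviving_servers(total_load=200, capacity=250, healthy=1)
```

In this simplified model, a restart only succeeds once the load has been reduced, which is why relieving the pressure had to come before the servers could stay online.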
The system is designed to load everything required to attend all of a teacher’s or parent’s video calls when they access their video call page. This means that the only parents or teachers affected were those who were not already on the video call page when the outage started. Those who were already on the video call page could continue to move from one appointment to the next without needing to contact SchoolCloud servers.
The page in question was the My Bookings page for teachers, which teachers are taken to after they log in. Once this page was identified as the cause, we blocked access to it in order to relieve pressure on our infrastructure. This brought the system back online at 4:36pm, albeit without the ability for teachers to log in. Meanwhile, our development team worked urgently to update the My Bookings page to reduce the computation involved. This update was made live at 5:04pm and restored the ability for teachers to log in and connect to video calls.
At SchoolCloud we firmly believe virtual parents’ evenings are the future, so in November we started the provisioning process to add significant capacity to the servers on our web tier. Regrettably, this additional capacity had not been brought online in time for yesterday. Although a single page was the culprit, we have no doubt this additional capacity would have helped. We’ve therefore accelerated the work so that it will be online by next week. Combined with the performance improvement to the teachers’ My Bookings page, we’re confident that we will not see any repeat of this issue. We will continue to monitor loads on all aspects of our infrastructure during parents’ evenings to ensure we have sufficient capacity to handle rapid growth.
We also recognise that communication is of the utmost importance in helping you stay informed during any outage, so that you can advise teachers & parents of what actions may be needed. We can do better, and are making internal changes to give additional SchoolCloud employees access to our communication platforms so that status updates can be posted more quickly and more frequently. In addition, we intend to launch a status page in the near future, hosted by an external provider so that any internal issues will not affect our ability to keep you informed.
Please accept my personal apologies for any hassle or inconvenience caused by this outage. Believe me, this is the last thing we’d want to happen. Ultimately we are here to help schools and wouldn’t be here without you. I know from all the positive comments from parents & teachers how much they’ve enjoyed using the video capability we developed in March last year to keep in touch with their child’s educational progress.
In the meantime, if your school decided to cancel its evening and wishes to reschedule it to a later date, you can easily do so by following this guide: Changing the date of an Evening.
Finally, if you wish to share a statement from us with parents, please feel free to use the one below:
“SchoolCloud provides parents’ evening software to over 5,000 schools, and experienced an outage on 14th January affecting the parents & teachers accessing video calls at that time. An update to the system in December caused a web page to perform slowly and this ultimately took the system offline. This wasn’t up to the high standard of service that schools expect of us, nor that we expect of ourselves. We have already made changes, and continue to take further steps to ensure that you can continue to enjoy this new format of parents’ evenings.”