|HPI UK Data/NMR/VRM||Normal|
Weekly Maintenance – HPI UK
HPI perform their weekly data updates on Sunday mornings from 00:00 to about 07:00; however, systems are usually fully updated and functional by 02:00. During the update period, access to UK data is restricted and vehicle data may not be available.
2017-06-02 System Outage
On Friday 2nd June 2017 between 12:42 and 12:46 most Cartell Data Services (excluding the Website) were unreachable to traffic. The IT team were alerted to the issue two minutes into the outage and the system was up and running again within a few minutes. Apologies to all who were affected.
Analysis of Outage
Following extensive research, we have identified the root cause of this issue and confirmed it by replicating it in QA. The downtime was caused by an unscheduled table-locking backup being performed on a new production database during a busy period. As the table was relatively large, the lock held during the backup caused server threads to hang while waiting to insert new data. Once all threads were exhausted, each app server refused new connections. HA systems automatically came into play; however, due to a misconfiguration at a load balancer, the locked database remained a single point of failure.
- Avoid running backups on production systems during busy periods
- Avoid backup scripts that lock tables unnecessarily
- Configuration on the load balancer has been corrected.
- All load balancer configuration has been rechecked for consistency.
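The first point above, keeping backups away from busy periods, could be enforced with a simple guard at the top of a backup script. The following is a minimal sketch only: the function name `backup_allowed` and the 01:00 to 05:00 quiet window are hypothetical, since this notice does not state which hours count as busy.

```python
from datetime import datetime, time

# Hypothetical quiet window; the real low-traffic hours are not stated in this notice.
QUIET_START = time(1, 0)
QUIET_END = time(5, 0)


def backup_allowed(now: datetime, start: time = QUIET_START, end: time = QUIET_END) -> bool:
    """Return True only when `now` falls inside the quiet window [start, end)."""
    current = now.time()
    if start <= end:
        return start <= current < end
    # The window wraps past midnight, e.g. 23:00 to 04:00.
    return current >= start or current < end


# A backup attempted at the time of the outage would be refused under this window.
outage_time = datetime(2017, 6, 2, 12, 42)
refused = not backup_allowed(outage_time)
```

A script using this guard would exit early when `backup_allowed(datetime.now())` returns `False`, so an unscheduled run during the day does nothing rather than taking a lock.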
|2017 QOS (YTD 2017-06-07)|
|Uptime percentage (excluding scheduled maintenance)||99.995+%|
|Uptime percentage (including scheduled maintenance)||99.995+%|
|Scheduled downtime||0 hrs|
|Unscheduled downtime||4 min|
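As a rough check of the figures above, the uptime percentage can be recomputed from the downtime. This sketch assumes the year-to-date period runs from 1 January 2017 through 7 June 2017 inclusive; the exact period boundaries are an assumption, and the helper name `uptime_percent` is illustrative.

```python
from datetime import date


def uptime_percent(start: date, end: date, downtime_minutes: float) -> float:
    """Uptime as a percentage of all minutes from `start` to `end`, both days inclusive."""
    total_minutes = ((end - start).days + 1) * 24 * 60
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


# 4 minutes of unscheduled downtime, 0 hours of scheduled downtime.
pct = uptime_percent(date(2017, 1, 1), date(2017, 6, 7), 4)
print(f"{pct:.4f}%")
```

Under these assumptions the result is roughly 99.998%, which is consistent with the reported 99.995+%. With no scheduled downtime recorded, both uptime rows come out the same.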