Cartell System Status

System                      Status    Comments
Consumer Services           Normal
Dealer Services             Normal
XML/SOAP/VRM                Normal
Carstat/CIG                 Normal
COO/CPG Valuations          Normal
HPI UK Data/NMR/VRM         Normal
ICB/HPI(Ireland)            Normal
SMS/Email                   Normal
SFTP Data                   Normal

 

Weekly Maintenance – HPI UK

HPI perform their weekly data updates on Sunday mornings from 00:00 to about 07:00; however, systems are usually fully updated and functional by 02:00. During the update period, access to UK data is restricted and vehicle data may not be available.
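
For integrators who want to anticipate the restricted period, the window described above can be sketched as a simple check. This is only an illustration: the function name `in_hpi_update_window` is hypothetical, and because the notice does not state a time zone, the sketch assumes the timestamp passed in is already expressed in the zone the notice uses.

```python
from datetime import datetime, time

# Window taken from the maintenance notice. The notice gives no time zone,
# so `moment` is assumed to be in whatever zone the notice refers to.
WINDOW_START = time(0, 0)
WINDOW_END = time(7, 0)
SUNDAY = 6  # datetime.weekday(): Monday is 0, Sunday is 6


def in_hpi_update_window(moment: datetime) -> bool:
    """Return True if `moment` falls inside the Sunday 00:00-07:00 window."""
    return moment.weekday() == SUNDAY and WINDOW_START <= moment.time() < WINDOW_END
```

A caller could use such a check to delay or retry UK lookups, bearing in mind that the notice says systems are usually functional again well before the window ends.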

2017-06-02 System Outage

Incident Description

On Friday 2nd June 2017, between 12:42 and 12:46, most Cartell Data Services (excluding the Website) were unreachable. The IT team were alerted to the issue two minutes into the outage, and the system was up and running again within a few minutes. Apologies to all who were affected.

Analysis of Outage

Following extensive investigation, we have identified the root cause of this issue and confirmed it by replicating the failure in QA. The downtime was caused by an unscheduled table-locking backup being run on a new production database during a busy period. As the table was relatively large, table locking during the backup caused server threads to hang while waiting to insert new data. Once all threads were exhausted, each app server refused new connections. HA systems automatically came into play; however, due to a misconfiguration at a load balancer, the locked database remained a single point of failure.
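
The failure mode described above (inserts blocking on a locked table until every worker thread is occupied) can be illustrated with a small simulation. This is a toy model, not Cartell's actual stack: the `AppServer` class, the pool size of 4, and the use of an in-process lock to stand in for the database table lock are all assumptions made for the example.

```python
import threading


class AppServer:
    """Toy app server with a fixed number of worker threads (hypothetical)."""

    def __init__(self, max_threads: int, table_lock: threading.Lock):
        self._slots = threading.Semaphore(max_threads)
        self._table_lock = table_lock  # stands in for the database table lock
        self._workers = []

    def handle_request(self) -> bool:
        """Accept the request if a worker is free, otherwise refuse it."""
        if not self._slots.acquire(blocking=False):
            return False  # every worker is busy: connection refused
        worker = threading.Thread(target=self._insert_row)
        worker.start()
        self._workers.append(worker)
        return True

    def _insert_row(self) -> None:
        try:
            with self._table_lock:  # blocks while the backup holds the lock
                pass  # the insert itself would happen here
        finally:
            self._slots.release()  # free the worker slot again

    def drain(self) -> None:
        """Wait for all in-flight requests to finish."""
        for worker in self._workers:
            worker.join()
        self._workers.clear()


def simulate(max_threads: int = 4, requests: int = 6):
    table_lock = threading.Lock()
    server = AppServer(max_threads, table_lock)
    table_lock.acquire()  # the backup starts and locks the table
    during = [server.handle_request() for _ in range(requests)]
    table_lock.release()  # the backup finishes
    server.drain()
    after = server.handle_request()  # a request arriving after recovery
    server.drain()
    return during, after
```

In this model the first four requests are accepted but hang on the lock, the next two are refused, and requests are accepted again once the lock is released, which mirrors the sequence in the analysis.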

Remedial Action

  1. Avoid running backups on production systems during busy periods.
  2. Avoid backup scripts that lock tables unnecessarily.
  3. The load balancer configuration has been corrected.
  4. All load balancer configurations have been rechecked for consistency.

2017 QoS (YTD 2017-06-07)

Uptime percentage (excluding scheduled maintenance)    99.995+%
Uptime percentage (including scheduled maintenance)    99.995+%
Scheduled downtime                                     0 hrs
Unscheduled downtime                                   4 min
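
As a sanity check on the figures above, the uptime percentage can be recomputed from the reported downtime. The sketch below assumes the year-to-date period runs from the start of 1 January 2017 to the start of 7 June 2017; the page does not state the exact boundaries, and the function name `uptime_percent` is hypothetical.

```python
from datetime import date


def uptime_percent(period_start: date, period_end: date, downtime_minutes: float) -> float:
    """Percentage of the period during which the service was up."""
    total_minutes = (period_end - period_start).days * 24 * 60
    return 100.0 * (1 - downtime_minutes / total_minutes)


# Assumed period boundaries; 4 minutes is the reported unscheduled downtime.
ytd = uptime_percent(date(2017, 1, 1), date(2017, 6, 7), 4)
```

Under those assumptions the result is roughly 99.998%, which is consistent with the reported 99.995+%.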