
SignalPrint, Rules APIs briefly unavailable

Nov 07 at 12:13pm CST
Affected services
Verosint Application Frontend
SignalPrint API
Rules API

Resolved
Nov 07 at 12:13pm CST

Incident Overview

At 10:19:06 am CST, a node in Verosint's API backend cluster failed, briefly taking down an internal middleware service used by the SignalPrint and Rules APIs. During the incident, requests to the https://api.verosint.com/v1/signalprint and https://api.verosint.com/v1/rules endpoints returned an HTTP 401 error indicating an invalid API token. The issue was detected and remediated automatically, with full recovery of all services by 10:22:16 am CST.

Impact

Requests sent to either API during the incident failed. Log streams sent via Verosint's Auth0 integration also failed initially but succeeded on subsequent automatic retries once service had recovered, with no data loss.
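For callers that want similar resilience to brief outages, the following is a minimal Python sketch of retrying with exponential backoff. It is an illustration only and does not describe how the Auth0 integration retries internally. The function name `send_with_retry`, its parameters, and the `send` callable (which stands in for whatever code issues the request and returns an HTTP status code) are all hypothetical.

```python
import time
from typing import Callable


def send_with_retry(
    send: Callable[[], int],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a 2xx status or attempts run out.

    `send` is a hypothetical callable that performs one request and
    returns its HTTP status code. The last status observed is returned.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    status = 0
    for attempt in range(max_attempts):
        status = send()
        if 200 <= status < 300:
            return status
        # Back off before the next try: base_delay, 2x, 4x, ...
        # No sleep after the final attempt.
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return status
```

Capping the number of attempts matters here: a 401 normally means the token really is invalid, so a caller should give up and surface the error rather than retry indefinitely.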

Response and Root Cause

Normally, a node loss should be a non-event, because Verosint's engineering standards require all services to be highly available across multiple availability zones. In this case, however, the middleware service was not configured to meet that standard: it ran as a single instance, so when the cluster node failed, it took that lone instance down with it. Fortunately, the cluster detected and remediated the failure 2 minutes after it occurred, with full recovery validated within 1 minute of redeployment.

Verosint engineering has corrected the misconfiguration and is taking additional steps to audit and test every platform service for improper configurations.
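As an illustration of what such an audit could check, the Python sketch below flags any service that runs fewer than two instances or whose instances all sit in one availability zone. It is not Verosint's actual tooling. The `Instance` type, the function name, and the service and zone names in the usage example are hypothetical, and a real audit would read the inventory from the cluster rather than from a hard-coded list.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Instance:
    """One running copy of a service (hypothetical inventory record)."""
    service: str
    zone: str


def find_single_points_of_failure(
    instances: Iterable[Instance],
    min_instances: int = 2,
    min_zones: int = 2,
) -> Dict[str, List[str]]:
    """Return a mapping of service name to the redundancy rules it breaks."""
    zones_by_service: Dict[str, List[str]] = {}
    for inst in instances:
        zones_by_service.setdefault(inst.service, []).append(inst.zone)

    findings: Dict[str, List[str]] = {}
    for service, zones in zones_by_service.items():
        problems = []
        if len(zones) < min_instances:
            problems.append(f"only {len(zones)} instance(s)")
        if len(set(zones)) < min_zones:
            problems.append(f"only {len(set(zones))} zone(s)")
        if problems:
            findings[service] = problems
    return findings


# Usage with made-up inventory data.
inventory = [
    Instance("middleware", "zone-a"),
    Instance("signalprint-api", "zone-a"),
    Instance("signalprint-api", "zone-b"),
    Instance("rules-api", "zone-a"),
    Instance("rules-api", "zone-a"),
]
for name, problems in find_single_points_of_failure(inventory).items():
    print(f"{name}: {', '.join(problems)}")
```

Checking zone spread separately from instance count catches the case where a service has several replicas that would still all be lost to a single zone failure.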