Ask HN: Software to track errors and group them into root causes?

I am working on a team which maintains an internal batch processing system. To keep service quality high, we centrally record all failures/errors, look at every one of them, and assign them to root cause tickets. A frequent failure will get fixed ASAP; one of those once-per-week sporadic failures will get prioritized and put in the next sprint. Sometimes a service breaks and there are dozens of failures (usually binned to one root cause ticket), but most of the time it is less than one failure per day.

Unfortunately, we have no good way to manage the failures -- we are currently using custom scripts + JIRA and it does not work very well. We are happy to pay for an external service, but I simply cannot find anything!

Things like Datadog or Sentry deal in statistics and error groups... but we want to look at every failure to make sure nothing slips through the cracks. JIRA is too slow and limited. We even tried Google Sheets, but it does not scale.

Does anyone have a similar problem: tracking each individual failure, not just aggregates/counters? What do you use?

1 points | by theamk 8 hours ago