• #186 Business Continuity lessons learnt from CrowdStrike

  • Aug 13 2024
  • Duration: 37 m
  • Podcast


  • Summary

  • In July 2024, a logic error in an update to CrowdStrike’s Falcon software caused 8.5 million Windows computers to crash. Although a fix was pushed out shortly afterwards, the nature of the error meant that a full recovery of all affected machines took weeks to complete. Many businesses were caught up in the disruption, whether the outage hit them directly or by proxy through affected suppliers. So, what can businesses learn from this? Today, Ian Battersby and Steve Mason discuss the aftermath of the CrowdStrike crash, the importance of good business continuity and the actions all businesses should take to ensure they are prepared in the event of an IT incident.

    You’ll learn
    · What happened following the CrowdStrike crash?
    · How long did it take businesses to recover?
    · Which ISO management system standards would this impact?
    · How can you use your Management System to address the effects of an IT incident?
    · How would this change your understanding of the needs and expectations of interested parties?
    · How do risk assessments factor in where IT incidents are concerned?

    Resources
    · Isologyhub
    · ISO 22301 Business Continuity

    In this episode, we talk about:

    [00:30] Join the isologyhub – To get access to a suite of ISO-related tools, training and templates, simply head on over to isologyhub.com to either sign up or book a demo.

    [02:05] Episode summary: Ian Battersby is joined by Steve Mason to discuss the recent CrowdStrike crash, the implications for your Management System and the business continuity lessons learned that you can apply ahead of any potential future incidents.

    [03:00] What happened following the CrowdStrike crash? – In short, an update to CrowdStrike’s Falcon software brought down computer systems globally. 8.5 million Windows systems – in reality less than 1% of all Windows systems – were affected as a result of this error. Even so, the damage was felt across key pillars of our societal infrastructure, with hospitals and transport such as trains and airlines among the worst affected.

    [04:45] How long did it take CrowdStrike to issue a fix? – CrowdStrike issued a fix in about 30 minutes, but this didn’t mean that affected computers were automatically repaired. In many cases applying the fix meant that engineers had to travel to many different sites, which is both time-consuming and costly. Microsoft said that some computers might need as many as 15 reboots to clear the problem. So a fix that many hoped would solve the issue ended up taking a few weeks to fully resolve, as not everyone has IT or tech support in the field to carry out a manual reboot. A lot of businesses were caught out because they don’t factor this into their recovery time, some assuming that an issue like this is guaranteed to be fixed within 48 hours, which is not something you can promise. You need to be realistic when filling out a Business Impact Assessment (BIA).

    [07:55] How do you know in advance if an outage will need physical intervention to resolve? – There is a lesson to be learned from this most recent issue. You need to take a look at your current business continuity plans and ask yourself:
    · What systems do you use?
    · How reliable are the third-party applications that you use?
    · If an issue like this were to recur, how would it affect us?
    · Do we have the necessary resources to fix it, e.g. staff on site if needed?
    Third parties will have a lot of clients, and some may even prioritise those on a more premium package, so you can’t always count on them for a quick fix.

    [09:10] How does this impact our businesses in terms of our management standards? – When we begin to analyse how this has impacted our management systems, we can’t afford to say ‘We don’t use CrowdStrike, therefore it did not impact us’ – it may have impacted your suppliers or your customers. Even if there was zero impact, lessons can be learned from this event by all companies. Standards directly affected by the outage were:
    · ISO 22301 – Business Continuity: recovery times (RPO and RTO); BIA; risk assessments
    · ISO 27001 – Information Security: risk assessment; likelihood; severity; BCP; ICT readiness
    · ISO 20000-1 – IT Service Management: risk assessment of service delivery; service continuity; service availability
    Remember, our management systems should reflect reality and not aspiration.

    [11:30] How do we use our Management Systems to navigate a path of corrective action and continual improvement? – First and foremost, an event like this must be raised as an Incident – in this case it would no doubt have been a Major Incident for some companies. This incident will typically be recorded in the company’s system for capturing non-conformities or ...
