Getting Ahead of Certificate-Related Outages With Automation and Visibility


This article was originally published by Security Magazine on August 18, 2022.

As organizations seek to transform their operations, their digital footprint continues to expand. As they shift from traditional information technology (IT) to more dynamic workloads in the cloud and at the edge, the number of machines is growing at an exponential rate. Today, the machines that need identities include everything from connected Internet of Things (IoT) and mobile devices to software-defined applications, cloud workloads, virtual machines, containers and even the code running on them.

With the proliferation of devices, today’s enterprises face increasing security and compliance challenges as they struggle to manage their growing machine identity landscape. This includes protecting their machine identities, cryptographic keys and digital certificates.

The State of Machine Identity Management report from Keyfactor examined the role of public key infrastructure (PKI) and machine identities in securing modern enterprises. Based on a survey of more than 1,200 IT security professionals worldwide, the report identified the risks and challenges organizations face as the role of PKI continues to evolve. Among the most difficult to manage is the growing frequency and severity of certificate-related outages.

The growing certificate outage problem

As the number of machine identities within an enterprise grows, so does the number of associated digital certificates. This is because each machine needs a certificate to authenticate itself and establish trusted, encrypted connections with other devices, users and workloads across the business. Among the most costly consequences of ineffective certificate lifecycle management are outages that interrupt business operations. For example, if left unmanaged, certificates can expire unexpectedly, causing critical applications or services to stop working.
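To make the expiry failure mode concrete, here is a minimal Python sketch, using only the standard library, of how a monitoring job might check how many days remain on a server's certificate. The function names `fetch_not_after` and `days_until_expiry` are hypothetical and do not describe how any particular product performs the check. Note that with default verification, a server whose certificate has already expired fails the TLS handshake with an `ssl.SSLCertVerificationError`, which is itself the outage signal.

```python
import socket
import ssl
import time
from typing import Optional

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Return the days remaining before a certificate's notAfter time.

    `not_after` uses the format that ssl.SSLSocket.getpeercert() returns,
    e.g. "Jan  5 09:34:43 2018 GMT". A negative result means the
    certificate has already expired.
    """
    expires_at = ssl.cert_time_to_seconds(not_after)  # seconds since the epoch (UTC)
    current = time.time() if now is None else now
    return (expires_at - current) / SECONDS_PER_DAY


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Open a verified TLS connection and return the server certificate's notAfter."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    return cert["notAfter"]
```

A scheduled job might call `days_until_expiry(fetch_not_after("www.example.com"))` and raise an alert when the result drops below, say, 30 days; that threshold and the hostname are arbitrary examples.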

Looking at the survey findings, a large majority of respondents (81%) reported that their organization had experienced two or more certificate-related outages in the previous 24 months, up from 77% in 2021. For 67% of teams, the time to recovery (TTR) from a certificate-related outage averaged three or more hours. That includes initial detection, locating the expired certificate, issuing a new certificate, replacing the expired certificate and restarting services. For 38% of respondents, recovery took more than four hours.

Most enterprises rely on a patchwork of spreadsheets and internal PKI interfaces to manage digital certificates. Without proper visibility into certificates and their locations, it can take teams hours to remediate certificate-related outages. Regardless of size or industry, an overwhelming majority of companies do not know how many keys and certificates they have, who they belong to, what policies they comply with or when they expire.
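The visibility gap described above largely comes down to missing fields. A minimal Python sketch of an inventory that records each certificate's location, owner and expiry, and answers the two questions a spreadsheet often cannot (what expires soon, and who is responsible), might look like this. The `CertRecord` type, its fields and the sample hostnames are hypothetical placeholders.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class CertRecord:
    # Hypothetical inventory fields for illustration only.
    subject: str
    location: str            # host, load balancer, keystore path, etc.
    owner: Optional[str]     # responsible team or person; None if unknown
    not_after: datetime      # expiry time, timezone-aware (UTC)


def expiring_within(inventory: List[CertRecord], days: int, now: datetime) -> List[CertRecord]:
    """Certificates expiring within `days` of `now` (already expired ones included), soonest first."""
    cutoff = now + timedelta(days=days)
    return sorted((c for c in inventory if c.not_after <= cutoff), key=lambda c: c.not_after)


def unowned(inventory: List[CertRecord]) -> List[CertRecord]:
    """Certificates with no recorded owner to notify when renewal is due."""
    return [c for c in inventory if not c.owner]


# Example inventory with placeholder data.
now = datetime(2022, 8, 18, tzinfo=timezone.utc)
inventory = [
    CertRecord("www.example.com", "lb-01", "web-team", now + timedelta(days=12)),
    CertRecord("api.example.com", "k8s/ingress", None, now + timedelta(days=200)),
    CertRecord("vpn.example.com", "vpn-gw", "netops", now - timedelta(days=1)),
]
print([c.subject for c in expiring_within(inventory, 30, now)])
# ['vpn.example.com', 'www.example.com']
print([c.subject for c in unowned(inventory)])
# ['api.example.com']
```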

Strategies for successful certificate management

The financial and operational impact of a single expired certificate can reach across the entire organization. In many cases, outages also affect customers and business partners. While IT and security teams spend hours identifying the root cause and replacing the expired certificate on a business-critical website or application, brand reputation and revenue take an immediate hit. Unplanned network downtime triggered by expired certificates is estimated to cost an organization more than $300,000 per hour.
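As a rough back-of-the-envelope illustration, multiplying the estimated downtime cost rate R (more than $300,000 per hour) by a recovery time T of four hours, which 38% of survey respondents reported exceeding, puts the cost of a single outage above $1.2 million. This assumes cost scales linearly with downtime, which is a simplification.

```latex
\begin{aligned}
C &= R \times T \\
  &> \$300{,}000\ \text{per hour} \times 4\ \text{hours} = \$1{,}200{,}000
\end{aligned}
```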

According to the report, the average number of internally trusted certificates grew by nearly 16% compared with last year's study. More certificates and shorter lifespans are proving difficult to manage: 70% of respondents indicated that the growing use of keys and digital certificates has significantly increased the operational burden on their IT organization, and 65% are concerned about the increased workload and risk of outages due to shorter SSL/TLS certificate lifespans. In September 2020, the maximum validity of publicly trusted SSL/TLS certificates was cut roughly in half, from 27 months (825 days) to just 13 months (398 days).
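Shorter lifespans translate directly into more renewal work. Assuming, for simplicity, that every certificate is issued at the maximum validity period L (in days) and renewed only when it expires, a fleet of N certificates needs roughly N · 365 / L renewals per year. Lowering the maximum from 825 days to 398 days therefore about doubles the renewal workload; for a hypothetical fleet of 1,000 certificates, that is roughly 442 renewals a year versus roughly 917. Renewing before expiry, as most teams do, raises both numbers.

```latex
\text{renewals per year} \approx N \cdot \frac{365}{L},
\qquad
\frac{1000 \cdot 365}{825} \approx 442,
\qquad
\frac{1000 \cdot 365}{398} \approx 917,
\qquad
\frac{825}{398} \approx 2.07
```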

Given risks such as these, organizations need to make improving certificate management a priority so they can get ahead of certificate-related outages. Far too many organizations still rely on a patchwork of manual spreadsheets, tools provided by their SSL/TLS vendor and homegrown tools to manage certificates.

One of the most important steps organizations can take to simplify certificate management is to invest in automation tools that help increase visibility into certificates and automate the lifecycle management of those certificates. According to the report findings, the adoption of certificate lifecycle management tools is on the rise, with 44% of respondents reporting that their organizations use a dedicated certificate lifecycle management (CLM) solution — a sharp increase from 36% in 2021. In fact, 60% of respondents cite lifecycle automation as a top priority for the coming year.
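Lifecycle automation largely means replacing a calendar reminder with a policy that a machine evaluates continuously. The Python sketch below shows one such policy under stated assumptions: a certificate is renewed once less than one-third of its validity period remains (an illustrative threshold, not a standard), and `renew` is a hypothetical callback standing in for the issue, install and restart steps that an ACME client or CLM tool would actually perform. `ManagedCert`, `needs_renewal` and `run_renewals` are likewise hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, List


@dataclass
class ManagedCert:
    name: str
    not_before: datetime
    not_after: datetime


def needs_renewal(cert: ManagedCert, now: datetime, fraction: float = 1 / 3) -> bool:
    """True once less than `fraction` of the certificate's validity period remains."""
    lifetime = cert.not_after - cert.not_before
    remaining = cert.not_after - now
    return remaining < lifetime * fraction


def run_renewals(certs: List[ManagedCert], now: datetime,
                 renew: Callable[[ManagedCert], ManagedCert]) -> List[str]:
    """Renew every certificate that is due, in place; return the renewed names.

    `renew` is a placeholder for whatever issues a new certificate, deploys it
    and restarts the dependent service.
    """
    renewed = []
    for i, cert in enumerate(certs):
        if needs_renewal(cert, now):
            certs[i] = renew(cert)
            renewed.append(cert.name)
    return renewed


# Example run with placeholder data.
now = datetime(2022, 8, 18, tzinfo=timezone.utc)
fleet = [
    ManagedCert("web", now - timedelta(days=350), now + timedelta(days=48)),
    ManagedCert("api", now - timedelta(days=30), now + timedelta(days=368)),
]


def fake_renew(cert: ManagedCert) -> ManagedCert:
    # Stand-in for a real issuance and deployment step.
    return ManagedCert(cert.name, now, now + timedelta(days=398))


print(run_renewals(fleet, now, fake_renew))  # ['web']
```

Running the same check on a schedule (for example, daily) is what turns a one-off renewal into lifecycle automation: a certificate that was just renewed no longer qualifies, so repeated runs are harmless.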

Making visibility and automation a priority

As many organizations shift from traditional IT to more dynamic workloads in the cloud and at the edge, the number of machines in use continues to grow. This means there are more and more certificates to oversee because each of these machines needs an identity in the form of cryptographic keys or digital certificates.

Ineffectively managing digital certificates, or worse, not managing them at all, can cause massive disruptions in the form of outages. This in turn can lead to significant productivity losses or open the door to a potential data breach.

Increasing visibility into digital certificates and automating their management are two proactive steps organizations can take to reduce outages. As security leaders work to reduce and prevent certificate-related outages, both visibility and automation should be a near-term priority. Organizations that deliver on both will be well on their way toward reducing the frequency and severity of outages.