A customer service strategy for when the sh*t hits the fan.
It was a nightmare we didn’t even know we could have. Our app was down. At 3am. On a Saturday.
There was one engineer on call. They were focused on fixing the outage.
No one was telling our customers.
Four hours later, the app was back up and running.
But we had 60 new messages in the inbox.
(60 emails between 3-7am on a Saturday is…not normal for us.)
Once our Head of Support woke up, she hit the inbox like a woman scorned.
And vowed to never let this happen again.
Well, at least the part where we don’t tell customers what’s going on. (The app will go down again.)
So we created this six-step customer service strategy for dealing with major service disruptions.
6 step customer service strategy for handling major service disruptions
Our engineering team had a post-mortem meeting to go over the details of the outage. Our customer support team decided to join (okay, force their way in).
We didn’t have anything to add about the way the technology worked or how our developers solved the problem. But we had a lot of thoughts on how the communication surrounding this incident could have been better.
Both internally—so the entire company knew what was happening and could react appropriately—and externally—so we could improve customer interactions during a crisis.
We came up with this customer service checklist detailing everything that needs to happen before, during, and after an outage to prevent unhappy customers.
Yes, the strategy stems from our customer support reps. But as we drafted it, we realized we needed every member of our small business to be just as involved as we were. Our customer relationships relied on their actions.
Here’s how to create this document for your own team, broken into six steps.
1. Define what a major service disruption is to your business
Groove is a B2B SaaS company. For us, a major service disruption is an outage, although the type and scale of an outage can vary depending on which product or service it impacts.
To start, we defined the parameters of what makes an outage an outage. This way, every team member’s expectations align.
We broke it into two main types of outages:
- Planned outages
- Unplanned outages
If you also work in tech, you know the jargon here. Sometimes, developers have to purposely bring the app down for scheduled maintenance. This is a planned outage.
We defined planned outages using these criteria:
- Planned outages take place on Saturdays, during overnight US hours.
- Planned outages should be no longer than 20 minutes.
We reasoned that overnight on a Saturday was least likely to impact the majority of our customers. And as long as we had lead time, we could schedule a support team member to check in on Sunday morning.
Our engineers told us 20 minutes was reasonable for anticipated downtime.
To keep it simple, anything that falls outside of these guidelines is an unplanned outage.
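If you track maintenance windows in a script or scheduling tool, the two planned-outage criteria above can be checked mechanically. Here is a minimal Python sketch of that idea. It is an illustration, not part of our protocol: the function name `fits_planned_window` is hypothetical, and since we never pinned down "overnight US hours" in this post, the midnight-to-6am window is an assumption. The function also expects the start time to already be expressed in whichever US time zone your team treats as its reference.

```python
from datetime import datetime, timedelta

# Assumptions for illustration only: the post does not define "overnight US hours".
OVERNIGHT_START_HOUR = 0   # assumed: midnight in the team's reference US time zone
OVERNIGHT_END_HOUR = 6     # assumed: 6am in the same time zone
MAX_PLANNED_DURATION = timedelta(minutes=20)
SATURDAY = 5               # datetime.weekday() counts Monday as 0


def fits_planned_window(local_start: datetime, duration: timedelta) -> bool:
    """Return True if a scheduled outage meets both planned-outage criteria.

    local_start must already be in the reference US time zone (an assumption
    of this sketch; no time zone conversion happens here).
    """
    on_saturday = local_start.weekday() == SATURDAY
    overnight = OVERNIGHT_START_HOUR <= local_start.hour < OVERNIGHT_END_HOUR
    short_enough = duration <= MAX_PLANNED_DURATION
    return on_saturday and overnight and short_enough
```

Anything that returns `False` here would be handled as an unplanned outage, matching the "everything else" rule above.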
We have various levels of urgency within these two types, since certain parts of our app can stay up while others are down. Once you start building protocols, you’ll find the hypochondriac in you can chart out every worst case scenario. It’s a useful trait in customer service.
For the purpose of this post though, we’ll stick to the basics. You can always branch off and create more specific documentation for different types of outages from here.
This initial conversation was a crucial first step in building our customer service strategy. We were able to hammer out some fundamental logistics for both the support team and our engineers.
2. Come up with basic questions your customer service team will need to know
Customer service agents know what users will ask before they ask it. We also know what information we’ll need to set up the inbox (or a call center if you offer phone support) before an issue occurs.
By coming up with a list of common questions upfront, we eliminate a lot of back and forth down the road. We prep responses without needing to wait for our engineers to respond to us while fixing an outage.
Here’s the list we came up with:
- How many customers will this affect (in percent)? If under 100%, is this filtered in some way that could be helpful for messaging (e.g., certain types of users or accounts)?
- Start day and time?
- End day and time?
- Worst case scenario, how long will this take?
- What will they see when they log in to the app?
- What will they see if they are already logged in?
These were all questions our developers could easily answer. We just never thought to ask them.
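One way to make sure these questions get answered every time is to give the answers a fixed shape. The Python sketch below is only an illustration of that idea: the class name `OutageBrief` and all of its field names are hypothetical, not something taken from our protocol document.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class OutageBrief:
    """Answers an engineer hands to support for one outage (hypothetical shape)."""

    percent_affected: float                 # share of customers affected, 0-100
    start: datetime                         # start day and time
    expected_end: datetime                  # end day and time
    worst_case_duration: timedelta          # worst case, how long will this take?
    seen_on_login: str                      # what customers see when they log in
    seen_if_logged_in: str                  # what customers see if already logged in
    affected_segment: Optional[str] = None  # filter useful for messaging, if any

    def __post_init__(self) -> None:
        # Reject answers that cannot be right before they reach customers.
        if not 0 <= self.percent_affected <= 100:
            raise ValueError("percent_affected must be between 0 and 100")
        if self.expected_end < self.start:
            raise ValueError("expected_end must not be before start")

    @property
    def expected_duration(self) -> timedelta:
        """Planned length of the outage, from start to expected end."""
        return self.expected_end - self.start
```

Because every field except `affected_segment` is required, a brief cannot be created with one of the questions left unanswered.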
With this information in hand beforehand, we could send out proactive messaging to our 2,000+ account owners to improve customer satisfaction during a crisis. We could also prepare our inbox to track customer data and automatically respond to anticipated customer feedback.
Here’s a look at our canned reply for a more recent planned outage:
We can pass this information on directly to keep customer happiness intact, and share it within the support team to make sure we’re all fully aware of the situation.
This way, when customers email in with something like this, we know what’s going on and can respond quickly:
3. Create guidelines for next steps
We wanted every decision within our customer service strategy to be as transparent as possible. This way, anyone could pick up this checklist and know what to do without any time-consuming training program.
We consider this crucial at a lean startup. You shouldn’t have to be the head of support, or even on the support team, to prioritize customer care in a crisis.
So we laid out our decision-making process clearly:
Based on the answers to the questions above, we will move forward with the FULL communication plan if:
- 5-100% of customers are affected
- AND
- The time between start and end is over 30 minutes
Based on the same answers, we will move forward with the PARTIAL communication plan (Slack, Status Page, Twitter, and CS Inbox only; no email) if:
- 5-100% of customers are affected
- AND
- The time between start and end is under 30 minutes
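The guidelines above boil down to two inputs and two outcomes, so they are easy to express as a small function. The Python sketch below is one possible encoding, not our actual tooling: the names `Plan` and `choose_plan` are hypothetical. The guidelines do not say what happens when under 5% of customers are affected, or when an outage lasts exactly 30 minutes, so the sketch returns `None` for those cases rather than guessing.

```python
from datetime import timedelta
from enum import Enum
from typing import Optional


class Plan(Enum):
    FULL = "full"        # inferred: everything in PARTIAL, plus email
    PARTIAL = "partial"  # Slack, Status Page, Twitter, CS Inbox only; no email


THRESHOLD = timedelta(minutes=30)


def choose_plan(percent_affected: float, duration: timedelta) -> Optional[Plan]:
    """Map the engineer's answers to a communication plan.

    Returns None for cases the guidelines do not cover: under 5% of
    customers affected, or a duration of exactly 30 minutes.
    """
    if not 5 <= percent_affected <= 100:
        return None
    if duration > THRESHOLD:
        return Plan.FULL
    if duration < THRESHOLD:
        return Plan.PARTIAL
    return None
```

Returning `None` makes the gaps visible. If your team writes its own version, decide up front which plan those edge cases should fall into.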
4. Build a checklist for what needs to be done during an outage
An unexpected outage will catch you by surprise. Your senses will be thrown off. Your logic will break down. Don’t wait until then to figure out what to do, how to do it, and when to do it.
We set up a list of everything that needs to be done while the outage is happening, from messaging in Slack to posting on social media. We quickly realized that, although this was a customer service strategy, we needed to involve almost every other department.
Engineers needed to provide information. Support needed to tip off marketing. Every part of the company would be affected by this.
So we broke out sections according to tools and timing, with assignments per department:
- Slack:
  - Engineer On Call will post in #customer-service:
    “@here We are down for maintenance.
    Start time: X
    Anticipated end time: X.
    X% of customers are affected”
  - Then answer these questions:
    - Best case scenario, how long will this take?
    - Worst case scenario, how long will this take?
    - What will they see when they log in to the app?
    - What will they see if they are already logged in?
  - Post updates every hour until back up.
- Status Page:
  - Engineer On Call will update the status
- Twitter:
  - Customer Service posts 12 hours before (planned outage)
  - Post 1 hour before (planned outage)
  - Post at start time of outage with anticipated end time (link to status page)
- Email (planned outage):
  - Customer Service drafts email at least 4 days before
    - Include: Engineer-provided answers to the questions above
    - Marketing gives final copy edits and approval
  - Customer Service sends email 2 days before outage
- Customer Service Inbox:
  - Edit autoresponder to include note at top: *Our app is currently down due to scheduled maintenance. It will be back by [TIME].*
  - Pre-written canned reply will be labeled with date of outage (planned outage)
  - Create canned reply with Engineer-provided information and label it with date of outage (unplanned outage)
  - Add tag with date of outage to any related tickets
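For a planned outage, several items on this checklist are really deadlines counted backwards from the start time. Below is a minimal Python sketch that turns a start time into a dated to-do list using the lead times from the checklist above. The function name `planned_outage_schedule` and the task wording are hypothetical. The checklist gives no deadline for marketing's approval, so the sketch leaves that step out.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def planned_outage_schedule(start: datetime) -> List[Tuple[datetime, str]]:
    """Return (due time, task) pairs for a planned outage, earliest first."""
    tasks = [
        (start - timedelta(days=4), "Customer Service: email draft due at the latest"),
        (start - timedelta(days=2), "Customer Service: send email"),
        (start - timedelta(hours=12), "Customer Service: first Twitter post"),
        (start - timedelta(hours=1), "Customer Service: second Twitter post"),
        (start, "Customer Service: Twitter post with anticipated end time and status page link"),
    ]
    # Every due time is distinct, so sorting the pairs orders them by due time.
    return sorted(tasks)


if __name__ == "__main__":
    # Example start time chosen for illustration: 3am on Saturday, June 1, 2024.
    for due, task in planned_outage_schedule(datetime(2024, 6, 1, 3, 0)):
        print(due.strftime("%Y-%m-%d %H:%M"), "-", task)
```

The output could be pasted into a shared calendar or a Slack reminder, so nobody has to count days backwards by hand.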
5. Build a checklist for what needs to be done after an outage
“Breathe a sigh of relief” is the assumed first step after coming back from an outage. For all the other tasks, we created another checklist.
This would allow any team member to jump in during an emergency. Even for team members who presumably know what to do after an outage, a clear checklist makes sure we keep our cool after a stressful incident.
Here’s what our post-outage customer service strategy looks like:
- Slack:
  - Engineer On Call posts in #customer-service:
    “@here We are back up. Outage started at [TIME AND DAY] and ended at [TIME AND DAY].”
- Status Page:
  - Engineer On Call updates the Status Page
- Twitter:
  - Customer Service posts that we are back up
  - Respond to customer mentions and replies
- Customer Service Inbox:
  - Customer Service edits autoresponder to remove the note at top: *Our app is currently down due to scheduled maintenance. It will be back by [TIME].*
  - Edit canned reply (labeled with date of outage) to past tense
  - Add tag with date of outage to any related tickets
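Two of these post-outage steps are pure text handling: filling in the "back up" announcement, and taking the outage note off the top of the autoresponder. Here is a rough Python sketch of both. It assumes the autoresponder is stored as plain text with the note on its first line. The function names and the date format are hypothetical choices made for the example.

```python
from datetime import datetime

# Opening words of the note added to the autoresponder during an outage.
OUTAGE_NOTE_PREFIX = "*Our app is currently down due to scheduled maintenance."


def back_up_message(started: datetime, ended: datetime) -> str:
    """Fill in the Slack announcement template from the checklist."""
    fmt = "%H:%M on %Y-%m-%d"  # assumed format for [TIME AND DAY]
    return (
        f"@here We are back up. Outage started at {started.strftime(fmt)} "
        f"and ended at {ended.strftime(fmt)}."
    )


def remove_outage_note(autoresponder: str) -> str:
    """Drop the outage note from the top of the autoresponder, if present."""
    lines = autoresponder.splitlines()
    if lines and lines[0].startswith(OUTAGE_NOTE_PREFIX):
        lines = lines[1:]
    # Also remove blank lines left behind where the note used to be.
    return "\n".join(lines).lstrip("\n")
```

If the note is not there, `remove_outage_note` returns the text unchanged, so it is safe to run more than once.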
6. Get everyone to sign off
At the bottom of our checklist is a space for every department head to sign. Like we said, this is a customer service strategy, but we cannot do anything without the participation of the entire company.
If no one tells customer support there’s an outage…we can’t implement our outage protocol.
We required all team leads to agree on the strategy and approve the plan. Their signature reflects their commitment to and understanding of their responsibilities during an outage.
We double, triple, and quadruple checked that everyone fully understood the plan and knew how to execute it themselves should the need arise.
This customer service strategy saved our (work) lives
Since implementing this customer service strategy, we’ve gotten our lives back. Our Head of Support finally stopped having nightmares about unexpected outages. And everyone can take real vacations knowing that any team member can step up in an emergency.
We’ve had several planned outages over the past year. And a few unplanned ones.
Our engineers were quick to do their part and alert the customer service team. Our support staff remained calm and followed the protocol. We had everything under control. Customer loyalty remained intact.
If you haven’t had an eye-opening outage or major service disruption yet, know that it will come. Do the work of creating a customer service strategy before you need to use it.
Grab a copy of our outage protocol to help you get started on yours!