
Emergency Smart Hands: What to Expect at 2am

Smart Hands · Data Centre · Emergency · Infrastructure · SLA

It's 2am. Your monitoring dashboard lights up red. A storage array in Slough has thrown a critical fault, and your users in Singapore are starting to notice. Your nearest engineer is in Manchester, asleep. The data centre's own remote hands team is quoting a two-hour queue because three other customers had the same idea tonight.

This is the moment that separates organisations that prepared for emergencies from those that didn't.

Why 2am happens more often than you think

Nobody plans for their infrastructure to fail in the middle of the night. But time zones don't care about your shift patterns. If you have users in Asia-Pacific, the Americas, or anywhere not neatly aligned with UK business hours, your "middle of the night" is someone else's "peak trading hours."

The triggers for emergency callouts are predictable, even if the timing isn't:

  • Hardware failure: a dead drive, a failed PSU, a server that won't POST after a firmware update
  • Connectivity loss: a failed fibre patch lead, a switch port gone dark, a cross-connect showing errors
  • Security incidents: your SOC needs someone to physically disconnect a compromised host or pull a drive for forensic imaging
  • Power events: a tripped PDU, a UPS showing a fault, equipment that didn't come back cleanly after maintenance
  • Failed remote reboots: you've tried IPMI, BMC, iDRAC, and nothing works

Any of these can happen at any hour. When they do, the clock is ticking.
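On that last trigger, the out-of-band checks an on-call engineer typically exhausts before requesting smart hands look something like this (the BMC address and credentials are placeholders, and exact flags vary by vendor):

```shell
# Is the BMC itself still reachable out-of-band? (192.0.2.10 is a placeholder)
ping -c 3 192.0.2.10

# Query chassis power state via IPMI over LAN
ipmitool -I lanplus -H 192.0.2.10 -U admin -P 'changeme' chassis power status

# Read the System Event Log for hardware faults the BMC has recorded
ipmitool -I lanplus -H 192.0.2.10 -U admin -P 'changeme' sel list

# Last resort: hard power cycle. If this times out too, someone has to
# physically stand in front of the rack.
ipmitool -I lanplus -H 192.0.2.10 -U admin -P 'changeme' chassis power cycle
```

If the BMC doesn't answer at all, no amount of remote cleverness helps; that's the point at which the call gets made.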

What actually happens when you call

Your monitoring platform can alert us directly. We'll already have your site details, access credentials, and escalation contacts on file.

Before anyone gets in a car, we need three things. Which facility. What's the problem. Can we get in? Data centre access at 2am isn't always as simple as badging through the door. If you haven't pre-registered us on the access list, that's the first bottleneck.

For the Slough corridor (Equinix, VIRTUS, Digital Realty, and the surrounding facilities on the Trading Estate and Bath Road) we target sub-30-minute response times. For the wider M4 corridor and into London, within an hour. These aren't aspirational numbers on a website. They're the SLAs we hold ourselves to.

This is where smart hands earns its name. We don't just turn up, press a button, and leave. We get on a call with your remote team, your NOC, your SOC, your on-call engineer sitting in their kitchen trying to troubleshoot via VPN. We're the physical extension of that person. We see what they can't see. We describe indicator lights, read error codes off screens, check cable labels, trace connections.

Good communication during an incident is worth more than fast hands.

When the job is done, or when we've stabilised things enough for the rest to wait until morning, we send a documented handover. Photos of the equipment before and after. Notes on what was done. What was observed. What needs follow-up in business hours. No ambiguity.

What we can do at 2am

Realistic scope for an emergency callout:

  • Power cycle equipment (servers, switches, firewalls, storage arrays)
  • Swap failed drives (hot-swap replacements into RAID arrays)
  • Reseat cables and modules (SFPs, power cables, network cables, fibre patch leads)
  • Check and report indicator lights (front panel LEDs, PSU status, drive bay indicators)
  • Replace patch cables (we carry common types)
  • Photograph equipment (asset tags, serial numbers, cable runs, rack elevations)
  • Escort vendor engineers if your hardware vendor sends someone
  • Basic network troubleshooting (verify link lights, test ports, swap to known-good cables)

What we can't do at 2am

Honesty matters more than overpromising.

Complex network reconfiguration without a runbook. We'll troubleshoot and diagnose but we're not making changes to your core routing at 2am without documented procedures and your explicit sign-off.

Anything requiring parts we don't carry. Specific SFP modules, proprietary drive caddies, replacement power supplies. If you need something specific, we need to know in advance.

Access to areas we're not cleared for. Every data centre has its own access policies. If we're not on the list for your cage, suite, or meet-me room, we can't get in regardless of the urgency.

Work requiring vendor involvement. If the fix needs login to a proprietary management console with vendor-only credentials, we'll coordinate but we can't bypass that.

Common mistakes that cost you time

Not having DC access pre-arranged. Adding someone to an access list at 2am is possible but slow. Some facilities require 24-hour notice for new access. We sort this during retainer onboarding so it never comes up at 3am on a Sunday.

No runbook or documentation. We'll troubleshoot without one, but it's significantly faster with one. Even a basic document that says "if server X doesn't respond, try Y before Z" saves precious minutes.

Not knowing exact rack, bay, and unit location. "It's in Equinix somewhere, rack starts with a 4" isn't enough. We need facility code, hall, cage or suite number, rack ID, and ideally U position. The more precise you are, the faster we get to work.

Unclear escalation path. At 2am, we need to know who authorises what. Can we replace a drive without calling someone? Can we power cycle without sign-off? If these decisions aren't pre-agreed, we end up making phone calls instead of fixing things.

The break-glass document

Every organisation with equipment in a data centre should have one. One page is fine. Data centre access information. Rack locations with precise coordinates. Escalation contacts with mobile numbers (not office lines). Basic runbooks. Pre-authorised actions.
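The contents listed above can be sketched as a one-page template. Every value below is an illustrative placeholder, not a real site, number, or procedure:

```yaml
# Break-glass document -- review quarterly (all values are placeholders)
facility:
  name: "Example DC, Slough"
  access: "Photo ID required; new names need 24h notice via provider portal"
  security_desk: "+44 1632 960123"

racks:
  - location: "Hall 2 / Suite 5 / Cage 12 / Rack 4A"
    equipment: "Storage array U10-U14; core switch U40"

escalation:
  - role: "On-call engineer"
    contact: "+44 7700 900123"   # mobile, not an office line
  - role: "Infrastructure lead"
    contact: "+44 7700 900456"

pre_authorised:
  - "Power cycle any single server without sign-off"
  - "Replace failed drives in RAID arrays with on-site spares"
  - "Core network changes: never without infrastructure lead sign-off"

runbooks:
  - "If the server doesn't respond: check BMC, then front-panel LEDs, then power cycle"
```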

Keep it updated. Review quarterly. Store it somewhere your on-call team can actually access at 2am, not on a SharePoint that requires MFA from a device locked in the office.

Why Caleta

We're based in the Slough data centre corridor. Not in a head office somewhere coordinating resources from a spreadsheet, but actually here, where the facilities are. Sub-30-minute response to the major Slough facilities. The same engineers, consistently. People who know the buildings, the security teams, and the loading dock procedures.

When it's 2am and something's down, you don't want to be explaining where your rack is to someone who's never been in the building. You want someone who's already been there this week.


Need emergency smart hands support, or want to set up a retainer so you're prepared before the next 2am call? Get in touch and we'll walk you through the options.
