
Sysadmin SNAFU flushes whole company down the drain

The only backup you'll ever regret... is the one you didn't make.

Here’s a fun story that’s doing the rounds right now.

It’s the perfect anecdote to cheer up a Friday afternoon.

Actually, it isn’t, because it’s all about someone else’s deep misfortune.

It’s more of a “There, but for the grace of God, go I” sort of story, but we thought we’d tell the tale anyway, so that you don’t go there yourself.

Just imagine…

Imagine that you were going to make a backup last night, but you never quite got a Round Tuit.

This morning, you got slammed by ransomware that scrambled all your files, so you breathed in really deeply, set your jaw firmly, got out your Bitcoin wallet

…only to find that the crooks wouldn’t take your money, didn’t care about your files, just shrugged and walked away.

Except they didn’t just wipe your files, they wiped everybody’s: your own files, your staff’s files, your customers’ files, along with their web server configurations, their emails, their operating systems, everything, the whole nine yards, washed down the drain, into the river, out to the North Sea. [You’re mixing your metaphors again. Ed.]

And, anyway, it wasn’t the crooks that did it – it was you that did the damage, self-inflicted with a simple slip of the fingers.

Stuff happens

It happens, and sometimes a bit of what you might call “tough love” is all you’re going to get, which is what happened to a user called Bleemboy on Server Fault when he asked this question:

Bleemboy: I run a small hosting provider with more or less 1535 customers… All servers got deleted and the offsite backups too because the remote storage was mounted just before by the same script (that is a backup maintenance script).

How I can recover from a rm -rf / now in a timely manner?

“Tough love” answers came back within about half an hour from André, Sven and Michael:

André: If you really don’t have any backups I am sorry to say but you just nuked your entire company.

Sven: I feel sorry to say that your company is now essentially dead.

Michael: You’re going out of business. You don’t need technical advice, you need to call your lawyer.

To explain…

On a Unix-like system, rm is the command to remove a file, or to delete it, in the slightly blunter terminology of Windows.

The / means “the root directory,” short for starting at the very top of everything.

The -r means “recursive”, which is geek-speak for saying that you want to delete the subdirectories too, oh, and their subdirectories, and so on all the way down, even if they’re mapped to other drives on the network, or have removable disks mounted… heck, it means “spare nothing.”

Then, to make assurance double-sure, there’s -f, for “force,” which means not only that you won’t take no for an answer, but also that you don’t even want to bother asking in the first place.

Why in a script?

But why would any sysadmin put rm -rf / in a script, not least because the script would inevitably be one of its own victims? [Not necessarily, e.g. due to chroot, but don’t let me interrupt you. Ed.]

Surely you’d notice the self-contradictory nature of such a command?

In this case, the unfortunate sysadmin had written something like:

rm -rf $1/$2

The idea is that the items with the dollar signs are variables that get replaced at runtime, for example by setting $1=user/16504 and $2=retired-files/, so that the script can be used to handle archiving for different users and different directories at different times.

Unfortunately, as Bleemboy himself pointed out:

Those variables [were] undefined due to a bug in the code above.

You can figure out what happens if you replace $1 and $2 above with nothing at all.
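Here’s a rough sketch of how that plays out – the script name and arguments below are invented for the example, not taken from the original post:

#!/bin/sh
# cleanup.sh (hypothetical): delete one retired directory for one customer.
# Intended use:   ./cleanup.sh user/16504 retired-files
rm -rf $1/$2
# With both arguments supplied, the unquoted variables expand as hoped:
#     rm -rf user/16504/retired-files
# Called with no arguments at all, $1 and $2 expand to nothing, and the
# shell is left holding exactly the command you never meant to run:
#     rm -rf /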

What to do?

In computer science courses – even those that supposedly don’t explicitly deal with security – you will learn (and, hopefully, learn to appreciate) all sorts of generic protections against this sort of bug.

Security-conscious programming languages can help if they detect, trap and stop code where variables aren’t defined, to make sure you say what you mean, and mean what you say.
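Bash, for instance, can be told to treat any reference to an unset variable as a fatal error with set -u (also spelled set -o nounset). A minimal sketch, reusing the hypothetical cleanup script from above:

#!/bin/bash
set -u    # abort on any reference to an unset variable
# Run without its two arguments, this script dies on the first use of $1
# with an "unbound variable" error, instead of quietly expanding to an
# empty string and turning the rm below into "rm -rf /".
echo "Cleaning up: $1/$2"
rm -rf "$1/$2"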

Pair-programming can help, where you always work with a co-pilot, regularly swapping roles, so there are always two pairs of eyes on the job, and there’s always someone on the spot to ask the difficult questions when you start getting careless.

A vigorous testing process is vital, where you don’t just crack out the code and check that it mostly works, but also produce code to help to test your code, which includes testing that it fails correctly too, an outcome that is not an oxymoron in software engineering.
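As a loose illustration of what “testing that it fails correctly” might look like for a cleanup script of this sort (the script path and file names below are made up for the example):

#!/bin/bash
# Hypothetical self-test: run the cleanup script with empty arguments in a
# throwaway sandbox and insist that it refuses, leaving everything intact.
sandbox=$(mktemp -d)            # disposable directory to run the test in
touch "$sandbox/canary.txt"     # a file that must still exist afterwards
if (cd "$sandbox" && /path/to/cleanup.sh "" "" 2>/dev/null); then
    echo "FAIL: the script accepted empty arguments"
    exit 1
fi
if [ ! -f "$sandbox/canary.txt" ]; then
    echo "FAIL: the script deleted files it should never have touched"
    exit 1
fi
echo "PASS: the script refuses to run when its variables are empty"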

Security wrappers can help, too, like the safe-rm flavour of the rm command that lets you keep a “defence-in-depth” blocklist of files that should never be deleted, even if you try very hard, in order to protect you from yourself.
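The rough idea – sketched below as an illustrative hand-rolled guard, not safe-rm’s actual code – is a wrapper that checks its arguments against a blocklist before handing over to the real rm:

#!/bin/bash
# Toy rm wrapper in the spirit of safe-rm: the protected paths listed
# here are examples only.
PROTECTED=("/" "/home" "/etc" "/usr" "/var")
for target in "$@"; do
    canonical=$(readlink -f -- "$target" 2>/dev/null)
    for p in "${PROTECTED[@]}"; do
        if [ "$canonical" = "$p" ]; then
            echo "refusing to delete protected path: $target" >&2
            exit 1
        fi
    done
done
exec /bin/rm "$@"    # nothing on the blocklist, so let the real rm proceed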

But the big one here is backup.

If ransomware has one silver lining, it’s the fact that it’s getting backup a bit closer to the front of our minds.

The only backup you will ever regret…

…is the one you didn’t make.

(Encrypt your backups. That way if someone steals your offsite disks, there’s still no data breach.)


Image of sewer outfall pipe courtesy of Shutterstock.

37 Comments

Great, love this stuff…. Happens to all of us, hopefully not when root…
Thanks

Jack
Arizona

Something that seems to be missed in the stories I’ve seen about this is that it’s a hosting provider. This points not only to the need to do backups, and to keep those backups in a manner where the live system can’t nuke them, but also to the need to make sure your hosting providers are doing the same.

or #Bleemboy…”you now have 1535 plus justified haters…run” … oh my, I think I’d be changing my shorts after that mistake…now…where’s that backup of my backup’s backup? :)

“backup of my backup’s backup”….. classic! Everyone burned by not having a backup of data now has a backup to the backup of the backup.

I’ve never been burned, but I’ve always backed up my back up & then burned real important stuff on my backups back up onto a DVD-R disc. Overkill? That word doesn’t exist in my dictionary lol

I feel bad for the guy, that’s …WOW

Many years ago I did something very similar on a Unix system, when I intended to delete the entire contents of a specific directory. Unfortunately I typed a command that mixed Unix & DOS syntax (as I said, it was a long time ago!). So, instead of rm -r *, I typed rm -r *.*

That was the day I discovered that the first item that matches *.* in any Unix directory listing is “..”, which is the parent directory. So, even though I was way down deep in the filestore, Unix helpfully recursed right up to the root, then down through the entire filestore.

That was also the day I learned the importance of keeping backups (which I didn’t have, unfortunately), and the start of several days of reinstalling OS, drivers and apps from a huge stack of floppy disks.

Just be thankful you had the opportunity to spend those “several days of reinstalling OS, drivers and apps from a huge stack of floppy disks”. Most places wouldn’t let you touch a system after something like that. You’d be on the UI line.

30 years ago as a new UNIX system operator, I saw a drive was nearly full and I rm’d a log file, and oops nothing worked. I didn’t know what happened. It was mid morning and staff was beginning to login. I quickly restored the file from a backup… and learned all about /dev/null.

Ouch! My heart goes out to this sysadmin and his customers. This was painful to read!

Well, this would not have happened on a properly partitioned SOLARIS system. At worst, the root partition would have been removed.

A copy on write file system would solve this. Restoring a ZFS snapshot would have restored almost everything in a few minutes.

While this was an obviously avoidable mistake, I’d hope he would go after some sort of data recovery program that seeks to restore deleted data assuming he hasn’t yet re-written anything over the disks. Anything is worth trying to avoid the impending wrath of his customers…

As the Linux man page puts it gently:

“Note that if you use rm to remove a file, it might be possible to recover some of its contents, given sufficient expertise and/or time.”

That’s more of a warning that the FBI/NSA/GCHQ might be able to do it, if they *really* wanted to, than an encouragement that it’s likely if you were to try :-)

Mmm… The guy wrote that he was using Ansible, which doesn’t let you destroy your root dir. rm would also have needed to be launched with --no-preserve-root to cause that kind of disaster… It’s more likely a troll…

It’s possible. Or perhaps he wasn’t using the GNU rm? (OS X doesn’t have --preserve-root, for example.) Or perhaps he had --no-preserve-root in there but left that detail out?

To explain: most Linuxes use an “rm” that has an option “--preserve-root”, which is on by default. That blocklists “/”, so you can’t delete it by accident or design, which would happen with a mistake like “rm -rf /home/user /” (note the extra space before the final “/”). You have to override that with --no-preserve-root for it to work.
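For example, a reasonably recent GNU rm responds along these lines (the exact wording varies between coreutils versions):

$ rm -rf /
rm: it is dangerous to operate recursively on '/'
rm: use --no-preserve-root to override this failsafe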

Maybe, but in his original post he talked about the CentOS 7 distro, so most likely a GNU rm with the preserve-root behaviour on by default. Also, he said he was using Ansible, so if he was using the “file” module, that prevents this kind of disaster; and if he had used the exec module with GNU rm, the CentOS package of rm has the built-in --preserve-root option enabled by default.

For your reference, I’ve just read several Italian newspapers covering the case; it is supposed to be a mass marketing campaign to promote his new company…

Nice story, but it’s probably time to take it down and replace it with one about blindly believing everything you read online. When the story started circulating yesterday, I was surprised there was no mention of the business name, and no sign of ex-customers bleating about their sites being wiped. So this morning I wasn’t surprised to see that “bleemboy” had added some even more implausible claims to the Server Fault thread and been exposed as a troll.

It’s the *story* that’s the story, that’s the point :-)

(The link in the article will reveal what happened since we published this.)

Hands up the person who has never made a mistake with “rm,” or who has always got a Round Tuit (no lapses, ever!) when it comes to backup.

“Hands up the person who has never made a mistake with ‘rm,'” *puts hand up*

“who has always got a Round Tuit (no lapses, ever!) when it comes to backup.” *puts hand down*

Drat. XD

Are you sure this actually happened? The question has since been taken down from ServerFault “for reasons of moderation”. Are there any sources other than the ServerFault post, or was the person just trolling?

I once read of a company where they lost all their files on a Unix system. They thought they had been hacked (even sent out a helicopter to fly the IT manager back from a white water rafting holiday), but couldn’t find where. So they restored all their files. Exactly one month later it happened again. It turned out they had a script to remove the home directory of expired accounts. Unfortunately one such account’s home directory was /.

I did this while moonlighting at a customer’s site. It was a well respected London wine merchant and I was ‘cleaning up’ their SCO Unix system (accounts and everything) and putting some backup and maintenance scripts in place. Well, I think a space crept into the ‘rm -rf /tmp/*’ and when I tested it, it deleted almost everything. “Um, do you have last night’s backup tape?”, I asked innocently…

I’m sure this wasn’t your intent, but using the username you listed in the article I was able to find the sysadmin’s social media details (which in turn display his real name and employer) through his Server Fault account.

I’ll be careful to do a little check if I need to use rm in a script from now on.

if [[ -z "$2" ]]
then echo "Better check your variables."
else echo "The variables are $1 and $2"
fi

This is another example of why not to log into your PC with an admin account. Had he been logged in as a standard user and run admin apps with Run As, the impact would have been limited.

It’s “annoying” and “inconvenient”, but do it anyway.

Even if the story is apocryphal, it has elicited some lively discussion and reminds many of us of our apprenticeship in IT, as well as reminding us not to just type a command and hit Enter without thinking. If you are not sure of what the command does, don’t use it, and do a search if man doesn’t tell you.

Well, the story *as told here* is not apocryphal :-)

We wrote that M posted to a forum that he had rm -rfed everything, and then P, Q and R gave various honest but uncompromising answers upon reading his claim.

Whether M’s posting was true or not, it is what he posted, and the replies are indeed what came back. (I didn’t take a screenshot, but I did copy the replies from the posting itself while it was still live.)

I read somewhere a suggestion that this may have been a marketing stunt that was supposed to end with a big reveal that, of course, the problem didn’t really happen because the OP was using XYZ backup product, please buy now. If so, I guess the OP picked the wrong forum for a “joke” of that sort :-)

(FWIW, the Bash syntax in the original post was wrong, as well as the missing “--no-preserve-root” that would likely have been needed, assuming Linux. He mentioned “rm -rf {foo}/{bar}”, which really ought to have been “rm -rf $foo/$bar” or “rm -rf ${foo}/${bar}”. And, who knows, perhaps even “about 1535 customers” – where you and I would say “about 1500” or “exactly 1535” – was a hint of it being made up, too.)
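To illustrate the difference (the variable values here are invented for the example):

foo=user/16504; bar=retired-files
echo rm -rf {foo}/{bar}      # prints: rm -rf {foo}/{bar}   (the braces stay literal)
echo rm -rf ${foo}/${bar}    # prints: rm -rf user/16504/retired-files
echo rm -rf "$foo/$bar"      # same expansion, with safer quoting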

As you say, the lessons are the same even if this instance was just story-telling…

You should update your story. Turns out my suspicions were correct, and this guy was just a troll. The ServerFault question you linked to has been edited to indicate that.

I think that in British English we’d call it a “wind-up”. Trolling is something altogether more sinister, aimed at freaking someone out or offending them. (Making yourself look foolish doesn’t count.)

Interestingly, the ServerFault page was taken down. Now it’s been restored but there’s a link to say it was a hoax. (Which is a bit more innocent than trolling :-)

But interestingly a story has now broken that 123-Reg has accidentally deleted a tranche of web sites due to a “rogue script” and that they can’t restore them from backup…

Nice article Paul. You left out one thing: We can do it to ourselves, too. I have a 2008 iMac. I encrypted the hard drive. I backed up everything diligently. Everything is encrypted. [editor note: we know already!] You guessed it – the video card or power supply gave out, can’t boot up and now I’m using my back-up Windows laptop. I will have to beg a friend for a Mac to boot from my old hard drive, if it will work, [ed: It will cost an expensive meal or two] or buy a new iMac and boot from it, if that will work. Same cost as ransomware.

I keep a bootable OS X USB key handy in case I need to get stuff off an encrypted CoreStorage HFS+ volume. Not sure I’d want to get the SSD out of my current MacBook, though. I imagine it’s kind of cramped in there… and only a USB-C port for a wired connection :-)

Er…. You should know that this tale of woe was a hoax. Never happened. Apparently told by a hosting service.

Well how many unreported cases go by with people that have deleted large portions of company files and couldn’t restore them? It does happen by user & it does happen by I.T. It sometimes happens repeatedly and users have to be tied down access wise to an insane degree because they aren’t paying attention.

The larger point is to ensure that getting your stuff backed up whether at home or work is essential because you COULD be that person….

