What a difference a cool name makes!
A security research paper entitled Putting control characters into one-dimensional barcodes to trip up sloppily coded apps probably wouldn’t grab your attention.
But BadBarcode would, so that’s what a Chinese security researcher who goes by the name Hyperchem Ma called his paper at the recent PacSec 2015 conference.
The paper has received a lot of publicity, including some dramatic headlines like “Poisoned barcodes can be used to take over systems” and “Customized barcodes can hack computers”.
So we thought we’d take a look at what BadBarcode really is, to help you decide how dangerous the problem is likely to be.
A one-dimensional study
Ma looked only at so-called one-dimensional, or 1D, barcodes, which are the ones you typically find on products on a supermarket shelf.
The barcode runs in a single line, printed from left to right, though thanks to the arrangement of the stripes, it can be read upside down.
Two widely used sorts of 1D barcode are Code 39 and Code 128.
The names are a curious mix of history and peculiarity.
Code 39 can represent 43 different characters these days, but it was originally limited to 40 symbols, with one reserved as a start/stop marker, and so the number 39 (40 minus 1) stuck in the name.
Code 128, curiously, has 108 symbols in all: 103 data symbols, 3 start symbols and 2 stop symbols. The start symbol picks one of three code sets (A, B or C), which decides what the data symbols stand for.
A few of the data symbols are themselves control symbols that switch code sets part-way through a barcode, so you can mix control symbols and data symbols freely. The control symbols act a bit like the Caps Lock key on a keyboard, toggling between different parts of the ASCII character set.
In short: Code 128 can represent all 128 characters in the 7-bit ASCII set, including characters like Ctrl-C, Ctrl-M (Carriage Return) and Ctrl-[ (Escape).
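To make the code-set switching concrete, here is a minimal Python sketch that turns 7-bit ASCII text into Code 128 symbol values (numbers, not bar patterns) using only Code Sets A and B. In Set A, values 0 to 63 stand for space through underscore and values 64 to 95 stand for the control characters; in Set B, values 0 to 95 stand for space through DEL. The check symbol is the start value plus each later value multiplied by its position, modulo 103. The function names are our own, and the sketch deliberately ignores Code Set C, the Shift symbol and the FNC symbols that a real encoder would also use.

```python
START_A, START_B, STOP = 103, 104, 106
CODE_B_IN_A = 100  # in Set A, symbol 100 means "switch to Set B"
CODE_A_IN_B = 101  # in Set B, symbol 101 means "switch to Set A"


def value_in_set_a(ch):
    """Code Set A value for ch, or None if Set A cannot encode it."""
    o = ord(ch)
    if 32 <= o <= 95:
        return o - 32   # space..underscore -> 0..63
    if o <= 31:
        return o + 64   # NUL..US, i.e. the Ctrl characters -> 64..95
    return None


def value_in_set_b(ch):
    """Code Set B value for ch, or None if Set B cannot encode it."""
    o = ord(ch)
    return o - 32 if 32 <= o <= 127 else None   # space..DEL -> 0..95


def checksum(values):
    # The start symbol counts once; each later symbol is weighted by its position.
    return (values[0] + sum(i * v for i, v in enumerate(values[1:], start=1))) % 103


def encode(text):
    if not text or any(ord(ch) > 127 for ch in text):
        raise ValueError("need non-empty 7-bit ASCII text")
    current = "B" if value_in_set_b(text[0]) is not None else "A"
    values = [START_B if current == "B" else START_A]
    for ch in text:
        # Switch code sets, like toggling Caps Lock, when the current set lacks ch.
        if current == "A" and value_in_set_a(ch) is None:
            values.append(CODE_B_IN_A)
            current = "B"
        elif current == "B" and value_in_set_b(ch) is None:
            values.append(CODE_A_IN_B)
            current = "A"
        values.append(value_in_set_a(ch) if current == "A" else value_in_set_b(ch))
    return values + [checksum(values), STOP]


def decode(values):
    if len(values) < 3 or values[-1] != STOP:
        raise ValueError("missing stop symbol")
    body = values[:-2]
    if values[-2] != checksum(body):
        raise ValueError("bad check symbol")
    sets = {START_A: "A", START_B: "B"}
    if body[0] not in sets:
        raise ValueError("unsupported start symbol")
    current, out = sets[body[0]], []
    for v in body[1:]:
        if current == "A" and v == CODE_B_IN_A:
            current = "B"
        elif current == "B" and v == CODE_A_IN_B:
            current = "A"
        elif v > 95:
            raise ValueError("special symbol not handled by this sketch")
        elif current == "A":
            out.append(chr(v + 32) if v < 64 else chr(v - 64))
        else:
            out.append(chr(v + 32))
    return "".join(out)
```

Encoding a string that starts with Ctrl-R (`"\x12"`) and contains Enter (`"\r"`) round-trips without complaint: nothing in the symbology itself stops control characters from reaching the decoder.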
The barcode “keyboard”
The reason this matters is that most barcode readers are implemented as plug-and-play keyboards, just like old-school credit card magstripe readers.
That way, you can read barcodes into your app simply by reading from the keyboard, as you would if the operator typed in the characters printed underneath the barcode.
Now, imagine that your app expects Code 39 barcodes: you might well assume that the input from the pseudo-keyboard barcode reader will only ever include A-Z, 0-9, space and the punctuation characters - . $ / + %.
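That assumption is easy to write down. The sketch below, with hypothetical names of our own, spells out the 43 data characters of standard Code 39 (the `*` start/stop marker is not data) and checks whether a scanned string could have come from such a barcode.

```python
import string

# The 43 data characters of standard Code 39.
CODE39_CHARSET = frozenset(string.ascii_uppercase + string.digits + " -.$/+%")


def looks_like_code39(data: str) -> bool:
    """True if every character could have come from a standard Code 39 barcode."""
    return bool(data) and all(ch in CODE39_CHARSET for ch in data)
```

Keep in mind that this describes the symbology, not the reader: a check like this tells you what a Code 39 barcode can contain, not what the scanner will actually send you.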
So, even if your app is written using a programming library that treats, say, Ctrl-O as a shortcut to open a file dialog or Ctrl-R to run a new program, you might assume that you don’t have to worry about those special characters turning up in a maliciously generated barcode.
Code 39 doesn’t support those characters, so they can’t show up.
So you might be inclined to trust the input from the barcode implicitly, for example when a user wants to scan an item at one of the price check stations that many supermarkets provide.
But if a crooked customer shows up with a Code 128 barcode that reads something like…
[Ctrl-R]CMD.EXE[Enter]DEL /Q /S C:\*.*[Enter]
…then many barcode readers will nevertheless recognise it as a valid barcode, choose the right decoding algorithm, and return the characters anyway.
As a result, your app might wander into trouble.
Validate your input
To work around that, you’re probably thinking that validating your input is a good idea, and you’d be right.
In other words, you accept a line of input from the barcode scanner but check through it first for anything out of place.
If you’re expecting digits only, for example, then when letters, punctuation or control characters appear, you can trigger an error and refuse the input, instead of going ahead with something definitely unexpected and potentially dangerous.
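Here is a minimal allow-list validator in Python, assuming a hypothetical policy that item codes are 8 to 14 ASCII digits. It strips only the single line terminator the reader appends, then requires the whole remaining line to match. Using `[0-9]` rather than `\d` keeps non-ASCII digits out, and `fullmatch()` avoids the way `$` can match just before a trailing newline.

```python
import re

# Hypothetical policy: an item code is 8 to 14 ASCII digits and nothing else.
ITEM_CODE = re.compile(r"[0-9]{8,14}")


class BadScan(ValueError):
    """Raised when a scanned line does not match the expected format."""


def validate_scan(line: str) -> str:
    # Remove exactly one terminator (CR, LF or CRLF), not arbitrary whitespace.
    if line.endswith("\r\n"):
        line = line[:-2]
    elif line.endswith(("\r", "\n")):
        line = line[:-1]
    if not ITEM_CODE.fullmatch(line):
        # repr() shows control characters as escapes instead of echoing them raw.
        raise BadScan(f"rejected scan: {line!r}")
    return line
```

Rejecting anything that doesn’t match, rather than trying to strip out the “bad” characters, means you never have to predict every dangerous input in advance.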
However, that might not be enough on its own, because the operating system itself – or at least what’s called the window manager – might detect and act on some special characters immediately, before your input validation algorithm is even called.
Window managers are needed when several apps share the keyboard and screen, to make sure that the right apps send and receive the right content, and to deal with special keystrokes that should be consumed directly by the window manager itself, such as Alt-TAB on Windows.
So if you want to protect your barcode-reading app from unusual, unexpected or even malicious “keystroke” data inside a barcode, you also need to familiarise yourself with the low-level programming functions that allow you to get the first look at every keystroke, even before the window manager gets its chance.
On Windows, for example, the function SetWindowsHookEx() is your friend.
With this function and the WH_KEYBOARD_LL hook type, you can instruct Windows to call a special procedure inside your app, known as a LowLevelKeyboardProc, giving you an early look at each keystroke before it reaches the window that would normally receive it. You can then process the keystroke, or swallow it so it goes no further by returning a non-zero value instead of passing it on with CallNextHookEx().
That way, you can improve the safety and security of programs that need to accept input from untrusted outsiders, yet are forced by the available hardware to consume that input as if the potential attacker were typing away at a keyboard.
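The hook itself has to be written against the Windows API, but the decision it makes can be sketched in portable Python. This model assumes, as one common configuration, that the reader sends a control character such as Ctrl-R as a Ctrl key-down, an R key-down and the matching key-ups. The class and method names are hypothetical; only the virtual-key values and the Windows names mentioned in the comments come from Windows.

```python
from dataclasses import dataclass, field

# Windows virtual-key codes.
VK_CONTROL, VK_MENU = 0x11, 0x12          # generic Ctrl and Alt
VK_LWIN, VK_RWIN = 0x5B, 0x5C             # Windows keys
VK_LCONTROL, VK_RCONTROL = 0xA2, 0xA3     # left/right Ctrl
VK_LMENU, VK_RMENU = 0xA4, 0xA5           # left/right Alt
VK_ESCAPE = 0x1B

MODIFIERS = {VK_CONTROL, VK_MENU, VK_LWIN, VK_RWIN,
             VK_LCONTROL, VK_RCONTROL, VK_LMENU, VK_RMENU}


@dataclass
class KeyEvent:
    vk: int      # virtual-key code, as in KBDLLHOOKSTRUCT.vkCode
    down: bool   # True for a key-down message, False for key-up


@dataclass
class ScanKeystrokeFilter:
    """Model of the decision a LowLevelKeyboardProc might make (hypothetical)."""
    held: set = field(default_factory=set)   # modifier keys currently down

    def should_block(self, ev: KeyEvent) -> bool:
        # True means "swallow this keystroke": a real hook would return a
        # non-zero value instead of passing the event on with CallNextHookEx().
        if ev.vk in MODIFIERS:
            if ev.down:
                self.held.add(ev.vk)
            else:
                self.held.discard(ev.vk)
            return True          # never pass modifier keys through
        if self.held:
            return True          # a key combined with Ctrl/Alt/Win is a shortcut
        return ev.vk == VK_ESCAPE
```

A low-level hook sees keystrokes from every keyboard attached to the computer, so a filter like this would also swallow the operator’s own shortcuts. Telling the scanner apart from a real keyboard needs something more, such as the Raw Input API, which reports which device each keystroke came from.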
By the way, there’s a whole slew of 2D barcodes as well, such as Data Matrix, PDF417 and – perhaps the best known sort – QR codes.
The 2D barcodes typically let you store much more data in the same space, so are increasingly widely used – and increasingly widely supported by barcode readers.
For all you know, your Code 39-based app, programmed to assume digits only, might some day be confronted by hundreds of bytes of data from a QR code, simply because you can’t control what an untrusted outsider might hold up to the reader.
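One way to catch this, assuming your reader supports it and has been configured to do so, is to have it prefix every scan with an AIM symbology identifier (ISO/IEC 15424): a `]`, a code character naming the symbology (for example `A` for Code 39, `C` for Code 128, `Q` for QR Code), and one modifier character. Check your reader’s manual before relying on this. The sketch below, with hypothetical function names, accepts only scans that the reader itself reports as Code 39.

```python
# A few AIM symbology code characters; real readers report many more.
SYMBOLOGY_NAMES = {"A": "Code 39", "C": "Code 128", "E": "EAN/UPC",
                   "Q": "QR Code", "d": "Data Matrix", "L": "PDF417"}


def split_symbology(line: str):
    """Return (symbology code character, payload) for an AIM-prefixed scan."""
    if len(line) < 3 or line[0] != "]":
        raise ValueError("scan has no symbology identifier")
    return line[1], line[3:]   # line[2] is the modifier character


def accept_only_code39(line: str) -> str:
    code, payload = split_symbology(line)
    if code != "A":
        name = SYMBOLOGY_NAMES.get(code, "unknown symbology")
        raise ValueError(f"expected Code 39, got {name}")
    return payload
```

Because the reader adds the prefix itself, a Code 128 barcode whose data happens to begin with `]A` should still arrive prefixed with `]C`, so the check can’t simply be spoofed from inside the barcode. You still need to validate the payload afterwards.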
What to do?
Briefly put:
- Always validate input before using it.
- Always understand how untrusted input might affect the underlying operating system before your app even sees it.
- Assume that specialised input devices (e.g. barcode scanners) can be made to behave like general-purpose ones (e.g. keyboards).
- Expect the unexpected.