Researchers at Kaspersky recently outed a bug in the popular Telegram instant messaging service.
Crooks had revived an old visual trick to disguise files that many users would otherwise recognise as unwanted right off the bat.
The flaw has been addressed by Telegram, so we’re OK to describe it here in detail: it’s a trick that is as simple as it is effective, and involves conning the app into displaying filenames backwards.
Sometimes, of course, the old tricks are the very best – ransomware first appeared in 1989, for instance; spam first showed up in the 1970s; and self-spreading network worms were already a significant problem in the 1980s.
Whether you’re a user or a programmer, it pays to be aware of the optical illusions that are available to the many cybercrooks out there.
The flaw we’ll be talking about in this article – which sort of isn’t a bug in theory, but can be abused as a bug in practice – comes about because not all languages write in the same direction.
English and French, for example, run left-to-right, top-to-bottom; Hebrew and Arabic run right-to-left, top-to-bottom.
Often, for example when printing a book, the text direction isn’t too much of a challenge because it’s consistent throughout.
But in a modern app in the modern world on a modern operating system, you often want to mix and match character sets, languages, writing styles and more.
For example, if I type the English word HELLO into my favourite Mac editor, followed by the Hebrew word SHALOM, I type the characters in the order they would be written out left-to-right, English style: H + E + L + L + O + Shin + Lamed + Vav + Mem.
Indeed, if I save the file as a 2-byte-per-character Unicode file (UTF-16), I get the characters in the order they were typed, shown left-to-right as is conventional in a hex editor.
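To make the storage order concrete, here is a small Python sketch that encodes the same nine characters as UTF-16 and prints the bytes. We've assumed little-endian UTF-16 without a byte-order mark, and the final form of Mem (U+05DD) for the last Hebrew letter; an editor may well add a byte-order mark or use big-endian order, which swaps the two bytes within each pair but doesn't change the order of the characters.

```python
# Sketch: show the storage (logical) order of "HELLO" + Hebrew "shalom" in UTF-16.
# The Hebrew letters are written as escapes so this source isn't itself
# reordered on screen by the bidi algorithm:
#   U+05E9 SHIN, U+05DC LAMED, U+05D5 VAV, U+05DD FINAL MEM (final form assumed)
text = "HELLO" + "\u05e9\u05dc\u05d5\u05dd"

# Little-endian UTF-16 with no byte-order mark (an assumption; editors vary).
raw = text.encode("utf-16-le")

# Two bytes per character, in exactly the order the characters were typed.
print(raw.hex(" "))
# 48 00 45 00 4c 00 4c 00 4f 00 e9 05 dc 05 d5 05 dd 05
```

The Hebrew bytes come after the English ones, Shin first, even though Shin is the letter drawn at the right-hand end of the word on screen.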
But my editor “knows” (or, more precisely, the Unicode character set “knows”) that English and Hebrew are supposed to be displayed in different directions, so when I open the file I see HELLO followed by SHALOM, with the Hebrew letters laid out right-to-left.
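Much of what the editor “knows” is a per-character property: every Unicode code point has a bidirectional class, which the Unicode Bidirectional Algorithm uses to work out display order. Python's standard `unicodedata` module exposes that property, so a quick sketch using the same nine characters (Final Mem assumed for the last letter) shows the Latin letters classed `L` (left-to-right) and the Hebrew letters classed `R` (right-to-left).

```python
import unicodedata

# "HELLO" followed by Hebrew shin, lamed, vav, final mem (escapes avoid on-screen reordering).
text = "HELLO" + "\u05e9\u05dc\u05d5\u05dd"

# Print each code point, its bidirectional class and its Unicode name.
for ch in text:
    print(f"U+{ord(ch):04X}  {unicodedata.bidirectional(ch):<2}  {unicodedata.name(ch)}")

# Collect just the bidi classes: five L (Latin) then four R (Hebrew).
classes = [unicodedata.bidirectional(ch) for ch in text]
```

The renderer never needs to be told which words are Hebrew; the classes alone tell it which runs to lay out right-to-left.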
Sometimes, however, leaving text direction to be determined algorithmically doesn’t give you the typographic result you were after, so Unicode provides some special characters for text direction (ones that don’t actually display, they merely control), including LEFT-TO-RIGHT OVERRIDE (LRO) and RIGHT-TO-LEFT OVERRIDE (RLO).
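The two overrides are ordinary code points: LRO is U+202D and RLO is U+202E. A brief check with Python's standard `unicodedata` module shows that both are format characters (general category `Cf`), which is why they have no glyph of their own, and that each carries its own dedicated bidirectional class.

```python
import unicodedata

# The two directional overrides discussed in the text.
LRO = "\u202d"  # LEFT-TO-RIGHT OVERRIDE
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE

for ch in (LRO, RLO):
    # Unicode name, general category ("Cf" = invisible format control) and bidi class.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch),
          unicodedata.category(ch), unicodedata.bidirectional(ch))
# U+202D LEFT-TO-RIGHT OVERRIDE Cf LRO
# U+202E RIGHT-TO-LEFT OVERRIDE Cf RLO
```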
If we insert these at the start of our HELLO SHALOM greeting, the LRO forces the Hebrew to come out backwards (this messes up the nikkud, or diacritical markings, so it doesn’t render simply as if the text were reversed), while the RLO will reverse the English text, moving it to what is the end of the line in English orthography.
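A real renderer implements the full Unicode Bidirectional Algorithm (UAX #9), but for the simple case of a single override that runs to the end of the text, the effect can be approximated in a few lines. The `toy_display` function below is our own deliberately simplified model, not any library's API: text before the override is assumed to be left-to-right already, an LRO leaves the rest in typed order, an RLO reverses it, and the control character itself is dropped because it has no glyph. It ignores much that a real implementation handles, such as bracket mirroring, nikkud, and POP DIRECTIONAL FORMATTING (U+202C) characters that end an override early.

```python
LRO, RLO = "\u202d", "\u202e"  # LEFT-TO-RIGHT OVERRIDE, RIGHT-TO-LEFT OVERRIDE

def toy_display(logical: str) -> str:
    """Roughly approximate the left-to-right on-screen order of `logical`.

    Simplified model (our assumption, not UAX #9): the string holds at most one
    override, it lasts to the end of the string, and any text before it is
    already left-to-right.
    """
    for control in (LRO, RLO):
        pos = logical.find(control)
        if pos != -1:
            before, after = logical[:pos], logical[pos + 1:]  # the control has no glyph
            # RLO forces every following character right-to-left, so the run is reversed;
            # LRO forces them left-to-right, so they stay in typed order.
            return before + (after[::-1] if control == RLO else after)
    # No override: this toy returns the text as typed, whereas a real
    # renderer would still lay out any Hebrew right-to-left.
    return logical

greeting = "HELLO" + "\u05e9\u05dc\u05d5\u05dd"  # HELLO + shin, lamed, vav, final mem
forced_ltr = toy_display(LRO + greeting)  # Hebrew now drawn left-to-right, i.e. backwards
forced_rtl = toy_display(RLO + greeting)  # whole line reversed: Hebrew first, then OLLEH

# Print as escapes so the terminal doesn't apply its own bidi reordering.
print(forced_ltr.encode("unicode_escape").decode())  # HELLO\u05e9\u05dc\u05d5\u05dd
print(forced_rtl.encode("unicode_escape").decode())  # \u05dd\u05d5\u05dc\u05e9OLLEH
```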
You can see where this is going when it comes to filenames.
An RLO character is simply part of the name of the file when the operating system decides what to do with the file, but it is part of the instructions about how to display the name when the operating system decides how to represent it.
Sadly, as far as computer security is concerned, what actually happens when you click on the file is what matters, not what it looks like in a directory listing.
Software that displays filenames in order to ask you what to do with them needs to take care to prevent this difference from being exploited.
In the case documented by Kaspersky, the crooks used a filename that was processed by Windows as…
photo_hi_re♦gnp.js
…which is, as you can tell from the .JS extension, a standalone JavaScript program. (We used the diamond shape to denote the position of the RLO character U+202E.)
Detached from your browser, JavaScript programs are essentially as unconstrained as those written in C, C++, C#, Assembler or any other traditional application software development language.
In other words, JavaScript programs you receive via email or IM, or download from the web, are often malware – anything from self-contained attacks like the RAA ransomware, written in 100% JavaScript, to downloader modules that go online and fetch yet more malware to take over your computer.
But the file was displayed for download and launch as…
photo_hi_resj.png
…which looks like an innocent image file. (The RLO character doesn’t take up space in the output, because it’s a control code, not a display glyph.)
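The gap between what the file is and what it looks like is easy to demonstrate. In this Python sketch, `os.path.splitext` stands in for the operating system deciding what to do with the file (a simplification: Windows' real file-association logic is more involved), while `displayed`, a hypothetical cut-down rendering function that assumes a single RLO running to the end of the name, stands in for the download dialog.

```python
import os

RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE, shown as a diamond in the article

def displayed(name: str) -> str:
    """Toy rendering (assumption): text after a single RLO is shown reversed; the RLO is invisible."""
    before, sep, after = name.partition(RLO)
    return before + after[::-1] if sep else name

real_name = "photo_hi_re" + RLO + "gnp.js"

print("processed as:", real_name.encode("unicode_escape").decode())  # photo_hi_re\u202egnp.js
print("real extension:", os.path.splitext(real_name)[1])              # .js
print("shown as:", displayed(real_name))                              # photo_hi_resj.png
print("apparent extension:", os.path.splitext(displayed(real_name))[1])  # .png
```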
There are many other combinations that could be used to disguise the true identity of the download, such as -1SP.MP4, which reverses to -4PM.PS1 (a PowerShell file), and -TAB.JPG, which reverses to -GPJ.BAT (another sort of Windows script).
Even a rather obvious executable file (a regular program) can be made to look innocent if the entire filename is reversed, pushing the suspicious ending .EXE to the front, where it looks like a prefix:
Filename for opening and launching: ♦fdp.61-10-8102-NOISICED.DRAOB.EVITUC.EXE
Filename when rendered for display: EXE.CUTIVE.BOARD.DECISION-2018-01-16.pdf
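Working backwards shows how little effort the whole-name version takes. Assuming a single RLO at the very start of the name and a file listing that lays names out left-to-right, the stored name is just an RLO followed by the desired on-screen text written backwards. Even the digits in the date come out reversed, because an override forces every character after it, digits included, to be treated as right-to-left. This Python sketch checks the example above; `disguise` is a hypothetical helper name.

```python
import os

RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE

def disguise(shown: str) -> str:
    """Hypothetical helper: the stored name that a single leading RLO renders as `shown`.

    Under RLO every following character, digits included, is laid out right-to-left,
    so the stored text is simply the desired on-screen text reversed.
    """
    return RLO + shown[::-1]

shown = "EXE.CUTIVE.BOARD.DECISION-2018-01-16.pdf"
stored = disguise(shown)

print(stored[1:])                   # fdp.61-10-8102-NOISICED.DRAOB.EVITUC.EXE (RLO not printed)
print(os.path.splitext(stored)[1])  # .EXE, the extension Windows would act on
```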
What to do?
Technically, you can argue that this sort of trick isn’t really a bug or an exploit: the RLO character is supposed to flip around English text, so the filename ♦TAB.BET is supposed to look like TEB.BAT, and that’s that.
With this in mind, our advice is:
- Don’t use what you see as your sole judge of what you’ve got.
In the Telegram malware attack documented by Kaspersky, the pseudo-PNG file provoked a JavaScript security warning dialog from Windows, which processed the file by what it was, not what it seemed.
Assume the worst: if the name looks dodgy but the operating system doesn’t seem to mind, assume it’s dodgy.
If the name looks fine but the operating system says it might be dodgy, assume it’s dodgy.
- If you’re a sysadmin, block by both form and function.
For example, the Sophos Email Appliance lets you block attachments by extension (extracted from the name), by true file type (determined by looking inside the file), or both – and we recommend using both, even if it sounds redundant.
After all, if you want to keep out (say) .MP3 audio files, it makes sense to block files that go out of their way to look like audio files, even if they aren’t, as well as to block files that go out of their way to look like they aren’t audio files, even when they are.
- If you’re a programmer, be aware of known display-related tricks.
RLO characters are permitted in filenames, but in real life, this is a detail that both you and your users can do without.
A similar sort of problem exists with character sets (not just character direction), where letters in Russian or Greek can be used to trick you into thinking you’re looking at, say, facebook when you’re really seeing f𝝰cebook or facεbook.
There’s no law against writing English to the left, Hebrew to the right, or representing Greek words with Cyrillic letters – but if there are no good reasons for doing it, why let it through?
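The sysadmin and programmer advice above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not how the Sophos Email Appliance or any other product works: `check_attachment` flags a filename containing any Unicode bidi control character, blocks by the extension taken from the real (stored) name, and blocks by a crude “true type” guessed from a handful of well-known leading bytes. The signature table, the function names and the block lists are all hypothetical, and script files such as .JS have no reliable signature at all, so a real filter needs far more than this.

```python
from __future__ import annotations

import os

# Unicode bidirectional formatting controls: marks, embeddings, overrides, isolates.
# Treating every one of them as suspicious in a filename is our own policy choice.
BIDI_CONTROLS = {
    "\u061c",                                          # ALM (ARABIC LETTER MARK)
    "\u200e", "\u200f",                                # LRM, RLM
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # LRE, RLE, PDF, LRO, RLO
    "\u2066", "\u2067", "\u2068", "\u2069",            # LRI, RLI, FSI, PDI
}

# Hypothetical, deliberately tiny table of leading "magic" bytes per file type.
SIGNATURES = {
    ".png": [b"\x89PNG\r\n\x1a\n"],
    ".jpg": [b"\xff\xd8\xff"],
    ".mp3": [b"ID3", b"\xff\xfb"],
    ".exe": [b"MZ"],
}

def sniff_type(data: bytes) -> str | None:
    """Guess a file type from its first bytes; None if no signature matches."""
    for ext, magics in SIGNATURES.items():
        if any(data.startswith(m) for m in magics):
            return ext
    return None

def check_attachment(name: str, data: bytes, blocked: set[str]) -> list[str]:
    """Return the reasons (possibly none) for rejecting an attachment."""
    reasons = []
    if any(ch in BIDI_CONTROLS for ch in name):
        reasons.append("filename contains a bidi control character")
    ext = os.path.splitext(name)[1].lower()  # form: extension from the stored name
    if ext in blocked:
        reasons.append(f"blocked by extension {ext}")
    true_type = sniff_type(data)             # function: type implied by the content
    if true_type in blocked:
        reasons.append(f"blocked by true file type {true_type}")
    return reasons

print(check_attachment("photo_hi_re\u202egnp.js", b"var x = 1;", {".js", ".mp3"}))
# ['filename contains a bidi control character', 'blocked by extension .js']
print(check_attachment("notes.txt", b"ID3\x04\x00", {".mp3"}))
# ['blocked by true file type .mp3']
```

Checking both form and function catches the two cases described above: a file that merely claims to be an .MP3, and a genuine .MP3 hiding behind a harmless-looking name.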