It’s the time-saving technique employed by many coders in a hurry – copy and paste snippets of code from crowd-sourcing ‘Q&A’ websites and forums to solve tedious or difficult programming problems.
One of the most popular sites for this is Stack Overflow, and most of the time it works out fine.
But what if some of that code introduces bugs that might compromise the security of the software it ends up being used inside?
The tricky bit, as a new study called An Empirical Study of C++ Vulnerabilities in Crowd-Sourced Code Examples points out, is working out which code is OK and which isn’t.
After analysing real code from Stack Overflow, the researchers found a small but still significant number of examples where this happened over a 10-year period to 2018.
The team reviewed 72,483 C++ code snippets for weaknesses defined by the industry Common Weakness Enumeration (CWE) guidelines, finding 69 vulnerable snippets representing 29 different types of security flaw, most often CWE-150 (‘Improper Neutralization of Escape, Meta, or Control Sequences’).
This sounds like a small percentage, but those 69 vulnerable snippets found their way into a total of 2,859 projects on the Microsoft-owned software development platform, GitHub.
The idea that vulnerable code might be floating around on sites such as Stack Overflow is hardly a revelation, although this is apparently the first study that has looked closely at C++, a language that remains widely used for specialised programming tasks.
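To make the CWE-150 category mentioned above a little more concrete, here is a minimal C++ sketch of the kind of neutralization such a flaw omits. It is an illustration only, not code from the study: the function name `neutralize_control_chars` and the choice to render control bytes as a visible `\xNN` form are assumptions made for this example.

```cpp
#include <string>

// Hypothetical helper (not from the study): replace ASCII control characters,
// including ESC (0x1B), with a visible "\xNN" form so that untrusted text
// cannot smuggle terminal escape sequences into output or log files.
// Note: this also rewrites tabs and newlines; bytes >= 0x80 pass through.
std::string neutralize_control_chars(const std::string& input) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(input.size());
    for (unsigned char c : input) {
        if (c < 0x20 || c == 0x7F) {
            out += "\\x";            // literal backslash followed by 'x'
            out += hex[c >> 4];      // high nibble
            out += hex[c & 0x0F];    // low nibble
        } else {
            out += static_cast<char>(c);
        }
    }
    return out;
}
```

A snippet that prints user-supplied text straight to a terminal without a step like this is the sort of thing that may look harmless in a Q&A answer but can carry the weakness into every project that copies it.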
Bad snippets
One issue the researchers don’t address is whether Q&A code sharing is as good an idea as some assume it to be.
Most developers are unlikely to ditch the advantages of code sharing because of a few bad snippets, so the researchers’ answer is a new class of tools to assess the quality of shared code.
This should arrive soon in the form of a Chrome extension which can be used to check copied code against the team’s database of vulnerable code:
The extension then recommends non-vulnerable similar code snippets from other Stack Overflow posts, so that the developer can reuse those safe code snippets instead of the vulnerable code snippet.
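The idea of checking copied code against a database of known-bad snippets can be sketched roughly as follows. This is not the researchers’ implementation, which the article does not describe. The names `normalize_snippet` and `VulnerableSnippetIndex` are hypothetical, and the sketch assumes the simplest possible matching: exact comparison after collapsing whitespace.

```cpp
#include <cctype>
#include <string>
#include <unordered_set>

// Collapse every run of whitespace to a single space and trim both ends,
// so formatting differences do not hide an otherwise identical snippet.
std::string normalize_snippet(const std::string& code) {
    std::string out;
    bool pending_space = false;
    for (unsigned char c : code) {
        if (std::isspace(c)) {
            pending_space = !out.empty();  // ignore leading whitespace
        } else {
            if (pending_space) out += ' ';
            pending_space = false;
            out += static_cast<char>(c);
        }
    }
    return out;
}

// Hypothetical in-memory stand-in for a database of vulnerable snippets.
class VulnerableSnippetIndex {
public:
    void add(const std::string& code) {
        known_.insert(normalize_snippet(code));
    }
    bool is_flagged(const std::string& code) const {
        return known_.count(normalize_snippet(code)) > 0;
    }
private:
    std::unordered_set<std::string> known_;
};
```

Exact matching like this would only catch near-verbatim copies. A practical tool would presumably need some form of similarity matching to cope with renamed variables and small edits.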
Interestingly, when the researchers gave 117 of the affected GitHub project owners the bad news about their use of borrowed code, only 15 responded.
Of those who did, several either refused to fix the issue or offered excuses as to why a vulnerability might not be as risky as it appeared.
This suggests that for some coders, bad or insecure code is either too small a problem to be worth fussing about or an acceptable downside of meeting deadlines.
And once it’s inside software, it’s someone else’s problem.
Mark Stockley
I’d be fascinated to see a study that compared cutting and pasting code – which carries the small risk of copying and pasting security errors – with coders who don’t copy and paste code and solve unfamiliar problems by always writing their own solution.
Solving the problem yourself carries three possible penalties: #1 it will take longer; #2 there is a good chance that the code (which solves a problem the developer is not well versed in solving) will make it into the world without ever being reviewed by another person; and #3 if it does go unreviewed, there is a good chance the developer will never learn of their mistake and could well end up repeating it, even cutting and pasting it, into other projects.
Stack Overflow’s peer review system isn’t perfect but it works far, far better than no peer review, in my opinion. I suspect cutting and pasting errors is the lesser of two evils.
Anonymous
So, why not ban answering questions on StackOverflow for real languages (no VM, no GC), especially C/C++? And for any code in Rust, Swift or C# that contains the “unsafe” keyword? In those cases, the answer should either:
1. Be from one of the most experienced C/C++ developers and reviewed by 2 other experienced ones
2. Be reviewed by at least 3 of the most experienced C/C++ developers
And the answer should be invisible, even to whoever asked the question, until the review happens.
Of course, virtual languages (Java/Kotlin/Scala/Dart/JavaScript/and even Go because it has GC) shouldn’t have this restriction, BUT any unsafe code that may happen (mainly due to improper multithreading), should be vetoable by any experienced C/C++ developer.
Paul Ducklin
The fact that a language runtime has garbage collection helps to prevent memory management errors but doesn’t stop you writing poor code, processing data badly, misinterpreting input, handling security validation badly, producing flawed crypto, leaking secrets, enabling side-channel attacks and much, much more.
Bryan
Limiting answers to such an extent would greatly deflate the function and purpose of Stack, both from a “mission statement” perspective as well as a “business plan” one. People would not reference it as often, which would surely affect the advertising (which is always topical and has never bothered me on Stack).
There are a lot of highly knowledgeable contributors with altruistic intent. While they are surely prone to mistakes, they also serve as informal code review for one another. I’m with Mark–SE is imperfect but absolutely a net gain for the coding community.
Bryan
Someone downvoted you, Anonymous.
:,(
But this is a well-proposed question so I evened the odds. The only way problems are solved is asking questions.
:,)
Simon McAllister
Were any of the discovered weaknesses only known (or classed as) weaknesses after the code was published? As in, perhaps some that were discovered were not intended at the time of publishing, but were later found – much like bugs that get patched on most of the software we use today.
And am I right to think that anyone who discovers, or suspects, dodgy code buried in these platforms cannot flag it if the comments sections are closed?
Bryan
I’ve wondered about this each time I borrow from SE and try to be mindful of how sabotage or simple oversight could impact what I’m doing with my subpoenaed snippets.
Fortunately, they typically go into back-end shell scripts which (generally) carry fewer security implications with no public interface on single-login-user servers.
Generally.
:,(
Raylund
It depends on the coder how the code is handled. I think the proper way is always to review as much code as possible related to the subject matter, understand the reasoning behind it, and then make your own modifications.
It’s very dangerous to “copy-n-paste” without knowing what the code is actually doing. It is best to understand the thinking behind it too.
Paul Ducklin
+1
It’s dangerous to copy and paste almost anything – blind trust is the friend of fake news, internet hoaxes… and ill-considered code.