Archive for February, 2008
Whole-Disk Encryption Cracked
Early this week, some researchers at Princeton University’s Center for Information Technology Policy released a fascinating video of whole-disk encryption being cracked quite quickly and easily.
Whole-disk encryption products — such as PGP Whole Disk Encryption, TrueCrypt System Encryption, and Windows Vista’s BitLocker — work by encrypting the entire hard disk with a symmetric key, save for a small loader. When the computer is powered on, the loader prompts the user for a password or other authenticator (like a smart card or a certificate on a USB keyfob), which is used to decrypt the key. Assuming the correct authenticator is provided, the key is decrypted and then the OS is booted from the encrypted drive. The key remains in memory until the machine is powered off, since continuous access to the key is required to access the drive.
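To make the password-to-key step concrete, here is a minimal Python sketch of wrapping and unwrapping a disk key. It is a toy under stated assumptions, not how PGP, TrueCrypt, or BitLocker actually store their keys: the function names, the PBKDF2 iteration count, and the XOR-based wrapping are all illustrative choices (a real product would use a proper key-wrapping cipher).

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor, chosen only for illustration


def _derive(password: str, salt: bytes) -> tuple:
    """Stretch the password into a key-encryption key and a MAC key."""
    material = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), salt, ITERATIONS, dklen=64
    )
    return material[:32], material[32:]


def wrap_volume_key(password: str, volume_key: bytes) -> tuple:
    """Return (salt, wrapped_key, tag): what the loader would keep on disk."""
    if len(volume_key) != 32:
        raise ValueError("this sketch handles 32-byte volume keys only")
    salt = os.urandom(16)
    kek, mac_key = _derive(password, salt)
    # Toy wrapping: XOR with the derived key (hypothetical, not a real format).
    wrapped = bytes(a ^ b for a, b in zip(volume_key, kek))
    tag = hmac.new(mac_key, wrapped, hashlib.sha256).digest()
    return salt, wrapped, tag


def unwrap_volume_key(password: str, salt: bytes, wrapped: bytes, tag: bytes) -> bytes:
    """Recover the volume key, or fail if the password is wrong."""
    kek, mac_key = _derive(password, salt)
    expected = hmac.new(mac_key, wrapped, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        raise ValueError("wrong password or corrupted header")
    return bytes(a ^ b for a, b in zip(wrapped, kek))


volume_key = os.urandom(32)
header = wrap_volume_key("correct horse", volume_key)
recovered = unwrap_volume_key("correct horse", *header)
```

Once unwrapped, the volume key has to sit in RAM for as long as the drive is mounted, which is exactly the property the attack described below exploits.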
The purpose of whole-disk encryption is to protect against an attacker bypassing all of the operating system’s defenses (logins & passwords, filesystem ACLs, etc.) by simply pulling out the hard disk and putting it in another computer (or, equivalently, booting up a LiveCD on the system) such that the operating system isn’t loaded at all. Instead, the drive is mounted into an OS the attacker controls, where he has the ability to change ACLs, bypass logins, etc. With whole-disk encryption, you can’t do this — even if you steal a laptop, without the boot password the entire drive contains nothing but a useless encrypted bitstream.
(As a side note, Vista BitLocker has a mode in which the symmetric key is stored in the TPM of the laptop, so no boot password is required. At first this seems useless — why encrypt if decryption is automatic? — but it does provide protection against simply stealing the hard disk or booting into another OS. The OS being booted must be in that specific computer, as only it has the TPM, and must be BitLocker-aware and capable of getting the key from the TPM. It’s not completely secure in the stolen-laptop scenario, but neither is it useless.)
The Princeton group’s attack on whole-disk encryption relies on a little-known fact — computer memory (DRAM) is not wiped out when the system is powered off. Rather, it becomes unreliable, decaying over a period of seconds to minutes as it gets randomized bit by bit. It turns out that if cooled to a very low temperature, this decay is slowed considerably, to the point of being stable for tens of minutes. Thus, the attack is as follows: get access to a laptop that is currently operating (so that the whole-disk encryption key is in memory), spray the RAM with an inverted compressed air can to cool it to -50 degrees Celsius, and power the system off. Either move the RAM to a system with a custom OS, or attach an external drive to the system and boot off that. The custom OS boots with a minimal memory footprint and then copies everything from RAM to a file on disk. Thus, in less than a minute a “snapshot” of RAM has been taken. This snapshot can then be inspected to locate prospective cryptographic keys and try them on the target drive. Some knowledge of the particular whole-disk encryption product being used would be needed to find the exact spot in memory where the key is, and some error-correction techniques must be used in case a bit or two has been flipped, but it reduces the problem from cryptographically impossible to something that can be cracked in a few minutes or at worst hours.
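The "locate prospective keys" step can be sketched in Python. This is a simplified illustration, not the Princeton group's actual tool: it assumes the product uses AES-128 and keeps the full 176-byte expanded key schedule contiguous in memory, and it counts how many bits in each window violate the key-schedule relations as a stand-in for their error-correction technique. All function names and the error threshold are hypothetical.

```python
import random


def _gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) using the AES reduction polynomial."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = (a << 1) ^ (0x11B if a & 0x80 else 0)
        b >>= 1
    return r


def _rotl8(x: int, n: int) -> int:
    return ((x << n) | (x >> (8 - n))) & 0xFF


def _build_sbox() -> list:
    """Derive the AES S-box: field inverse followed by the affine transform."""
    sbox = []
    for a in range(256):
        inv = 0 if a == 0 else next(x for x in range(1, 256) if _gf_mul(a, x) == 1)
        sbox.append(inv ^ _rotl8(inv, 1) ^ _rotl8(inv, 2)
                    ^ _rotl8(inv, 3) ^ _rotl8(inv, 4) ^ 0x63)
    return sbox


SBOX = _build_sbox()


def _g(word: bytes, rcon: int) -> bytes:
    """RotWord + SubWord + round constant, applied to every fourth word."""
    return bytes([SBOX[word[1]] ^ rcon, SBOX[word[2]], SBOX[word[3]], SBOX[word[0]]])


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def expand_key(key: bytes) -> bytes:
    """Standard AES-128 key expansion: 16-byte key -> 176-byte schedule."""
    w = [key[i:i + 4] for i in range(0, 16, 4)]
    rcon = 1
    for i in range(4, 44):
        t = w[i - 1]
        if i % 4 == 0:
            t, rcon = _g(t, rcon), _gf_mul(rcon, 2)
        w.append(_xor(w[i - 4], t))
    return b"".join(w)


def schedule_error_bits(window: bytes) -> int:
    """Count bits in a 176-byte window that break the key-schedule relations."""
    w = [window[i:i + 4] for i in range(0, 176, 4)]
    errors, rcon = 0, 1
    for i in range(4, 44):
        t = w[i - 1]
        if i % 4 == 0:
            t, rcon = _g(t, rcon), _gf_mul(rcon, 2)
        diff = _xor(_xor(w[i - 4], t), w[i])  # predicted word vs. observed word
        errors += sum(bin(b).count("1") for b in diff)
    return errors


def find_aes128_schedules(dump: bytes, max_error_bits: int = 16) -> list:
    """Return (offset, candidate_key) pairs that look like decayed schedules."""
    return [(off, dump[off:off + 16])
            for off in range(len(dump) - 175)
            if schedule_error_bits(dump[off:off + 176]) <= max_error_bits]


# Demo: hide a schedule with two decayed bits inside random "memory".
rng = random.Random(2008)
key = bytes(rng.randrange(256) for _ in range(16))
schedule = bytearray(expand_key(key))
schedule[20] ^= 0x01
schedule[100] ^= 0x40
noise = lambda n: bytes(rng.randrange(256) for _ in range(n))
dump = noise(512) + bytes(schedule) + noise(512)
hits = find_aes128_schedules(dump)
```

Because each check uses only the observed words, a single flipped bit disturbs just a few relations instead of invalidating the whole window. If the flipped bits land inside the first 16 bytes, the candidate key itself still needs repairing, which this sketch does not attempt.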
So is this the end of whole-disk encryption? No, not at all. First of all, whole-disk encryption still successfully protects computers that are powered off (or in hibernation) — in that state, the computer does not have a copy of the encryption key available to it until the user re-enters his password. In most stolen-laptop scenarios, the computer isn’t running at the time! Whole-disk encryption is still a critical mitigation in the case of portable computers containing confidential data, and enterprises and government agencies would do well to implement it. Of course, the best mitigation for this is to not carry confidential data around on your laptop. It always strikes me as absurd when some government employee loses millions of confidential records on a stolen laptop — why did they need to have millions of records to carry around with them? Do they really need all of those on-the-go? It’s possible that in a minority of cases they do, and in those cases encryption is imperative (either of the whole-disk variety or on the file), but in most cases they’d have been better off leaving those files at the office.
Second, this is only a concern in targeted attacks. If a typical thief rips off your laptop and discovers whole-disk encryption in place, they’re not going to execute this attack and get at your data. Instead, they’ll just reformat the hard drive and sell the laptop as hardware. The only reason someone would carry out this attack is if they knew that your laptop in particular contained valuable data and thus set out to steal it specifically. In other words, if you’re a spy, and your laptop is classified TOP SECRET UMBRA, you have to worry about this attack. If you have a typical corporate desktop and aren’t widely known to carry around your company’s entire credit card database, whole-disk encryption will probably protect you just fine.
There are several things that can be done, both by end-users and whole-disk encryption vendors, to mitigate this attack. For end-users:
- If using Vista BitLocker, do not use the automatic mode — choose a mode that requires the use of a USB keyfob or a password to unlock. This makes this attack ineffective when the system is entirely powered off.
- Do not use sleep/suspend-to-RAM when the computer is not actually in your hands — either power off or use hibernate. In a sleep or suspend-to-RAM scenario, the whole-disk encryption key is still maintained in memory and can be recovered.
- If you have a few truly critical files, use file encryption (such as Windows’s Encrypting File System or PGP’s file encryption) on those files with a different password than that used on the whole-disk encryption.
For makers of whole-disk encryption software:
- Provide an option to re-encrypt the symmetric key during sleep or screen-saver activity. This would mean that the laptop would need to be stolen during a truly active state; however, it would also inconvenience the user with more frequent password prompts.
- Consider the cryptographic key expansion mitigation described in the Princeton research paper. It vastly increases the chance that even a small amount of memory decay renders the key unrecoverable. Of course, it does so at the cost of performance (by requiring an additional hashing and XOR operation every time the key must be used).
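The "hashing and XOR" idea in the last item can be sketched as follows. This is my reading of the general technique, not code from the paper: the class name, the pad size, and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
import os


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


class MaskedKey:
    """Keep a key in memory only as key XOR hash(large random pad)."""

    def __init__(self, key: bytes, pad_size: int = 4096):
        if len(key) != hashlib.sha256().digest_size:
            raise ValueError("this sketch handles 32-byte keys only")
        # The pad is deliberately large: more bits that must all survive decay.
        self.pad = bytearray(os.urandom(pad_size))
        self.masked = _xor(key, hashlib.sha256(self.pad).digest())

    def recover(self) -> bytes:
        """Re-hash the pad and XOR; this is the per-use cost."""
        return _xor(self.masked, hashlib.sha256(self.pad).digest())


key = os.urandom(32)
protected = MaskedKey(key)
intact = protected.recover()

# Simulate a single decayed bit somewhere in the pad.
protected.pad[123] ^= 0x01
after_decay = protected.recover()
```

With the pad intact, `recover()` returns the original key. After one bit of the pad flips, the hash changes completely, so the recovered value is unrelated garbage and an attacker would have to reconstruct every bit of the pad first.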
Deterring the Internal Attacker
On January 21st, 2008, the major French bank Société Générale lost $7.09 billion attempting to unwind unauthorized trading positions taken by Jérôme Kerviel, a futures trader with the bank. Kerviel had taken positions worth $73.3 billion, far above not only his trading limits but the bank’s entire market capitalization. The loss taken by unwinding the positions during a declining stock market was the largest rogue trader loss in history, dwarfing the $1.4 billion loss by Nick Leeson that collapsed the venerable Barings Bank in 1995.
For all that we in the security industry picture threats coming at our companies from without, sometimes the greatest threats lie within. No external hacker has ever done the kind of damage that rogue insiders like Kerviel and Leeson are capable of, yet we focus on putting firewalls around our companies, rooting out worms and viruses, and securing our websites. While these are undoubtedly important, it is equally important to protect against internal adversaries — and often much more difficult.
The Problem of Trust
Companies must trust their employees — without the employees, there is no company. Accountants and traders are trusted with financial records, system administrators and information security personnel are trusted with access to critical files, physical and cleaning personnel are trusted with physical access to the facilities, and managers are trusted with company secrets, strategy, and intentions.
IT employees and developers are specialists. As systems increase in complexity, those trusted with building and maintaining those systems are required to obtain knowledge further and further from most people’s understanding. Often, knowledge of how to build and maintain these systems also involves the knowledge of how to subvert them. IT engineers and developers know how their systems break down — they know their weak points, where they’re being watched and monitored, and where no one is looking. This problem isn’t unique to information technology — an aircraft mechanic probably knows how to sabotage a plane without leaving a trace, and members of police and military bomb squads are experts on explosives and what cannot be detected or tracked. And as recent news has demonstrated, traders in brokerages and banks know how the internal controls of their corporations work, and where they break down. Internal attackers are thus the most dangerous of all — they are already equipped with the kind of domain knowledge that an external attacker might need to spend weeks or months gathering.
Although we cannot entirely abandon trust in a company’s employees, we should consider where this trust comes from and whether or not it is warranted. Many companies sharply divide the level of trust and privilege given to employees vs. that given to contractors and vendors within their IT and development departments. The theory is that employees are allied to the company for the long term, and compensated with long-term benefits like retirement plans and vacation time that they will be unwilling to risk for short-term gain while vendors and contractors have less loyalty since they come and go as needed. However, in today’s IT world, is this really the case? I do not doubt that the contractors feel little loyalty for the company, but it is increasingly doubtful that the employees do, either. The average IT employee’s tenure at a corporation is now under 18 months — and thus they place little value on long-term benefits. Books like Corporate Confidential advise employees to view their employment relationship as, if not outright adversarial, at least mutually exploitative, to be dropped by either party as soon as it becomes in their interest to do so. Employees see that corporations no longer feel loyalty to them — the days of the job for life are over — and so loyalty to the corporation has gone as well.
Of course, lacking a strong sense of corporate loyalty does not lead most employees to embark on rogue-trading schemes, steal from their employers, commit electronic sabotage, etc. And even in the 1950s heyday of the organization man and the corporate family, some people took advantage of their employers and ran off with stolen fortunes. Some people are thieves and will steal given the opportunity no matter how well-treated they may be. Others are incorruptible, bound by their own moral code that would prevent them from stealing regardless of opportunity. The vast bulk of humanity, though, is somewhere in between.
These employees are not likely to become attackers, and trusting them is a necessary part of doing business. However, this trust need not be absolute — we can trust, but verify. While we may not be able to prevent every internal attack, we can deter them, and make them less likely to occur. Steps can be taken to help keep most people honest, reducing both the incentive and the opportunity for theft.
Building Employee Loyalty
The days of the job for life and absolute loyalty to the corporation are probably over for good, inasmuch as they ever existed at all. However, the fact remains that internal attacks, particularly those not motivated by theft but rather simple vandalism, are much more likely to be carried out by disgruntled and angry employees than by content ones.
IT employees and developers are sometimes a strange breed — the sort of person that chooses to spend their time with technology is often different from the sort of person who chooses to be a manager. So if it’s not a good retirement plan, an increase in vacation time after 5 years, and a promise of stability and long-term employment, what does build loyalty and goodwill with technical employees?
(Of course, any generalization about a type of person is going to be more accurate for some people than others, but I’ve found these to be useful rules of thumb for dealing with technology employees.)
- Autonomy. Figuring out how to do things is precisely what geeks enjoy about work — tell them what you want, not how you want them to achieve it.
- Isolation. Technologists, all told, are not very social. They do their best work when left alone. The cubicle is a horrible environment, giving you all the obstacles to collaboration of offices but without any of the privacy. Offices are ideal for most development work, the only exception being the early stages wherein there’s a lot more collaboration and brainstorming than actual coding.
- Technology. People who love technology want to be on the cutting edge. Using current technology makes them more invested in their jobs. In addition, it’s worth investing in the technology they’ll use every day — their workstations, displays, etc.
These are important, but will not, of course, make every employee perfectly happy. There are some things that technical employees have no patience for at all:
- Arbitrary or emotionally-driven decisions. Using an older, inferior, or simply less appropriate tool (e.g. programming language, web framework) because “the boss likes it,” “we’ve always done it that way,” or “we’re a [insert product here] shop” really upsets them. They need real-world reasons for using a technology, like technical benefits or budgetary limitations.
- Anything perceived as unfair. If employees feel they’re paid less than market value, or that someone else who’s not as good as they are is paid more, this breeds resentment. Trying to keep salaries secret helps not at all — today’s employees, especially younger ones, for the most part don’t understand why salaries should be kept secret, and thus will totally disregard any order to do so. It is important that technical employees see cause and effect in review processes, compensation, etc. and have a clear idea of how their performance is being assessed.
- Internal politics. Engineers want to get things done, and they don’t care overmuch who does them — when engineering solutions, they’ll totally ignore interdepartmental boundaries. Having to worry about some manager’s fiefdom-building gets in the way of technical work and is relatively incomprehensible to them.
When it comes to performance management, technical employees need to be told, directly and clearly, how they’re doing and what needs improvement (if anything). Not being people-oriented, they often can’t read you. They don’t know if you’re happy with them or not unless you tell them, and they’re certainly not going to ask. While they deal extremely well with technical ambiguity (they love to solve problems, so an incoherent mess from a technical perspective is just a challenge to overcome), they don’t deal well at all with ambiguity in other contexts. Clear expectations and consistent feedback make their job simply another problem to be solved, which makes it much more satisfying to them. Without this feedback,
For many managers, these may seem like obvious guidelines — but they’re often problems in companies, particularly in IT and development departments of nontechnical companies. These factors mean a lot to many technical employees — often a lot more than traditional compensation. The best prevention against malicious insiders is to keep the insiders from becoming malicious in the first place by ensuring that the company earns their trust and respect.
Reducing Opportunity for Attack
Unfortunately, no matter what your company does, some people aren’t going to love their jobs. In addition, presented with the opportunity to steal, people are going to be tempted — and the greater the opportunity, the greater the temptation. Thus, it is important to reduce the opportunity for theft.
The traditional information security controls are often useless against insiders. The firewall provides no protection at all against someone already inside. Anti-virus and anti-malware systems matter not at all to someone who doesn’t need to gain access to a PC on the network, as they already have access legitimately. Network access controls are impotent against the domain administrator, who has the authority to alter access control lists at will. Obfuscation and hiding secret data provides no defense against the developer tasked with performing the obfuscation and hiding.
Fundamentally, a system designed to provide security always involves an implied question — secure from what? The vault door in a bank secures against burglars coming in in the night — not against the bank manager turning rogue. Alarms secure against armed robbers, not against tellers sneaking cash out of the drawer. Security cameras watch the tellers, but do no good against computer hackers or fraudsters. Reducing the opportunity for insiders to attack the company means considering how insiders differ from outsiders, and what security measures may be employed against them.
The primary advantages of an insider are twofold: knowledge and authorization. They have knowledge of the defenses — Jérôme Kerviel had worked in Société Générale’s internal audit and control department, so he knew exactly how they searched for and detected rogue trades. And they have authorization in that an internal attack often does not involve any sort of elevation of privilege — only an employee misusing their legitimate authority. Even the right to be inside the building, rather than having to break in through a firewall, is a measure of authority an outsider lacks.
However, insiders also have a disadvantage as compared to outsiders: proximity. It is often much easier to verify a suspicion that someone has committed a crime than it is to find the culprit to begin with. As is often depicted in crime dramas and classic mystery plots, investigators have a much easier time finding out who committed a crime when they have specific suspects to question and investigate than when a crime is committed by a random stranger with no known connection with the victim. Fingerprints and DNA evidence do little good if you have no suspect to compare them to. The same goes for electronic forensics — a hacker will often leave plenty of evidence of their activity on their own computer, and a monitoring device at their ISP would likely detect their activities. However, if the hacker is external, or even in a foreign country, as a security professional you’re unlikely to have any idea where their computer is, let alone have access to it. When an insider attacks, on the other hand, the traces can be very obvious. Attacks come from IPs within your perimeter, and your own monitoring equipment might have seen the entire attack end-to-end. The simple fact that there are only so many people inside the company capable of mounting an electronic attack limits the suspects and allows each to be investigated.
Smart insiders know this. While an outsider may believe he is able to hide from detection simply by being a needle in a haystack (how many companies really inspect all their edge firewall logs, even with an automated process?), an insider knows that he’s under observation and has a substantial chance of getting caught. Thus, he will almost always take steps to cover his tracks. An outsider would take these steps too, but the insider has the advantage of legitimate authorization to bolster his abilities.
Deterring internal attackers, then, involves neutralizing their advantages while maximizing their disadvantages. There is little to be done about their first advantage (knowledge of internal procedures), but actions can be taken to mitigate the power of legitimate authorization and to maximize the disadvantage of proximity.
Preventing Abuse of Legitimate Authority
Developers can modify the source code of your product — that’s what developers do. System administrators can change permissions on files and access secured areas — that’s their job. However, no one person should have the ability to do everything — this is the principle behind separation of duties.
Separation of duties enables legitimate tasks to be carried out while making it more difficult for these same powers to be abused. There are three basic controls that can be placed on a power to help prevent abuse:
- Authorization: determines if a person has the right to perform a task
- Recording: keeps a record of when, how, and by whom the task has been performed
- Custody: actually carries out the task
For example, imagine your company needs to deploy new code to a server in a datacenter. The person responsible for the authorization function sets the access control policies on the various machines to determine who has access. The person or system responsible for the recording function makes entries in change-control logs so that it is clear what has been done. The person with custody of the system actually places the new files on the server. In a small company — or one with poor internal controls — these could all be the same person.
If these tasks are all handled by the same person, the potential for abuse is very high. If this person wants to propagate malicious code to the servers that monitors transactions or even steals money from accounts, he can do so. He can authorize himself or another (possibly even a fake account) to make any change desired, carry out the task, and then erase or suspend the logs or records of not only the action but also the authorization changes.
On the other hand, if separate people are responsible for each of these tasks, none of them is capable of perpetrating a fraud on their own. This process could be organized as follows:
- A product team or business owner is responsible for developing the system and determines who can modify the code.
- A division of the IT department is responsible for all audit logging throughout the environment, regardless of who owns the particular servers.
- An operations engineer is responsible for actually placing code on the servers; the developers never have access directly to the production datacenter.
This makes fraud much harder. A member of the product team can tamper with the code, but has no way to actually get it into the datacenter. An operations engineer can access the datacenter, but lacks access to the code. And either one making a change leaves a trail: since audit logging is controlled by another team within IT, neither is able to turn auditing off or simply overlook suspicious entries.
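The rule that no one person holds more than one of the three controls is simple enough to check mechanically. Here is a minimal Python sketch of such a check; the `ChangeRequest` record and its field names are hypothetical, standing in for whatever a real change-control system would store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeRequest:
    """One deployment, tagged with who performed each control (hypothetical schema)."""
    description: str
    authorized_by: str  # authorization: decided who may make the change
    recorded_by: str    # recording: owns the change-control log entry
    deployed_by: str    # custody: actually placed the files on the server


def separation_violations(change: ChangeRequest) -> dict:
    """Map each person holding more than one role to the roles they hold."""
    roles = {
        "authorization": change.authorized_by,
        "recording": change.recorded_by,
        "custody": change.deployed_by,
    }
    held = {}
    for role, person in roles.items():
        held.setdefault(person, []).append(role)
    return {person: r for person, r in held.items() if len(r) > 1}


good = ChangeRequest("billing release", "product-owner", "it-audit", "ops-engineer")
bad = ChangeRequest("late-night hotfix", "mallory", "mallory", "mallory")

good_result = separation_violations(good)
bad_result = separation_violations(bad)
```

A check like this only helps if the identities in the record are trustworthy, which is one more reason the recording function belongs to a team that the other two cannot influence.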
Maximizing the Chance of Detection
Separation of duties limits the ability of a person with legitimate authority to abuse it. However, there is another thing that can be done to keep those people with the ability to abuse their authority from actually doing so: cause them to believe they are likely to be caught. Internal attackers know what audit and logging systems are being used within an environment, and they know where the “blind spots” in those systems are. Many criminals commit a crime only when the opportunity presents itself. By eliminating failures in monitoring, we reduce temptation as well as improve our forensic abilities.
Most of the systems used in a modern IT environment have extensive auditing capabilities. (Note that I am using the word “auditing” in the sense of creating an audit trail, not in the sense of some external consultant or accountant reviewing that trail.) Windows machines create an event log of almost everything that happens on them; in an Active Directory domain, security events are also logged on the domain controller. UNIX/Linux/Solaris machines create various system logs, and have the ability to send them to remote machines as they occur. Databases like Oracle and SQL Server have fine-grained audit capabilities and are able to record every access to sensitive data and even detect potential data aggregation attacks. Web servers record every access, as do keycard-based entry control systems, VPN concentrators, firewalls, and a variety of network devices. An attacker, even an internal one, leaves a bewildering array of changes, alerts, and traces every time he does anything.
However, this does little good if no one notices the tracks! In addition, they are often ephemeral — a Windows Security Event Log will grow too large and begin overwriting itself in a matter of hours in a large corporation. If the logs are not available to investigate an incident, they might as well not exist at all.
One of the most powerful ways a company can prevent internal attacks is with the implementation of a Security Information and Event Management (SIEM) product. There are several of these on the market (I have experience implementing the SenSage event data warehouse, but ArcSight, Symantec, IntelliTactics, Computer Associates, and others have competing products), but the idea behind all of them is to gather event data from a variety of sources and aggregate it in one place. This has two major advantages:
- The data is centrally managed by a different custodian from the ones that control the various systems it came from, thus providing separation of duties. The system administrators of the systems creating the logs cannot tamper with the logs.
- Data from disparate sources is correlated together, thus detecting attacks in progress and tracing attacks back to their source during an investigation. Forensics is made easier and more effective.
Different SIEM systems have different advantages, and while all will provide separation of duties, some are better at handling massive data volumes than others. Likewise, the data mining involved in event correlation is still a black art in many cases, so different systems have different capabilities in that regard. However, just knowing that a SIEM exists, is monitored, and is out of reach for would-be fraudsters to tamper with can be a powerful deterrent against rogue employees.
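The correlation idea can be illustrated with a small Python sketch. It is far simpler than what any commercial SIEM does, and everything in it is assumed for illustration: the `Event` record, the source names, and the rule itself (flag a user whose activity shows up in several different log sources within a short window).

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import NamedTuple


class Event(NamedTuple):
    """One normalized log entry (hypothetical schema)."""
    time: datetime
    source: str   # e.g. "keycard", "domain-controller", "database"
    user: str
    action: str


def correlate(events, window=timedelta(minutes=10), min_sources=3) -> dict:
    """Return {user: events} for users seen in min_sources logs within window."""
    by_user = defaultdict(list)
    for event in sorted(events, key=lambda e: e.time):
        by_user[event.user].append(event)

    alerts = {}
    for user, user_events in by_user.items():
        for i, first in enumerate(user_events):
            # Everything this user did within the window starting at `first`.
            burst = [e for e in user_events[i:] if e.time - first.time <= window]
            if len({e.source for e in burst}) >= min_sources:
                alerts[user] = burst
                break
    return alerts


t0 = datetime(2008, 2, 1, 2, 0)
sample = [
    Event(t0, "keycard", "jdoe", "entered datacenter"),
    Event(t0 + timedelta(minutes=1), "web", "asmith", "login"),
    Event(t0 + timedelta(minutes=3), "domain-controller", "jdoe", "changed group membership"),
    Event(t0 + timedelta(minutes=5), "database", "jdoe", "bulk export of accounts table"),
    Event(t0 + timedelta(hours=2), "database", "asmith", "routine query"),
]
alerts = correlate(sample)
```

No single log in the sample looks alarming on its own. It is only the combination across the keycard system, the domain controller, and the database that stands out, and only a collector that none of those administrators controls can be relied on to see it.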
Conclusion
The possibility of internal attacks is an unfortunate consequence of the specialization of modern society — those with the capability to build and maintain complex systems are often those best able to compromise and abuse them. However, good design of internal controls centered around separation of duties combined with judicious use of technical information-management solutions greatly reduces the opportunity for insiders to turn against a company’s infrastructure.
It wasn’t a good weekend for Linux.
The ultraportable ASUS Eee PC has seen quite a bit of publicity lately. With prices starting as low as $300, it’s about as cheap as laptops get, and runs on a solid-state drive instead of a hard disk. Of course, to get such a low price, it doesn’t ship with Windows on it — instead, it has a customized version of Xandros Linux using IceWM with a host of open-source applications, like OpenOffice, Firefox, etc. Xandros is a Debian derivative, so the apt package system can be used to get almost any popular Linux application.
Linux gets a lot of good press for being “secure”, by which the media usually means “free from viruses and spyware.” This is pretty much true, for the simple reason that it’s not worth anyone’s time to write a virus for Linux when the market share is so low. However, there’s a big difference between “free of malware” and “secure by default.” It turns out that the Xandros Linux on the Eee ships with Samba 3.0.24, which dates back to February ’07. (Samba is an open-source implementation of the SMB protocol for Linux and other UNIX-like systems; it’s the package that lets Linux machines participate in Windows networks, both to be able to connect to Windows fileshares & to share files themselves.) Samba is, of course, installed and on by default; it wouldn’t be “easy to use” if you had to manually start Samba, would it?
Samba 3.0.24, unsurprisingly considering its age, has known critical security flaws. One of these is a remote root exploit published by RISE Security; the result of this is that any Eee PC can be remotely and silently compromised with a simple Metasploit plugin. If you’re on the Internet with an Eee, anyone can take remote control of your computer, access and change files, etc. You don’t need viruses and spyware when you have direct control.
If you do have an Eee, I suggest using apt to update Samba immediately. Assuming the Eee works like every other Debian derivative out there, and that its repositories carry a fixed package, a simple “sudo apt-get update && sudo apt-get install samba” ought to take care of the problem.
However, it gets worse. That vulnerability only affects people running an old version of Samba; it only gets attention because a brand-new PC is shipping with said old version of Samba. Also this last weekend, milw0rm released a local root exploit for all Linux kernels 2.6.17 through 2.6.24.1 (the current kernel). This affects basically every Linux 2.6 system out there, as it affects kernels from June ’06 through today. Since upgrading a kernel is somewhat of an ordeal (it requires taking the system down at the very least, and on many flavors of Linux involves some work besides; Ubuntu makes it quite easy if you’re using the default kernel, though), it’ll be months before many of these machines are upgraded.
It’s a local root exploit, so you have to be logged onto the machine to use it. Obviously, for Linux-based desktops and laptops that isn’t much of a concern; if someone’s sitting at your computer, they can take it over no matter what you do. However, where this gets scary is shared web hosting. Most small web sites are on shared servers, with many (even hundreds of) sites on the same box. What’s more, a web hosting company may have all of their various servers trusting each other in such a way that having root on any one means having full control of all of them.
If you have a shell account on a Linux 2.6 box, full control is now as easy as pasting this code into a file on the machine, and typing
cc -static -Wno-format vmsplice-exploit.c
./a.out
Presto! Root shell. Most web hosts don’t give you shell anymore (unfortunate, in my opinion, and the main reason I’m on DreamHost), but that doesn’t matter — you could upload the source via FTP, along with a simple PHP page that builds it, runs it, and has it send you a shell. There are a lot of hosts on the Internet vulnerable to this right now. (Interestingly, DreamHost is not, as its servers are using the Linux 2.4 kernel instead of the 2.6 branch, and thus lack support for vmsplice.)
Unfortunately, I don’t have enough Linux kernel experience to know exactly what this exploit does or to discuss it further; I’ve only done kernel-mode work on Windows, with my Linux coding being strictly in userland. However, vmsplice provides user-mode code with control over a kernel buffer, so any number of tiny bugs could have resulted in a catastrophic compromise (like this one). Linus Torvalds has an email about splice() here, which does a great job explaining how splice() and vmsplice() can be used to move data around in a copy-free manner through a kernel buffer, but says little about why you would do such a thing.
So what to do about this one? There are three choices:
- If you’re running a Linux 2.4 kernel, or anything predating 2.6.17, you’re safe. Well, you’re safe from this; there are other security bugs in year-old kernels.
- Upgrade to a kernel post-2.6.24.1. If you happen to run a cutting-edge distribution like Gentoo, you can just sync the tree today, rebuild the kernel, and be good to go. And if you’re running Gentoo, you actually know how to do that. Debian Stable also has an apt package with a 2.6.18.dfsg.1-18etch1 kernel that’s safe.
- There are some workarounds (hacks, really) on this thread. Note that disabling vmsplice, while it will fix this vulnerability, means crippling a Linux syscall; while this syscall is used only rarely, if you do this and software does try to use vmsplice it may corrupt kernel memory. Thus, option #2 is much, much better; get an updated kernel for your distro that fixes the bug.
In my last post about finding a job in information security, when discussing application security, I off-handedly mentioned several mitigation technologies — GS, DEP, SAL, and ASLR. These are technologies developed by OS vendors to provide system-wide protection against common attacks, and are things every application developer should know about when dealing with native (unmanaged) code.
The scourge of C and C++ apps for the last decade and a half has been the stack buffer overflow. This is an attack wherein the attacker discovers that an application is trying to fit some piece of user input into a spot in memory without first checking to see if it will fit. In the most common scenario, the spot in memory is a local variable, which means that carefully-crafted input can overwrite the return pointer on the stack with a user-selected value. If this is done, when the function finishes it will transfer execution to the user-provided input, which can then take control of the running process and do anything that that process’s owner is capable of. If the process is an OS service, running with a privileged account like root on a UNIX/Linux system or Administrator/SYSTEM on a Windows system, it may be able to take full control of the system. I first learned this attack in Aleph One’s classic Phrack article, Smashing the Stack for Fun and Profit, written in 1996.
Application developers have been told for many years now to be very careful when allocating memory and copying data, especially strings, to prevent these exploits. However, it’s relatively difficult, so developers continue to make the same mistakes. In addition, the attackers get more creative, and have found variations on this attack that are even harder to avoid. Luckily, OS developers have also been busy trying to find global mitigations for these attacks, so that developers can’t make these mistakes, and the whole computing ecosystem becomes safer.
Stack Canaries
The first common OS-based mitigation technology is the stack canary. On Windows, this mitigation is activated via the /GS compiler option (for Guard Stack); OpenBSD on SPARC hardware has a related mechanism called StackGhost, while recent GCC releases on Linux have a stack protection feature called SSP (Stack Smashing Protector, also known as ProPolice), enabled with -fstack-protector. Of the major OSes currently in use, only Mac OS X is missing a stack canary feature.
Whenever a function is called, a stack frame is created in memory for the function call. The stack frame is arranged as follows:
| Local Variables | Saved EBP | Saved EIP | Arguments |
Each portion of the frame is just large enough for its contents. EIP is the instruction pointer — whatever EIP points to, the processor executes. The Saved EIP is the return pointer — when the function returns, that saved value is placed into EIP.
A buffer overflow occurs when the attacker tricks the application into placing something into a local variable that is too large to fit. It thus overflows its bounds, overwriting the saved registers. Since the saved EIP has been overwritten, when the function returns, execution jumps to whatever value the attacker wants. However, in a /GS-compiled binary, this is much more difficult, as the stack frame instead looks like this:
| Local Variables | Canary | Saved EBP | Saved EIP | Arguments |
The canary is basically an arbitrary random number. However, the system remembers what it was when the stack frame was entered, and before returning to the saved EIP, it checks to make sure the canary value hasn’t changed. This poses a problem for the attacker, because it’s in the way! Any value large enough to overwrite the saved EIP will also overwrite the canary, and the attacker doesn’t know what the canary value is. In order to get it, he would need to execute some code to read it… and he can’t execute code with the canary in the way. Thus, the classic overwrite of the saved EIP is detected before it can be used.
Some creative attackers figured out that you could still sometimes do some damage by overwriting not the saved EIP, but the function arguments. If a function makes use of delegation and receives function pointers in arguments, you could sometimes still execute code this way, because they would be used during the function, and /GS only checks the canary when the function returns. Thus, in recent versions of Visual Studio, /GS also causes the system to make a copy of the arguments when a stack frame is created, placed before the local variables. The copy is used until the function exits; thus, overwriting the arguments does nothing until the function returns, at which time the canary is checked, and any corruption is detected.
Hardware Data Execution Prevention
Another mitigation added for buffer overflow prevention is what Microsoft calls Data Execution Prevention (DEP), which makes use of the no-execute page flag on recent CPUs (called NX by AMD and XD by Intel). On NX-enabled CPUs, each memory page is marked as either code (executable) or data (not executable), and a fatal error occurs if EIP ever points into a data page. A linker flag in Visual Studio 2005 and later (/NXCOMPAT) enables this feature for an application; Linux toolchains have also added a similar feature.
The entire stack is marked as a data page, which normally prevents stack overflows. While the attacker can overwrite EIP, he can’t make it jump execution into his own input, so he can’t execute his own code — only code already in the process. However, once again, enterprising hackers have found a way around it — what is called the “return to libc” attack. They overwrite the saved EIP with an address pointing to kernel32!VirtualProtect(), the function that marks pages as code or data! With carefully crafted arguments, they can actually instruct VirtualProtect to mark the stack as code, then return into their code. On the bright side, this is very difficult, and won’t work if the exploitable buffer is a string, because the required arguments are full of null bytes.
A more elaborate attack can call into ntdll!NtSetInformationProcess and disable NX for the entire process. The advantage to this is that it can be done without null bytes (though it’s very complicated), so it can go through strings. The disadvantage, though, is that it is unlikely to work on a securely configured production server: if NX is globally enabled in boot.ini, ntdll!NtSetInformationProcess is unable to override it.
Though I’ve mentioned Windows-specific function names here, there are Linux equivalents that can be used in attacks. (Indeed, it’s called the “return to libc” attack because of the name of the UNIX/Linux C runtime library.)
Address Space Layout Randomization
All of these evasions of NX protection require being able to instruct the system to jump directly into system functions. Doing this requires address prediction – you have to know where in memory the system functions are so you can jump to them. Even in the simple stack-smashing exploit, the attacker still needs to know where the stack is in order to place that value into the saved EIP. Address Space Layout Randomization (ASLR) is a relatively new technology that makes address prediction nearly impossible by making libraries load into different locations on every reboot. If the attacker doesn’t know where the libraries are, he generally cannot jump to them with any reliability.
ASLR is enabled on Windows using the linker flag /DYNAMICBASE. OpenBSD has ASLR by default; Linux implementations have a weak form of ASLR but can be upgraded to full ASLR using various popular kernel patches. Once again, Mac OS X is the only major OS missing this mitigation, though changes in OS X 10.5 suggest that Apple is preparing to add it in a future version.
ASLR randomizes where libraries are found, so that it is very difficult to predict where they are. It does, however, have a few weaknesses:
- In many cases, executable files themselves are not randomized. Thus, attackers are prevented from jumping to system functions, but can still jump to functions in the executable file.
- Only the high-order bytes of addresses are randomized; the attacker can still jump to anything within 16 memory pages of known address space.
- It may be possible to brute-force the location of a library by simply trying all the addresses if you have a section of code that will permit this.
Case #3 is very difficult on Windows, since there are no forking daemons and if a service is made to crash several times in a row it will stop restarting (precisely to prevent this sort of brute-forcing.) However, on UNIX/Linux systems, this is possible, and it may be possible on Windows, too, if the code being exploited eats exceptions (i.e. it has an exception handler that discards errors and keeps the service running.)
Safe Structured Exception Handling
On Windows C++ applications, there’s another way around the stack canaries — exploiting Structured Exception Handling. When SEH is used, the stack looks like this:
| Local Variables | SEH Next | SEH Ret | Canary | Saved EBP | Saved EIP | Arguments |
Those SEH pointers sit before the canary, and thus can be overwritten. It’s possible to craft values for those pointers that point into the stack, and then force an exception to occur. When the exception happens, the pointers are followed and arbitrary code is run. Stack canaries don’t help with this (and the canary can’t be put before the SEH pointers, because in a sense they are local variables, just not ones declared by the programmer), though NX still does. However, since NX is not available on all processors (nor enabled on all processes), Microsoft also introduced the /SafeSEH linker flag.
In a /SafeSEH process, when execution begins, the system asks all the libraries in a process to find all of their possible exception handlers and write them to a table. Before ever jumping to an SEH Next pointer, it verifies that the pointer points to something on the table. Thus, if the attacker overwrites this pointer, it does no good — he can’t run anything that isn’t an exception handler.
There is a problem with this, though — it only works if every library used by an application was compiled with /SafeSEH and records its exception handlers on the table. If even one library didn’t, then the system can’t verify the pointers — they might well be pointing to an exception handler that just wasn’t registered.
There are no non-Windows equivalents to /SafeSEH, as the SEH method of exception handling is a Windows-specific construct.
Standard Annotation Language
Ideally, we wouldn’t need all these mitigations because we wouldn’t be writing buffer overflows in the first place. However, when writing complex code, they can be very hard to see. We would prefer that the compiler just detect the overflows at compile-time, but the compiler doesn’t always know how our variables will be used, and thus cannot determine where an overflow may lie.
Microsoft’s Standard Annotation Language, or SAL (discussed in detail on Michael Howard’s blog here), allows the developer to “hint” to the compiler how all the arguments to a function are used. The developer uses SAL annotations on each function declaration, specifying if arguments are input or output, if they can be NULL, how long their buffers are, etc. These “hints” (actually compiler macros) allow the compiler’s static analysis to check that no buffer overruns are being introduced.
It’s more work for the developer, as they have to put some thought into the annotations, and a company making use of SAL has to enforce its use (i.e. no checking in functions that aren’t annotated.) However, while it’s work, it’s not difficult — unlike checking for buffer overruns manually, which is very difficult. With properly-annotated functions, most buffer overruns can be caught at compile time, and fixed before the application is ever released. Unfortunately, SAL has not seen much use outside of Microsoft itself, due to the extra developer overhead. It’s easier to get people to add a few compiler & linker flags than to change the way they program.