As part of our platform research in Zimperium zLabs, we recently disclosed a buffer overflow vulnerability affecting multiple Android DRM services to Google. Google classified it as high-severity, designated it as CVE-2017-13253 and patched it in the March security update.
In this blog post, we’ll cover the details of the vulnerability. First, we’ll go over relevant background information, from general Android mechanisms to specific mechanisms related to the vulnerability. We’ll focus on the recently introduced Project Treble and what its changes actually mean. Then, we’ll analyze the vulnerability and its impact. We’ll look at how, combined with other faults on some devices, this vulnerability could be exploited to achieve root privileges. Lastly, we’ll talk about the origin of the vulnerability and how it could have been prevented. We’ll see how, despite Google’s claim that Project Treble benefits security, it’s possible that it did the opposite.
Note that in this post I aim to cover a lot of background information related to this vulnerability. If you already have experience with Binder and libbinder you might want to skip the first section and start with the information more specific to the vulnerability itself.
Android’s Binder & Security
A very common security model used in many operating systems (including Android) revolves around inter-process communication (IPC). Untrusted code running in an unprivileged process can communicate with privileged processes (services) and ask them to perform very specific actions that the OS allows. This model relies on the services (and the IPC mechanism itself) to properly validate every input sent from the unprivileged process. In turn, this means that bugs in those services, especially in the input validation part, can easily lead to vulnerabilities.
For example, recent iOS vulnerabilities found by Rani Idan right here at Zimperium rely on this method. Bugs in IPC input validation allow an attacker to execute code with higher permissions from an unprivileged app.
In Android’s case, the IPC mechanism is called Binder. The security of Android Binder services is therefore naturally very interesting for vulnerability research. Binder has many useful features. For example, it allows processes to transfer complex objects between each other, such as file descriptors or references to other Binder services. In order to maintain simplicity and good performance, Binder limits each transaction to a maximum size of 1MB. In cases where processes need to transfer large data, they can use shared memory to quickly share the data between them.
Binder’s C++ library
Android’s Binder library (libbinder) gives many abstractions for C++ code which relies on Binder. It allows you to call methods of remote instances of C++ classes almost as if they don’t reside in another process.
Each object which uses this mechanism implements a few classes in a predefined structure:
- An interface class which defines the methods of the object which should be callable through Binder. Prefixed with “I”.
- A “client-side” class in charge of serializing the input and deserializing the output. Prefixed with “Bp”.
- A “server-side” class in charge of deserializing the input and serializing the output. Prefixed with “Bn”.
In practice, code that uses the object almost always refers to it through the interface type. This allows you to treat the object in the same way whether it’s in the same process or in a different one.
Example of libbinder’s usage in the ICrypto interface
The “server-side” part of the code traditionally lies inside the privileged service (although in some cases the roles are reversed), so it is usually in charge of validating the input. Validation code can begin at the Bn* class and continue along the subsequently called methods. This is obviously the most interesting part for vulnerability research.
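To make the I/Bp/Bn split more concrete, here is a minimal sketch that imitates the pattern inside a single process. It does not use libbinder at all: ToyParcel, IAdder, BnAdder, BpAdder and Adder are hypothetical names I made up for illustration, and the "remote" call is just a direct function call. The point is only to show where serialization happens and where the server-side validation sits.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for a Binder parcel: a flat list of 32-bit values.
// This is NOT libbinder's Parcel; it only illustrates the idea.
struct ToyParcel {
    std::vector<int32_t> values;
    std::size_t readPos = 0;

    void writeInt32(int32_t v) { values.push_back(v); }

    // Returns false instead of reading past the end of the data.
    bool readInt32(int32_t* out) {
        if (readPos >= values.size()) return false;
        *out = values[readPos++];
        return true;
    }
};

// "I" class: the interface both sides agree on.
struct IAdder {
    virtual ~IAdder() = default;
    virtual int32_t add(int32_t a, int32_t b) = 0;
};

// "Bn" class: server side. Deserializes (and validates) the input,
// calls the real implementation and serializes the output.
class BnAdder : public IAdder {
public:
    bool onTransact(ToyParcel& data, ToyParcel* reply) {
        int32_t a = 0;
        int32_t b = 0;
        if (!data.readInt32(&a) || !data.readInt32(&b)) {
            return false;  // malformed request: reject it
        }
        reply->writeInt32(add(a, b));
        return true;
    }
};

// The actual service object, living in the "privileged" side.
class Adder : public BnAdder {
public:
    int32_t add(int32_t a, int32_t b) override { return a + b; }
};

// "Bp" class: client side. Serializes the input, deserializes the output.
class BpAdder : public IAdder {
public:
    explicit BpAdder(BnAdder& remote) : mRemote(remote) {}

    int32_t add(int32_t a, int32_t b) override {
        ToyParcel data;
        ToyParcel reply;
        data.writeInt32(a);
        data.writeInt32(b);
        int32_t result = 0;
        if (mRemote.onTransact(data, &reply)) {
            reply.readInt32(&result);
        }
        return result;
    }

private:
    BnAdder& mRemote;  // with real Binder this would refer to another process
};
```

A caller that holds an IAdder reference to a BpAdder can call add(2, 3) without knowing that the arguments travel through a parcel. An attacker, on the other hand, can skip the Bp class entirely and hand onTransact whatever bytes they like, which is why the Bn side is where validation matters.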
The ICrypto interface and the decrypt method
After covering Binder in general, let’s take a look at the specific implementation relevant to the vulnerability. The mediadrmserver service (which is, unsurprisingly, in charge of DRM media) provides an interface to a Crypto object; this interface is named ICrypto. Note that the object was recently changed to CryptoHal; we’ll go into that later. The general purpose of this interface is to allow unprivileged apps to decrypt DRM data which requires higher privileges to decrypt, like access to the TEE. The details of the encryption itself are not in the scope of this blog post, as again, we’re more interested in the input validation.
ICrypto has multiple methods, but undoubtedly the most important one (which all the others revolve around) is decrypt.
decrypt’s signature (source)
One of the first noticeable things in decrypt’s signature is how complex the input is. From our point of view, this is very interesting. Complex input leads to complex validation code (every parameter is transferred over Binder and has to be verified) which can be susceptible to vulnerabilities.
Let’s look at some of the parameters:
[Table: selected parameters of decrypt]
(For more information about some of the parameters from the higher-level Java side of the API, see MediaCodec.CryptoInfo)
Now let’s take a closer look at the types of the source and destination parameters:
The relevant struct members here are mHeapSeqNum and both mSharedMemory members (the rest of DestinationBuffer is in case the destination is not stored as shared memory, a case that’s not relevant to this vulnerability). The name heap is used here to refer to the actual shared memory (the region you would mmap). mHeapSeqNum is an identifier for such a heap, which was previously shared using a method of ICrypto called setHeap. Both mSharedMemory members only represent the offset and size of a buffer inside the heap. This means that although mHeapSeqNum is inside the source struct, it is actually relevant for both.
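The relationship between these members can be modeled with a simplified sketch. Everything here is a stand-in: Heap, BufferRef and HeapRegistry are hypothetical names, the real mSharedMemory members are IMemory objects rather than plain offset/size pairs, and I’m assuming that registering a heap simply hands back the identifier used later.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Stand-in for a shared memory region (the "heap").
using Heap = std::vector<uint8_t>;

// What each mSharedMemory effectively boils down to: a window in the heap.
struct BufferRef {
    size_t offset;
    size_t size;
};

// The source carries the heap identifier...
struct SourceBuffer {
    BufferRef sharedMemory;
    int32_t heapSeqNum;
};

// ...while the destination has none and silently reuses the source's heap.
struct DestinationBuffer {
    BufferRef sharedMemory;
};

// Hypothetical model of the setHeap bookkeeping.
class HeapRegistry {
public:
    // Assumption: registering a heap yields the sequence number used later.
    int32_t setHeap(Heap heap) {
        const int32_t id = mNext++;
        mHeaps[id] = std::move(heap);
        return id;
    }

    // Returns nullptr for an unknown sequence number.
    Heap* find(int32_t seqNum) {
        auto it = mHeaps.find(seqNum);
        return it == mHeaps.end() ? nullptr : &it->second;
    }

private:
    int32_t mNext = 0;
    std::map<int32_t, Heap> mHeaps;
};
```

In this model, a decrypt call looks up one heap through source.heapSeqNum and then interprets both the source and the destination BufferRef relative to that same heap.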
Example of parameters for a run of decrypt with clear data
It is interesting to note that some parts of how the parameters are structured are a bit odd. mSharedMemory is an IMemory, which is actually connected to its own heap and is supposed to represent a buffer inside that, yet this heap is ignored and the offset and size are used for the mHeapSeqNum heap. There’s also the presence of mHeapSeqNum in the source struct, yet it is related to both source and destination. This is all a consequence of recent changes to this code, made as part of a major re-architecture of the Android framework called Project Treble.
Project Treble was introduced as part of Android 8.0; its main objective is to make system updates easier by creating a clear separation between AOSP and the vendor. Google also claims that Project Treble benefits Android security by adding more isolation.
For a service like mediadrmserver, Project Treble means separation into multiple processes. The code in charge of decryption belongs to the vendor, so it is separated into multiple vendor processes called HALs, each in charge of its own DRM scheme. mediadrmserver’s role is now reduced to transferring data between the app and the HAL process for the relevant DRM scheme. The communication between mediadrmserver and the HALs is over Binder as well, but in a different domain and using the format of a different library – libhwbinder. The previously mentioned change from Crypto to CryptoHal is because now it’s a different class whose sole purpose is to convert the data to libhwbinder’s format and pass it to the HAL.
The figure above shows why Google claims Project Treble benefits security. Permissions are separated across different processes (each HAL can only communicate with its own driver) and untrusted apps no longer interact directly with the high-privileged process.
Note that as of Android 8.1, the separation is still optional and depends on the vendor. On the Nexus 5X, for example, the HALs are all inside the mediadrmserver process. The data is still converted into the HAL format but isn’t transferred to another process.
I’ve previously mentioned different DRM schemes. In Android terminology, the handler for each DRM scheme is called a plugin, or Crypto Plugin in our specific case. The vendor is in charge of providing these plugins, but there is some useful code in AOSP that the vendor can use. For example, AOSP contains a full open-source implementation of a plugin for the ClearKey DRM scheme. Commonly, devices would have the open-source ClearKey plugin and a closed-source Widevine plugin (that’s the case in Nexus/Pixel devices for example).
The issue with the aforementioned Project Treble changes is that now plugins receive data in the HAL format. In order to make the transition simple without needing to update each plugin to support this new format, a default Crypto Plugin implementation was added to AOSP for the use of vendors. This implementation converts the data from HAL format into legacy format and passes it on to the original plugin code. Ideally, this solution should be only temporary until the plugins are updated, because otherwise we’re left with redundant format conversions (to and from HAL).
The flow of data format conversions
Looking into the source code
After covering the general process of ICrypto’s decrypt method, let’s look closely at the validation code for the shared memory buffers. As you might have guessed (since we’re talking about a buffer overflow), this is the area where the vulnerability was found.
As was previously mentioned, validation usually starts at the Bn* classes, in our case that’s BnCrypto, the “server-side” of the ICrypto interface.
Part of BnCrypto’s validation code (source)
- First, the code checks that the sum of the subsample sizes is valid and doesn’t overflow. Remember that this sum is the size of the data to be copied.
- It also checks that this sum matches totalSize, another parameter passed over Binder which is pretty much redundant (you can already tell the total size by the sum of the subsamples, the code specifically verifies that this is the case).
- The next check is that the data size doesn’t exceed the size of the source buffer.
- Lastly, it checks that the data size plus the offset still doesn’t exceed the source buffer.
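The four checks above can be sketched roughly as follows. This is a simplified model written from the description, not the AOSP code itself: SubSample and validateSource are hypothetical names, and plain sizes stand in for the IMemory objects.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified subsample: a run of clear bytes followed by encrypted bytes.
struct SubSample {
    uint32_t clearBytes;
    uint32_t encryptedBytes;
};

// Returns true only if the request passes all four checks described above.
bool validateSource(const std::vector<SubSample>& subSamples,
                    size_t totalSize,    // size claimed by the caller
                    size_t offset,       // extra offset parameter of decrypt
                    size_t sourceSize) { // size of the source buffer
    // Check 1: sum the subsample sizes, refusing to wrap around.
    size_t sum = 0;
    for (const SubSample& ss : subSamples) {
        if (ss.clearBytes > SIZE_MAX - sum) return false;
        sum += ss.clearBytes;
        if (ss.encryptedBytes > SIZE_MAX - sum) return false;
        sum += ss.encryptedBytes;
    }

    // Check 2: the sum must match the (redundant) totalSize parameter.
    if (sum != totalSize) return false;

    // Check 3: the data must not be larger than the source buffer.
    if (totalSize > sourceSize) return false;

    // Check 4: the data plus the offset must still fit. Written as a
    // subtraction so that offset + totalSize can never overflow.
    if (offset > sourceSize - totalSize) return false;

    return true;
}
```

Note that every size in this function is compared against the source buffer only.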
From there CryptoHal converts the data into HAL format and sends it to the relevant plugin; there’s no interesting validation code here.
Next, the default Crypto Plugin implementation (which may or may not be in a different process) converts the data back to the legacy format and continues to validate it.
Part of the default Crypto Plugin validation code (source)
A side note about this code: I find it to be a bit messy. There are multiple “dest” and “source” variables, sourceBase and destBase are actually the exact same thing (the heap) and there are no comments at all to help you. As I’ve previously mentioned, this part is entirely new and was only added in Android 8.0, so it kind of makes sense. Still, I have a slight suspicion that this messiness led to the vulnerability, as it makes it much more difficult to look at the entire validation code and see if something is missing.
- The first check here is that the sum of both offsets and the buffer size doesn’t exceed the heap size. sourceBase is the heap while source is now what was previously source.mSharedMemory. If you’re confused by the two offsets, remember that mSharedMemory contained an offset and there’s also a different offset parameter for the decrypt method.
- The other check is similar, but performed on the destination buffer. destBuffer is what was destination.mSharedMemory and destBase is the same heap as sourceBase. This time the offset parameter isn’t involved.
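Here is a rough model of these two checks and of how the buffers are then turned into pointers. It is a sketch under assumptions: SharedBuffer here is just an offset/size pair, resolveBuffers and RawPointers are hypothetical names, and a std::vector stands in for the mapped heap. I’ve also written the comparisons in subtraction form so the sketch itself cannot wrap around; that is my choice and not a statement about how the original code is written.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// A buffer described only by its position inside the heap.
struct SharedBuffer {
    size_t offset;
    size_t size;
};

// What is left after validation: bare pointers, no sizes.
struct RawPointers {
    const uint8_t* src;
    uint8_t* dest;
};

bool resolveBuffers(std::vector<uint8_t>& heap,
                    const SharedBuffer& source,
                    size_t offset,  // extra offset parameter of decrypt
                    const SharedBuffer& dest,
                    RawPointers* out) {
    const size_t heapSize = heap.size();

    // Check 1: source.offset + offset + source.size must fit in the heap.
    if (source.size > heapSize) return false;
    if (source.offset > heapSize - source.size) return false;
    if (offset > heapSize - source.size - source.offset) return false;

    // Check 2: dest.offset + dest.size must fit in the same heap.
    if (dest.size > heapSize) return false;
    if (dest.offset > heapSize - dest.size) return false;

    // The offsets are folded into the pointers; the sizes are dropped.
    uint8_t* base = heap.data();
    out->src = base + source.offset + offset;
    out->dest = base + dest.offset;
    return true;
}
```

After this function returns, the only remaining trace of each buffer is where it starts.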
Eventually, each buffer is simplified into just a pointer to memory; the offset is now part of the pointer while the buffer size is omitted. In order to determine the data size, the plugin uses the subSamples array.
The ClearKey plugin code for when the data is unencrypted (source)
The code above shows the final part in order to help understand the flow. As was previously mentioned, when the data is unencrypted it is simply copied from one place to another.
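For readers who want the copy step in a form they can compile, this is a minimal sketch of what a plugin does with unencrypted data, based on the description above. copyClear and SubSample are hypothetical names and the error handling is reduced to a single return value; it is not the ClearKey source.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct SubSample {
    uint32_t clearBytes;
    uint32_t encryptedBytes;
};

// Copies clear data from src to dest, one subsample at a time.
// Returns the number of bytes copied, or -1 if a subsample claims to
// contain encrypted bytes (which makes no sense for unencrypted data).
// The function only sees two pointers: the amount of data to copy comes
// entirely from the subsamples array.
long copyClear(const uint8_t* src, uint8_t* dest,
               const std::vector<SubSample>& subSamples) {
    size_t copied = 0;
    for (const SubSample& ss : subSamples) {
        if (ss.encryptedBytes != 0) return -1;
        std::memcpy(dest + copied, src + copied, ss.clearBytes);
        copied += ss.clearBytes;
    }
    return static_cast<long>(copied);
}
```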
By now, I’ve given enough information that you could in theory spot the vulnerability. If you want to try and do that you’re welcome to go back and keep reading the code. From my experience, it’s very difficult in this kind of blog post to give enough information to spot it while still keeping it an actual challenge (especially from the point of view of someone who already found it), so don’t panic even if you can’t spot it (or maybe it was too easy?).
The problem is that there’s no verification that the amount of data being copied doesn’t exceed the destination buffer. There is only a similar check for the source buffer (the third check of BnCrypto checks that and the next check even takes the extra offset into consideration). The only check related to the destination buffer is the second check of the default Crypto Plugin (which makes sure the buffer lies inside the heap and doesn’t exceed it), but that’s simply not enough.
Let’s take a look at an example. Let’s say the size of the data to copy is 0x1000. Since this size is represented by the subsamples array, we’ll have one entry in that array with 0x1000 clear bytes (and 0 encrypted bytes). The heap will also have 0x1000 bytes and the source buffer will point to the entire heap (offset = 0, size = 0x1000). The destination buffer is where it gets interesting. Let’s say the offset is 0x800 and the size is 0x800. This still fits inside the heap, so it passes the check of the default Crypto Plugin. In this case, there will be an overflow; 0x800 bytes would be written after the heap.
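The arithmetic of this example can be written down directly. The two helpers below are hypothetical and only restate the numbers above; fitsDestination shows one possible form of the missing check and is not meant to describe how the actual patch is written.

```cpp
#include <cstddef>

// Number of bytes that end up past the end of the heap when dataSize bytes
// are written starting at destOffset. Assumes destOffset <= heapSize, which
// the default Crypto Plugin's destination check guarantees.
constexpr size_t bytesPastHeap(size_t heapSize, size_t destOffset,
                               size_t dataSize) {
    return dataSize > heapSize - destOffset
               ? dataSize - (heapSize - destOffset)
               : 0;
}

// One possible form of the missing check: the data to copy must fit
// inside the destination buffer itself, not just inside the heap.
constexpr bool fitsDestination(size_t dataSize, size_t destSize) {
    return dataSize <= destSize;
}

// The example from the text: heap of 0x1000 bytes, destination at offset
// 0x800 with size 0x800, and 0x1000 bytes of clear data to copy.
static_assert(bytesPastHeap(0x1000, 0x800, 0x1000) == 0x800,
              "0x800 bytes land after the heap");
static_assert(!fitsDestination(0x1000, 0x800),
              "a destination size check would have rejected the request");
```

With a destination that covers the whole heap (offset 0, size 0x1000), bytesPastHeap returns 0 and fitsDestination returns true, which is the well-behaved case.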
Proof of concept
The code for the aforementioned example which triggers the vulnerability (a link to the full PoC can be found at the end of the blog post)
Note: MemoryBase objects are the implementation of the IMemory libbinder interface. This is an example where Binder’s capability of transferring references to other Binder objects is used. This is also an example where the Binder roles are reversed. The privileged process is the “client-side”, so it is asking for information through Binder and is in charge of validating it.
The effect of the vulnerability
This vulnerability allows the attacker to overwrite memory in the target process with arbitrary data. Since this overflow happens at the level of memory pages, there is currently no mitigation that prevents it (in the way a stack canary mitigates a stack overflow). It is still restricted by the fact that the data has to begin inside the shared memory, because of the check of the default Crypto Plugin. This means that only memory which lies directly after the shared memory can be overwritten. Also, normally many areas in memory are either unallocated or unwritable, so trying to write there will result in a segmentation fault.
The affected processes depend on the vendor implementation. If the vendor doesn’t separate the HALs into different processes, then mediadrmserver is affected. If the vendor does separate them, then every HAL service for a Crypto Plugin which uses the default Crypto Plugin code is affected. Since the default Crypto Plugin code only leaves a pointer to the destination buffer, and the size is only determined by the subsamples, the vendor code after it has no way to tell that it received malformed data. This means that it doesn’t matter how well written the vendor part is, it’ll still be vulnerable.
Suppose an attacker manages to exploit this vulnerability and gain the privileges of the vulnerable service. Let’s look into what they could then achieve. Note that this part is mostly speculative. I haven’t written an exploit, but I do have some ideas about how this vulnerability could theoretically be used to reach full root privileges.
This is where Android’s SELinux rules come into play; even though the vulnerable service has more permissions, SELinux still limits them heavily. Still, even after the limitations, we’re left with a very interesting permission: full access to the TEE device.
So what can you do with full access to the TEE? Gal Beniamini’s excellent research shows that many devices fail to properly revoke old vulnerable TEE trustlets. This means that if you attack a device that has an old vulnerable trustlet, you could use the access to the TEE device, load the trustlet and exploit it to gain code execution on the TEE. More than that, Gal Beniamini also demonstrated in the past how on Qualcomm-based devices TEE code execution could lead to root privileges.
Possible attack flow up to root privileges
The origin of the vulnerability
I’ve already mentioned multiple times how Project Treble caused major modifications to this area of the code. It probably won’t surprise you then to learn that these changes actually introduced this vulnerability (before the changes, the destination buffer could not even be set in this format).
Obviously you can’t solely blame the major refactoring for making the code vulnerable, as that would imply that code refactoring shouldn’t happen, which isn’t true. The thing is that, as I’ve already pointed out, multiple parts of this code are messy or redundant. While this in itself doesn’t necessarily make code vulnerable, it does raise the chances of that, as it makes the code harder to review (some parts of the code took me quite a long time to understand compared to the complexity of what they actually do). So while vulnerabilities can sometimes be very hard to spot, messy or redundant code is usually easier to notice. I know it’s easier to criticise bad code design from the point of view of a reviewer than it is to actually write good code, but I still think there are some parts which should be improved.
Google claims that Project Treble benefits Android security, yet in this example it actually did the opposite. Project Treble is not necessarily bad in itself; the key issue here is that the implementation wasn’t handled very well.
Full source code for the PoC which triggers the vulnerability can be found on GitHub, along with some extra information.
- 20.12.2017 – Vulnerability discovered
- 28.12.2017 – Vulnerability details + PoC sent to Google
- 29.12.2017 – Initial response from Google
- 05.03.2018 – Google distributed patches