As mobile malware catches up with desktop malware, it’s not uncommon to stumble upon protected APKs while analysing samples. Most of the time, the sample is simply obfuscated by stripping class and variable names from the DEX file and/or by obfuscating strings; at other times, several layers separate the researcher from the original code, including:
The best approach to learn how to defeat these techniques is to understand more about their implementation. For example, simply searching Google for “Android anti-debugging techniques” will reveal a huge amount of research on the topic.
Different kinds of analysis and attack methodologies have been developed to address the unpacking problem. The following is a brief list of research projects aimed at automatically extracting the original code from a protected APK; they can be divided into two main groups:
Although the listed solutions are perfect for a quick assessment of the nature of the sample to be analyzed, sometimes it’s worth taking a deeper look at how a particular protection works and cataloguing all the techniques it uses. We can’t be sure that the dumped code is complete until we fully understand the unpacking mechanism used to load it. For example, a particular classes.dex file may be released only if a specific condition is met, or a class may not have been loaded at runtime yet, so the dumping mechanism missed it. Note that some of the listed research papers also went a step further and proposed possible solutions to the arbitrary dynamic loading problem.
These are the limitations of a fully dynamic approach, and that’s why static reversing should be combined with it when analysing protection code: it may give valuable information on the inner workings of the protection or on little-known anti-analysis tricks that may also be employed by malware in the wild.
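To make the dynamic side concrete, here is a minimal sketch of one common dumping idea: scanning a raw memory dump for DEX images by their magic and carving them out using the file_size field of the header. It is an illustration under stated assumptions, not the method of any specific tool mentioned above; the function name carve_dex is hypothetical, and a protector that wipes the magic or never keeps a contiguous image in memory would defeat it.

```python
import struct

DEX_MAGIC_PREFIX = b"dex\n"   # followed by three version digits and a NUL byte
DEX_HEADER_SIZE = 0x70        # size of header_item in the DEX format


def carve_dex(dump: bytes):
    """Return (offset, blob) pairs for every plausible DEX image in a dump."""
    found = []
    pos = dump.find(DEX_MAGIC_PREFIX)
    while pos != -1:
        header = dump[pos:pos + DEX_HEADER_SIZE]
        # Validate the rest of the magic: e.g. b"035\x00"
        if (len(header) == DEX_HEADER_SIZE
                and header[4:7].isdigit() and header[7] == 0):
            # file_size lives at 0x20, header_size at 0x24 (little-endian u32)
            file_size, header_size = struct.unpack_from("<II", header, 0x20)
            if (header_size == DEX_HEADER_SIZE
                    and DEX_HEADER_SIZE <= file_size <= len(dump) - pos):
                found.append((pos, dump[pos:pos + file_size]))
        pos = dump.find(DEX_MAGIC_PREFIX, pos + 1)
    return found
```

A carved image only reflects what was mapped at the moment of the dump, which is exactly the completeness problem described above.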
The sample we are going to explore has been chosen because a lot of Chinese applications are protected with it and several of them turned out to be malicious. The protection has been updated and strengthened over the past months, but the core ideas have been kept mostly unchanged.
This packer does not deliver obfuscation at smali level, but the protection’s stub is made of several layers that, at the end of the execution, will load the original classes.dex file in memory. The following description outlines how this particular protection has been analysed, with some hints that may be useful while reverse engineering similar samples.
Inspecting the protected APK with jadx, we can immediately see that the AndroidManifest.xml still contains all the original information in clear (e.g. permissions, activities, services, receivers and providers) and that the original APK resources don’t seem to be compressed or encrypted; other protectors may corrupt or obfuscate the manifest and the resources to thwart the analysis tools, but this is clearly not the case.
Taking a look at the identified package and decompiled classes, we notice that none of the original entry points declared in the AndroidManifest.xml are available; instead, a class named com.qihoo.util.StubApp1868252644 poses as the new entry point or, more appropriately, as the protection’s stub. The class inherits from android.app.Application [7]; this guarantees that it executes before any other application class when the process is created. According to the documentation, com.qihoo.util.StubApp1868252644 should be declared in the manifest’s application tag, and that is indeed where it can be found.
<application android:theme="@style/Theme.Background" android:label="@string/app_name" android:icon="@drawable/icon" android:name="com.qihoo.util.StubApp1868252644" android:allowBackup="false" android:largeHeap="true" android:supportsRtl="true" android:qihoo="activity">
The decompiled code also includes two other classes:
We took a look at the decompiled code through jadx and concluded that, although the manifest and the resources are there, the original code is clearly missing from the classes.dex file embedded in the APK. But is it really gone?
The classes.dex file size is roughly 4.3 MB, but it contains only 3 classes with not enough code to explain the size. The next step is looking at the DEX header:
The DEX header shows a data_size of 6104 bytes and a data_off value of 2712. If we go to offset 8816 (2712 + 6104), we clearly see that we haven’t reached the end of the DEX file as would usually be expected, so something is off. The first bytes at that offset don’t look really meaningful, but an educated eye may notice that the first 2 bytes form the string “qh”, which really looks like a magic value identifying the start of the Qihoo data section (spoiler: it is).
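The offset arithmetic can be reproduced with a few lines of Python. This is a minimal sketch: the field offsets (data_size at 0x68, data_off at 0x6C) come from the DEX header_item layout, while the function name dex_overlay is made up for illustration.

```python
import struct


def dex_overlay(dex: bytes):
    """Return (end_of_data, overlay), where overlay is whatever follows
    the data section declared in the DEX header."""
    # header_item: data_size at 0x68, data_off at 0x6C, both little-endian u32
    data_size, data_off = struct.unpack_from("<II", dex, 0x68)
    end_of_data = data_off + data_size
    return end_of_data, dex[end_of_data:]
```

Fed the contents of the protected classes.dex, the values quoted above give end_of_data = 2712 + 6104 = 8816, and overlay[:2] can be compared against b"qh" to confirm the magic.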
We can’t get much more information from the sequence of bytes; we can guess that it’s somehow encoded (e.g. notice that the 0x52 value repeats a lot: a hint of a possible simple XOR encoding?).
It’s time to move to the analysis of the com.qihoo.util.StubApp1868252644 code; the following is the decompiled source:
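A quick way to test the XOR hunch is a byte-frequency count. The sketch below assumes a single-byte XOR key and assumes that the most frequent plaintext byte is 0x00 (padding); both are guesses that the text does not confirm for this sample, and the helper names are hypothetical.

```python
from collections import Counter


def guess_xor_key(blob: bytes, assumed_plain: int = 0x00):
    """Guess a single-byte XOR key from the most frequent byte.

    Returns (key, ratio), where ratio is the share of the blob taken by
    the most frequent byte; a low ratio makes the guess less credible.
    """
    if not blob:
        raise ValueError("empty input")
    most_common_byte, count = Counter(blob).most_common(1)[0]
    return most_common_byte ^ assumed_plain, count / len(blob)


def xor_bytes(blob: bytes, key: int) -> bytes:
    """Apply a single-byte XOR to every byte of the blob."""
    return bytes(b ^ key for b in blob)
```

If the decoded output shows recognizable structure (strings, headers), the guess was right; if not, the encoding is something other than a plain single-byte XOR.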
package com.qihoo.util;

import android.app.Application;
import android.content.Context;
import android.os.Build;
import android.util.Log;
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.lang.reflect.Method;

public class StubApp1868252644 extends Application {
    private static Context context;
    public static Application newApp = null;
    public static Application runApp = null;
    private static String soName = "libjiagu";
    public static String strEntryApplication = "com.qihoo360.crypt.entryRunApplication";

    public static native void interface5(Application application);
    public static native String interface6(String str);
    public static native boolean interface7(Application application, Context context);
    public static native boolean interface8(Application application, Context context);
    public static native void mark();
    public static native int n0111();
    public static native long n01112(boolean z);
    public static native long n0112();
    public static native void n01120(long j);
    public static native long n01122(long j);
    public static native void n011230(long j, Object obj);
    public static native float n011231(long j, Object obj);
    public static native void n0112310(long j, Object obj, float f);
    public static native void n01123110(long j, Object obj, int i, boolean z);
    public static native void n011231110(long j, Object obj, int i, int i2, int i3);
    public static native long n0112312(long j, Object obj, int i);
    public static native long n011232(long j, Object obj);
    public static native void n0112322323230(long j, Object obj, long j2, long j3, Object obj2, long j4, Object obj3, long j5, Object obj4);
    public static native void n01123230(long j, Object obj, long j2, Object obj2);
    public static native boolean n01123231(long j, Object obj, long j2, Object obj2);
    public static native void n011232310(long j, Object obj, long j2, Object obj2, int i);
    public static native void n0112332310(long j, Object obj, Object obj2, long j2, Object obj3, boolean z);
    public static native void n01123330(long j, Object obj, Object obj2, Object obj3);
    public static native boolean n0112333111(long j, Object obj, Object obj2, Object obj3, int i, int i2);
    public static native int n011311131(Object obj, int i, int i2, int i3, Object obj2);
    public static native void n01131130(Object obj, int i, int i2, Object obj2);
    public static native boolean n0113311(Object obj, Object obj2, int i);
    public static native boolean n01133111(Object obj, Object obj2, int i, boolean z);
    public static native boolean n01133331(Object obj, Object obj2, Object obj3, Object obj4);
    public native Object n1113113(Object obj, int i, float f);
    public native Object n111313(Object obj, int i);
    public native Object n111313113(Object obj, int i, Object obj2, int i2, int i3);
    public native Object n111323(Object obj, long j);
    public native Object n11133(Object obj);
    public native boolean n111331(Object obj, Object obj2);

    public static Context getAppContext() {
        return context;
    }

    public static Application getNewAppInstance(Context context) {
        try {
            if (newApp == null) {
                ClassLoader classLoader = context.getClassLoader();
                if (classLoader != null) {
                    Class loadClass = classLoader.loadClass(strEntryApplication);
                    if (loadClass != null) {
                        newApp = (Application) loadClass.newInstance();
                    }
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return newApp;
    }

    public static void ChangeTopApplication() {
        try {
            interface7(newApp, runApp.getBaseContext());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public void onCreate() {
        super.onCreate();
        if (Configuration.ENABLE_CRASH_REPORT) {
            prepareInitCrashReport();
        }
        ChangeTopApplication();
        if (newApp != null) {
            interface5(newApp);
            newApp.onCreate();
        }
        if (Configuration.ENABLE_CRASH_REPORT) {
            initCrashReport();
        }
    }

    private void prepareInitCrashReport() {
        try {
            Class.forName("com.qihoo.bugreport.CrashReport").getDeclaredMethod("prepareInit", new Class[0]).invoke(null, new Object[0]);
        } catch (Throwable th) {
            Log.e("CRASH_REPORT", "Failed to reflect prepareInit method of Class CrashReport.");
        }
    }

    private void initCrashReport() {
        try {
            Class.forName("com.qihoo.bugreport.CrashReport").getDeclaredMethod("init", new Class[]{Context.class}).invoke(null, new Object[]{getApplicationContext()});
        } catch (Throwable th) {
            Log.e("CRASH_REPORT", "Failed to reflect init method of Class CrashReport.");
        }
    }

    public static Boolean isX86Arch() {
        try {
            for (String contains : Build.SUPPORTED_32_BIT_ABIS) {
                if (contains.contains("x86")) {
                    return Boolean.valueOf(true);
                }
            }
        } catch (NoSuchFieldError e) {
            if (Build.CPU_ABI.contains("x86") || Build.CPU_ABI2.contains("x86")) {
                return Boolean.valueOf(true);
            }
            try {
                RandomAccessFile randomAccessFile = new RandomAccessFile("/system/build.prop", "r");
                String readLine = randomAccessFile.readLine();
                while (readLine != null) {
                    if (readLine.contains("ro.product.cpu.abi") && readLine.contains("x86")) {
                        return Boolean.valueOf(true);
                    }
                    readLine = randomAccessFile.readLine();
                }
            } catch (FileNotFoundException e2) {
                e2.printStackTrace();
            } catch (IOException e3) {
                e3.printStackTrace();
            }
        }
        return Boolean.valueOf(false);
    }

    private void initAssetForNative() {
        try {
            Class.forName("com.qihoo.dexjiagu.TransitMgr").getMethod("initAssetForNative", new Class[]{Context.class}).invoke(null, new Object[]{this});
        } catch (Exception e) {
        }
    }

    protected void attachBaseContext(Context context) {
        super.attachBaseContext(context);
        // jadx emitted "context = context;" because the parameter shadows the
        // static field; the field read by getAppContext() is the intended target.
        StubApp1868252644.context = context;
        if (newApp == null) {
            String absolutePath = context.getFilesDir().getAbsolutePath();
            Boolean isX86Arch = isX86Arch();
            Boolean valueOf = Boolean.valueOf(false);
            if (Build.CPU_ABI.contains("64") || Build.CPU_ABI2.contains("64")) {
                valueOf = Boolean.valueOf(true);
            }
            if (isX86Arch.booleanValue()) {
                copy(context, soName + "_x86.so", absolutePath, soName + ".so");
            } else {
                copy(context, soName + ".so", absolutePath, soName + ".so");
            }
            if (valueOf.booleanValue()) {
                if (isX86Arch.booleanValue()) {
                    copy(context, soName + "_x64.so", absolutePath, soName + "_64.so");
                } else {
                    copy(context, soName + "_a64.so", absolutePath, soName + "_64.so");
                }
                System.load(absolutePath + "/" + soName + "_64.so");
            } else {
                System.load(absolutePath + "/" + soName + ".so");
            }
        }
        if (runApp == null) {
            runApp = this;
        }
        newApp = getNewAppInstance(context);
        if (newApp != null) {
            try {
                Method declaredMethod = Application.class.getDeclaredMethod("attach", new Class[]{Context.class});
                if (declaredMethod != null) {
                    declaredMethod.setAccessible(true);
                    declaredMethod.invoke(newApp, new Object[]{context});
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        interface8(newApp, context);
        initAssetForNative();
    }

    public static boolean copy(Context context, String str, String str2, String str3) {
        String str4 = str2 + "/" + str3;
        File file = new File(str2);
        if (!file.exists()) {
            file.mkdir();
        }
        try {
            file = new File(str4);
            if (file.exists()) {
                boolean z;
                InputStream open = context.getResources().getAssets().open(str);
                InputStream fileInputStream = new FileInputStream(file);
                BufferedInputStream bufferedInputStream = new BufferedInputStream(open);
                BufferedInputStream bufferedInputStream2 = new BufferedInputStream(fileInputStream);
                if (isSameFile(bufferedInputStream, bufferedInputStream2)) {
                    z = true;
                } else {
                    z = false;
                }
                open.close();
                fileInputStream.close();
                bufferedInputStream.close();
                bufferedInputStream2.close();
                if (z) {
                    return z;
                }
            }
            InputStream open2 = context.getResources().getAssets().open(str);
            FileOutputStream fileOutputStream = new FileOutputStream(str4);
            byte[] bArr = new byte[7168];
            while (true) {
                int read = open2.read(bArr);
                if (read <= 0) {
                    break;
                }
                fileOutputStream.write(bArr, 0, read);
            }
            fileOutputStream.close();
            open2.close();
            try {
                Runtime.getRuntime().exec("chmod 755 " + str4);
            } catch (Exception e) {
            }
            return true;
        } catch (Exception e2) {
            e2.printStackTrace();
            return false;
        }
    }

    public static boolean isSameFile(BufferedInputStream bufferedInputStream, BufferedInputStream bufferedInputStream2) {
        try {
            int available = bufferedInputStream.available();
            int available2 = bufferedInputStream2.available();
            if (available != available2) {
                return false;
            }
            byte[] bArr = new byte[available];
            byte[] bArr2 = new byte[available2];
            bufferedInputStream.read(bArr);
            bufferedInputStream2.read(bArr2);
            for (available2 = 0; available2 < available; available2++) {
                if (bArr[available2] != bArr2[available2]) {
                    return false;
                }
            }
            return true;
        } catch (FileNotFoundException e) {
            e.printStackTrace();
            return false;
        } catch (IOException e2) {
            e2.printStackTrace();
            return false;
        }
    }
}
The source code is not obfuscated in any way, and the entry point can be identified with the attachBaseContext [8] method:
Once the library is loaded, execution passes to the native code. Before proceeding, we need to make sure that we don’t miss any code executed during the loading phase of the native library. In fact, the ELF structure and linker documentation details a step taken by the runtime linker before control is passed to the entry point of the shared library (that is, JNI_OnLoad): meet the “Initialization and Termination Routines”. Citing the documentation [9, 10]:
“The .preinit_array, .init_array, and .init sections, are created by the link-editor when a dynamic object is built. These sections are labeled with the .dynamic tags DT_PREINIT_ARRAY, DT_INIT_ARRAY and DT_INIT respectively. The functions whose addresses are contained in the arrays specified by DT_PREINIT_ARRAY and DT_INIT_ARRAY are executed by the runtime linker in the same order in which their addresses appear in the array.”
To check for the presence of the initialization sections, we can either use 010 Editor with its ELF template or write a simple script with LIEF [11] to extract the needed information:
import lief

if __name__ == "__main__":
    library = lief.parse("libjiagu.so")

    print("[+] ELF Header")
    print(library.header)

    print("[+] Initialization and Termination Routines")
    # Dynamic tags describing the routines run by the runtime linker.
    # The enum names are those of the LIEF release used for this analysis.
    INIT_ENTRIES = [
        lief.ELF.DYNAMIC_TAGS.INIT, lief.ELF.DYNAMIC_TAGS.INIT_ARRAY, lief.ELF.DYNAMIC_TAGS.INIT_ARRAYSZ,  # .init_array
        lief.ELF.DYNAMIC_TAGS.PREINIT_ARRAY, lief.ELF.DYNAMIC_TAGS.PREINIT_ARRAYSZ,  # .preinit_array
        lief.ELF.DYNAMIC_TAGS.FINI, lief.ELF.DYNAMIC_TAGS.FINI_ARRAY, lief.ELF.DYNAMIC_TAGS.FINI_ARRAYSZ,  # .fini_array
    ]
    for entry in library.dynamic_entries:
        if entry.tag in INIT_ENTRIES:
            print(entry)
Executing the script on the native library reveals the presence of a termination routine at offset 0x1a00 and the absence of any initialization routine. Apparently, newer versions of the protector have two function offsets in the .init_array section; the functions should be used to initialize some strings and to clear the dynamic section once the ELF is completely loaded, as a way to hinder dynamic analysis. At this point, we can safely start our analysis from the conventional entry point function JNI_OnLoad; for the static and dynamic analysis, IDA will be our tool of choice.
After loading the APK in one IDA window and the native library in another, as explained in this blogpost, we can start our dynamic analysis in the native world. The JNI_OnLoad function retrieves the JNIEnv pointer as usual and then jumps to a quite interesting function that I renamed VM_ENTER.
The function is interesting because, if you are familiar with virtual machine obfuscation, you can identify this snippet of code as the VM_ENTER function executed right before jumping to the virtual machine execution loop. It implements some common operations:
It then jumps to another function renamed as EXECUTE_BYTECODE which implements the following operations:
The VIRTUAL_MACHINE function execution graph may look a bit scary at first but it’s made of many single blocks that should be approached individually to understand the flow.
The virtual machine body will now be explained and the analysis of two simple virtual instructions will be detailed. All references to ARM registers will apply solely to the analysed sample. The main point about this explanation is giving an insight on how to reverse a virtual machine, but it’s not to be assumed as a general solution to the virtual machine obfuscation problem. In the Attachments section, a simple de-virtualizer script can be downloaded to check how the virtual semantic has been converted to a pseudo-ARM semantic.
The protection’s implementation of the virtual machine loop follows a pretty common execution sequence, but other solutions, although similar, may follow a completely different approach [12].
The execution loop is made of the following phases:
Each virtual opcode has different semantics and updates the VM_REG_CTX and the VM_BYTECODE_INDEX accordingly.
The two VM instructions that we are going to look at have been named VM_NOP and VM_CALL. For the following analysis the execution context is the following:
This is the simplest of all the instructions and, as the name suggests, it does nothing; in fact NOP stands for No OPeration. The virtual PC is incremented by 1 and saved.
This is an important virtual instruction which is used to call all the functions or APIs from the virtualized code. Hooking the following code will help understand the anti-debug tricks and the unpacking phase before jumping to the second native stub of the protection.
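The fetch-and-dispatch structure and the two instructions just described can be sketched as a toy interpreter. Everything concrete here is hypothetical: the opcode values, the one-byte call operand, the VM_EXIT opcode (added only so that the loop terminates) and the single-register context do not reflect the real bytecode encoding of the sample. The trace argument plays the role of a hook on VM_CALL.

```python
VM_NOP = 0x00    # hypothetical opcode value
VM_CALL = 0x01   # hypothetical opcode value, followed by a 1-byte call index
VM_EXIT = 0xFF   # hypothetical, only here so the toy loop can stop


def run(bytecode: bytes, call_table, trace=None):
    """Interpret toy bytecode and return the final register context."""
    regs = {"r0": 0}   # stand-in for VM_REG_CTX
    pc = 0             # stand-in for VM_BYTECODE_INDEX
    while True:
        opcode = bytecode[pc]                 # fetch
        if opcode == VM_NOP:                  # dispatch on the opcode
            pc += 1                           # nothing to do, advance the PC
        elif opcode == VM_CALL:
            target = bytecode[pc + 1]         # decode the operand
            if trace is not None:
                trace.append((pc, target))    # "hook": log every call site
            regs["r0"] = call_table[target](regs["r0"])
            pc += 2
        elif opcode == VM_EXIT:
            return regs
        else:
            raise ValueError(f"unknown opcode {opcode:#x} at {pc}")
```

Logging at the call handler is what makes the virtualized code readable in practice: the sequence of call targets already outlines what the bytecode is doing, even before every other opcode is understood.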
To be able to write the de-virtualizer, the following steps have been taken:
The Python code is obviously not production ready, but it’s good enough to analyze the virtualized code in the sample and to be used as a base for building similar de-virtualizers. The real challenge has been understanding how the virtual machine handles conditional control flow (which is entirely based on the value of the virtual CPSR register) and correctly implementing the semantics of the helper functions used by the virtual machine (e.g. arithmetic, bit-testing and control-flow functions).
The sample in question has four virtualized bytecode sequences; the control flow graphs of the first and third functions have been generated as examples:
As can be seen, there are a lot of BLX calls, and their destination addresses have been identified. Although the de-virtualized code may not be perfect at first, even at an early stage it gives an indication of which operations are carried out and which functions are called.
Another analysis of a newer version of the protector is available [13], but, translating from Chinese to English, it’s not clear whether the virtualization mechanism has been removed or drastically changed. The analysis makes it clear that a lot of jumps are executed during the anti-debugging steps, so it seems the virtualization is still in place but was probably ignored during the research phase.
Like every respectable protector, this one also relies on anti-debug checks in the early stage of the unpacking phase. In particular, the first bytecode sequence embeds all of the BLX calls to the anti-debugging functions and alters the execution accordingly (e.g. killing the process with raise(SIGKILL)).
A brief list of the anti-debug checks present in the sample is:
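As an illustration of what modelling CPSR-driven control flow involves, the sketch below computes the N, Z, C and V flags of a 32-bit ARM comparison and evaluates the standard ARM condition codes against them. It assumes the virtual CPSR follows regular ARM flag semantics, which the analysis suggests but which is not spelled out here; the function names are hypothetical and this is not the code of the de-virtualizer itself.

```python
MASK32 = 0xFFFFFFFF


def cmp_flags(a: int, b: int):
    """Return the (N, Z, C, V) flags set by ARM `CMP a, b` on 32-bit values."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    n = result >> 31                        # negative: top bit of the result
    z = int(result == 0)                    # zero
    c = int(a >= b)                         # carry set means "no borrow"
    v = ((a ^ b) & (a ^ result)) >> 31      # signed overflow of a - b
    return n, z, c, v


# Standard ARM condition codes expressed over (N, Z, C, V)
CONDITIONS = {
    "EQ": lambda n, z, c, v: z == 1,
    "NE": lambda n, z, c, v: z == 0,
    "HS": lambda n, z, c, v: c == 1,
    "LO": lambda n, z, c, v: c == 0,
    "MI": lambda n, z, c, v: n == 1,
    "PL": lambda n, z, c, v: n == 0,
    "VS": lambda n, z, c, v: v == 1,
    "VC": lambda n, z, c, v: v == 0,
    "HI": lambda n, z, c, v: c == 1 and z == 0,
    "LS": lambda n, z, c, v: c == 0 or z == 1,
    "GE": lambda n, z, c, v: n == v,
    "LT": lambda n, z, c, v: n != v,
    "GT": lambda n, z, c, v: z == 0 and n == v,
    "LE": lambda n, z, c, v: z == 1 or n != v,
}


def branch_taken(cond: str, flags) -> bool:
    """Tell whether a conditional branch with the given code would be taken."""
    return CONDITIONS[cond](*flags)
```

With helpers like these, each conditional virtual branch can be rewritten as a pseudo-ARM `B<cond>` with two successors, which is what makes it possible to draw a control flow graph of the bytecode.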
After all anti-debug checks have passed, the unpacking stage of the second stub starts; the execution can be summed up in the following steps:
The second stub ELF file has been uncompressed in memory, but it has not been properly loaded by the system. In fact, the native library integrates a ripped portion of the system ELF loader and dynamic linker aimed at properly initializing the second stub. The process consists of the following phases [14]:
The second stub contains a lot more code than the loaded native library. In fact, its main purpose is to identify the Android execution environment (ART or Dalvik), then decrypt and load the original classes.dex (more than one if MultiDex is supported). The loading steps can be summed up as follows:
During the analysis various tools have been used to get more information about the file structures or the code itself. Here is the recap:
This is a list of resources that have been developed during the analysis of the protection; they can be found at the following repository.