Today I got a chance to try Nick Harbour's Tcpxtract program. I had heard of it several months ago, but I had trouble compiling it on FreeBSD. Just now I tried the regular ./configure, make, make install routine using version 1.0.1 and had no problems.
Tcpxtract searches Libpcap traces for file formats it recognizes, using the following configuration file:
#---------------------------------------------------------------------
# ANIMATION FILES
#---------------------------------------------------------------------
#
# AVI (Windows animation and DiVX/MPEG-4 movies)
avi(4000000, RIFF\?\?\?\?);
# MPEG Video
mpg(4000000, \x00\x00\x01\xba, \x00\x00\x01\xb9);
mpg(4000000, \x00\x00\x01\xb3, \x00\x00\x01\xb7);
# Macromedia Flash
fws(4000000, FWS);
#---------------------------------------------------------------------
# GRAPHICS FILES
#---------------------------------------------------------------------
#
#
# AOL ART files
art(150000, \x4a\x47\x04\x0e, \xcf\xc7\xcb);
art(150000, \x4a\x47\x03\x0e, \xd0\xcb\x00\x00);
# GIF and JPG files (very common)
gif(3000000, \x47\x49\x46\x38\x37\x61, \x00\x3b);
gif(3000000, \x47\x49\x46\x38\x39\x61, \x00\x00\x3b);
jpg(1000000, \xff\xd8\xff\xe0\x00\x10, \xff\xd9);
jpg(1000000, \xff\xd8\xff\xe1);
# PNG (used in web pages)
png(1000000, \x50\x4e\x47\?, \xff\xfc\xfd\xfe);
# BMP (used by MSWindows, use only if you have reason to think there are
# BMP files worth digging for. This often kicks back a lot of false
# positives
bmp(100000, BM\?\?\x00\x00\x00);
# TIF
tif(200000000, \x49\x49\x2a\x00);
#---------------------------------------------------------------------
# MICROSOFT OFFICE
#---------------------------------------------------------------------
#
# Word documents
doc(12500000, \xd0\xcf\x11\xe0\xa1\xb1);
# Outlook files
pst(400000000, \x21\x42\x4e\xa5\x6f\xb5\xa6);
ost(400000000, \x21\x42\x44\x4e);
# Outlook Express
dbx(4000000, \xcf\xad\x12\xfe\xc5\xfd\x74\x6f);
idx(4000000, \x4a\x4d\x46\x39);
mbx(4000000, \x4a\x4d\x46\x36);
#
#---------------------------------------------------------------------
# HTML
#---------------------------------------------------------------------
html(50000, \x3chtml, \x3c\x2fhtml\x3e);
#---------------------------------------------------------------------
# ADOBE PDF
#---------------------------------------------------------------------
pdf(5000000, \x25PDF, \x25EOF\x0d);
#---------------------------------------------------------------------
# AOL (AMERICA ONLINE)
#---------------------------------------------------------------------
#
# AOL Mailbox
mail(500000, \x41\x4f\x4c\x56\x4d);
#---------------------------------------------------------------------
# SOUND FILES
#---------------------------------------------------------------------
# wav will be captured as avi.
# Real Audio Files
ra(1000000, \x2e\x72\x61\xfd);
ra(1000000, \x2eRMF);
#---------------------------------------------------------------------
# MISCELLANEOUS
#---------------------------------------------------------------------
#
zip(10000000, PK\x03\x04, \x3c\xac);
java(1000000, \xca\xfe\xba\xbe);
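To make the rule format easier to read, here is a minimal Python sketch that parses one rule line. It rests on my reading of the listing above, not on Tcpxtract's source: I assume each rule is `type(max_size, header[, footer]);`, that `\xNN` is a raw byte, and that `\?` matches any single byte. The names `parse_rule` and `pattern_to_regex` are hypothetical helpers of my own.

```python
import re

# Assumed rule shape: name(max_size, header[, footer]);
RULE_RE = re.compile(r'^(\w+)\((\d+),\s*([^,)]+)(?:,\s*([^)]+))?\);$')

# One token is a \xNN escape, a \? wildcard, or a single literal character.
TOKEN_RE = re.compile(r'\\x([0-9a-fA-F]{2})|\\(\?)|(.)', re.DOTALL)


def pattern_to_regex(pattern):
    """Translate a signature pattern into a compiled bytes regex."""
    parts = []
    for hex_byte, wildcard, literal in TOKEN_RE.findall(pattern):
        if hex_byte:
            parts.append(re.escape(bytes([int(hex_byte, 16)])))
        elif wildcard:
            parts.append(b'.')  # assumption: \? means "any one byte"
        else:
            parts.append(re.escape(literal.encode('latin-1')))
    return re.compile(b''.join(parts), re.DOTALL)


def parse_rule(line):
    """Parse one non-comment rule line into a dict, or return None."""
    match = RULE_RE.match(line.strip())
    if match is None:
        return None
    name, max_size, header, footer = match.groups()
    return {
        'name': name,
        'max_size': int(max_size),
        'header': pattern_to_regex(header.strip()),
        'footer': pattern_to_regex(footer.strip()) if footer else None,
    }


jpg_rule = parse_rule(r'jpg(1000000, \xff\xd8\xff\xe0\x00\x10, \xff\xd9);')
avi_rule = parse_rule(r'avi(4000000, RIFF\?\?\?\?);')
```

Here `jpg_rule['header']` matches data beginning with the JFIF start bytes, while `avi_rule['footer']` is `None` because that rule lists no footer.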
Here are the program's options. Note it can listen to an interface or read a trace.
orr:/var/tmp/tcpxtract$ tcpxtract
Usage: tcpxtract [OPTIONS] [[-d <DEVICE>] [-f <FILE>]]
Valid options include:
  --file, -f <FILE>          to specify an input capture file instead of a device
  --device, -d <DEVICE>      to specify an input device (i.e. eth0)
  --config, -c <FILE>        use FILE as the config file
  --output, -o <DIRECTORY>   dump files to DIRECTORY instead of current directory
  --version, -v              display the version number of this program
  --help, -h                 display this lovely screen
Here is Tcpxtract in action on a trace containing a visit to a Web site.
orr:/var/tmp/tcpxtract$ tcpxtract -f test.lpc
Found file of type "html" in session [192.168.2.7:14348 -> 192.168.2.5:48117], exporting to 00000000.html
Found file of type "gif" in session [192.168.2.7:14348 -> 192.168.2.5:22002], exporting to 00000001.gif
Found file of type "jpg" in session [192.168.2.7:14348 -> 192.168.2.5:48117], exporting to 00000002.jpg
Found file of type "gif" in session [192.168.2.7:14348 -> 192.168.2.5:43975], exporting to 00000003.gif
Found file of type "gif" in session [192.168.2.7:14348 -> 192.168.2.5:45023], exporting to 00000004.gif
Found file of type "html" in session [192.168.2.7:14348 -> 192.168.2.5:22002], exporting to 00000005.html
Tcpxtract was able to reconstruct the HTML for my cousin's Bejtlich.com consulting company site. It also rebuilt the three GIFs used as graphics on the page.
orr:/var/tmp/tcpxtract$ file 0000*
00000000.html: HTML document text
00000001.gif: GIF image data, version 89a, 305 x 106
00000002.jpg: JPEG image data, JFIF standard 1.02
00000003.gif: GIF image data, version 89a, 31 x 31
00000004.gif: GIF image data, version 89a, 147 x 214
00000005.html: HTML document text
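The "Found file" lines Tcpxtract prints have a regular shape, so they are easy to tally with a script. The following is a rough sketch that assumes the line format is exactly what appears in the run above; the regular expression and the helper names (`parse_found_lines`, `count_by_type`) are my own, not part of Tcpxtract.

```python
import re
from collections import Counter

# Matches lines of the form shown in the run above.
LINE_RE = re.compile(
    r'Found file of type "(?P<type>[^"]+)" in session '
    r'\[(?P<src>[\d.]+):(?P<sport>\d+) -> (?P<dst>[\d.]+):(?P<dport>\d+)\], '
    r'exporting to (?P<path>\S+)'
)


def parse_found_lines(text):
    """Return one dict per 'Found file' line in the given output."""
    records = []
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            records.append(match.groupdict())
    return records


def count_by_type(records):
    """Count the carved files per file type."""
    return Counter(record['type'] for record in records)


SAMPLE = '''\
Found file of type "html" in session [192.168.2.7:14348 -> 192.168.2.5:48117], exporting to 00000000.html
Found file of type "gif" in session [192.168.2.7:14348 -> 192.168.2.5:22002], exporting to 00000001.gif
Found file of type "jpg" in session [192.168.2.7:14348 -> 192.168.2.5:48117], exporting to 00000002.jpg
Found file of type "gif" in session [192.168.2.7:14348 -> 192.168.2.5:43975], exporting to 00000003.gif
Found file of type "gif" in session [192.168.2.7:14348 -> 192.168.2.5:45023], exporting to 00000004.gif
Found file of type "html" in session [192.168.2.7:14348 -> 192.168.2.5:22002], exporting to 00000005.html
'''

records = parse_found_lines(SAMPLE)
counts = count_by_type(records)
```

Grouping the same records by session instead of by type is a quick way to spot a single connection that produced an unexpected number of files.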
Tcpxtract is not foolproof. Here is a download of putty.zip via FTP through HTTP; that is, the client asks an HTTP proxy to fetch an ftp:// URL. As seen by Tethereal:
4 1.594194 192.168.2.5 52245 192.168.2.7 3128 HTTP GET ftp://ftp.tartarus.org/pub/people/simon/putty-snapshots/x86/putty.zip HTTP/1.1
Let's see how Tcpxtract handles this trace.
orr:/var/tmp/tcpxtract$ tcpxtract -f test2.lpc
Found file of type "zip" in session [192.168.2.7:14348 -> 192.168.2.5:5580], exporting to 00000000.zip
Found file of type "zip" in session [192.168.2.7:14348 -> 192.168.2.5:5580], exporting to 00000001.zip
Found file of type "zip" in session [192.168.2.7:14348 -> 192.168.2.5:5580], exporting to 00000002.zip
Found file of type "zip" in session [192.168.2.7:14348 -> 192.168.2.5:5580], exporting to 00000003.zip
Found file of type "zip" in session [192.168.2.7:14348 -> 192.168.2.5:5580], exporting to 00000004.zip
Found file of type "zip" in session [192.168.2.7:14348 -> 192.168.2.5:5580], exporting to 00000005.zip
Found file of type "zip" in session [192.168.2.7:14348 -> 192.168.2.5:5580], exporting to 00000006.zip
Found file of type "zip" in session [192.168.2.7:14348 -> 192.168.2.5:5580], exporting to 00000007.zip
Tcpxtract creates 8 .zip files:
orr:/var/tmp/tcpxtract$ ls -al *.zip
-rwx------ 1 richard wheel 297 Jan 3 11:55 00000000.zip
-rwx------ 1 richard wheel 6407 Jan 3 11:55 00000001.zip
-rwx------ 1 richard wheel 213520 Jan 3 11:55 00000002.zip
-rwx------ 1 richard wheel 42590 Jan 3 11:55 00000003.zip
-rwx------ 1 richard wheel 23523 Jan 3 11:55 00000004.zip
-rwx------ 1 richard wheel 10386 Jan 3 11:55 00000005.zip
-rwx------ 1 richard wheel 38498 Jan 3 11:55 00000006.zip
-rwx------ 1 richard wheel 94888 Jan 3 11:55 00000007.zip
orr:/var/tmp/tcpxtract$ file *.zip
00000000.zip: Zip archive data, at least v2.0 to extract
00000001.zip: Zip archive data, at least v2.0 to extract
00000002.zip: Zip archive data, at least v2.0 to extract
00000003.zip: Zip archive data, at least v2.0 to extract
00000004.zip: Zip archive data, at least v2.0 to extract
00000005.zip: Zip archive data, at least v2.0 to extract
00000006.zip: Zip archive data, at least v2.0 to extract
00000007.zip: Zip archive data, at least v2.0 to extract
None of them resembles the real file:
orr:/var/tmp/tcpxtract$ ls -al /home/richard/putty.zip
-rw-r--r-- 1 richard richard 1069490 Jan 3 11:49 /home/richard/putty.zip
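One possible explanation for the eight fragments, which I have not verified against Tcpxtract's source: a ZIP archive stores a local file header beginning with `PK\x03\x04` in front of every member, and the `zip` rule in the configuration file uses exactly those bytes as its header signature. The unzip listing later in this post shows that putty.zip has eight members, so a carver keyed on that signature could plausibly start a new file at each one. The Python sketch below only demonstrates the one-header-per-member property on a synthetic archive; `build_archive` and `count_local_headers` are hypothetical helpers.

```python
import io
import zipfile

# Signature that opens every local file header in a ZIP archive.
LOCAL_FILE_HEADER = b"PK\x03\x04"


def build_archive(member_names):
    """Build an uncompressed in-memory ZIP with one small member per name."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as archive:
        for name in member_names:
            archive.writestr(name, b"placeholder contents for " + name.encode())
    return buffer.getvalue()


def count_local_headers(data):
    """Count occurrences of the local file header signature.

    This is a naive byte search: compressed payloads could contain the
    same four bytes by chance, so treat the result as a heuristic.
    """
    return data.count(LOCAL_FILE_HEADER)


names = ["pageant.exe", "putty.hlp", "putty.cnt"]
archive_bytes = build_archive(names)
header_count = count_local_headers(archive_bytes)
```

For this synthetic archive the count equals the number of members, since the central directory and end-of-archive records use different signatures.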
On the other hand, Tcpflow has a little more success, although it is confused by the HTTP traffic over Squid.
orr:/var/tmp/tcpxtract$ tcpflow -r test2.lpc
orr:/var/tmp/tcpxtract$ ls -al 192*
-rw-r--r-- 1 richard wheel 547 Jan 3 11:55 192.168.002.005.52245-192.168.002.007.03128
-rw-r--r-- 1 richard wheel 288 Jan 3 11:55 192.168.002.007.00022-192.168.002.005.51747
-rw-r--r-- 1 richard wheel 1069768 Jan 3 11:55 192.168.002.007.03128-192.168.002.005.52245
orr:/var/tmp/tcpxtract$ file 192*
192.168.002.005.52245-192.168.002.007.03128: ASCII text, with CRLF line terminators
192.168.002.007.00022-192.168.002.005.51747: data
192.168.002.007.03128-192.168.002.005.52245: data
orr:/var/tmp/tcpxtract$ unzip -l 192.168.002.007.03128-192.168.002.005.52245
Archive: 192.168.002.007.03128-192.168.002.005.52245
warning [192.168.002.007.03128-192.168.002.005.52245]: 278 extra bytes at beginning or within zipfile
(attempting to process anyway)
Length Date Time Name
-------- ---- ---- ----
131072 01-02-06 21:03 pageant.exe
608818 01-02-06 19:30 putty.hlp
29840 01-02-06 19:30 putty.cnt
274432 01-02-06 21:03 plink.exe
286720 01-02-06 21:03 pscp.exe
286720 01-02-06 21:03 psftp.exe
434176 01-02-06 21:03 putty.exe
167936 01-02-06 21:03 puttygen.exe
-------- -------
2219714 8 files
orr:/var/tmp/tcpxtract$ unzip 192.168.002.007.03128-192.168.002.005.52245
Archive: 192.168.002.007.03128-192.168.002.005.52245
warning [192.168.002.007.03128-192.168.002.005.52245]: 278 extra bytes at beginning or within zipfile
(attempting to process anyway)
inflating: pageant.exe
inflating: putty.hlp
inflating: putty.cnt
inflating: plink.exe
inflating: pscp.exe
inflating: psftp.exe
inflating: putty.exe
inflating: puttygen.exe
The pageant.exe file worked on a Windows system to which I transferred it.
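The numbers above suggest where unzip's "278 extra bytes" come from: the Tcpflow output for the proxy-to-client direction is 1069768 bytes and the real putty.zip is 1069490 bytes, a difference of exactly 278. That is consistent with the HTTP response headers from the proxy sitting in front of the archive, although I did not inspect the bytes to confirm it. Under that assumption, a minimal Python sketch for splitting the headers from the body could look like this; `split_http_response` is a hypothetical helper, and it assumes a single, non-chunked response in the stream.

```python
HEADER_END = b"\r\n\r\n"


def split_http_response(raw):
    """Split a reassembled HTTP response stream into (headers, body).

    Assumes the stream holds exactly one response and that the body is
    not chunked or otherwise transfer-encoded.
    """
    if not raw.startswith(b"HTTP/"):
        raise ValueError("stream does not start with an HTTP status line")
    index = raw.find(HEADER_END)
    if index == -1:
        raise ValueError("no blank line terminating the HTTP headers")
    split_at = index + len(HEADER_END)
    return raw[:split_at], raw[split_at:]


# Synthetic example standing in for the Tcpflow output file.
example = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-Type: application/zip\r\n"
    b"\r\n"
    b"PK\x03\x04rest-of-archive"
)
headers, body = split_http_response(example)
```

To try it on real data, read the Tcpflow output file in binary mode, pass the bytes to the function, and write the returned body to a new file before comparing its size with the original download.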
I really look forward to seeing Tcpxtract develop, and I hope to add some file formats to the configuration file. I might also try to hear the interview with Nick at CyberSpeak.