DPKT
http://code.google.com/p/dpkt/
http://jon.oberheide.org/blog/2008/10/15/dpkt-tutorial-2-parsing-a-pcap-file/
There are a few worked examples around the internet on using this, but again it lacks comprehensive documentation. As a result I tried implementing it, only to get segfaults (the most annoying of all errors). I attributed this to excessive memory consumption again.
PYPCAP
http://code.google.com/p/pypcap/
http://www.gradstein.info/python/how-to-understand-the-arp-queries-and-replies-fields-with-pypcap/
Pypcap is handily installed through the python-pypcap package on Ubuntu, but the examples given show how to handle packets as you receive them on the interface. Since I needed to extract information from an already saved capture file, this wasn't terribly helpful. Also, since I have a very limited attention span for poorly documented endeavours, I kept on looking.
LIBPCAP
Now for most people this would be the most obvious choice, and so it should have been. This is essentially just a wrapper for the C library. Or so I am told; I can't even pretend to have tried the C implementation of packet capture reading. I expect I would have a lot less hair by the end of it.
The operations for getting the code working are fairly self-explanatory. But then I would say that, having just done it. It certainly wasn't self-explanatory at the time. So here goes:
- Import scapy.layers.all and pcap. The first is required for the "RadioTap()" class and the second, primarily, for the "pcapObject" class and all its methods
- Calling "pcap.pcapObject()" defines "p" as an object which can hold pcap data, because obviously you need somewhere to keep it
- As a result "p" has a series of methods that can be called on it, the first of which opens the pcap. So for each pcap in the directory we open it using the open_offline() method.
- In my case I need to look at the radiotap data as well as all subsequent data contained within each packet, so again we need somewhere to keep it. Hence "packet=RadioTap()"
- The second method of "p" we need is "p.next()", which steps through the packets in the file one call at a time. I store what it returns in "pkt"
- The result is that pkt is now a sequence which contains [0] *Not sure* [1] Undissected packet information [2] RadioTap received timestamp
- In order to dissect the information we have to run it through the RadioTap parser, hence "packet.dissect(pkt[1])".
- The result is a nicely parsed packet, with each of the fields available as an attribute, such as ".type" in the script below
- Hey presto, we can read packets. However, we can only ever move on to the next packet; we don't know how many packets are actually in the file, unlike when we loaded the entire file in one go.
- We need a way of terminating the loop cleanly, though, so we use a try-except. When p.next() runs out of packets, the code raises a TypeError. Catching this exception allows us to finish the capture file cleanly.
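For comparison, the same "one packet at a time, stop cleanly at the end" loop can be sketched with nothing but the standard library. This is only an illustration under stated assumptions: it handles classic little-endian pcap files (magic number 0xa1b2c3d4, microsecond timestamps) and not pcapng, and the names iter_pcap and build_pcap are my own, not part of any of the libraries above. The order of the yielded tuple is also my own choice.

```python
import io
import struct

PCAP_MAGIC_LE = 0xA1B2C3D4  # classic pcap, little-endian, microsecond timestamps


def iter_pcap(stream):
    """Yield (captured_length, data, timestamp) for each record in the stream."""
    header = stream.read(24)  # the global file header is 24 bytes
    if len(header) < 24:
        raise ValueError("truncated pcap global header")
    if struct.unpack("<I", header[:4])[0] != PCAP_MAGIC_LE:
        raise ValueError("not a little-endian classic pcap file")
    while True:
        record = stream.read(16)  # per-packet record header is 16 bytes
        if len(record) < 16:
            return  # end of file: the generator simply stops
        ts_sec, ts_usec, caplen, _origlen = struct.unpack("<IIII", record)
        data = stream.read(caplen)
        if len(data) < caplen:
            return  # truncated final record, stop quietly
        yield caplen, data, ts_sec + ts_usec / 1e6


def build_pcap(payloads):
    """Build a tiny in-memory capture (hypothetical helper, for the demo only)."""
    # version 2.4, snaplen 65535, link type 127 (802.11 with radiotap header)
    out = struct.pack("<IHHiIII", PCAP_MAGIC_LE, 2, 4, 0, 0, 65535, 127)
    for n, payload in enumerate(payloads):
        out += struct.pack("<IIII", 1000 + n, 0, len(payload), len(payload))
        out += payload
    return out


packets = list(iter_pcap(io.BytesIO(build_pcap([b"abc", b"de"]))))
print(len(packets))
```

Because the reader is a generator, a plain for loop ends by itself when the file runs out, so no try-except is needed to detect the last packet.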
## new_read_pcap.py
## This script reads the contents of pcap files in a directory and summarises the information contained within
from scapy.layers.all import *
import pcap, os

## To have the user input the directory
dir_path = raw_input('Give the full path to the directory of the pcap files: ')

list_data = []
list_mgmt = []
list_ctrl = []
list_unkn = []
total_packets = 0

p = pcap.pcapObject()
for pcaps in os.listdir(dir_path):
    flag = True
    i = 0
    # os.path.join works whether or not dir_path ends with a slash
    p.open_offline(os.path.join(dir_path, pcaps))
    while flag is not False:
        # packet=RadioTap() # Was originally here
        try:
            pkt = p.next()
            packet = RadioTap() # But needs to go here!
            packet.dissect(pkt[1])
            # Count only after a packet has really been read, otherwise the
            # failed read at the end of each file is counted as well
            i += 1
            if packet.type == 2:
                # type = data
                list_data.append(i)
            elif packet.type == 0:
                # type = management
                list_mgmt.append(i)
            elif packet.type == 1:
                # type = control
                list_ctrl.append(i)
            else:
                list_unkn.append(i)
        except TypeError:
            # Raised once there are no more packets in this file
            flag = False
    total_packets += i

storage = open('/home/jonny/Python_work/workfile', 'w')
# Opens the summary file for writing (replacing any previous contents), linked to variable storage
storage.write('Summary of the contents of the folder ' + dir_path + ' by the module read_pcap.py\nBy Jonny Milliken\n')
storage.write('Total number of pcap files found in the folder = ' + str(len(os.listdir(dir_path))) + '\n')
storage.write('<><><><><><><><><><><><><><><><><><><><><><><><><><>\n')
storage.write('The total number of data packets is ' + str(len(list_data)) + ' (' + str(len(list_data) * 100 / total_packets) + '%)\n')
storage.write('The total number of management packets is ' + str(len(list_mgmt)) + ' (' + str(len(list_mgmt) * 100 / total_packets) + '%)\n')
storage.write('The total number of control packets is ' + str(len(list_ctrl)) + ' (' + str(len(list_ctrl) * 100 / total_packets) + '%)\n')
storage.write('The total number of unknown packets is ' + str(len(list_unkn)) + ' (' + str(len(list_unkn) * 100 / total_packets) + '%)\n')
storage.close()
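
The script leans on scapy to expose the frame type, but the same classification can be sketched by hand. This is a rough illustration, not what scapy does internally. It assumes every captured frame starts with a radiotap header (whose total length is a little-endian 16-bit value at byte offset 2) followed directly by the 802.11 header, where the type sits in bits 2 and 3 of the first frame control byte. The names dot11_type, summarise and TYPE_NAMES are hypothetical.

```python
import struct
from collections import Counter

# 802.11 frame types as used in the script: 0, 1 and 2
TYPE_NAMES = {0: "management", 1: "control", 2: "data"}


def dot11_type(raw):
    """Return the 802.11 type (0-3) of a frame that starts with a radiotap header."""
    if len(raw) < 4:
        raise ValueError("too short for a radiotap header")
    radiotap_len = struct.unpack_from("<H", raw, 2)[0]
    if len(raw) <= radiotap_len:
        raise ValueError("no 802.11 header after the radiotap header")
    frame_control = bytearray(raw)[radiotap_len]  # first frame control byte
    return (frame_control >> 2) & 0x3


def summarise(frames):
    """Map each type name to (count, percentage); empty dict if no frames."""
    counts = Counter(TYPE_NAMES.get(dot11_type(f), "unknown") for f in frames)
    total = sum(counts.values())
    if total == 0:
        return {}  # avoids dividing by zero on an empty directory
    return {name: (n, 100.0 * n / total) for name, n in counts.items()}


# Minimal 8-byte radiotap header: version 0, pad 0, length 8, no fields present
RADIOTAP = b"\x00\x00\x08\x00\x00\x00\x00\x00"
frames = [
    RADIOTAP + b"\x80\x00",  # beacon (management)
    RADIOTAP + b"\xd4\x00",  # ACK (control)
    RADIOTAP + b"\x08\x00",  # data
    RADIOTAP + b"\x08\x00",  # data
]
for name, (count, percent) in sorted(summarise(frames).items()):
    print(name, count, percent)
```

Counting with a Counter also sidesteps keeping four separate lists when only the totals are needed, and the empty-input check covers the case where no packets were read at all.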
<><><>
But oh ho. The problems didn't stop there, oh no. Segfaults abound. Stupid memory. But what is the problem!? A single packet is loaded into memory, dissected and the timestamp extracted. Then this is all over-written, right? Right!? Nope.
Turns out that with "packet=RadioTap()" sitting outside the try, as it originally was, the operation "packet.dissect()" actually APPENDS to the existing "packet". So "packet" was getting increasingly large and causing the same issues as before. As a result I needed to redefine "packet" as an empty "RadioTap()" each time the try is invoked.
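
The pattern is easier to see with a toy stand-in. The ToyDissector class below is entirely hypothetical and only mimics the accumulating behaviour I observed; it makes no claim about how scapy is implemented.

```python
class ToyDissector(object):
    """Hypothetical stand-in: every dissect() call adds to internal state."""

    def __init__(self):
        self.layers = []

    def dissect(self, raw):
        self.layers.append(raw)


def reused(packets):
    """One object created outside the loop: its state keeps growing."""
    d = ToyDissector()
    for raw in packets:
        d.dissect(raw)
    return len(d.layers)


def fresh(packets):
    """A new object per packet: state never exceeds one packet's worth."""
    size = 0
    for raw in packets:
        d = ToyDissector()
        d.dissect(raw)
        size = len(d.layers)
    return size


packets = [b"x"] * 1000
print(reused(packets), fresh(packets))
```

With the object created once, its contents grow with every packet read; creating it inside the loop keeps the footprint constant, which is the same change as moving "packet=RadioTap()" inside the try.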
Now you ask why I included it at all, given that I have already solved the relatively simple problem. The answer being that I needed something to show for the day it took to work out...