Friday, 8 October 2010

Adventures in Python

Back for my second post, nearly a month later. And to think I was joking about the massive gap between postings. The massive pain in the interim has been trying to find any real documentation or guides on how to read pcap files into Python. Various attempts went wrong for various reasons, and I suppose that seems like a good way to get going again.


<><><>

## read_pcap.py

## This script reads the contents of pcap files in a directory and summarises the information contained within


from scapy.all import *
import os

## Ask the user for the directory that holds the pcap files

dir_path = raw_input ('Give the full path to the directory of the pcap files: ')

list_data = []
list_mgmt = []
list_ctrl = []
list_unkn = []
total_packets = 0


# For each of the pcaps in the directory

for pcaps in os.listdir(dir_path):

      # os.path.join adds the path separator if the user left off the trailing slash
      pcktList = rdpcap(os.path.join(dir_path, pcaps))
      total_packets += len(pcktList)


      # For each of the packets in the loaded list, store its index under its 802.11 frame type

      for i, pckt in enumerate(pcktList):

            if pckt.type == 2:
                  # type = data
                  list_data.append(i)
            elif pckt.type == 0:
                  # type = management
                  list_mgmt.append(i)
            elif pckt.type == 1:
                  # type = control
                  list_ctrl.append(i)
            else:
                  list_unkn.append(i)

# Opens the summary file for writing, linked to the variable storage; any existing contents are overwritten

storage = open('/home/jonny/Python_work/workfile', 'w')

storage.write('Summary of the contents of the folder ' + dir_path + ' by the module read_pcap.py\nBy Jonny Milliken\n')
storage.write('Total number of pcap files found in the folder = ' + str(len(os.listdir(dir_path))) + '\n')
storage.write('<><><><><><><><><><><><><><><><><><><><><><><><><><>\n')
storage.write('The total number of data packets is ' + str(len(list_data)) + ' (' + str(len(list_data) * 100 / total_packets) + '%)\n')
storage.write('The total number of management packets is ' + str(len(list_mgmt)) + ' (' + str(len(list_mgmt) * 100 / total_packets) + '%)\n')
storage.write('The total number of control packets is ' + str(len(list_ctrl)) + ' (' + str(len(list_ctrl) * 100 / total_packets) + '%)\n')

# Close the file so the summary is flushed to disk
storage.close()

<><><>

This piece is simple enough: it loads the entire pcap into a list, with every packet held in memory and selected by its index. In theory this is a nearly perfect solution, since any packet can be selected at will. But as the packet count grows (I tested up to around 200,000), memory consumption on a system with 3GB of RAM maxes out. Beyond that point the operating system presumably starts paging to the hard drive, which slows the operation to a crawl. Not a deal breaker if you are working with sufficiently small capture files, but at the moment I am working with a directory of 25GB of pcap files, each containing around 1.2 million packets...

Of course, the most logical, best documented and most convenient method would turn out to be exactly the one that won't work in my case. There are alternatives though. After all, Wireshark seems to open these files just fine, so it's clearly not impossible!
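
One possible alternative is to skip the in-memory list entirely and stream the file one record at a time, keeping only running counts. The following is a minimal sketch of that idea using nothing but the standard library, not a tested replacement for the script above. It assumes classic pcap files (not pcapng) with the standard microsecond magic number, and captures whose link type is either bare 802.11 (105) or radiotap plus 802.11 (127). The names iter_pcap_records, frame_type and summarise are made up for illustration.

```python
import struct

# 802.11 frame types, taken from bits 2-3 of the first frame-control byte
FRAME_TYPES = {0: 'management', 1: 'control', 2: 'data'}

# pcap link-layer header types that carry 802.11 frames
DLT_IEEE802_11 = 105        # bare 802.11 header
DLT_IEEE802_11_RADIO = 127  # radiotap header followed by 802.11


def iter_pcap_records(stream):
    """Yield (linktype, raw_bytes) for each packet, one record at a time."""
    global_header = stream.read(24)
    if len(global_header) < 24:
        return
    magic = global_header[:4]
    if magic == b'\xd4\xc3\xb2\xa1':
        endian = '<'  # file written on a little-endian machine
    elif magic == b'\xa1\xb2\xc3\xd4':
        endian = '>'  # file written on a big-endian machine
    else:
        raise ValueError('not a classic microsecond-resolution pcap file')
    # The link type is the last 4-byte field of the 24-byte global header
    linktype = struct.unpack(endian + 'I', global_header[20:24])[0]
    while True:
        record_header = stream.read(16)
        if len(record_header) < 16:
            break  # end of file (or a truncated record header)
        _sec, _usec, incl_len, _orig_len = struct.unpack(endian + 'IIII', record_header)
        data = stream.read(incl_len)
        if len(data) < incl_len:
            break  # truncated final packet
        yield linktype, data


def frame_type(linktype, data):
    """Return 'management', 'control', 'data' or 'unknown' for one raw packet."""
    frame = bytearray(data)
    if linktype == DLT_IEEE802_11_RADIO:
        if len(frame) < 4:
            return 'unknown'
        # Radiotap stores its own header length in bytes 2-3, always little-endian
        radiotap_len = frame[2] | (frame[3] << 8)
        frame = frame[radiotap_len:]
    elif linktype != DLT_IEEE802_11:
        return 'unknown'
    if not frame:
        return 'unknown'
    return FRAME_TYPES.get((frame[0] >> 2) & 0x3, 'unknown')


def summarise(path):
    """Count frame types in one pcap file without holding the packets in memory."""
    counts = {'management': 0, 'control': 0, 'data': 0, 'unknown': 0}
    with open(path, 'rb') as stream:
        for linktype, data in iter_pcap_records(stream):
            counts[frame_type(linktype, data)] += 1
    return counts
```

Calling summarise on each file from os.listdir and adding up the returned dictionaries would give the same totals as the script above, with memory use that depends on the largest single packet rather than on the size of the file. The trade-off is that packets can no longer be picked out by index afterwards. Scapy also has a PcapReader class that iterates over a capture packet by packet, which may be the more convenient route if the decoded Scapy packet objects are needed.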
