Entries in Python (3)

Saturday
Feb 19, 2011

Google App Engine and socket.inet_ntoa

Yesterday I started tackling a new problem in Python: how to parse a pcap file. The idea is to host something on Google App Engine and have it do the work. I started working with the Python library dpkt to open the pcap file. But dpkt hands back the source and destination IP addresses in packed binary form: four raw bytes rather than readable text. That was a format I had never heard of until now, so I had no idea what to do with it. It turns out that socket.inet_ntoa will convert it to the string you are used to seeing, like 10.9.8.2.

Unfortunately, Google App Engine doesn't provide the socket library. So I figured there had to be a way to build this functionality on my own. Let me tell you. Considering how much (erm, how little) I know about Python, this was no easy task. First I had to figure out what that library function does.

Well, the socket library in Python is simply a wrapper around the C socket library on the OS. Finding that source code ended up being fairly easy, but it didn't really help me much. It wasn't until I stumbled onto this article on Stack Overflow that I started getting somewhere. But you may notice that it's a Java example and I am looking for Python.

It was pretty trivial to translate that code from Java to Python:

def inet_ntoa(number):
    addresslist=[]
    addresslist.append(str((number>>24)&0xff))
    addresslist.append(str((number>>16)&0xff))
    addresslist.append(str((number>>8)&0xff))
    addresslist.append(str(number&0xff))
    return '.'.join(addresslist)

But that wasn't getting me what I needed. When I tried to use it, I got: TypeError: unsupported operand type(s) for >>: 'str' and 'int'. That wasn't making much sense to me, so I tried a bunch of things to understand what this number was. After a while I saw that the value I was getting from the Wireshark capture via dpkt was '\n\x01\n\x01': a string, not a number. Again, not making much sense.
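
The error is easier to see in isolation. The sketch below is only an illustration: it assumes Python 3, where the packed value is a bytes object (so the message names 'bytes' rather than 'str'), and inet_ntoa_int is a hypothetical name for the integer-only translation above. The 10.9.8.2 example address is written as the integer 0x0a090802.

```python
def inet_ntoa_int(number):
    """Format a 32-bit integer as a dotted-quad IPv4 string."""
    octets = [(number >> shift) & 0xff for shift in (24, 16, 8, 0)]
    return '.'.join(str(octet) for octet in octets)

# Works when the address really is an integer
print(inet_ntoa_int(0x0a090802))  # 10.9.8.2

# Fails on the packed bytes that a pcap parser hands back,
# because >> is not defined for bytes (or str) values
try:
    inet_ntoa_int(b'\x0a\x09\x08\x02')
except TypeError as err:
    print('TypeError:', err)
```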

It wasn't until I went back to Wireshark that I started cluing in on the problem. I looked at the packet and found the section that mentioned the source IP address. It said 10.1.10.1. Clicking on that IP address highlighted the hex representation below: 0a010a01. From the classes I teach on parsing text files using Datagrabber, which is part of our Alchemy product, I know that the hex representation of a newline is 0a. That explains the 0a, or \n, showing up in the src address above: it is simply the byte with value 10, one of the 10s in 10.1.10.1.
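
The link between the hex dump and the odd-looking string can be checked with a few lines. This is a small sketch that assumes Python 3, where the packed address is a bytes object; the variable names are only for illustration.

```python
# The four bytes Wireshark shows as 0a010a01 (the address 10.1.10.1)
packed = bytes.fromhex('0a010a01')

# Python displays byte 0x0a as the escape \n, which is why the value looks odd
print(repr(packed))

# Each byte is just one octet of the address
octets = list(packed)
print(octets)
```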

Now that I had a better understanding of what dpkt was spitting out for the IP address, I started looking for ways to convert that into a more usable format. That's when I stumbled onto this Stack Overflow discussion on converting hex strings to IP addresses. The key was struct.unpack('!I', number), which reads the four bytes as a single unsigned 32-bit integer in network (big-endian) byte order.

The resulting function to replace socket.inet_ntoa is listed here:

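import struct

def inet_ntoa(orignumber):
    # orignumber is the 4-byte packed address that dpkt returns
    addresslist=[]
    # '!I' = network byte order, unsigned 32-bit int; unpack returns a 1-tuple
    number = struct.unpack('!I',orignumber)[0]
    addresslist.append(str((number>>24)&0xff))
    addresslist.append(str((number>>16)&0xff))
    addresslist.append(str((number>>8)&0xff))
    addresslist.append(str(number&0xff))
    return '.'.join(addresslist)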
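
As a possible variation (not the approach used above), the shifting and masking can be skipped by asking struct for four separate bytes. This sketch assumes Python 3 and a bytes input; the explicit length check is my own addition.

```python
import struct

def inet_ntoa(packed):
    """Convert a 4-byte packed IPv4 address to dotted-quad text."""
    if len(packed) != 4:
        raise ValueError('packed IPv4 address must be exactly 4 bytes')
    # '!4B' = network byte order, four unsigned bytes -> tuple of four ints
    return '.'.join(str(octet) for octet in struct.unpack('!4B', packed))

print(inet_ntoa(b'\x0a\x01\x0a\x01'))  # 10.1.10.1
```

Either version should give the same string for a valid 4-byte address; this one just lets struct do the splitting.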

Tuesday
Feb 15, 2011

Cleaning Up HTML With Simple Python

When you learn something new, you must practice it every day before you really understand it. I started working with Python the other day, so I am forcing myself to practice by writing scripts for various things that come up. Today's task was cleaning up one of the pages on this website.

One of the pages in the menu on top of this page is for my ConCall Numbers. It's a listing of the Orange Business Conferencing dial-in numbers for many countries around the world, along with the participant pass code. I use this for the classes I give, as well as for meetings I need to set up. But the page has been pretty ugly for a long time.

[Screenshot: the ConCall Numbers page before the cleanup]

The text had come from an email I received that listed out the numbers. I simply copied and pasted it from the email into the page HTML editor. Along with it came dozens of &nbsp; codes on every line. Fixing it just wasn't a priority. But it turns out that fixing it was very easy with just a little bit of Python.

The result is shown here. It's still a boring list of numbers, but as you can see, it's a lot nicer to look at.

[Screenshot: the ConCall Numbers page after the cleanup]

The script itself is short. I open a file, run three regex substitutions, and then write the result to a new file. Once I had the file, I manually added a starting and ending <table> tag and I was done.

For those who are curious, here is the complete script. I am sure there is a better way to write this, but this was quick and easy, and it worked.

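#!/usr/bin/python -tt

import re

def ReadFile(filename):
    text = open(filename,'r').read()
    # Collapse each run of &nbsp; codes into a table cell boundary
    text = re.sub(r'(&nbsp;\s?)+', '</td><td>', text)
    # Each <br /> ends one table row and starts the next
    text = re.sub('<br />','</td></tr>\n<tr><td>',text)
    # Drop the empty first cell left behind when a line began with &nbsp;
    text = re.sub(r'<tr><td>\s?</td><td>','<tr><td>',text)
    f=open('good.html','w')
    f.write(text)
    f.close()

def main():
    ReadFile('bad.html')

if __name__ == '__main__':
    main()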
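
To see what the three substitutions do, here is a small sketch that applies them to a string instead of a file. The clean_html name and the sample text are made up for illustration (the real page content is not reproduced here), and it assumes Python 3.

```python
import re

def clean_html(text):
    # Collapse each run of &nbsp; codes into a table cell boundary
    text = re.sub(r'(&nbsp;\s?)+', '</td><td>', text)
    # Each <br /> ends one table row and starts the next
    text = re.sub(r'<br />', '</td></tr>\n<tr><td>', text)
    # Drop the empty first cell left behind when a line began with &nbsp;
    text = re.sub(r'<tr><td>\s?</td><td>', '<tr><td>', text)
    return text

# Hypothetical sample in the style of the pasted email text
sample = ('Country A&nbsp;&nbsp; 0800 000 001<br />'
          '&nbsp;Country B&nbsp;&nbsp;0800 000 002<br />')
print(clean_html(sample))
```

On this sample the output starts partway through the first row and ends with a dangling <tr><td>, so the ends of the generated markup still need tidying by hand.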

Tuesday
Feb 15, 2011

How Google App Engine Fixed My Main Problem With SquareSpace

As you are probably aware, technovangelist.com is hosted at SquareSpace.com. It hasn't always been that way. I switched to SS about a year and a half ago when I got tired of self-hosting using Community Server and Graffiti and various other homebrew solutions. SS has the advantage of making it very easy to come up with a uniform style across your pages, with some customization in every area. It's really a well-designed solution and after this much time with them, I have no plans to leave. That said, I do have one beef with them: they don't offer any server-side page generation techniques beyond the sidebars.

Take a look at the home page at technovangelist.com. I have the sidebar, plus 5 other areas of content, each pulling from a different RSS feed. There are 2 feeds from my blogs here at technovangelist and at faxsolutionsblog.opentext.com. There are 2 other feeds from my videos at vimeo and youtube. Finally there is a feed from my photoblog site, chromagenic.com. The home page at technovangelist really is the clearing house for the brand of 'me'.

The only way to create this kind of page at SquareSpace is with an HTML page, meaning I have to create the HTML from scratch. That's not a problem for me. What is a problem is that the HTML I create has to be completely client-based: HTML and JavaScript. Nothing can run on the server. So if I am grabbing 5 different feeds and then generating a page from them, all of it in the visitor's browser, there is going to be a delay of at least a second or two every single time someone looks at this page, even though the content doesn't update more than once every 1-2 weeks.

But I did it anyway for the first version of this page. I used the magical Google Feed API which did exactly what I wanted. Every time it ran, though, 1-2 seconds were required for drawing the page. There had to be a better way. The first thought was to design a client-side app for me to run every time one of my content sources updates. So I started going down that route, working on some test projects before starting the final Mac app that I wanted. That was last Saturday.

Then during one of my little research missions, trying to find something I needed for the app, I re-stumbled on Google App Engine. Here was a hosted location that could run my own custom server-side code. The original thought was to build the page at GAE, then do some sort of server-side include of the content. But then I thought I hit a bit of a wall: the app had to be written in Java or Python.

I hadn't touched Java in 10 years. I last used it when I did some outsourced marketing projects with Sun Microsystems, building test apps that were used in instructional materials. But then I went .NET all the way working for Microsoft and then Captaris/Open Text. I felt re-learning Java was going to be a big hurdle to GAE. Python, on the other hand, was a bit more digestible. I didn't know the language, but I have a few friends who spend all of their working days with it. One of my best friends from my Microsoft days was a PM with IronPython. I felt Python was more accessible.

So I started looking into it. I installed Aptana Studio 3, which comes with PyDev and made creating and running Python scripts about as easy as it can be. And I followed the fantastic series of videos that are part of Google's Python Class. Go ahead and watch them. It will take you five hours and you'll come away with a pretty good understanding of the language. So I started looking into Python on Saturday evening around 8PM, and by 5PM on Sunday I was beginning to build my GAE app to generate my home page. The only reason it took so long was that I had a Dim Sum lunch with friends for a few hours in the middle of it all.

The end result is a page that is generated in less than a second. And it's a whole lot easier to manage too. But it's not perfect yet. For now it involves a manual step. I'll go to the GAE page and copy the page. Then go to edit my site, and paste the code in. The result is that for you, the home page is displayed as quickly as possible. And I have to run a single manual, 1-minute step every couple of weeks. In the near future (perhaps next weekend), I'd like to see about having it auto-update my SquareSpace site, or at least cache the content locally and figure a way to do some sort of server-side include.

It was a fun project and I was very happy to see a working solution by the end of the evening on Sunday. And learning Python has already proven to be a good investment. I am already using it in some of the scripts I have written to automate the stuff I do at work. Maybe I'll write up some of the details of those scripts, as well as more about what I actually created on GAE... another night.