Entries in Google App Engine (2)

Saturday
Feb 19, 2011

Google App Engine and socket.inet_ntoa

Yesterday I started tackling a new problem in Python: how to parse a pcap file. The idea is to host something on Google App Engine and have it do the work. I started working with the Python library dpkt to open the pcap file. But dpkt hands back the source and destination IP addresses in packed binary form: four raw bytes, one per octet. That was a format I had never heard of until now, so I had no idea what to do with it. It turns out that socket.inet_ntoa will convert it to the string you are used to seeing, like 10.9.8.2.
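For reference, this is roughly what that conversion looks like anywhere the socket module is available. It is a small sketch in Python 3 syntax, where the packed address is a bytes object; the sample bytes are made up to match the 10.9.8.2 example above.

```python
import socket

# Four raw bytes, one per octet of the IPv4 address (made-up sample).
packed = b"\x0a\x09\x08\x02"

# inet_ntoa turns the 4-byte packed form into dotted-quad text.
dotted = socket.inet_ntoa(packed)
print(dotted)
```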

Unfortunately, Google App Engine doesn't provide the socket library. So I figured there had to be a way to build this functionality on my own. Let me tell you: considering how much (erm, how little) I know about Python, this was no easy task. First I had to figure out what that library function does.

Well, the socket library in Python is simply a wrapper around the C socket library on the OS. Finding that source code ended up being fairly easy, but it didn't really help me much. It wasn't until I stumbled onto this article on StackOverflow that I started getting somewhere. But you may notice that it's a Java example, and I was looking for Python.

It was pretty trivial to translate that code from Java to Python:

def inet_ntoa(number):
    addresslist=[]
    addresslist.append(str((number>>24)&0xff))
    addresslist.append(str((number>>16)&0xff))
    addresslist.append(str((number>>8)&0xff))
    addresslist.append(str(number&0xff))
    return '.'.join(addresslist)

But that wasn't getting me what I needed. When I tried to use it, I got: TypeError: unsupported operand type(s) for >>: 'str' and 'int'. That didn't make much sense to me at first, but it means the value dpkt hands back is a string, not a number, so it can't be bit-shifted. So I tried a bunch of things to understand what this value was. After a while I saw that what I was getting from the Wireshark capture via dpkt was '\n\x01\n\x01'. Again, not making much sense.

It wasn't until I went back to Wireshark that I started cluing in on the problem. I looked at the packet and found the section that mentioned the source IP address. It said 10.1.10.1. Clicking on that IP address highlighted the hex representation below: 0a010a01. From the classes I teach on parsing text files using Datagrabber, which is part of our Alchemy product, I know that the hex representation of a newline is 0a. That explains the \n in the src address above: each 0a byte is printed as a newline escape.
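A quick way to see the same thing outside Wireshark is to look at the raw bytes directly. The sketch below uses Python 3 syntax (where the value is a bytes object and not a str) and assumes the four address bytes are 0a 01 0a 01, as Wireshark showed. It also reproduces why the bit shift failed.

```python
# The four source-address bytes as Wireshark displayed them: 0a 01 0a 01.
raw = bytes.fromhex("0a010a01")

# repr() shows each 0x0a byte as the newline escape \n.
printable = repr(raw)

# Iterating over a bytes object gives each octet as an integer.
octets = list(raw)

# Shifting the raw bytes fails, because >> needs an integer on the left.
try:
    raw >> 24
    shift_error = None
except TypeError as exc:
    shift_error = str(exc)

print(printable, octets, shift_error)
```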

Now that I had a better understanding of what dpkt was spitting out for the IP address, I started looking for ways to convert that into a more usable format. That's when I stumbled onto this StackOverflow discussion on converting hex strings to IP addresses. The key was struct.unpack('!I', number), which reads the four bytes as a single unsigned integer in network byte order.
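To make that step concrete, here is a minimal sketch of what struct.unpack returns for the 10.1.10.1 bytes from the capture (Python 3 syntax, with the bytes typed in by hand and not read from a pcap file):

```python
import struct

# The packed source address from the capture: 0a 01 0a 01.
raw = b"\x0a\x01\x0a\x01"

# '!' means network (big-endian) byte order, 'I' an unsigned 32-bit integer.
# unpack always returns a tuple, hence the [0].
number = struct.unpack("!I", raw)[0]
print(number, hex(number))
```

Once the value is a plain integer, the shift-and-mask code from the Java translation works on it unchanged.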

The resulting function to replace socket.inet_ntoa is listed here:

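import struct

def inet_ntoa(orignumber):
    # orignumber is the 4-byte packed address as returned by dpkt
    addresslist=[]
    # '!I' reads the bytes as one unsigned 32-bit integer in network byte order
    number = struct.unpack('!I',orignumber)[0]
    # pull out each octet, most significant first
    addresslist.append(str((number>>24)&0xff))
    addresslist.append(str((number>>16)&0xff))
    addresslist.append(str((number>>8)&0xff))
    addresslist.append(str(number&0xff))
    return '.'.join(addresslist)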
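
A quick check of the function, plus a shorter variant. Both are sketched in Python 3 syntax. The name inet_ntoa_short is hypothetical (it is not part of dpkt or the socket module); it unpacks the four octets directly as unsigned bytes instead of shifting one integer.

```python
import struct


def inet_ntoa(packed):
    """Shift-and-mask version, same logic as the function above."""
    number = struct.unpack("!I", packed)[0]
    return ".".join(str((number >> shift) & 0xFF) for shift in (24, 16, 8, 0))


def inet_ntoa_short(packed):
    """Hypothetical variant: unpack the four octets as unsigned bytes."""
    return ".".join(str(octet) for octet in struct.unpack("!4B", packed))


sample = b"\x0a\x01\x0a\x01"
print(inet_ntoa(sample), inet_ntoa_short(sample))
```

Either version raises struct.error if the input is not exactly four bytes long, which is a reasonable failure mode for a malformed address.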

Tuesday
Feb 15, 2011

How Google App Engine Fixed My Main Problem With SquareSpace

As you are probably aware, technovangelist.com is hosted at SquareSpace.com. It hasn't always been that way. I switched to SS about a year and a half ago when I got tired of self-hosting using Community Server and Graffiti and various other homebrew solutions. SS has the advantage of making it very easy to come up with a uniform style across your pages, with some customization in every area. It's really a well-designed solution and after this much time with them, I have no plans to leave. That said, I do have one beef with them: they don't offer any server-side page generation techniques beyond the sidebars.

Take a look at the home page at technovangelist.com. I have the sidebar, plus 5 other areas of content, each pulling from a different RSS feed. There are 2 feeds from my blogs here at technovangelist and at faxsolutionsblog.opentext.com. There are 2 other feeds from my videos at vimeo and youtube. Finally there is a feed from my photoblog site, chromagenic.com. The home page at technovangelist really is the clearing house for the brand of 'me'.

The only way to create this kind of page at SquareSpace is with an HTML page, meaning I have to create the HTML from scratch. That's not a problem for me. What is a problem is that the HTML I create has to be completely client-based: HTML and JavaScript. Nothing can run on the server. So if I am grabbing 5 different feeds and then generating a page from them, all in the browser, there is going to be a delay of at least a second or two every single time someone looks at this page, even though the content doesn't update more than once every 1-2 weeks.

But I did it anyway for the first version of this page. I used the magical Google Feed API, which did exactly what I wanted. Every time it ran, though, 1-2 seconds were required for drawing the page. There had to be a better way. My first thought was to write a desktop app that I would run every time one of my content sources updates. So I started going down that route, working on some test projects before starting the final Mac app that I wanted. That was last Saturday.

Then during one of my little research missions, trying to find something I needed for the app, I re-stumbled on Google App Engine. Here was a hosted location that could run my own custom server-side code. The original thought was to build the page at GAE, then do some sort of server-side include of the content. But then I thought I hit a bit of a wall: the app had to be written in Java or Python.

I hadn't touched Java in 10 years. I last used it when I did some outsourced marketing projects with Sun Microsystems, building test apps that were used in instructional materials. But then I went .Net all the way working for Microsoft and then Captaris/Open Text. I felt re-learning Java was going to be a big hurdle to GAE. Python on the other hand was a bit more digestible. I didn't know the language, but I have a few friends who spend all of their working days with the language. One of my best friends from my Microsoft days was a PM with IronPython. I felt Python was more accessible.

So I started looking into it. I installed Aptana Studio 3, which comes with PyDev, and that let me create and build Python scripts in as easy a way as possible. And I followed the fantastic series of videos that are part of Google's Python Class. Go ahead and watch them. It will take you five hours and you'll come away with a pretty good understanding of the language. So I started looking into Python on Saturday evening around 8 PM, and by 5 PM on Sunday I was beginning to build my GAE app to generate my home page. The only reason it took so long was that I had a Dim Sum lunch with friends for a few hours in the middle of it all.

The end result is a page that is generated in less than a second. And it's a whole lot easier to manage too. But it's not perfect yet. For now it involves a manual step: I go to the GAE page and copy the generated page, then go to edit my site and paste the code in. The result is that for you, the home page is displayed as quickly as possible. And I have to run a single manual, 1-minute step every couple of weeks. In the near future (perhaps next weekend), I'd like to see about having it auto-update my SquareSpace site, or at least cache the content locally and figure out a way to do some sort of server-side include.
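
The server-side generation step can be sketched roughly as follows. This is a minimal illustration of the idea, not the actual App Engine code: the sample feed, the render_feed name, and the HTML layout are all assumptions, and a real app would fetch each feed over HTTP before parsing it.

```python
import xml.etree.ElementTree as ET
from html import escape

# Made-up RSS 2.0 snippet standing in for one fetched feed.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Example Blog</title>
<item><title>First post</title><link>http://example.com/1</link></item>
<item><title>Second &amp; last</title><link>http://example.com/2</link></item>
</channel></rss>"""


def render_feed(feed_xml, limit=5):
    """Return an HTML fragment listing the first `limit` items of one feed."""
    channel = ET.fromstring(feed_xml).find("channel")
    lines = ["<h2>%s</h2>" % escape(channel.findtext("title", "")), "<ul>"]
    for item in channel.findall("item")[:limit]:
        title = escape(item.findtext("title", ""))
        link = escape(item.findtext("link", ""), quote=True)
        lines.append('<li><a href="%s">%s</a></li>' % (link, title))
    lines.append("</ul>")
    return "\n".join(lines)


html_fragment = render_feed(SAMPLE_FEED)
print(html_fragment)
```

Because the fragment is plain static HTML, it can be pasted into a SquareSpace HTML page and served without any client-side feed lookups.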

It was a fun project and I was very happy to see a working solution by the end of the evening Sunday night. And learning Python has already proven to be a good investment. I am already leveraging it in some of the scripts I have written to automate the stuff I do at work. Maybe I'll write up some of the details of those scripts, as well as more about what I actually created on GAE....another night...