Archive for September 2004

 
 

parsing, chump and sites

I’ve worked some more on the date/time parsing code over the last couple of nights, handling some of the edge cases and adding more tests. One thing I’m really having to resist is the urge to make early refactoring passes: I catch myself thinking “hmm, maybe it will be faster if I do *this*”. Fortunately I’ve held off so far, as there is nothing like a premature optimization _before_ you have even finished version 1.0 :)

Part of the reason I’ve been working on the parsing code is that people seem to have found my little corner of the web and have left comments saying they are interested in this code. Wow!

I’ve also started working on my own version of the behaviour that “the Daily Chump IRC bot”:http://usefulinc.com/chump/ implements. Not that their version is bad or anything; I just like working on these kinds of problems, and I will compare how they did it to how I did it and see what I can learn. My version is going to be a “Supybot”:http://supybot.sourceforge.net plugin.

Grant and I have been working on setting up kwiki (well, ok, _he_ has been setting up kwiki and I’ve been watching ;) and we have also set up a public svn repository for some of the work we have been doing. More details later when it’s ready for some public scrutiny :)

the joys of parsing

Over the last couple of months I’ve heard different people in different contexts talking about wanting a Python routine to parse time text. Evidently routines to parse text like “5 minutes from now” or “next wednesday” are not found in the Python world, although some of this can be found in other languages.

I have plenty of code that parses time and date text from work – all of it proprietary and in Delphi :( But that allows me to have fun porting it to Python!

So I started working on timeparse.py. Currently it handles the most basic formats, and I’m slowly working on it to handle more every night ;)

Parsing human-readable time and date text is a fun mix of styles: you need to extract not only the literal values, but you also have to infer meaning from the ordering and/or relation of the words. For example, “next week” seems very obvious to a human but is missing a ton of information for the computer, all of which the human infers. To handle that case, and others, you need to pick sane defaults and also recognize certain edge cases.
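To make the split between literal values and inferred values a bit more concrete, here is a minimal sketch. It is not the actual timeparse.py code: the function name @parse_relative@, the lookup tables and the chosen defaults (same time of day, “next wednesday” meaning the first Wednesday strictly after today) are all assumptions made for illustration.

```python
import re
from datetime import datetime, timedelta

# Hypothetical lookup tables; a real parser would know many more words.
UNITS = {"minute": "minutes", "hour": "hours", "day": "days", "week": "weeks"}
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def parse_relative(text, now=None):
    """Return a datetime for a few simple phrases, or None if unrecognized."""
    now = now or datetime.now()
    text = text.strip().lower()

    # Literal values: the quantity and the unit are both spelled out,
    # e.g. "5 minutes from now" or "1 hour from now".
    m = re.fullmatch(r"(\d+)\s+(minute|hour|day|week)s?\s+from\s+now", text)
    if m:
        qty, unit = int(m.group(1)), UNITS[m.group(2)]
        return now + timedelta(**{unit: qty})

    # Inferred values: "next week" names no day and no time, so we assume
    # the same weekday and time of day, seven days later.
    if text == "next week":
        return now + timedelta(weeks=1)

    # "next wednesday": assume the first such weekday strictly after today,
    # keeping the current time of day.
    m = re.fullmatch(r"next\s+(\w+)", text)
    if m and m.group(1) in WEEKDAYS:
        target = WEEKDAYS.index(m.group(1))
        days_ahead = (target - now.weekday() - 1) % 7 + 1  # always 1..7
        return now + timedelta(days=days_ahead)

    # Anything else is left for a smarter parser.
    return None
```

Passing @now@ explicitly keeps the function easy to test. The edge case worth noticing is asking for “next wednesday” on a Wednesday: with the default chosen here the answer is a full week away, but a different default (today, or the Wednesday of the following week) would be just as defensible.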

Now I’m not an academic, so you won’t find me going through the differences between this style of parsing or that, but I figure that I should post some thoughts as I work them out.

anywho, more info tomorrow – I just realized it’s 0200 hrs!


This work by Mike Taylor is licensed under a Creative Commons Attribution-ShareAlike 3.0 United States.