Archive for October 2005

 
 

premature benchmarking

ok, I had to do it … even though I know that it’s not ready for performance timing, I just had to see how it fared :) I set up the following test program:

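   import timeit

   # each Timer gets its own Calendar instance via the setup string
   t1 = timeit.Timer(stmt="c.Parse('next week')", setup="import parsetime; c=parsetime.Calendar()")
   t2 = timeit.Timer(stmt="c.Parse('5 min from now')", setup="import parsetime; c=parsetime.Calendar()")

   # timeit() returns the total seconds for the given number of iterations
   print(t1.timeit(1000))
   print(t2.timeit(1000))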

t1 (“next week”) took 0.14699 seconds to run 1000 iterations and t2 (“5 min from now”) took 0.12538 seconds. Both of these were run on my powerbook with all my usual applications running, i.e. not an ideal setup for timing :)
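Since a busy machine adds noise to numbers like these, one hedge is to repeat the measurement and keep the fastest run. A minimal sketch using the standard timeit.repeat; fake_parse is a hypothetical stand-in so the snippet runs without parsetime installed, and in practice the stmt/setup strings from the test program above would go in its place:

```python
import timeit

def fake_parse(text):
    # Hypothetical stand-in for parsetime.Calendar().Parse; it only splits the string.
    return text.lower().split()

def best_of(stmt, repeat=5, number=1000):
    # Run `number` iterations `repeat` times and keep the fastest total,
    # which is the run least disturbed by whatever else the machine is doing.
    timings = timeit.repeat(stmt, repeat=repeat, number=number)
    return min(timings)

best = best_of(lambda: fake_parse("5 min from now"))
print("best of 5 runs: %.5f seconds per 1000 iterations" % best)
```

The minimum is usually a better guide than the average here, because background applications can only make a run slower, never faster.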

Not bad if I do say so – a lot better than I was fearing to be honest.

date time parsing

I worked on my human-readable date/time parsing library some more tonight and inched it ever closer to something I would feel comfortable showing others :)

I finally broke down and dug into some borrowed code that I was using to parse w3c dates and found the error in one of the regexes that I probably introduced while inserting it into my code – gotta love complex regexes! I also worked on the core parsing code to add some scary ifs to handle edge cases that were allowing items like “5 min” and “5 hours before noon” to parse but not allowing “next week” or “next tuesday” to parse. So now the following unit tests are handled:

True    next week
False   2005-02-23
(2005, 2, 23, 0, 0, 0, 2, 54, 0)
(2005, 2, 22, 19, 0, 1, 1, 53, 0)
True    5 minutes from now
True    5 min from now
True    5m from now
True    in 5 minutes
True    5 min
True    5 min before now
True    5 min before next week
True    5 hours from noon
True    5 hours before noon
True    in 2 weeks
True    7 days before now
True    next day
True    tomorrow
True    yesterday
True    next week
True    last week
False   next wednesday
(2005, 11, 1, 23, 39, 13, 1, 305, 0)
(2005, 10, 31, 23, 39, 14, 0, 304, 0)
True    next friday
True    Inc(month=4) from Jan 1, 2004
True    Inc(month=12) from Jan 1, 2004
True    Inc(month=14) from Jan 1, 2004
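The relative-day cases in this list (“next wednesday”, “next friday”) come down to weekday arithmetic against a source date. Here is a minimal sketch of that arithmetic, pinned to a known date so the answer does not depend on the day the test is run. The helper next_weekday is hypothetical, not the library's code, and it assumes “next X” means the first X strictly after the source date:

```python
import datetime

def next_weekday(source, weekday):
    # weekday follows datetime's convention: Monday=0 ... Sunday=6.
    # Always move forward at least one day, so asking for "next wednesday"
    # on a Wednesday yields the Wednesday one week later.
    days_ahead = (weekday - source.weekday() - 1) % 7 + 1
    return source + datetime.timedelta(days=days_ahead)

WEDNESDAY = 2
known_wednesday = datetime.date(2005, 10, 26)
known_thursday = datetime.date(2005, 10, 27)

print(next_weekday(known_wednesday, WEDNESDAY))
print(next_weekday(known_thursday, WEDNESDAY))
```

The `- 1 ... + 1` dance is what keeps the result in the range 1 to 7 days; a plain `% 7` would return 0 when the source date already falls on the requested weekday.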

The issue with “2005-02-23” seems to be a timezone problem – the borrowed w3c parsing code had gmtime assumptions and I may have not weeded out all of them (my code has localtime assumptions :) The “next wednesday” issue seems to be an off-by-one case as it only fails when the test is run on a Wednesday or Thursday, even though I am taking great pains in the test code to make sure all tests are relative to a known date to avoid this exact issue.
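A date that slides back to the previous evening, as in the two tuples printed for “2005-02-23”, is what you would expect when a UTC-based conversion and a local-time conversion get mixed. A small sketch using only the standard time and calendar modules (an illustration of the general pitfall, not the borrowed w3c code):

```python
import calendar
import time

parsed = time.strptime("2005-02-23", "%Y-%m-%d")   # midnight, no zone attached

# Consistent pairs: the tuple comes back unchanged.
via_utc = time.gmtime(calendar.timegm(parsed))      # treated as UTC both ways
via_local = time.localtime(time.mktime(parsed))     # treated as local both ways

# Mixed pair: encoded as UTC, decoded as local time. On a machine set to a
# zone west of Greenwich this lands on the evening of 2005-02-22.
mixed = time.localtime(calendar.timegm(parsed))

print(via_utc[:6])
print(via_local[:6])
print(mixed[:6])
```

What the mixed line prints depends on the machine's timezone, which is exactly why this kind of bug can hide on one box and show up on another.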

I gotta say, it’s getting close — next comes some basic localization refactoring and then I’ll start running some comparisons for coverage and also performance against the Perl Time::Date library.

start.com

hmm, if you are interested in seeing what the buzz is about Microsoft’s start.com you better not use Safari to browse it! All I got was the START “logo” and a single text entry for search. Not very useful.

So, on a hunch, I decided to try Firefox and suddenly saw all the other items on the page – it’s a news aggregator of sorts.

More to come later if I manage to carve out some free time to actually use it – but hey, it’s not like people read my site for reviews :)

oooo, pretty

Spotted on mde’s blog: an early look at the “landing” page for Chandler (by landing I think they mean the home page for the user base).

Wow – dude – seriously sweet work!

The first thing that jumped out at me was how much this looked like many of the OS X application home pages I’ve seen. Silly bear … I think that’s the point :) and this looks a hell of a lot better than even the best a wiki page can do.

Oh, and I totally agree with his point about how CSS needs to be taken out to the shed and beaten until alignment becomes easy.

microformats

Just now I was scanning the recent activity on the Microformats wiki and doing some simple wiki gardening when I had a moment of insight (at least I hope it was insight ;)

microformats help solve now problems instead of possible problems

Now, what do I mean by that? In my previous job I had to deal with large amounts of data that came from different data sources but yet contained similar data items. After all, there are only so many ways you can “describe” inventory or payroll information, right?

Well, actually, there are n+1 ways where n is the number of programmers involved in the process ;)

The method I came to use while solving these problems involved setting up classes in a classic object-oriented manner: like data items get described by like classes, and classes form relationships using either inheritance or containment – all “normal” to most programmers. This worked for the majority of the cases and the remainder were handled by either pre-processing the data to normalize it or building those ugly edge-case-if-structures to handle the exceptions within the implementation of a class.

I found that this method worked very well and eventually I came to have a collection of classes that described almost all of the data that I had run across. It got to the point where I could create a new import or export routine by hooking up classes to each other with some procedural “glue”. Pretty nifty if you think about it – not everyone can say that OO methods had actually cut down the time or reduced the amount of new code that had to be written to solve a problem.
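As a rough illustration of that approach (every class, function and field name here is hypothetical, not the code from that job): one class describes the data item, each source gets a small pre-processing step that normalizes its records, and an import routine is just glue between the two.

```python
class InventoryItem:
    # One class describes one kind of data item, whatever its source.
    def __init__(self, sku, quantity):
        self.sku = sku
        self.quantity = quantity

def normalize_source_a(record):
    # Hypothetical source A uses the keys "SKU" and "Qty", with string values.
    return {"sku": record["SKU"].strip(), "quantity": int(record["Qty"])}

def normalize_source_b(record):
    # Hypothetical source B uses "item_code" and "on_hand".
    return {"sku": record["item_code"].strip(), "quantity": int(record["on_hand"])}

def import_records(records, normalize):
    # Procedural "glue": normalize each record, then build the shared class.
    return [InventoryItem(**normalize(r)) for r in records]

items = import_records([{"SKU": " A-100 ", "Qty": "3"}], normalize_source_a)
items += import_records([{"item_code": "B-7", "on_hand": 12}], normalize_source_b)

for item in items:
    print(item.sku, item.quantity)
```

Adding a third source then means writing one more normalize function rather than touching InventoryItem.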

But to get back to the point …

Early on in the above process I used to plan and try to anticipate all possible data uses, values and/or possibilities – I mean after all, isn’t that what you were supposed to do when you designed your class hierarchy? Soon I found that the code to handle all possible code paths made the code that dealt with actual data much more complicated and was the source of many an ugly bug. (editor’s note: now he finally gets to the point!) The lesson I learned back then was that while you could look ahead and plan for the things that experience had shown would be possible variations, any other planning was purely premature and not something to do.

This is the aha! moment I had earlier. Microformats are a way to ensure that the data presented, or the visible data – as it’s described in the microformats principles wiki page – is usable by both human and machine and is also the simplest way to describe that item of data.
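To make the “human and machine” part concrete, here is a small sketch that reads an hCard-style snippet with the standard html.parser module. The class names vcard, fn and org come from hCard; the sample markup and the HCardReader class are made up for illustration, and this is nowhere near a complete microformats parser:

```python
from html.parser import HTMLParser

# The same markup a person reads in the browser is what the machine reads.
PAGE = ('<div class="vcard"><span class="fn">Jane Doe</span>, '
        '<span class="org">Example Co</span></div>')

class HCardReader(HTMLParser):
    # Collects the text of elements whose class is one of the wanted names.
    WANTED = ("fn", "org")

    def __init__(self):
        super().__init__()
        self.current = None   # property name we are waiting to read text for
        self.found = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for name in self.WANTED:
            if name in classes:
                self.current = name

    def handle_data(self, data):
        # Only the first text chunk after a wanted tag is kept.
        if self.current:
            self.found[self.current] = data.strip()
            self.current = None

reader = HCardReader()
reader.feed(PAGE)
print(reader.found)
```

Nothing extra had to be invented for the machine: the visible name and organization carry a couple of class names and that is all.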

updated 2005-10-22 03:30


This work by Mike Taylor is licensed under a Creative Commons Attribution-ShareAlike 3.0 United States.