Tuesday, December 30, 2008

PHP's strtotime

I generally think of PHP as the buck-toothed high-school drop-out sibling of Perl (itself the unbathed cousin of Python and Ruby). But today I discovered the library function strtotime, which is a very flexible parser for text expressing a date/time. One cool example is a snippet that calculates the current year's Thanksgiving day.

The closest Python equivalent to strtotime seems to be dateutil.parser.parse, though from a cursory glance it looks much less flexible. Of course, if you are only trying to compute "common moveable Christian feasts that can be deduced from the date of Easter Sunday", you need look no further than the mx.DateTime.Feasts library (which includes translations into German and French). Weird. And awesome.
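For comparison, here is a rough sketch of the Thanksgiving calculation in plain Python. It is not a strtotime equivalent and it is not the PHP snippet mentioned above. It assumes the US definition (the fourth Thursday of November) and uses only the standard datetime module. The function name thanksgiving is my own invention.

```python
import datetime


def thanksgiving(year):
    """Return the date of US Thanksgiving for the given year.

    Assumption: Thanksgiving is the fourth Thursday of November.
    """
    nov1 = datetime.date(year, 11, 1)
    # date.weekday() counts Monday as 0, so Thursday is 3.
    days_to_first_thursday = (3 - nov1.weekday()) % 7
    # The fourth Thursday is three weeks (21 days) after the first one.
    return nov1 + datetime.timedelta(days=days_to_first_thursday + 21)


if __name__ == "__main__":
    print(thanksgiving(datetime.date.today().year))
```

The arithmetic is spelled out by hand here, which is exactly the work a phrase-level parser like strtotime saves you.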

Friday, December 5, 2008

Not Lazy Enough

It occurred to me that many performance problems with large datasets come from programs not being "lazy enough". My example is the "ALTER TABLE" statement, which in most database implementations will read, modify, and write every row in the table. Because that is a significant performance hit on large tables, we rarely see people doing "dynamic language things", i.e. altering their schemas in application logic.

One reason for being non-lazy (some say: strict) is to ensure consistency across your dataset. If exact consistency is not a requirement, object/document stores like CouchDB or AppEngine's datastore can provide inexpensive "ALTER TABLE" behavior. With these systems, outdated/unaltered rows only need to be upgraded/altered when they're fetched from the datastore. Here's a case where deferring the "ALTER TABLE" code allows us to write more flexible/dynamic programs because our data management is lazier.
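To make the upgrade-on-fetch idea concrete, here is a minimal sketch in Python. It is not how CouchDB or the AppEngine datastore work internally; it is just a toy in-memory store. Every name in it (LazyStore, put_raw, upgrade_v1_to_v2, the "version" field, and the name-splitting schema change) is hypothetical and made up for illustration.

```python
CURRENT_VERSION = 2


def upgrade_v1_to_v2(doc):
    """Hypothetical schema change: split "name" into first and last name."""
    first, _, last = doc.pop("name", "").partition(" ")
    doc["first_name"] = first
    doc["last_name"] = last
    doc["version"] = 2
    return doc


# Maps a document's current version to the function that upgrades it one step.
UPGRADES = {1: upgrade_v1_to_v2}


class LazyStore:
    """Toy in-memory document store that upgrades documents only on read."""

    def __init__(self):
        self._docs = {}
        self.upgrades_run = 0  # counts how much "ALTER TABLE" work was done

    def put_raw(self, key, doc):
        # Store the document as-is, whatever schema version it has.
        self._docs[key] = doc

    def get(self, key):
        doc = self._docs[key]
        # Documents without a "version" field are assumed to be version 1.
        while doc.get("version", 1) < CURRENT_VERSION:
            upgrade = UPGRADES[doc.get("version", 1)]
            doc = upgrade(dict(doc))  # upgrade a copy, one version at a time
            self.upgrades_run += 1
        # Write the upgraded form back so the work is not repeated next time.
        self._docs[key] = doc
        return doc


store = LazyStore()
store.put_raw("a", {"version": 1, "name": "Ada Lovelace"})
store.put_raw("b", {"version": 1, "name": "Alan Turing"})
fetched = store.get("a")  # only this document pays the upgrade cost
```

After the fetch, document "a" is in the new schema while "b" sits untouched in the old one until somebody asks for it. That is the consistency trade-off: at any moment the store holds a mix of schema versions.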

Code can be too lazy, though, resulting in bad caching behavior and sometimes huge growth in memory use.

Any other examples of cases where laziness/demand-driven-evaluation would be a performance win?