It occurred to me that many performance problems with large datasets come from programs not being "lazy enough". My example problem was "ALTER TABLE" statements in a database, which in most implementations will read, modify, and write every row in the table. Because this is a significant performance hit on large tables, we rarely see people doing "dynamic language things", i.e. altering their schemas from application logic.
One reason for being non-lazy (some say: strict) is to ensure consistency across your dataset. If exact consistency is not a requirement, object/document stores like CouchDB or AppEngine's datastore can provide inexpensive "ALTER TABLE" behavior: outdated, unaltered rows only need to be upgraded when they're fetched from the datastore. This is a case where deferring the "ALTER TABLE" work lets us write more flexible, dynamic programs because our data management is lazier.
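As a concrete sketch of what read-time upgrading might look like (the dict-as-store, the version field, the document shape, and the upgrade() function are all hypothetical here, not CouchDB's or AppEngine's actual API):

    # A minimal sketch of read-time "ALTER TABLE", using a plain dict as the
    # document store; the version field and upgrade logic are made up for
    # illustration.

    CURRENT_VERSION = 2

    def upgrade(doc):
        """Bring an old-format document up to the current schema version."""
        if doc.get("version", 1) < 2:
            # Pretend v2 split a single "name" field into first/last.
            first, _, last = doc.pop("name", "").partition(" ")
            doc["first_name"], doc["last_name"] = first, last
            doc["version"] = 2
        return doc

    def fetch(store, key):
        """Fetch a document, upgrading it on demand instead of in one big pass."""
        doc = store[key]
        if doc.get("version", 1) < CURRENT_VERSION:
            doc = upgrade(doc)
            store[key] = doc      # write back, so each row is upgraded at most once
        return doc

    store = {"user:1": {"name": "Ada Lovelace", "version": 1}}
    print(fetch(store, "user:1"))  # upgraded lazily, on first read

The write-back after the upgrade is what keeps this cheap: each row pays the migration cost at most once, and only if it is ever read.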
Code can be too lazy, though, resulting in poor caching behavior and sometimes huge growth in memory use.
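As a toy sketch of that failure mode (entirely hypothetical, not tied to any particular store or library): if every update is merely recorded as a deferred closure, the backlog of pending work grows without bound, and forcing the value re-does all of it each time because nothing caches the result.

    # A toy illustration of code that is "too lazy": instead of computing a
    # value, every operation is deferred as a closure (a "thunk").

    class LazyValue:
        """Holds pending operations and only evaluates them when forced."""
        def __init__(self, start=0):
            self.start = start
            self.pending = []                      # every deferred op is retained here

        def add(self, n):
            self.pending.append(lambda v: v + n)   # defer, don't compute

        def force(self):
            value = self.start
            for op in self.pending:                # no caching: re-walks the whole chain
                value = op(value)
            return value

    lv = LazyValue()
    for _ in range(1_000_000):
        lv.add(1)                                  # a million closures now live in memory

    print(lv.force())   # 1000000, all the deferred work paid at once
    print(lv.force())   # ...and paid for again, since nothing was cached

The usual cure is to force and cache at sensible boundaries, like the write-back in the fetch sketch above.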
Any other examples of cases where laziness/demand-driven evaluation would be a performance win?