I generally think of PHP as the buck-toothed high-school drop-out sibling of Perl (itself the unbathed cousin of Python and Ruby). But today I discovered the library function strtotime, which is a very flexible parser for text expressing a date/time. One cool example is a snippet that calculates the current year's Thanksgiving day.
The closest Python equivalent to strtotime seems to be dateutil.parser.parse, though from a cursory glance it's much less flexible. Of course, anyone who only needs to compute "common moveable Christian feasts that can be deduced from the date of Easter Sunday" need look no further than the mx.DateTime.Feasts library (which includes translations into German and French). Weird. And awesome.
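For comparison, here is a rough sketch of the Thanksgiving calculation in plain Python, with no date parser involved. It assumes "Thanksgiving" means the US holiday (the fourth Thursday of November), and the function name thanksgiving is made up for this example; it is not a translation of the PHP snippet.

```python
import datetime


def thanksgiving(year):
    """Return the fourth Thursday of November for the given year.

    Assumes the US definition of Thanksgiving.
    """
    first_of_november = datetime.date(year, 11, 1)
    # date.weekday() counts Monday as 0, so Thursday is 3.
    days_until_thursday = (3 - first_of_november.weekday()) % 7
    first_thursday = first_of_november + datetime.timedelta(days=days_until_thursday)
    # The fourth Thursday is three weeks after the first one.
    return first_thursday + datetime.timedelta(weeks=3)


print(thanksgiving(2008))  # 2008-11-27
```

It works, but it is nowhere near as pleasant as handing a string like "fourth Thursday of November" to a parser.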
Tuesday, December 30, 2008
Friday, December 5, 2008
Not Lazy Enough
It occurred to me that many performance problems when dealing with large datasets are due to programs not being "lazy enough". My example problem was "ALTER TABLE" statements in a database, which in most implementations will read, modify, and write every row in the table. Because this is a significant performance hit on large tables, we rarely see people doing "dynamic language things", i.e. altering their schemas in application logic.
One reason for being non-lazy (some say: strict) is to ensure consistency across your dataset. If exact consistency is not a requirement, object/document stores like CouchDB or AppEngine's datastore can provide inexpensive "ALTER TABLE" behavior. With these systems, outdated/unaltered rows only need to be upgraded/altered when they're fetched from the datastore. Here's a case where deferring the "ALTER TABLE" code allows us to write more flexible/dynamic programs because our data management is lazier.
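A minimal sketch of that upgrade-on-fetch idea, using an in-memory dict as a stand-in for the datastore. Everything here is hypothetical (the LazyStore class, the version field, and the name-splitting migration); it is not the CouchDB or AppEngine API, just an illustration of deferring the "ALTER TABLE" work until a row is read.

```python
CURRENT_VERSION = 2


def upgrade_v1_to_v2(record):
    """Hypothetical migration: split a single 'name' field into two fields."""
    upgraded = dict(record)
    first, _, last = upgraded.pop("name").partition(" ")
    upgraded["first_name"] = first
    upgraded["last_name"] = last
    upgraded["version"] = 2
    return upgraded


# Maps a record's version to the function that upgrades it by one version.
UPGRADES = {1: upgrade_v1_to_v2}


class LazyStore:
    """Toy datastore that migrates rows only when they are fetched."""

    def __init__(self):
        self._rows = {}

    def put(self, key, record):
        self._rows[key] = record

    def get(self, key):
        record = self._rows[key]
        # Apply pending upgrades one version at a time.
        while record.get("version", 1) < CURRENT_VERSION:
            record = UPGRADES[record.get("version", 1)](record)
        # Write the upgraded row back so the work is done only once.
        self._rows[key] = record
        return record


store = LazyStore()
store.put("ada", {"version": 1, "name": "Ada Lovelace"})
row = store.get("ada")  # upgraded here, not when the schema changed
```

Rows that are never fetched are never rewritten, which is exactly where the savings (and the loss of strict consistency) come from.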
Code can be too lazy though, resulting in bad caching behavior and sometimes huge memory space growth.
Any other examples of cases where laziness/demand-driven-evaluation would be a performance win?
Friday, October 10, 2008
Ten Years of Progress
"I'm afraid this is very bad. The stones tell me the boar god came from far to the west."

Hello from October 10th 2008. The Dow Jones Industrial Average closed yesterday at 8,579 which is a little bit under its close of 8,643 on March 10th 1998.
Does this mean that the global economy could have been put on hold ten years ago and life would be just the same today?
Probably not. People have to eat.
Does it mean that ten years worth of free market profit motive could have been thrown out the window, that executives could have been told to just keep 'er runnin' and life would be just the same as today?
Probably not. There have been winners and losers, and markets have adjusted for changing consumer preferences over the decade.
I don't know what it means.
But it does seem like an appropriate time to ask:
"Have we been making progress?"
I think that the answer is: Yes!
Though we may have to scale down our retirement plans, and start buying the less expensive brand of veggie burgers, our Wikipedia Content Index (WCI) just keeps growing:

And lest we ignore the importance of emerging global cute funds, think of how far we've come since 1998: a world without lolcats.

Thursday, September 25, 2008
Compiling x264 with MP4 support on OS X
I've been wanting to futz around with the internals of a modern video codec, and since H.264 seems like a pretty happening codec, and the x264 project seems pretty awesome, I downloaded their source and tried compiling it on my Mac. I hit a few roadbumps, so here are instructions for anyone else who wants to compile x264 on OS X:
wget http://downloads.sourceforge.net/gpac/gpac-0.4.4.tar.gz
open gpac-0.4.4.tar.gz # lazy-man's untar
git clone git://git.videolan.org/x264.git x264
cd gpac
find . -name os_net.c | xargs perl -pi -e 's/u_long/unsigned long/g'
./configure
make lib
make install-lib
cd ../x264
./configure --enable-mp4-output
make
Then, to test that everything worked, run this:
wget notlime.com/2008/h264/foreman_part_qcif.yuv
./x264 -v -q 20 -o foreman.mp4 foreman_part_qcif.yuv 176x144
open foreman.mp4
If you don't have git installed, you can find a nightly tarball of the x264 source code at ftp://ftp.videolan.org/pub/videolan/x264/snapshots/. Hope this helps someone!
Tuesday, September 2, 2008
Growing Triangle Vines
When I was in school I was addicted to making Processing sketches. I haven't been as active recently, but made these in the past month and a half.
The first is a visualization of a geometric computation that we used in the ICFP contest: Arcsin of R / D
And the second comes from a pen-and-paper sketch made on a legal pad during my WTC tenure: Iconic Growing Triangles
Tuesday, August 26, 2008
Persistent Refactorings
Refactorings are usually one-shot deals, triggered through an IDE, saved, committed, forgotten.
Imagine for a moment though, that your revision control system was aware of refactorings, and recorded the refactoring command itself, not just the source text changes. You (or your IDE) might run this command:
svn refactor pullup myFunction
This would record the refactoring, and that record could be used to inform merges with other branches. Everyone loves easier merges of course, but if your automatic merge of two significantly-differently-structured branches was nearly flawless, it would be possible to support two functionally equivalent, but architecturally different branches over a long timespan. In practice you could commit to either branch, and then pull from the other branch, relying on your version control system to refactor your changes so that the merge succeeded.
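To make the "record the command, not the diff" idea a bit more concrete, here is a toy sketch. It uses a rename as a stand-in because a real pull-up needs a parser, and it treats source code as plain text. The Rename class and the replay function are invented for this example; no version control system I know of exposes anything like them.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rename:
    """A recorded refactoring: rename one identifier to another."""
    old: str
    new: str

    def apply(self, source):
        # Match whole identifiers only, so 'myFunction' does not
        # also rewrite 'myFunctionHelper'.
        pattern = r"\b" + re.escape(self.old) + r"\b"
        return re.sub(pattern, lambda match: self.new, source)


def replay(refactorings, source):
    """Apply a recorded sequence of refactorings to some source text."""
    for refactoring in refactorings:
        source = refactoring.apply(source)
    return source


# The refactoring log recorded on one branch...
log = [Rename("myFunction", "my_function")]

# ...replayed over code that was written on another branch afterwards.
incoming = "total = myFunction(1) + myFunctionHelper()"
merged = replay(log, incoming)
```

Because the log is data rather than a text diff, it can be replayed over code that did not exist when the refactoring was first made, which is what the merge scenario above relies on.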
Why do this?
Perhaps you have a refactoring that reduces code-duplication but makes your codebase less readable, scannable, coherent. In this case you can branch and refactor, ending up with one easy-to-read branch, and one safe-to-modify-without-fear-of-missing-a-copy-pasted-version-of-the-same-function-in-another-module branch. Because the refactorings that we're considering here don't change the code functionally, these aren't really even branches, but simply two views on the same trunk.
What other cases can you see this being handy for?
Wednesday, July 30, 2008
Prolog Instantiation Modes (and Python exit contexts)
I wrote a piece of Python code that does a funny little transform on the syntax tree of a function, and creates a new function that returns all the local variables defined at the exit point of the original function. When I explain this to someone, they ask "Why?" and I don't have a good answer, but usually I say that it's related to predicates in Prolog that have multiple calling modes, which I'm going to try to explain here:
So, Prolog doesn't have functions, but it has something called "predicates" which are just as good. Predicates don't return a value, but any argument to a predicate can be an "output" variable, like a C reference parameter. Unlike C though, Prolog predicates often treat all of their parameters as outputs. For example, the predicate append can be used in (no fewer than) three different ways.
/* Appending, mode = input,input,output */
?- append([1,2,3],[4,5], X).
X = [1,2,3,4,5]
/* Trimming off a shared starting sequence, mode = input,output,input */
?- append([1,2,3],X,[1,2,3,4,5]).
X = [4,5]
/* Trimming off a shared ending sequence, mode = output,input,input */
?- append(X,[4,5],[1,2,3,4,5]).
X = [1,2,3]
Though these three operations seem very different from the perspective of an imperative language, in Prolog append can be defined in two simple lines. The intuition about Prolog is that the runtime doesn't think of variables so much as inputs and outputs, but as "things I already know" and "things I don't know yet".
We can translate Prolog's append into Python (at some loss of conciseness and functionality), and then use the Python macro mentioned above to check that the variable bindings at the end of the function maintain the invariant that we expect:
context = append_exit_context(None,[4,5],[1,2,3,4,5])
assert context['head'] + context['tail'] == context['result']
So, this is pretty cool, we can pretend that we're writing a predicate instead of a function, assigning computed values to variables when we're able to discover them, and then examine those bindings after running the predicate.
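For readers who want something runnable, here is a hand-written stand-in for what append_exit_context could look like. It is only a sketch: the real one is generated by the syntax-tree transform and returns every local variable, while this version builds the dictionary explicitly. The names head, tail and result come from the assertion above; the use of None for "not known yet" and the ValueError on a failed match are assumptions.

```python
def append_exit_context(head, tail, result):
    """Fill in whichever of head, tail, result is None, then return all three.

    A hand-written approximation, not the macro-generated function.
    """
    if [head, tail, result].count(None) > 1:
        raise ValueError("at most one argument may be unknown")

    if result is None:
        # mode = input, input, output
        result = head + tail
    elif tail is None:
        # mode = input, output, input
        tail = result[len(head):]
    elif head is None:
        # mode = output, input, input
        head = result[:len(result) - len(tail)]

    # The invariant every mode must satisfy; this fails when the
    # known arguments are inconsistent with each other.
    if head + tail != result:
        raise ValueError("no solution")

    return {"head": head, "tail": tail, "result": result}


context = append_exit_context(None, [4, 5], [1, 2, 3, 4, 5])
assert context["head"] + context["tail"] == context["result"]
```

Each branch assigns the one variable it is able to discover, and the final check plays the role that unification failure plays in Prolog.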
Of course, Prolog's append can also be called with the first two arguments uninstantiated, but that sort of magic is much harder to fit into Python.
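A Python generator gets part of the way there for this one mode, by enumerating every answer the way Prolog would on backtracking. This is a sketch for this single case only (the name append_splits is made up), not a general mechanism for uninstantiated arguments.

```python
def append_splits(result):
    """Yield every (head, tail) pair such that head + tail == result."""
    for i in range(len(result) + 1):
        yield result[:i], result[i:]


for head, tail in append_splits([1, 2, 3]):
    print(head, tail)
```

The caller pulls solutions one at a time, which is roughly what asking Prolog for the next answer does, but each calling mode still has to be written out by hand.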