Tuesday, December 30, 2008

PHP's strtotime

I generally think of PHP as the buck-toothed high-school drop-out sibling of Perl (itself the unbathed cousin of Python and Ruby). But today I discovered the library function strtotime, which is a very flexible parser for text expressing a date/time. One cool example is a snippet that calculates the current year's Thanksgiving day.
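
The snippet isn't reproduced here, but the rule it encodes is simple: US Thanksgiving is the fourth Thursday of November. Here is a minimal Python sketch of the same calculation using only the standard library. The function name thanksgiving is my own and is not part of any library.

```python
import datetime

def thanksgiving(year):
    """US Thanksgiving: the fourth Thursday of November."""
    nov1 = datetime.date(year, 11, 1)
    # date.weekday(): Monday == 0, so Thursday == 3
    days_to_first_thursday = (3 - nov1.weekday()) % 7
    first_thursday = nov1 + datetime.timedelta(days=days_to_first_thursday)
    # the fourth Thursday is three weeks after the first
    return first_thursday + datetime.timedelta(weeks=3)

print(thanksgiving(2008))  # 2008-11-27
```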

The closest Python equivalent to strtotime seems to be dateutil.parser.parse, though at a cursory glance it's much less flexible. Of course, if a person is only trying to compute "common moveable Christian feasts that can be deduced from the date of Easter Sunday", they need look no further than the mx.DateTime.Feasts library (which includes translations into German and French). Weird. And awesome.
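
For comparison, here is a rough sketch of the Easter-based approach. It is not mx.DateTime.Feasts' actual API. It computes Western (Gregorian) Easter using the anonymous Gregorian algorithm, often credited to Meeus/Jones/Butcher, and then derives two moveable feasts by fixed offsets: Ash Wednesday falls 46 days before Easter and Pentecost 49 days after. The function names are hypothetical.

```python
import datetime

def easter(year):
    """Gregorian Easter Sunday via the anonymous Gregorian algorithm."""
    # Pure integer arithmetic; see any reference on computus for the derivation.
    a = year % 19                      # position in the 19-year lunar (Metonic) cycle
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * l) // 451
    month, day = divmod(h + l - 7 * m + 114, 31)
    return datetime.date(year, month, day + 1)

def moveable_feasts(year):
    """A few feasts at fixed offsets from Easter Sunday."""
    e = easter(year)
    return {
        "Ash Wednesday": e - datetime.timedelta(days=46),
        "Easter Sunday": e,
        "Pentecost": e + datetime.timedelta(days=49),
    }

print(moveable_feasts(2008))
```

For 2008 this gives Ash Wednesday on February 6, Easter on March 23 and Pentecost on May 11.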

Friday, December 5, 2008

Not Lazy Enough

It occurred to me that many performance problems with large datasets are due to programs not being "lazy enough". My example problem was "ALTER TABLE" statements in a database, which in most implementations read, modify, and write every row in the table. Because this is a significant performance hit on large tables, we rarely see people doing "dynamic language things", i.e. altering their schemas from application logic.

One reason for being non-lazy (some say: strict) is to ensure consistency across your dataset. If exact consistency is not a requirement, object/document stores like CouchDB or AppEngine's datastore can provide inexpensive "ALTER TABLE" behavior: outdated/unaltered rows only need to be upgraded/altered when they're fetched from the datastore. This is a case where deferring the "ALTER TABLE" work lets us write more flexible/dynamic programs, because our data management is lazier.
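
To make the lazy-upgrade idea concrete, here is a toy in-memory store. It does not use CouchDB's or App Engine's API. The class, the _version field and the name-splitting migration are all hypothetical. Each document records the schema version it was written with, and get() runs pending migrations only when that particular document is fetched:

```python
CURRENT_VERSION = 2

def split_name(doc):
    """Hypothetical v1 -> v2 migration: split "name" into "first" and "last"."""
    first, _, last = doc.pop("name").partition(" ")
    doc["first"], doc["last"] = first, last
    return doc

# MIGRATIONS[v] upgrades a document from version v to version v + 1
MIGRATIONS = {1: split_name}

class LazyStore:
    def __init__(self):
        self._docs = {}

    def put(self, key, doc):
        self._docs[key] = doc

    def get(self, key):
        doc = self._docs[key]
        version = doc.get("_version", 1)   # unversioned docs predate the change
        while version < CURRENT_VERSION:
            doc = MIGRATIONS[version](doc)
            version += 1
            doc["_version"] = version
        self._docs[key] = doc              # persist the upgrade so it runs once
        return doc

store = LazyStore()
store.put("ada", {"name": "Ada Lovelace"})  # written under the old schema
print(store.get("ada"))  # {'first': 'Ada', 'last': 'Lovelace', '_version': 2}
```

Documents that are never read are never migrated. That is the consistency trade-off described above: at any moment the store may hold a mix of old and new shapes.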

Code can be too lazy, though, resulting in bad caching behavior and sometimes huge growth in memory use.

Any other examples of cases where laziness/demand-driven-evaluation would be a performance win?

Friday, October 10, 2008

Ten Years of Progress

"I'm afraid this is very bad. The stones tell me the boar god came from far to the west."


Hello from October 10th, 2008. The Dow Jones Industrial Average closed yesterday at 8,579, which is a little bit under its close of 8,643 on March 10th, 1998.

Does this mean that the global economy could have been put on hold ten years ago and life would be just the same today?
Probably not. People have to eat.

Does it mean that ten years' worth of free-market profit motive could have been thrown out the window, that executives could have been told to just keep 'er runnin', and life would be just the same as today?
Probably not. There have been winners and losers, and markets have adjusted for changing consumer preferences over the decade.

I don't know what it means.
But it does seem like an appropriate time to ask:
"Have we been making progress?"

I think that the answer is: Yes!

Though we may have to scale down our retirement plans, and start buying the less expensive brand of veggie burgers, our Wikipedia Content Index (WCI) just keeps growing:


And lest we ignore the importance of emerging global cute funds, think of how far we've come since 1998: a world without lolcats.

Thursday, September 25, 2008

Compiling x264 with MP4 support on OS X

I've been wanting to futz around with the internals of a modern video codec, and since H.264 seems like a pretty happening codec, and the x264 project seems pretty awesome, I downloaded their source and tried compiling it on my Mac. I hit a few road bumps, so here are instructions for anyone else who wants to compile x264 on OS X:


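wget http://downloads.sourceforge.net/gpac/gpac-0.4.4.tar.gz
tar xzf gpac-0.4.4.tar.gz   # unpacks into ./gpac
git clone git://git.videolan.org/x264.git x264

# build and install GPAC's library, which x264 uses for MP4 output
cd gpac
# work around a u_long compile error in os_net.c
find . -name os_net.c | xargs perl -pi -e 's/u_long/unsigned long/g'
./configure
make lib
make install-lib

# build x264 against it
cd ../x264
./configure --enable-mp4-output
make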


Then, to test that everything worked, run this:

wget notlime.com/2008/h264/foreman_part_qcif.yuv
./x264 -v -q 20 -o foreman.mp4 foreman_part_qcif.yuv 176x144
open foreman.mp4
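
Why pass 176x144 at all? A raw .yuv file has no header, so the encoder has to be told the frame size. Assume the test clip is planar YUV 4:2:0 (I420), which is what x264 expects for raw input. Then each frame is a full-resolution Y plane plus quarter-size U and V planes, and the frame count follows directly from the file size. Here is a small Python sketch; the helper names are my own:

```python
import os

def i420_frame_bytes(width, height):
    """Bytes in one planar YUV 4:2:0 frame: full-size Y, quarter-size U and V."""
    luma = width * height
    chroma = (width // 2) * (height // 2)
    return luma + 2 * chroma

def count_frames(path, width, height):
    """Number of frames in a headerless I420 file; fails on a partial frame."""
    size = os.path.getsize(path)
    frame = i420_frame_bytes(width, height)
    if size % frame:
        raise ValueError("file size is not a whole number of frames")
    return size // frame

print(i420_frame_bytes(176, 144))  # 38016 bytes per QCIF frame
```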


If you don't have git installed, you can find a nightly tarball of the x264 source code at ftp://ftp.videolan.org/pub/videolan/x264/snapshots/. Hope this helps someone!

Tuesday, September 2, 2008

Growing Triangle Vines

When I was in school I was addicted to making Processing sketches. I haven't been as active recently, but I made these two in the past month and a half.

The first is a visualization of a geometric computation that we used in the ICFP contest: Arcsin of R / D
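
The post doesn't spell out the computation. My reading, which is not stated here, is that R is the radius of a circular obstacle and D is the distance from you to its center. Under that assumption, arcsin(R / D) is the angle between the line to the center and either tangent line to the circle, so any heading within that angle of the center runs into the obstacle. Here is a hedged Python sketch; the function names are hypothetical:

```python
import math

def tangent_half_angle(r, d):
    """Angle between the line to a circle's center and a tangent to the circle.

    Assumes r is the circle's radius and d the distance from the observer
    to its center; the observer must be outside the circle (d > r).
    """
    if d <= r:
        raise ValueError("observer is inside the circle")
    return math.asin(r / d)

def heading_is_blocked(heading, bearing_to_center, r, d):
    """True if a ray leaving the observer at `heading` (radians) hits the circle."""
    # wrap the angular difference into [-pi, pi)
    diff = (heading - bearing_to_center + math.pi) % (2 * math.pi) - math.pi
    return abs(diff) < tangent_half_angle(r, d)
```

For example, an obstacle of radius 1 whose center is 2 away blocks every heading within arcsin(1/2), or 30 degrees, of its center.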

And the second comes from a pen-and-paper sketch made on a legal pad during my WTC tenure: Iconic Growing Triangles

Tuesday, August 26, 2008

Persistent Refactorings

Refactorings are usually one-shot deals, triggered through an IDE, saved, committed, forgotten.

Imagine for a moment, though, that your revision control system was aware of refactorings, and recorded the refactoring command itself, not just the source text changes. You (or your IDE) might run this command:

svn refactor pullup myFunction


Your version control system would record the refactoring itself, which could then be used to inform merges with other branches. Everyone loves easier merges, of course, but if your automatic merge of two significantly-differently-structured branches were nearly flawless, it would be possible to maintain two functionally equivalent but architecturally different branches over a long timespan. In practice you could commit to either branch and then pull from the other, relying on your version control system to refactor your changes so that the merge succeeded.

Why do this?

Perhaps you have a refactoring that reduces code duplication but makes your codebase less readable, scannable, and coherent. In this case you can branch and refactor, ending up with one easy-to-read branch and one safe-to-modify-without-fear-of-missing-a-copy-pasted-version-of-the-same-function-in-another-module branch. Because the refactorings we're considering here don't change the code's behavior, these aren't really even branches, but simply two views of the same trunk.
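
Here is a toy sketch of the idea. It is not a feature of any real version control system. A refactoring is stored as an operation rather than as a diff, and it can be replayed on a change arriving from the other branch before merging. The sketch uses a rename instead of a pull-up because a rename is simpler. The Rename class and the port_change function are hypothetical. A real tool would work on the syntax tree; this regex version would also rename matches inside strings and comments.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Rename:
    """A recorded refactoring: rename identifier `old` to `new` everywhere."""
    old: str
    new: str

    def apply(self, source: str) -> str:
        # \b keeps us from renaming prefixes of longer identifiers
        return re.sub(rf"\b{re.escape(self.old)}\b", self.new, source)

def port_change(source, refactorings):
    """Replay recorded refactorings so that code written on the unrefactored
    branch lines up with the refactored branch before the merge."""
    for refactoring in refactorings:
        source = refactoring.apply(source)
    return source

log = [Rename("myFunction", "my_function")]
incoming = "def caller():\n    return myFunction(1) + myFunctionality()\n"
print(port_change(incoming, log))  # the call is renamed; myFunctionality is untouched
```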

What other cases can you see this being handy for?

Wednesday, July 30, 2008

Prolog Instantiation Modes (and Python exit contexts)

I wrote a piece of Python code that does a funny little transform on the syntax tree of a function, and creates a new function that returns all the local variables defined at the exit point of the original function. When I explain this to someone, they ask "Why?" and I don't have a good answer, but usually I say that it's related to predicates in Prolog that have multiple calling modes. Which I'm going to try explaining here:

So, Prolog doesn't have functions, but it has something called "predicates" which are just as good. Predicates don't return a value, but any argument to a predicate can be an "output" variable, like a pointer out-parameter in C. Unlike C, though, Prolog predicates often let any of their parameters act as an output. For example, the predicate append can be used in (no less than) three different ways.

/* Appending, mode = input,input,output */
?- append([1,2,3],[4,5], X).
X = [1,2,3,4,5]

/* Trimming off a shared starting sequence, mode = input,output,input */
?- append([1,2,3],X,[1,2,3,4,5]).
X = [4,5]

/* Trimming off a shared ending sequence, mode = output,input,input */
?- append(X,[4,5],[1,2,3,4,5]).
X = [1,2,3]


Though these three operations seem very different from the perspective of an imperative language, in Prolog append can be defined in two simple lines. The intuition is that the Prolog runtime doesn't think of variables as inputs and outputs so much as "things I already know" and "things I don't know yet".

We can translate Prolog's append into Python (at some loss of conciseness and functionality), and then use the Python macro mentioned above to check that the variable bindings at the end of the function maintain the invariant that we expect:

context = append_exit_context(None,[4,5],[1,2,3,4,5])
assert context['head'] + context['tail'] == context['result']
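
The post doesn't show the Python translation or the syntax-tree macro. Below is a hand-written stand-in for what append_exit_context might compute. It is hypothetical and is not the original code. It uses the head/tail/result names from the assertion above, fills in whichever argument is None, and returns the function's local bindings at exit. Where Prolog would simply fail, this sketch raises ValueError.

```python
def append_exit_context(head, tail, result):
    """Fill in the one unknown argument (None), then return all locals."""
    if [head, tail, result].count(None) > 1:
        raise ValueError("this sketch needs at least two known arguments")
    if result is None:
        # mode input,input,output: plain concatenation
        result = head + tail
    elif head is None:
        # mode output,input,input: trim tail off the end of result
        split = len(result) - len(tail)
        if split < 0 or result[split:] != tail:
            raise ValueError("tail is not a suffix of result")
        head = result[:split]
    elif tail is None:
        # mode input,output,input: trim head off the front of result
        if result[:len(head)] != head:
            raise ValueError("head is not a prefix of result")
        tail = result[len(head):]
    elif head + tail != result:
        # everything known: just check that the relation holds
        raise ValueError("head + tail != result")
    # Stand-in for the syntax-tree transform: expose every local at exit.
    return dict(locals())
```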


So, this is pretty cool, we can pretend that we're writing a predicate instead of a function, assigning computed values to variables when we're able to discover them, and then examine those bindings after running the predicate.

Of course, Prolog's append can also be called with the first two arguments uninstantiated, but that sort of magic is much harder to fit into Python.
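
For this particular mode, Python generators get partway there, though only because a finite list can be split exhaustively. The sketch below enumerates every (head, tail) pair in the same order Prolog produces them on backtracking. The name append_splits is hypothetical.

```python
def append_splits(result):
    """Yield every (head, tail) with head + tail == result, shortest head first."""
    for i in range(len(result) + 1):
        yield result[:i], result[i:]

print(list(append_splits([1, 2, 3])))
# [([], [1, 2, 3]), ([1], [2, 3]), ([1, 2], [3]), ([1, 2, 3], [])]
```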

Saturday, July 19, 2008

Slow Sphinx Indexing

Tejus and I are building a Ruby on Rails site that needs both structured and unstructured search. Hamed suggested that we use Ultrasphinx, a Rails plugin that provides an interface to the Sphinx search backend.

I got everything downloaded and compiled, and had figured out how to debug the nil:NilClass errors that Ultrasphinx's configuration mini-language was generating. Then, when I went to build the index for our database of seven documents... it seemed to hang. I was patient, though, and let it run in the background for 10 minutes. This might be acceptable on a huge database, but with seven documents it was clear that something was wrong.

Several hours of debugging led me to the root cause: Sphinx was assuming that the primary keys of the indexed table were sequential, and was issuing one query for every 5000-id range between the table's min and max id. With an auto_increment primary key this is a reasonable assumption, but our data was being loaded by an ActiveRecord fixture that generated random primary keys. The range between min and max was therefore nearly a billion, so the number of queries was in the hundreds of thousands, all but seven of them returning nothing.

The solution, of course, is to put explicit ids on your fixtures.
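
Sphinx's ranged indexing (the sql_query_range and sql_range_step settings) walks the id space in fixed-size windows. The query count therefore depends on max id minus min id, not on the number of rows. Below is a rough Python model of that walk, assuming the 5000-id step described above. ranged_queries is a hypothetical helper, not Sphinx code, and the sample ids are made up.

```python
def ranged_queries(min_id, max_id, step=5000):
    """Yield the inclusive (start, end) id windows a ranged indexer would query."""
    start = min_id
    while start <= max_id:
        end = min(start + step - 1, max_id)
        yield start, end
        start = end + 1

def count_queries(ids, step=5000):
    return sum(1 for _ in ranged_queries(min(ids), max(ids), step))

print(count_queries([1, 2, 3, 4, 5, 6, 7]))                 # 1
print(count_queries([1, 48213, 500_000_000, 999_999_999]))  # 200000
```

Explicit, small fixture ids keep min and max close together, which collapses the walk back to a single window.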

Thursday, July 17, 2008

ICFP 2008

Last weekend, I went over to the Monroe Drive House and wrote code for the International Conference on Functional Programming Contest. Thankfully, the ICFP Contest doesn't require that your implementation language be purely functional (or even mostly functional), so we (Mark, Martin, and Erik) wrote our entry in Python, using Twisted to handle network events. As we hacked on procedural code downstairs, Alex and Lindsey wrote very pure Scheme code upstairs.

We didn't do anything particularly fancy, just used a PID controller to adjust the driver's angle towards the goal, and wrote some geometry routines to detect when we were on a collision course with an obstacle, and then plotted a course in whichever direction around the obstacle looked shorter. We also moved away from Martians if they were too close to us (and facing us). We talked about several more complex tactics, but didn't wind up with the extra time (or brainpower) to implement them.
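
For readers who haven't met one, a PID controller turns an error signal into a correction. In our case the error was the difference between the current heading and the bearing to the goal. The correction combines a term proportional to the error, a term for its running integral and a term for its rate of change. Below is a minimal Python sketch. The class, the angle-wrapping helper and the gains are assumptions; this is not our contest code.

```python
import math

def wrap_angle(a):
    """Map an angle in radians into [-pi, pi)."""
    return (a + math.pi) % (2 * math.pi) - math.pi

class PID:
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt):
        """Return the control output for one time step of length dt."""
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0  # no history yet on the first step
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Toy simulation: the vehicle turns at whatever rate the controller asks for.
pid = PID(kp=2.0, ki=0.0, kd=0.1)
heading, goal_bearing, dt = 0.0, 1.0, 0.1
for _ in range(200):
    error = wrap_angle(goal_bearing - heading)
    heading += pid.update(error, dt) * dt
print(round(heading, 6))  # 1.0
```

Wrapping the error matters for steering. Without it, a goal just across the ±pi boundary would make the controller turn the long way around.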

We had an awful version control experience with Mercurial: constant permissions errors in the remote repository, the need to manually "hg up" on the remote server, and flaky merges. I've had good luck with Mercurial in the past on stochasm (a PyWeek entry), and some of the trouble this time was because we were using an SSH repository, rather than the svnserve style that Drew set up on stochasm.

Finally, our entry, the code.

Friday, May 23, 2008

Kitten Naming

Gabi and I have acquired a set of four kittens and a momma-cat. They're a bit wild, and prone to hissing, but we feed them, and they rumpus in our backyard.

We named momma-cat "Olestra" because it has a classical ring to it, and gave the four kittens the names Sprint, Cingular, (T-) Mobile, and Verizon. We dropped the "T dash" from Mobile's name because it was confusing: what does the "T" mean, anyway? Sprint and Mobile are both grey-tan tabbies; we're not sure which is which yet. Cingular is the fuzzy brown-black one, and Verizon is skinny and all-black.

Naming critter-litters is a bit like naming servers: you want to pick your names from a category that is roughly the same size as your things-to-be-named.

Tuesday, March 11, 2008

Avoiding ThreadDeath with env.js

At Appcelerator we use John Resig's "simulated browser environment for Rhino" as part of our IDE. I'd had intermittent problems with a ThreadDeath error being thrown, which would then cause any other thread running Rhino to hang.



I wasn't sure if the problem was in my Java code, my JavaScript, Rhino's Java code, or Aptana's Java code (we built our IDE atop their HTML/CSS/JS editor). Turns out it wasn't any of those! It was in that "simulated browser environment" code, in window.clearInterval, where the thread spawned by setInterval is killed. I puzzled for a moment over why Mr. Resig was using multiple threads rather than a single one for setIntervals (expediency, I assume), and then changed two lines so that clearInterval doesn't kill the thread, but allows it to die of natural causes.


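window.setInterval = function(fn, time){
  var num = timers.length;

  timers[num] = new java.lang.Thread(new java.lang.Runnable({
    run: function(){
      // Was: while (true) { ... }
      // Now the loop exits on its own once clearInterval deletes
      // timers[num], so the thread finishes normally. If the thread is
      // asleep at that moment, fn may run once more before it notices.
      while (timers[num]) {
        java.lang.Thread.sleep(time);
        fn();
      }
    }
  }));

  timers[num].start();

  return num;
};

window.clearInterval = function(num){
  if ( timers[num] ) {
    // Removed: timers[num].stop();
    // Thread.stop() throws ThreadDeath in the target thread, which was
    // what left our other Rhino threads hanging.
    delete timers[num];
  }
};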


Another example of a day of debugging yielding a two-line fix. Ugh.
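
The general pattern of cooperative cancellation instead of killing a thread from outside carries over to other runtimes. Python's threading module doesn't even offer a way to kill a thread. Here is a rough Python analogue; it is not env.js code, and the Interval class is hypothetical. cancel() sets a flag that the worker loop checks, so the thread returns on its own:

```python
import threading

class Interval:
    """Call fn every `seconds` on a background thread until cancel()."""

    def __init__(self, fn, seconds):
        self._cancelled = threading.Event()

        def run():
            # Event.wait returns False on timeout (time to tick) and True
            # once cancel() has set the event, which ends the loop.
            while not self._cancelled.wait(seconds):
                fn()

        self._thread = threading.Thread(target=run, daemon=True)
        self._thread.start()

    def cancel(self):
        # Ask the loop to finish, then wait for the thread to return
        # normally instead of killing it from the outside.
        self._cancelled.set()
        self._thread.join()
```

Using Event.wait() as the sleep means that a cancel arriving mid-sleep wakes the loop immediately, so fn is not called again after cancel() returns.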

Friday, February 29, 2008

Dodging the DVCS Trainwreck

So what happens when the darcs or Mercurial project goes the way of CVS (that is, to bit-heaven)? How do you rescue your distributed repository from legacy-software-land?


Well, the Tailor project provides converters between different repository formats, even some of the more obscure distributed VCSes. It seems to work in an all-at-once mode rather than mirroring one live repository in other formats, but it does support full history, so...

There's no harm in picking a bizarre distributed version control system today: if the project stalls, you can always upgrade your repository to whichever system becomes mainstream!

Wednesday, February 27, 2008

Atlhack distributed version control roundup


Alex writes about the advantages of git, Erik shows us how to configure git to serve over HTTP, Lindsey is using darcs, Drew has been using Mercurial for his personal projects (Miru, Yue), and I've been using darcs.


Are there any DVCSes we haven't tried? It seems that Mr Bicking has the full list.