Tejus and I are building a Ruby on Rails site that needs both structured and unstructured search. Hamed suggested that we use Ultrasphinx, a Rails plugin that provides an interface to the Sphinx search backend.
I got everything downloaded and compiled, and had figured out how to debug the nil:NilClass errors that Ultrasphinx's configuration mini-language was generating, and then when I went to build the index for our database of seven documents... it seemed to hang. I was patient though, and let it run in the background for 10 minutes. This might be acceptable on a huge database, but... it was clear that something was wrong.
Several hours of debugging led me to the root cause: sphinx was assuming that the primary keys of the indexed table were sequential, and was creating a query for every 5000 rows between the min and max id of that table. With an auto_increment primary key, this is a valid assumption, but our data was being loaded by an ActiveRecord fixture which was generating random primary keys, so the range between min and max was nearly a billion, thus the number of queries was in the hundreds of thousands, all but seven of them returning nothing.
The solution of course, is to put explicit id's on your fixtures.
Subscribe to:
Post Comments (Atom)
2 comments:
wow, you just saved me like two weeks of tooth-grinding pain. thanks
You saved me, also, from a lot of head-banging and enabling mysql log and ... ;). Thanks. :-).
A side note, with the thinking-sphinx plugin, you can use:
production:
sql_range_step: 16777216
in your config/sphinx.yml to have it included in the generated config file.
Nice catch! :)
~Marcello
Post a Comment