Why does Tokyo Tyrant slow down exponentially even after adjusting bnum?


Has anyone successfully used Tokyo Cabinet / Tokyo Tyrant with a large dataset? I am trying to load data from the Wikipedia data source. After hitting about 30 million records, I get an exponential slowdown with both the HDB and BDB databases. For the HDB case I set bnum to 2-4x the expected number of records, with only a slight speedup. I also set xmsiz to 1 GB or so, but eventually I still hit a wall.

It seems that Tokyo Tyrant is basically an in-memory database, and once you exceed xmsiz or your RAM, you end up with a much slower database. Has anyone else run into this problem before? Were you able to solve it?

I think I have cracked this one, and I haven't seen this solution anywhere else. On Linux, there are generally two reasons that Tokyo starts to slow down, so let's go through the usual culprits. First, your bnum may be too low: you want it to be at least half the number of items in the hash (preferably more). Second, you want to set your xmsiz close to the size of the bucket array. To get the size of the bucket array, just create an empty DB with the right bnum, and Tokyo will initialize the file to the appropriate size. (For example, bnum=200000000 is about 1.5 GB for an empty DB.)
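A minimal Python sketch of the sizing rules above; it is not part of the original answer. It assumes 8 bytes per bucket (the 64-bit "large" option), which is consistent with the ~1.5 GB quoted for bnum=200000000. The helper names are hypothetical, and the `#bnum=...#xmsiz=...#opts=l` suffix follows Tokyo Cabinet's abstract-database naming convention as I understand it, so check it against your version's documentation.

```python
import os

# Assumption: 8-byte bucket entries with the "large" option (opts=l),
# 4-byte entries otherwise. 200,000,000 * 8 bytes is ~1.49 GiB, in line
# with the ~1.5 GB empty-DB figure quoted above.
BYTES_PER_BUCKET_LARGE = 8
BYTES_PER_BUCKET_SMALL = 4


def recommend_bnum(expected_records: int, factor: float = 2.0) -> int:
    """Bucket count: at least half the record count, preferably more."""
    if factor < 0.5:
        raise ValueError("bnum should be at least half the number of records")
    return int(expected_records * factor)


def bucket_array_bytes(bnum: int, large: bool = True) -> int:
    """Rough bucket-array size, used as a starting point for xmsiz."""
    per_bucket = BYTES_PER_BUCKET_LARGE if large else BYTES_PER_BUCKET_SMALL
    return bnum * per_bucket


def xmsiz_from_empty_db(path: str) -> int:
    """Size of a freshly created, empty .tch file (bucket array plus
    header and free-block pool), which is what the answer suggests measuring."""
    return os.path.getsize(path)


def ttserver_dbname(path: str, bnum: int, xmsiz: int, large: bool = True) -> str:
    """Database name with tuning parameters appended as #name=value."""
    name = f"{path}#bnum={bnum}#xmsiz={xmsiz}"
    if large:
        name += "#opts=l"
    return name


if __name__ == "__main__":
    bnum = recommend_bnum(100_000_000)          # 200000000
    size = bucket_array_bytes(bnum)              # 1600000000 bytes
    print(bnum, size, round(size / 2**30, 2))    # 200000000 1600000000 1.49
    print(ttserver_dbname("/data/wiki.tch", bnum, size))
```

Measuring an empty file with `xmsiz_from_empty_db` is closer to what the answer recommends, since the real file contains more than just the bucket array.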

But now you'll notice that it still slows down, albeit a bit further along. We found that the trick was to turn off journaling in the filesystem: for some reason, journaling (on ext3) spikes once your hash file grows beyond 2-3 GB. (We realized this from spikes in I/O that did not correspond to changes in the file on disk, alongside CPU bursts from the kjournald daemon.)
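The diagnostic described above (write I/O that doesn't line up with changes to the hash file) can be approximated with a rough heuristic. This sketch is not from the original answer: it compares bytes written to the block device, taken from the sectors-written column of `/proc/diskstats`, with the `write_bytes` counter in `/proc/<pid>/io` for the ttserver process. The device name and PID are placeholders, and page-cache writeback accounting is imperfect, so treat a large gap as a hint of journaling or other writers, not as proof.

```python
from typing import Dict, Tuple

SECTOR_BYTES = 512  # /proc/diskstats counts 512-byte sectors


def parse_proc_io(text: str) -> Dict[str, int]:
    """Parse /proc/<pid>/io ("key: value" lines) into a dict."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            counters[key.strip()] = int(value)
    return counters


def device_bytes_written(diskstats: str, device: str) -> int:
    """Bytes written to `device`, from the sectors-written field."""
    for line in diskstats.splitlines():
        fields = line.split()
        # fields: major minor name reads merged sectors_read ms_read
        #         writes merged sectors_written ...
        if len(fields) > 9 and fields[2] == device:
            return int(fields[9]) * SECTOR_BYTES
    raise KeyError(f"device {device!r} not found in diskstats")


def unexplained_write_bytes(prev: Tuple[str, str], cur: Tuple[str, str], device: str) -> int:
    """Device writes between two snapshots that the process did not issue.
    Each snapshot is (diskstats_text, proc_io_text)."""
    dev_delta = device_bytes_written(cur[0], device) - device_bytes_written(prev[0], device)
    proc_delta = parse_proc_io(cur[1])["write_bytes"] - parse_proc_io(prev[1])["write_bytes"]
    return max(dev_delta - proc_delta, 0)


def read_snapshot(pid: int) -> Tuple[str, str]:
    """Read the raw counters for one sample (Linux only)."""
    with open("/proc/diskstats") as f:
        diskstats = f.read()
    with open(f"/proc/{pid}/io") as f:
        proc_io = f.read()
    return diskstats, proc_io
```

Sampling `read_snapshot` every few seconds during a load and watching whether `unexplained_write_bytes` jumps as the file passes 2-3 GB is one way to look for the pattern described; a tool such as `iotop` gives a per-thread view that can show kjournald directly.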

On Linux, just unmount your ext3 partition and remount it as ext2. Build your DB, then remount it as ext3. With journaling disabled, we could build DBs with 180M keys without a problem.
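The remount procedure can be written down as a command sequence. This is a hypothetical Python helper, not something from the answer: the device, mount point, and build command are placeholders. It needs root, nothing (including ttserver) may hold files open on the partition when you unmount, and the unmount must complete cleanly for the partition to mount as ext2. By default it only prints the commands.

```python
import subprocess
from typing import List


def remount_plan(device: str, mountpoint: str, build_cmd: List[str]) -> List[List[str]]:
    """Unmount, mount as ext2 (no journal), build the DB, then go back to ext3."""
    return [
        ["umount", mountpoint],
        ["mount", "-t", "ext2", device, mountpoint],
        build_cmd,                       # hypothetical loader, e.g. your import script
        ["umount", mountpoint],          # stop ttserver before this step
        ["mount", "-t", "ext3", device, mountpoint],
    ]


def run_plan(plan: List[List[str]], dry_run: bool = True) -> None:
    """Print each command, or execute it and stop at the first failure."""
    for cmd in plan:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    plan = remount_plan("/dev/sdb1", "/data", ["python3", "load_wikipedia.py"])
    run_plan(plan)  # dry run: prints the five commands
```

Keeping the plan as data makes it easy to review before passing `dry_run=False`; `check=True` stops the sequence if any step fails, so you don't remount as ext3 over a half-built DB unnoticed.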

