Header Database Build Really Slow

This is the place to help test and discuss Version 6 Beta releases.

Postby scapino » Fri Jul 26, 2019 3:17 pm

I recently re-installed Windows to take care of some overall PC performance issues without really being mindful of backing up everything related to Newsbin. As a result I did a clean install of Newsbin. In trying to rebuild the header databases for some groups, I ran into one really large group that seems to be taking forever; at one point it filled my drive with chunk files in the Import folder. After clearing the Import folder, I updated to the latest beta to see if the process would be more efficient. Currently my Cache shows:
Cache: 400/400 (21193)

The files are being processed, but about three hours ago it was at 21198, so at this rate it will take about 213 days to complete...that seems bad...
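For anyone wanting to redo this kind of estimate, here is a minimal sketch of the straight-line arithmetic. It assumes the rate stays constant and that the Cache counter really is the number of files left; `days_remaining` is a made-up helper name, not anything in Newsbin. Using only the two readings quoted above (21198, then 21193 about three hours later), it comes out closer to 530 days, so the 213-day figure presumably rests on other readings.

```python
def days_remaining(earlier_count, later_count, hours_between):
    """Linear estimate of days left, from two readings of the pending-file counter."""
    processed = earlier_count - later_count
    if processed <= 0 or hours_between <= 0:
        raise ValueError("need a falling counter and a positive time span")
    files_per_hour = processed / hours_between
    return later_count / files_per_hour / 24


# Readings from the post above: 21198 files, then 21193 about three hours later.
estimate = days_remaining(21198, 21193, 3.0)
print(f"about {estimate:.0f} days at the current rate")
```

Two readings only five files apart make for a very noisy rate, so treat the result as an order of magnitude at best.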
scapino
n00b
 
Posts: 8
Joined: Thu Mar 13, 2008 12:21 am

Registered Newsbin User since: 08/18/03

Re: Header Database Build Really Slow

Postby Quade » Fri Jul 26, 2019 5:16 pm

If we had to rebuild our search engine, it would probably take a month or more to process all the headers; a couple of months wouldn't be beyond reason. That's with a real database, server-quality hardware and multiple feeders. The amount of data you're trying to chunk through is hundreds of gigs. You can look in the Import folder in the data folder and see how much data there really is. If the files end in ".gz" then the actual data being processed is about 10 times greater than the disk space being used.

What kind of PC is this? What version of Newsbin is this?
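To put a number on the Import folder, something like the following rough sketch would do. It walks the folder, adds up the sizes on disk, and counts each ".gz" file as 10 times its size, per the rule of thumb above. The 10x factor is only that rule of thumb, not a measured ratio, and `estimate_import_bytes` and the example path are made up for illustration.

```python
from pathlib import Path

# Rule of thumb from the post above; real compression ratios will vary.
GZ_EXPANSION = 10


def estimate_import_bytes(import_dir):
    """Return (bytes on disk, estimated bytes to process) for a folder of chunk files."""
    on_disk = 0
    expanded = 0
    for path in Path(import_dir).rglob("*"):
        if not path.is_file():
            continue
        size = path.stat().st_size
        on_disk += size
        # Compressed chunks are counted at the assumed expansion factor.
        expanded += size * GZ_EXPANSION if path.suffix.lower() == ".gz" else size
    return on_disk, expanded


# Example call (hypothetical path, adjust to wherever your data folder lives):
# on_disk, expanded = estimate_import_bytes(r"D:\Newsbin\Import")
# print(f"{on_disk / 1e9:.1f} GB on disk, roughly {expanded / 1e9:.1f} GB to process")
```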
Quade
Eternal n00b
 
Posts: 43724
Joined: Sat May 19, 2001 12:41 am
Location: Virginia, US

Registered Newsbin User since: 10/24/97

Re: Header Database Build Really Slow

Postby scapino » Fri Jul 26, 2019 5:49 pm

Windows 10 with I7-4790 and 32 GB RAM.

And yes, there is 500+ GB of chunk data in the Import folder. After the first attempt, which caused me to run out of drive space, I deleted everything in that folder, cleared additional space, upgraded to the newest beta, and kicked off the header download again. When that was still bringing back about 4 billion headers, I went looking through the settings and changed the Download age from 2000 days to 700. I stopped the download and started it again, but it came back with the same number of headers. I probably should have reset the application and/or rebooted first. I let it run, and it took a couple of days just for the downloads.
Now at 21185 files in the Input.

Re: Header Database Build Really Slow

Postby scapino » Fri Jul 26, 2019 5:55 pm

Even though I am on a newer build, what if I went out to my Carbonite backup, pulled in all of the old .DB3 files, put them in the correct folder, and then restarted the application? Will it just process them?

Re: Header Database Build Really Slow

Postby Quade » Fri Jul 26, 2019 6:36 pm

If you delete everything from the "Import" folder in the data folder, then restore all the old DB3 files into the data folder (in their correct locations, I mean), everything should work.

It does depend a bit on how old the old one was. If it's from the last couple years everything should work.

Re: Header Database Build Really Slow

Postby scapino » Fri Jul 26, 2019 8:07 pm

Thanks Quade. I'll try that and see if it has us all breathing easy here on Mars.

Re: Header Database Build Really Slow

Postby nrbovee » Sun Feb 02, 2020 3:29 pm

Quade, I'm in a similar predicament due to a HD failure. I ended up buying a new machine, with an AMD 3900X 12 core processor and 32 gigs of RAM. Due to the total size of the chunks in the import folder I had to put a 4 gig internal HDD in it. Is HD speed the determining factor in the rate at which the files get processed? I've been using Task Monitor and both RAM usage and CPU usage are quite modest, with the HD being pegged at darned near 100%. I didn't back up the spool folder on the old machine and doubt I can get it back. Would I benefit from buying a couple of 2 TB SSDs and putting them in RAID? Right now I have 17,000 files in the import folder.
nrbovee
n00b
 
Posts: 4
Joined: Mon Sep 02, 2013 11:19 pm

Re: Header Database Build Really Slow

Postby Quade » Sun Feb 02, 2020 7:20 pm

The question you might ask is whether you really need to keep 4000 days' worth of data for all of your groups. If I was starting from scratch, I'd set the "download age" to maybe 30 days, download headers for my groups to get caught up, then pick some groups I want more from and "download all" from these groups. Some groups I'd just avoid for header downloads, like Boneless and Amazing and some of the other dump groups.

One of the main things causing slow imports these days is that people are posting with no formatting, so it's impossible to combine posts into files. When that happens, instead of one entry in the database, there might be 50,000. In cases like this it's impossible to get good performance.
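To illustrate the point (a toy sketch only, not how Newsbin actually combines posts), assume formatted subjects end in a part counter like "(3/50)". Stripping that counter gives every segment of a file the same key, so 50 posts collapse into one entry. Posts with random-looking subjects share nothing to group on, so each one becomes its own entry. The regex, the sample subjects, and `group_headers` are all made up for the example.

```python
import re
from collections import defaultdict

# Matches a trailing part counter such as "(3/50)"; real subject formats vary a lot.
PART_RE = re.compile(r"\((\d+)/(\d+)\)\s*$")


def group_headers(subjects):
    """Group post subjects into file entries by stripping the part counter."""
    groups = defaultdict(list)
    for subject in subjects:
        match = PART_RE.search(subject)
        # With a counter, all segments share one key; without it, the whole
        # subject is the key, so unrelated-looking posts never combine.
        key = subject[:match.start()].rstrip() if match else subject
        groups[key].append(subject)
    return groups


formatted = [f'"movie.part01.rar" yEnc ({i}/50)' for i in range(1, 51)]
unformatted = [f"a9f3c{i:05d}" for i in range(1, 51)]

print(len(group_headers(formatted)))    # 1 entry for 50 posts
print(len(group_headers(unformatted)))  # 50 entries for 50 posts
```

Scale the second case up to a large unformatted posting and the entry count is what hurts the import.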

Re: Header Database Build Really Slow

Postby nrbovee » Sat Feb 15, 2020 1:39 pm

Thanks for the tip. What I found worked was to set the download age to 30, delete the groups, then add them back again. I kept some of the downloaded headers (some of which are .txt files rather than .gz) and I'm adding them back in since I left max download age alone. I had 10 years of headers built up over time on my machine that died. It was not a particularly fast machine, but it did the trick. Now I need a backup strategy for the .db3 files.
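One possible starting point for that, as a sketch rather than a recommendation: copy every .db3 file under the data folder to a backup location while keeping the same relative layout, so a later restore can put each file back in its correct location. The `backup_db3` helper and both example paths are made up. The backup folder should sit outside the data folder, and it is probably safest to run this with Newsbin closed, since copying a database file while it is being written can give an inconsistent copy.

```python
import shutil
from pathlib import Path


def backup_db3(data_dir, backup_dir):
    """Copy all .db3 files under data_dir into backup_dir, keeping relative paths."""
    data_dir = Path(data_dir)
    backup_dir = Path(backup_dir)
    copied = []
    for src in data_dir.rglob("*"):
        if src.is_file() and src.suffix.lower() == ".db3":
            dest = backup_dir / src.relative_to(data_dir)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)  # copy2 also preserves timestamps
            copied.append(dest)
    return copied


# Example call (hypothetical paths):
# backup_db3(r"D:\Newsbin\Data", r"E:\Backups\Newsbin")
```

Restoring is the same copy in the other direction, after clearing the Import folder as described earlier in the thread.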
Re: Header Database Build Really Slow

Postby Quade » Sat Feb 15, 2020 2:52 pm

You can right click "Post Storage/Use Download Age" to get the same effect without deleting the groups and re-adding. You can also delete all the existing data from that same menu.

