Filter Incompletes

This is the place to help test and discuss Version 6 Beta releases.

Postby mkrubsack » Sat Jul 03, 2021 11:44 am

I'm using Beta 12 and can't figure out how to filter incompletes so they don't show up. I searched the forum, and it seems old versions of Newsbin had a switch to hide incompletes. Is there a filter that does the same in the current beta versions?
mkrubsack
Active Participant
Posts: 58
Joined: Sat Aug 25, 2012 5:39 pm
Location: California

Registered Newsbin User since: 08/25/12

Re: Filter Incompletes

Postby Quade » Sun Jul 04, 2021 2:13 am

I don't believe there is an explicit incomplete filter. You can often get the same effect by setting the minimum size filter to a reasonable size, particularly if you're trying to filter out the spammy posts.
Quade
Eternal n00b
Posts: 44365
Joined: Sat May 19, 2001 12:41 am
Location: Virginia, US

Registered Newsbin User since: 10/24/97

Re: Filter Incompletes

Postby fred95901 » Sun Aug 01, 2021 2:04 pm

Quade, are you going to put the "hide incompletes" button back in the program?
fred95901
n00b
Posts: 9
Joined: Wed Jun 16, 2021 7:27 am

Registered Newsbin User since: 01/11/12

Re: Filter Incompletes

Postby Quade » Sun Aug 01, 2021 3:25 pm

The control is already in RC1. I just need to hook it up.

Size filters will remove most of the spam incompletes too.

Re: Filter Incompletes

Postby stavros » Sun Aug 01, 2021 11:26 pm

Hi Quade,

is there a way to filter out posts, on an import filter, that are between 700kb and 760kb,
but keep all smaller and larger posts?

I tried this some time ago and couldn't get it to work - I kept filtering out everything
smaller than the upper value - not sure what I was doing wrong.

regards
Stavros.
stavros
Active Participant
Posts: 95
Joined: Wed Dec 24, 2003 2:07 am

Registered Newsbin User since: 12/24/03

Re: Filter Incompletes

Postby Quade » Mon Aug 02, 2021 1:20 am

I don't think you can. What smaller files do you want? NFO files bypass the size filters by default.

Re: Filter Incompletes

Postby stavros » Mon Aug 02, 2021 1:46 am

I am not happy throwing away data unless I absolutely have to, so my initial thought is to keep everything
in case I need it later. It is with a heavy heart that I consider throwing away data between 700kb and 760kb,
but it seems like the lesser of two evils :-)

I'd rather not throw away data from 0kb to 760kb as that would include smaller text files and ebooks, as well as rar'd/zipped files.

regards
Stavros

Re: Filter Incompletes

Postby Quade » Mon Aug 02, 2021 9:24 am

Nothing actually gets removed as long as you don't add this as a header download filter. You can always disable the filters with a click and see everything once more. The display filters just change how the data looks. They don't delete anything.

Re: Filter Incompletes

Postby stavros » Mon Aug 02, 2021 2:06 pm

But I do want to remove the headers that fall between 700kb and 760kb. I would like to do this as an import (header d/l) filter
so that all these useless headers do not get loaded into the DB.
That's why I want to keep the filtering of data to the absolute minimum necessary to achieve the goal.

Once I've done that, then I'll try and figure out some way to tidy the ones already loaded out of the DB with some SQL.

Re: Filter Incompletes

Postby JayPea » Tue Aug 03, 2021 3:07 am

stavros wrote: But I do want to remove the headers that fall between 700kb and 760kb. I would like to do this as an import (header d/l) filter


This also interests me greatly, but I'd be looking at a much smaller size range. Please could a control like this be considered for future builds? I appreciate it's a bit late in the day to start adding more features.

Cheers.
JayPea
Active Participant
Posts: 82
Joined: Tue Dec 12, 2006 2:48 pm

Registered Newsbin User since: 12/11/06

Re: Filter Incompletes

Postby fred95901 » Wed Aug 04, 2021 11:18 pm

I have installed 6.90RC1 and it has a switch that says "show incompletes" where it used to say "hide incompletes". Can that be changed back to HIDE instead of SHOW?

Thanks
Fred

Re: Filter Incompletes

Postby Quade » Thu Aug 05, 2021 10:02 am

stavros wrote: But I do want to remove the headers that fall between 700kb and 760kb. I would like to do this as an import (header d/l) filter


You understand that ALL files are made from numerous chunks that range from 250K to 1 meg or so. If you filter out posts at the header filter level from 700-760K in size, I expect you'll be tossing out good posts too. The chunk size used when people post good files is completely arbitrary.

You'd be better off deleting them from the header DB once they are a couple of header downloads old. Then you'll know that more posts probably aren't coming to group with them. Maybe "obscured and 760K" would work.
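
A rough sketch of the arithmetic behind this warning. It assumes a file is split into equal chunks plus one remainder and ignores encoding overhead; the 50 MB file and the 750 KB chunk size are hypothetical values chosen for illustration, and the function names are made up here.

```python
def part_sizes(file_size, chunk_size):
    """Sizes of the parts a file is split into: full chunks plus a remainder."""
    full, rest = divmod(file_size, chunk_size)
    return [chunk_size] * full + ([rest] if rest else [])


def surviving_parts(sizes, low, high):
    """Parts kept by a filter that rejects sizes in the closed range [low, high]."""
    return [s for s in sizes if not (low <= s <= high)]


KB = 1024
LOW, HIGH = 700 * KB, 760 * KB

# Hypothetical good post: a 50 MB file whose poster happened to pick 750 KB chunks.
sizes = part_sizes(50 * 1024 * KB, 750 * KB)
kept = surviving_parts(sizes, LOW, HIGH)
print(len(sizes), "parts posted,", len(kept), "kept")
```

With these numbers every full-size part falls inside the rejected band and only the small final part survives, so the post could never be assembled.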

Re: Filter Incompletes

Postby stavros » Fri Aug 06, 2021 8:45 am

OK, I didn't understand that the limitation at header filter level is different to the post import size filter, as presented by the Newsbin header display. Damn.

My main worry about letting all of these useless headers in to the DB, and then removing them,
is that it uses up quite a bit of space, it takes a while to load, then even longer/manual effort to delete them and leaves the DB in a fragmented state.

I suppose I'll need to force a vacuum of each of the DBs each time they're tidied, just to reclaim the space and hopefully
re-org the data into something more efficient for local searches.

I could check the subject to be obscured as well as within the size limit, I suppose, but I assume that would still cast a wide net as an import filter :-(

Is there a way to automatically apply a filter on a group to be applied immediately after an import completes, that would be able to delete
these useless headers based on size and kick off a vacuum to reclaim the space?
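
A minimal sketch of the "delete by size, then vacuum" idea, assuming the group databases are SQLite files, which the mention of SQL and vacuum suggests. The `headers` table and its `subject` and `size_bytes` columns are hypothetical and are not Newsbin's real schema; the demo builds its own throwaway database and never touches real data.

```python
import os
import sqlite3
import tempfile

KB = 1024


def purge_size_band(db_path, low, high):
    """Delete rows whose size is within [low, high], then VACUUM; return rows deleted."""
    conn = sqlite3.connect(db_path)
    try:
        cur = conn.execute(
            "DELETE FROM headers WHERE size_bytes BETWEEN ? AND ?", (low, high)
        )
        conn.commit()           # VACUUM cannot run inside an open transaction
        conn.execute("VACUUM")  # rebuilds the file and releases the freed pages
        return cur.rowcount
    finally:
        conn.close()


def demo():
    """Build a throwaway database with a hypothetical schema and purge it."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "demo.db3")
        conn = sqlite3.connect(db_path)
        conn.execute("CREATE TABLE headers (subject TEXT, size_bytes INTEGER)")
        rows = [(f"spam post {i:06d}", 730 * KB) for i in range(5000)]
        rows += [("small ebook", 300 * KB), ("large part", 900 * KB)]
        conn.executemany("INSERT INTO headers VALUES (?, ?)", rows)
        conn.commit()
        conn.close()

        before = os.path.getsize(db_path)
        deleted = purge_size_band(db_path, 700 * KB, 760 * KB)
        after = os.path.getsize(db_path)
        return deleted, before, after


deleted, before, after = demo()
print(f"deleted {deleted} rows, file size {before} -> {after} bytes")
```

The DELETE on its own leaves the file the same size; the space only comes back after the VACUUM.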

Re: Filter Incompletes

Postby Quade » Fri Aug 06, 2021 11:57 am

You can do it from the group right click menu. "Post Storage/Compact Group".

Every "file" on usenet bigger than 250K-1M is split up into multiple parts. Newsbin combines these posts together and generates a representation of a file, then it combines the files together to generate a representation of a set of files.

So, anything you do to filter headers before they hit Newsbin has to keep that in mind.

I just ignore the incompletes. I size filter when I load the group and I limit my groups to only load the last 30 days worth of headers unless I want to look farther back in time. If you check more often, even 30 days might be too long. The shorter the time, the less Newsbin loads.

Re: Filter Incompletes

Postby stavros » Fri Aug 06, 2021 12:07 pm

I normally only load the last 7 days of headers - but even so, some groups can have 100,000 plus headers loaded, most of them useless.

If I could have a filter that limited the displayed headers to those between 700kb and 760kb, then I could select them all and delete them from the group,
then use the group right click menu option to compact the group.

Even with that, with over 400 groups subscribed, that's a lot of manual effort involved tidying things up....

Re: Filter Incompletes

Postby malhusky » Thu Sep 09, 2021 12:18 am

Quade, you mentioned "header download filters" and "display filters". How do we set up header download filters? I'm assuming that's different from the Filter Options button. Sure would be nice to eliminate spam posts BEFORE they get stored. I would think that would also speed up processing of the good posts.
malhusky
Occasional Contributor
Posts: 49
Joined: Sat Apr 28, 2012 1:01 am

Registered Newsbin User since: 12/13/10

Re: Filter Incompletes

Postby Quade » Thu Sep 09, 2021 9:45 am

Keep in mind that header filtering at the download level throws away data permanently. You'd have to re-download headers again over the impacted range to get the data back.

That said, I use header filtering.

1 - Generate a new filter profile with things you want to reject or accept. Mine contains:
"PRiVATE.*\[newzNZB\]"
"[.]exe[.]rar"
"[a-z0-9]{24}.*[a-z0-9]{24}"

These are entered as "Subject Reject" rules. You can filter specific posters out too.

"[a-z0-9]{24}" is a "Poster Reject" rule that I have in my header filter.

2 - In the options select "Open Data Folder" then exit Newsbin.

3 - The default configuration file is named "Newsbin.nbi". You can edit it with WordPad or another text editor. I use Visual Studio Code, which is a free, full-featured editor from MS.

[ALT.BINARIES.BLA]
Active=1
DisplayAge=0
DownloadFilter=HeaderFilter
Parent=Stuff


Each group has an entry like this one. Adding a DownloadFilter line like the one above (bolded in the original post), set to the name of the new filter you generated, applies that filter to the group.

[STUFF]
DownloadFilter=HeaderFilter


Assuming you've moved your groups into different topics, you can add the filter to the parent of the groups and it'll apply to all groups in the topic.
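
A minimal sketch of step 3 done by script rather than by hand, assuming the NBI file is plain INI-style text as the fragments above suggest. The function name `set_download_filter` and the case-insensitive section matching are assumptions for illustration, not Newsbin behavior. It works on a string, so it can be tried on a pasted fragment before touching the real file.

```python
def set_download_filter(nbi_text, section, filter_name):
    """Return nbi_text with DownloadFilter=<filter_name> set in [section].

    If the section is not found, the text comes back unchanged.
    """
    header = f"[{section}]".lower()
    new_line = f"DownloadFilter={filter_name}"
    out, in_section, done = [], False, False
    for line in nbi_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Leaving the target section without having seen the key: add it.
            if in_section and not done:
                out.append(new_line)
                done = True
            in_section = stripped.lower() == header
        elif in_section and stripped.lower().startswith("downloadfilter="):
            line = new_line  # replace an existing entry instead of duplicating it
            done = True
        out.append(line)
    if in_section and not done:
        out.append(new_line)  # target section was the last one in the file
    return "\n".join(out) + "\n"


# Fragment modelled on the entries quoted above.
sample = "[ALT.BINARIES.BLA]\nActive=1\nDisplayAge=0\nParent=Stuff\n[STUFF]\n"
updated = set_download_filter(sample, "ALT.BINARIES.BLA", "HeaderFilter")
print(updated)
```

Passing "STUFF" as the section instead would set the filter on the topic. As in step 2, exit Newsbin before writing any change back to the real file.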

Testing

1 - While working with this, you might want to add this to settings

[Settings]
SaveGZ = 1


Instead of deleting the header downloads, it'll save them in the "Processed" folder in the data folder. New header downloads go into "Import" and finished header downloads go into "Processed" so while testing this, you can save the header downloads, play with the filters. Moving the files from "Processed" back to the "Import" will re-feed the headers.

2 - If you have a specific large group you don't want to mess up, you can go into the "spool_v6" folder in the data folder and make a copy of any group or copy them to a different folder. Then you can wipe and feed headers over and over to a specific group while testing the filters and simply copy the saved group back when you're done.

3 - Make a backup of the NBI before you edit it so you can always copy the backup over the one you've changed.
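
The backup and re-feed steps above can be sketched as two small helpers. The folder names "Processed" and "Import" and the file name "Newsbin.nbi" come from the post; the function names, the ".bak" suffix, and the dummy file name in the demo are assumptions. The demo runs in a temporary folder, not the real data folder.

```python
import shutil
import tempfile
from pathlib import Path


def backup_nbi(nbi_path):
    """Copy the NBI file next to itself with a .bak suffix; return the copy's path."""
    nbi_path = Path(nbi_path)
    backup = nbi_path.with_name(nbi_path.name + ".bak")
    shutil.copy2(nbi_path, backup)
    return backup


def refeed_headers(data_folder):
    """Move saved header downloads from Processed back to Import; return the count."""
    data_folder = Path(data_folder)
    processed = data_folder / "Processed"
    incoming = data_folder / "Import"
    moved = 0
    for item in sorted(processed.iterdir()):
        if item.is_file():
            shutil.move(str(item), str(incoming / item.name))
            moved += 1
    return moved


def demo():
    """Exercise both helpers in a throwaway folder, never the real data folder."""
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp)
        (data / "Processed").mkdir()
        (data / "Import").mkdir()
        (data / "Newsbin.nbi").write_text("[Settings]\nSaveGZ = 1\n")
        (data / "Processed" / "headers-0001.gz").write_bytes(b"dummy")
        backup = backup_nbi(data / "Newsbin.nbi")
        moved = refeed_headers(data)
        imported = sorted(p.name for p in (data / "Import").iterdir())
        return backup.name, moved, imported


print(demo())
```

To use it for real, point both helpers at the folder opened by "Open Data Folder", with Newsbin closed.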

Re: Filter Incompletes

Postby malhusky » Fri Sep 10, 2021 9:35 pm

I agree it could cause issues if the filter doesn't work as you expect it to. But as you show you can always recover. Thanks for the info. This should help a lot and I think it's becoming more important as the spam becomes more prevalent.

If it wasn't for that potential downside risk I would think adding a DELETE option to the standard filters would be relatively easy. Might need a pop-up warning before actually executing a filter that contains that option.
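
One way to lower that risk is to try the reject patterns on sample subjects before they go into a download filter. The sketch below uses the patterns Quade quoted earlier with Python's `re` module. How Newsbin itself applies them (case sensitivity, search versus full match) is an assumption, and the sample subjects and poster address are invented for illustration.

```python
import re

# Patterns copied from Quade's post.
SUBJECT_REJECT = [
    r"PRiVATE.*\[newzNZB\]",
    r"[.]exe[.]rar",
    r"[a-z0-9]{24}.*[a-z0-9]{24}",
]
POSTER_REJECT = [r"[a-z0-9]{24}"]


def is_rejected(text, patterns):
    """Return True if any pattern matches anywhere in text."""
    return any(re.search(p, text) for p in patterns)


# Hypothetical subjects: three that should be rejected, one that should pass.
samples = [
    "[PRiVATE] some.release [newzNZB] (1/20)",
    "setup.exe.rar yEnc (1/3)",
    "abcdefghijklmnopqrstuvwx - 0123456789abcdefghijklmn yEnc (1/9)",
    "Holiday photos 2021 - img001.jpg (1/1)",
]
for subject in samples:
    print(is_rejected(subject, SUBJECT_REJECT), subject)

# Hypothetical poster made of 24 random-looking characters.
print(is_rejected("abcdefghijklmnopqrstuvwx@example.com", POSTER_REJECT))
```

Running a saved list of real subjects through the same check shows what a pattern would throw away before anything is discarded for good.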

