How to filter out filenames containing "index" etc

Tips on writing regular expressions for searching the post list

Post by andy » Sat Jun 14, 2008 3:10 am

I set filename reject filters thus
index
idx
ndx
!index
... and several other variations ...

and I still get hundreds of filenames containing those words.
I read the FAQs and went to the abysmally unhelpful regularexpressions.crap without a clue. How about a simple answer to what should be a fair description of my simple problem? Thanks.


ps
I can't find a button to add my avatar (maybe automatic?)
andy
Seasoned User
Posts: 141
Joined: Fri Apr 26, 2002 2:26 am
Location: Houston, Texas (TX) USA

Registered Newsbin User since: 04/04/03

Post by itimpi » Sat Jun 14, 2008 5:09 am

You did not make it clear whether these files were actually being downloaded or merely displayed. You DO realise that filename reject filters only take effect AFTER the download starts, as that is the only point at which the filename is known for certain? If you want filters to apply before the download, then you need to use Subject filters.
itimpi
Elite NewsBin User
Posts: 12604
Joined: Sat Mar 16, 2002 7:11 am
Location: UK

Registered Newsbin User since: 03/28/03

Files downloaded

Post by andy » Tue Jul 01, 2008 1:53 am

Pardon my inexactitude.

I set the filters to filter out filenames with "index" or parts thereof. Hundreds of unwanted files are actually downloaded. I'm not trying to keep them off the list of available files; I'm trying to NOT download them.

There isn't enough time in a day to individually reject what amounts to thousands of "index" files in the 50 or so image groups I want to monitor.

I don't want ANY of the damned "index" files to download.

thanks.

Post by itimpi » Tue Jul 01, 2008 4:37 am

I am not sure what to suggest :(

I just did a test where I set up a reject filter for "index" as both a subject and a filename reject filter. I then assigned the filter as a profile to my selected group. No files with "index" in the name were downloaded, and quite a few were rejected, so I know the filter was working.

subject AND filename filter together is very bad

Post by andy » Wed Jul 02, 2008 12:51 am

In image groups, the subject line often contains the phrase "see index", so filtering using the subject line is a very bad approach. That eliminates many desirable files.

I reinstalled the program, and on today's download of 33,000 images I got none of the index files, so I suppose the reinstall did the trick.

Problem solved. Thanks for the help.
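
A rough Python illustration of the "see index" point above. The subject lines and filenames are invented for the example, and Newsbin's own filter matching may differ in detail; this only shows why a bare "index" pattern behaves differently on subjects than on filenames.

```python
import re

# Hypothetical (subject, filename) pairs, made up for illustration.
posts = [
    ("Beach set - see index - [03/40] beach_003.jpg", "beach_003.jpg"),
    ("Beach set - [00/40] beach_index.jpg", "beach_index.jpg"),
]

# A plain "index" reject pattern, case-insensitive.
reject = re.compile(r"index", re.IGNORECASE)

def rejects(text):
    """Return True if the reject pattern matches anywhere in text."""
    return reject.search(text) is not None

for subject, filename in posts:
    print(f"{filename}: subject hit={rejects(subject)}, filename hit={rejects(filename)}")
```

The first post is a wanted image whose subject merely mentions "see index", so only the subject test hits it; the second is an actual index file, and both tests hit it.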

Post by bobkoure » Thu Jul 10, 2008 10:00 am

You could use
(?<!see.{1,4})index
to match "index" but not "see index"
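
If you want to try that lookbehind outside Newsbin, here is a sketch in Python. One caveat: regex engines differ on whether a lookbehind may have variable length like .{1,4}, and Python's standard re module has required fixed-width lookbehinds, so the sketch chains four fixed-width ones instead (one per allowed gap of 1 to 4 characters). The sample strings and the helper name is_rejected are made up for the example.

```python
import re

# Try the pattern exactly as posted; Python's re has rejected
# variable-width lookbehinds, so record the outcome rather than assume it.
try:
    re.compile(r"(?<!see.{1,4})index")
    variable_width_ok = True
except re.error:
    variable_width_ok = False

# Fixed-width equivalent: "index" not preceded by "see" plus 1 to 4 characters.
NOT_SEE_INDEX = re.compile(
    r"(?<!see.)(?<!see..)(?<!see...)(?<!see....)index",
    re.IGNORECASE,
)

def is_rejected(text):
    """Return True if text contains an "index" that is not a "see ... index"."""
    return NOT_SEE_INDEX.search(text) is not None

samples = [
    "beach_index.jpg",
    "index.html",
    "Beach set - see index - beach_003.jpg",
    "please see my index",
]

print("variable-width lookbehind accepted:", variable_width_ok)
for text in samples:
    print(f"{text!r}: rejected={is_rejected(text)}")
```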

Or you could use
\.index
to match ".index"
Note that, without a backslash, "." means "any character", so if you were using
.index
that was basically matching any line with "any char" followed by "index". So, yeah, lines with "see index" were being matched (the space before "index" counts as the "any char").
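
A quick Python check of the escaped versus unescaped dot, using made-up sample strings (this shows general regex behaviour, not Newsbin specifically):

```python
import re

unescaped = re.compile(r".index")   # any one character, then "index"
escaped = re.compile(r"\.index")    # a literal dot, then "index"

def hits(pattern, text):
    """Return True if pattern matches anywhere in text."""
    return pattern.search(text) is not None

samples = ["see index", "photos.index", "index"]

for text in samples:
    print(f"{text!r}: .index={hits(unescaped, text)}, \\.index={hits(escaped, text)}")
```

The unescaped pattern matches "see index" because the space satisfies the dot, while the escaped one only matches where a real "." comes right before "index". Neither matches a bare "index" with nothing in front of it.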

There are a lot of resources on the web. If you don't like one, try another. Different teaching approaches work better for different folks.
To be fair, the (?< stuff is advanced, but \. is not. It's just a matter of knowing which characters are "special" and need to be preceded with a \

Post by andy » Thu Jul 10, 2008 10:46 am

I'll try your suggestion. Thanks!

