Need Help - Regular Expressions in Watch Topics

Tips on writing regular expressions for searching the post list

Moderators: Quade, dexter

Postby BZee » Tue Jun 25, 2013 7:46 am

I'm working on my watch topics and have very little experience with regular expressions. If I'm looking for a person named Sam Cook and use that name as a watch topic, it finds "Sam Cook" and "Cook, Sam" (good) but also finds names like "Sam Lacook" or "Cooke, Sam" (unwanted).

(Sam*.Cook) works properly except Sam and Cook have to be in that order, so I have to use (Sam*.Cook)|(Cook*.Sam). Can someone experienced in regular expressions explain the functions of "()", "*", "." and especially "*." (I understand | = or)? I've looked in several books and reference guides on regular expressions but really can't get it into my head how (Sam*.Cook) functions. (I guess I need a "grade school level" explanation.) :?


Edit:
Why does Rescan Topic with "Crange" or "(Crange)" give one match but the same Rescan Topic with "(Sam*.Cook)|(Cook*.Sam)|(Crange)" shows no matches???
BZee
Seasoned User
Posts: 459
Joined: Thu Sep 27, 2001 9:10 pm
Location: California

Registered Newsbin User since: 04/13/03

Re: Need Help - Regular Expressions in Watch Topics

Postby Quade » Tue Jun 25, 2013 10:42 am

1 - Just use "Sam Cook" and it'll match any combination of those two words. The RE's for watch lists and RE's in general in 6.50 can be more simplified than in older versions. I just use words "this that 720p s03" for instance.

2 - When you rescan, you're adding files to those you already found, so your counts might not go up.

3 - If you rescan with it set to "internet", it doesn't use RE's at all, so "sam cook -german" would be valid for a search-based re-scan. "-german" means remove results that contain the word "german".

4 - Don't forget the day range on the re-scan.
Quade
Eternal n00b
Posts: 44867
Joined: Sat May 19, 2001 12:41 am
Location: Virginia, US

Registered Newsbin User since: 10/24/97

Re: Need Help - Regular Expressions in Watch Topics

Postby BZee » Wed Jun 26, 2013 8:10 am

BZee wrote:Why does Rescan Topic with "Crange" or "(Crange)" give one match but the same Rescan Topic with "(Sam*.Cook)|(Cook*.Sam)|(Crange)" shows no matches???

Could not duplicate - may have had an extraneous space somewhere in the expression.


Scanning headers and trying to match "Sam Cook", "Cook, Sam" or "Sam.Cook".
Quade wrote:1 - Just use "Sam Cook" and it'll match any combination of those two words. The RE's for watch lists and RE's in general in 6.50 can be more simplified than in older versions. I just use words "this that 720p s03" for instance.

Sam Cook also matches names like "Samson Lacook" or "Cookie Sams" giving many unneeded downloads. (Sam Cook) is better but still matches names like "Samson Lacook". I found an example regular expression similar to (Sam*.Cook). I don't know what the *. does but it seems to work if I use (Sam*.Cook)|(Cook*.Sam)

Re: Need Help - Regular Expressions in Watch Topics

Postby Quade » Wed Jun 26, 2013 9:41 am

1 - .* means "anything in between". So "Sam.*cook" will still match "samson cookies".

"sam\s cook\s" requires a whitespace character (such as a space) after each word. That might work better for you. "sam[.] cook[.]" would require a "." after each word (which seems to be a common naming convention).

2 - Spaces are important now. They trigger AND operations.

(Sam*.Cook)|(Cook*.Sam)


3 - There's nothing wrong with this RE, but it'll match as much as just plain "sam cook". You probably need to make it more exclusive by requiring spaces or "."'s after each word, depending on what you're searching for.
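
For anyone following along, here is a minimal Python sketch of points 1 and 2. It uses Python's `re` module, which is an assumption: Newsbin's own regex engine may differ in details. The `all_terms_match` helper is hypothetical and only emulates the "space means AND" behaviour described above by splitting the query on spaces and requiring every piece to match.

```python
import re

def all_terms_match(query: str, subject: str) -> bool:
    """Hypothetical emulation of 'space = AND': every space-separated
    piece of the query must match somewhere in the subject."""
    return all(re.search(term, subject, re.IGNORECASE) for term in query.split())

# ".*" means "anything in between", so partial words still slip through.
loose = re.search(r"sam.*cook", "samson cookies", re.IGNORECASE) is not None

# Requiring whitespace after each word rejects those partial words.
strict_bad = all_terms_match(r"sam\s cook\s", "samson cookies")
strict_good = all_terms_match(r"sam\s cook\s", "sam cook 720p")

print(loose, strict_bad, strict_good)
```

Note that `cook\s` needs a whitespace character after the word, so in this sketch a subject that ends in "cook" with nothing after it would not match.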

Re: Need Help - Regular Expressions in Watch Topics

Postby BZee » Thu Jun 27, 2013 5:03 am

Thanks for your help. I'll do some experimenting.

Re: Need Help - Regular Expressions in Watch Topics

Postby TheWanderer » Sat Jul 20, 2013 3:20 pm

BZee wrote:Can someone experienced in regular expressions explain the functions of "()", "*", "." and especially "*."

. = any single character (it has to consume a character, so it cannot match the empty beginning or end of the string)
* = 0 or more of the preceding character or group
+ = 1 or more of the preceding character or group

.* = 0 or more of any character
.+ = 1 or more of any character

BZee wrote:Sam Cook as a watch topic finds "Sam Cook" and "Cook, Sam" (good) but also finds names like Sam Lacook or Cooke, Sam (unwanted).

^(?=.*(?<![a-z])sam(?![a-z]))(?=.*(?<![a-z])cook(?![a-z]))

This expression will match any line that contains the words "sam" and "cook" in any position, each preceded and followed by either a non-letter character or nothing at all (beginning or end of line).
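
A rough way to check the expression outside Newsbin is a short Python script. Python's `re` module and the `re.IGNORECASE` flag are assumptions here: the expression uses lowercase `[a-z]`, so it only behaves as described when matching is case-insensitive. The helper name `is_wanted` is made up for illustration, and the sample subjects are the names mentioned earlier in this thread.

```python
import re

# Both lookaheads start at the beginning of the line, so word order does not matter.
PATTERN = re.compile(
    r"^(?=.*(?<![a-z])sam(?![a-z]))(?=.*(?<![a-z])cook(?![a-z]))",
    re.IGNORECASE,
)

def is_wanted(subject: str) -> bool:
    """Return True when both whole words appear somewhere in the subject."""
    return PATTERN.search(subject) is not None

wanted = ["Sam Cook", "Cook, Sam", "Sam.Cook"]
unwanted = ["Sam Lacook", "Cooke, Sam", "Samson Lacook", "Cookie Sams"]

for subject in wanted + unwanted:
    print(f"{subject!r}: {is_wanted(subject)}")
```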
Last edited by TheWanderer on Mon Jul 22, 2013 1:33 am, edited 1 time in total.
TheWanderer
n00b
Posts: 8
Joined: Sat Jul 20, 2013 12:53 am

Registered Newsbin User since: 04/21/13

Re: Need Help - Regular Expressions in Watch Topics

Postby BZee » Sun Jul 21, 2013 5:37 am

TheWanderer wrote:(?=.*(?<![a-z])sam(?![a-z]))(?=.*(?<![a-z])cook(?![a-z]))

This expression will match any line that has the words (in any position) "sam" and "cook" preceded or followed by any character except for a letter, or preceded or followed by nothing at all (beginning or end of line)


Thanks. As I have time I'll study your expression.

Re: Need Help - Regular Expressions in Watch Topics

Postby TheWanderer » Sun Jul 21, 2013 5:33 pm

This might help anyone reading this break it down more easily.

^(?=.*(?<![a-z])sam(?![a-z]))(?=.*(?<![a-z])cook(?![a-z]))

The only combinations it will not find are samcook and cooksam. These two could be added in a few different ways but it would make the expression very specific to the words sam and cook.

Edit:
I am changing the expression by adding a caret ^ at the beginning.
It worked perfectly as a double positive lookahead, but turned nasty when changed into a mixed positive and negative lookahead without the beginning-of-string ^.

Re: Need Help - Regular Expressions in Watch Topics

Postby TheWanderer » Wed Jul 24, 2013 1:57 am

^(?=.*?(?<![a-z])sam(?![a-z]))(?=.*?(?<![a-z])cook(?![a-z]))

This should be my last modification to this expression. I used the greedy .* when I should have used the lazy .*?
Filtering a group with 350 million+ headers took about 21 seconds, and the change dropped it to 10 seconds.
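
To illustrate the change, the sketch below checks that the greedy and lazy versions accept exactly the same subjects and then times both. It uses Python's `re` module, which is an assumption since Newsbin's engine may behave differently. The sample subjects and the repeat count are made up for illustration, and timings will vary by engine and data, so no particular speedup is implied.

```python
import re
import timeit

GREEDY = re.compile(
    r"^(?=.*(?<![a-z])sam(?![a-z]))(?=.*(?<![a-z])cook(?![a-z]))", re.IGNORECASE
)
LAZY = re.compile(
    r"^(?=.*?(?<![a-z])sam(?![a-z]))(?=.*?(?<![a-z])cook(?![a-z]))", re.IGNORECASE
)

# Made-up subjects; the padding imitates long header lines.
subjects = [
    "Sam Cook - holiday photos " + "x" * 200,
    "Cook, Sam [01/10] " + "x" * 200,
    "Samson Lacook " + "x" * 200,
]

# Greedy and lazy only change how the engine searches, not what it accepts.
same_results = all(
    bool(GREEDY.search(s)) == bool(LAZY.search(s)) for s in subjects
)
print("same results:", same_results)

for name, pattern in (("greedy", GREEDY), ("lazy", LAZY)):
    seconds = timeit.timeit(lambda: [pattern.search(s) for s in subjects], number=2000)
    print(f"{name}: {seconds:.4f}s")
```

Greedy `.*` first runs to the end of the line and then backtracks, while lazy `.*?` expands one character at a time, which tends to help when the words appear near the start of a long line.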

Re: Need Help - Regular Expressions in Watch Topics

Postby BZee » Wed Jul 24, 2013 6:21 am

Thanks. I appreciate your help. I use regular expressions so seldom I don't want to spend a lot of time on them. I have a number of names to check for so I'll probably keep using (Sam*.Cook)|(Cook*.Sam) unless I start getting numerous incorrect matches. I don't plan on using auto download anyway. Again, thanks for your help.

Re: Need Help - Regular Expressions in Watch Topics

Postby TheWanderer » Wed Jul 24, 2013 6:55 am

BZee wrote:... so I'll probably keep using (Sam*.Cook)|(Cook*.Sam) ...

I don't know if you are typing it wrong only here or in your filter as well, but it is not *. it is .*

Sam*.Cook means
sa + m 0 or more times + any character + cook

Sam.*Cook means
sam + any character 0 or more times + cook

you want to use
sam.*cook|cook.*sam or
sam.*?cook|cook.*?sam or
(sam.*cook|cook.*sam) or
(sam.*?cook|cook.*?sam)

or, as Quade pointed out, since he has made the space into a special character meaning "and", all you have to do is

sam cook
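
The difference between the two forms is easy to see with a few test strings. This is a small Python sketch; the `re` module, the case-insensitive flag and the sample strings are assumptions for illustration, and Newsbin's engine may differ in details.

```python
import re

# "Sa", then "m" 0 or more times, then exactly ONE character, then "Cook"
typo = re.compile(r"Sam*.Cook", re.IGNORECASE)
# "Sam", then any character 0 or more times, then "Cook"
intended = re.compile(r"Sam.*Cook", re.IGNORECASE)

samples = ["Sam Cook", "Sam, Cook", "Sam - Cook", "SaXCook"]
for subject in samples:
    print(
        f"{subject!r}: "
        f"typo={bool(typo.search(subject))} "
        f"intended={bool(intended.search(subject))}"
    )
```

In this sketch `Sam*.Cook` only allows a single character between the two words, and it also accepts strings like "SaXCook" because the "m" is optional. `Sam.*Cook` allows anything in between, but it still matches inside longer words such as "Samson Lacook".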

