Need Help - Regular Expressions in Watch Topics

Posted: Tue Jun 25, 2013 7:46 am
by BZee
I'm working on my watch topics and have very little experience with regular expressions. If I'm looking for a person named Sam Cook and use that name as a watch topic, it finds "Sam Cook" and "Cook, Sam" (good) but also finds names like Sam Lacook or Cooke, Sam (unwanted).

(Sam*.Cook) works properly, except that Sam and Cook have to be in that order, so I have to use (Sam*.Cook)|(Cook*.Sam). Can someone experienced in regular expressions explain the functions of "()", "*", "." and especially "*." (I understand | = or)? I've looked in several books and reference guides on regular expressions but really can't get my head around how (Sam*.Cook) works (I guess I need a "grade school level" explanation). :?


Edit:
Why does Rescan Topic with "Crange" or "(Crange)" give one match but the same Rescan Topic with "(Sam*.Cook)|(Cook*.Sam)|(Crange)" shows no matches???

Re: Need Help - Regular Expressions in Watch Topics

Posted: Tue Jun 25, 2013 10:42 am
by Quade
1 - Just use "Sam Cook" and it'll match any combination of those two words. The REs for watch lists, and REs in general, can be simpler in 6.50 than in older versions. I just use words like "this that 720p s03", for instance.

2 - When you rescan, you're adding to the files you already found, so your counts might not go up.

3 - If you rescan with it set to "internet", it doesn't use REs at all, so "sam cook -german" would be valid for a search-based rescan. "-german" means remove results that contain the word "german".

4 - Don't forget the day range on the re-scan.
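A rough Python sketch of the "spaces mean AND" idea from items 1 and 3. It assumes (not confirmed for 6.50) that each space-separated term is searched for anywhere in the header, case-insensitively, and that a term starting with "-" rejects any header containing it; `keyword_match` is a hypothetical name. Plain words are treated as regular expressions, which for ordinary letters is the same as a substring search. Because each word only has to appear somewhere, "sam" is found inside "Samson" and "cook" inside "Lacook".

```python
import re


def keyword_match(query: str, header: str) -> bool:
    """Hypothetical model of a space-separated watch query.

    Every plain term must be found somewhere in the header (logical AND);
    a term written as "-word" rejects the header if "word" appears in it.
    Matching is case-insensitive.
    """
    for term in query.split():
        if term.startswith("-") and len(term) > 1:
            # Exclusion term: any occurrence rejects the header.
            if re.search(term[1:], header, re.IGNORECASE):
                return False
        elif not re.search(term, header, re.IGNORECASE):
            # A required term is missing, so the AND fails.
            return False
    return True


headers = ["Sam Cook - Live", "Cook, Sam", "Samson Lacook", "Sam Cook German edition"]
for header in headers:
    print(f"{header!r:26} -> {keyword_match('sam cook -german', header)}")
```

Under these assumptions the first three headers pass and the last is removed by "-german"; "Samson Lacook" passing is exactly the false positive described in this thread.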

Re: Need Help - Regular Expressions in Watch Topics

Posted: Wed Jun 26, 2013 8:10 am
by BZee
BZee wrote:Why does Rescan Topic with "Crange" or "(Crange)" give one match but the same Rescan Topic with "(Sam*.Cook)|(Cook*.Sam)|(Crange)" shows no matches???

Could not duplicate - may have had an extraneous space somewhere in the expression.


Scanning headers and trying to match "Sam Cook", "Cook, Sam", or "Sam.Cook".
Quade wrote:1 - Just use "Sam Cook" and it'll match any combination of those two words. The RE's for watch lists and RE's in general in 6.50 can be more simplified than in older versions. I just use words "this that 720p s03" for instance.

Sam Cook also matches names like "Samson Lacook" or "Cookie Sams", giving many unneeded downloads. (Sam Cook) is better but still matches names like "Samson Lacook". I found an example regular expression similar to (Sam*.Cook). I don't know what the *. does, but it seems to work if I use (Sam*.Cook)|(Cook*.Sam).

Re: Need Help - Regular Expressions in Watch Topics

Posted: Wed Jun 26, 2013 9:41 am
by Quade
1 - .* means "anything in between". So "Sam.*cook" will still match "samson cookies".

"sam\s cook\s" requires a whitespace character (such as a space) after each word. That might work better for you. "sam[.] cook[.]" would require a "." after each word (that seems to be a common naming convention).

2 - Spaces are important now. They trigger AND operations.

(Sam*.Cook)|(Cook*.Sam)


3 - There's nothing wrong with this RE, but it'll match as much as plain "sam cook" does. You probably need to make it more exclusive by requiring a space or a "." after each word, depending on what you're searching for.
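The difference between item 1's ".*" and the stricter forms can be checked in Python. As an assumption about how a space splits the expression, the sketch tests "sam\s" and "cook\s" (or "sam[.]" and "cook[.]") as two separate conditions that must both hold; `all_found`, `spaced` and `dotted` are hypothetical helper names. Note that "cook\s" fails when "Cook" is the very last word of a header, because nothing follows it.

```python
import re

# Item 1: ".*" allows anything in between, so "samson cookies" still matches.
loose = re.compile(r"sam.*cook", re.IGNORECASE)


def all_found(patterns, header: str) -> bool:
    """True if every pattern is found in the header (space treated as AND)."""
    return all(re.search(p, header, re.IGNORECASE) for p in patterns)


def spaced(header: str) -> bool:
    # Each word must be followed by a whitespace character.
    return all_found([r"sam\s", r"cook\s"], header)


def dotted(header: str) -> bool:
    # Each word must be followed by a literal "." ([.] is a one-character class).
    return all_found([r"sam[.]", r"cook[.]"], header)


for header in ["Sam Cook - Live", "samson cookies", "Sam.Cook.Live", "Sam Cook"]:
    print(f"{header!r:18} loose={bool(loose.search(header))} "
          f"spaced={spaced(header)} dotted={dotted(header)}")
```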

Re: Need Help - Regular Expressions in Watch Topics

Posted: Thu Jun 27, 2013 5:03 am
by BZee
Thanks for your help. I'll do some experimenting.

Re: Need Help - Regular Expressions in Watch Topics

Posted: Sat Jul 20, 2013 3:20 pm
by TheWanderer
BZee wrote:Can someone experienced in regular expressions explain the functions of "()", "*", "." and especially "*."

. = any single character (except a newline); it always uses up one character, so it cannot match the empty beginning or end of a string
* = 0 or more of the preceding character (or group)
+ = 1 or more of the preceding character (or group)

.* = 0 or more of any character
.+ = 1 or more of any character
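These definitions can be checked with Python's `re` module; other engines, including the newsreader's, may differ in details (for example whether "." matches a newline). `re.fullmatch` is used so that each pattern has to account for the whole test string.

```python
import re

# (pattern, text) pairs; fullmatch means the pattern must cover all of the text.
cases = [
    (r"sa*m", "sm"),      # * allows zero 'a'
    (r"sa*m", "saaam"),   # * allows many 'a'
    (r"sa+m", "sm"),      # + needs at least one 'a', so no match
    (r"s.m", "s m"),      # . is any single character, here a space
    (r"s.m", "sm"),       # . must use up one character, so no match
    (r"s.*m", "sm"),      # .* may be empty
    (r"s.+m", "sm"),      # .+ needs at least one character, so no match
    (r"s.+m", "s-xx-m"),  # .+ takes several characters
]

for pattern, text in cases:
    matched = re.fullmatch(pattern, text) is not None
    print(f"{pattern:6} vs {text!r:9} -> {matched}")
```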

BZee wrote:Sam Cook as a watch topic finds "Sam Cook" and "Cook, Sam" (good) but also finds names like Sam Lacook or Cooke, Sam (unwanted).

^(?=.*(?<![a-z])sam(?![a-z]))(?=.*(?<![a-z])cook(?![a-z]))

This expression matches any line that contains the words "sam" and "cook" (in any position and either order), where each word is preceded and followed either by a character that is not a letter or by nothing at all (the beginning or end of the line).
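A quick check of this expression in Python. Using `re.IGNORECASE` is an assumption about the newsreader: without it, "[a-z]" would not count capital letters as letters and "sam" would not match "Sam". The headers are made-up examples built from the names in this thread.

```python
import re

# TheWanderer's expression, unchanged; IGNORECASE is an assumption (see above).
WORDS = re.compile(
    r"^(?=.*(?<![a-z])sam(?![a-z]))(?=.*(?<![a-z])cook(?![a-z]))",
    re.IGNORECASE,
)

headers = [
    "Sam Cook",    # wanted
    "Cook, Sam",   # wanted, reversed order
    "Sam.Cook",    # wanted, dot separator
    "Sam Lacook",  # unwanted: "cook" preceded by a letter
    "Cooke, Sam",  # unwanted: "cook" followed by a letter
    "samcook",     # not found: neither word stands alone
]

for h in headers:
    print(f"{h!r:14} -> {bool(WORDS.search(h))}")
```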

Re: Need Help - Regular Expressions in Watch Topics

Posted: Sun Jul 21, 2013 5:37 am
by BZee
TheWanderer wrote:(?=.*(?<![a-z])sam(?![a-z]))(?=.*(?<![a-z])cook(?![a-z]))

This expression will match any line that has the words (in any position) "sam" and "cook" preceded or followed by any character except for a letter, or preceded or followed by nothing at all (beginning or end of line)


Thanks. As I have time I'll study your expression.

Re: Need Help - Regular Expressions in Watch Topics

Posted: Sun Jul 21, 2013 5:33 pm
by TheWanderer
This might help anyone reading this break it down more easily.

^(?=.*(?<![a-z])sam(?![a-z]))(?=.*(?<![a-z])cook(?![a-z]))

The only combinations it will not find are samcook and cooksam. These two could be added in a few different ways, but that would make the expression very specific to the words sam and cook.

Edit:
I am changing the expression by adding a caret ^ at the beginning.
It worked perfectly as a double positive lookahead, but turned nasty when changed into a mixed positive and negative lookahead without the beginning-of-string anchor ^.
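One likely reason the unanchored version "turned nasty": a search retries the lookaheads at every position in the line, not just at the start. The sketch below uses a hypothetical "sam but not cook" filter (not one posted in this thread); without ^, the negative lookahead is dodged by starting one character later, after the "C" of "Cook". Case-insensitive matching is assumed.

```python
import re

# Hypothetical filter: "sam" must appear, "cook" must not (whole words only).
body = r"(?=.*?(?<![a-z])sam(?![a-z]))(?!.*?(?<![a-z])cook(?![a-z]))"
unanchored = re.compile(body, re.IGNORECASE)
anchored = re.compile("^" + body, re.IGNORECASE)

header = "Cook, Sam"
# Without ^ the engine tries position 1 ("ook, Sam"), where no "cook" is left
# ahead, so the negative lookahead succeeds and the header wrongly matches.
print(bool(unanchored.search(header)))
# With ^ only position 0 is tried, where "Cook" is visible, so it is rejected.
print(bool(anchored.search(header)))
```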

Re: Need Help - Regular Expressions in Watch Topics

Posted: Wed Jul 24, 2013 1:57 am
by TheWanderer
^(?=.*?(?<![a-z])sam(?![a-z]))(?=.*?(?<![a-z])cook(?![a-z]))

This should be my last modification to this expression. I had used the greedy .* where I should have used the lazy .*?
Filtering a group with 350+ million headers took about 21 seconds; with this change it dropped to 10 seconds.
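A sketch of why the lazy form can be faster, using a made-up header with the names near the start followed by a long tail. With ".*" each lookahead first runs to the end of the line and then backtracks toward the names; ".*?" tries the shortest prefix first. Both give the same yes/no answer. The printed timings depend on the machine and are not meant to reproduce the 21 s / 10 s figures above.

```python
import re
import timeit

greedy = re.compile(
    r"^(?=.*(?<![a-z])sam(?![a-z]))(?=.*(?<![a-z])cook(?![a-z]))", re.IGNORECASE)
lazy = re.compile(
    r"^(?=.*?(?<![a-z])sam(?![a-z]))(?=.*?(?<![a-z])cook(?![a-z]))", re.IGNORECASE)

# Synthetic header: the names come first, then a long run of other text.
header = "Sam Cook " + "x" * 2000

print("same result:", bool(greedy.search(header)) == bool(lazy.search(header)))

for name, rx in (("greedy", greedy), ("lazy", lazy)):
    # rx=rx binds the current pattern into the timed callable.
    seconds = timeit.timeit(lambda rx=rx: rx.search(header), number=1000)
    print(f"{name}: {seconds:.4f} s")
```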

Re: Need Help - Regular Expressions in Watch Topics

Posted: Wed Jul 24, 2013 6:21 am
by BZee
Thanks. I appreciate your help. I use regular expressions so seldom that I don't want to spend a lot of time on them. I have a number of names to check for, so I'll probably keep using (Sam*.Cook)|(Cook*.Sam) unless I start getting numerous incorrect matches. I don't plan on using auto download anyway. Again, thanks for your help.

Re: Need Help - Regular Expressions in Watch Topics

Posted: Wed Jul 24, 2013 6:55 am
by TheWanderer
BZee wrote:... so I'll probably keep using (Sam*.Cook)|(Cook*.Sam) ...

I don't know whether you are typing it wrong only here or in your filter as well, but it is not "*.", it is ".*".

Sam*.Cook means
sa + m 0 or more times + any character + cook

Sam.*Cook means
sam + any character 0 or more times + cook
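The two readings can be compared directly in Python (case-insensitive matching assumed). One side effect worth noticing: apart from extra m's, "Sam*.Cook" allows exactly one character between the names, which is probably why it seemed to filter out "Samson Lacook". It also rejects "Samuel Cook" and "Sam - Cook", and the reversed "Cook*.Sam" does not match "Cook, Sam", where there are two characters between the names.

```python
import re

typo = re.compile(r"sam*.cook", re.IGNORECASE)      # sa + m (0 or more) + any one char + cook
intended = re.compile(r"sam.*cook", re.IGNORECASE)  # sam + any chars (0 or more) + cook

headers = ["Sam Cook", "Sa Cook", "Samuel Cook", "Sam - Cook", "Samson Lacook"]
for header in headers:
    print(f"{header!r:16} typo={bool(typo.search(header))} "
          f"intended={bool(intended.search(header))}")

# The reversed typo also needs exactly one character between the names.
print(bool(re.search(r"cook*.sam", "Cook, Sam", re.IGNORECASE)))
```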

you want to use
sam.*cook|cook.*sam or
sam.*?cook|cook.*?sam or
(sam.*cook|cook.*sam) or
(sam.*?cook|cook.*?sam)

Or, as Quade pointed out, since he has made the space into a special character meaning "and", all you have to do is

sam cook
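A short Python check (case-insensitive matching assumed) that the four alternation forms above accept the same headers: the parentheses only group the alternation, and the "?" only changes the order in which the engine tries lengths. As Quade noted, all of them still accept "Samson Lacook".

```python
import re

forms = [
    r"sam.*cook|cook.*sam",
    r"sam.*?cook|cook.*?sam",
    r"(sam.*cook|cook.*sam)",
    r"(sam.*?cook|cook.*?sam)",
]
headers = ["Sam Cook", "Cook, Sam", "Sam.Cook", "Samson Lacook", "Sam Smith"]

for form in forms:
    # Keep the headers this form finds anywhere in the text.
    hits = [h for h in headers if re.search(form, h, re.IGNORECASE)]
    print(f"{form:26} {hits}")
```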