How to regex TWO words?

PostPosted: Tue Sep 30, 2008 9:22 pm
by optiman
I'm trying to filter subjects with:

"filename.ext CAPITAL WORDS"

That is, any subject with a ".ext" extension in it, followed by capital letter words. My filter line is:

.ext [A-Z]

But it doesn't work. Note the space between ".ext" and "[A-Z]". I thought that if a line matched any part of the regex it would be filtered.

Any guidance?
Thanks!

PostPosted: Tue Sep 30, 2008 9:51 pm
by dexter
The "." means "match any character", so you need to escape it to catch that first dot.

You should probably escape the space too. Then use a "+" at the end to match 1 or more characters. Using a "*" means 0 or more.

\.ext\ [A-Z]+
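
To see what the escaped dot and the "+" change, here is a small sketch using Python's re module as a stand-in. NewsBin's own regex engine may differ in details, and the sample subjects are made up for illustration.

```python
import re

# Made-up subjects for illustration only.
with_dot = "movie.ext SOME WORDS"
without_dot = "movieXext SOME WORDS"

unescaped = re.compile(r".ext [A-Z]")    # "." matches any character
escaped = re.compile(r"\.ext\ [A-Z]+")   # "\." matches only a literal dot

# The unescaped pattern accepts both subjects; the escaped one needs a real dot.
unescaped_hits = [bool(unescaped.search(s)) for s in (with_dot, without_dot)]
escaped_hits = [bool(escaped.search(s)) for s in (with_dot, without_dot)]

# "+" needs at least one letter after the space; "*" is satisfied by none.
digits_after = "movie.ext 12345"
plus_hit = bool(re.search(r"\.ext\ [A-Z]+", digits_after))
star_hit = bool(re.search(r"\.ext\ [A-Z]*", digits_after))

print(unescaped_hits, escaped_hits, plus_hit, star_hit)
```

So the original ".ext [A-Z]" is broader than intended rather than narrower: it also accepts subjects that contain no dot at all.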

PostPosted: Tue Sep 30, 2008 10:28 pm
by optiman
Thanks Dexter.
I added your regex and reloaded; seemed to filter perfectly. Odd though, I would have thought my regex was TOO broad and would have filtered out a lot. I had the "+" on previously as a test but it didn't work before. It must have been the space without the "\" that mucked things up. Appreciate it.

PostPosted: Tue Sep 30, 2008 10:39 pm
by Quade
Keep in mind that RE's in Newsbin aren't case dependent.

PostPosted: Sat Oct 11, 2008 1:48 pm
by optiman
I thought it worked but it didn't. Every new update, the files are back. What if I wanted to filter any subject with a space in it? I tried "ext\ [A-Z]+" with and without the plus, but subjects with "ext ZZZ" are still getting through.

PostPosted: Sat Oct 11, 2008 2:55 pm
by dexter
Which version are you running? It looks like it isn't working as expected in the beta, but it works fine for me in Version 5.42.

You can test the RE in the Find field of a post list to get it right, then put it in a filter profile when you find something that works.

Instead of escapes, you can also use brackets for readability. For example [.]ext[ ][a-z]+

As Quade mentioned, NewsBin RE's are case insensitive so [a-z] is the same as doing [A-Za-z].
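
A quick sketch of both points, again with Python's re module standing in for NewsBin's engine. The re.IGNORECASE flag is an assumption used here to mimic NewsBin's case-insensitive matching, and the subject line is hypothetical.

```python
import re

# Hypothetical subject line.
subject = "archive.ext hello there"

# Bracketed and escaped spellings of the same pattern.
bracketed = re.compile(r"[.]ext[ ][a-z]+", re.IGNORECASE)
escaped = re.compile(r"\.ext\ [a-z]+", re.IGNORECASE)

# Both spellings select the same text.
bracketed_text = bracketed.search(subject).group(0)
escaped_text = escaped.search(subject).group(0)

# With case ignored, [a-z] also accepts capitals...
upper_hit = bracketed.search("ARCHIVE.EXT HELLO") is not None

# ...and, by the same rule, [A-Z] cannot single out capitals.
caps_only = re.compile(r"[.]ext[ ][A-Z]+", re.IGNORECASE)
lower_hit = caps_only.search(subject) is not None

print(bracketed_text, escaped_text, upper_hit, lower_hit)
```

The last check is the catch for the original question: while matching ignores case, "[A-Z]" cannot tell "CAPITAL WORDS" from "lowercase words".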

PostPosted: Mon Oct 13, 2008 11:36 am
by optiman
Thanks Dexter,
Using search to test it is brilliant-- I didn't know you could do that, thanks! I'll report back.

PostPosted: Thu Oct 30, 2008 11:49 am
by optiman
FYI the reason it wasn't working is that there were TWO spaces, not one, between the words. So the regex works. But it broke again with build 5.50B6. Try searching for a filename using a space, and nothing appears.

PostPosted: Thu Oct 30, 2008 12:50 pm
by Quade
\s* would cover that. I thought \s meant 1 or more spaces, but I guess it doesn't.

I typically use word.*word. word\s*word would probably work too.
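
For comparison, here is a hedged sketch of how these variants behave on one, two, and zero spaces. It assumes the whitespace class meant here is "\s" (backslash), uses Python's re module rather than NewsBin's engine, and the subjects are invented.

```python
import re

# Invented subjects: one space, two spaces, and no space after ".ext".
one_space = "file.ext CAPITAL WORDS"
two_spaces = "file.ext  CAPITAL WORDS"
no_space = "file.extCAPITAL WORDS"

patterns = {
    "single space": r"\.ext [A-Z]+",       # exactly one literal space
    "zero or more": r"\.ext\s*[A-Z]+",     # \s* also allows no whitespace
    "one or more": r"\.ext\s+[A-Z]+",      # \s+ requires at least one
    "anything between": r"\.ext.*CAPITAL", # the word.*word style
}

# For each pattern, record whether it matches each of the three subjects.
results = {
    name: [bool(re.search(p, s)) for s in (one_space, two_spaces, no_space)]
    for name, p in patterns.items()
}

for name, hits in results.items():
    print(name, hits)
```

A bare "\s" matches exactly one whitespace character, which is why the quantifier matters: "\s+" is the closest to "one or more spaces", while "\s*" and ".*" also accept subjects with no space at all.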

PostPosted: Sun May 17, 2009 9:35 pm
by optiman
Any chance case dependency can be built in? So I can filter out "CAPITAL1 CAPITAL2"?

PostPosted: Sun May 17, 2009 10:29 pm
by Quade
Probably not.

PostPosted: Sun May 31, 2009 8:58 pm
by Gaijin64
Here's one that might help. This regex looks for any line matching the extension you want (change the "ext" characters) and looks for capitalized words that are spaced after the extension. To avoid picking up the files that have a sentence or title case after the extension, we make sure that we seek out the ones that are all caps, allowing for a space after the first character...so someone could add "I THINK THIS IS A GREAT FILE" and get picked up. Typing "I Think This Is A Great File" would be skipped.

Will Match:

<filename>.ext CAPITAL LETTERS
<filename>%.ext MORE CAPITAL LETTERS
<filename>.ext I TYPE IN ALL CAPS
<filename>.ext I TYPE IN ALL CAPS I SPACE TWICE AND KEEP TYPING IN ALL CAPS


No Match:
<filename> .ext I TYPE IN ALL CAPS
<filename.ext.pleaselookatme> NO .EXT BUT I'M TYPING IN CAPS
<filename>.ext noncapital letters
<filename>.ext Title Case with Capitalization
<filename>.ext I Type in all caps
<filename>.ext I Type in all caps. space twice AND THEN START TYPING IN ALL CAPS
<filename>.ext I'm A WeAnIe AnD LiKe To AlTeRnAtE My CaSe



How to regex TWO words with CAPS
[^\s]\.(ext[\s][A-Z][\sA-Z]{3,})

Options: ^ and $ match at line breaks
    Match any single character that is NOT a "whitespace character" (spaces, tabs, line breaks, etc.)
    Match the character "." literally
    Match the regular expression below and capture its match into backreference number 1
      Match the characters "ext" literally
      Match a single character that is a "whitespace character" (spaces, tabs, line breaks, etc.)
      Match a single character in the range between "A" and "Z"
      Match a single character present in the list below
        Between 3 and unlimited times, as many times as possible, giving back as needed (greedy)
        A whitespace character (spaces, tabs, line breaks, etc.)
        A character in the range between "A" and "Z"
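
The lists above can be checked mechanically. This is only a sketch: it runs the pattern through Python's re module with case-sensitive matching, which is an assumption that does not hold for NewsBin as described earlier in the thread, and "<filename>" is kept as literal text.

```python
import re

# The pattern from the post, matched case-sensitively.
pattern = re.compile(r"[^\s]\.(ext[\s][A-Z][\sA-Z]{3,})")

should_match = [
    "<filename>.ext CAPITAL LETTERS",
    "<filename>%.ext MORE CAPITAL LETTERS",
    "<filename>.ext I TYPE IN ALL CAPS",
    "<filename>.ext I TYPE IN ALL CAPS I SPACE TWICE AND KEEP TYPING IN ALL CAPS",
]

should_not_match = [
    "<filename> .ext I TYPE IN ALL CAPS",
    "<filename.ext.pleaselookatme> NO .EXT BUT I'M TYPING IN CAPS",
    "<filename>.ext noncapital letters",
    "<filename>.ext Title Case with Capitalization",
    "<filename>.ext I Type in all caps",
    "<filename>.ext I Type in all caps. space twice AND THEN START TYPING IN ALL CAPS",
    "<filename>.ext I'm A WeAnIe AnD LiKe To AlTeRnAtE My CaSe",
]

# True for every subject the pattern finds / fails to find, respectively.
matched = [bool(pattern.search(s)) for s in should_match]
rejected = [pattern.search(s) is None for s in should_not_match]

# Backreference 1 holds everything from "ext" to the end of the caps run.
first_capture = pattern.search(should_match[0]).group(1)

print(all(matched), all(rejected), first_capture)
```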

PostPosted: Sat Nov 07, 2009 1:05 pm
by bobkoure
OK... but Quade's got case sensitivity turned OFF (to the point that a search for \x54 (hex 'T') gets posts with 'T' and 't').
I can't think of any other way to force an uppercase search if it's globally off.
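
The effect is easy to reproduce outside NewsBin. In this sketch Python's re.IGNORECASE flag stands in for a globally case-insensitive engine (an assumption about how NewsBin behaves, based only on the posts above), and the subject line is taken from the "No Match" list.

```python
import re

# A hex escape is just another way to write the letter, so the flag still applies.
hex_sensitive = [bool(re.search(r"\x54", s)) for s in ("T", "t")]
hex_insensitive = [bool(re.search(r"\x54", s, re.IGNORECASE)) for s in ("T", "t")]

# The all-caps pattern from the earlier post, with and without case sensitivity.
caps = r"[^\s]\.(ext[\s][A-Z][\sA-Z]{3,})"
lower_subject = "<filename>.ext noncapital letters"

sensitive_hit = re.search(caps, lower_subject) is not None
insensitive_hit = re.search(caps, lower_subject, re.IGNORECASE) is not None

print(hex_sensitive, hex_insensitive, sensitive_hit, insensitive_hit)
```

With case ignored, the all-caps pattern also accepts the lowercase subject, so it cannot separate "CAPITAL1 CAPITAL2" from ordinary text in that mode.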