How to regex TWO words?

Tips on writing regular expressions for searching the post list

Moderators: Quade, dexter

Post by optiman » Tue Sep 30, 2008 9:22 pm

I'm trying to filter subjects with:

"filename.ext CAPITAL WORDS"

That is, any subject with a ".ext" extension in it, followed by capital letter words. My filter line is:

.ext [A-Z]

But it doesn't work. Note the space between ".ext" and "[A-Z]". I thought that if any part of a subject line matched the regex, it would be filtered.

Any guidance?
Thanks!
optiman
Seasoned User
Posts: 251
Joined: Tue Aug 05, 2003 1:06 am

Registered Newsbin User since: 12/13/03

Post by dexter » Tue Sep 30, 2008 9:51 pm

The "." means "match any character", so you need to escape it to catch that first dot.

You should probably escape the space too. Then use a "+" at the end to specify matching 1 or more characters. Using a "*" means 0 or more.

\.ext\ [A-Z]+
dexter
Site Admin
Posts: 9511
Joined: Fri May 18, 2001 3:50 pm
Location: Northern Virginia, US

Registered Newsbin User since: 10/24/97
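
To see why the unescaped dot makes the original filter broader rather than narrower, here is a minimal sketch using Python's re module. Python is only a stand-in: NewsBin's regex engine may behave differently, and the subject lines are made up for illustration.

```python
import re

# Hypothetical subject lines, invented for this example.
subjects = [
    "holiday.ext SOME CAPS",   # the kind of subject the filter is after
    "some text About cats",    # no ".ext" at all, only the word "text"
]

loose = re.compile(r".ext [A-Z]")      # "." matches any character
strict = re.compile(r"\.ext\ [A-Z]+")  # "\." matches only a literal dot

def hits(pattern, lines):
    """Return the lines in which the pattern matches anywhere."""
    return [line for line in lines if pattern.search(line)]

loose_hits = hits(loose, subjects)
strict_hits = hits(strict, subjects)
print(loose_hits)
print(strict_hits)
```

The loose pattern also catches "text A" in the second subject, because the dot is free to match the "t". The escaped version only matches a real ".ext".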

Post by optiman » Tue Sep 30, 2008 10:28 pm

Thanks Dexter.
I added your regex and reloaded; seemed to filter perfectly. Odd though, I would have thought my regex was TOO broad and would have filtered out a lot. I had the "+" on previously as a test but it didn't work before. It must have been the space without the "\" that mucked things up. Appreciate it.

Post by Quade » Tue Sep 30, 2008 10:39 pm

Keep in mind that RE's in Newsbin aren't case dependent.
Quade
Eternal n00b
Posts: 44867
Joined: Sat May 19, 2001 12:41 am
Location: Virginia, US

Registered Newsbin User since: 10/24/97

Post by optiman » Sat Oct 11, 2008 1:48 pm

I thought it worked but it didn't. Every new update, the files are back. What if I wanted to filter any subject with a space in it? I tried "ext\ [A-Z]+" with and without the plus, but subjects with "ext ZZZ" are still getting through.

Post by dexter » Sat Oct 11, 2008 2:55 pm

Which version are you running? It looks like it isn't working as expected in the beta, but it works fine for me in version 5.42.

You can test the RE in the Find field of a post list to get it right, then put it in a filter profile when you find something that works.

Instead of escapes, you can also use brackets for readability. For example [.]ext[ ][a-z]+

As Quade mentioned, NewsBin RE's are case insensitive so [a-z] is the same as doing [A-Za-z].
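
A rough way to check both points outside NewsBin is a short Python sketch. It assumes that Python's re.IGNORECASE flag approximates NewsBin's always-on case insensitivity, which the thread does not confirm; the sample subjects are invented.

```python
import re

# re.IGNORECASE stands in for NewsBin's case-insensitive matching (an assumption).
bracketed = re.compile(r"[.]ext[ ][a-z]+", re.IGNORECASE)
escaped = re.compile(r"\.ext\ [A-Z]+", re.IGNORECASE)

# Hypothetical subjects: upper case, lower case, and one without a real dot.
samples = ["file.ext LOUD", "file.ext quiet", "file-ext LOUD"]

bracketed_results = [bool(bracketed.search(s)) for s in samples]
escaped_results = [bool(escaped.search(s)) for s in samples]
print(bracketed_results)
print(escaped_results)
```

Both spellings give the same results here, and with case ignored neither one can tell "LOUD" from "quiet".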

Post by optiman » Mon Oct 13, 2008 11:36 am

Thanks Dexter,
Using search to test it is brilliant-- I didn't know you could do that, thanks! I'll report back.

Post by optiman » Thu Oct 30, 2008 11:49 am

FYI, the reason it wasn't working is that there were TWO spaces, not one, between the words. So the regex works. But it broke again with build 5.50B6: try searching for a filename using a space and nothing appears.

Post by Quade » Thu Oct 30, 2008 12:50 pm

\s* would cover that. I thought \s meant 1 or more spaces, but I guess it doesn't.

I typically use word.*word. word\s*word would probably work too.
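
In most regex dialects the whitespace class is written with a backslash: \s matches exactly one whitespace character, \s* matches zero or more, and \s+ matches one or more. A small Python sketch of the difference follows; Python's re module is an assumption here, not NewsBin's engine, and the subjects are made up.

```python
import re

one_space = re.compile(r"\.ext [A-Z]+")      # exactly one literal space
any_spaces = re.compile(r"\.ext\s*[A-Z]+")   # zero or more whitespace characters
some_spaces = re.compile(r"\.ext\s+[A-Z]+")  # one or more whitespace characters

single = "file.ext WORD"   # one space
double = "file.ext  WORD"  # two spaces
glued = "file.extWORD"     # no space at all

for name, pattern in [("one", one_space), ("any", any_spaces), ("some", some_spaces)]:
    print(name, [bool(pattern.search(s)) for s in (single, double, glued)])
```

The single-space pattern misses the two-space subject. \s* catches it but also accepts no space at all, so \s+ is the closer fit when at least one space is required.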

Post by optiman » Sun May 17, 2009 9:35 pm

Any chance case dependency can be built in? So I can filter out "CAPITAL1 CAPITAL2"?

Post by Quade » Sun May 17, 2009 10:29 pm

Probably not.

Post by Gaijin64 » Sun May 31, 2009 8:58 pm

Here's one that might help. This regex looks for any line matching the extension you want (change the "ext" characters) and looks for capitalized words that are spaced after the extension. To avoid picking up the files that have a sentence or title case after the extension, we make sure that we seek out the ones that are all caps, allowing for a space after the first character...so someone could add "I THINK THIS IS A GREAT FILE" and get picked up. Typing "I Think This Is A Great File" would be skipped.

Will Match:

<filename>.ext CAPITAL LETTERS
<filename>%.ext MORE CAPITAL LETTERS
<filename>.ext I TYPE IN ALL CAPS
<filename>.ext I TYPE IN ALL CAPS I SPACE TWICE AND KEEP TYPING IN ALL CAPS


No Match:
<filename> .ext I TYPE IN ALL CAPS
<filename.ext.pleaselookatme> NO .EXT BUT I'M TYPING IN CAPS
<filename>.ext noncapital letters
<filename>.ext Title Case with Capitalization
<filename>.ext I Type in all caps
<filename>.ext I Type in all caps. space twice AND THEN START TYPING IN ALL CAPS
<filename>.ext I'm A WeAnIe AnD LiKe To AlTeRnAtE My CaSe



How to regex TWO words with CAPS
[^\s]\.(ext[\s][A-Z][\sA-Z]{3,})

Options: ^ and $ match at line breaks
    Match a single character that is NOT a whitespace character (spaces, tabs, line breaks, etc.)
    Match the character "." literally
    Match the regular expression below and capture its match into backreference number 1
      Match the characters "ext" literally
      Match a single character that is a "whitespace character" (spaces, tabs, line breaks, etc.)
      Match a single character in the range between "A" and "Z"
      Match a single character present in the list below
        Between 3 and unlimited times, as many times as possible, giving back as needed (greedy)
        A whitespace character (spaces, tabs, line breaks, etc.)
        A character in the range between "A" and "Z"
Gaijin64
n00b
Posts: 9
Joined: Tue Jul 22, 2003 11:16 am

Registered Newsbin User since: 05/14/03
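
Gaijin64's pattern can be checked against a few of the listed subjects with a short Python sketch. This assumes Python's re module as the engine; the second half reruns the pattern with re.IGNORECASE as a rough stand-in for a case-insensitive engine, which is an assumption about how NewsBin would treat it.

```python
import re

pattern = re.compile(r"[^\s]\.(ext[\s][A-Z][\sA-Z]{3,})")

# A subset of the subjects from the post above.
should_match = [
    "<filename>.ext CAPITAL LETTERS",
    "<filename>%.ext MORE CAPITAL LETTERS",
    "<filename>.ext I TYPE IN ALL CAPS",
]
should_not_match = [
    "<filename> .ext I TYPE IN ALL CAPS",
    "<filename>.ext noncapital letters",
    "<filename>.ext Title Case with Capitalization",
    "<filename>.ext I Type in all caps",
]

# Case-sensitive run: behaves as the post describes.
matched = [s for s in should_match if pattern.search(s)]
leaked = [s for s in should_not_match if pattern.search(s)]

# Same pattern with case ignored: [A-Z] now accepts lower case too.
insensitive = re.compile(pattern.pattern, re.IGNORECASE)
leaked_insensitive = [s for s in should_not_match if insensitive.search(s)]

print(len(matched), len(leaked), len(leaked_insensitive))
```

With case-sensitive matching the pattern separates the two lists as described. Once case is ignored, every "No Match" subject in this subset except the one with a space before ".ext" gets caught as well.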

Post by bobkoure » Sat Nov 07, 2009 1:05 pm

OK... but Quade's got case sensitivity turned OFF (to the point that a search for \x54 (hex 'T') gets posts with 'T' and 't').
I can't think of any other way to force an uppercase search if it's globally off.
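
The \x54 behaviour can be reproduced in Python, where the hex escape is just another way to write "T" and so follows the same case rules. The sketch also shows Python's scoped inline flag (?-i:...), which turns case-insensitivity off for one group. That syntax is specific to some engines, and nothing in this thread says NewsBin supports it, so treat it as an illustration only.

```python
import re

text = "Tea time"

# "\x54" is the same character as "T", so ignoring case still applies to it.
insensitive = re.findall(r"\x54", text, re.IGNORECASE)

# Scoped flag: case-insensitivity is switched off inside this group only.
# Python syntax (3.6+); NewsBin support is not confirmed by the thread.
scoped = re.findall(r"(?-i:\x54)", text, re.IGNORECASE)

print(insensitive)
print(scoped)
```

The first search returns both the upper-case and lower-case letter, matching what bobkoure saw; the scoped version returns only the capital.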
bobkoure
 

