Ignore my philosophies

Postby TheWanderer » Fri Jul 26, 2013 6:03 am

So I am sitting here trying to devise an expression that will reliably match file extensions, and only file extensions, without having to type them all out.
(i.e. ^.*?\.(reg|ex|can|suk) )


When devising expressions that are not simple, I always remind myself that regex is greedy by default.
Regex wants to take everything and it doesn't give two f..ks that you don't want it to take everything.
It doesn't care what you think or how you feel about anything.
It would only laugh (if it could) when you get frustrated because it isn't being logical or rational.
It is a matter of trying to make it do what you want (or tricking it), and you often still lose.
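
For anyone who wants to see the greediness in action, here is a minimal Python sketch. The subject line and the jpg|png alternation are made up for illustration; the only point is the difference between `.*` (greedy) and `.*?` (lazy).

```python
import re

# Hypothetical subject line containing two file names.
subject = 'file one.jpg and file two.jpg "done"'

# Greedy: .* grabs as much as it can, then backtracks to the LAST ".jpg".
greedy = re.match(r'^.*\.(jpg|png)', subject)

# Lazy: .*? grabs as little as it can, so it stops at the FIRST ".jpg".
lazy = re.match(r'^.*?\.(jpg|png)', subject)

print(greedy.group(0))  # file one.jpg and file two.jpg
print(lazy.group(0))    # file one.jpg
```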


Then it dawned on me, regex is Friend of the Court.

(That's the punch line. Go back about your business. Nothing more to see here.)

Re: Ignore my philosophies

Postby Quade » Fri Jul 26, 2013 8:29 am

I'm of the "good enough" school of regexes.

In the case of Usenet, 99% of extensions end in a space or a quote, so I use that.

"[.]jpg[\s\"]" if I want to be relatively precise.
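
A rough Python check of that pattern, as a sketch only: the subject lines are invented, and Python's `re` module stands in for whatever engine the post list search actually uses.

```python
import re

# The pattern from the post: a literal dot, "jpg", then whitespace or a quote.
pattern = re.compile(r'[.]jpg[\s\"]')

# Hypothetical subject lines.
subjects = [
    'holiday.jpg (1/3)',      # extension followed by a space
    '"holiday.jpg" yEnc',     # extension followed by a quote
    'holiday.jpgs are here',  # "jpg" is only a prefix of a longer word
    'holiday.jpg',            # extension at the very end of the string
]

results = [bool(pattern.search(s)) for s in subjects]
print(results)  # [True, True, False, False]
```

The last case is the "good enough" trade-off: an extension at the very end of the string has no trailing space or quote, so it is missed.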

I know I'm probably missing the point of your post...

Re: Ignore my philosophies

Postby bobkoure » Mon Jun 13, 2016 11:31 am

Ages late, but on the off chance someone finds this thread via search

\w word character, i.e. [a-zA-Z0-9_]
\W inverse of the above (any non-word character)
\. literal dot
{N} exactly N repetitions of the preceding token
q(?=u) lookahead: matches a q that is followed by a u, without consuming the u
soooo...

(\.\w{3})(?=\W) // dot, then 3 word chars, followed by a non-word char
or, if you don't want the dot in \1
\.(\w{3})(?=\W)
You can also use lookbehind to check for a dot before the three word chars, but lookbehind can be slower, and some engines restrict it or don't support it at all
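
Here is a quick Python sketch of the two lookahead patterns. The subject lines are invented, and the `(?=\W|$)` variant at the end is an addition of this sketch (not one of the patterns given above) to cover an extension at the very end of the string.

```python
import re

# Hypothetical subject line.
subject = '"archive.part01.rar" yEnc (1/50)'

with_dot = re.search(r'(\.\w{3})(?=\W)', subject)
without_dot = re.search(r'\.(\w{3})(?=\W)', subject)

# ".par" is skipped because the next char ("t") is a word char;
# ".rar" matches because the next char is a quote.
print(with_dot.group(1))     # .rar
print(without_dot.group(1))  # rar

# The lookahead does not consume the quote, so it is not part of the match.
print(with_dot.group(0))     # .rar

# Caveat: (?=\W) needs an actual character after the extension,
# so an extension at the end of the string is missed.
print(re.search(r'\.(\w{3})(?=\W)', 'movie.mkv'))  # None

# Assumed fix: also accept end of string in the lookahead.
at_end = re.search(r'\.(\w{3})(?=\W|$)', 'movie.mkv')
print(at_end.group(1))       # mkv
```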

Sounds like the OP got bitten by a 'hungry' (greedy) match. Regex has... a fairly steep learning curve, but it's super useful.

Re: Ignore my philosophies

Postby kalzekdor » Thu Dec 15, 2016 7:39 am

Except there are plenty of file extensions that aren't three letters long...
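
One way to loosen the length, sketched in Python: swap {3} for a range. The {2,4} bounds, the end-of-string alternative in the lookahead, and the helper name `extensions` are all assumptions picked for illustration, and the last example shows this is still only "good enough".

```python
import re

# Assumed bounds: extensions of 2 to 4 word chars, followed by a
# non-word char or the end of the string.
EXT = re.compile(r'\.(\w{2,4})(?=\W|$)')

def extensions(subject):
    """Return every extension-like token found in the subject line."""
    return EXT.findall(subject)

print(extensions('notes.md'))                 # ['md']
print(extensions('photo.jpeg "x"'))           # ['jpeg']
print(extensions('backup.7z'))                # ['7z']

# Caveat: version numbers look like extensions too.
print(extensions('v1.25 release notes.txt'))  # ['25', 'txt']
```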

