Using regex to extract URLs from plain text with Perl
How can I use Perl to extract all URLs (possibly with varying subdomains) for a specific domain from plain text, using a regex? I have tried:
    my $stuff = 'omg http://fail-o-tron.com/bleh omg omg http://homepage.com/woot.gif dfgdfg http://shomepage.com/woot.gif AAA';
    while ($stuff =~ m/(http\:\/\/.*?homepage.com\/.*?\.gif)/gmsi) {
        print $1 . "\n";
    }
It fails badly and gives me:
    http://fail-o-tron.com/bleh omg omg http://homepage.com/woot.gif
    http://shomepage.com/woot.gif
I thought this would not happen because I am using .*?, which should be non-greedy and give me the smallest match. Can anyone tell me what I am doing wrong? (I do not want some uber-complex, canned regexp to validate URLs; I want to know what I am doing wrong, so I can learn from this.)
A dedicated module such as URI::Find (on CPAN) is designed specifically to solve this problem. It finds all the URIs in a text, and you can then filter them.
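The find-everything-then-filter idea can be sketched with core Perl alone. This is only a minimal illustration, not how any particular module does it: it treats any run of non-whitespace after http:// or https:// as a URL candidate, and it assumes that shomepage.com should count as a different domain from homepage.com. The variable names and the host pattern are my own choices.

```perl
use strict;
use warnings;

# Sample text from the question.
my $text = 'omg http://fail-o-tron.com/bleh omg omg '
         . 'http://homepage.com/woot.gif dfgdfg '
         . 'http://shomepage.com/woot.gif AAA';

# Step 1: collect every whitespace-delimited http(s) URL candidate.
# In list context, a /g match returns all captured substrings.
my @all_urls = $text =~ m{(https?://\S+)}gi;

# Step 2: keep only URLs whose host is homepage.com or a subdomain of it.
# (?:[^/\s.]+\.)* allows zero or more "label." prefixes, so the host
# "shomepage.com" is rejected (assumption: it is a different domain).
my @wanted = grep {
    m{^https?://(?:[^/\s.]+\.)*homepage\.com(?:[/:?#]|$)}i
} @all_urls;

print "$_\n" for @wanted;
```

With the sample text this keeps only http://homepage.com/woot.gif. If shomepage.com should also be accepted, the host pattern in step 2 would need to be loosened.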
Update: Recently updated to handle Unicode.
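As for what goes wrong in the original pattern: non-greedy does not mean "shortest match anywhere in the string". The regex engine still takes the leftmost starting point that can succeed, and .*? only keeps the match as short as possible from that starting point. With /s, the dot also matches anything, including spaces, so the match that starts at the first http:// simply stretches forward until it reaches homepage.com. The sketch below compares the original pattern with one possible adjustment that replaces . with \S, so a match cannot cross whitespace. The adjusted pattern is my suggestion, not a general-purpose URL matcher.

```perl
use strict;
use warnings;

my $stuff = 'omg http://fail-o-tron.com/bleh omg omg '
          . 'http://homepage.com/woot.gif dfgdfg '
          . 'http://shomepage.com/woot.gif AAA';

# Original pattern: the match starts at the FIRST "http://", and .*? then
# expands one character at a time (spaces included) until "homepage.com/"
# can match, swallowing the unrelated URL in front of it.
my @original = $stuff =~ m/(http\:\/\/.*?homepage.com\/.*?\.gif)/gmsi;

# Adjusted pattern: \S cannot match whitespace, so an attempt starting at
# the first "http://" fails at the first space and the engine moves on to
# the next possible starting point.
my @fixed = $stuff =~ m{(http://\S*?homepage\.com/\S*?\.gif)}gi;

print "original: $_\n" for @original;
print "fixed:    $_\n" for @fixed;
```

The original pattern yields the long string beginning at http://fail-o-tron.com/bleh plus http://shomepage.com/woot.gif, while the adjusted one yields http://homepage.com/woot.gif and http://shomepage.com/woot.gif. Like the original, the adjusted pattern still accepts any host that merely ends in homepage.com.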