Using regex to extract URLs from plain text with Perl


How can I use Perl to extract all URLs from a specific domain (possibly with a variable subdomain) from plain text using a regex? I have tried:

  my $stuff = 'omg http://fail-o-tron.com/bleh omg omg http://homepage.com/woot.gif dfgdfg http://shomepage.com/woot.gif aaa';
  while ($stuff =~ m/(http\:\/\/.*?homepage.com\/.*?.gif)/gmsi) {
      print $1 . "\n";
  }

It fails badly and gives me:

  http://fail-o-tron.com/bleh omg omg http://homepage.com/woot.gif
  http://shomepage.com/woot.gif

I thought that would not happen because I am using .*?, which should be non-greedy and should give me the smallest match. Can anyone tell me what I am doing wrong? (I do not want some uber-complex, canned regexp to validate URLs; I want to know what I am doing wrong, so I can learn from it.)

A dedicated module such as URI::Find is specifically designed to solve this problem. It will find all the URIs, and then you can filter them.
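The find-then-filter approach could look roughly like the sketch below. It assumes the module in question is CPAN's URI::Find, which has to be installed separately. The sample string is the one from the question, and the filter (host ending in homepage.com, path ending in .gif) is an assumption about what the asker wants, chosen to mirror the original pattern.

```perl
use strict;
use warnings;
use URI::Find;   # CPAN module, not part of core Perl

my $stuff = 'omg http://fail-o-tron.com/bleh omg omg http://homepage.com/woot.gif dfgdfg http://shomepage.com/woot.gif aaa';

my @gifs;
my $finder = URI::Find->new(sub {
    my ($uri, $orig_text) = @_;   # URI object, and the URI as it appeared in the text

    # Not every scheme has a host (mailto: does not), so check before calling it.
    if ($uri->can('host') && defined $uri->host) {
        push @gifs, $orig_text
            if $uri->host =~ /homepage\.com\z/i   # assumed filter: host ends in homepage.com
            && $uri->path =~ /\.gif\z/i;          # assumed filter: path ends in .gif
    }

    return $orig_text;            # return the text unchanged; we only collect
});

$finder->find(\$stuff);           # find() takes a reference to the text

print "$_\n" for @gifs;
```

The callback's return value replaces the URI in the text, so returning the original text leaves the string untouched. Filtering on the parsed host and path, rather than on the raw string, avoids matching the domain name when it only appears somewhere in a path or query.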

Update: Recently updated to handle Unicode.
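As for what goes wrong in the original pattern: non-greedy does not mean "shortest match anywhere in the string". The regex engine still takes the leftmost position where a match can start, which here is the first http://. From there .*? grows one character at a time until the rest of the pattern fits, and with the /s modifier it is free to run across spaces and into the next URL. One way to prevent that, shown as a minimal sketch rather than a complete URL matcher, is to replace .*? with character classes that cannot leave the current URL. The choice of [^\s/]* for the host and \S*? for the path is an assumption about what counts as part of a URL in this text.

```perl
use strict;
use warnings;

my $stuff = 'omg http://fail-o-tron.com/bleh omg omg http://homepage.com/woot.gif dfgdfg http://shomepage.com/woot.gif aaa';

# [^\s/]* cannot cross a slash or whitespace, so the host part stays inside one URL.
# \S*?    cannot cross whitespace, so the path stays inside the same URL.
# The dots are escaped so that they match a literal "." only.
my @gifs = $stuff =~ m{(http://[^\s/]*homepage\.com/\S*?\.gif)}gi;

print "$_\n" for @gifs;
# prints:
# http://homepage.com/woot.gif
# http://shomepage.com/woot.gif
```

With this pattern the attempt that starts at http://fail-o-tron.com/ simply fails, because the host class stops at the first slash before homepage.com can be found, and the engine moves on to the next http://.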

