Using regex to extract URLs from plain text with Perl

- February 15, 2012

How can I use Perl regexps to extract all URLs of a specific domain (possibly with variable subdomains) and a specific extension from plain text? I have tried:

  my $stuff = 'omg http://fail-o-tron.com/bleh omg omg omg omg omg http://homepage.com/woot.gif dfgdfg http://shomepage.com/woot.gif aaa';
  while ($stuff =~ m/(http\:\/\/.*?homepage.com\/.*?.gif)/gmsi) {
      print $1 . "\n";
  }

It fails badly and gives me:

  http://fail-o-tron.com/bleh omg omg omg omg omg http://homepage.com/woot.gif
  http://shomepage.com/woot.gif

I thought that shouldn't happen, because I am using .*?, which ought to be non-greedy and give me the smallest match. Can anyone tell me what I am doing wrong? (I do not want some uber-complex, canned regexp to validate URLs; I want to know what I'm doing wrong, so I can learn from this.)

Specifically designed to solve this problem. It will find all the URIs and then you can filter them.

Update: Recently updated to handle Unicode.
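
For anyone wondering why the non-greedy version still grabs too much: the regex engine always starts at the leftmost place a match can begin, here the first "http://". A non-greedy .*? only keeps the match as short as possible from that starting point; it does not move the start further right. With /s the dot also matches newlines, so the match runs on until it reaches "homepage.com". Below is a rough sketch of the "find everything, then filter" idea using plain regexps only. The helper name extract_gif_urls is made up for illustration, and I am assuming that only homepage.com and its real subdomains are wanted (so shomepage.com is treated as a different domain) and that URLs are delimited by whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns every .gif URL
# in $text whose host is $domain or a subdomain of it.
sub extract_gif_urls {
    my ($text, $domain) = @_;

    # Step 1: collect all URL-like tokens. \S+ cannot cross whitespace,
    # so one match can never run from one URL into the next.
    # Caveat: trailing punctuation such as "," is kept as part of the token.
    my @urls = $text =~ m{(https?://\S+)}gi;

    # Step 2: keep only the wanted host and extension.
    # \Q...\E quotes the dots in the domain so they match literally.
    my $wanted = qr{^https?://(?:[^/\s]+\.)?\Q$domain\E/\S*\.gif\z}i;
    return grep { $_ =~ $wanted } @urls;
}

my $stuff = 'omg http://fail-o-tron.com/bleh omg omg omg omg omg '
          . 'http://homepage.com/woot.gif dfgdfg http://shomepage.com/woot.gif aaa';

print "$_\n" for extract_gif_urls($stuff, 'homepage.com');
```

With the sample string this prints only http://homepage.com/woot.gif. If you do want hosts that merely end in "homepage.com", loosen the host part of $wanted accordingly.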
