Friday, September 18, 2009

HTML::Restrict - Easily Strip HTML From Your Documents

I've just released HTML::Restrict to the world. It's a Perl module which allows you to strip HTML from text very easily. Here's an example:


use strict;
use warnings;

use HTML::Restrict;

my $hr = HTML::Restrict->new();
# use default rules to start with (strip away all HTML)
my $processed = $hr->process('<b>i am bold</b>');

# $processed now equals: i am bold

If you want to allow some HTML but not all, you can pass a set of rules that whitelists specific elements and, for each element, the attributes it may keep:


use strict;
use warnings;

use HTML::Restrict;

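# <b> may keep no attributes; <img> may keep only src and alt
# (the '/' entry preserves the self-closing slash on <img />)
my $hr = HTML::Restrict->new(
    rules => {
        b   => [],
        img => [qw( src alt / )]
    }
);

my $html = q[<body><b>hello</b> <img src="pic.jpg" alt="me" id="me" /></body>];
my $processed = $hr->process( $html );

# $processed now equals: <b>hello</b> <img src="pic.jpg" alt="me" />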

HTML::Restrict is released as open source software and is available on CPAN.


Anonymous said...

Just wanted to say that it's awesome.

Chankey Pathak said...

Nice module. One thing I want to ask: why doesn't

body => [qw(onLoad)],

work? Neither do any JavaScript event attributes like "onClick" or "onMouseDown".

One more thing: I don't want HTML comments to be deleted. How can I do this?

Thanks in advance.

test said...

@chankey I've answered this on your original StackOverflow question.