Friday, September 18, 2009

HTML::Restrict - Easily Strip HTML From Your Documents

I've just released HTML::Restrict to the world. It's a Perl module which allows you to strip HTML from text very easily. Here's an example:



#!/usr/bin/perl

use strict;
use warnings;

use HTML::Restrict;

my $hr = HTML::Restrict->new();
# use default rules to start with (strip away all HTML)
my $processed = $hr->process('<b>i am bold</b>');

# $processed now equals: i am bold
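HTML::Restrict does this work for you. To show roughly what "strip away all HTML" means, here is a minimal sketch that does something similar directly with HTML::Parser. HTML::Parser is a separate CPAN module, `strip_all_html` is a hypothetical helper, and this is an illustration rather than HTML::Restrict's own code. Only text events are collected, so every tag disappears. Skipping the contents of `script` and `style` is an extra choice made by this sketch.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use HTML::Parser;

# Sketch only: collect the text between tags and drop every tag.
sub strip_all_html {
    my ($html) = @_;
    my $out = q{};

    my $p = HTML::Parser->new(
        api_version => 3,
        # 'text' passes the raw text through, so entities like &amp; stay encoded
        text_h => [ sub { $out .= shift }, 'text' ],
    );

    # skip both the tags and the contents of these elements
    $p->ignore_elements(qw( script style ));

    $p->parse($html);
    $p->eof;    # flush any text the parser is still buffering
    return $out;
}

print strip_all_html('<b>i am bold</b>'), "\n";    # prints: i am bold
```

Because no start, end, or comment handlers are registered, those events are simply discarded. That is why the tags vanish while the text between them survives.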

If you want to allow some HTML but not all, you can pass a set of rules naming the elements you want to allow and, for each element, the attributes it may retain:



#!/usr/bin/perl

use strict;
use warnings;

use HTML::Restrict;

my $hr = HTML::Restrict->new();
$hr->set_rules({
    b   => [],
    img => [qw( src alt )],
});

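my $html = q[<body><b>hello</b> <img src="pic.jpg" alt="me" id="test" /></body>];
my $processed = $hr->process( $html );

# $processed now keeps <b>hello</b> and the <img> tag with only its src and alt
# attributes; the <body> tags and the id attribute are stripped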
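To make the shape of the rules hashref concrete: each key names an element you want to keep, and its arrayref lists the attributes that element may retain. An empty arrayref (as for `b`) keeps the tag but drops all of its attributes, and the tags of any element that isn't listed are removed. The helper below, `rebuild_start_tag`, is hypothetical and is not how HTML::Restrict is implemented. It is a minimal pure-Perl sketch of that rule lookup for a single start tag, assuming the tag name and attributes have already been parsed into a hash (as an HTML parser would do). Keeping attributes in rule order and re-escaping their values are choices of this sketch.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Hypothetical helper (not part of HTML::Restrict): rebuild one start tag
# from a rules hashref shaped like the one passed to set_rules.
my %ESCAPE = ( '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' );

sub rebuild_start_tag {
    my ( $rules, $tagname, $attr ) = @_;

    # element not listed in the rules: drop the tag entirely
    return unless exists $rules->{$tagname};

    my $tag = "<$tagname";
    # keep only whitelisted attributes, in the order the rules list them
    for my $name ( @{ $rules->{$tagname} } ) {
        next unless exists $attr->{$name};
        ( my $value = $attr->{$name} ) =~ s/([&<>"])/$ESCAPE{$1}/g;
        $tag .= qq{ $name="$value"};
    }
    return "$tag>";
}

my $rules = { b => [], img => [qw( src alt )] };

print rebuild_start_tag( $rules, 'img',
    { src => 'pic.jpg', alt => 'me', id => 'test' } ), "\n";
# prints: <img src="pic.jpg" alt="me">

my $body = rebuild_start_tag( $rules, 'body', {} );
print defined $body ? "kept\n" : "dropped\n";
# prints: dropped
```

A whitelist like this fails safe: anything you forget to list is removed rather than let through.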


It has now been released as Open Source software and is available on the CPAN.

3 comments:

Anonymous said...

Hi,
just wanted to say that it's awesome.

Chankey Pathak said...

Nice module. One thing I want to ask: why doesn't

body => [qw(onLoad)],

work? Neither do any JavaScript event handlers like "onClick", "onMouseDown", etc.

One more thing: I don't want HTML comments to be deleted. How can I do this?

Thanks in advance.

test said...

@chankey I've answered this on your original StackOverflow question.