The WunderCounter: HTML::Restrict - Easily Strip HTML From Your Documents

Friday, September 18, 2009

I've just released HTML::Restrict to the world. It's a Perl module which allows you to strip HTML from text very easily. Here's an example:



#!/usr/bin/perl

use strict;
use warnings;

use HTML::Restrict;

my $hr = HTML::Restrict->new();
# use default rules to start with (strip away all HTML)
my $processed = $hr->process('<b>i am bold</b>');

# $processed now equals: i am bold

If you want to allow some HTML but not all, you can add a set of rules to allow arbitrary elements and attributes:



#!/usr/bin/perl

use strict;
use warnings;

use HTML::Restrict;

my $hr = HTML::Restrict->new();
$hr->set_rules({
    b   => [],
    img => [qw( src alt )]
});

my $html = q[<body><b>hello</b> <img src="pic.jpg" alt="me" id="test" /></body>];
my $processed = $hr->process( $html );

# $processed now equals: <b>hello</b> <img src="pic.jpg" alt="me" />

HTML::Restrict is open source software and is available on CPAN.

3 comments:

Anonymous, August 8, 2011 at 8:16 AM
Hi,
just wanted to say that it's awesome.
Chankey Pathak, September 26, 2012 at 5:11 AM
Nice module. One thing I want to ask: why doesn't

body => [qw(onLoad)],

work? Neither do other JavaScript event handlers such as onClick, onMouseDown, etc.

One more thing: I don't want the HTML comments to be deleted. How can I do this?

Thanks in advance.
test, September 26, 2012 at 9:25 AM
@chankey I've answered this on your original Stack Overflow question.