Friday, September 18, 2009

HTML::Restrict - Easily Strip HTML From Your Documents

I've just released HTML::Restrict to the world. It's a Perl module which allows you to strip HTML from text very easily. Here's an example:



#!/usr/bin/perl

use strict;
use warnings;

use HTML::Restrict;

my $hr = HTML::Restrict->new();
# use default rules to start with (strip away all HTML)
my $processed = $hr->process('<b>i am bold</b>');

# $processed now equals: i am bold

If you want to allow some HTML but not all, you can set rules that list the elements you permit and, for each element, the attributes it may keep:



#!/usr/bin/perl

use strict;
use warnings;

use HTML::Restrict;

my $hr = HTML::Restrict->new();
$hr->set_rules({
    b   => [],
    img => [qw( src alt )],
});

my $html = q[<body><b>hello</b> <img src="pic.jpg" alt="me" id="test"></body>];
my $processed = $hr->process( $html );

# $processed now equals: <b>hello</b> <img src="pic.jpg" alt="me">


It has been released as open source software and is available on CPAN.

3 comments:

  1. Hi,
    just wanted to say that it's awesome.

  2. Nice module. One thing I want to ask: why doesn't

    body => [qw(onLoad)],

    work? Neither does any JavaScript event handler like "onClick", "onMouseDown", etc.

    One thing more is that I don't want the HTML comments to be deleted. How can I do this?

    Thanks in advance.

  3. @chankey I've answered this on your original Stack Overflow question.
