Dynamic web sites that allow users to enter text content containing HTML are at risk of so-called cross-site scripting (XSS) attacks (Wikipedia, Securitydocs).
A common approach taken to mitigate this risk is to allow some HTML content, but block content that is potentially harmful. One problem with a straightforward approach to blocking such content is that HTML parsing in browsers differs from the ideal, and nefarious individuals can take advantage of these differences to obscure content.
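To illustrate how a straightforward blocking approach can be sidestepped, here is a minimal sketch of a naive string-level filter and an input that obscures its payload. The class `NaiveBlocklistSketch` and the method `naiveStrip` are hypothetical names invented for this illustration and are not part of DeXSS. The trick shown (nesting one tag inside another) is only one simple way to hide content; it does not rely on any browser parsing quirk.

```java
import java.util.regex.Pattern;

class NaiveBlocklistSketch {
    // Hypothetical naive sanitizer: delete literal <script> and </script>
    // tags in a single pass over the input string.
    private static final Pattern SCRIPT_TAG = Pattern.compile("(?i)</?script>");

    static String naiveStrip(String html) {
        return SCRIPT_TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        // The attacker nests a complete tag inside a split one. Removing the
        // inner tag joins the outer fragments back into a working tag.
        String obscured = "<scr<script>ipt>alert(1)</scr</script>ipt>";
        System.out.println(naiveStrip(obscured));
        // prints: <script>alert(1)</script>
    }
}
```

Because the filter works on raw text and never parses the markup, its output still contains a script element. Filtering the event stream of a parser avoids this particular class of problem.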
DeXSS uses TagSoup, an
open-source HTML parser that attempts to mimic how web browsers
work. TagSoup reads wild HTML and generates SAX2 events. DeXSS invokes
TagSoup and follows it with a pipeline of SAX2 filters to remove HTML
tags such as
script and attribute values containing such
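The idea of a SAX2 filter that removes unwanted elements can be sketched as follows. This is not DeXSS code: `ScriptStripSketch`, `DropScriptFilter`, and `TextSink` are hypothetical names. To stay self-contained, the sketch uses the JDK's built-in XML parser on well-formed input in place of TagSoup, so it shows only the filtering stage and not the handling of wild HTML.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

class ScriptStripSketch {

    /** Hypothetical filter: drops script elements and everything inside them. */
    static class DropScriptFilter extends XMLFilterImpl {
        private int depth = 0; // greater than 0 while inside a dropped subtree

        DropScriptFilter(XMLReader parent) { super(parent); }

        @Override
        public void startElement(String uri, String local, String qName,
                                 Attributes atts) throws SAXException {
            if (depth > 0 || "script".equalsIgnoreCase(qName)) { depth++; return; }
            super.startElement(uri, local, qName, atts);
        }

        @Override
        public void endElement(String uri, String local, String qName)
                throws SAXException {
            if (depth > 0) { depth--; return; }
            super.endElement(uri, local, qName);
        }

        @Override
        public void characters(char[] ch, int start, int length)
                throws SAXException {
            if (depth == 0) super.characters(ch, start, length);
        }
    }

    /** Tiny serializer for the demo: writes tag names and escaped text, no attributes. */
    static class TextSink extends DefaultHandler {
        final StringBuilder out = new StringBuilder();

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            out.append('<').append(qName).append('>');
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            out.append("</").append(qName).append('>');
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            for (int i = start; i < start + length; i++) {
                char c = ch[i];
                if (c == '<') out.append("&lt;");
                else if (c == '>') out.append("&gt;");
                else if (c == '&') out.append("&amp;");
                else out.append(c);
            }
        }
    }

    /** Parses the input, runs it through the filter, and returns the serialized result. */
    static String strip(String xml) throws Exception {
        XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        DropScriptFilter filter = new DropScriptFilter(parser);
        TextSink sink = new TextSink();
        filter.setContentHandler(sink);
        filter.parse(new InputSource(new StringReader(xml)));
        return sink.out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(strip("<p>Hello<script>alert(1)</script> world</p>"));
        // prints: <p>Hello world</p>
    }
}
```

Since each filter is itself an `XMLReader`, several filters can be chained by passing one as the parent of the next, which is the general shape of a filter pipeline.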
DeXSS 1.2 is an Alpha release. You should be aware of the following issues:
If you have an interest in working on these issues, please consider contributing to the project.
DeXSS includes the following classes for direct use:
Source is at https://github.com/leighklotz/dexss.
ant dist -emacs
tagsoup-1.2.1.jar from http://tagsoup.info. If you need to change the TagSoup version, edit the file etc/build/build.properties.
osbcp-css-parser-1.4.jar from http://github.com/corgrath/osbcp-css-parser. If you need to change the OSBCP CSS Parser version, edit the file etc/build/build.properties.
java -classpath lib/tagsoup-1.2.1.jar:lib/osbcp-css-parser-1.4.jar:dist/lib/dexss-1.2.jar org.dexss.Test tests/benign/*.txt
or
java -classpath lib\tagsoup-1.2.1.jar\;lib\osbcp-css-parser-1.4.jar\;dist\lib\dexss-1.2.jar org.dexss.Test tests/benign/*.txt
java -classpath lib/tagsoup-1.2.1.jar:lib/osbcp-css-parser-1.4.jar:dist/lib/dexss-1.2.jar org.dexss.Test tests/xss/*.txt
or
java -classpath lib\tagsoup-1.2.1.jar\;lib\osbcp-css-parser-1.4.jar\;dist\lib\dexss-1.2.jar org.dexss.Test tests/xss/*.txt
If DeXSS does not meet your needs, see freecode.com for a list of similar libraries in other languages such as PHP and Perl.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.