Meetings every second Tuesday
Meeting Review, November 7, 2006
Here are some code samples for the concepts I talked about at the meeting.
I talked about properly escaping data that is to be injected into structured content. The two major examples of structured content that PHP developers inject data into are HTML and SQL.
When injecting data into HTML, a la <?php echo $data; ?>, it is important to always wrap it in htmlspecialchars(), like this:
<?php echo htmlspecialchars($data); ?>
...Unless you really want to inject HTML into the page.
Because HTML has a syntax, you can't inject arbitrary data into it, and expect the resulting HTML to be valid 100% of the time. One of the major features of HTML's syntax is the left angle bracket (<), which starts an HTML tag. If the data you are injecting has the angle bracket, you can't blame the HTML interpreter for interpreting it as the start of an HTML tag.
The htmlspecialchars() function finds the special characters that make up HTML's syntax and replaces them with tokens that the HTML interpreter understands as aliases for those characters.
Here are those special characters, and their aliases:
| < | &lt; |
| > | &gt; |
| " | &quot; |
| & | &amp; |
In HTML, those tokens that act as aliases are called entities.
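As a quick check of the table above, here is a minimal sketch that runs a sample string through htmlspecialchars(). The value in $data is made up for illustration, and the flags are passed explicitly because the default flags differ between PHP versions.

```php
<?php
// Hypothetical sample value containing all four special characters.
$data = '<b>Tom & "Jerry"</b>';

// ENT_QUOTES and the charset are spelled out so the result does not
// depend on the PHP version's defaults.
$escaped = htmlspecialchars($data, ENT_QUOTES, 'UTF-8');

echo $escaped, "\n";
// Prints: &lt;b&gt;Tom &amp; &quot;Jerry&quot;&lt;/b&gt;
```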
This is the logically correct way to insert arbitrary data into HTML. It's also the secure way to do it. I want to emphasize that by doing it the logical way, you also take care of doing it the secure way. It's a way of killing two birds with one stone.
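To make the security point concrete, here is a small sketch. The $comment value is a made-up example of hostile input, not something shown at the meeting.

```php
<?php
// Hypothetical user-supplied value that tries to smuggle in a script tag.
$comment = '<script>alert(1)</script>';

// Echoed as-is, the browser would parse this as a real <script> element.
$unsafe = '<p>' . $comment . '</p>';

// Escaped, the same data is displayed as text instead of being run.
$safe = '<p>' . htmlspecialchars($comment, ENT_QUOTES, 'UTF-8') . '</p>';

echo $safe, "\n";
// Prints: <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
```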
The downside is that your code becomes more verbose and harder to follow. There are some things you can do to reduce the verbosity. One is to write an alias for htmlspecialchars():
<?php
function h($str) {
    return htmlspecialchars($str);
}
?>
That reduces verbosity somewhat:
<?php echo h($data); ?>
You can even take it a step further:
<?php
// echo is a language construct, not an expression, so it cannot be returned.
function e($str) {
    echo htmlspecialchars($str);
}
?>
<?php e($data) ?>
Speaking of verbosity, sometimes the above is still too verbose, even with the h() or e() functions. For instance:
<form>
<input type="text" name="phone" value="<?php e($form['phone']); ?>" />
</form>
That looks a bit messy and a little error-prone, especially if your form has 20 input fields. Is there a cleaner approach? Yes, there is:
<?php
$phone = h($phone);
echo <<<END_FORM
<form>
Phone: <input type="text" name="phone" value="$phone" />
</form>
END_FORM;
The above example uses PHP's heredoc syntax. With it, we can completely eliminate PHP tags from the insides of HTML tags. Here is another example:
<?php
foreach ($contacts as $contact_object) {
    $contact_array = (array) $contact_object;
    // Escape every field and expose it as a local variable named after its key.
    foreach ($contact_array as $key => $value) {
        $$key = h($value);
    }
    echo <<<END_ROW
<tr>
<td>$id</td>
<td>$first_name</td>
<td>$last_name</td>
<td>$phone</td>
<td>
<a href="?action=edit&amp;id=$id">edit</a>
<a href="?action=delete&amp;id=$id">delete</a>
</td>
</tr>
END_ROW;
}
Moving on from HTML, the second-most common place to inject data into structured content is SQL. PHP programmers inject strings and numbers into an SQL statement, like this:
<?php
$sql = "SELECT * FROM users WHERE username='$username'";
The two quotes surrounding $username have an important job. They mark the beginning and end of a string. An abstract definition of an SQL string goes something like this:
- A single quote character
- Zero or more characters that are NOT a single quote
- A single quote character
However, if $username contains a single quote, the result no longer fits that definition: the string ends early at the embedded quote, and whatever follows is read as SQL.
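A short sketch of the problem, using a made-up username that contains a quote:

```php
<?php
// Hypothetical input containing a single quote.
$username = "O'Reilly";

$sql = "SELECT * FROM users WHERE username='$username'";

echo $sql, "\n";
// Prints: SELECT * FROM users WHERE username='O'Reilly'
// The string literal now ends after the O, and Reilly' is left over
// as text the SQL parser cannot make sense of.
```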
The solution is to search the contents of $username and replace all instances of the quote character with something that the SQL interpreter understands as an alias for the quote character. In the case of MySQL, that would be the backslash, followed by a quote. MySQL's definition of a string goes something like this:
- A single quote character
- Zero or more characters that are NOT a single quote, unless preceded by a backslash character, which itself is discarded.
- A single quote character
We can perform this search-and-replace operation with the str_replace() function:
<?php
$username = str_replace("'", "\\'", $username);
$sql = "SELECT * FROM users WHERE username='$username'";
If we were writing SQL for a DB2 database, which defines a string differently from MySQL, here's what the substitution would look like:
<?php
$username = str_replace("'", "''", $username);
$sql = "SELECT * FROM users WHERE username='$username'";
In DB2, each single quote needs to be replaced with two single quotes.
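The two substitutions can be wrapped in small helpers to compare them side by side. This is only a sketch: the function names are invented here, and the backslash-style helper also doubles backslashes, a step the str_replace() call above leaves out (without it, an input ending in a backslash would swallow the closing quote). In real code, the database driver's own escaping function is usually the safer choice.

```php
<?php
// Hypothetical helper: backslash-style escaping as described for MySQL.
// Backslashes are doubled first so the ones added for quotes are not
// doubled again by the second replacement.
function escape_backslash_style($str) {
    return str_replace(array("\\", "'"), array("\\\\", "\\'"), $str);
}

// Hypothetical helper: doubled-quote escaping as described for DB2.
function escape_doubled_quote_style($str) {
    return str_replace("'", "''", $str);
}

$username = "O'Reilly";

echo "username='" . escape_backslash_style($username) . "'\n";
// Prints: username='O\'Reilly'
echo "username='" . escape_doubled_quote_style($username) . "'\n";
// Prints: username='O''Reilly'
```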
An SQL Injection attack is what happens when someone tries to take advantage of the fact that data injected into an SQL statement hasn't been properly escaped:
<?php
$username = "' OR ''='";
$sql = "SELECT * FROM users WHERE username='$username'";
This causes the database engine to return all the records in the users table. The SQL statement, after injecting the data, looks like this:
SELECT * FROM users WHERE username='' OR ''=''
This query, translated into English, goes something like, "Select all records from the 'users' table where 'username' is an empty string, or empty string equals empty string". The 'username' column is unlikely ever to hold an empty string, but an empty string always equals another empty string, so the condition is true for every row.
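For comparison, here is a sketch of the same attack string after the MySQL-style substitution described earlier. The quotes inside the data are now escaped, so the whole value stays inside one string literal and the query simply looks for a user with that odd name.

```php
<?php
// The attack input from the example above.
$username = "' OR ''='";

// MySQL-style substitution from earlier: each quote gets a backslash.
$username = str_replace("'", "\\'", $username);

$sql = "SELECT * FROM users WHERE username='$username'";

echo $sql, "\n";
// Prints: SELECT * FROM users WHERE username='\' OR \'\'=\''
```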
~Moxley Stratton


mysql_real_escape_string
Instead of replacing the ' with \', why not use mysql_real_escape_string()? http://us2.php.net/manual/en/function.mysql-real-escape-string.php
Jason