obsolete

Path case fix

One mistake which I have come across repeatedly in my site’s error logs is incorrect case. My site makes extensive (and probably unwise) use of uppercase characters in paths. Because I am hosted on UNIX, this site is case sensitive and manual entry of paths by visitors is prone to case errors. By this point, switching all paths to lowercase would not solve the problem as then all links from internal and external pages would break instead. Therefore, I wrote an algorithm in PHP for performing a case-insensitive look-up of request paths in order to redirect the user to the correct path.

This code is intended to go into your site’s error handler script. This requires that your site have an explicit 404 handler script instead of relying on the server to generate 404 pages. Those using Apache will set this up using the site control panel, or in httpd.conf/.htaccess as follows:

ErrorDocument 404 /notfound.php

I won’t go into writing PHP 404 scripts here; suffice it to say that this script should contain code that will display a 404 page. Using a PHP (or any server-side scripted) 404 page script allows you to style the 404 page to match your site (including any navigation code) while also showing the requested URL in the HTML. It also allows for nifty tricks such as providing a 404 report form.

With this page set up, you will then want to add a section of extra code: code that will perform a case-insensitive look-up each time a 404 is raised. I have implemented this as two functions. The first, outer function is findInDirCase() which, given a start path and a target path, performs the look-up. The second, auxiliary function inArrayCaseFix() acts similarly to the PHP function in_array() with two exceptions: it is case-insensitive, and it returns the case-corrected match instead of a boolean. findInDirCase() uses this to scan directory contents returned by glob().

Note that only the first match is found. It is not the responsibility of my code if you do anything silly like name two items with filenames that differ only in case! :)

Here are the two functions; skip over the code to continue the explanation:

function findInDirCase ($pStartPath, $pRemainderArr) {
    // Attempt a case-insensitive match of the path $pRemainderArr in the local directory
    // $pStartPath. Set $pStartPath to "" to search starting from the current working directory.
    // The target (needle) path $pRemainderArr is an array instead of a string; initially it
    // comprises the target path split into an array (with no empty entries!) Each time a match is
    // made (in heaven) the first element is docked and the remaining elements are passed back
    // to the function recursively to continue the search. Hence the weird name.

    // Set glob path; need a slash before the * if pStartPath != ""
    // (compared as a string so that a directory named "0" is not mistaken for "no start path")
    $globPath = ((string) $pStartPath !== "") ? "$pStartPath/" : "";

    // First do a case-sensitive check
    if (!file_exists($pathCheck = $globPath . $pRemainderArr[0])) {
        // Not found by case-sensitive match, try case insensitive

        // Retrieve all files in the start path. glob() can return FALSE, so fall back
        // to an empty array to keep the search loop happy.
        $checkArr = glob("$globPath*");
        if ($checkArr === FALSE) $checkArr = array();

        // Check to see if the next item (0) in the search path array pRemainderArr exists
        // pathCheck will contain FALSE for no match, or the current match so far including path
        $pathCheck = inArrayCaseFix($globPath . $pRemainderArr[0], $checkArr);
    }

    if ($pathCheck !== FALSE) {
        // Match found
        // Drop current (0) item as we've found it
        $localRemainderArr = $pRemainderArr;
        unset ($localRemainderArr[0]);

        if (count($localRemainderArr) == 0) {
            // No more path elements; we're done
            // Because of the way glob works, $pathCheck itself contains the full path of the progress
            // made so far. We can therefore return this as our result variable. With an added '/' if
            // it is a directory
            if (is_dir($pathCheck))
                $pathCheck .= "/"; // give it its trailing slash!

            return $pathCheck; // found our destination
        }
        else {
            // We have one or more items left in remainderArr
            $localRemainderArr = array_values($localRemainderArr); // redo the array keys

            if (!is_dir($pathCheck)) {
                // The current path element is not a directory, we cannot proceed here
                // If we need paths of the form /dir/script/arg, then replace this line with
                // return $pathCheck . '/' . implode('/', $localRemainderArr);
                return FALSE;
            }
            else {
                // We have another directory level to explore; recurse
                return findInDirCase($pathCheck, $localRemainderArr);
            }
        }
    }
    else {
        // No match found for this path element;
        return FALSE;
    }
}

function inArrayCaseFix ($pTarget, $pArray) {
    // Case-insensitive version of in_array, which returns the case-corrected result
    // or FALSE if nothing was found. Assumes a string needle $pTarget and an array
    // haystack $pArray. Performs a basic linear search; $pArray does not need to be sorted.
    // Inspired by code posted to PHP.net by one_eddie (at tenbit.pl)
    // Repurposed by me to return the case-corrected item

    // Strict comparison (===) stops PHP treating numeric-looking names
    // such as "1E3" and "1000" as equal
    $upperTarget = strtoupper($pTarget);

    foreach ($pArray as $currItem)
        if ($upperTarget === strtoupper($currItem))
            return $currItem;

    return FALSE; // nothing found
}

Calling the code

This calling code should be fairly straightforward but isn’t quite!

The first parameter to findInDirCase() is the start path. You will most likely pass in an empty string here. It only exists to support recursive calls to the function as the directory structure of the site is traversed. Note that if your error handler script does not live at the site root, you will need to assert the root as the working directory otherwise the look-up will fail:

chdir ($_SERVER['DOCUMENT_ROOT']);

This allows all the glob calls inside findInDirCase to start from the site root. Do not specify the document root as the start path parameter: this will work but result URLs from findInDirCase will be prefixed with this path and will be broken.
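The working-directory dependence described above can be sidestepped by keeping the root directory separate from the path being built. What follows is only a sketch of that idea, not the function from this page: the name resolveCaseInsensitive() is made up for illustration, it walks the tree with a loop and scandir() instead of recursion and glob(), and it skips the case-sensitive shortcut.

```php
<?php
// Hypothetical variant of the look-up (an assumption, not this page's code).
// $root     : absolute directory to search in, e.g. the document root
// $segments : request path split on "/", with no empty entries
// Returns the case-corrected path relative to $root, or FALSE if nothing matches.
function resolveCaseInsensitive($root, array $segments)
{
    if (count($segments) == 0) {
        return false; // nothing to look up
    }

    $dir = rtrim($root, '/'); // absolute path reached so far
    $resolved = array();      // case-corrected segments found so far

    foreach ($segments as $segment) {
        // We can only descend into directories
        $entries = is_dir($dir) ? scandir($dir) : false;
        if ($entries === false) {
            return false;
        }

        // Linear, case-insensitive search of this directory; first match wins
        $match = false;
        foreach ($entries as $entry) {
            if ($entry === '.' || $entry === '..') {
                continue; // never follow the special entries
            }
            if (strcasecmp($entry, $segment) === 0) {
                $match = $entry;
                break;
            }
        }
        if ($match === false) {
            return false; // no entry matches this path element
        }

        $resolved[] = $match;
        $dir .= '/' . $match;
    }

    // The result never contains $root, so it can go straight into a URL
    $path = implode('/', $resolved);
    return is_dir($dir) ? $path . '/' : $path;
}
```

Called as resolveCaseInsensitive($_SERVER['DOCUMENT_ROOT'], $startDirArr), this returns a site-relative path without needing chdir(), at the cost of reading every directory along the way even when the case was already correct.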

The search (target) path to findInDirCase is passed in as an array; this allows for the function to dock one element off the front each time a match is found. For a path /Misc/PHP/pathcasefix.php the array should look like this:

[0] ⇒ "Misc"
[1] ⇒ "PHP"
[2] ⇒ "pathcasefix.php"

explode() can be used to generate this array from the request URL, but the leading and sometimes trailing slash on the URL will leave empty elements in the array which will confuse findInDirCase. Thus, I wangle the data a little first to generate a clean array:

// Separate out the query string if there is one
$URLparts = explode('?', $_SERVER['REQUEST_URI'], 2);
// Always define element 1 so that it can be appended later without a notice
$URLparts[1] = (isset($URLparts[1]) && $URLparts[1] !== '') ? '?' . $URLparts[1] : '';

// Turn the request URI into an array. Because leading and trailing slashes upset explode
// (they create empty array values) do some ugly hacks to stop that from happening.
// rawurldecode() is used because urldecode() would turn a literal "+" in a path into a space.
$startDirArr = explode("/", substr(rawurldecode($URLparts[0]), 1));
$arrLen = count($startDirArr) - 1;
if ($startDirArr[$arrLen] == "") unset ($startDirArr[$arrLen]);
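The same clean-up can be wrapped in a small function that drops every empty element, including those left by doubled slashes in the middle of a path. This is a sketch under assumptions: splitRequestUri() is a name invented for this example, and it takes the URI as a parameter instead of reading $_SERVER directly.

```php
<?php
// Hypothetical helper (not part of the original script).
// Splits a request URI into clean path segments and a query string.
// Returns array($segments, $query); $query is "" or starts with "?".
function splitRequestUri($requestUri)
{
    // Split off the query string at the first "?" only
    $parts = explode('?', $requestUri, 2);
    $query = (isset($parts[1]) && $parts[1] !== '') ? '?' . $parts[1] : '';

    // Decode the path, then keep only the non-empty elements
    $segments = array();
    foreach (explode('/', rawurldecode($parts[0])) as $segment) {
        if ($segment !== '') {
            $segments[] = $segment;
        }
    }

    return array($segments, $query);
}

// Usage: list($startDirArr, $queryString) = splitRequestUri($_SERVER['REQUEST_URI']);
```

A request for the site root produces an empty segment array, so the caller should check for that before attempting a look-up.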

Finally, call the look-up function. If the function returns a string, you can redirect to it. If the function returns FALSE, continue displaying a 404 page as normal; nothing was found.

// Perform the check
if (($caseCheck = findInDirCase("", $startDirArr)) !== FALSE) {
    // Make sure it's properly encoded
    $caseCheck = str_replace('%2F', '/', rawurlencode($caseCheck));

    // We found a match for the path; redirect and exit
    header ("Location: http://telcontar.net/$caseCheck{$URLparts[1]}", TRUE, 301);
    exit();
}
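The encoding step can also be done one path element at a time, which avoids having to undo the encoding of the slashes afterwards. The following is a sketch only: buildRedirectUrl() is a hypothetical name, and the base URL in the usage line is a placeholder, not a recommendation.

```php
<?php
// Hypothetical helper (an assumption, not the original code).
// Builds the redirect target from a base URL, a case-corrected
// site-relative path and an optional query string ("" or "?...").
function buildRedirectUrl($baseUrl, $path, $query = '')
{
    // Encode each path element separately so the slashes stay literal
    $encoded = implode('/', array_map('rawurlencode', explode('/', $path)));

    // Join with exactly one slash between the base URL and the path
    return rtrim($baseUrl, '/') . '/' . ltrim($encoded, '/') . $query;
}

// Usage (placeholder host):
// header('Location: ' . buildRedirectUrl('http://example.com', $caseCheck, $URLparts[1]), TRUE, 301);
```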

I hope that this proves useful to someone someday!

I have also put together a basic example script that you can use as a basis.

Caveats

A few things that you need to watch out for:
