Author Topic: Getting the Browser Default Language in PHP  (Read 5911 times)

Offline admin

Getting the Browser Default Language in PHP
« on: November 13, 2010, 02:26:18 PM »
If you’re doing internationalization (i18n, or Iñtërnâtiônàlizætiøn) work, or just want to make your site available in several languages, you’ll likely need to determine the user’s default language in your PHP code to decide which language to serve. Searching the web turns up one common code snippet again and again; unfortunately, as you’ll soon see, it may not give you the results you need, because it ignores parts of the HTTP spec that can be critical to the accuracy of the results.


HTTP Language Headers

The exchange between browser and server carries information about the client and its capabilities in headers: the user agent, what content it will accept, and (what we’re interested in) language. The browser sends language information in the Accept-Language request header, which PHP exposes as $_SERVER["HTTP_ACCEPT_LANGUAGE"]. Its value looks something like this:

 es,en-us;q=0.3,de;q=0.1


Those values state that the browser accepts Spanish (es), US English (en-us), and German (de). Most browsers don’t send so many possibilities, but you get the idea. Most of the code you can find to determine the default language simply searches the header for a 2-letter language code and returns the first one it finds. But looking at the example, you’ll notice some additional information, q=0.3. What’s that?


HTTP Header Q-Values for HTTP_ACCEPT_LANGUAGE

As part of the HTTP spec, those are q-values: each must be a number between 0 and 1, and an entry without one is treated as having a value of 1. Q-values tell the server not only what a browser supports, but what it prefers. In the previous example, es has no q-value, so it’s 1.0, while en-us is 0.3 and de is 0.1. That means this client can handle Spanish, US English, or German, but prefers Spanish if it’s available. If it’s not, the server can fall back to the other listed choices, ideally the one with the next-highest q-value (here, US English).
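
To make the default-to-1 rule concrete, here is a minimal sketch that reads the q-value from a single entry of the header. The helper name qValueOf is hypothetical (it is not part of the code later in this post), and the pattern is just one reasonable way to accept values such as 0, 1, 0.3, or 0.300.

```php
<?php
// Hypothetical helper (illustration only): returns the q-value of one
// comma-separated Accept-Language entry, or 1.0 when the entry has none.
function qValueOf($entry) {
    // Accept "q=0", "q=1", "q=0.3", "q=0.300", with optional spaces.
    if (preg_match('/;\s*q\s*=\s*([01](?:\.\d{0,3})?)\s*$/i', $entry, $m)) {
        return (float)$m[1];
    }
    return 1.0; // no q-value present: treat as 1 by rule
}

// Walk through the example header from above.
foreach (explode(',', 'es,en-us;q=0.3,de;q=0.1') as $entry) {
    printf("%s => %.1f\n", trim($entry), qValueOf($entry));
}
```

Note that an entry with q=0 yields 0.0 here, which is how the client marks a language as not acceptable.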

Now you see the problem: if you only search the HTTP_ACCEPT_LANGUAGE header for a match and ignore the q-value, you have no way to determine which language the client prefers; you only learn which languages appear in the header. At worst, if a q-value is 0 (meaning not acceptable), you’ll get a language the client specifically tells you not to send. Why simple regex solutions usually seem to work becomes obvious after examining some actual HTTP_ACCEPT_LANGUAGE headers sent by popular browsers:

• en-us,en;q=0.5 (Mozilla)
• en-US,en;q=0.9 (Opera)
• en-us (Internet Explorer)
• en (Lynx)

In those cases a simple string match works, even if q-values are ignored. But the spec does not require languages to be listed in order of preference. If the header contains multiple languages with differing q-values in another order, like "de;q=0.9,en" (a person whose primary language is English, but who knows German), a first-match search returns the wrong language. Obviously, we must consider q-values if our results are to be correct.
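
For comparison, here is a rough sketch of the kind of first-match approach described above. The function name naiveFirstLanguage and the sample headers are made up for illustration; the snippets found in the wild vary, but they tend to behave like this.

```php
<?php
// Illustration only: the naive approach returns the first two-letter
// code it finds in the header and never looks at q-values.
function naiveFirstLanguage($header) {
    if (preg_match('/[a-z]{2}/i', $header, $m)) {
        return strtolower($m[0]);
    }
    return null;
}

// Correct by luck: the preferred language happens to come first.
var_dump(naiveFirstLanguage('en-us,en;q=0.5'));
// Wrong: fr (implicit q=1) is preferred, but de is listed first.
var_dump(naiveFirstLanguage('de;q=0.1,fr'));
// Worse: de has q=0, so the client has ruled it out entirely.
var_dump(naiveFirstLanguage('de;q=0,en'));
```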



Algorithm

The solution is simple. Break the string apart into its language components (they’re separated by commas), and then pick the one with the highest q-value to use (assume any language lacking a q-value has a value of 1.0). In our example, we’ll split the string and get the following array back:

• es (Spanish, assume q-value = 1.0)
• en-us;q=0.3 (US English, with q-value of 0.3)
• de;q=0.1 (German, with q-value of 0.1)

Now with the languages identified, use a regular expression to extract each q-value, if it exists. Once all the q-values are assigned, select the language with the highest q-value. If multiple languages have the same q-value, it’s safe to use any of them equally.
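
The same steps can also be sketched as a function that returns every acceptable language in order of preference, which is handy if your first choice turns out not to be available. This is a variation for illustration, not the code from this post: the name rankLanguages is hypothetical, and dropping q=0 entries is a choice made here.

```php
<?php
// Hypothetical variation: returns all acceptable languages,
// most preferred first, instead of only the top one.
function rankLanguages($header) {
    $ranked = array();
    foreach (explode(',', $header) as $entry) {
        $parts = explode(';', $entry);
        $tag = strtolower(trim($parts[0]));
        if ($tag === '') {
            continue; // skip empty entries such as a trailing comma
        }
        $q = 1.0; // no q-value means 1 by rule
        if (isset($parts[1]) &&
            preg_match('/^\s*q\s*=\s*([01](?:\.\d{0,3})?)\s*$/i', $parts[1], $m)) {
            $q = (float)$m[1];
        }
        if ($q > 0) {
            $ranked[$tag] = $q; // q=0 means "not acceptable", so leave it out
        }
    }
    // Sort by q-value, highest first, keeping the language tags as keys.
    // The relative order of equal q-values may differ between PHP versions.
    arsort($ranked);
    return array_keys($ranked);
}

print_r(rankLanguages('de;q=0.1,es,en-us;q=0.3'));
```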

The Code
The following is the PHP code.

Code:
function getDefaultLanguage() {
   # The header is optional, so fall back to NULL when the client omits it
   if (isset($_SERVER["HTTP_ACCEPT_LANGUAGE"]))
      return parseDefaultLanguage($_SERVER["HTTP_ACCEPT_LANGUAGE"]);
   else
      return parseDefaultLanguage(NULL);
}

function parseDefaultLanguage($http_accept, $deflang = "en") {
   if (isset($http_accept) && strlen($http_accept) > 1) {
      # Split possible languages into array
      $lang = array();
      $x = explode(",", $http_accept);
      foreach ($x as $val) {
         # Check for q-value and create associative array. No q-value means 1 by rule.
         # The decimal part is optional so that "q=0" and "q=1" are recognized too.
         if (preg_match('/^\s*([^;\s]+)\s*;\s*q\s*=\s*([01](?:\.\d*)?)\s*$/i', $val, $matches))
            $lang[$matches[1]] = (float)$matches[2];
         elseif (trim($val) !== "")
            $lang[trim($val)] = 1.0;
      }

      # Return default language (highest q-value). A q-value of 0 is never
      # selected, and the first language listed wins a tie.
      $qval = 0.0;
      foreach ($lang as $key => $value) {
         if ($value > $qval) {
            $qval = (float)$value;
            $deflang = $key;
         }
      }
   }
   return strtolower($deflang);
}

Then in your code, just call getDefaultLanguage() and you’ll get a string back with the highest q-value language sent by the browser in the HTTP_ACCEPT_LANGUAGE header.
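
The string you get back may be a regional tag such as en-us, while your site probably keeps content under plain codes like en. Here is a small sketch of one way to bridge that gap. The helper pickSupportedLanguage and the list of supported languages are assumptions for illustration, not part of the code above.

```php
<?php
// Hypothetical helper: maps a preferred tag such as "en-us" onto the
// languages the site actually offers, falling back to a default.
function pickSupportedLanguage($preferred, array $supported, $fallback = 'en') {
    $preferred = strtolower($preferred);
    if (in_array($preferred, $supported, true)) {
        return $preferred; // exact match, e.g. "es"
    }
    // Try the primary part of the tag: "en-us" becomes "en".
    $parts = explode('-', $preferred);
    if (in_array($parts[0], $supported, true)) {
        return $parts[0];
    }
    return $fallback; // nothing matched
}

// Example with a made-up list of site languages.
echo pickSupportedLanguage('en-us', array('en', 'es', 'de')), "\n";
```

In a page you would pass in the result of the call described above, for example pickSupportedLanguage(getDefaultLanguage(), array('en', 'es', 'de')).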

Caveats

First, be sure to use UTF-8 as your character encoding. If you’re not using UTF-8 right now, convert all your documents to it — you’ll be glad you did later.

Second, if you’re sending different language-specific content at the same URL, be sure to send the appropriate Vary header. If you don’t, intermediate proxy caches might be confused and serve the wrong language to some people. To do that, call header("Vary: Accept-Language") in your PHP code before any output is sent. But be warned: Internet Explorer has some bugs with the Vary header that you should be aware of.
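
As a rough sketch of that advice, the wrapper below sends the Vary header only while headers can still be changed. The function name sendVaryAcceptLanguage is hypothetical; header() and headers_sent() are standard PHP functions.

```php
<?php
// Hypothetical wrapper (illustration only): sends the Vary header while
// headers can still be modified, and reports whether it did so.
function sendVaryAcceptLanguage() {
    if (headers_sent()) {
        return false; // output has already started; header() would only warn
    }
    header('Vary: Accept-Language');
    return true;
}

// Call this before echoing any page content.
sendVaryAcceptLanguage();
```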

For more on q-values, IE bugs, and further explanation of the regular expressions and headers in general, read our previous article Serving XHTML With the Correct MIMETYPE[2] for discussions of similar issues. You are serving your XHTML correctly as application/xhtml+xml, aren’t you?


So what?

What’s this good for? In a future article, we’ll demonstrate how to use this method to get instant translations of your web pages into many different languages — automatically.