HTML5/AngularJS/Nginx crawlable application

Full Ajax

Many Java web applications and Java web frameworks use an architecture that does not allow UI and backend development to be separated. As a result, highly specialised frontend and backend developers cannot be split into a UI team and a backend team. Regardless of their preferences, developers have to understand how both the presentation layer and the business logic layer work. If a UI developer only needs to know the data model (which connects the application's templates and controllers) and how to run the server, that is already a huge success. In particularly bad cases, a UI developer has to rebuild the entire application after changing a few lines of JavaScript code, or has to know the JSP language just to fix a CSS style. In addition, generating HTML on the server and transferring it instead of pure data hurts both server and network performance.

Nowadays, with modern browsers (HTML5, WebSocket, etc.), there is no longer any need to burden the backend server with anything other than business logic. UI development can be carried out on a simple nginx server with API stubs instead of the real backend server. Frameworks for automatic documentation generation (such as JSONDoc) also help UI and backend developers reduce the cost of communication. Transferring pure JSON data also significantly reduces the load on backend servers. Finally, the compressed JavaScript code of the UI client can be kept in the browser cache (reducing network and nginx load).

But while modern browsers can easily handle this increased responsibility, search engines need a little help.

To get an AngularJS application indexed properly we need the following things:

  • sitemap.xml
  • html5Mode
  • nginx
  • old-fashioned backend server

HTML5

HTML5 mode turns AngularJS routes such as example.com/#!/home into example.com/home (the href attributes of links must also be written without the hashbang).

Activate html5Mode in AngularJS:

angular.module('app').config(function ($locationProvider) { // 'app' is a placeholder module name
  $locationProvider.html5Mode(true);
  $locationProvider.hashPrefix('!');
});

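The hashbang prefix is kept for compatibility with browsers that do not support HTML5 URLs. Note that AngularJS 1.3 and later also expects a <base href="/"> tag in the page head when html5Mode is enabled this way.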
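
To make the two URL shapes concrete, here is a minimal sketch in plain JavaScript of how a hashbang URL corresponds to its HTML5-mode form. The helper name hashbangToHtml5 is hypothetical, and this is not how AngularJS implements the conversion internally; it only illustrates the mapping described above.

```javascript
// Hypothetical helper (not part of AngularJS): converts a hashbang-style URL
// such as "http://example.com/#!/home" into its HTML5-mode equivalent.
function hashbangToHtml5(url, prefix) {
  prefix = prefix === undefined ? '!' : prefix; // same value as hashPrefix('!')
  var marker = '#' + prefix;
  var index = url.indexOf(marker);
  if (index === -1) {
    return url; // no hashbang part, nothing to convert
  }
  var base = url.slice(0, index).replace(/\/+$/, ''); // drop trailing slashes
  var path = url.slice(index + marker.length);
  if (path.charAt(0) !== '/') {
    path = '/' + path;
  }
  return base + path;
}

console.log(hashbangToHtml5('http://example.com/#!/home')); // http://example.com/home
```

A URL that is already in HTML5 form is returned unchanged, which is why old hashbang links and new links can coexist.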

Now we need to make the nginx server route requests such as example.com/home to the application's main index.html file. To do this, add the following directive to the config file:

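location / {
    # Disable caching so clients always fetch the latest files
    expires -1;
    add_header Pragma "no-cache";
    add_header Cache-Control "no-store, no-cache, must-revalidate, post-check=0, pre-check=0";
    root /var/web;
    # Serve the file or directory if it exists, otherwise fall back to index.html
    try_files $uri $uri/ /index.html =404;
}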

The line try_files $uri $uri/ /index.html =404; means that every URL that does not match an existing file or directory is served by index.html, without rewriting the URL in the browser address bar. This setup already works (and is also compatible with old-style hashbang links), so if your application does not need to be indexed by search engines you can stop here.
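
The fallback order of that try_files line can be sketched in plain JavaScript. This is a simplified model rather than nginx's actual logic: the function name pickTryFilesTarget and the files and dirs arrays (standing in for the contents of /var/web) are assumptions made for illustration.

```javascript
// Simplified model of `try_files $uri $uri/ /index.html =404;`.
// `files` and `dirs` are hypothetical lists describing what exists under root.
// Returns the try_files argument that nginx would end up using.
function pickTryFilesTarget(uri, files, dirs) {
  var endsWithSlash = uri.charAt(uri.length - 1) === '/';
  // 1. $uri: the request path names an existing file
  if (!endsWithSlash && files.indexOf(uri) !== -1) {
    return uri;
  }
  // 2. $uri/: the request path names an existing directory
  var dir = endsWithSlash ? uri : uri + '/';
  if (dirs.indexOf(dir) !== -1) {
    return dir;
  }
  // 3. /index.html: the application entry point, if it exists
  if (files.indexOf('/index.html') !== -1) {
    return '/index.html';
  }
  // 4. =404: nothing matched
  return '=404';
}

var files = ['/index.html', '/js/app.js'];
var dirs = ['/', '/js/'];
console.log(pickTryFilesTarget('/js/app.js', files, dirs)); // /js/app.js
console.log(pickTryFilesTarget('/home', files, dirs));      // /index.html
```

An application route such as /home matches neither a file nor a directory, so it falls through to /index.html, where AngularJS takes over routing in the browser.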

SEO

Now we will help search engines process our application correctly. To do this, we will prepare hints for search bots and generate snapshot pages. First, we tell the bot which pages we want indexed by providing a sitemap.xml file. The simplest version of the file is listed below (a link to each page and the date of its last update; the full format is described at www.sitemaps.org).

<urlset xmlns='http://www.sitemaps.org/schemas/sitemap/0.9'>
  <url>
    <loc>http://senior-java-developer.com/java/basics</loc>
    <lastmod>2013-07-12</lastmod>
  </url>
  <url>
    <loc>http://senior-java-developer.com/</loc>
    <lastmod>2013-07-12</lastmod>
  </url>
</urlset>
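
If the list of pages changes often, a file like this can be generated rather than written by hand. The following is a minimal sketch in plain JavaScript; the function names buildSitemap and escapeXml and the shape of the pages array are assumptions for illustration, not part of any library.

```javascript
// Escapes the characters that are not allowed as-is inside XML text.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Builds a minimal sitemap.xml string from a list of { loc, lastmod } entries.
function buildSitemap(pages) {
  var lines = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<urlset xmlns='http://www.sitemaps.org/schemas/sitemap/0.9'>"
  ];
  pages.forEach(function (page) {
    lines.push('  <url>');
    lines.push('    <loc>' + escapeXml(page.loc) + '</loc>');
    lines.push('    <lastmod>' + escapeXml(page.lastmod) + '</lastmod>');
    lines.push('  </url>');
  });
  lines.push('</urlset>');
  return lines.join('\n');
}

console.log(buildSitemap([
  { loc: 'http://senior-java-developer.com/java/basics', lastmod: '2013-07-12' },
  { loc: 'http://senior-java-developer.com/', lastmod: '2013-07-12' }
]));
```

Escaping matters as soon as a URL contains a query string, because a bare & is not valid inside an XML document.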

Great! The search engine will request the listed URLs from our website and receive index.html as the content. But bots do not execute JavaScript. We need to tell the bot that there is real content behind index.html. To do this, add the following to the <head> of the page:

<meta name="fragment" content="!" />

This gives the bot the opportunity to take the next step. Seeing the fragment meta tag with content="!", the bot will request the page again, this time appending the ?_escaped_fragment_= parameter to the URL. Nginx will forward a request carrying that parameter to a different location:

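# Requests carrying _escaped_fragment_ come from bots: send them to /snapshot
if ($args ~ "_escaped_fragment_=(.*)") {
    rewrite ^ /snapshot${uri};
}
# Snapshot pages are rendered by the backend server behind http://api
location /snapshot {
    proxy_pass http://api;
    proxy_connect_timeout 60s;
}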

That's it: every request from a bot now receives a snapshot response from the backend server.

Real URL            Bot URL                                 Backend URL
example.com         example.com/?_escaped_fragment_=        localhost:8080/snapshot/
example.com/home    example.com/home?_escaped_fragment_=    localhost:8080/snapshot/home
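
The mapping in the table above can be sketched in plain JavaScript. This only models the nginx rule and is not code that runs on the server; the function name snapshotTarget is hypothetical, and the backend address localhost:8080 is taken from the table as the assumed target of the api upstream.

```javascript
// Hypothetical model of the nginx rule:
//   if ($args ~ "_escaped_fragment_=(.*)") { rewrite ^ /snapshot${uri}; }
// `path` is the request path ($uri) and `query` is the query string ($args).
function snapshotTarget(path, query) {
  if (!/_escaped_fragment_=/.test(query || '')) {
    return null; // ordinary browser request: handled by the static location
  }
  // nginx also forwards the original query string to the backend;
  // it is left out here to match the table.
  return 'localhost:8080/snapshot' + path;
}

console.log(snapshotTarget('/', '_escaped_fragment_='));     // localhost:8080/snapshot/
console.log(snapshotTarget('/home', '_escaped_fragment_=')); // localhost:8080/snapshot/home
console.log(snapshotTarget('/home', ''));                    // null
```

Requests without the parameter never reach the backend, so regular visitors keep getting the static index.html.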

To build the snapshots I use AngularJS views and the Thymeleaf view framework. Since both Thymeleaf and AngularJS work through HTML5-compatible tag attributes, you can even use a single template file, although I prefer not to mix them. A line of such a shared view would look like this: <div ng-bind="text" th:utext="${text}"></div>. Single file. Cool!

Done. Now the search bot will request the necessary URLs and index them properly. For now, “Fetch as bot” in the Google and Bing webmaster tools does not support the fragment meta tag, so you cannot immediately check whether everything is configured correctly; you have to wait until the bot visits your app.
