webhosting

Drupal and the art of speed

This article appeared as a guest blog on true.nl/blog.

Most companies with a website online aim to make money with it. For some websites that is a direct way of earning money: they run a webshop, collect donations or generate leads. And for many companies it is a form of marketing, a way to be found by their prospects. Web performance is therefore not a technical trick, but a must for everyone. If you make money online, you are losing it to a slow website!

A fast website

Content management systems like Drupal are great for users: both the editor and the visitor can easily add content. Images are scaled automatically and the tens of thousands of modules often provide the requested functionality. But that comes at a price, even in open source land. For every page the system assembles, the content has to be fetched from the database. Every page with overview lists (views) means queries to the database. On a larger website a single page can easily add up to hundreds of queries, just to serve one page to one user. Even with a fast database that can mean several seconds before the page reaches the user. Serving a static HTML page is many times faster.

Speed is the sum of all the elements

So on the one hand the business wants the flexibility of a CMS to change content. And on the other hand it is a requirement that a page appears on the user's screen within a few seconds. There is no single action that solves this problem. Speed is the sum of all the elements between content and browser, so the solution touches every element in that chain: network, operating system, the CMS, HTTP and HTML. The only way to improve the performance of any website, regardless of the CMS, is to analyse the whole chain and improve it where necessary.

CSS and JS aggregation

For Drupal it goes without saying that you enable CSS and JS aggregation, which merges and minifies the dozens of CSS and JavaScript files. Nice: dozens fewer hits and a couple of seconds gained with a few checkboxes. But then the reach for the higher-hanging fruit begins. Below are some of the best practices as learned in the community.
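
If you prefer to manage this in code, a minimal sketch for Drupal 7, assuming settings.php is where you want to enforce it:

    <?php
    // settings.php (Drupal 7): force CSS and JS aggregation on, so it
    // cannot accidentally be disabled from the Performance UI.
    $conf['preprocess_css'] = 1;
    $conf['preprocess_js'] = 1;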

Fewer hits

Clients often want to integrate many third-party services: measuring with analytics, sharing via ShareThis, A/B testing with Optimizely. All valid, but every JavaScript file means an extra DNS request, an extra hit and extra parsing on the browser side. Try to convince the client that fewer hits is better. This step is probably the hardest one to realise, but it is an important one.

Caching in Drupal

Out of the box, a CMS like Drupal caches in the database, the furthest away from the user. So everything that can improve this caching is necessary. You can do this by caching views and blocks.
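
A minimal sketch for Drupal 7 (block and page caching are core settings; views caching is configured per view in the Views UI):

    <?php
    // settings.php (Drupal 7): enable the anonymous page cache and the
    // block cache. Views caching is set per view/display in the Views UI.
    $conf['cache'] = 1;
    $conf['block_cache'] = 1;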

Memcache and Varnish

But moving the cache further 'forward', so that it does not end up in the database or even 'touch' Drupal at all, is better still. Memcache is a good method: a flat table in memory that is consulted directly, and for which a Drupal module exists. Better still is to use Varnish, a reverse proxy, so that the request never reaches the CMS but is served directly. This only works for anonymous users though, and if you want to count in the CMS how often an article has been read, you have to work around it, for example with an extra JavaScript call. Typically Varnish lets you handle hundreds of times more concurrent visitors, while generating the 'index.html' page under that load drops from seconds to 20 milliseconds.
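
A sketch of the settings.php wiring for both (Drupal 7; the module path and the Varnish address are assumptions for a typical single-server setup):

    <?php
    // settings.php (Drupal 7): route Drupal's caches to memcached via the
    // memcache module (path is an assumption, adjust to your install).
    $conf['cache_backends'][] = 'sites/all/modules/memcache/memcache.inc';
    $conf['cache_default_class'] = 'MemCacheDrupal';
    // The form cache must survive reliably; keep it in the database.
    $conf['cache_class_cache_form'] = 'DrupalDatabaseCache';

    // Tell Drupal it sits behind a reverse proxy such as Varnish, so the
    // real client IP is read from X-Forwarded-For.
    $conf['reverse_proxy'] = TRUE;
    $conf['reverse_proxy_addresses'] = array('127.0.0.1'); // assumption: Varnish on localhost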

SSL decrypter

To keep the content as 'fresh' as possible, set the expiration headers in the web server so that caching lasts long enough without the site getting stale, and use the purge module. It pays to check in the headers of the site that a request is actually being cached and served from Varnish; some modules and custom code tend to prevent this. If your website uses https, Varnish cannot cache it. For that you can put an SSL decrypter in front of Varnish, for example Pound, so that Varnish still sees plain http traffic.
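
In Drupal 7 the expiry that Varnish sees is driven by one setting; a sketch (the six-hour value is only an example, the right lifetime is a judgment call per site):

    <?php
    // settings.php (Drupal 7): let pages live in an external cache such
    // as Varnish for up to six hours; the purge module can evict pages
    // earlier when content changes.
    $conf['page_cache_maximum_age'] = 21600;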

Search functionality

If the website has search functionality and you actually want to extend it to searching binary files (PDF, MS Word, etc.), set up an extra virtual server with Solr and install the module. And most importantly, make sure the internal search no longer indexes the content, because the search table is often half the size of the database.
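
A sketch for Drupal 7 with the apachesolr module (module and variable names are the stock ones; verify against your own setup):

    <?php
    // Hand searching over to Solr and stop core search from indexing
    // nodes, so the search_index table stops growing.
    variable_set('search_active_modules', array('apachesolr_search', 'user'));
    variable_set('search_default_module', 'apachesolr_search');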

SPDY

If you use https, also consider the option of SPDY, the successor to the old stateless HTTP protocol and the candidate for HTTP/2.0. Numerous browsers already support this protocol. With SPDY the elements of the page are served compressed in a single request, and especially on networks with high latency this can halve the load time of a page. SPDY modules are available for many web servers (including Apache and Nginx). If this is not an option yet, look at Google's 'pagespeed' module. It offers a wealth of options: injecting JavaScript and CSS into the HTML on the fly when that is faster, lazy loading of images 'below the fold', generating sprites, and countless other options that can make your site considerably faster.
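
For nginx (1.4 or later, built with its SPDY module) enabling it is a one-word change on the listener; a sketch with placeholder certificate paths:

    # nginx: serve the existing SSL vhost over SPDY as well.
    server {
        listen 443 ssl spdy;
        server_name example.com;
        ssl_certificate     /etc/ssl/example.com.crt;  # placeholder
        ssl_certificate_key /etc/ssl/example.com.key;  # placeholder
    }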

CDN

A CDN can solve a whole range of problems: DDoS protection, lower round-trip times for international users, but also parallelising requests to load the site faster. You can do the latter without using a real CDN. Most browsers put a low cap on the number of simultaneous requests to a single host. So for a page with 40 images the browser fires, say, 16 requests at the web server, and the next request is only issued once one of those has completed. You can work around this with subdomains: to serve the page for example.com you use CNAMEs like images1.example.com and images2.example.com and do round-robin load balancing across these names. The browser can now send 16 requests to images1 and 16 to images2 at the same time. There is a Drupal module that does this for you and that also integrates with the big real CDNs like Akamai. This trick has some drawbacks: you have to prevent search engines from indexing images1.example.com, and there is a trade-off because an extra DNS request also takes time. Moreover, SPDY will soon make this technique obsolete.
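
The essence of the trick, as a hypothetical PHP helper (not the actual Drupal CDN module): hash each path to a fixed shard, so the same image always gets the same hostname and stays cacheable.

    <?php
    // Hypothetical sharding helper: map an asset path to a stable
    // images[1..N].example.com hostname.
    function shard_image_url($path, $shards = 2) {
      $n = (abs(crc32($path)) % $shards) + 1;
      return 'http://images' . $n . '.example.com' . $path;
    }

    echo shard_image_url('/files/logo.png'); // e.g. http://images2.example.com/files/logo.png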

Server-level tips

At the level of the server itself plenty of tips apply too, although this is far from low-hanging fruit. By keeping the renderable part of the page within 1400 bytes, for example, you can make sure it fits within a single packet on an Ethernet connection. Changing the TCP slow-start parameters is a kernel-hacking experience, but it also contributes to a faster website. As stated: web performance has to be analysed and improved integrally, at every level from business down to Ethernet frames.
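
A sketch on Linux (the gateway address is a placeholder and the values are illustrative, not a recommendation; test before rolling out):

    # Raise the initial congestion window on the default route (iproute2).
    ip route change default via 192.0.2.1 dev eth0 initcwnd 10

    # Do not shrink the congestion window back after an idle keep-alive.
    sysctl -w net.ipv4.tcp_slow_start_after_idle=0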

Measuring speed

There are many tools that can help you measure speed and improve it. WebPagetest and YSlow are good places to start. Steve Souders is the world authority in this field and his blog offers plenty of good leads; 2bits has many Drupal-specific articles online.

Zen and the art of innovation

Commodity sinks, innovation rises. An old rule. That is the reason why the drop has to keep on moving, have shorter release cycles and adopt new technologies faster, to make sure we don't become what we replaced: old, outdated systems that are slow to adapt and quick to go extinct.

There are two sides to this: we have to grab new technologies faster and dump older technologies sooner. And we must make sure that adopting new technologies faster doesn't create a legacy we then have to keep releasing. Or at least, this is my opinion.

"Nobody" in the world logs in with openID anymore. Many appliaction to application in backend still might be using it, but nobody uses it to authenticate anymore. The last bastion Janrain just announced that it will close the doors. Drupal will drop OpenID form core as well and might have done this as well a long time ago. So when Drupal 8 will see the light in janury 2014, and we still would have release cycles over a 1000 days, the maintainers will still be dealing with an OpenID implementation and supporting it in Drupal core 7 up to the time D9 ships, somewhere around Q4 2017. Extrapolating our current release cycles most likely later, much later.


This is not to criticise anyone; I think it was a wise step to include OpenID back in the Drupal 6 days, and it is a wise step to remove it from D8. But the time between making these releases, and hence these decisions, should be shorter. And especially the time during which they impact the code we have to maintain.

You want to know another example of an innovation that was once great and is now holding us back? Gzipping pages. It has been in core ever since 4.5 and it was a great feature (though I had some problems with it back then :-)). But it is wrong now: it holds us back with duplicate functionality that has to be maintained and that is better served in another OSI layer.

Back when web servers didn't compress pages and elements by default, it made perfect sense to do so from Drupal. A great way to save bandwidth and deliver pages to the user faster. But now that all web servers compress pages by default (and other elements, like a big Word document served as an attachment from /files/!), it is code that has to go. The innovation was great, but it sank down the stack and became a commodity in all major web servers. And that is the risk with all innovations: if one keeps holding on to innovations that are already commodity, one ends up down there as well.

This holds true for many elements of frontend performance. Right now it seems like a good thing to combine multiple CSS or JS files into one file. But once SPDY becomes mainstream this is better done in the HTTP protocol, not in the CMS.

And traditional frontend performance wisdom states that we have to build sprites into the template.

While if we add one module and one line of configuration, this is all done at the web server level with image sprites.

And we should use selective data URIs in our template. Most frontend devs will puke: binary data in a template? That is some old, ugly technology.
Again, with one command, the web server layer will migrate these smaller images from flat files to inline data URIs.

Take a look at this impressive list of options that mod_pagespeed (a web server module) can help you with; a minimal config sketch follows the list:

  • Optimize Caching: Canonicalize JavaScript Libraries, Extend Cache, Extend Cache PDFs, Local Storage Cache, Outline CSS, Outline JavaScript
  • Minimize Round Trip Times: Combine CSS, Flatten CSS @imports, Inline CSS, Combine JavaScript, Inline JavaScript, Move CSS Above Scripts, Configuration file directive to shard domains, Sprite Image, Pre-Resolve DNS
  • Minimize Request Overhead: Rewrite Domains, Configuration file directive to map domains
  • Minimize Payload Size: Collapse Whitespace, Combine Heads, Elide Attributes, Minify JavaScript, Optimize Images, Prioritize Critical CSS, Deduplicate Inlined Images, Remove Comments, Remove Quotes, Rewrite CSS, Rewrite Style Attributes, Trim URLs
  • Optimize Browser Rendering: Convert Meta Tags, Defer Javascript, Inline Preview Images, Lazily Load Images, Move CSS to Head, Optimize Images, Convert JPEG to Progressive, Rewrite Style Attributes
  • Other: Add Head, Add Instrumentation, Inline @import to Link, Make Google Analytics Async, Insert Google Analytics Snippet, Pedantic, Run Experiment
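
A minimal Apache sketch (directive and filter names are stock mod_pagespeed; which filters are safe to enable depends on your site and theme):

    # Apache: switch mod_pagespeed on and opt in to a few filters.
    ModPagespeed on
    ModPagespeedEnableFilters lazyload_images,sprite_images
    ModPagespeedEnableFilters inline_images,prioritize_critical_css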

Now for some of these actions there might be a Drupal module (lazy loading), for some one has to write good CSS/HTML/JS (CSS above scripts), some need good content editors or backend processes (deduplicate inlined images, progressive JPEGs) and some are just not yet doable in the frontend in an easy way (data URIs).

So as a frontend dev(ops), do yourself a favour and use the pagespeed module for Apache and nginx, AND keep writing good templates. And as a Drupal community member, make sure that we keep innovating at the top, and set code free at the bottom, where it is better served outside of our hands.

(btw Mike Ryan, more retro future on this Pinterest board :-))

SPDY and web performance

Robert M. White
TL;DR

  1. Performance matters for all websites
  2. Performance is not just (80%) frontend
  3. SPDY kills 80% of your frontend problems

What
In the Drupal community, and in the broader web community, there is a lot of attention to the performance of websites.

While "performance" is a very complex topic on its' own, let us in this posting define it as the speed of the website and the process to optimize the speed of the website (or better broader, the experience of the speed by the user as performance.

Why
This attention to speed has two good reasons. On the one hand, sites are getting bigger and hence slower: the databases grow with more content, and new modules and features are added to the codebase. On the other hand, more money is being made with websites, even if you are not selling goods or running ads.

Given that most sites run on the same hardware for years, this results in slower websites, leading to a lower pagerank, less traffic, fewer pages per visit and lower conversion rates. And in the end, if you have a business case for your website, lower profits. Bottom line: if you make money online, you are losing some of it to a slow website.
UFOs
When it comes to speed there are many parameters to take into account; it is not "just" the average page loading time. First of all, the average is a rather useless metric without taking the standard deviation into account. But apart from that, it comes down to what a "page" is.

A page can be just the HTML file (which can be done in 50 ms).
A page can be the complete webpage with all its elements (for many sites around 10 seconds).
A page can be the complete webpage with all elements, including third-party content. Hint: did you know that displaying the Facebook Like button downloads more JavaScript than the entire jQuery/Backbone/Bootstrap app of this website, non-cacheable?
And a page can be anything "above the fold".



Moon Retro future
And then there are metrics more interesting than these: the time to first byte, for example, from a technological point of view. But not just a technical PoV. There is a website you visit every day that optimizes its renderable HTML to fit within 1500 bytes.
So ranging from "first byte to glass" to "round trip time", there are many elements to take into account when one measures the speed of a website. And that is the main point: web performance is not just for the frontenders as many think, not just for the backenders as some of them hope, but for all the people who control elements in the chain involved in that speed. All the way down to the networking guys (m/f) in the basement (hint sysadmins: INITCWND has a huge performance impact!). Speed should be at the core of your team, not just with those who enable gzip compression, aggregate the JavaScript or make the sprites.

Steve Souders (the web performance guru) once stated as his golden rule that 80-90% of the end-user response time is spent on the frontend.

Speedy to the rescue?
This 80% might be a matter of debate in the case of a logged-in user in a CMS. But even if it is true, this 80% can be reduced by 80% with SPDY.
SPDY is an open protocol introduced by Google to overcome the problems with HTTP (up to 1.1, including pipelining, defined in 1999!) and the absence of HTTP/2.0. It speeds up HTTP by using one connection between the client and the server for all the elements in the page served by that server. Originally only built into Chrome, many browsers now support this protocol, which will be the base of HTTP/2.0. Think about it and read about it: a complete webpage with all its elements, regardless of minifying and sprites, served in one stream, with only one TCP handshake and one DNS request. Most of the rules of traditional webperf optimisation (CSS aggregation, preloading, prefetching, offloading elements to a different host, cookie-free domains), all this wisdom is gone, even false, with one simple install. 80% of the 80% gone with SPDY; now one can focus on the hard part: the database, the codebase. :-)

The downside of SPDY, however, is that it is hard to troubleshoot and not yet available in all browsers. It is hard to troubleshoot since most implementations use SSL, and the protocol is multiplexed and zipped by default, not made to be read by humans the way HTTP/1.0 is. There are some tools that make it possible to test SPDY, but most if not all of the tools you use every day, like ab, curl and wget, will fail to use SPDY and fall back, as defined in the protocol, to HTTP/1.0.

Measure
So, can we test whether SPDY is really faster, and how much faster?
Yes: see Evaluating the Performance of SPDY-Enabled Web Servers (a Drupal site :-)).
SPDY performance

So: more users, fewer errors under load and a lower page load time. What is there not to like about SPDY?

Drupal
That is why I would love drupal.org to run with SPDY; see this issue on d.o/2046731. I really do hope that the infra team will find some time to test this and, once accepted, install it on the production server.


Performance as a Service
One of the projects I have been active in lately is ProjectPAAS, bonus points if you find the easter egg on the site :-). ProjectPAAS is a startup that will test a Drupal site, measure it on 100+ metrics, analyse the data and give the developer an opinionated report on what to change to get better performance. If you like the images around the retro-future theme, be sure to check out the Flickr page, like us on Facebook, follow us on Twitter, but most of all, see the moodboard on Pinterest.

Pinterest itself is doing some good work when it comes to performance as well. Not just speed, but also the perception of speed.

Pinterest lazyloading with color
Pinterest does lazyload images, but it also displays the image's dominant color as the background of a cell before the image is loaded, giving the user a sense of what is to come. For background on this, see webdistortion.


Congratulations, you just saved 0.4 seconds
If you are lazyloading images to give your user faster results, be sure to check out this module we made: lazypaas, currently a sandbox project awaiting approval. It extracts the dominant (most used) color of an image and fills the box where the image will be placed with this color. And if you use it and did a code review, be sure to help it become a full Drupal project.
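
A minimal sketch of the idea in plain PHP with GD (not the actual lazypaas code; assumes JPEG input):

    <?php
    // Approximate the dominant color by resampling the image to a single
    // pixel and reading that pixel back.
    function dominant_color($file) {
      $src = imagecreatefromjpeg($file);
      $dst = imagecreatetruecolor(1, 1);
      imagecopyresampled($dst, $src, 0, 0, 0, 0, 1, 1, imagesx($src), imagesy($src));
      return sprintf('#%06x', imagecolorat($dst, 0, 0));
    }

    // Usage: paint the placeholder cell with this color while the real
    // image is still lazyloading, e.g. style="background: #4a7b9d".
    echo dominant_color('cat.jpg');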


From 80% to 100%
Lazyloading like this leads to a better user experience. Because even when 80% of the end-user response time is spent on the frontend, 100% of the time is spent in the client, most often the browser. That is the only place where performance should be measured and the only place where it matters. Hence, all elements that deliver this speed should be optimized, including the webserver and the browser.

Now say this fast after me: SPDY FTW. :-)

ProjectPAAS

The Outer Limits ... 'Cold Hands, Warm Heart'
A couple of weeks ago we launched the website of a service we have been working hard on for over half a year. The project started as a SaaS about performance, and hence the internal project name was "ProjectPAAS". As it goes with internal project names, it became the name of the service itself.

12 seconds start now

I still have trouble explaining what the service does in an elevator pitch. But basically: you install a module from d.o (with the funky URL /project/paas) on a staging site you want tested, configure the service on the portal at projectpaas.com and then wait an hour or two. We start a service that measures your site from the outside and from the inside, analyse the data and compile a report, and when you check your mail you get an in-depth report on all the elements of the chain that are relevant to the performance of the website.

1964 ... orbital assembly

We measure from one or more selectable (EC2) locations in the world, with over 150 metrics, and we only report on real data, no YSlow wisdom. We know what influences speed, we see how it is configured on your site (with the module or from the outside) and we simulate to find what the optimal value would be for your use case.

Take, for example, the cliché that one needs parallel downloads (images[1-4].example.com) to bypass the maximum number of connections a browser can open to a host: it is just that, a cliché. When one takes DNS lookup, TCP slow start and the sliding window into account, for certain use cases having images[x].example.com might actually be slower. So we are opinionated: we measure, we analyse, we report, you gain speed.

Easter egg

ProjectPAAS report 0.6
I really like retro future, so we used it as a theme around the site and on Facebook. And since Easter (Dutch: "Pasen") is coming up,
do check the projectpaas.com website, find the easter egg and tweet about it. :-)

This posting isn't so much about the service of ProjectPAAS as about why we made the service: to share our experience and to get feedback from you. There are two reasons we made it, one internally driven and one externally.

The internal reason is that we have been building some of the most visited Drupal sites and web apps in the Netherlands. So after some time we got good at performance; we understood what to do and what not to do across the complete stack of elements that define speed: HTML, CSS, Linux, Apache, MySQL and yes, Drupal. Word got out that we were good, and site owners whose sites had been built by other companies came to us for advice on how to get more speed out of their site.
Once we had done a dozen of these reports, we wanted to make them more easily accessible for site owners and website builders. This is part of why we started the performance reporting service.

Land here

The external reason might be more interesting for you. We made the SaaS because we think the CMS landscape will change, and our business with it.

The landscape will change. Ten years ago everybody had his or her own CMS; there seemed to be more CMSes than websites. Five years ago it was clear who the winners of the consolidation would be: 80% of the proprietary "solutions" were gone, and open source was no longer a dirty word in enterprises. Within the open CMSes a global top 5 was visible, though especially in Europe there were still many local open source CMSes. This consolidation per se was good for open source and especially for Drupal shops.

1962 ... 'Planet Of Storms' (USSR)
However, the market won't stop here. Most Drupal websites are not complex: they have no connections to backend systems, fewer than 10k pageviews per day, and they are relatively expensive to build and, above all, expensive to maintain. Here is the business case for open source SaaS, the kind of solutions that Acquia and Wordpress.com offer on top of open source software. These solutions, with standard modules and a customisable template, are right now good enough for 20% of the Drupal sites out there, and they cost a fraction of what building a site "by hand" costs.

The use of these open source SaaS hosting solutions will only grow. Good for the parties offering these services, bad for the Drupal shops that have been building relatively simple portfolio sites. This trend by itself might have a big impact on those coding Drupal core or modules, or working in, for example, the security team. This is not meant in a bad way, but with most of the sites moving towards a smaller group of SaaS companies, the number of "independent" individuals contributing to core or writing modules might actually drop; they might have another itch. It will be very interesting to see how this develops; I might be completely wrong here.

Performance takes time

Traditionally most Drupal shops do projects, maintenance and consulting. Some have found a nice niche: a place geographically apart, a specific vertical, or a certain service like migration from another CMS. However, most Drupal shops build relatively simple websites for the SOHO-plus segment. I know there are many shops that work for high-end enterprises, but not all of the 280,000 Drupal sites fit in the Alexa top 100,000. So I do think that if you are a Drupal shop, you will have to find your sweet spot in the next couple of months. On the one hand there is operational excellence (a SaaS to host sites, like Gardens, or a service like ProjectPAAS itself); on the other hand customer intimacy (the complex sites with lots of integration with backend systems and complex workflows). There might be space between these two, but the portfolio-site area will get very crowded, and Drupal will not be the best tool to serve it, in my opinion. This is part of the reason why we built our first SaaS around a product we understand, close to our core business. We are already planning our next services, which might still be built in Drupal but will target a broader audience.

ProjectPAAS logo
For the moment: if you are interested in our product, don't be shy and talk to us on Twitter or Facebook. Potential resellers or users are welcome to fill out our form. We really do hope that our product can help you build faster websites and thereby push Drupal even further ahead of the curve.

Pong access.log with logstalgia

Let your web server's access log be the source of a game of Pong :-)

If you are a Homebrew user on OS X, it is a one-liner to install :-)
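
A sketch (assuming Homebrew and an Apache-style access log path):

    # Install logstalgia and pipe a live access log into it.
    brew install logstalgia
    tail -f /var/log/apache2/access.log | logstalgia -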

see logstalgia
