TCP/IP

SPDY and web performance

Robert M. White
TL;DR

  1. Performance matters for all websites
  2. Performance is not just (80%) frontend
  3. SPDY kills 80% of your frontend problems

What
In the Drupal and broader web community, a lot of attention is being paid to the performance of websites.

While "performance" is a very complex topic on its' own, let us in this posting define it as the speed of the website and the process to optimize the speed of the website (or better broader, the experience of the speed by the user as performance.

Why
This attention towards speed exists for two good reasons. On the one hand, sites are getting bigger and hence slower: the databases grow with more content, and the codebase keeps being extended with new modules and features. On the other hand, more and more money is being made with websites, even if you are not selling goods or running ads.

Given that most sites run on the same hardware for years, this results in slower websites, leading to a lower PageRank, less traffic, fewer pages per visit and lower conversion rates. And in the end, if you have a business case for your website, lower profits. Bottom line: if you make money online, you are losing some of it due to a slow website.
UFO's
When it comes to speed there are many parameters to take into account; it is not "just" the average page loading time. First of all, the average is a rather useless metric without taking the standard deviation into account. But apart from that, it comes down to what a "page" is.

A page can be just the HTML file (can be done in 50ms)
A page can be the complete webpage with all the elements (for many sites around 10 seconds)
A page can be the complete webpage with all elements including third party content. Hint: did you know that for displaying the Facebook Like button, more Javascript is downloaded than the entire jQuery/backbone/bootstrap app of this website, non-cacheable!
And a page can be anything "above the fold"



Moon Retro future
And then there are metrics more interesting than these, for example the time to first byte from a technological point of view. And not just from a technical PoV: there is a website one visits every day that optimizes its renderable HTML to fit within 1500 bytes.
So ranging from "first byte to glass" to "round trip time", there are many elements to take into account when one measures the speed of a website. And that is the main point: web performance is not just for the frontenders like many think, and not just for the backenders like some of them hope, but for all the people who control elements in the chain involved in the speed, all the way down to the networking guys (m/f) in the basement (hint, sysadmins: INITCWND has a huge performance impact, see the sketch below!). Speed should be at the core of your whole team, not just of those who enable gzip compression, aggregate the Javascript or make the sprites.
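For the sysadmins: on Linux the initial congestion window can be raised per route with iproute2. A minimal sketch, assuming a default gateway of 192.168.1.1 on eth0 (check your own with "ip route show") and a kernel recent enough to accept the option:

    # show the current default route first
    ip route show
    # raise the initial congestion window to 10 segments, the value Google advocates
    ip route change default via 192.168.1.1 dev eth0 initcwnd 10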

Steve Souders (the webperformance guru) once stated in his golden rule that 80-90% of the end-user response time is spent on the frontend.

Speedy to the rescue?
This 80% might be a matter of debate in the case of a logged-in user in a CMS. But even if it is true, this 80% can be reduced by 80% with SPDY.
SPDY is an open protocol introduced by Google to overcome the problems with HTTP (up to 1.1 including pipelining, defined in 1999!) and the absence of HTTP/2.0. It speeds up HTTP by using one multiplexed connection between the client and the server for all the elements in the page served by that server. Originally only built into Chrome, many browsers now support this protocol, which will be the basis of HTTP/2.0. Think about it and read about it: a complete webpage with all its elements -regardless of minifying and sprites- served in one stream, with only one TCP handshake and one DNS request. Most of the rules of traditional webperf optimization (CSS aggregation, preloading, prefetching, offloading elements to different hosts, cookie-free domains), all this wisdom is gone, even false, with one simple install. 80% of the 80% gone with SPDY; now one can focus on the hard part: the database and the codebase. :-)

The downside of SPDY, however, is that it is hard to troubleshoot and not yet available in all browsers. It is hard to troubleshoot since most implementations use SSL, and the protocol is multiplexed and compressed by default, not made to be read by humans unlike HTTP/1.x. There are some tools that make it possible to test SPDY, but most if not all of the tools you use every day, like ab, curl and wget, will fail to use SPDY and fall back, as the protocol defines, to plain HTTP.
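You can at least ask a server which protocol it is willing to speak during the TLS handshake. A minimal Python sketch of that check, assuming Python 3.5+ (SPDY was historically negotiated via NPN; here we use its successor ALPN, so a modern server will typically answer "h2" where it once said "spdy/3.1"):

    import socket
    import ssl

    HOST = "www.google.com"  # any SPDY/HTTP2-capable host will do

    ctx = ssl.create_default_context()
    # Offer several protocols and let the server pick one.
    ctx.set_alpn_protocols(["spdy/3.1", "h2", "http/1.1"])

    with socket.create_connection((HOST, 443)) as sock:
        with ctx.wrap_socket(sock, server_hostname=HOST) as tls:
            print("server selected:", tls.selected_alpn_protocol())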

Measure
So can we test whether SPDY is really faster, and how much faster?
Yes, see Evaluating the Performance of SPDY-Enabled Web Servers (a Drupal site :-))
SPDY performance

So more users, less errors under load and a lower page load time. What is there not to like about SPDY?

Drupal
That is why I would love Drupal.org to run with SPDY, see this issue on d.o/2046731. I really do hope that the infra team will find some time to test this and, once it is accepted, install it on the production server.


Performance as a Service
One of the projects I have been active in lately is ProjectPAAS; bonus points if you find the easter egg on the site :-). ProjectPAAS is a startup that will test a Drupal site, measure 100+ metrics, analyse the data and give the developer an opinionated report on what to change to get better performance. If you like the images around the retro-future theme, be sure to check out the flickr page, like us on facebook, follow us on twitter, but most of all, see the moodboard on pinterest.

Pinterest itself is doing some good work when it comes to performance as well. Not just speed but also the perception of speed.

Pinterest lazyloading with color
Pinterest does lazyload images, but it also displays the prominent color as the background of a cell before the image is loaded, giving the user a sense of what is to come. For background on this, see webdistortion.


Congratulations, you just saved 0.4 seconds
If you are lazyloading images to give your users faster results, be sure to check out the module we made: lazypaas, currently a sandbox project awaiting approval. It extracts the dominant (most used) color of an image and fills the box where the image will be placed with that color. And if you use it and did a code review, be sure to help it become a real Drupal module.
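The trick itself is small enough to sketch. A rough Python version of the dominant-colour extraction (assuming the Pillow imaging library; lazypaas itself is a Drupal module, so this is only an illustration of the idea, not its actual code):

    from PIL import Image

    def dominant_color(path):
        # Shrink first: much cheaper and close enough for a placeholder.
        img = Image.open(path).convert("RGB").resize((64, 64))
        counts = img.getcolors(64 * 64)     # list of (count, (r, g, b))
        _, (r, g, b) = max(counts)          # the most used colour wins
        return "#{:02x}{:02x}{:02x}".format(r, g, b)

    # Use the result as the CSS background-color of the image's placeholder box:
    print(dominant_color("photo.jpg"))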


From 80% to 100%
Lazyloading like this leads to a better user experience. Because even when 80% of the end-user response time is spent on the frontend, 100% of the time is spent in the client, most often the browser. That is the only place where performance should be measured and the only place where performance matters. Hence, all elements that deliver this speed should be optimized, including the webserver and the browser.

Now say this fast after me: SPDY FTW. :-)

ProjectPAAS

The Outer Limits ... 'Cold Hands, Warm Heart'
A couple of weeks ago we launched the website of a service we have been working hard on for over half a year. The project started as a SAAS about performance, and hence the internal project name was "ProjectPAAS". As it goes with internal project names, it became the name of the service itself.

12 seconds start now

I still have problems explaining what the service does in an elevator pitch. But basically: one installs a module (from d.o, with the funky URL /project/paas) on a staging site to be tested, configures the service on the portal of projectpaas.com and then waits an hour or two. We start a service that measures your site from the outside and from the inside, analyses the data and makes a report; when you check your mail you get an in-depth report on all the elements of the chain that are relevant to the performance of the website.

1964 ... orbital assembly

We measure from one or more selectable (EC2) locations in the world with over 150 metrics, and we only report on real data, no yslow wisdom. We know what influences speed, we see how it is configured at your site (with the module or from the outside) and we simulate to find what the optimal value would be for your use case.

The cliché, for example, that one needs parallel downloads (images[1-4].example.com) to bypass the maximum number of connections a browser can have to a host, is just that: a cliché. When one takes DNS lookup, TCP slow start and the sliding window into account, for certain use cases having images[x].example.com might actually be slower. So we are opinionated: we measure, we analyse, we report, you gain speed.
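A back-of-the-envelope sketch of why the extra hostname is not free; the numbers below are illustrative assumptions, not measurements:

    # What opening a connection to images2.example.com costs before
    # the first image byte arrives.
    rtt = 0.050        # round-trip time to the server, seconds (assumed)
    dns = 0.030        # extra DNS lookup for the new hostname (assumed)
    tcp = rtt          # the TCP three-way handshake costs ~1 RTT
    tls = 2 * rtt      # plus ~2 RTTs more if the sharded host uses SSL

    print("extra host: ~%.0f ms before the first byte" % ((dns + tcp + tls) * 1000))
    # -> ~180 ms, and the new connection still starts in TCP slow start

Whether the parallelism wins that time back depends on the page; that is exactly why we measure instead of applying the cliché.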

Easter egg

ProjectPAAS report 0.6
I really like retro future, so we used it as a theme around the site and facebook. But since Easter (Dutch: "Pasen") is coming up, do check the projectpaas.com website, find the easter egg and tweet about it. :-)

This posting isn't so much about the service of ProjectPAAS as it is about why we made the service: to share our experience and to get feedback from you. There are two reasons we made it, one internal and one external.

The internal reason is that we have been building some of the most visited Drupal sites and webapps in the Netherlands. So after some time we got good at performance; we understood what to do and what not to do across the complete stack of elements that define speed: HTML, CSS, Linux, Apache, MySQL and yes, Drupal. Word got out that we were good, and site owners whose sites had been built by other companies came to us for advice on how to get more speed out of their site.
Once we had done a dozen of these reports, we wanted to make them more easily accessible for site owners and website builders. This is part of why we started the performance reporting service.

Land here

The external reason might be more interesting for you. We made the SAAS because we think that the CMS landscape will change and our business will change.

The landscape will change. Ten years ago everybody had his/her own CMS; there were more CMSes than websites, it seemed. Five years ago it was clear who the winners of the consolidation were going to be: 80% of the proprietary "solutions" were gone and open source was no longer a dirty word in enterprises. Within the open CMSes, the global top 5 was visible, though especially in Europe there were still many local open source CMSes. This consolidation per se was good for open source and especially for Drupal shops.

1962 ... 'Planet Of Storms' (USSR)
However, the market won't stop here. Most Drupal websites are not complex: they don't have any connections to backend systems, get less than 10k pageviews per day, and are relatively expensive to build and, most of all, expensive to maintain. Here is the business case for open source SAAS: solutions based on open source software, like Acquia and WordPress.com offer. These solutions, with standard modules and a customisable template, are good enough right now for 20% of the Drupal sites out there and will cost a fraction of what building them "by hand" costs.

The number of users of these open source SAAS hosting solutions will only grow. Good for the parties offering these services, bad for the Drupal shops that have been building relatively simple portfolio sites. By itself, this trend might have a big impact on those coding Drupal core or modules, or working in, for example, the security team. This is not meant in a bad way, but with most of the sites moving towards a smaller group of SAAS companies, the number of "independent" individuals contributing to core or writing modules might actually get lower; they might have another itch to scratch. It will be very interesting to see how this develops; I might be completely wrong here.

Performance takes time

Traditionally most Drupal shops do projects, maintenance and consulting. Some have found a nice niche: a place geographically apart, a specific vertical or a certain service like migration from another CMS. However, most Drupal shops build relatively simple websites for the SOHO-plus segment. I know there are many shops that work for high-end enterprises, but not all of the 280,000 Drupal sites fit in the Alexa top 100,000. So I do think that if you are a Drupal shop, you will have to find your sweet spot in the next couple of months. On the one hand there is operational excellence (a SAAS to host sites, like Gardens, or a service like ProjectPAAS itself); on the other hand customer intimacy (the complex sites with lots of integration with backend systems and complex workflows). There might be space between these two, but the portfolio-site area will get very crowded, and Drupal will not be the best tool to serve it, in my opinion. This is part of the reason why we built our first SAAS around a product we understand, close to our core business. We are already planning next services that might still be built in Drupal but will target a broader audience.

ProjectPAAS logo
For the moment, if you are interested in our product, don't be shy: talk to us on twitter or facebook. Potential resellers or users are welcome to fill out our form. We really do hope that our product can help you build faster websites and thereby push Drupal even further ahead of the curve.

Pong access.log with logstalgia

Let your webserver's access log be the source of a game of pong :-)

If you are a Homebrew OS X user, it is a one-liner to install :-)
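Presumably something like this, assuming the Homebrew formula is still simply called logstalgia (the --sync flag makes it read a live log from stdin):

    brew install logstalgia
    tail -f /var/log/apache2/access.log | logstalgia --sync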

see logstalgia

Stealing network connectivity via powerlines?

Using the neighbour's network

This IS weird. I have my own protected WiFi network, and my Macs are connected to it. I have a DHCP server giving 10.0.1.0/24 to WiFi hosts. I have a DSL line towards XS4ALL.

Today I installed an out-of-the-box Ethernet-over-power set between a fixed-IP Mac mini downstairs and the WiFi/DSL router upstairs. When I could not mount a disk from my WiFi network on the fixed Mac mini, I started digging... and found the picture above: I AM connected to MY WiFi network, however, the DSL router somehow has a default gateway towards a network of a neighbour that seems to be connected to Versatel!?!?

Yes, my fixed Macs still use my own connection. But my WiFi Macs, while using my own network, go via the mesh network of the powerlines and the DHCP server of the neighbour to the internet, over my neighbour's line.

If you thought that stealing bandwidth via WEP WiFi was cool in the late 90s, this Ethernet-over-power cracking will be even bigger.

But.. but.. surely the protocol running Ethernet over power is encrypted? Yes. But with a default key! Both my neighbour and I bought the same box at the local shop, with the same default key installed, making our Ethernet-over-power devices/lines ONE network. Once I reset the key manually on both my devices, I couldn't see my neighbour anymore and there were two networks again.

So
1) never trust anything
2) always change the defaults
3) you will hear about break-ins like this in the near future, for example sniffing all the traffic via the office next to a political party...

I will put proper "encryption" on the power network to prevent this... I hope

Cookies, privacy, politics and driving forward by looking in the rear-view mirror


((c) Arty Smokes )
It is always amusing to see that politics is not about looking ahead at all, but about looking back. A bit like driving a car by looking in the rear-view mirror. That works fine as long as conditions do not change; until a bend shows up. In today's world, however, everything changes continuously, all the time. Bends, slopes, forks in the road: they are the order of the day. So when politics starts working the steering wheel and the engine while looking in the rear-view mirror, you know we are going to hit the guardrail.


( (c) bass_nroll)
When the whole of the Netherlands started, five years ago, to get annoyed about the absurd cost of mobile data abroad, politics occupied itself with the high cost of SMS abroad. I would guess politics also occupied itself with the high price of oats for the horses pulling international coaches, when trains had already been shuttling between countries for decades. Indeed, governing is looking ahead through a rear-view mirror.

Likewise, it is amusing to see the Netherlands and the EU suddenly getting worked up about "cookies": small text files (in the camping browser) or lines in a text file (in real browsers). Text files do not give you a venereal disease. And around 1996 I already knew how to deal with them: block what you don't want and use what is convenient. Without cookies it is hard to log in to sites and thus to use services, because HTTP simply is stateless. A user does not want to go without them. But a user usually does not know the downsides: a cookie is used to identify a user and can thus be used to... identify someone. By definition a cookie can only be read on the domain that set it; sbs6.nl cannot read my cookie from nos.nl. No problem there. The problem is that there are domains that appear on practically every page nowadays, Google's ads for example, combined with Google Analytics. Google can thus quite easily follow a user across 70% of all internet sites, as the exchange below illustrates. A problem? Perhaps. Really? No.
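The mechanics fit in a single request (hypothetical hosts and cookie value): a page on nos.nl embeds an ad from ads.example.com, so the browser sends the ad network its own cookie plus the page it was embedded on:

    GET /banner.js HTTP/1.1
    Host: ads.example.com
    Referer: http://nos.nl/some-article
    Cookie: uid=12345

The uid cookie was set earlier by ads.example.com and is readable only by ads.example.com, but the same cookie travels along from every site that embeds the ad, and the Referer header names the embedding page. That is all a network needs to stitch the visits together.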

Sure, Google (but also others, like Facebook asking me whether I want to be the first of my friends to "like" something on a site) knows an awful lot about me. And even though Google adjusted its policy in 2007, it -and dozens of other big companies and hundreds of ad agencies- knows a great deal about me. Bad? Well, I'd rather get good ads than bad ones. Behavioural or not.

Although I do think it is absurd for enterprises, and especially governments, to hang a camera above your website to let Google measure who comes in and which coat he is wearing. Certainly when there are very good alternatives that let you do the raw data analysis yourself, in real time and "for free", using an open source solution such as piwik.org/. And that solves one of the biggest problems: as websites become web applications you only see the page load, not the interaction. On this, see my old posting on when webpages become webapplications and the influence on statistics. So analysing your own raw data is the best solution.

Of course all the publishers act as if the sky full of cookies is about to fall on our heads. But I am no Gaul, Ich bin ein Groninger. So I am afraid of neither the cookies nor the sky. Bring it on.


((c) nettsu)

So this cookie business is a rearguard action. There really are very many other ways to track a user uniquely. The source IP address, of course, although that is not very unique when many machines sit behind one "NAT" address. But the browser itself is also very often unique. From the headers it sends you can see which version it is, which plugins are installed and which fonts I have. Those things together are far more unique than people think and can be used for tracking too. Test your own browser on http://panopticlick.eff.org/. In my case my browser was unique among the 1.6 million browsers tested so far.

Your browser fingerprint appears to be unique among the 1,636,036 tested so far.
Currently, we estimate that your browser has a fingerprint that conveys at least 20.64 bits of identifying information.
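Those 20.64 bits are no magic number: a fingerprint that is unique among N tested browsers conveys log2(N) bits of identifying information, which you can verify in two lines of Python:

    import math
    print(math.log2(1636036))   # -> 20.64..., exactly the EFF's figure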

Read the EFF's information at Every Browser Unique and the PDF.

The combination of IP address and browser really does make it possible for Google or any other company to track my surfing behaviour across the net uniquely, without cookies. Of course one would rather track a user than a device, but if I occasionally use Google services that require a login (Google Apps) from my IP address and browser, I am one hundred percent identifiable.

To make clear just how much this is driving forward by looking in the rear-view mirror: the whole cookie discussion will be over once we have IPv6. IPv6 has no NAT; IPv6 makes your device unique anywhere in the world. By definition: your fridge, your TV, the kids' PC. They all get a unique address, and that is not a future vision or the vague talk of a suit. In my household some 20 IP addresses are in use at any moment of the day: iPhones, iPads, MacBooks, Mac mini, iMac, cameras, a Wii and even my TV have an IP address. Still NATed for now, but soon genuinely traceable one to one (see the sketch after the quote below); read IPv6 and the future of privacy

What does it mean to shift from the present addressing system (IPv4) to the ‘new’ system (IPv6)? To begin, it means that there is a lot more of IP real-estate; whereas IPv4 offers roughly 4.3 billion addresses, IPv6 provides 340 trillion trillion trillion (!) unique addresses. One can quickly appreciate the numerical difference. More significantly, it means that the system of LANs that we have today will no longer be required because of IP address scarcity. Each of the Internet-enabled devices in my home could have its own IPv6 address – there is no real need to route all the data through a single IP address that is provided by my ISP.
In a situation where all Internet enabled devices have a constant address, the regular refrain “we don’t know who’s IP address we’re monitoring; it is possible that a set of users are sharing the same address!” is quickly disabused. With a persistent IP address, depending on the degree of algorithmic surveillance, it is possible to develop very, very good understandings of who is presumably the agent ‘using’ the IP address. Similar to how marketers can figure out who you are with very little information, advertising companies such as Doubleclick are in a comparable situation to develop very detailed, very personal, accounts of the individuals that regularly use Internet enabled devices. In a situation where all devices have unique IP addresses, this could facilitate more accurate advertising (read: better targeted and more invasive), and that government agencies and ISPs alike could more accurately identify and track particular users online.
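How unique is "unique"? With classic stateless autoconfiguration (before privacy extensions became common practice), the lower 64 bits of an IPv6 address are derived directly from the device's MAC address, so the hardware itself is recognisable on every network it visits. A small Python sketch of that EUI-64 derivation (the MAC address is made up):

    def eui64(mac: str) -> str:
        # Classic SLAAC interface identifier: MAC -> modified EUI-64.
        b = bytearray(int(x, 16) for x in mac.split(":"))
        b[0] ^= 0x02                 # flip the universal/local bit
        b[3:3] = b"\xff\xfe"         # wedge ff:fe between the two MAC halves
        groups = ("%x" % (b[i] << 8 | b[i + 1]) for i in range(0, 8, 2))
        return "fe80::" + ":".join(groups)

    print(eui64("00:1c:42:2e:60:4a"))   # -> fe80::21c:42ff:fe2e:604a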

Wonderful, isn't it: politics occupies itself with the problems of 15 years ago and implements solutions that will be complete nonsense in 5 years. "Telling the future by looking at the past assumes that conditions remain constant. This is like driving a car by looking in the rear view mirror." (Herb Brody) Thank you, The Hague. Thank you.
