
Say you want to find out if a site is using Drupal. You could dive into the HTTP headers, as Lullabot described some time ago, and see whether the Expires header carries the birthday of Dries:
Sun, 19 Nov 1978 05:00:00 GMT
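Checking this by hand gets old quickly, so here is a rough sketch of doing it from the command line. The function name is made up for this post, example.com is a placeholder, and it assumes curl is installed and that no cache or proxy in between has rewritten the header.

```shell
# Hypothetical helper: reads HTTP response headers on stdin and succeeds
# when the Expires header carries Drupal's trademark date.
has_drupal_expires() {
  grep -qi '^expires: *Sun, 19 Nov 1978 05:00:00 GMT'
}

# Usage (needs network access; example.com is a placeholder):
#   curl -sI http://example.com/ | has_drupal_expires && echo "probably Drupal"
```

A site can of course override this header, so a miss does not prove the site is not Drupal.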
A much easier and more generic way is to install the "BuiltWith Technology Profiler" extension in Chrome/Chromium. This add-on detects not only Drupal sites but also other CMSes such as WordPress, Joomla and dozens of others, and it scans for things like Google Analytics code on the page. A must-have for the curious browser. If you find a nice site, you might tag it in delicious with "yads" (yet another drupal site) and/or "drupalsite"; take a look at some of my findings at http://delicious.com/bertboerland/yads.
But what if you want to know which version a specific Drupal site is running? Well, you could look for the CHANGELOG.txt file in the root, but that file is often deleted, for good or for bad reasons. Personally I think it is good practice to give as little information as possible to the outside world, for example by not echoing the version of the webserver you are running. In Apache this takes two lines:
ServerTokens ProductOnly
ServerSignature Off
and this was done on drupal.org as well.
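To see from the outside whether these two lines did their job, you can look at the Server response header. A minimal sketch with a made-up function name; it simply treats any digit in the Server header as a leaked version number, which is a simplification.

```shell
# Hypothetical helper: reads HTTP response headers on stdin and succeeds
# when the Server header appears to include a version number.
server_leaks_version() {
  grep -i '^server:' | grep -q '[0-9]'
}

# Usage (needs network access; example.com is a placeholder):
#   curl -sI http://example.com/ | server_leaks_version && echo "version exposed"
```

With ServerTokens ProductOnly the header is reduced to "Server: Apache", so the check fails, which is what you want.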
There has been some debate about whether Drupal should hide its text files, like CHANGELOG.txt, as well. Some other CMSes do this, or use a die() call to protect them from prying eyes. In the end the consensus was that removing these text files will not make your site safer; good procedures and timely updates of core and contributed modules will!
So fingerprinting most Drupal sites is easy: one just looks at the CHANGELOG file and knows what version the site is running. However, if you don't trust the changelog file or it has been removed, it is still rather easy to fingerprint a Drupal site.
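For the changelog route, something like the following would do. It is only a sketch: the function name is invented here, and it assumes the file lists the newest release first under a heading such as "Drupal 7.8, 2011-08-31".

```shell
# Hypothetical helper: reads CHANGELOG.txt on stdin and prints the version
# from the first "Drupal x.y" release heading it finds.
changelog_version() {
  grep -m 1 -oE '^Drupal [0-9]+\.[0-9]+' | cut -d ' ' -f 2
}

# Usage (needs network access; example.com is a placeholder):
#   curl -s http://example.com/CHANGELOG.txt | changelog_version
```

Keep in mind that the file only tells you what the site owner left in the docroot; it can be stale or edited.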
Fingerprinting can, for example, be done in the following way:
- Download a couple of Drupal core releases and unpack them (unzip / tar -x).
- Go through all directories to see which files changed between releases. This can be done with something like:
diff -r -q drupal-7.7 drupal-7.8 | grep -iv info >> drupaldiffall
- Fingerprinting works best on JS or CSS files, so grep those from drupaldiffall and put them in drupaldiffjscss.
- Now find the files that have changed most often.
grep -i "files" drupaldiffjscss | cut -d " " -f 2 | cut -d "/" -f 2- | sort | uniq -c | sort -n | tail -10
12 misc/autocomplete.js
12 misc/collapse.js
12 misc/drupal.js
12 misc/farbtastic/farbtastic.js
12 misc/jquery.js
12 misc/progress.js
12 misc/tabledrag.js
12 misc/tableselect.js
12 misc/textarea.js
12 modules/color/color.js
So out of these, let's pick the color.js file, which changed 12 times. Note that in Drupal 7 the CSS and JS files mostly don't change at all, whereas in the later 6 releases these files changed more and more often. Hence the tail -10 outcome will differ depending on which Drupal core releases you downloaded (and yes, I suck at regular expressions).
- The next step is to make the color.js file uniquely identifiable across all versions. This is where our old friend MD5 comes in handy. The syntax differs between BSD-based systems (md5) and GNU/Linux (md5sum), but it will be something like:
find ./ -name color.js | xargs md5 > rainbow
And the rainbow file itself will be
cat rainbow
MD5 (.//drupal-5.22/modules/color/color.js) = 61098c218594ab871b48cd43459dc2ed
MD5 (.//drupal-5.23/modules/color/color.js) = 61098c218594ab871b48cd43459dc2ed
(etc)
- Now all we have to do is find the color.js file in a site we want to fingerprint and match it against this rainbow file:
grep `curl http://drupal.org/modules/color/color.js | md5` rainbow
MD5 (.//drupal-6.22/modules/color/color.js) = f5ea11f857385f2b62fa7bef894c0a55
So according to this, drupal.org is running the latest stable 6 version. Doing the same for the Belgian/Dutch site gives you less useful information:
grep `curl http://drupal.nl/modules/color/color.js | md5` rainbow | wc -l
7
So all we know now (had we not piped the outcome through wc) is that it is one of the latest 7 versions of Drupal 7. So you have to start digging deeper:
more drupaldiffjscss | grep "drupal-7" | grep "Files " | cut -d " " -f 2,3,4,5,6,7,8,9,10 | sort | uniq -c
(or visit http://drupal.nl/CHANGELOG.txt :-)
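The steps above can be tied together in a few shell functions. This is a minimal sketch, not a finished tool: all function names are made up for this post, it assumes the releases are unpacked side by side as drupal-X.Y directories in the current directory, and it uses md5sum where available with BSD md5 -q as a fallback.

```shell
#!/bin/sh
# Hypothetical helpers; run them from the directory that holds the
# unpacked drupal-X.Y release directories.

# MD5 hex digest of the given file, or of stdin when no file is given.
md5_hex() {
  if command -v md5sum >/dev/null 2>&1; then
    md5sum "$@" | cut -d ' ' -f 1      # GNU coreutils
  else
    md5 -q "$@"                        # BSD / macOS
  fi
}

# Count how often each JS/CSS file differs between consecutive releases.
# Arguments: release directory names in version order,
# e.g. count_changes drupal-7.7 drupal-7.8
count_changes() {
  prev=$1; shift
  for next in "$@"; do
    LC_ALL=C diff -rq "$prev" "$next" | grep '^Files ' | cut -d ' ' -f 2
    prev=$next
  done | grep -E '\.(js|css)$' | cut -d / -f 2- | sort | uniq -c | sort -n
}

# Print "digest version" for one file (path relative to the Drupal root)
# in every drupal-* directory below the current one.
build_rainbow() {
  for dir in drupal-*/; do
    [ -f "$dir$1" ] || continue
    echo "$(md5_hex "$dir$1") ${dir%/}"
  done
}

# Read a file on stdin and print every version listed in rainbow file $1
# that ships an identical copy.
lookup_version() {
  digest=$(md5_hex)
  grep "^$digest " "$1" | cut -d ' ' -f 2
}
```

Usage would be along the lines of build_rainbow modules/color/color.js > rainbow, followed by curl -s http://example.com/modules/color/color.js | lookup_version rainbow (example.com being a placeholder). When several versions come back, repeat the lookup with another file from the count_changes output until only one version is left.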
So why would one need this information, you might ask, since it is clear that in the wrong hands it will lead to... well, the bad guy knowing what version you are running. And to be honest, if the bad guy goes to this much trouble to find out what version you are running, (s)he was going to find out anyway.
But like all tools, it can be used for the Good. My employer takes over a lot of sites built by others (it comes with Drupal's growing pains, the freedom of the GPL and the fact that the market is getting closer to an adolescent stage). Most of the time we have to give a rough estimate for maintaining and expanding the site, yet the prospect doesn't know what version he is running and doesn't want to ask his current supplier. By doing a quick scan of, among other things, what version the site is running, we know how well it was maintained and what budget would be needed to upgrade to the latest version. You might have a different use case. For the Good.