willy

All your "I'm not agree with you. Sorry" are belong to us

Recently I have seen a couple of weird comments on my site. Not that having weird comments on a weird blog is weird; I have lots of proof that the internet is full of people like me.

But now there are some odd comments on my site:

Subject: No
I'm not agree with you. Sorry. Google google1980@gmail.com
http://www.google.com/index.html

The first time this happened I published it (anonymous comments are queued on my site). But now it has happened again and again. Poor grammar in an "All your base are belong to us" way. And completely off topic.

But google for "I'm not agree with you." and you will find that this comment has been posted on lots of sites. And more than that, all the sites where the comment is listed are Drupal sites!

Since the link in the posting is innocent (google.com) I see no direct spam related reason for mass dumping this comment. Unless...

I think there are people out there who are mass posting this innocent-looking comment, targeting Drupal sites. After some time they will google to see where the comment got published. And then they know where they can push their real spam run.

What you should do:

  1. Check the source IP address of the comment, write it down, check who owns it and file a complaint with the provider. Some people think this is useless, but I worked at a tier 2 ISP and we actually dropped customers that were abusing our IP space (but see below).
  2. Unpublish the comment! Don't let Google index the fact that you published this comment. It is mass "spam", bad grammar and not relevant, so unpublish it.
  3. Maybe even better: don't publish anonymous comments without approval. And be sure to read the Call for a Blogger's Code of Conduct.
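The first two steps can be sketched in a few lines of Python. This is only an illustration, not how Drupal handles its approval queue: the comment records are plain dictionaries with hypothetical `id`, `ip` and `body` keys, and `find_spam` and `source_ips` are made-up helper names. The phrase comes from the comment quoted above.

```python
from collections import Counter

# Phrase taken from the spam comment quoted in the post.
SPAM_PHRASE = "i'm not agree with you"


def find_spam(comments, phrase=SPAM_PHRASE):
    """Return the comments whose body contains the phrase, ignoring case."""
    return [c for c in comments if phrase in c["body"].lower()]


def source_ips(spam_comments):
    """Count how many flagged comments came from each source IP."""
    return Counter(c["ip"] for c in spam_comments)


if __name__ == "__main__":
    # Hypothetical queue of anonymous comments awaiting approval.
    queue = [
        {"id": 1, "ip": "87.230.18.23", "body": "I'm not agree with you. Sorry."},
        {"id": 2, "ip": "192.0.2.10", "body": "Nice write-up, thanks!"},
        {"id": 3, "ip": "87.230.18.23", "body": "i'm not agree with you. sorry"},
    ]
    spam = find_spam(queue)
    print("unpublish:", [c["id"] for c in spam])
    print("report:", source_ips(spam).most_common())
```

The per-IP count gives you the addresses worth writing down for an abuse report; the list of ids is what you would unpublish.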

Regarding the first point, all my spam comments came from 87.230.18.23. When I visited this IP, http://87.230.18.23/, I saw it was a node in the Tor network: "Tor is used to anonymize web browsing and publishing, instant messaging, IRC, SSH, and other applications using the TCP protocol." I still filed an abuse report at abuse-mailbox: net-abuse @ hosteurope.de, but since they host a Tor node I don't think it will make a difference.

I hope people will act upon this posting, even if "you is not agreeing with me, otherwise you is sorry." :-)

Chat with me using Google Talk

Here you can chat with me using Google Talk. My account is bert.boerland at gmail dot com. If you are using a Jabber client, you can connect to me as well. Another option to chat with me is Skype (userid: bert.boerland). I have both IM clients on all my (Windows) machines and am usually logged in on both. No MSN or that kind of IM client for me.


Digg the glasses of Bert

When I was still a student, I once got home so drunk that it seemed like a fun thing to do to hide my glasses from myself, so I could find them "the next day". Needless to say, it was already "the next day".

So when I woke up I could not see my glasses, nor anything at all, and I was getting "pissed" at myself. It took me 3 days to find them, hidden in the freezer...

Backyard Digging

The 2007 coalition agreement in a cloud

Right, there will be a new cabinet. Not quite my dream cabinet, but a colour I stand for is represented. Shall we just call it the "cloud cabinet" ("wolkenkabinet"): close to the dream, far from the deed. That brings me to this posting. I don't understand why mainstream media have not yet automatically turned the text of the coalition agreement into a word cloud.

See here:

aandacht ( 29 ) aanpak ( 19 ) aantal ( 14 ) afspraken ( 19 ) andere ( 24 ) basis ( 19 ) bedrijven ( 15 ) beleid ( 30 ) bestaande ( 15 ) bestuurlijke ( 14 ) betrokken ( 15 ) binnen ( 16 ) burgers ( 31 ) daarbij ( 21 ) dan ( 29 ) dit ( 35 ) duurzame ( 18 ) economie ( 16 ) economische ( 18 ) eigen ( 18 ) elkaar ( 18 ) europese ( 17 ) extra ( 16 ) geen ( 17 ) gemeenten ( 27 ) gericht ( 19 ) gestimuleerd ( 16 ) geven ( 18 ) goed ( 16 ) goede ( 18 ) grote ( 21 ) hebben ( 29 ) heeft ( 18 ) internationale ( 16 ) inzet ( 14 ) jaar ( 30 ) kabinet ( 21 ) kabinetsperiode ( 16 ) kader ( 28 ) kan ( 35 ) kinderen ( 21 ) komen ( 27 ) komende ( 20 ) komt ( 35 ) krijgen ( 16 ) kunnen ( 36 ) kwaliteit ( 29 ) leven ( 22 ) maar ( 23 ) maatregelen ( 16 ) maatschappelijke ( 35 ) maken ( 20 ) mede ( 14 ) meer ( 50 ) mensen ( 55 ) middelen ( 17 ) minder ( 17 ) moet ( 33 ) moeten ( 32 ) mogelijk ( 31 ) mogelijkheden ( 16 ) na ( 19 ) nederland ( 34 ) nederlandse ( 14 ) niet ( 46 ) nieuwe ( 39 ) nodig ( 24 ) nog ( 14 ) onder ( 20 ) onderwijs ( 31 ) ons ( 26 ) ontwikkeling ( 31 ) onze ( 22 ) organisaties ( 19 ) ouders ( 15 ) overheid ( 43 ) partners ( 14 ) plaats ( 19 ) project ( 20 ) publieke ( 28 ) rol ( 14 ) ruimte ( 28 ) samen ( 23 ) samenhang ( 17 ) samenleving ( 43 ) samenwerking ( 18 ) scholen ( 15 ) sociale ( 30 ) staan ( 15 ) ten ( 27 ) toekomst ( 14 ) tussen ( 27 ) uit ( 27 ) veel ( 16 ) veiligheid ( 23 ) verdere ( 17 ) vertrouwen ( 19 ) voortgezet ( 15 ) waar ( 21 ) waarin ( 32 ) we ( 28 ) werken ( 36 ) willen ( 18 ) zich ( 35 ) zo ( 16 ) zorg ( 29 ) zullen ( 54 )

A cloud of how often each word occurs. Fun to look at, and to draw your own conclusions about where the emphasis lies within the coalition agreement. Words that occur more often are larger; words that occur less often are smaller.

Should a journalist want to know how I did this: it was less than 30 minutes of work. wget the coalition agreement, turn the PDF file into a PostScript file (pdf2ps) and then into a text file (ps2ascii). Delete all commas and full stops (tr --delete ',.'), split the text into one word per line and lowercase it (cat regeer.txt | tr [[:blank:]] '\n' | tr [A-Z] [a-z] > clean.txt). Sort by how often each word occurs (cat clean.txt | sort | uniq -c | sort -n) and take the last 50 entries. Clean the noise words (de, het, een, etc.) out of that list by hand, weigh the number of times a word occurs (value / maximum value) and pass that as the weight to the font.
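The counting and weighting steps can also be sketched in Python, for those who prefer that over a shell pipeline. This is a rough equivalent, not the pipeline used for the cloud above. It makes a few assumptions of its own: the noise-word set is a short made-up sample (the post cleans noise words by hand, and only after taking the top 50), the 10 to 40 point font range is an arbitrary choice, and `word_counts` and `font_weights` are hypothetical helper names.

```python
import re
from collections import Counter

# Hypothetical noise-word list; the post removes these by hand.
NOISE_WORDS = {"de", "het", "een", "en", "van", "in", "te", "op", "voor"}


def word_counts(text, top_n=50, noise_words=NOISE_WORDS):
    """Lowercase the text, keep runs of letters only, and count the words."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(w for w in words if w not in noise_words)
    return counts.most_common(top_n)


def font_weights(counts, min_size=10, max_size=40):
    """Weigh each word as count / max count and map it onto a font size."""
    if not counts:
        return {}
    top = max(n for _, n in counts)
    return {
        word: round(min_size + (max_size - min_size) * n / top)
        for word, n in counts
    }


if __name__ == "__main__":
    sample = "De mensen willen meer. Meer mensen, meer zorg voor de mensen."
    for word, size in sorted(font_weights(word_counts(sample)).items()):
        print(f"{word}: {size}pt")
```

The `count / max count` ratio is the same weight described above; scaling it into a minimum and maximum font size just keeps the rarest words readable.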

Perhaps a little lesson for the gentlemen "professional journalists".

Drupal, Spam and the art of Google

The other day I was hit hard on one posting by a spam run. Not uncommon, unfortunately; lots of Viagra pushers and real Rolex shifters post comments on many blog sites. While looking into it, I saw that moments before the spam run a real human was visiting that same node with a weird referrer URL. (S)he was Googling for "powered by drupal" "post new comment".

In the results of that query you will find a posting of mine around position 6. So that's how they find sites for dumping their nonsense comments: just FGI! Not that clever; I would find a Drupal site where I could post a comment the same way, by googling for it.

But Google is smarter than the scum of the earth. You want to see? Go to Google and search for "powered by Drupal" and "post new comment". You will see lots of Drupal sites. Now go to the pager at the bottom and click "next page" or click a random page number. And again. And again. Do it a couple of times and you will hit a page like this:

We're sorry...

... but your query looks similar to automated requests from a computer virus or spyware application. To protect our users, we can't process your request right now.

We'll restore your access as quickly as possible, so try again soon. In the meantime, if you suspect that your computer or network has been infected, you might want to run a virus checker or spyware remover to make sure that your systems are free of viruses and other spurious software.

We apologize for the inconvenience, and hope we'll see you again on Google.

See? If you look for something that can be used, you are served. If you look too much and most likely want to misuse that data, you are blocked. Cool, Google is protecting the blogosphere and Drupal in particular!

Now don't Google too much for these words just to find out how to get blocked, because in the end you will be. I was once at a customer location where a couple of hundred users were behind one NATted IP address. There was a rogue client that did automated searches on Google. The result was that all those hundreds of real users were blocked and needed to fill in a CAPTCHA to continue using "the startpage of the internet".
