
Edit: The Google Gadget has been changed and most of this post is now irrelevant. I'm leaving it up, though, so that people can see what not to do.
I want to thank Rich Schiavi, though, for finally getting in touch with me about this situation and removing the XML file in question.
Many years ago, when I first moved here to England, I had a lot of time on my hands. I was adjusting to a whole new culture, I was job hunting, and I was playing around with web design. As you do.
One of the things I made during that period was something called The Big Lebowski random quote generator. This was before I purchased the hard copy version of the screenplay (or “acquired” that .txt version I now have), and I spent a few good hours furiously typing and rewinding, wearing my video copy out as I got down the best lines from one of my all-time favourite movies.
It ran on a Perl script, because I knew how to install Perl scripts. I never felt the need to update the script, because, hey, it did what it needed to do, and the only changes I made were purely cosmetic: a cleaned-up HTML page for it to load onto, editing the .htaccess so that Perl scripts would load on .html instead of just .shtml, that sort of thing.
The random quote generator was a quiet hit. It showed up on MetaFilter in 2004, and spread its way through various blogs from then on.
And it was nice. A nice bit of Internet fun that didn’t do much, but brightened people’s day.
Everything was quiet until last week, when I received an email from a random person. The Lebowski Quote Generator Google Gadget wasn’t displaying properly — could I have a look and fix it?
I definitely knew I hadn’t made a Google Gadget for it, so after a small amount of research, I discovered The Big Lebowski Google Gadget, created by Rich Schiavi.
It’s a very simple XML file. It holds a little bit of HTML, and some JavaScript fetches my page and copies the quote out of it:
<script language="JavaScript">
function pissonrug()
{
  // should really figure out how big the quote is and adjust size
  // the random number makes each request URL slightly different
  var url = "http://www.dymphna.net/randomquotage/lebowski.html?" + Math.floor(Math.random() * 2001);
  _IG_FetchContent(url, function (responseText)
  {
    // start of the quote HTML
    var idx1 = responseText.indexOf("<div class=\"quote\"");
    // end of the quote HTML: the nav paragraph that follows it
    var idx2 = (idx1 != -1) ? responseText.indexOf("<p class=\"nav\"", idx1) : -1;
    if ((idx1 != -1) && (idx2 != -1))
    {
      // copy everything from the quote div up to the nav paragraph into the gadget
      document.getElementById("quote").innerHTML = responseText.substring(idx1, idx2);
    }
    else
    {
      document.getElementById("quote").innerHTML = "error, dude!";
    }
    // look for "quote" find first <p> after up to first </div>
  });
}
pissonrug();
</script>
(I tried to clean it up a bit so that you could see what it was doing. But JavaScript isn’t my regular thing, so I might have made a bigger mess of it.)
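Stripped of the gadget plumbing, the scraping comes down to finding two markers in the page's HTML and keeping whatever sits between them. Here is a minimal sketch of just that step as a plain function, so it can run without Google's `_IG_FetchContent`. The name `extractQuote` and the sample markup are made up for illustration; the sketch assumes the page contains the same `quote` and `nav` markers the gadget searches for.

```javascript
// Hypothetical helper: the same two-marker slice the gadget performs.
function extractQuote(html) {
  const start = html.indexOf('<div class="quote"');
  if (start === -1) {
    return null; // quote marker missing (e.g. the class was renamed)
  }
  const end = html.indexOf('<p class="nav"', start);
  if (end === -1) {
    return null; // nav marker missing after the quote
  }
  return html.substring(start, end);
}

// Made-up sample markup shaped like the markers the gadget looks for.
const page =
  '<html><body><div class="quote"><p>The Dude abides.</p></div>' +
  '<p class="nav"><a href="lebowski.html">Another!</a></p></body></html>';

console.log(extractQuote(page));
// prints: <div class="quote"><p>The Dude abides.</p></div>
console.log(extractQuote(page.replace('class="quote"', 'class="q1"')));
// prints: null
```

Renaming the `quote` class makes the very first search come up empty, which is exactly why renaming it breaks the gadget, and also why it only helps until someone reads the new name out of the page source.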
Well, you know how it is. This aggression will not stand, man. So I emailed Rich Schiavi to talk to him about it. The email bounced back – mailbox full. I went to the domain name he uses for his email – moviewares.com – and tried the email address there. Again, bounced back, mailbox full.
In the meantime, I receive an email from another random person. The Gadget hasn’t been working, and could I fix it, please?
I look at my page and discover that, no, it’s not working. It’s not working, because the Perl script can no longer execute on .html files. And I don’t just have The Big Lebowski random quote generator, no. I have a whole range of them, because I was that bored.
(Brief tangent: Wow, check out my ugly-ass 2002 design skills. I ought to do something about that site.)
I go back to the Google Gadget page and discover that 4,000 people are using it. That’s 4,000 people who are loading my page every time they go to Google. 4,000 people hitting “refresh” each time they want a bit of Lebowski during their long working day. Going from maybe 10 to 20 people visiting it to 4,000? No wonder my webhost cut back the permissions for that little script.
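One detail in the gadget may add to that load: the number tacked onto the end of the URL. `Math.floor(Math.random() * 2001)` yields a whole number from 0 to 2000, so instead of one URL there are 2,001. Any cache keyed on the full URL would have to fetch each of those from my server separately (that is an assumption about how Google's fetch proxy behaves, not something I've confirmed). A small sketch of the URL construction, using a hypothetical `buildQuoteUrl` helper that takes the random value as a parameter so the output is predictable:

```javascript
// Hypothetical helper mirroring how the gadget builds its request URL.
const BASE_URL = "http://www.dymphna.net/randomquotage/lebowski.html";

function buildQuoteUrl(random) {
  // random is expected in [0, 1), like Math.random(); the suffix is 0..2000
  const suffix = Math.floor(random * 2001);
  return BASE_URL + "?" + suffix;
}

console.log(buildQuoteUrl(0));        // prints: http://www.dymphna.net/randomquotage/lebowski.html?0
console.log(buildQuoteUrl(0.999999)); // prints: http://www.dymphna.net/randomquotage/lebowski.html?2000

// However many times it runs, at most 2,001 distinct URLs come out.
const seen = new Set();
for (let i = 0; i < 10000; i++) {
  seen.add(buildQuoteUrl(Math.random()));
}
console.log(seen.size <= 2001); // prints: true
```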
I make some changes to the site, add snippy comments to the original page and the Google Gadget page, and send a help request to Google’s legal team, hoping that they can get the Gadget taken down before it blows out my bandwidth.
But even if it does get removed, it’s not going to stop someone else doing this. Or even Rich Schiavi (I’m going to keep on repeating that name, so that when someone searches for him, they get this page) doing it again. So I’m asking y’all for help.
Disallowing moviewares.com from accessing the site doesn’t work, because the requests don’t look as though they’re coming from there; they look as though they’re coming from Google, who I naturally don’t want to block. Changing the class name used in the CSS prevents the JavaScript above from working, but it just takes one clever sod to look at the source code and get the new one.
So what can I do? Is there anything I can do?
(And, no, the irony of me wanting someone to stop using my already copyright-infringed content isn’t lost on me. It’s not about the quotes, it’s the screenscraping of my website. If he wanted help setting it up on his own site, I would’ve helped him.)
Nov 14, 02:33 PM
Easily amused 30-something Southern Californian now living in Nottingham.
check your email. should be resolved. i offer up a solution that might work i’d be glad to assist or implement to prevent this.
if you want, i’ll implement that page as a flash .swf which can’t be scraped.
By rich | Dec 4, 02:18 AM