Google Analytics strips the query parameters from referring urls. This makes it hard to know e.g. what thread a referral came from, since that information is typically contained in the referrer’s query parameters. There are a couple of workarounds to the problem, which a Norwegian blogger has improved upon.

The final remedy is to edit the analytics account and add a filter. The filter should have the following settings:

Filter Type: Custom filter, select Advanced
Field A -> Extract A: Referral: (.*)
Field B -> Extract B: Campaign Medium: (^referral$)|(^organic$)
Output To -> Constructor: Campaign Content: $A1
Field A Required: Yes
Field B Required: Yes
Override Output Field: Yes
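
To make the effect of these settings a bit more concrete, here is a rough Python sketch of the logic I understand the filter to perform: both patterns have to match, and the first group from Field A ($A1) is then written to Campaign Content. The function name and the example url are made up for illustration, and Google Analytics evaluates its own regular expressions, so treat this as a simulation rather than how Analytics actually implements it.

```python
import re

# Patterns copied from the filter settings.
FIELD_A = re.compile(r"(.*)")                        # Referral
FIELD_B = re.compile(r"(^referral$)|(^organic$)")    # Campaign Medium


def campaign_content(referral, medium):
    """Return the value the filter would write to Campaign Content, or None.

    Hypothetical helper: it only mimics "Field A Required" and
    "Field B Required" by refusing to produce output unless both match.
    """
    if referral is None or medium is None:
        return None
    match_a = FIELD_A.search(referral)
    match_b = FIELD_B.search(medium)
    if match_a is None or match_b is None:
        return None
    # "$A1" in the constructor refers to the first group of Field A.
    return match_a.group(1)


if __name__ == "__main__":
    print(campaign_content("http://forum.example/viewtopic.php?t=42", "referral"))
    print(campaign_content("http://forum.example/viewtopic.php?t=42", "cpc"))
```

With the made-up forum url, the full referrer including its query string ends up as the campaign content, while a hit with any other medium is left untouched.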

Dumping mms streams

March 10th, 2008

It’s becoming common for sites to use streaming media without providing downloads. Luckily mplayer, a handy command-line movie player, comes to the rescue – allowing one to dump the stream to a file with the following command.

mplayer mms://foo.tld/bar.wmv -dumpstream -dumpfile bar.wmv

I recently noticed that eigenclass.org lists the top referrers to each page (look at the bottom of the page). It strikes me as an excellent idea from an SEO standpoint as you automatically create reciprocal links to pages. The pages listed will probably both have content related to the topic and be popular (i.e. probably be useful). It also creates an incentive for sites to link to you.

That leaves the question of why there aren’t more sites doing it. Personally I think it’s simply something that people haven’t considered. It’s not awfully hard to do.

One risk is that spammers will catch on and spam the server with referrals from the sites that they want to list (e.g. in addition to posting spam comments). It might however be easier to discriminate between spam and legitimate referrals than between e.g. spam and legitimate comments. The spammers have to commit several bots to outrank legitimate links (assuming only unique IPs are counted), and the referrals will probably not have a natural-looking temporal distribution unless the spammer spreads them out over a long period of time. Additionally there’s no need to display the links instantaneously; one can wait for e.g. a week before deciding whether the referrals are legitimate. One could even scrape the referring url and run it through a bunch of checks (checking whether the content is sane, what search engines think of it, whois age etc) since the number of urls/domains to validate will typically be small.
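
As a rough illustration of the checks described above, here is a small Python sketch that counts unique IPs per referrer, holds new referrers back for a week and requires the hits to be spread over more than one day. The function name, the thresholds and the shape of the log records are all assumptions of mine, and the scraping checks are left out entirely.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def trusted_referrers(hits, now, min_unique_ips=3,
                      min_age=timedelta(days=7), min_active_days=2):
    """Rank referring urls that pass some simple anti-spam heuristics.

    hits is an iterable of (referrer_url, ip, timestamp) tuples.
    All thresholds are illustrative guesses, not tuned values.
    """
    ips = defaultdict(set)    # unique IPs seen per referrer
    days = defaultdict(set)   # distinct calendar days with hits per referrer
    first_seen = {}
    for url, ip, ts in hits:
        ips[url].add(ip)
        days[url].add(ts.date())
        if url not in first_seen or ts < first_seen[url]:
            first_seen[url] = ts

    ranked = []
    for url in ips:
        if now - first_seen[url] < min_age:
            continue  # too new, wait before displaying it
        if len(ips[url]) < min_unique_ips:
            continue  # too few distinct visitors
        if len(days[url]) < min_active_days:
            continue  # all hits in one burst looks unnatural
        ranked.append((len(ips[url]), url))

    # Most unique IPs first, url as a tie-breaker for a stable order.
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in ranked]
```

The list returned is what one would render at the bottom of the page; anything that fails a check simply never shows up.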

In any case this is certainly something that I’m going to keep in mind and apply when I get the opportunity.

HTTrack

September 8th, 2007

This is the command I used to copy dkplp.org using HTTrack:

httrack "http://dkplp.org/about/overview.html" -O "~/websites/dkplp" "+dkplp.org*" "-dkplp.org/wiki*" "-dkplp.org/w*" "-dkplp.org/forum*" "-*.zip" "-*.tar.gz" "-dkplp.org/scarab*" "-dkplp.org/jdoc*" -v -%F ""

I then used a handy one-liner to do some search and replace on the downloaded files.
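
The one-liner itself isn’t reproduced here, so as a stand-in here is a minimal Python sketch of the same kind of search and replace over a mirrored directory. It is not the command I actually used; the function name, the default file pattern and the strings in the usage example are all hypothetical.

```python
from pathlib import Path


def replace_in_tree(root, old, new, pattern="*.html"):
    """Replace every occurrence of old with new in files under root.

    Works on raw bytes so that line endings and unknown encodings in the
    mirrored files are left alone. Returns the list of files that changed.
    """
    old_bytes = old.encode("utf-8")
    new_bytes = new.encode("utf-8")
    changed = []
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file():
            continue
        data = path.read_bytes()
        if old_bytes in data:
            path.write_bytes(data.replace(old_bytes, new_bytes))
            changed.append(path)
    return changed
```

One would call it as e.g. replace_in_tree("websites/dkplp", "http://old.example/", "/"), where both the directory and the strings are placeholders.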

I have used Google’s search API with SOAP to create search engines before; it worked fine but was a bit of work. I recently noticed that they seem to have made it easy to just tack on a custom look and feel on top of their engine. It’s probably worth keeping in mind for next time.

Google sitemaps in webgen

June 27th, 2007

I recently started using webgen for the Gecode/R website. I have just found a plugin which generates google sitemaps for it.

Next CMS to try

April 9th, 2007

I have been on a search for a content management system to use for static portions of sites such as dkplp.org. I want a system to use when the site-construction itself isn’t the interesting part. The current CMS used is Lenya, but it feels oversized for the purpose and has some quirks.

CMS Made Simple seems like a CMS which might fit the task. So I will try to replace Lenya with it when I get the time.

Crawling via google's cache

December 29th, 2006

Google’s cache is rather extensive, especially when it comes to popular sites. Therefore a neat trick to keep in mind, if one ever has to crawl a slow site or doesn’t want to hog the poor server’s bandwidth, is to go through google instead. If you want to fetch google’s cache for a page with url http://foo.tld/bar then just do a google search for cache:foo.tld/bar (e.g. http://www.google.com/search?q=cache%3Afoo.tld/bar). It will display google’s cache of its latest crawl, ready to be reused.
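
For a crawler, building the search url is the only mechanical part. Here is a small Python sketch that turns a page url into the same form as the example above; the function name is my own, and it assumes the cache: query keeps behaving as described in this post.

```python
from urllib.parse import quote


def google_cache_url(page_url):
    """Build the google search url that asks for the cached copy of page_url."""
    # The cache: operator takes the url without its scheme.
    for prefix in ("http://", "https://"):
        if page_url.startswith(prefix):
            page_url = page_url[len(prefix):]
            break
    # Percent-encode everything except "/" so the ":" becomes %3A.
    return "http://www.google.com/search?q=" + quote("cache:" + page_url, safe="/")


if __name__ == "__main__":
    print(google_cache_url("http://foo.tld/bar"))
```

Fetching the resulting url and throttling the requests is left to whatever crawler one is already using.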

This is of course not to be recommended for sites that don’t rank well in google as they are rarely thoroughly crawled.

Permit Cookies Firefox plugin

December 24th, 2006

Cookies are another thing that I’m not too fond of. To me cookies are currently used similarly to how javascript is used, i.e. rarely necessary for functionality but often used to track users. Therefore I find the Permit Cookies Firefox plugin handy. It allows you to reject cookies from domains that are not white-listed. So one can still use cookies for e.g. remembering logins while still avoiding the bulk of malicious cookies.

NoScript Firefox plugin

December 20th, 2006

NoScript is a useful firefox plugin which allows the user to block javascripts (either via a whitelist or a blacklist). Personally I think it’s nice to be able to block javascripts as they are usually not necessary for functionality but often used to track users and for advertising. As a bonus one also avoids the bulk of cross-site scripting vulnerabilities.

The python wikipediabot

December 18th, 2006

If you’re an editor of a wiki or two running MediaWiki then be sure to have a look at the python wikipediabot if you have not yet done so. It’s a very handy framework which makes it easy to create bots for repetitive tasks.

GWT becomes open source

December 13th, 2006

The Google Web Toolkit (GWT) is now open source under the Apache 2.0 license. This means that I might be starting to use GWT when I find it appropriate.

I personally hate using JavaScript because of the difficulty of debugging it (and making sure that it works for everyone) and because of not wanting to force people to run JavaScript in their browsers (I myself run NoScript). With GWT I can get around the first problem by writing in Java, and can hence use JavaScript when there’s no feasible alternative.

Flow control is often needed (or at least helps) when creating advanced templates in MediaWiki. This is where the excellent ParserFunctions extension comes in handy. It provides all kinds of handy if constructs and other valuable utilities.

It took a while before I found it though, probably because it hasn’t received enough attention, possibly because most templates don’t require it.

Custom google maps

December 10th, 2006

Some sites use a slightly customised google map. Few use the more advanced functionality such as replacing the map with a custom map and making it more interactive.

There’s a really good site that covers pretty much everything that’s possible to do with google maps. It’s a handy link to keep around as one never quite knows when one might need it.

Creative commons meta data

December 7th, 2006

The creative commons meta data is needed in order for google and others to properly recognise a page as licensed under creative commons.