Friday, 29 May 2009

Top 500 Sites in Australia According to Alexa

I needed to analyse a list of the top sites in Australia, and the following Python script helped me pull the site name and domain name from Alexa with minimal effort:

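#!/usr/bin/env python
# Python 2 script: prints rank, site name and domain for the top 500 sites.
import re, urllib
# Group 1 is the domain (from the /siteinfo/ link), group 2 is the site name.
r = r'<a  href="/siteinfo/(.*?)"  ><strong>(.*?)</strong>'
for i in range(25):  # 25 pages of 20 sites each
    u = 'http://www.alexa.com/topsites/countries;%d/AU' % i
    for x, m in enumerate(re.findall(r, urllib.urlopen(u).read())):
        print '%d. %s (%s)' % (x + 1 + i * 20, m[1], m[0].strip())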

YMMV. Considering how Alexa has been trying to obfuscate their HTML pages to prevent scraping, I won’t be surprised if this script stops working tomorrow…
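
Since the markup is likely to change, it may help to keep the parsing separate from the fetching, so the regex can be checked against a saved page without hitting the site. Below is a rough Python 3 sketch of the same idea. The names `page_url`, `parse_page` and `top_sites` and the `country` parameter are my own additions for illustration, and whether country codes other than AU work with this URL pattern is an untested assumption.

```python
import re
from urllib.request import urlopen

# Pattern copied from the original script; it depends on the exact page markup.
SITE_RE = re.compile(r'<a  href="/siteinfo/(.*?)"  ><strong>(.*?)</strong>')
PAGE_SIZE = 20  # sites per listing page, as in the original script


def page_url(page, country="AU"):
    """Build the listing URL for a zero-based page number.

    The country parameter is an assumption; only AU appears in the original.
    """
    return "http://www.alexa.com/topsites/countries;%d/%s" % (page, country)


def parse_page(html, page):
    """Return (rank, name, domain) tuples found in one page of HTML."""
    return [
        (page * PAGE_SIZE + offset + 1, name, domain.strip())
        for offset, (domain, name) in enumerate(SITE_RE.findall(html))
    ]


def top_sites(country="AU", pages=25):
    """Fetch each page and yield its entries (needs network access)."""
    for page in range(pages):
        html = urlopen(page_url(page, country)).read().decode("utf-8", "replace")
        for entry in parse_page(html, page):
            yield entry


# Offline check against a hand-written fragment in the expected format.
sample = '<a  href="/siteinfo/example.com.au "  ><strong>Example</strong>'
print(parse_page(sample, 1))
```

If `parse_page` returns an empty list for a freshly downloaded page, that is a good sign the markup has changed and the pattern needs updating.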

Comments

1.
Posted by Stewart on Mon, 6 July 2009 9:20 am

Great, thanks for this. Will this work for the UK or another country? Well, I suppose there's only one way to find out.

Thanks Scott


2.
Posted by Brisbane Translator on Tue, 25 August 2009 8:57 pm

Hi,
Well, translation is the process through which we can change one language into another desired language, provided we have some idea about the subject or the language in which we are going to hold the discussion or communication.


3.
Posted by Angel Dresses on Tue, 5 January 2010 3:19 am

Does this still work? I would try it myself but I'm not sure if it would be safe.


4.
Posted by marquee for sale on Sat, 9 October 2010 10:50 am

What a great piece of coding. It allows me to pull down the top competitors in Australia for my market, like marquee for sale. I can then study their links, SEO, etc. to improve the ranking of my own website.

