Python

One good thing about working at Google has been that I’ve finally learnt Python. Properly. Which means I can whip up, in just a couple of minutes, a tested & documented script to, say, replace ~2,500 HTML files with redirects. So now anyone using an old link to any HTML page under https://homepage.mac.com/wadetregaskis will get automagically redirected to the page’s new location, here on https://wadetregaskis.com (shoved aside under /MobileMe/Sites/, ’cause in theory I’ll be doing a real website from scratch).

#!/usr/bin/python

import errno
import os
import sys
import urllib


if 3 != len(sys.argv) or '-h' in sys.argv or '--help' in sys.argv or 'help' in sys.argv:
    print >> sys.stderr, '''Usage: {0} SITE_FOLDER NEW_BASE_URL

SITE_FOLDER is the path to the local folder containing the site to be redirected.
NEW_BASE_URL is the base URL of the new site, to which the redirects will point.  It is up to you to ensure proper URL escaping is performed; it will not be done by this script (except for the paths this script generates, to then append to this base URL).

For example, if the local path /www/mysite/ is used for SITE_FOLDER, and NEW_BASE_URL is given as "http://foo.com/", then for a file /www/mysite/index.html a redirect will be created to "http://foo.com/index.html".'''.format(sys.argv[0])
    sys.exit(errno.EINVAL)

(_, siteFolder, newBaseURL) = sys.argv

if not newBaseURL.endswith('/'):
    newBaseURL = newBaseURL + '/'

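# Number of leading characters to strip from each walked folder in order to
# get its path relative to siteFolder (including the separating slash).
siteFolderPathLength = len(siteFolder)

if not siteFolder.endswith('/'):
    siteFolderPathLength += 1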

for (folder, subFolders, files) in os.walk(siteFolder):
    relativePath = folder[siteFolderPathLength:]
    
    if relativePath:
        relativePath = relativePath + '/'
    
    relativePath = urllib.quote(relativePath)
    
    for file in files:
        (_, fileExtension) = os.path.splitext(file)
        
        # Compare case-insensitively so that e.g. "INDEX.HTML" is also caught.
        if fileExtension.lower() in (".htm", ".html", ".xhtml"):
            replacementContents = '''<html>
    <head>
        <meta http-equiv="refresh" content="0;url={0}{1}{2}">
        <script src="http://www.google-analytics.com/urchin.js" type="text/javascript"></script>
        <script type="text/javascript">
            _uacct = "UA-103051-1";
            urchinTracker();
        </script>
    </head>
    <body></body>
</html>'''.format(newBaseURL, relativePath, urllib.quote(file))

            #print "Would replace {0}/{1} with:\n\n{2}\n\n".format(folder, file, replacementContents)

            with open(os.path.join(folder, file), "w") as f:
                f.write(replacementContents)

What’s more difficult, however, is updating all the absolute paths and other references to homepage.mac.com within the HTML files, now that they’re hosted on wadetregaskis.com. I did a bunch by hand, then realised there are over a thousand HTML files in the Photos subfolder, each containing many homepage.mac.com links. Sod.

I had a quick look for a utility to do this, but found only a handful, and none that support Macs. I’m sure there are tools out there, probably Perl or Python or Ruby scripts, that would ultimately work, but finding them is probably more trouble than just reproducing them myself.
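For what it's worth, here is a minimal sketch of what such a reproduction might look like, assuming a local copy of the site is available to work on. It is written for Python 3 (unlike the Python 2 script above), and the names rewrite_links and rewrite_site, along with the exact old and new base URLs passed to them, are illustrative assumptions rather than anything I have actually run against the real site.

```python
import os
import re

# Same set of extensions the redirect script looks for.
HTML_EXTENSIONS = (".htm", ".html", ".xhtml")


def rewrite_links(text, old_base, new_base):
    """Return text with every link under old_base pointed at new_base.

    Both http:// and https:// forms of old_base are matched. The lookahead
    stops a base such as ".../wadetregaskis" from matching a longer name
    such as ".../wadetregaskis2".
    """
    host_and_path = re.sub(r"^https?://", "", old_base).rstrip("/")
    pattern = re.compile(
        r"https?://" + re.escape(host_and_path) + r"""(?=[/"'\s?#]|$)"""
    )
    replacement = new_base.rstrip("/")
    # A function is used so the replacement is never parsed for escapes.
    return pattern.sub(lambda match: replacement, text)


def rewrite_site(site_folder, old_base, new_base):
    """Rewrite links in every HTML file under site_folder, in place.

    Returns the number of files that were actually modified.
    """
    changed = 0
    for folder, _subfolders, files in os.walk(site_folder):
        for name in files:
            if os.path.splitext(name)[1].lower() not in HTML_EXTENSIONS:
                continue
            path = os.path.join(folder, name)
            # surrogateescape lets non-UTF-8 bytes survive the round trip;
            # newline="" leaves the existing line endings untouched.
            with open(path, encoding="utf-8", errors="surrogateescape",
                      newline="") as f:
                original = f.read()
            updated = rewrite_links(original, old_base, new_base)
            if updated != original:
                with open(path, "w", encoding="utf-8",
                          errors="surrogateescape", newline="") as f:
                    f.write(updated)
                changed += 1
    return changed
```

Something like rewrite_site("/path/to/local/copy", "http://homepage.mac.com/wadetregaskis", "https://wadetregaskis.com/MobileMe/Sites") would then do the lot in one pass, where the local path is a placeholder and the base URLs are my guess at the right mapping. This only covers absolute links that start with the old base; anything odder would still need checking by hand.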

I also tried BBEdit, knowing it has FTP support (but, dangerously, no FTPS), but it doesn’t appear willing to perform a search & replace across files over FTP. Bollocks.

Maybe I can write another script, but I’ve never done FTPS in Python before. Hmm…
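As a starting point, Python's standard library does include FTPS support, in the form of ftplib.FTP_TLS. Below is a rough, untested-against-a-real-server sketch (Python 3) of how a remote search & replace might go: download a file into memory, replace the bytes, and upload it again only if something changed. The helper names connect and replace_in_remote_file are hypothetical, as are whatever host and credentials would be passed in.

```python
import io
from ftplib import FTP_TLS


def connect(host, user, password):
    """Open an FTPS session with both control and data channels encrypted."""
    ftps = FTP_TLS(host)
    ftps.login(user, password)  # secures the control connection before login
    ftps.prot_p()               # switch the data connection to TLS as well
    return ftps


def replace_in_remote_file(ftps, remote_path, old, new):
    """Replace the bytes `old` with `new` in one remote file.

    `ftps` is any object offering ftplib's retrbinary/storbinary methods.
    Returns True if the file was changed and re-uploaded, else False.
    """
    buffer = io.BytesIO()
    ftps.retrbinary("RETR " + remote_path, buffer.write)
    original = buffer.getvalue()
    updated = original.replace(old, new)
    if updated == original:
        return False
    ftps.storbinary("STOR " + remote_path, io.BytesIO(updated))
    return True
```

Working in bytes sidesteps any guessing about each file's text encoding. Finding the files to process is left out here; that would mean walking the remote directories, which is a separate problem from the replace itself.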
