Thursday, December 13, 2007

Stripping line breaks with Python

I've been snatching lots of files from Project Gutenberg lately. Gutenberg is a great resource!

The problem is that the files have every line break hard-coded, not just the paragraph breaks. This messes up the output on a PDA e-book reader (my Nintendo DS, actually) or from a printer. I did a quick search for a small script to strip the extra line breaks but couldn't find one, so I wrote my own.

The program takes an input filename and an output filename. Handling two arguments adds a bit more code, but this way I can call it from a batch program that loops over several files for conversion.

Code follows.



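#!/usr/bin/python

import sys

# Both an input and an output filename are required.
if len(sys.argv) < 2:
    print("Oops, need a filename to open.")
    sys.exit(1)
elif len(sys.argv) < 3:
    print("Oops, need a filename to write to.")
    sys.exit(1)

filename1 = sys.argv[1]
filename2 = sys.argv[2]

fp1 = open(filename1, "r")
fp2 = open(filename2, "w")

while 1:
    line = fp1.readline()
    if line == "":
        # readline() returns an empty string only at end of file.
        break
    # Drop the line ending, whether it is "\r\n" or just "\n".
    text = line.rstrip("\r\n")
    if text == "":
        # A blank line is a paragraph break, so keep it.
        fp2.write("\n\n")
    else:
        # Join hard-wrapped lines with a single space.
        fp2.write(text + " ")

fp1.close()
fp2.close()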
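
For the batch use I mentioned, something like the following rough sketch might do. It assumes Python 3 and that the script above is saved as striplines.py in the current directory; that filename and the helper names plan_jobs and convert_all are made up for illustration. It pairs each .txt file in one directory with an output file of the same name in another, then runs the script once per pair.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical filename for the line-stripping script; adjust to wherever it lives.
SCRIPT = "striplines.py"


def plan_jobs(src_dir, dst_dir):
    """Pair every .txt file in src_dir with a same-named output path in dst_dir."""
    return [(p, Path(dst_dir) / p.name) for p in sorted(Path(src_dir).glob("*.txt"))]


def convert_all(src_dir, dst_dir):
    """Run the script once for each input/output pair."""
    Path(dst_dir).mkdir(parents=True, exist_ok=True)
    for src, dst in plan_jobs(src_dir, dst_dir):
        # Equivalent to typing: python striplines.py <input> <output>
        subprocess.run([sys.executable, SCRIPT, str(src), str(dst)], check=True)


if __name__ == "__main__" and len(sys.argv) == 3:
    convert_all(sys.argv[1], sys.argv[2])
```

Keeping the output in a separate directory means the original downloads are never overwritten.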



Of course, if anyone has something shorter out there, I wouldn't mind using that instead.
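
One possible shorter take, as a sketch only: it assumes Python 3 and that the whole file fits in memory, and the function name unwrap is my own invention. It is not an exact replacement for the loop-based script, because it collapses any run of blank lines into a single paragraph break and trims stray spaces inside each paragraph.

```python
import re
import sys


def unwrap(text):
    """Join hard-wrapped lines into paragraphs, keeping blank lines as breaks."""
    # Normalize Windows (CRLF) and old Mac (CR) line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Split on runs of blank lines, then collapse whitespace inside each paragraph.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    return "\n\n".join(" ".join(p.split()) for p in paragraphs) + "\n"


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python <this file> <input> <output>
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        dst.write(unwrap(src.read()))
```

Putting the logic in a function also makes it easy to call from a batch loop without starting a new Python process for every file.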