Making Docs Dance
I hate pointing and clicking again and again. If I have a bunch of Word files, I want to extract data from them, process it, and print them from the command line, so I can handle a whole batch at once.
Extracting text
antiword is a tool that extracts text from Word files, and it can do a bunch of extra stuff too. This makes it easy to pull data out, run standard text processing on it, then produce output.
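As a rough sketch of that pull-out-then-process workflow, here is one way to call antiword from Python. It assumes antiword is on the PATH and prints the document text to standard output when given a file name. The helper names (antiword_command, extract_text, find_lines) are made up for illustration.

```python
import subprocess


def antiword_command(doc_path):
    """Build the argument list for running antiword on one file."""
    return ["antiword", str(doc_path)]


def extract_text(doc_path):
    """Run antiword and return the plain text it writes to stdout.

    Assumes antiword is installed; raises CalledProcessError on failure.
    """
    result = subprocess.run(
        antiword_command(doc_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def find_lines(text, keyword):
    """Return the stripped lines that contain keyword (case-insensitive)."""
    needle = keyword.lower()
    return [line.strip() for line in text.splitlines() if needle in line.lower()]
```

Something like find_lines(extract_text("report.doc"), "total") would then give the lines mentioning "total", where report.doc is a placeholder file name.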
Making new files
python-docx is an easy way to programmatically produce Word files. Confusingly, there seem to be two modules with this name; I found the one by Mike MacCana more useful because it has better documentation.
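A minimal sketch of producing a file this way follows. Note the assumption: it uses the Document API (Document(), add_heading, add_paragraph, save) of the currently maintained python-docx package, which is not the same interface as the older module mentioned above. The function names and the (name, value) record layout are hypothetical.

```python
def summary_lines(records):
    """Turn (name, value) pairs into the lines that will go in the document."""
    return [f"{name}: {value}" for name, value in records]


def write_report(path, title, records):
    """Write a .docx file with a heading and one paragraph per record."""
    # Imported here so summary_lines works without python-docx installed.
    from docx import Document

    document = Document()
    document.add_heading(title, level=1)
    for line in summary_lines(records):
        document.add_paragraph(line)
    document.save(path)
```

Calling write_report("out.docx", "Results", [("total", 42)]) should produce a one-heading, one-paragraph file, where out.docx is a placeholder name.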
Printing from the command line
Although antiword supports this (convert the doc to PDF, then pipe it to the printer via a2ps or similar), it has problems with UTF-8 encoding. Instead, just use libreoffice from the command line:
libreoffice -p *.doc
Making PDF output
libreoffice --convert-to pdf *.doc
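If the printing or converting step needs to sit inside a larger script, the same two commands can be driven from Python. This is only a sketch: it assumes libreoffice is on the PATH, adds the --headless and --outdir options to the conversion (not part of the commands above), and expands the wildcard with glob because there is no shell to do it. The function names are invented for the example.

```python
import glob
import subprocess


def libreoffice_command(files, action="convert", outdir="."):
    """Build the libreoffice argument list for printing or PDF conversion."""
    if action == "print":
        # Same as: libreoffice -p file1.doc file2.doc ...
        return ["libreoffice", "-p", *files]
    # Same as: libreoffice --convert-to pdf ..., plus --headless and --outdir.
    return [
        "libreoffice", "--headless",
        "--convert-to", "pdf",
        "--outdir", outdir,
        *files,
    ]


def process_docs(pattern="*.doc", action="convert", outdir="."):
    """Run libreoffice once over every file matching pattern."""
    files = sorted(glob.glob(pattern))
    if not files:
        return []  # nothing matched, so do not start libreoffice at all
    subprocess.run(libreoffice_command(files, action, outdir), check=True)
    return files
```

process_docs("*.doc", action="print") would print everything in the current directory, and process_docs("*.doc", outdir="pdf") would write the PDFs into a pdf folder.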