
Tuesday, 20 May 2014

Automating MS Word on Linux

Making Docs Dance


I hate pointing and clicking again and again and again... If I have a bunch of Word files, I want to extract data from them, process it, and print them from the command line. That way I can handle a whole batch at once.

Extracting text

antiword is a tool that extracts text from Word files. It can do a bunch of extra stuff too. This makes it easy to pull data out, run standard text-processing tools over it, then produce output.
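As a rough sketch of that workflow, the loop below dumps every .doc file in the current directory to a text file of the same name, ready for grep, awk and friends. The txt_name helper and the directory layout are my own assumptions for illustration; antiword is called with no options, which writes plain text to standard output.

```shell
#!/bin/sh
# Sketch: dump every .doc in the current directory to plain text with antiword.

# Hypothetical helper: map "report.doc" to "report.txt".
txt_name() {
    printf '%s\n' "${1%.doc}.txt"
}

# Only attempt the extraction when antiword is actually installed.
if command -v antiword >/dev/null 2>&1; then
    for doc in *.doc; do
        [ -e "$doc" ] || continue    # the glob matched nothing
        antiword "$doc" > "$(txt_name "$doc")"
    done
else
    echo "antiword not found, nothing extracted" >&2
fi
```

Once the text files exist, something like grep -l "Invoice" *.txt will list the documents that mention a given word.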

Making new files


python-docx is an easy way to programmatically produce Word files. Confusingly, there seem to be two modules with this name; I found the one by Mike MacCana more useful, as it has better documentation.

Printing from the command line


Although antiword supports this (convert the doc to PDF, then pipe it to the printer via a2ps or similar), it has problems with UTF-8 encoding. Instead, just use libreoffice from the command line:

libreoffice -p *.doc

Making PDF output

libreoffice --convert-to pdf *.doc
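For a larger batch it can be handy to run without the GUI and keep the PDFs in their own directory. The sketch below assumes LibreOffice's --headless and --outdir options; the pdf directory name and the pdf_name helper are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Sketch: convert all .doc files to PDF in a separate directory, without a GUI.

outdir="pdf"    # hypothetical output directory name

# Hypothetical helper: where the PDF for a given .doc should end up.
pdf_name() {
    printf '%s/%s\n' "$outdir" "$(basename "${1%.doc}").pdf"
}

# Only attempt the conversion when libreoffice is actually installed.
if command -v libreoffice >/dev/null 2>&1; then
    for doc in *.doc; do
        [ -e "$doc" ] || continue    # the glob matched nothing
        mkdir -p "$outdir"
        libreoffice --headless --convert-to pdf --outdir "$outdir" "$doc"
        # Report any file that did not produce a PDF.
        [ -f "$(pdf_name "$doc")" ] || echo "conversion failed: $doc" >&2
    done
else
    echo "libreoffice not found, nothing converted" >&2
fi
```

Converting one file per invocation is slower than passing *.doc in a single call, but it makes it easier to see which document failed.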
