
Thursday 22 May 2014

Towards an automatic REF report

Get a list of faculty members.
Use a tool to get a list of all their publications via Google Scholar. I could use scholar.py:

./scholar.py -c 100 --citation=bt -a "Matthew Studley"

Then extract each publication title and look up its impact. Alternatively, could I get the impact for each item directly from Scholar?
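
A rough sketch of the title-extraction step, assuming the BibTeX that scholar.py prints has one title={...} field per entry, on its own line. The names extract_titles and SAMPLE, and the sample entry itself, are made up for illustration. The impact lookup is left out, since where to get it from is still an open question.

```python
import re
from typing import List

# Matches a BibTeX title field on its own line, e.g.  title={Some title},
# Assumption: the whole field fits on one line. "booktitle" is not matched
# because the field name must start the line (after optional whitespace).
TITLE_RE = re.compile(
    r'^\s*title\s*=\s*[{"](.+?)[}"],?\s*$',
    re.IGNORECASE | re.MULTILINE,
)


def extract_titles(bibtex: str) -> List[str]:
    """Return the title of every entry found in a BibTeX string."""
    return [title.strip() for title in TITLE_RE.findall(bibtex)]


# Placeholder entry, not a real publication.
SAMPLE = """@article{placeholder2014,
  title={A placeholder paper title},
  author={Author, A. N.},
  booktitle={Not a title field},
  year={2014}
}
"""

if __name__ == "__main__":
    for title in extract_titles(SAMPLE):
        print(title)
```

In practice the input would be the saved output of the scholar.py command above rather than SAMPLE.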



Tuesday 20 May 2014

Automating MS Word on Linux

Making Docs Dance


I hate pointing and clicking again and again and again... If I have a bunch of Word files, I want to extract data from them, process it, and print them from the command line. Then I can do a bunch of them at once.

Extracting text

antiword is a tool which extracts text from Word files. It can also produce PostScript and PDF output. This makes it easy to pull data out, use standard text processing, then produce output.
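
As a sketch of the "pull data out, then process it" idea: the snippet below runs antiword on a file, captures the text it writes to standard output, and filters lines by keyword. It assumes antiword is on the PATH. The helper names extract_text and find_lines, and the runner parameter (only there so the call can be swapped out for testing), are my own inventions.

```python
import subprocess


def extract_text(doc_path, runner=subprocess.run):
    """Return the plain text of a .doc file by running `antiword <file>`."""
    result = runner(
        ["antiword", doc_path],
        capture_output=True,  # collect stdout instead of printing it
        text=True,            # decode bytes using the locale encoding
        check=True,           # raise CalledProcessError if antiword fails
    )
    return result.stdout


def find_lines(text, keyword):
    """Return the stripped lines of text that contain keyword, ignoring case."""
    needle = keyword.lower()
    return [
        line.strip()
        for line in text.splitlines()
        if needle in line.lower()
    ]
```

Something like find_lines(extract_text("report.doc"), "total") would then give the lines mentioning "total"; report.doc is just a placeholder filename.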

Making new files


python-docx is an easy way to programmatically produce Word files. Confusingly, there seem to be two modules with this name; I found the one by Mike MacCana more useful because it is better documented.
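
To give a feel for what such a library does under the hood, here is a minimal sketch that does not use python-docx at all: a .docx file is a zip archive of XML parts, so the standard library is enough to write a bare-bones one. The function name write_docx is hypothetical, and this only handles plain paragraphs (no styles, headings or tables), which is exactly the part a real library takes care of.

```python
import zipfile
from xml.sax.saxutils import escape

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Declares the content type of each part in the package.
CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    '</Types>'
)

# Points the package at its main document part.
RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships '
    'xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" '
    'Target="word/document.xml"/>'
    '</Relationships>'
)


def write_docx(path, paragraphs):
    """Write a minimal .docx with one plain paragraph per input string."""
    body = "".join(
        '<w:p><w:r><w:t xml:space="preserve">'
        + escape(text)  # escape &, < and > so the XML stays well-formed
        + "</w:t></w:r></w:p>"
        for text in paragraphs
    )
    document = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<w:document xmlns:w="' + W_NS + '"><w:body>'
        + body
        + "</w:body></w:document>"
    )
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("[Content_Types].xml", CONTENT_TYPES)
        zf.writestr("_rels/.rels", RELS)
        zf.writestr("word/document.xml", document)
```

Calling write_docx("out.docx", ["First paragraph", "Second paragraph"]) should give a two-paragraph file; out.docx is a placeholder name.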

Printing from the command line


Although antiword supports this (convert the doc to PDF, then pipe it to the printer via a2ps or similar), it has problems with UTF-8 encoding. Instead, just use LibreOffice from the command line:

libreoffice -p *.doc

Making PDF output

libreoffice --convert-to pdf *.doc
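
If the conversion needs to sit inside a larger script, the same call can be wrapped in Python. This is only a sketch: the helper names convert_command and run_batch are made up, and the --headless and --outdir flags are additions of mine that the one-liner above does not use. It assumes libreoffice is on the PATH.

```python
import glob
import subprocess


def convert_command(doc_paths, outdir="."):
    """Build the argument list for a LibreOffice PDF conversion."""
    return [
        "libreoffice",
        "--headless",          # assumption: run without any GUI
        "--convert-to", "pdf",
        "--outdir", outdir,    # assumption: where the PDFs should be written
    ] + list(doc_paths)


def run_batch(pattern, outdir=".", runner=subprocess.run):
    """Convert every file matching pattern; return the command that was run.

    Returns an empty list, and runs nothing, when no file matches.
    """
    paths = sorted(glob.glob(pattern))
    if not paths:
        return []
    cmd = convert_command(paths, outdir)
    runner(cmd, check=True)  # raises CalledProcessError on failure
    return cmd


if __name__ == "__main__":
    run_batch("*.doc")
```

Passing the files as a list avoids shell quoting trouble with filenames that contain spaces.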