In part 1 of Converting LaTeX to Word, I explained how I used Pandoc to convert from LaTeX to Word (doc, docx, RTF). There were problems getting figure reference numbers to show up, though, because by default Pandoc cannot handle automatic numbering and referencing of figures the way LaTeX can. The [pandoc-reference-filter] package was written to solve this problem.
[Note: Unfortunately, I couldn’t get Pandoc to recognize the filters package, probably because I couldn’t install Python packages correctly on Windows (doh!). I stopped trying because LaTeX2RTF works well for me, for now.]
How to get figure and table references to show up with the Pandoc LaTeX to Word conversion scripts
- You need Python and of course Pandoc
- Install [pandoc-reference-filter] and [pandocfilters version 1.2.3]
- Then get your main ingredients together: see the previous post here
- Follow the markup and usage examples in [pandoc-reference-filter], compile with Pandoc, and your figures should be numbered and referenced correctly
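The steps above can be sketched in a few lines of Python. This is only a sketch under assumptions, not a tested recipe: the helper names `missing_tools` and `with_filters` are made up for illustration, and `reference_filter.py` is a placeholder for whatever script name the [pandoc-reference-filter] README tells you to use. The only Pandoc feature relied on is the `--filter` option, which runs a filter program over the document during conversion.

```python
import shutil


def missing_tools(tools=("pandoc", "python")):
    """Return the names of required executables that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def with_filters(base_command, filter_paths):
    """Return a copy of a pandoc command list with one --filter option per filter."""
    command = list(base_command)  # copy, so the caller's list is left untouched
    for path in filter_paths:
        command += ["--filter", path]
    return command


# Hypothetical file names; substitute your own input, output, and filter script.
base = ["pandoc", "latexfile.tex", "-o", "latexfile.docx"]
command = with_filters(base, ["reference_filter.py"])
```

Checking `missing_tools()` before running anything gives an explicit message when Pandoc or Python is not on the PATH, which is one possible reason for a filter not being picked up on Windows. The finished `command` list can be passed to `subprocess.run(command, check=True)`.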
Pros and Cons of using Pandoc to convert from LaTeX to Word
+ Pros
- You can use the 7000+ (as of this date) style files already available in the Zotero and CSL repositories. And BTW, CSL style files are much easier to edit than bst files! This, IMO, is a huge benefit to using Pandoc for your conversions.
- You can convert to many other formats besides doc/rtf (e.g. HTML)
- You can easily define (hardcode) the name of the output file in the conversion script. This is handy because you might want to call the first draft “filename_v1.doc” and, after a revision, call it “filename_v2.doc”. For each revision, you just change the output filename in the script, and every time you run the script it will use the name you predefined.
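As one possible variation on hardcoding the output name, the version number could be derived from the drafts that already exist. The sketch below is an illustration only: `next_output_name` is a hypothetical helper, and it assumes drafts follow the “filename_v1.doc” naming pattern mentioned above.

```python
import re


def next_output_name(existing_names, stem, extension):
    """Return '<stem>_v<N>.<extension>' where N is one more than the highest existing version."""
    pattern = re.compile(rf"^{re.escape(stem)}_v(\d+)\.{re.escape(extension)}$")
    versions = []
    for name in existing_names:
        match = pattern.match(name)
        if match:
            versions.append(int(match.group(1)))
    # With no earlier drafts, numbering starts at version 1.
    return f"{stem}_v{max(versions, default=0) + 1}.{extension}"
```

For example, passing the file names in the draft folder (say, from `os.listdir`) together with the stem “filename” and the extension “doc” gives the name to use for the next draft, which can then be handed to Pandoc as the output file.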
– Cons
- You need Python (hopefully this isn’t a dealbreaker for most people, though I couldn’t get it to work myself)
- Page breaks still don’t work, but there might be a solution to that, somewhere, in some corner of the interwebs…
- Some special LaTeX commands may not work (check the FAQ and the mailing lists for further help)
- To solve this last issue, it’s been suggested to write directly in Markdown rather than LaTeX – though this defeats the purpose of writing in LaTeX.
Other posts in the LaTeX to RTF conversion series
- Using Scrivener (AKA Converting LaTeX to Word – part 4) — coming soon
I was preparing a journal article for a medical journal, and the only formats they accepted were Word DOC and RTF.
Oh the horror of having to use Word to write a journal paper
I was horrified because I use LaTeX, where one can produce beautifully typeset documents containing millions of figures, tables, references, and subsections, all without having to worry about referencing each of these items manually in the document. I once tried to write a paper with lots of references and sections in Word, and gave up after a week: every time I made a single change (for example, reordering the references or adding a new section), all my inline references would get totally destroyed. So I was definitely not looking to repeat this experience.
So then what?
Once I decided I wasn’t going to touch Word with a 1000-foot pole, it was just a matter of finding some software that could convert from LaTeX or PDF to Word. Easier said than done. While I did find a few options, I eventually settled on Pandoc. It was smooth sailing from there on. Well, not so much.
How to actually use Pandoc
Once I had everything installed, I figured it would be quite simple to convert from LaTeX to Word. In the end it wasn’t that difficult at all; it was just a matter of getting the syntax right. There were also some issues with references, and since I couldn’t see any error messages, it took some trial and error to figure things out. (See the bottom of this post for example code that actually works.)
Pandoc can actually convert to/from quite a number of formats. This post focuses only on LaTeX -> Word. See the Pandoc page to find out the many other formats you can use!
Here are the ingredients you need:
- BibTeX bibliography file (see below, no special characters!)
- Journal citation style file (you can usually find what you need from CSL or Zotero)
- Optional: Journal abbreviations file (this was a pain and a half to find and unfortunately I didn’t note where I got it from, so you can find my version here)
- Input file (latexfile.tex)
- MiKTeX or whatever you normally use to compile your LaTeX file
- Decide if you want doc, docx, or rtf (I tried all three until I found the format I liked; there are subtle differences between the three formats)
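To show how these ingredients fit together, here is a minimal sketch in Python that assembles a Pandoc command from them. It is an illustration under assumptions, not the exact script from this post: `build_pandoc_command` is a hypothetical helper, and every file name except latexfile.tex is a placeholder. The options used (`--from`, `--output`, `--bibliography`, `--csl`, `--citation-abbreviations`) are standard Pandoc options. How citations get processed depends on your Pandoc version: 2.11 and later use the built-in `--citeproc` flag, while older releases used the external `pandoc-citeproc` filter.

```python
def build_pandoc_command(tex_file, output_file, bib_file, csl_file,
                         abbreviations_file=None, builtin_citeproc=True):
    """Assemble a pandoc command list for a LaTeX-to-Word conversion with citations."""
    command = ["pandoc", tex_file, "--from", "latex", "--output", output_file]
    if builtin_citeproc:
        # Pandoc 2.11 and later: citation processing is built in.
        command.append("--citeproc")
    else:
        # Older Pandoc releases: citations were handled by a separate filter.
        command += ["--filter", "pandoc-citeproc"]
    command += ["--bibliography", bib_file, "--csl", csl_file]
    if abbreviations_file:
        command += ["--citation-abbreviations", abbreviations_file]
    return command


# Placeholder file names (only latexfile.tex comes from the list above).
command = build_pandoc_command(
    "latexfile.tex", "latexfile.docx",
    bib_file="references.bib",
    csl_file="journal-style.csl",
    abbreviations_file="abbreviations.json",
)
```

Keeping the command as a list and passing it to `subprocess.run(command, check=True)` avoids quoting problems with file names that contain spaces, and a failed conversion raises an error instead of passing silently.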
Here are some lessons I learned:
- Not all LaTeX markup works, so it’s a bit of trial and error. I still can’t get the figure or table numbers to show up, which sort of defeats the purpose of not using Word, but at least the citations are OK. The figures/tables show the markup text, so search/replace may be a temporary workaround until I figure out how to get it to work.
- Your BibTeX file cannot contain any non-ASCII (special) characters. I spent an hour or two going through my BibTeX file removing all sorts of non-ASCII characters.
- Seems obvious, but your LaTeX file should not contain any references to BibTeX keys that are not in the bibliography file (otherwise Pandoc crashes with no warnings)
- On my machine, Pandoc had to be called from Windows PowerShell (not CMD), and it is driven by command-line instructions
- I still haven’t been able to get LaTeX page break commands to work, and it seems this has to do with Pandoc generating HTML-style documents
- Likewise, I will probably have to manually add page numbers to the Word doc when sending it to the co-authors for review
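Two of the lessons above (non-ASCII characters in the BibTeX file, and cited keys missing from it) can be checked before running Pandoc at all. The following is a rough sketch with hypothetical helper names. It relies on simple regular expressions rather than a real BibTeX or LaTeX parser, so it assumes ordinary `\cite{...}`-style commands and conventional `@type{key,` entries, and it may miss unusual markup.

```python
import re

# Matches \cite, \citep, \citet*, etc., with optional [..] arguments before the keys.
CITE_PATTERN = re.compile(r"\\cite[a-zA-Z]*\*?(?:\[[^\]]*\])*\{([^}]*)\}")
# Matches the start of a BibTeX entry such as "@article{smith2010,".
ENTRY_PATTERN = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")


def non_ascii_lines(text):
    """Return (line_number, line) pairs for lines that contain non-ASCII characters."""
    return [(number, line)
            for number, line in enumerate(text.splitlines(), start=1)
            if not line.isascii()]


def bib_keys(bib_text):
    """Return the set of entry keys defined in the BibTeX text."""
    return {key for kind, key in ENTRY_PATTERN.findall(bib_text)
            if kind.lower() not in {"comment", "string", "preamble"}}


def cited_keys(tex_text):
    """Return the set of keys used in \\cite-style commands in the LaTeX text."""
    keys = set()
    for group in CITE_PATTERN.findall(tex_text):
        keys.update(key.strip() for key in group.split(",") if key.strip())
    return keys


def missing_keys(tex_text, bib_text):
    """Return a sorted list of keys cited in the LaTeX text but absent from the BibTeX text."""
    return sorted(cited_keys(tex_text) - bib_keys(bib_text))
```

Reading both files with `open(path, encoding="utf-8").read()` and printing the results of `non_ascii_lines` and `missing_keys` points straight at the offending lines and keys, instead of leaving you to find them by trial and error.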
Other tricks & tips:
- I also installed Vim for Windows so I can write more productively (without having to grab the mouse every 5 seconds) – yeah I know, I use Windows!
- I then wrote a simple batch file so I can call the pandoc script from within Vim. Weeee 🙂
- Note that this is the workflow I created for myself – if you use other referencing software, then some of this might be easier for you. Also, some people successfully use Word to write manuscripts – if you are one of those people, do let me know how you do it without pulling all your hair out (and without manually inserting citations and references)!
Example code for this workflow (minimum working example, if you prefer) available here as a zip file. Program files like Pandoc not included.
Enjoy, and let me know if these tips were helpful in your quest to write more journal papers!
…I can now finally continue with writing the paper. Unless I can find another source of productive procrastination, such as, say, starting a blog?…………
Update – see also: Converting LaTeX to Word – part 3 (Pandoc revisited) and Converting LaTeX to Word – part 2 (LaTeX2RTF)