Hack 52. View Microsoft Word Documents in a Terminal
Avoid the load time of OpenOffice.org and view Microsoft Word documents in a terminal.
The simplest way to view a Microsoft Word document in a terminal is to use the catdoc command. But catdoc turns a Word document into plain text, which does little or nothing to preserve the format of the original Word document. Obviously, it's nearly impossible to view a Word document in a terminal exactly the way it would look in Word. Heck, competing word processors have trouble importing Word documents without upsetting the format, and they have the advantage of being graphical desktop applications. But this hack is still a vast improvement over the popular catdoc program, because it preserves at least some of the formatting of the original document by converting the Word document to HTML.
You'll need both the wvWare set of file conversion utilities and the hybrid web browser/pager w3m, along with a little scripting magic to view Word documents in a terminal or console while retaining at least some of the original formatting.
7.5.1. wv, the All-Purpose Word Converter
There is a way to retain at least some of the original formatting while printing the document to the screen. For this, you need a set of utilities under the name of wvWare. You can find the home page for wvWare at http://wvware.sourceforge.net. Packages of wvWare are readily available for almost all Linux distributions, although the package name is usually just wv. For example, if you don't already have it installed on your system, you can install wv in Debian Linux with this command:
# apt-get install wv
Users of the yum package manager can get the RPM version of wv with this command:
# yum install wv
7.5.2. w3m, the All-Purpose Web Browser/Pager
That's not all you need for this hack. You also need a popular pager/browser called w3m. Packages of w3m should be available for most Linux distributions, and the package name is usually w3m. For example, you can install w3m in Debian Linux with this command:
# apt-get install w3m
Users of the yum package manager can get the RPM version of w3m with:
# yum install w3m
The w3m program is unusual in that it is a web browser that works like a pager; that is, you can pipe text into w3m and use w3m to simply page back and forth through the text. Some versions of w3m even render graphics in a frame-buffer console without having an X Window System desktop running.
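As a quick way to see this pager behavior outside of the hack itself, the following minimal sketch pipes a scrap of HTML into w3m. It relies on w3m's -dump option, which prints the rendered page to standard output instead of opening the interactive viewer. The function name render_html and the sample markup are made up for illustration.

```shell
#!/bin/bash
# render_html is a hypothetical helper name, not part of w3m.
# It reads HTML on standard input and prints w3m's plain-text rendering.
render_html() {
    w3m -dump -T text/html
}

# Interactive alternative: drop -dump to page through the result instead, e.g.
#   printf '<h1>Title</h1><p>Some text.</p>' | w3m -T text/html
```

For example, printf '<p>hello</p>' | render_html should print the paragraph text without any markup, assuming w3m is installed.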
You can combine the two utilities to get the desired result of viewing a Word document in a terminal. Use wvWare to convert a Microsoft Word document to HTML format, and then pipe the output into the w3m pager to view it. Here's the full command you need to make it work (this command assumes wvHtml.xml is stored in the /usr/lib/wv directory, which might not be the case on your Linux system):
$ wvWare -x /usr/lib/wv/wvHtml.xml document.doc | w3m -T text/html
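Because the location of wvHtml.xml varies between systems, it can help to search for it rather than guess. The following is a minimal sketch, not something shipped with wv: find_wv_config is a hypothetical helper, and the candidate directories in the usage comment other than /usr/lib/wv are assumptions you should adjust for your distribution.

```shell
#!/bin/bash
# Hypothetical helper: print the path of the first wvHtml.xml found in
# the directories given as arguments; return non-zero if none has it.
find_wv_config() {
    for wv_dir in "$@"; do
        if [ -f "$wv_dir/wvHtml.xml" ]; then
            printf '%s\n' "$wv_dir/wvHtml.xml"
            return 0
        fi
    done
    return 1
}

# Example use (the directory list is a guess, not taken from wv itself):
#   config=$(find_wv_config /usr/lib/wv /usr/share/wv /usr/local/share/wv) &&
#       wvWare -x "$config" document.doc | w3m -T text/html
```

The resulting path can then be passed to wvWare -x in place of the hardcoded one.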
That's a lot of typing every time you want to view a Word document, so turn it into a script called viewdoc to make it easier to use in the future. Log in as root and use your favorite editor to create the following script:
#!/bin/bash
wvWare -x /usr/lib/wv/wvHtml.xml "$1" 2>/dev/null | w3m -T text/html
Note two small additions. The quotes around $1 keep filenames that contain spaces intact, and 2>/dev/null simply redirects any error messages to the twilight zone so that they do not interfere with the presentation of the Word document. Store it as /usr/local/bin/viewdoc and make the script executable with this command:
# chmod +x /usr/local/bin/viewdoc
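If you would like the script to complain when it is called without a readable file, a slightly more defensive variant might look like the sketch below. This is one possible elaboration, not the script from this hack: the WV_CONFIG environment variable is a hypothetical name introduced here (wv does not define it), and the exit codes are arbitrary choices.

```shell
#!/bin/bash
# Sketch of a more defensive viewdoc, written as a function.
# WV_CONFIG is a hypothetical override for the wvHtml.xml location.
viewdoc() {
    wv_config="${WV_CONFIG:-/usr/lib/wv/wvHtml.xml}"

    # Require exactly one argument: the Word document to view.
    if [ "$#" -ne 1 ]; then
        echo "Usage: viewdoc file.doc" >&2
        return 2
    fi

    # Fail early if the document is missing or unreadable.
    if [ ! -r "$1" ]; then
        echo "viewdoc: cannot read '$1'" >&2
        return 1
    fi

    # Same pipeline as before: convert to HTML, then page it with w3m.
    wvWare -x "$wv_config" "$1" 2>/dev/null | w3m -T text/html
}

# When saved as a script, finish the file with this line:
#   viewdoc "$@"
```

The checks run before wvWare is invoked, so a mistyped filename produces a short message instead of an empty w3m screen.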
Now all you have to do to view a Word document in a text console or terminal is issue this command:
$ viewdoc document.doc
Not only does this technique preserve at least some of the formatting of a Word document, but hyperlinks are also live: you can activate them to visit the URL from within the w3m viewer you're using to view the document. Figure 7-3 shows an example of a Word document viewed with w3m. Note both the bold headings and the live link to http://www.bootsplash.de/files.
Figure 7-3. A Word document viewed in HTML text format