Extracting Images from Microsoft Word Files

An MS Word document can contain all kinds of information: tables and diagrams, pictures and formatting elements, formulas and links. Sometimes, however, you need to extract an image from a document and save it separately. Such images may be screenshots or scanned photos inserted directly into the word processor. Today we present two and a half ways to extract images from documents. We don't consider taking a screenshot and pasting it into a graphics editor, since this is time-consuming and can considerably distort the original image or reduce its quality.

The first way works in both new and older versions of Microsoft Office: save the document as a web page (*.htm, *.html). Before saving, check which image compression is applied (Tools > Compress Pictures…): clear the Automatically perform basic compression on save check box and choose an appropriate output quality.

The directory where the page is saved will contain a folder named after the saved file plus ".files". All extracted images are placed in this folder, together with several supporting files.

Note that Office 2010 added the ability to save images at their original resolution.

In our view, the second way is more elegant, and it does not require Microsoft Office to be installed on the computer. However, the document must be saved in the Office 2007 (or newer) format, with the .docx extension. Such a file can be produced not only by Office 2007 (version 12) but also by Office XP or 2003 with a special converter. In fact, an Office 2007 document is a ZIP archive containing several supporting files and the user's images. We opened the folder with documents in the new format using the WinRAR x64 3.93 archiver. Below you can see information about one of the files in that folder.
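Because a .docx file is an ordinary ZIP archive, its contents can also be inspected without any archiver GUI. The following is a minimal sketch using Python's standard zipfile module; the function name list_docx_parts and the script usage are our own illustration, not part of any Office tool.

```python
import sys
import zipfile


def list_docx_parts(path):
    """Return the names of all parts stored in a .docx package."""
    # A .docx document is a plain ZIP archive, so zipfile can open it
    # directly; Microsoft Office does not need to be installed.
    if not zipfile.is_zipfile(path):
        raise ValueError(f"{path} is not a ZIP-based (Office 2007+) document")
    with zipfile.ZipFile(path) as package:
        return package.namelist()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python list_parts.py report.docx
    for name in list_docx_parts(sys.argv[1]):
        print(name)
```

An older binary .doc file is not a ZIP archive, which is why the sketch checks is_zipfile first and rejects such files.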

We extracted the files from the archive using the Extract… button on the archiver toolbar. After unpacking, go to the word\media directory inside the destination folder; the extracted images are placed there. Note that cropping is not applied to the extracted files: if an image was cropped in Microsoft Word, it will be full-sized after extraction, so the trimmed edges will still be present.
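The same extraction can be scripted. This is a minimal sketch, assuming the images live under word/media/ as described above; extract_docx_images is a hypothetical helper name. It writes each media file exactly as stored in the package, so, as with WinRAR, cropped images come out uncropped.

```python
import os
import zipfile


def extract_docx_images(docx_path, out_dir):
    """Copy every file under word/media/ in a .docx into out_dir.

    Returns the list of written file paths.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    with zipfile.ZipFile(docx_path) as package:
        for name in package.namelist():
            # Names inside a ZIP always use "/" as the separator.
            if not name.startswith("word/media/") or name.endswith("/"):
                continue
            # Keep only the base name so no entry can escape out_dir.
            target = os.path.join(out_dir, os.path.basename(name))
            with package.open(name) as src, open(target, "wb") as dst:
                dst.write(src.read())
            written.append(target)
    return written
```

For example, extract_docx_images("report.docx", "report_images") would place image1.png, image2.jpeg and so on into report_images (the file names here are only illustrative).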

If no dedicated archiver is installed on the PC, you can change the file extension from docx to zip and open the file with the built-in Windows support for compressed (zipped) folders.

Our third way builds on the second one: it also uses the WinRAR archiver to process the document, but indirectly. We installed the Far 2.0 x64 file manager (build 1420) and linked it to the preinstalled WinRAR for handling archives. This link can be set up with the following registry file:

"Extract"="winrar x {-p%%P} {-ap%%R} -y -c- {%%S} %%A @%%LNM"
"ExtractWithoutPath"="winrar e -av- {-p%%P} -y -c- {%%S} %%A @%%LNM"
"Test"="winrar t -y {-p%%P} -c- {%%S} %%A @%%LNM"
"Delete"="winrar d -y {-w%%W} {%%S} %%A @%%LNM"
"Comment"="winrar c -y {-w%%W} {%%S} %%A"
"CommentFiles"="winrar cf -y {-w%%W} {%%S} %%A {@%%LNM}"
"SFX"="winrar s -y {%%S} %%A"
"Lock"="winrar k -y {%%S} %%A"
"Protect"="winrar rr -y {%%S} %%A"
"Recover"="winrar r -y {%%S} %%A"
"Add"="winrar a -y {-p%%P} {-ap%%R} {-w%%W} {%%S} %%A @%%LN"
"Move"="winrar m -y {-p%%P} {-ap%%R} {-w%%W} {%%S} %%A @%%LN"
"AddRecurse"="winrar a -r0 -y {-p%%P} {-ap%%R} {-w%%W} {%%S} %%A @%%LN"
"MoveRecurse"="winrar m -r0 -y {-p%%P} {-ap%%R} {-w%%W} {%%S} %%A @%%LN"

Now Far can browse docx documents just like ordinary folders: you can view the files inside them and copy data out into ordinary unpacked folders. You can also open a file in its associated application by simply pressing Enter.

However, simply adding a graphics file to the archive won't do the trick: when you try to open the document, Microsoft Word will report an error. To add an image this way, you would need to edit the XML files that describe the document's structure, but that is a whole other story.
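To get a feel for that structure, the read-only sketch below lists which relationship IDs in word/_rels/document.xml.rels point to image parts. The namespace and relationship-type URIs are the standard OOXML ones; the helper name image_relationships is our own. Actually inserting an image would, among other things, also require matching markup in word/document.xml, which this sketch does not attempt.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile

# Standard OOXML package-relationship namespace and image relationship type.
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
IMAGE_TYPE = ("http://schemas.openxmlformats.org/"
              "officeDocument/2006/relationships/image")


def image_relationships(docx_path):
    """Map relationship IDs (e.g. "rId5") to image part names in a .docx."""
    with zipfile.ZipFile(docx_path) as package:
        root = ET.fromstring(package.read("word/_rels/document.xml.rels"))
    result = {}
    for rel in root.iter(REL_NS + "Relationship"):
        # Skip non-image relationships and externally linked pictures.
        if rel.get("Type") != IMAGE_TYPE or rel.get("TargetMode") == "External":
            continue
        target = rel.get("Target")
        if target.startswith("/"):
            # Absolute target: already relative to the package root.
            part = target.lstrip("/")
        else:
            # Relative target: resolved against the word/ folder.
            part = posixpath.normpath(posixpath.join("word", target))
        result[rel.get("Id")] = part
    return result
```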

This concludes our brief review of several ways to extract users' images from MS Office documents. We would only add that the move to an open document format in Office 2007 makes supporting it much easier and more transparent for third-party developers.
