2013-04-30

"Cleaning" a document prior to editing with MSWord

It really doesn't matter what text editor or wordprocessor the author used to create his manuscript. Before editing I always have to clean it, map it and format it as I like it to appear both on my Kindle and on screen.

My aim is to finish with a document that has no more than three different character/paragraph formats, has no manual page breaks, has numbered chapter headings with an outline level, a table of contents, and page numbers.

To do this, these are the stages I have to go through:

Redundant characters are mostly in the class of characters called "print controls". These are tabs, multiple paragraph marks, new line (and sheet feed), manual page and section breaks.

a. remove the tabs. Manually, this is done using find/replace (ctrl+h). In the search box you enter ^t, and you leave the replace box blank. Then you click replace all.

b. replace any new line or sheet feed characters with paragraph marks (see below). You can enter these characters by accident with certain key presses, and they can creep in when switching formats from other software. I could go into why, but I won't. Also some users have noticed that paragraph styles don't get automatically applied to the paragraphs that terminate with a new line character, and so use them as a sort of workaround. The sort of workaround that you discover you don't need if you read the manual.

find/replace (ctrl+h). In the search box you enter ^l and in the replace box you enter ^p
click replace all

c. replace section and page breaks with paragraph marks.

find/replace (ctrl+h). In the search box you enter ^m and in the replace box you enter ^p
click replace all
find/replace (ctrl+h). In the search box you enter ^b and in the replace box you enter ^p
click replace all

d. remove multiple paragraph marks. The reason for this is explained under "Paragraph Marks" below.

find/replace (ctrl+h). In the search box you enter ^p^p and in the replace box you enter ^p
click replace all
repeat until it stops finding them
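
For anyone who prefers to see the logic outside Word, steps a to d can be sketched on a plain string in Python. This is only a model of the find/replace passes, not something that edits a Word file. It assumes the text has been exported with tabs as "\t", new line characters as "\v", page and section breaks as "\f", and paragraph marks as "\r". The function name clean_print_controls is made up for the example.

```python
# Assumed stand-ins for Word's special characters in exported text.
PARA = "\r"        # plays the role of ^p
LINE_BREAK = "\v"  # plays the role of ^l
PAGE_BREAK = "\f"  # plays the role of ^m and ^b


def clean_print_controls(text: str) -> str:
    """Mimic steps a to d on a plain string (hypothetical helper)."""
    # a. remove the tabs (find ^t, replace with nothing)
    text = text.replace("\t", "")
    # b. new line characters become paragraph marks (find ^l, replace ^p)
    text = text.replace(LINE_BREAK, PARA)
    # c. page and section breaks become paragraph marks
    text = text.replace(PAGE_BREAK, PARA)
    # d. replace ^p^p with ^p, repeating until nothing more is found
    while PARA + PARA in text:
        text = text.replace(PARA + PARA, PARA)
    return text
```

The while loop is the "repeat until it stops finding them" part: one pass only halves a run of paragraph marks, so a run of three or more needs several passes.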


e. reserve italics.

reserving italics is creating a begin block and end block for italics, and putting it around every instance of text in italics. I use the BBCode for italics, so that any text currently in italic will be surrounded by [i] and [/i].

find/replace (ctrl+h). leave the search box blank, but from the "formatting" button select font, and from the dialog box, select "italic". In the replace box put [i]^&[/i]
click replace all
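
The idea behind reserving italics can be sketched in Python. This is an illustration only: it assumes the document has already been reduced to a list of (text, is_italic) pairs, which is a made-up stand-in for Word's formatted text, and the function name reserve_italics is hypothetical.

```python
def reserve_italics(runs):
    """Wrap every italic run in [i]...[/i] and return plain text.

    `runs` is a list of (text, is_italic) pairs, an assumed
    stand-in for the formatted text in the document.
    """
    parts = []
    for text, is_italic in runs:
        # Same effect as replacing italic text with [i]^&[/i]
        parts.append(f"[i]{text}[/i]" if is_italic else text)
    return "".join(parts)


runs = [("She said ", False), ("never", True), (" again.", False)]
print(reserve_italics(runs))  # prints: She said [i]never[/i] again.
```

Once the markers are in the text itself, the italics survive any amount of style clearing.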

f. delete headers and footers.

go to "file/page setup" and open the layout tab (in Word 2010 this is the design tab under Header and Footer tools) and make sure that "different first page" and "different odd and even pages" are unchecked.

Open the header on the first page. Select all (ctrl+a) and press delete. Go to the footer, select all, and press delete. Close the headers and footers (double-click on the page).

g. clear all styles and formatting from the document.

there are a number of ways to do this, but the most sure is as follows:

anything before Word 2007: from the style selector in the formatting toolbar, OR from the menu format/styles and formatting, select EITHER "clear formatting" to remove all styles, formatting and decorations from the text OR "Normal" to apply the default style to the whole document. The former will strip decorations (bold, italic) from the text, but the latter will not.
Word 2007 onwards: from the Home tab select "clear formatting" or "Normal" from the styles section.

Wait until MSWord has finished "repaginating". You can force this by scrolling to the end (ctrl+end) and then waiting for the last page, complete with page number, to appear.

In all versions, you can add a button to the formatting toolbar to clear formatting or to apply Normal style.

If the "Normal" style applied in the document is not your "usual Normal" you will still have to cut and paste the entire text into a new blank Word doc. If you didn't write the original document with the copy of Word on the computer in front of you, I advise you to do this anyway.

At this stage I now have the body text formatted exactly as I want it for editing: Times 11pt, first line indent of 1.27cm, 6pt spaces between paragraphs, 1.5 line spacing, justified. Three stages remain:

h. put the italics back in.

This has to be done in two stages. First we find the marked blocks and italicise them; then we delete the markers themselves.

find/replace (ctrl+h). In the options, check "use wildcards". In the find box type \[i\]*\[\/i\]. In the replace box type ^& and then click "formatting", select font, and select italic.
click replace all

find/replace (ctrl+h). In the options, uncheck "use wildcards". In the find box type [i]. leave the replace box blank, and BE SURE to click "clear formatting" while the cursor is in the replace box.
click replace all


find/replace (ctrl+h). In the options, uncheck "use wildcards". In the find box type [/i]. leave the replace box blank, and BE SURE to click "clear formatting" while the cursor is in the replace box.
click replace all
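
The same two stages can be sketched in Python with a regular expression. This is a model of the idea, not of Word itself: it assumes the text is a plain string containing [i] and [/i] markers, and it returns (text, is_italic) pairs. Both the function name restore_italics and that pair format are made up for the example.

```python
import re

# Non-greedy match, so each [i] pairs with the nearest [/i].
# It plays the role of the wildcard search \[i\]*\[\/i\].
ITALIC_BLOCK = re.compile(r"\[i\](.*?)\[/i\]", re.DOTALL)


def restore_italics(text):
    """Split marked text into (text, is_italic) runs, dropping the markers."""
    runs = []
    pos = 0
    for match in ITALIC_BLOCK.finditer(text):
        # Anything between the previous block and this one is not italic.
        if match.start() > pos:
            runs.append((text[pos:match.start()], False))
        # Stage one: the marked block is italic.
        # Stage two: keeping only group(1) discards [i] and [/i].
        runs.append((match.group(1), True))
        pos = match.end()
    if pos < len(text):
        runs.append((text[pos:], False))
    return runs


print(restore_italics("She said [i]never[/i] again."))
# prints: [('She said ', False), ('never', True), (' again.', False)]
```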

i. style the chapter headings.

Not all authors even put chapter headings. If there's nothing there but a manual page break, I will have numbered them before I got rid of the manual page breaks! In that case it's dead easy, I do a find replace for "paragraph-mark any-digits paragraph-mark" and apply my custom chapterheading style. In any case I usually have to do this manually. The chapter heading style has two critical elements: Outline Level 2 (so that it can be mapped for quick navigation and the table of contents can be inserted),  AND Page Break Before: this is found in the paragraph properties. In Word 2003 you go format/paragraph/line and page breaks, and check the box "page break before" when defining the style. I don't know how you do it in Word 2010.
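
The search for numbered chapter headings can be sketched in Python. This is a rough model under stated assumptions: the text is a plain string with "\r" standing in for the paragraph mark, and a chapter heading is any paragraph made up of digits and nothing else. The function name find_chapter_numbers is made up for the example.

```python
import re


def find_chapter_numbers(text, para="\r"):
    """Return (paragraph index, chapter number) for each digits-only paragraph."""
    headings = []
    for index, paragraph in enumerate(text.split(para)):
        # A heading here is a paragraph containing digits and nothing else.
        if re.fullmatch(r"[0-9]+", paragraph):
            headings.append((index, int(paragraph)))
    return headings


sample = "1\rIt was a dark night.\r2\rIn 1984 nothing happened.\r"
print(find_chapter_numbers(sample))  # prints: [(0, 1), (2, 2)]
```

Digits inside an ordinary sentence, like the year in the sample, are ignored because the whole paragraph has to match.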


j. finally I insert a table of contents at the top.

Paragraph Marks

MSWord uses the traditional symbol ¶ to indicate that a paragraph has been terminated. Internally, Word treats the paragraph mark as a single special character (a carriage return; the infamous CRLF pair only appears when the text is saved as plain text on Windows). In order to use Word as it is intended, you need to know that this symbol, visible when you click the "Show/Hide ¶" button on the toolbar, exists to tell the wordprocessor that a paragraph has been ended. It also tells it that all style and format applied to the ¶ should be applied to the whole paragraph.

(In MSWord, styles can apply either to characters or to paragraphs. You will not normally need to know about styles that apply only to characters - unless something goes wrong!)

The ¶ is inserted whenever you press enter on your keyboard. So whenever you press enter, you are telling MSWord that you have just finished a paragraph. THIS IS WHY you should not press enter multiple times to create space.

Space between paragraphs is achieved through the "spacing before" and "spacing after" options in the paragraph properties (format/paragraph). In the same way, tabs do not exist so that you can indent the first line of a paragraph: first line indents are set from the same paragraph properties dialog.

There are good reasons for this, the main one being that (as you will soon discover), it is very easy to change the layout and format if there are no unnecessary print controls - like multiple ¶s or manual page breaks.

What about laying out my title page?

Lots of people use multiple ¶s to space out the text on the title page. This is exactly the wrong way to space out the text. Text with big spaces between should be done using "text boxes". Text boxes are drawing objects which act like mini documents within a document, so they can be formatted individually and moved around as needed without affecting the rest of the document.



