If all goes well, this will be part one of however-many-it-takes posts on tricks for easy ebook formatting. These are not the be-all or end-all, but they’re the various methods I’ve developed to format books. I’ll be doing explanations of each part as I have time, and once I’ve explained all the tricks, I’ll pull them together into a cohesive step-by-step guide. These posts all assume a basic knowledge of HTML and CSS.
*Note: as some have pointed out in the comments, the Mac version of Word is often a different beast.*
How to Convert A Manuscript to HTML with Search & Replace
Software you need to do this:
- Microsoft Word
- Windows Notepad (or a non-Windows version of a basic text editor)
Note: this is not a lesson on creating an entire ebook. This is just how to turn your manuscript into HTML. The next lesson will be how to turn that HTML into an epub.
I’m not really exaggerating when I say I format an entire manuscript via Search & Replace. When we’ve finished edits on a book, I open the file, copy/paste, run a Word macro I’ve created & end up with the converted HTML of the book itself. It takes me about 30 seconds. This does not involve any frontmatter or backmatter (which are very important and their own separate posts) but at the end of the day, you’re not really going anywhere until you have a book.
There are many good reasons to rely on cleanly converted HTML code instead of the code generated by a program like calibre or Word itself. The two biggest:
- Possible Display Errors: the more complicated your code, the more likely it will break somewhere down the line. People read on a huge array of ereaders, phones, tablets, computers and other devices. Every format, platform, operating system and gadget reads and displays code in a slightly different way. The more work you make them do, the more likely they’re going to mess up.
- Delivery Fees: complicated code is bulky code. If you’re self-publishing on Amazon via KDP, the 70% royalty option is really 70% – delivery fees. You can find the list of delivery fees on Amazon, but as of the time of this posting they are:
- Amazon.com: US $0.15/MB
- India on Amazon.com: INR ₹7/MB
- Amazon CA: CAD $0.15/MB
- Brazil: BRL R$0.30/MB
- Amazon.co.uk: UK £0.10/MB
- Amazon.de: €0,12/MB
- Amazon.fr: €0,12/MB
- Amazon.es: €0,12/MB
- Amazon.it: €0,12/MB
- Amazon.co.jp: JPY1/MB for files less than 10MB, no charge for files equal to or over 10MB
- Amazon.com.mx: MXN $1/MB
- Amazon.com.au: AUD $0.15/MB
It may seem like 15 cents isn’t a lot, but if you sell 10,000 books, that 15 cents is $1,500. Books with a lot of fancy images (like a different header image for every chapter) can quickly scale up to 30 or 50 cents per book.
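To make that arithmetic concrete, here is a minimal Python sketch. The function name and the flat rate-times-size model are my own assumptions for illustration; Amazon's actual calculation may round file sizes differently.

```python
def delivery_cost(rate_per_mb, file_size_mb, copies_sold):
    """Total delivery fees: rate per MB x file size in MB x copies sold."""
    return rate_per_mb * file_size_mb * copies_sold


# A 1 MB book at the US rate of $0.15/MB, sold 10,000 times.
print(round(delivery_cost(0.15, 1.0, 10_000), 2))

# The same book at 2 MB because of extra images.
print(round(delivery_cost(0.15, 2.0, 10_000), 2))
```

Doubling the file size doubles the fee, which is why lean code and small images matter.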
So, you’ve decided you want to turn your manuscript into HTML by yourself. There are several different considerations to tackle, and I’m going to handle them one by one.
CHAPTER HEADERS
I format books in Sigil. Sigil will automatically generate a table of contents and .NCX file (the navigation file that tells ereaders, computers & gadgets where everything in your ebook is). To decide where your headers are, Sigil looks for header tags:
<h1>Chapter One</h1>
So how does Search & Replace help with this? When I’m going through my manuscript one last time to prep it for formatting, I make sure all of my headers look like this:
[Chapter One]
It’s an easy thing to do while you’re writing, or even after you’re finished. Once they’re in place, though, you can use Search & Replace to put anything you want around your chapter headers.
As later lessons will show, I use this for more than just basic header tags. It allows me to tweak my header class, insert page breaks before my new chapters or images after. The first step of my macro puts a Sigil split marker before the header (telling Sigil it can break this chapter into its own section) as well as putting the header tags around my title.
If you have subheaders (for example, a chapter header and a smaller one indicating POV) you can use different symbols. In some instances, I’ve used [ ] wherever I want to replace something with <h1> and { } wherever I want to replace it with <h2>.
Note: obviously, if you replace [ with <h1> or { with <h2>, you need to do the opposite as well. ] with </h1> and } with </h2>.
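If you want to see the same swap outside of Word, here is a rough Python sketch. It is not my macro, just an illustration; the function name and the sample POV header are made up, and it assumes (like the Word version) that brackets and braces appear nowhere else in the manuscript.

```python
def tag_headers(text):
    """Swap bracket markers for header tags, one plain replacement at a time."""
    replacements = [
        ("[", "<h1>"), ("]", "</h1>"),  # chapter headers
        ("{", "<h2>"), ("}", "</h2>"),  # subheaders, e.g. a POV line
    ]
    for find, replace in replacements:
        text = text.replace(find, replace)
    return text


print(tag_headers("[Chapter One]\n{Her Point of View}"))
```

Each pair in the list is one Search & Replace pass, so adding a third marker style is just another two entries.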
SCENE BREAKS
We like to use images for scene breaks. It’s something I like because it’s pretty, it doesn’t take up a ton of space, and it doesn’t often backfire. Some methods of adding extra blank space can be stripped out by less savvy reading apps and devices. There are certainly many ways around this, but if you’re not super savvy at CSS and HTML, a scene break image isn’t a terrible way to go.
While writing our books, we indicate scene breaks with a simple:
#
When formatting, I search for # and replace it with the code for our dividing image. In Sigil, you can add an image to your epub file and get the relative path to it easily. If you’re simply creating an HTML file to test your formatting, make sure your image is in the same folder where you save your .html file and refer to it simply as “image.gif” (or .jpg or…)
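As a sketch of that replacement in Python: this version only touches lines that contain nothing but the # marker, which is a little more cautious than a blanket Search & Replace. The function name and the alt text are my own placeholders; "image.gif" is the stand-in filename from above.

```python
def replace_scene_breaks(text, image_tag='<img src="image.gif" alt="scene break" />'):
    """Replace any line that is only a # marker with the divider image tag."""
    lines = text.split("\n")
    return "\n".join(image_tag if line.strip() == "#" else line for line in lines)


print(replace_scene_breaks("She left.\n#\nMorning came."))
```

A # that appears in the middle of a sentence is left alone, so you do not end up with a divider image inside your dialogue.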
ITALICS
Now we’re getting into something fun.
The key to this trick is knowing that you can put ^& into a REPLACE box as a placeholder for anything that was found during the search portion of Search & Replace. The second key is knowing that you can put your cursor into the FIND box and hit CTRL+I and it will search for italics.
So the steps to search & replace italics are:
- Open the Search & Replace box.
- Put the cursor in the Find What box and hit CTRL+I. This should make Font: Italic show up beneath the box.
- Put <em>^&</em> (or <i>^&</i> or your preferred method of emphasizing text) in the Replace With box.
- Replace!
IMPORTANT NOTE: this is the sort of situation where a computer not being smarter than a human can come back to bite you if you’re not careful. We’ve all written something in italics, hit return, started typing the next paragraph and realized italics was still on. You can turn it off and keep going, but if you do that–and I can’t stress this enough–Word thinks the italics ends at the start of the next paragraph.
This results in situations like this:
She shook her head. <em>I guess I’ll just have to do this the hard way.
</em>The software silently agreed.
This situation is not terrible, unless you drop your code into something like Sigil and let it “fix” your code on its own. It will probably look at that stray </em> tag and decide you didn’t mean for there to be italics there at all, or–worst case scenario–it could decide you wanted everything that follows to be in italics because it just ate the stray </em> and can’t remember that it was ever there. Those italics could cover entire chapters until it finds another </em> and cuts it out.
(Computers are only as smart as we tell them to be.)
The other problem is multiple lines of italics, which end up looking like this:
<em>This isn’t the greatest example in the world.
This is just a tribute.</em>
The former problem can be fixed by searching for ^p</em> and replacing it with </em>^p which takes away the paragraph break before the tag and puts it afterwards. (^p is the next lesson.) The latter…well, you’ll just have to be smart. (And, if you use Sigil, manually fix the errors!)
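Here is a small Python sketch of both checks, assuming the text has been exported so that each paragraph break is a plain newline. The function names are hypothetical. The first mirrors the ^p</em> swap; the second only reports italics that span more than one line so you can fix them by hand.

```python
import re


def fix_trailing_italics(html):
    """Move a closing </em> that landed after a paragraph break back before it."""
    return html.replace("\n</em>", "</em>\n")


def find_multiline_italics(html):
    """Return the contents of every <em>...</em> span that still crosses a line."""
    spans = re.findall(r"<em>(.*?)</em>", html, flags=re.DOTALL)
    return [span for span in spans if "\n" in span]


broken = "She shook her head. <em>The hard way.\n</em>The software agreed."
print(fix_trailing_italics(broken))
print(find_multiline_italics("<em>Line one.\nLine two.</em>"))
```

Run the fix first, then the finder, so the only spans it reports are the genuinely multi-paragraph ones.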
I know that the WARNING, DANGER section of this part is longer than the trick, but by paying attention to the ways these tricks can fail, you’re a lot more likely to survive using them.
PARAGRAPH TAGS
This is, arguably, one of the most basic steps, but I put it after italics for a reason–if you have funny italics, they’re a lot easier to fix before you start messing with your paragraphs.
The two most important codes you need to know for this step are ^p and ^l. Those are the codes for a paragraph and a line break, respectively. Hopefully most of you know which one you use when you’re writing. If you don’t, the quickest way to find out is to turn on the formatting marks in Word, which makes every paragraph mark and line break visible in your document.
Most manuscripts I’ve seen fall into one of four categories:
- First line indented manually, with paragraph breaks.
- First line indented automatically, with paragraph breaks.
- No indents, two line breaks between paragraphs.
- No indents, two paragraph breaks between paragraphs.
Undoubtedly, there are a million more variations. Whatever you have separating your paragraphs, the general idea with this trick is that you want to replace that with </p>^l^l<p>. And no, I did not put those out of order. Consider the following:
This is paragraph one.^p
This is paragraph two.^p
This is paragraph three.^p
Replacing ^p with </p>^l^l<p> gives us
This is paragraph one.</p>^l
^l
<p>This is paragraph two.</p>^l
^l
<p>This is paragraph three.</p>^l
^l
<p>
This effectively puts </p> at the end of every paragraph, inserts two line breaks, and puts <p> at the start of your next paragraph. This will work throughout your entire manuscript. BUT! It will not put the first <p> before your first paragraph, and it will leave an extra one at the end. So you’ll have to fix the start & end…but everything in the middle should work just fine.
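For anyone who thinks better in code, here is the same idea as a Python sketch, with a plain newline standing in for ^p and for ^l. The function name is made up, and the start and end patches are just one way of doing the manual cleanup described above.

```python
def wrap_paragraphs(text):
    """Replace each paragraph break with </p>, two line breaks, and <p>,
    then patch the first and last paragraphs."""
    body = text.replace("\n", "</p>\n\n<p>")
    body = "<p>" + body                       # the first paragraph never got its <p>
    if body.endswith("<p>"):                  # a trailing break left a dangling <p>
        body = body[:-len("<p>")].rstrip("\n")
    else:                                     # no trailing break: close the last paragraph
        body += "</p>"
    return body


manuscript = "This is paragraph one.\nThis is paragraph two.\nThis is paragraph three.\n"
print(wrap_paragraphs(manuscript))
```

The single replace in the middle does all the real work; the rest is only there to tidy the two ends.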
Sometimes it takes a little finessing to figure out what the exact magic substitution is for YOUR manuscript, but you can change just about anything. Some important search/replace codes to know for this section:
- Paragraph: ^p
- Line Break: ^l
- Tab: ^t
- Manual Page Break: ^m
You can find a huge selection of codes here.
SPECIAL CHARACTERS
This is apparently a controversial topic. I googled to see if anyone had a definitive guide to best practices for ebook formatting and my browser blew up with drama.
Suffice to say–not all devices can necessarily recognize the fancy smart-quotes and beautiful em-dashes that Word kindly auto-formats for us. And the less standard your character, the more likely you are to have a problem. (Says the woman who somehow uses façade once per book.)
There are ways to get around this. You can remove them completely and replace them with the low tech versions–straight quotes and double dashes–but that won’t help you with your tildes and your umlauts. The best way to make sure everything displays properly is to replace your special characters with the HTML entity. Some very common ones:
- Left/Right Double Quotes: &ldquo; (ld = left double, quo = quote) &rdquo; (rd = right double, etc.)
- Left/Right Single Quotes: &lsquo; (ls = left single) &rsquo; (…etc)
- Em-dash & Ellipsis: &mdash; and &hellip;
When searching & replacing, it can be easiest to find the special character somewhere (like the link above) and copy it into the FIND box. Then type the HTML entity into the REPLACE WITH box.
Honestly, search/replacing every possible code every time would be prohibitive. And this step is where macros truly become vital. If you sit down once with a list of every special character you might possibly use, you can use Word’s very simple ability to record a macro. What does that mean, practically speaking? Once you’ve got the macro recording, it will keep track of every single thing you run a Search/Replace for. (Even if that thing doesn’t exist in the manuscript.) Then, the next time you sit down to replace special characters, you can run the macro and it will repeat everything you just did.
I have macros for all of these steps, and one macro I can run that runs all of the macros. It took me a little time to finesse all of these steps to work exactly the way I wanted them, but in the end? I can turn out clean HTML in a matter of seconds.
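A recorded macro is really just a list of replacements run in order. As an illustration only (this is a Python sketch, not my Word macro, and the table and function names are hypothetical), the special-character pass could look like this:

```python
# Each pair is one Search & Replace: the character to find, the entity to insert.
ENTITIES = [
    ("\u201c", "&ldquo;"),   # left double quote
    ("\u201d", "&rdquo;"),   # right double quote
    ("\u2018", "&lsquo;"),   # left single quote
    ("\u2019", "&rsquo;"),   # right single quote / apostrophe
    ("\u2014", "&mdash;"),   # em-dash
    ("\u2026", "&hellip;"),  # ellipsis
    ("\u00e7", "&ccedil;"),  # c with cedilla, as in facade
]


def replace_special_characters(text, table=ENTITIES):
    """Run every replacement in the table, whether or not the character appears."""
    for character, entity in table:
        text = text.replace(character, entity)
    return text


print(replace_special_characters("\u201cWait\u2026\u201d she said."))
```

Like the macro, it runs every replacement even when the character is not in the manuscript, so the table can be as long as you like.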
CLEANING UP
There are always some loose ends hanging around. Maybe I have places where I had extra paragraph breaks, so I have <p></p> scattered throughout my manuscript. Well, that’s easy to fix. I just want those to go away, so I can run a Search & Replace to find <p></p> and replace it with nothing. (Yes! That’s an option!)
Well, okay, but if I replace <p></p> with nothing, I might have a ton of line breaks all in a row. All that blank space won’t hurt my HTML, but it’s not very pretty. If I want to decrease my extra line breaks, I can run a search for ^l^l^l and replace it with ^l^l. If there are still extra spaces, I can do it again until I can’t find any groups of three line breaks.
(Do not replace ^l^l with ^l unless you want NO space between your paragraphs.)
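The "do it again until I can’t find any" step is a simple loop. A minimal Python sketch, again with newlines standing in for ^l and a made-up function name:

```python
def tidy(html):
    """Drop empty paragraphs, then squeeze runs of blank lines down to two breaks."""
    html = html.replace("<p></p>", "")
    while "\n\n\n" in html:                 # keep going until no group of three is left
        html = html.replace("\n\n\n", "\n\n")
    return html


print(tidy("<p>One.</p>\n\n<p></p>\n\n<p>Two.</p>"))
```

The loop stops at two line breaks, so the space between paragraphs survives.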
Very last of all, I copy everything I’ve just generated and paste it into a very basic text editor. Notepad is a good one. Really, you don’t want anything more complicated. Saving it in plain text makes sure we’ve ditched any odd Word formatting that might be clinging to it–and trust me. Nothing is more likely to go bad on you than Word formatting.
At this stage, I usually put <html> at the top and </html> at the end, save it as testbook.html and open it up in a browser to see how things are looking. You’ll be able to tell straight off if your chapter headers are actual headers, if your italics look okay–and if not, where they broke–and if all of your special characters are as special as they should be. (And if your scene divider image is in the right directory, you should see it, too!)
There are many things I do once the book is in Sigil to make it prettier. But this very first step is about making the code pretty, and if you’ve gotten this far and made it out alive, your code is probably looking pretty good.
THE BIGGEST SECRET IS
That you can automate just about anything if you think about a way to let the computer know where it is. Word can search for more than italics. It can search for different font faces, different font sizes… if you have different character POVs and want them to display slightly differently, you could use different fonts in your manuscript and find a way to make that work.
The only limit to Search & Replace is how creative you are when it comes to bossing computers around.
QUESTIONS?
Feel free to ask them. I will do my best to answer, or point you to someone who can. 🙂
NEXT UP…
Putting this manuscript into SIGIL! (I love it.)
This is really cool, Bree!! I love it.
One thing: Word won’t search and replace for italics on a mac – only on a pc. Which is totally weird.. I usually send my file to a pal with those instructions. I’m going to add your warning search replace so my pal can get rid of word thinking italics ends at the start of a new para – usually I pick those out by validating and finding errors, and it takes forever.
I’m excited for part two, your Sigil tips.
You know, I have Apple Scripts that automate most of this. Remind me and I will link them.
I did not know that about the Mac version, but an apple script would be awesome! I’m trying to figure out if there’s an easy way to explain how to import macros, as I have mine available, but of course the method seems to change with every version of Word.
I mention that because you can’t record macros on the Mac version of Word.
I never knew what macros were, but I guess it doesn’t matter with me on my mac! But it’s okay, because what I do is I use textmate to do the quotes, ampersands, em-dashes and all that (following the teaching of Guido Henkel) which handles it in a click within the text editor. Now I’m checking ALL THE BOXES so I don’t miss your Sigil bit.
This is so awesome, Bree! I’ve read it once and feel a bit overwhelmed, so I’ll reread it until I decide if I’m going to tackle learning formatting on my own, rather than hire someone. (I also have a Mac, so there’s that.) Can’t wait to read the next segment!
Bree, don’t you want to do an online skype class or something? I’ll pay or donate or send brownies. Seriously, this is so helpful. Thank you!!!!
Oooh, nice! I have my macros in both PC and Mac since I work on both. I don’t record on my Mac, though. I input the code directly instead.
I wrote a blog post a while back giving the code for a macro that would highlight naughty words (just, back, really, passive verbs, etc). I kind of want to see your actual macro code – I bet Mac users could create macros by duplicating the code directly instead of deal
I would love to add a link to your blog post! And I considered posting my macro, but I was dithering over whether or not I should try to find a helpful way to explain how to import it for people who aren’t comfortable when the VB window pops up. It seems like there SHOULD be an easier way to import…but it’s Microsoft, so maybe not? LOL
But I’d be happy to share it with you!
Oh yes! I would love to see the actual code to figure out the build. IF you don’t want to share so openly, feel free to email me at elliston.kl at gmail. And this is my post about macros with instructions and the code for highlighting a list of words. http://www.kristinleighelliston.com/?p=403
Thanks so much for sharing what you know!
Stupid phone…instead of dealing with recording. End of post for real this time. 😉
This is great, Bree! I’m bookmarking it and will be coming here a lot as I learn to format e-books.
Thanks!