Export HTML content from Google Docs to your website or web application: key points


I want to show you how to export any Google Docs document as an HTML page and then reuse that content, for example in a web viewer, as I do with the books on my platform.

1. The Problem with HTML Generated by Google Docs

When we download a document with the Google Docs option "Web page (.html, compressed)", the generated HTML file has a serious drawback:

  • Inline Styles: The export is one massive HTML page full of classes and inline style attributes. For example, it sets font-style and font-family on individual elements and repeatedly uses display: inline-block.
  • Style Conflict: We don't want to inherit these predefined styles on our website, because they would override the design we have already defined. The goal is for all styling to come from our own application.
This is a fragment of the markup Google Docs generates around a single image:

...order: 0.00px solid #000000; transform: rotate(0.00rad) translateZ(0px); -webkit-transform: rotate(0.00rad) translateZ(0px); width: 566.93px; height: 316.85px;">
<img alt="" src="images/image83.png" style="width: 566.93px; height: 316.85px; margin-left: -0.00px; margin-top: -0.00px; transform: rotate(0.00rad) translateZ(0px); -webkit-transform: rotate(0.00rad) translateZ(0px);" title=""></span></p>
<p class="c14 c12"><span class="c0"></span></p>
<p ...
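To see how much presentation is packed into a single element, a small helper can split a style attribute into its individual declarations. This is a minimal sketch, assuming declarations are separated by ";" and that no value itself contains a semicolon (a url(data:...) value would break that assumption); parseInlineStyle is a hypothetical name, not part of any library.

```javascript
// Hypothetical helper: split an inline style string into a property-to-value map.
// Assumes no ';' appears inside a value, which holds for the Google Docs fragment.
function parseInlineStyle(styleText) {
  const declarations = {};
  for (const part of styleText.split(";")) {
    const colon = part.indexOf(":");
    if (colon === -1) continue; // skip empty segments, e.g. after the final ';'
    const property = part.slice(0, colon).trim().toLowerCase();
    const value = part.slice(colon + 1).trim();
    if (property) declarations[property] = value;
  }
  return declarations;
}

// Style attribute copied from the <img> in the Google Docs fragment.
const imgStyle =
  "width: 566.93px; height: 316.85px; margin-left: -0.00px; margin-top: -0.00px; " +
  "transform: rotate(0.00rad) translateZ(0px); -webkit-transform: rotate(0.00rad) translateZ(0px);";

const parsed = parseInlineStyle(imgStyle);
console.log(Object.keys(parsed).length); // 6
```

Six declarations on one image, none of which our own stylesheet asked for; this is the kind of presentation the sanitization step removes.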

2. The Solution: Code Sanitization with JavaScript

To solve this, we must clean the code before saving it on the server. I use CKEditor to manage the book content (split into chapters), and I sanitize it with the following process:

A. The Need to Modify the HTML Outside the Editor

We cannot modify the content directly inside CKEditor with JavaScript: the editor detects the change and overwrites it, undoing the cleanup.

Replicating the Content: Before submitting the form, I capture the CKEditor content (editor.getData()) and copy it into a hidden <div>, which is the element I will modify:

document.querySelector("#htmlCkeditor").innerHTML = editor.getData()

This copy has to happen before the content is saved, that is, right before the form is submitted. Working on a separate HTML block is essential because the editor's own content cannot be edited directly, or at least not reliably.
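If you prefer not to keep a hidden <div id="htmlCkeditor"> in the page, an element created with document.createElement and never inserted into the page serves the same purpose: CKEditor does not control it, so it cannot overwrite the cleanup. This is a minimal sketch under that assumption; copyToScratch is a hypothetical helper, and editor is assumed to be the CKEditor instance whose getData() is called above.

```javascript
// Hypothetical helper: copy the editor's HTML into an element the editor does
// not control, so CKEditor cannot overwrite the cleanup.
// `doc` is normally the global `document`; it is a parameter so the function
// can also be exercised outside a browser.
function copyToScratch(doc, editor) {
  const scratch = doc.createElement("div"); // detached: never inserted into the page
  scratch.innerHTML = editor.getData();     // same call as in the hidden-div version
  return scratch;
}

// In the browser, before submitting the form:
// const scratch = copyToScratch(document, editor);
// ...clean `scratch`, then read scratch.innerHTML...
```

With a detached element, the cleanup queries run on scratch.querySelectorAll("p, ul, ...") instead of document.querySelectorAll("#htmlCkeditor p, ..."), so the #htmlCkeditor prefix is no longer needed.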

Once the content is in the hidden div, I use JavaScript's querySelectorAll to select every element that Google Docs may have generated (paragraphs, lists, images, headings, spans, and links) and remove the unwanted attributes from them:

document.querySelectorAll(
    "#htmlCkeditor p, #htmlCkeditor ul, #htmlCkeditor li, #htmlCkeditor img, " +
    "#htmlCkeditor h1, #htmlCkeditor h2, #htmlCkeditor h3, #htmlCkeditor h4, " +
    "#htmlCkeditor h5, #htmlCkeditor h6, #htmlCkeditor span, #htmlCkeditor a"
).forEach(e => {
    e.removeAttribute("class");
    e.removeAttribute("style");
});

This code removes every class and style attribute that Google Docs inserted on the selected elements, leaving only the semantic structure.
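The selector above repeats the #htmlCkeditor prefix for every tag. A sketch of the same cleanup driven by two arrays, so that tags or attributes can be added in one place, could look like this; buildScopedSelector and stripAttributes are hypothetical names, and the tag and attribute lists simply mirror the ones in the original snippet.

```javascript
// Tags and attributes taken from the original cleanup; extend as needed.
const TAGS = ["p", "ul", "li", "img", "h1", "h2", "h3", "h4", "h5", "h6", "span", "a"];
const ATTRIBUTES = ["class", "style"];

// Builds "#htmlCkeditor p, #htmlCkeditor ul, ..." from a scope and a tag list.
function buildScopedSelector(scope, tags) {
  return tags.map(tag => `${scope} ${tag}`).join(", ");
}

// Removes every listed attribute from every element matching the selector.
// `root` can be `document` or any element that supports querySelectorAll.
// Returns how many elements were processed.
function stripAttributes(root, selector, attributes) {
  const elements = root.querySelectorAll(selector);
  elements.forEach(el => attributes.forEach(attr => el.removeAttribute(attr)));
  return elements.length;
}

// Browser usage, equivalent to the original snippet:
// stripAttributes(document, buildScopedSelector("#htmlCkeditor", TAGS), ATTRIBUTES);
```

Keeping the lists as data makes it a one-line change to cover, say, another tag that shows up in a future export.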

3. ✅ Final Result

With the content fully sanitized, it can now be sent to the server (in my case, with Livewire). I also use this step to remove unwanted elements generated by CKEditor itself, such as <br> line breaks.
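That extra pass can follow the same pattern. The sketch below removes <br> elements from the cleaned container and returns the resulting HTML string, which is what would then be handed to the server. It assumes the hidden #htmlCkeditor div is passed in as root; the function names are hypothetical, and how the string reaches Livewire is left out because it depends on the component. Dropping empty paragraphs, such as the <p class="c14 c12"><span class="c0"></span></p> in the Google Docs fragment, is an extra assumption of this sketch, not something the original process states.

```javascript
// Hypothetical clean-up pass, run after the class/style attributes are stripped.
// `root` is the hidden #htmlCkeditor div (or any element holding the copy).

// Removes every <br> element from the copy.
function removeLineBreaks(root) {
  const breaks = root.querySelectorAll("br"); // static list, safe to mutate while iterating
  breaks.forEach(br => br.remove());
  return breaks.length;
}

// Assumption: a paragraph with no visible text and no image is disposable.
function removeEmptyParagraphs(root) {
  let removed = 0;
  root.querySelectorAll("p").forEach(p => {
    if (p.textContent.trim() === "" && !p.querySelector("img")) {
      p.remove();
      removed++;
    }
  });
  return removed;
}

// Runs both passes and returns the HTML string that would be submitted.
function finalizeForSubmit(root) {
  removeLineBreaks(root);      // first, so <p><br></p> becomes an empty <p>
  removeEmptyParagraphs(root); // then drop the now-empty paragraphs
  return root.innerHTML;
}

// Browser usage:
// const html = finalizeForSubmit(document.querySelector("#htmlCkeditor"));
```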

This way, you can migrate your Google Docs documents and apply your own CSS style (for paragraphs, headings, or code blocks) without conflicts.



Andrés Cruz
