Table of Contents
Introduction
Prerequisites
Methods of generating PDF from HTML
Native browser printing with CSS
Using Screenshots from DOM
Using SVG from DOM
JavaScript Libraries
Puppeteer and Headless Chrome
Pros and Cons of each method
Conclusion
Resources/ Further Reading
Introduction
“Why download a webpage as PDF?” you might ask. Consider the situations web users face: saving a page for offline use, reaching information despite poor internet connectivity, or printing and storing hard copies of documents for later use.
PDF stands for Portable Document Format. It is a file format that captures a document as an electronic image that is easy to navigate, view, print, and share with others. Various techniques have been designed to convert other formats to PDF.
This article explains five different methods for converting HTML to PDF in React. It also covers the pros and cons of each approach and helps you identify the right choice for your projects.
Prerequisites
To follow this tutorial effectively, you need to know some basics of JavaScript and React. You also need the following on your computer:
Node.js installed
A code editor (VS Code, Sublime Text, Atom) installed
Methods of Generating PDF from HTML
Native Browser Printing and CSS Print Rules
Browsers have a built-in ability to print pages and save them as PDFs. Pressing Ctrl + P (Cmd + P on macOS) opens the print dialog, where the page can be saved as a PDF.
With this in mind, call the window.print method from a button component to let users generate a PDF of the web page, as shown below.
// The class name lets the print stylesheet hide this button
export default function NativeBrowserPDF() {
  return (
    <button className="native-browser-button" onClick={() => window.print()}>
      Download PDF
    </button>
  )
}
CSS print rules control how the page looks in the generated PDF. They are placed inside an @media print block, so they apply only to the page's print view.
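@media print {
  /* Hide the download button in the printed output */
  .native-browser-button {
    display: none;
  }
  /* Start a new page after each div inside .texts */
  .texts div {
    break-after: page;
  }
}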
The CSS above hides the download button in the print view and inserts a page break after each div inside .texts.
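When a page is saved from the print dialog, browsers commonly suggest the document title as the file name. The following is a minimal sketch of a helper that sets a temporary title before calling window.print and restores it on the standard afterprint event. The helper name printWithTitle and its win parameter are assumptions introduced for illustration; the parameter only exists so the helper can be exercised outside a browser.

```javascript
// Hypothetical helper: print the page with a temporary document title.
// `win` defaults to the browser's window; pass a stand-in object when testing.
function printWithTitle(title, win = window) {
  const doc = win.document;
  const previousTitle = doc.title;

  // Restore the original title once the print dialog has been closed.
  const restore = () => {
    doc.title = previousTitle;
    win.removeEventListener('afterprint', restore);
  };

  win.addEventListener('afterprint', restore);
  doc.title = title;
  win.print();
}
```

In the button component, the click handler would then become onClick={() => printWithTitle('My report')}, where the title is whatever file name you want the dialog to suggest.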
Pros:
It is easy to use
There is no need for external libraries
Users can select and search for text
Cons:
It is typically challenging to obtain the exact document layout in different browsers.
PDFs cannot be generated from code; the user has to go through the print dialog
Document content depends on the browser's window size
Using Screenshots from DOM
To generate a PDF from a DOM screenshot, you need to install two packages: html2canvas, which renders the DOM to a canvas, and jsPDF, which places the resulting image in a PDF. Here’s how to go about it:
HTML => Canvas => Image => PDF
// Import the packages
import html2canvas from 'html2canvas'
import { jsPDF } from 'jspdf'
// Following the html2canvas guide, render the DOM element to a canvas
const exportPDF = () => {
  const input = document.getElementById('App')
  html2canvas(input, { logging: true, letterRendering: 1, useCORS: true }).then((canvas) => {
    // Scale the image to the page width (in mm), preserving the aspect ratio
    const imgWidth = 208
    const imgHeight = (canvas.height * imgWidth) / canvas.width
    const imgData = canvas.toDataURL('image/png')
    // Create a portrait A4 document measured in mm, add the image, and save it
    const pdf = new jsPDF('p', 'mm', 'a4')
    pdf.addImage(imgData, 'PNG', 0, 0, imgWidth, imgHeight)
    pdf.save('ReactApp.pdf')
  })
}
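The image height in the snippet above comes from scaling the canvas to a fixed width in millimetres while keeping its aspect ratio. As a rough sketch, that arithmetic can be pulled into a small pure function that also reports how many A4 pages the scaled image would span. The function name fitCanvasToA4 and the constants are assumptions introduced for illustration; only the scaling formula comes from the snippet.

```javascript
// A4 portrait dimensions in millimetres.
const A4_WIDTH_MM = 210;
const A4_HEIGHT_MM = 297;

// Hypothetical helper: scale a canvas to a target width in mm, keeping its
// aspect ratio, and report how many A4 pages the scaled image would span.
function fitCanvasToA4(canvasWidthPx, canvasHeightPx, targetWidthMm = A4_WIDTH_MM) {
  if (canvasWidthPx <= 0 || canvasHeightPx <= 0) {
    throw new RangeError('Canvas dimensions must be positive');
  }
  const imgWidth = targetWidthMm;
  const imgHeight = (canvasHeightPx * imgWidth) / canvasWidthPx;
  const pageCount = Math.ceil(imgHeight / A4_HEIGHT_MM);
  return { imgWidth, imgHeight, pageCount };
}
```

Calling fitCanvasToA4(canvas.width, canvas.height, 208) gives the same width and height as the snippet. When pageCount is greater than 1, a single addImage call leaves the lower part of the image off the page, so the remaining pages would have to be added separately, for example with jsPDF's addPage.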
Pros:
The output closely resembles the rendered HTML page
PDFs can be generated from code
It is easy to use
Cons:
Users cannot select or search text, because the page is stored as an image
External libraries/packages are required
Document content depends on the size of the browser window
Using SVG from DOM
This method works on the DOM with the same two libraries as above (html2canvas and jsPDF). However, there is a slight difference in the flow.
DOM => SVG => PNG => PDF
// Render a DOM element with html2canvas
// (note: html2canvas resolves to a canvas element, not an SVG)
html2canvas(document.querySelector("#div-Id")).then((canvas) => {
  document.body.appendChild(canvas)
});
// Convert the rendered canvas to a PNG using vanilla JS
const input = document.getElementById('App');
html2canvas(input).then((canvas) => {
  const imgData = canvas.toDataURL('image/png');
  // Convert the PNG to a PDF using jsPDF
  const pdf = new jsPDF();
  pdf.addImage(imgData, 'PNG', 0, 0);
  pdf.save("download.pdf");
});
The pros and cons of this method are the same as those of the screenshot method, except that generating PDFs from SVGs gives a much clearer output than generating them from a screenshot image.
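For the DOM => SVG step of the flow, one known technique is to wrap the markup in an SVG foreignObject element and use the result as an image source. The sketch below is an illustration under stated assumptions, not the method used by html2canvas: the function name markupToSvgDataUrl is hypothetical, the markup is assumed to be well-formed XHTML already (in the browser, new XMLSerializer().serializeToString(node) produces it), and external stylesheets or images are not inlined.

```javascript
// Hypothetical helper: wrap XHTML markup in an SVG <foreignObject> and
// return it as a data URL that can be used as an image source.
function markupToSvgDataUrl(xhtml, width, height) {
  const svg =
    `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}">` +
    '<foreignObject width="100%" height="100%">' +
    `<div xmlns="http://www.w3.org/1999/xhtml">${xhtml}</div>` +
    '</foreignObject></svg>';
  // Percent-encode the SVG so it is safe inside a data URL.
  return 'data:image/svg+xml;charset=utf-8,' + encodeURIComponent(svg);
}
```

The returned URL can be set as the src of an Image, drawn onto a canvas with drawImage, and exported with toDataURL('image/png') for jsPDF. Browser behaviour when exporting such a canvas varies, so test this in the browsers you need to support.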
JavaScript Libraries
JavaScript libraries such as jsPDF (discussed earlier), PDFKit, and React-PDF can generate PDFs directly. A major limitation is that you must recreate the page content in the library's own format.
However, this approach is useful if you intend to create a PDF from scratch with content obtained from a different source.
import { Page, Text, View, Document, StyleSheet } from '@react-pdf/renderer'
// Styles for the PDF, defined separately from the page's CSS
const styles = StyleSheet.create({
  page: {
    backgroundColor: '#E4E4E4'
  },
  section: {
    margin: 10,
    padding: 10
  }
});
// The PDF content, built from React-PDF components instead of HTML tags
const MyDocument = () => (
  <Document>
    <Page size="A4" style={styles.page}>
      <View style={styles.section}>
        <Text>Section</Text>
      </View>
    </Page>
  </Document>
);
This code snippet is from the React-PDF library. It defines the document's structure and its stylesheet separately from the page's own HTML and CSS. You can embed the React-PDF component in a link or button on the webpage to let users download a separate PDF file.
Pros:
PDFs can be generated from code
Results are identical regardless of the browser window size
Users can select and search text
Cons:
Time-consuming, since the page content must be recreated
Cannot reuse content that already exists in the page's HTML
The same design has to be maintained in two versions
Using Puppeteer and Headless Chrome with Node.js
Puppeteer, driving headless Chrome, generates high-quality PDFs of React pages. Unlike the other methods, which run on the client side, this one runs on the server side.
Puppeteer is a Node library that lets you control headless Chrome through an API. It enables you to automate tasks you would otherwise perform manually in a browser, such as generating PDFs from websites.
A headless browser lacks a Graphical User Interface (GUI). Features such as tabs, input fields, and buttons do not exist. A headless browser can only be controlled through an automated script or a command-line interface.
const puppeteer = require('puppeteer')
async function puppeteerPdf() {
  // Launch Chrome without a visible window
  const browser = await puppeteer.launch({ headless: true })
  const page = await browser.newPage()
  // Wait until the network has gone idle before continuing
  await page.goto('https://…', { waitUntil: 'networkidle0' })
  // Hide the save button in the generated PDF
  await page.addStyleTag({ content: '#save-button { display: none }' })
  const pdf = await page.pdf({ format: 'A4' })
  await browser.close()
  return pdf
}
The code above launches Puppeteer in headless mode (headless: true), navigates to the URL of the webpage, changes some styles, and then generates a PDF file, which is then sent to the client side.
According to the Puppeteer documentation, setting waitUntil: 'networkidle0' means navigation is considered finished once there have been no network connections for at least 500 ms.
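How the generated PDF reaches the client depends on your server setup, which the article does not specify. As a minimal sketch, assuming a response object with setHeader and end methods (as Node's http.ServerResponse has), the bytes returned by the server-side function could be written out as follows. The helper name writePdfResponse and the default file name are hypothetical.

```javascript
// Hypothetical helper: send PDF bytes over an HTTP response.
// `res` is assumed to expose setHeader() and end(), as Node's http.ServerResponse does.
function writePdfResponse(res, pdfBytes, fileName = 'page.pdf') {
  // Normalise to a Buffer, since the PDF may arrive as a Buffer or a Uint8Array.
  const body = Buffer.from(pdfBytes);
  res.statusCode = 200;
  res.setHeader('Content-Type', 'application/pdf');
  res.setHeader('Content-Length', body.length);
  // Ask the browser to download the file instead of displaying it.
  res.setHeader('Content-Disposition', `attachment; filename="${fileName}"`);
  res.end(body);
}
```

Inside a request handler this would be called as writePdfResponse(res, await puppeteerPdf()). Launching a browser per request is slow, so a real server would likely add caching or reuse the browser instance.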
The code below shows how the PDF is fetched on the client side, converted into a Blob, and saved.
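function puppeteerPdf() {
  return fetchPdfData().then((pdfData) => {
    // Wrap the raw PDF data in a Blob so the browser can treat it as a file
    const blob = new Blob([pdfData], { type: 'application/pdf' })
    // Create a temporary link and click it to trigger the download
    const link = document.createElement('a')
    link.href = window.URL.createObjectURL(blob)
    link.download = 'fileName.pdf'
    link.click()
  })
}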
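The fetchPdfData function called above is not defined in the article. A minimal sketch of what it might look like, assuming the server exposes the PDF at a placeholder endpoint such as /api/pdf, is shown here. The endpoint and the fetchImpl parameter are assumptions made for illustration.

```javascript
// Hypothetical implementation of the fetchPdfData helper used above.
// The '/api/pdf' endpoint is a placeholder; `fetchImpl` allows a stand-in for testing.
async function fetchPdfData(url = '/api/pdf', fetchImpl = fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`PDF request failed with status ${response.status}`);
  }
  // An ArrayBuffer can be passed directly to the Blob constructor.
  return response.arrayBuffer();
}
```

Checking response.ok matters here because fetch only rejects on network failures; without the check, an error page from the server would be saved as a broken PDF.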
It is essential to go through the API documentation as there are various ways to create PDFs using Puppeteer.
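As one example of those variations, page.pdf accepts more options than format. The sketch below assembles an options object using the documented format, landscape, printBackground, and margin options. The helper name buildPdfOptions and its defaults are assumptions made for illustration.

```javascript
// Hypothetical helper that assembles an options object for page.pdf().
// format, landscape, printBackground and margin are documented Puppeteer PDF options.
function buildPdfOptions({ landscape = false, marginMm = 10 } = {}) {
  const margin = `${marginMm}mm`;
  return {
    format: 'A4',
    landscape,
    printBackground: true, // include CSS backgrounds, which are omitted by default
    margin: { top: margin, right: margin, bottom: margin, left: margin },
  };
}
```

It would be used as page.pdf(buildPdfOptions({ landscape: true })). Setting printBackground is often the difference between a PDF that matches the page and one that loses its background colours.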
Pros:
Creates the PDF from the page's actual text rather than from an image
Users can select and search text in the PDF
Higher resolution than the other methods
Cons:
Requires running a server
Implementation is not always straightforward
Conclusion
There are several ways to generate PDFs from web pages. Explore the options and pick the one that best fits your project.
Practice with the code samples on a live page to learn more.