Batch-converting HTML to PDF with Python
wkhtmltopdf
Introduction
wkhtmltopdf and wkhtmltoimage are open source (LGPLv3) command line tools to render HTML into PDF and various image formats using the Qt WebKit rendering engine. These run entirely “headless” and do not require a display or display service.
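In short, wkhtmltopdf and wkhtmltoimage are open-source command-line tools that convert HTML into PDF and various image formats.
Installation
Download page: https://wkhtmltopdf.org/downloads.html
On macOS it can be installed directly with Homebrew; on other systems, pick the matching package from the download page.
brew install Caskroom/cask/wkhtmltopdf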
Usage
- Download a precompiled binary or build from source
- Create your HTML document that you want to turn into a PDF (or image)
- Run your HTML document through the tool.
- For example, if I really like the treatment Google has done to their logo today and want to capture it forever as a PDF:
wkhtmltopdf http://google.com google.pdf
In short: download and install the tool, create the HTML file, then run the tool from the command line.
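Since the rest of this post drives the tool from Python, here is a minimal sketch of running the same command through `subprocess`. The helper names `build_command` and `render` are hypothetical, and the sketch assumes the `wkhtmltopdf` binary is on `PATH`.

```python
import shutil
import subprocess


def build_command(source, output_path, binary="wkhtmltopdf"):
    """Return the argument list for one wkhtmltopdf invocation."""
    return [binary, source, output_path]


def render(source, output_path):
    """Run wkhtmltopdf on a URL or local HTML file (hypothetical helper)."""
    # Fail early with a clear message if the binary is not installed.
    if shutil.which("wkhtmltopdf") is None:
        raise FileNotFoundError("wkhtmltopdf was not found on PATH")
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(build_command(source, output_path), check=True)
```

Calling `render("http://google.com", "google.pdf")` would then be equivalent to the command above.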
Pdfkit
A Python wrapper for wkhtmltopdf.
Introduction
pdfkit is a Python wrapper package around wkhtmltopdf. It should not be confused with PDFKit, the JavaScript PDF generation library for Node and the browser, which shares the name but is a separate project. The Python package calls the wkhtmltopdf binary installed above, so that binary must be available on the machine.
Installation
pip install pdfkit
Supported inputs
pdfkit can convert from the following sources:
- URL
- File
- String
pdfkit.from_url('https://www.google.com.hk', 'out1.pdf')
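The three input types map to `pdfkit.from_url`, `pdfkit.from_file`, and `pdfkit.from_string`. The sketch below shows one way to pick between them. The `detect_kind` and `to_pdf` helpers are hypothetical and not part of pdfkit, and the URL check is a simple prefix test that is assumed to be good enough here.

```python
import os


def detect_kind(source):
    """Guess whether source is a URL, a path to an existing file, or raw HTML."""
    if source.startswith(("http://", "https://")):
        return "url"
    if os.path.isfile(source):
        return "file"
    return "string"


def to_pdf(source, output_path):
    """Convert source to a PDF with the matching pdfkit function."""
    import pdfkit  # third-party; imported here so detect_kind works without it

    converters = {
        "url": pdfkit.from_url,
        "file": pdfkit.from_file,
        "string": pdfkit.from_string,
    }
    return converters[detect_kind(source)](source, output_path)
```

With this in place, `to_pdf("<h1>Hello</h1>", "out3.pdf")` would go through `pdfkit.from_string`, while a path to an existing HTML file would go through `pdfkit.from_file`.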
Code example
#!/usr/bin/env python
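As a rough sketch of how a batch script might be organized (this is not the original listing), the code below pairs each HTML file in a source directory with a PDF path and converts them one by one. The function names, the directory layout, and the `.html`/`.htm` filter are all assumptions made for illustration.

```python
import os


def plan_conversions(src_dir, dst_dir):
    """Pair every .html/.htm file in src_dir with a .pdf path in dst_dir."""
    jobs = []
    for name in sorted(os.listdir(src_dir)):
        stem, ext = os.path.splitext(name)
        if ext.lower() in (".html", ".htm"):
            jobs.append((os.path.join(src_dir, name),
                         os.path.join(dst_dir, stem + ".pdf")))
    return jobs


def convert_all(src_dir, dst_dir):
    """Convert each planned job in turn with pdfkit.from_file."""
    import pdfkit  # third-party wrapper; needs the wkhtmltopdf binary installed

    os.makedirs(dst_dir, exist_ok=True)
    for src, dst in plan_conversions(src_dir, dst_dir):
        pdfkit.from_file(src, dst)
```

Keeping the planning step separate from the conversion makes it easy to check which files would be processed before any PDF is written.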
Troubleshooting
'ascii' codec can't decode byte 0xb4 in position 11: ordinal not in range(128)
- Fix: pass an explicit encoding when opening the file:
with open(src + filename, encoding="utf-8")
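- Watch the number of files: with a large batch and no limit on the number of threads, the machine can freeze.
- Fix: cap the number of concurrent conversions with a semaphore:
threading.Semaphore(10)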
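Both fixes can be combined in one place. The sketch below reads each file with an explicit UTF-8 encoding and uses `threading.Semaphore(10)` so that at most ten conversions run at once. The names `read_html`, `worker`, and `run_batch` are hypothetical, and `convert` is an assumed callback that receives the file path and its HTML text.

```python
import threading

MAX_WORKERS = 10  # same limit as threading.Semaphore(10) in the text
limiter = threading.Semaphore(MAX_WORKERS)


def read_html(path):
    """Read an HTML file as UTF-8 instead of relying on the platform default."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


def worker(path, convert, results):
    """Convert one file while holding a semaphore slot."""
    with limiter:  # blocks while MAX_WORKERS conversions are already running
        results[path] = convert(path, read_html(path))


def run_batch(paths, convert):
    """Start one thread per file; at most MAX_WORKERS convert at the same time."""
    results = {}
    threads = [threading.Thread(target=worker, args=(p, convert, results))
               for p in paths]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With pdfkit, `convert` could be something like `lambda path, html: pdfkit.from_string(html, path + ".pdf")`. This version still creates one thread per file and only bounds how many do work at once; for very large batches, a `concurrent.futures.ThreadPoolExecutor` with `max_workers=10` would also bound the number of threads.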