Creating and Modifying PDF Files in Python

by David Amos, May 25, 2020 | intermediate, python

The PDF, or Portable Document Format, is one of the most common formats for sharing documents over the Internet. PDFs can contain text, images, tables, forms, and rich media like videos and animations, all in a single file.

This abundance of content types can make working with PDFs difficult. There are a lot of different kinds of data to decode when opening a PDF file! Fortunately, the Python ecosystem has some great packages for reading, manipulating, and creating PDF files.

In this tutorial, you’ll learn how to:

  • Read text from a PDF
  • Split a PDF into multiple files
  • Concatenate and merge PDF files
  • Rotate and crop pages in a PDF file
  • Encrypt and decrypt PDF files with passwords
  • Create a PDF file from scratch

Along the way, you’ll have several opportunities to deepen your understanding by following along with the examples. The materials used in the examples are available in the tutorial’s companion repository.

Extracting Text From a PDF

In this section, you’ll learn how to read a PDF file and extract the text using the PyPDF2 package. Before you can do that, though, you need to install it with pip:

$ python3 -m pip install PyPDF2

Verify the installation by running the following command in your terminal:

$ python3 -m pip show PyPDF2
Name: PyPDF2
Version: 1.26.0
Summary: PDF toolkit
Home-page: http://mstamy2.github.com/PyPDF2
Author: Mathieu Fenniak
Author-email: biziqe@mathieu.fenniak.net
License: UNKNOWN
Location: c:\users\david\python38-32\lib\site-packages
Requires:
Required-by:

Pay particular attention to the version information. At the time of writing, the latest version of PyPDF2 was 1.26.0. If you have IDLE open, then you’ll need to restart it before you can use the PyPDF2 package.
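
The same version check can also be made from inside Python. The following is a minimal sketch, assuming Python 3.8 or later, where importlib.metadata is part of the standard library. The helper name installed_version is illustrative and is not part of PyPDF2.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        # pip has no record of this package in the current environment.
        return None


# Prints the version string if PyPDF2 is installed, otherwise None.
print(installed_version("PyPDF2"))
```

If this prints None, the package was most likely installed for a different Python interpreter than the one running the code.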

Opening a PDF File

Let’s get started by opening a PDF and reading some information about it. You’ll use the Pride_and_Prejudice.pdf file located in the practice_files/ folder in the companion repository.

Open IDLE’s interactive window and import the PdfFileReader class from the PyPDF2 package:
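
As a minimal sketch of that step, assuming the PyPDF2 1.26.0 API, the import and a first look at a file might be wrapped in a small function. The helper name describe_pdf is hypothetical and exists only for illustration. Later PyPDF2 releases replaced PdfFileReader with PdfReader, so the names below apply only to the version shown in this tutorial.

```python
from pathlib import Path


def describe_pdf(pdf_path):
    """Return (page count, document info) for a PDF, using the PyPDF2 1.26.0 API."""
    pdf_path = Path(pdf_path)
    if not pdf_path.is_file():
        raise FileNotFoundError(f"No such PDF file: {pdf_path}")

    # Imported here so a missing package only fails when the function is used.
    from PyPDF2 import PdfFileReader

    # PdfFileReader accepts a path string or a file object opened in binary mode.
    reader = PdfFileReader(str(pdf_path))

    # documentInfo may be None if the PDF carries no metadata.
    return reader.getNumPages(), reader.documentInfo
```

Called from the root of the companion repository as describe_pdf("practice_files/Pride_and_Prejudice.pdf"), this would return the number of pages together with the document's metadata object. The path is assumed from the folder and file names mentioned above.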
