PDF Page Extract readme

Summary

PDF Page Extract is a lightweight utility for extracting pages from PDFs. It is built around Ghostscript, and requires Ghostscript to be installed on the computer. The program provides a means to set the document title and author of the new PDF; these details are specifically not read from (and cannot be read from) the existing PDF. The assumption of this application is that the extracted pages require a replacement title, and that the source PDF is not going to have any useful metadata to retrieve regardless. This application was created to support processing of archived literature in relation to KBK, but as it may be of some use to others, it is made available as a free, open source utility. Please do not expect its design to change significantly, as it was nonetheless created for a very specific purpose.

Usage

PDF Page Extract offers a single window in which the source PDF, target folder for the extracted pages, choice of pages and output metadata can be specified. Following a successful page extraction, these parameters are saved to the settings file %APPDATA%\Telcontar\PDF Page Extract.ini, so that they need not be re-entered for subsequent tasks.

The various fields in the window are as follows:

Source document: Choose a PDF from which pages are to be extracted. Alternatively, drop a PDF file onto the application window to select it.
Output folder: Choose the folder into which the extracted PDF will be placed. Alternatively, drop a folder onto the application window to select it.
Pages: Specify which pages from the source document to extract. This can be any valid Ghostscript page subset, e.g. “5”, “2,4,6,8”, “100-120”, “10,12-15,18”.
Output title: Specify a document title for the output PDF. This must be in simple ASCII, as feeding Unicode metadata into Ghostscript is awkward. Unicode characters will be mangled. This title will also be used as the filename of the PDF, with suitable adjustments (e.g. double quote “"” becomes two single quotes “''” and other invalid characters become hyphens).
Author: The author of the source material. The program remembers the 20 most recent authors entered.
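The filename adjustments described for the output title can be sketched as follows. This is an illustration in Python rather than the program's own AutoHotkey code; the function name title_to_filename is hypothetical, and the exact set of characters treated as invalid, as well as the trimming of trailing dots and spaces, are assumptions based on general Windows filename rules.

```python
import re

def title_to_filename(title: str) -> str:
    """Turn a document title into a Windows-safe PDF file name (sketch)."""
    # A double quote becomes two single quotes, as the readme describes.
    name = title.replace('"', "''")
    # Assumption: the remaining characters Windows forbids in file names,
    # plus control characters, each become a hyphen.
    name = re.sub(r'[<>:/\\|?*\x00-\x1f]', '-', name)
    # Assumption: trailing dots and spaces are dropped, since Windows
    # does not preserve them at the end of a file name.
    name = name.rstrip('. ')
    return name + '.pdf'
```

For example, the title The "Model M": a history? would become the filename The ''Model M''- a history-.pdf under these rules.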

The Ghostscript location is detected automatically from the Registry, and the discovered Ghostscript version and binary name are displayed for reference. Either 64-bit or 32-bit Ghostscript is acceptable.
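For readers curious how such an extraction can be driven, the sketch below assembles a Ghostscript command line from the fields above. It is not taken from the program: the function name build_gs_command and the file names are hypothetical, the exact switches the program passes are not documented here, and rejecting non-ASCII metadata outright is a choice made for this sketch. The switches used (-sDEVICE=pdfwrite, -sPageList, -sOutputFile, and a DOCINFO pdfmark passed with -c) are standard Ghostscript features.

```python
def build_gs_command(gs_exe, source_pdf, output_pdf, pages, title, author):
    """Build an argument list for a Ghostscript page extraction (sketch)."""
    # Assumption for this sketch: refuse non-ASCII metadata instead of
    # letting it be mangled.
    if not (title.isascii() and author.isascii()):
        raise ValueError('title and author must be plain ASCII')

    def ps_string(text):
        # Escape backslashes and parentheses for a PostScript string literal.
        escaped = (text.replace('\\', '\\\\')
                       .replace('(', '\\(')
                       .replace(')', '\\)'))
        return f'({escaped})'

    # The DOCINFO pdfmark sets the title and author of the output PDF.
    docinfo = (f'[ /Title {ps_string(title)} '
               f'/Author {ps_string(author)} /DOCINFO pdfmark')

    return [
        gs_exe,
        '-dBATCH', '-dNOPAUSE', '-dQUIET',
        '-sDEVICE=pdfwrite',
        f'-sPageList={pages}',          # e.g. "10,12-15,18"
        f'-sOutputFile={output_pdf}',
        source_pdf,
        '-c', docinfo,                  # PostScript run after the input file
    ]
```

The resulting list can be handed to subprocess.run(cmd, check=True), with gs_exe set to the console binary of the installed Ghostscript, for instance gswin64c.exe on 64-bit Windows.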

Also worth noting is an undesirable behaviour of Windows: a file that is deleted and immediately recreated under the same name retains the creation date of the deleted file. That is, if you find that you made a mistake during page extraction, delete the output PDF, and extract it anew into the same location, the new file keeps the creation date of the deleted file. Allow a minute or so after deletion before the fresh extraction to ensure that the new file’s creation date is correct.

Changelog

1.0.2

2022-11-13

Page number ranges now support shortcut notation, e.g. 100-20 for 100-120.
The existing Subject and Keywords metadata entries are now cleared.

1.0.2 dates from 2021-12-19 but was not released for nearly a year.
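The shortcut notation for page ranges can be pictured with the following Python sketch. The program's actual rule is not documented here, so this is an assumption: when the end of a range has fewer digits than the start, the end is taken to replace the last digits of the start. The function names expand_range and expand_page_list are hypothetical.

```python
import re

def expand_range(start: str, end: str) -> str:
    """Expand a shortened range end, e.g. ("100", "20") gives "100-120"."""
    if len(end) < len(start):
        # Keep the leading digits of the start, then append the short end.
        candidate = start[:len(start) - len(end)] + end
        # Only accept the expansion if it yields an ascending range.
        if int(candidate) >= int(start):
            end = candidate
    return f'{start}-{end}'

def expand_page_list(pages: str) -> str:
    """Expand every shortened range in a comma-separated page list."""
    parts = []
    for part in pages.split(','):
        part = part.strip()
        match = re.fullmatch(r'(\d+)-(\d+)', part)
        # Single pages and anything unrecognised pass through unchanged.
        parts.append(expand_range(*match.groups()) if match else part)
    return ','.join(parts)
```

With this rule, 100-20 expands to 100-120, while a page list such as 10,12-15,18 is left as it is.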

1.0.1

2021-07-11

Before the program was released, the recent authors list was extended from 10 to 20, but this didn’t work, and is now fixed.

1.0

Initial version, 2021-04-13.

Development

PDF Page Extract is an open source application programmed in AutoHotkey. It requires the revised ahk2exe for build directives. The installer is built using Inno Setup. All graphics were created using Inkscape.

A copy of all the development files is available for download.

Licence

PDF Page Extract is copyright 2020–2021 Daniel Beardsmore. The PDF Page Extract application (“program”) and source code are made available under the zlib license, the terms of which are included in a separate document. The program may be used without restriction free of charge.
