Convert PDF to images
To test a Node.js application that generates PDFs, I searched npm and found two libraries.
Going by the first markers I check before adopting a library, neither had too many dependencies, but both also had few weekly downloads. The more annoying part was that they required extra software to be installed: ImageMagick/GraphicsMagick, Ghostscript and Poppler. It also bothered me that most API methods took a page number while I wanted to convert the full PDF: not a blocker, but a sign that my use case was not their main one.
Starting to have a bad feeling, I looked into how they were implemented and how they worked.
In the end, both libraries are mostly about gluing strings together and starting ImageMagick/GraphicsMagick processes to perform the conversion.
A lot of the code was also about getting the number of pages (which I don’t need) by executing pdfinfo (part of Poppler) and parsing its output with regular expressions.
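To give an idea of what that kind of parsing looks like, here is a minimal sketch. It is not the code of either library: the function name parsePageCount and the sample output are made up for illustration, and the only assumption taken from pdfinfo itself is that it prints a "Pages:" line followed by a number.

```javascript
// Hypothetical helper: extract the page count from the text printed by `pdfinfo`.
function parsePageCount(pdfinfoOutput) {
  // pdfinfo prints one "Key: value" pair per line, e.g. "Pages:          3"
  const match = /^Pages:\s+(\d+)\s*$/m.exec(pdfinfoOutput);
  if (!match) {
    throw new Error('No "Pages:" line found in pdfinfo output');
  }
  return Number.parseInt(match[1], 10);
}

// Abridged sample output, invented for this example
const sampleOutput = [
  'Producer:       Example PDF generator',
  'Pages:          3',
  'Page size:      595 x 842 pts (A4)',
].join('\n');

console.log(parsePageCount(sampleOutput)); // 3
```

It works, but it is one more external process and one more text format to depend on, just to obtain a number that a full-document conversion never needs.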
And after some searching, I realized that ImageMagick, when converting PDFs, was in turn using Ghostscript.
That sounded like a lot of complexity: having to learn the API of a library that is not widely used (goodbye Google/StackOverflow for help), having to check for library updates, … In the end, implementing the thing myself seemed both simpler and more controllable.
I went with Ghostscript as the only dependency (instead of 2 or 3).
apt-get install ghostscript
And very few lines of code give full control over the conversion process with a widely documented tool:
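import childProcess from 'child_process';
import util from 'util';

// Promise-based wrapper around child_process.exec
const exec = util.promisify(childProcess.exec);

// Render every page of pdfPath to a PNG (page-001.png, page-002.png, ...) in imagesDirectory
export async function pdfToImages(pdfPath, imagesDirectory) {
  // pngalpha: PNG with transparency; -r144: 144 dpi; -o: output file pattern (implies batch mode)
  await exec(`gs -sDEVICE=pngalpha -r144 -o ${imagesDirectory}/page-%03d.png ${pdfPath}`);
}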
(I did not validate the inputs, as this is only used for unit tests with no user-supplied content.)
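If the paths could ever contain spaces or shell metacharacters, one possible variant is to skip the shell and pass the arguments to Ghostscript as an array through child_process.execFile. The sketch below is an assumption about how that could look, not something this project uses: buildGsArgs, pdfToImagesNoShell and the resolution parameter are hypothetical names, and it is written with require so that it runs as a standalone script.

```javascript
const { execFile } = require('node:child_process');
const { promisify } = require('node:util');
const path = require('node:path');

const execFileAsync = promisify(execFile);

// Hypothetical helper: build the Ghostscript arguments, one array element each,
// so no shell ever gets to interpret the paths.
function buildGsArgs(pdfPath, imagesDirectory, resolution = 144) {
  return [
    '-sDEVICE=pngalpha', // PNG output with transparency
    `-r${resolution}`, // rendering resolution in dpi
    '-o', path.join(imagesDirectory, 'page-%03d.png'), // one file per page
    pdfPath,
  ];
}

// Hypothetical variant of the conversion function that runs gs without a shell.
async function pdfToImagesNoShell(pdfPath, imagesDirectory, resolution = 144) {
  await execFileAsync('gs', buildGsArgs(pdfPath, imagesDirectory, resolution));
}

// Printing the arguments does not require Ghostscript to be installed
console.log(buildGsArgs('my report.pdf', 'out'));
```

This removes the shell quoting problem, but it is not full input validation: a path that begins with a dash would still be read by gs as an option.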
I have no particular hate towards libraries; they can be great for wrapping complex logic into something easy to use. But the chore of following their updates (and hoping they will be updated), their additional weight (bringing in more than needed) and the overhead (learning a new API) sometimes make just doing it yourself the better option.