I have successfully used Google Drive and Insync to organize all of the e-books I have acquired over the last few years. I currently plan to upload them to a personal DokuWiki instance, since I use it more every day. Before I can start, I need to extract cover images to ensure a decent outcome.
Requirements
Install the ImageMagick package to perform the PDF to image conversion.
$ sudo apt-get install imagemagick
Additionally, you can install the Poppler utilities to get PDF details such as the author and page count.
$ sudo apt-get install poppler-utils
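If you are not sure whether both packages are already installed, you can verify them with their version switches; these commands only print version information.
$ convert -version
$ pdfinfo -v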
Extract a single cover image
Use the convert utility to convert the first page to an image.
$ convert Linux-Voice-Issue-016.pdf[0] Linux-Voice-Issue-016.png
You can perform additional operations (like the resize in this example) on the image during the conversion process.
$ convert Linux-Voice-Issue-016.pdf[0] -resize 200x300 Linux-Voice-Issue-016.png
Notice that from ImageMagick’s point of view, page numbers start from 0, so [0] refers to the first page.
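If the generated cover looks blurry, you can ask ImageMagick to rasterize the page at a higher resolution before resizing it; note that the -density option has to appear before the input file name. The 150 DPI value below is only an example.
$ convert -density 150 Linux-Voice-Issue-016.pdf[0] -resize 200x300 Linux-Voice-Issue-016.png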
Extract multiple cover images
Use the following shell script to extract and store cover images from e-books found in sub-directories.
#!/bin/bash
# Create cover images from e-books in sub-directories
# This shell script is not recursive

# maximum width and height of the output image
maxsize="200x200"

for directory in */; do
  if [ -d "$directory" ]; then
    echo "Processing sub-directory: "${directory%%/}
    mkdir -p "${directory}covers"
    for ebook in "${directory}"*.pdf; do
      ebook="$(basename "$ebook")"
      if [ ! -f "${directory}covers/${ebook%%.pdf}.png" -a -f "${directory}${ebook}" ]; then
        echo " Processing e-book: $ebook"
        convert "${directory}${ebook}"[0] -resize $maxsize "${directory}covers/${ebook%%.pdf}.png" 2>/dev/null
      fi
    done
  fi
done
The output will look similar to the following.
Processing sub-directory: BSDmag
 Processing e-book: BSD_2008_01.pdf
 Processing e-book: BSD_2008_02.pdf
[...]
Processing sub-directory: LinuxFormat
 Processing e-book: LXF134.complete.pdf
 Processing e-book: LXF135.book.pdf
[...]
Processing sub-directory: LinuxVoice
 Processing e-book: Linux-Voice-Issue-001.pdf
 Processing e-book: Linux-Voice-Issue-002.pdf
[...]
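The script above intentionally stops at the first directory level. If your e-books are nested deeper, a recursive variant built around find could look like the following sketch; it assumes the same covers/ sub-directory layout, and the variable names are just examples.
#!/bin/bash
# Sketch: recursive variant of the cover extraction script
# Assumes the same covers/ sub-directory layout as above

# maximum width and height of the output image
maxsize="200x200"

find . -type f -name "*.pdf" | while read -r ebook; do
  directory="$(dirname "$ebook")"
  name="$(basename "$ebook" .pdf)"
  mkdir -p "$directory/covers"
  if [ ! -f "$directory/covers/$name.png" ]; then
    echo "Processing e-book: $ebook"
    convert "$ebook"[0] -resize $maxsize "$directory/covers/$name.png" 2>/dev/null
  fi
done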
Simple shell script to generate wiki content
It is just an ugly snippet, but it will help you quickly build a PDF file list.
#!/bin/bash
# create DokuWiki content
# create list of PDF files in current directory

dir=$(basename $(pwd))

for pdf in *.pdf; do
  cat << EOF
{{:bookshelf:$dir:covers:${pdf%%.pdf}.png?nolink |}} **$(echo $pdf | sed s/.pdf// | sed "s/_/ /g" | sed "s/-/ /g")**\\\\
//$(pdfinfo $pdf | sed -ne "/Author:/ {s/^Author:\ *//;p}")//
{{:bookshelf:$dir:${pdf}|Download e-book}}
----
EOF
done
Sample output.
[...]
{{:bookshelf:pragprog:covers:the-viml-primer_p1_0.png?nolink |}} **the viml primer p1 0**\\
//Benjamin Klein//
{{:bookshelf:pragprog:the-viml-primer_p1_0.pdf|Download e-book}}
----
{{:bookshelf:pragprog:covers:tmux_p3_0.png?nolink |}} **tmux p3 0**\\
//Brian P. Hogan//
{{:bookshelf:pragprog:tmux_p3_0.pdf|Download e-book}}
----
[...]
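Assuming the snippet is saved as create-wiki-content.sh (an arbitrary name chosen for this example), you can run it from inside a given sub-directory and redirect the output to a text file that is later pasted into the wiki page; the output file name is equally arbitrary.
$ cd LinuxVoice
$ bash ../create-wiki-content.sh > linuxvoice.txt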
Additional information
The most effective way to get the number of pages in a PDF e-book is to use the pdfinfo utility from the Poppler utilities package mentioned earlier.
$ pdfinfo Linux-Voice-Issue-016.pdf | awk '/^Pages:/ { print $2 }'
116
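If you want the page count for every e-book in the current directory, a simple loop like the following sketch will do.
$ for pdf in *.pdf; do echo "$pdf: $(pdfinfo "$pdf" | awk '/^Pages:/ { print $2 }') pages"; done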
You can use ImageMagick’s identify command to get the same information, but it is slower, as it extracts every page as an image.
$ identify -format "%n" Linux-Voice-Issue-016.pdf | head -1
116
You can analyze the first few pages (indexes 0 to 10 here) and print the index of the page with the most unique colors using the following command.
$ identify -format "%s %k\n" Linux-Voice-Issue-016.pdf[0-10] | sort -nrk2 | awk 'NR==1 {print $1}'
3
This command can be handy when the first page does not make a good cover and you need to search for a better one.
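To put this to use, you could combine the two previous commands to extract the most colorful of the first few pages as the cover; the page range and output file name below are just examples.
$ page=$(identify -format "%s %k\n" Linux-Voice-Issue-016.pdf[0-10] | sort -nrk2 | awk 'NR==1 {print $1}')
$ convert Linux-Voice-Issue-016.pdf[$page] -resize 200x300 cover.png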