I have successfully used Google Drive and Insync to organize all of the e-books I have acquired over the years. I currently plan to upload them to a personal DokuWiki instance, since I use it more every day. Before I can start, I need to extract cover images to ensure a decent outcome.

Requirements

Install the ImageMagick package to perform PDF-to-image conversion.

$ sudo apt-get install imagemagick

Additionally, you can install the Poppler utilities to read PDF metadata.

$ sudo apt-get install poppler-utils

Extract a single cover image

Use the convert utility to convert the first page to an image.

$ convert Linux-Voice-Issue-016.pdf[0] Linux-Voice-Issue-016.png

You can perform additional operations (like resizing in this example) on the image during the conversion process.

$ convert Linux-Voice-Issue-016.pdf[0] -resize 200x300 Linux-Voice-Issue-016.png

Notice that from ImageMagick’s point of view, page numbers start from 0.
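The zero-based indexing also applies to page ranges. As a sketch (the filename is the one used above; the guard makes this a no-op when ImageMagick or the file is absent):

```shell
pdf="Linux-Voice-Issue-016.pdf"
if command -v convert >/dev/null && [ -f "$pdf" ]; then
  # [0-2] selects the first three pages; output files get -0, -1, -2 suffixes
  convert "${pdf}[0-2]" -resize 200x300 page.png
fi
```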

Extract multiple cover images

Use the following shell script to extract and store cover images from e-books found in sub-directories.

#!/bin/bash
# Create cover images from e-books in sub-directories
# This shell script is not recursive

# maximum width and height of the output image
maxsize="200x200"

for directory in */; do
  if [ -d "$directory" ]; then
    echo "Processing sub-directory: ${directory%%/}"
    mkdir -p "${directory}covers"
    for ebook in "${directory}"*.pdf; do
      ebook="$(basename "$ebook")"
      if [ ! -f "${directory}covers/${ebook%%.pdf}.png" ] && [ -f "${directory}${ebook}" ]; then
        echo "  Processing e-book: $ebook"
        convert "${directory}${ebook}"[0] -resize "$maxsize" "${directory}covers/${ebook%%.pdf}.png" 2>/dev/null
      fi
      fi
    done
  fi
done
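The script leans on POSIX suffix-stripping parameter expansion: ${var%%pattern} removes the longest suffix matching the pattern. You can verify both expansions in isolation:

```shell
directory="LinuxVoice/"
ebook="Linux-Voice-Issue-001.pdf"

# Strip the trailing slash from the directory name
echo "${directory%%/}"   # LinuxVoice
# Strip the .pdf suffix from the file name
echo "${ebook%%.pdf}"    # Linux-Voice-Issue-001
```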

The output will look similar to the following.

Processing sub-directory: BSDmag
  Processing e-book: BSD_2008_01.pdf
  Processing e-book: BSD_2008_02.pdf
[...]
Processing sub-directory: LinuxFormat
  Processing e-book: LXF134.complete.pdf
  Processing e-book: LXF135.book.pdf
[...]
Processing sub-directory: LinuxVoice
  Processing e-book: Linux-Voice-Issue-001.pdf
  Processing e-book: Linux-Voice-Issue-002.pdf
[...]

Simple shell script to generate wiki content

It is just an ugly snippet, but it will help you quickly build a PDF file list.

#!/bin/bash
# create DokuWiki content
# create list of PDF files in current directory

dir=$(basename "$(pwd)")

for pdf in *.pdf; do
cat << EOF
{{:bookshelf:$dir:covers:${pdf%%.pdf}.png?nolink |}}
**$(echo "$pdf" | sed 's/\.pdf$//' | sed "s/_/ /g" | sed "s/-/ /g")**\\\\
//$(pdfinfo "$pdf" | sed -ne "/Author:/ {s/^Author:\ *//;p}")//

{{:bookshelf:$dir:${pdf}|Download e-book}}
----

EOF
done
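The three chained sed calls on the title line can be collapsed into a single expression with the same behavior; a sketch using a sample filename:

```shell
pdf="Linux-Voice-Issue-001.pdf"
# Strip the .pdf suffix, then replace underscores and dashes with spaces
title=$(echo "$pdf" | sed -e 's/\.pdf$//' -e 's/[_-]/ /g')
echo "$title"   # Linux Voice Issue 001
```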

Sample output.

[...]
{{:bookshelf:pragprog:covers:the-viml-primer_p1_0.png?nolink |}}
**the viml primer p1 0**\\
//Benjamin Klein//

{{:bookshelf:pragprog:the-viml-primer_p1_0.pdf|Download e-book}}
----

{{:bookshelf:pragprog:covers:tmux_p3_0.png?nolink |}}
**tmux p3 0**\\
//Brian P. Hogan//

{{:bookshelf:pragprog:tmux_p3_0.pdf|Download e-book}}
----
[...]
Notice that DokuWiki does not like mixed-case names – see the Page Names documentation.
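Since DokuWiki lowercases page names, it can help to normalize filenames before uploading. A minimal sketch using tr (the filename here is hypothetical):

```shell
name="The-VimL-Primer_p1_0.pdf"
# Translate upper-case characters to lower-case
lower=$(echo "$name" | tr '[:upper:]' '[:lower:]')
echo "$lower"   # the-viml-primer_p1_0.pdf
```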

Additional information

The most effective way to get the number of pages in a PDF e-book is to use the pdfinfo utility from the Poppler utilities package mentioned earlier.

$ pdfinfo Linux-Voice-Issue-016.pdf | awk '/^Pages:/ { print $2 }'
116
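The awk filter only depends on the Pages: line, so you can check it against simulated pdfinfo output without a PDF at hand:

```shell
# Simulated pdfinfo output; real output contains more fields
sample="Title:    Linux Voice Issue 16
Pages:    116
Encrypted: no"

# Print the second field of the line starting with "Pages:"
pages=$(printf '%s\n' "$sample" | awk '/^Pages:/ { print $2 }')
echo "$pages"   # 116
```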

You can use ImageMagick’s identify command to get the same information, but it is slower, as it extracts every page as an image.

$ identify -format "%n" Linux-Voice-Issue-016.pdf | head -1
116

You can analyze the first eleven pages (indexes 0 through 10) and print the one with the most unique colors using the following command.

$ identify -format "%s %k\n" Linux-Voice-Issue-016.pdf[0-10] | sort -nrk2 | awk 'NR==1 {print $1}'
3

This command can be handy if you need to search for a cover image.
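The sort/awk selection at the end is independent of ImageMagick itself. Given simulated "scene colors" pairs, as produced by the identify format string above, it picks the scene number with the highest unique-color count:

```shell
# Simulated identify -format "%s %k\n" output: scene number, unique colors
sample="0 2
1 24038
2 4671
3 31489
4 1287"

# Sort numerically, descending, on the second column; keep the first scene number
page=$(printf '%s\n' "$sample" | sort -nrk2 | awk 'NR==1 {print $1}')
echo "$page"   # 3
```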