Processing scanned images

In 1911 tourist Maud Moreland wrote a travelogue, Through South Westland, lavishly illustrated with photographs. As part of converting this to an ebook, I needed to download some of these scanned images and clean them up, ready to upload to Wikimedia Commons. Here’s my workflow; I’m sure there are better ways of doing this, if you’re a whiz at image editing, but this works for me.

When working with a book that’s been scanned by Google Books or the Internet Archive, there will be a folder among all the download options containing all the page scans, including all the photos, in JPEG-2000 format (.jp2). I generally download the entire folder as a ZIP file and pull out the pages I need. Remember to save a copy of the originals.

I want to use a more meaningful filename schema:
Through South Westland (1916) · Moreland · 344.jpg
so the only part of the filename that changes is the page number. This makes the name less descriptive, but makes the images much easier to place into Wikisource.

On the Mac I select all the files and use File > Rename… to replace the Internet Archive text with my filename, removing the first zero from the page number. We’ll change the files from .jp2 to .jpg when we export them.
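If you’d rather script the rename than do it in the Finder, here is a minimal Python sketch of the same step. The input name `throughsouthwest00more_0344.jp2` is a made-up example of a scan filename ending in a zero-padded page number, and `NEW_PREFIX`, `new_name`, and `rename_scans` are hypothetical names chosen for illustration; adjust them to match your own download.

```python
import re
from pathlib import Path

# Hypothetical prefix: everything in the new name that stays the same.
NEW_PREFIX = "Through South Westland (1916) · Moreland · "


def new_name(old_name: str) -> str:
    """Build the new filename from a scan name ending in a zero-padded page number."""
    path = Path(old_name)
    match = re.search(r"(\d+)$", path.stem)
    if match is None:
        raise ValueError(f"no page number at the end of {old_name!r}")
    # Remove only the first zero, so 0344 becomes 344 and 0034 becomes 034.
    # (str.removeprefix needs Python 3.9 or later.)
    page = match.group(1).removeprefix("0")
    # Keep the .jp2 extension; it changes to .jpg at export time.
    return f"{NEW_PREFIX}{page}{path.suffix}"


def rename_scans(folder: str) -> None:
    """Rename every .jp2 file in `folder` (work on a copy of the originals)."""
    for path in sorted(Path(folder).glob("*.jp2")):
        path.rename(path.with_name(new_name(path.name)))
```

Calling `rename_scans("scans")` on a folder of copied page scans would rename them in place. Removing just one zero keeps the page numbers three digits wide, so the files still sort in page order.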

I use Affinity Photo to edit images. I could do a fair bit just in the Preview app that comes with my Mac, but Affinity Photo is a bit more powerful, while costing nowhere near as much as Photoshop.

Make the image fill the screen (Command-0). Rotate it 90 degrees if needed, and crop it down to the photo edge, rotating it manually a little if it’s not straight. Crop out all the text: photo captions are added in Wikisource.

Google likes to add a sepia background to its scans, but this makes the images into RGB files with three 8-bit colour channels. By converting from RGB to greyscale, we cut the file size by two thirds and lose no actual information. You can do this at Document > Convert Format > Grey/8 (which converts the colours to 256 shades of grey, fine for our purposes).
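To see where the two-thirds saving comes from, here is a small Python sketch of the arithmetic on uncompressed pixel data. The luma weights are the common Rec. 601 ones; that is an assumption for illustration, not necessarily the formula Affinity Photo uses, and `to_grey` and `raw_size_bytes` are hypothetical helper names.

```python
def to_grey(pixels):
    """Convert (r, g, b) tuples to single 0-255 grey values (Rec. 601 weights, assumed)."""
    return [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in pixels]


def raw_size_bytes(width, height, channels):
    """Uncompressed size of an 8-bit image: one byte per channel per pixel."""
    return width * height * channels


rgb_size = raw_size_bytes(1000, 1500, 3)   # three 8-bit colour channels
grey_size = raw_size_bytes(1000, 1500, 1)  # one 8-bit grey channel
print(rgb_size, grey_size)                 # the grey image needs a third of the bytes
```

The saved JPEG will not shrink by exactly this ratio, since compression changes the numbers, but the raw data being compressed is a third of the size.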

Some of the scans are quite murky, so I adjust the levels in the Adjustment palette > Levels > (Default). You can see that this image doesn’t occupy the full range of tones available, so drag the black and white points in to the edges of the graph to make the darkest pixels black and the lightest ones white. Then move the gamma slider left to brighten the photo a little by bringing up the shadows. Merge the results.
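For the curious, a levels adjustment can be sketched in a few lines of Python. This is the textbook black point, white point, and gamma mapping applied to one grey value, under the assumption that a gamma above 1 brightens; the exact curve Affinity Photo applies may differ, and `apply_levels` is a hypothetical name.

```python
def apply_levels(value, black, white, gamma=1.0):
    """Map one 0-255 grey value through black point, white point, and gamma."""
    # Stretch the range so `black` maps to 0.0 and `white` maps to 1.0.
    scaled = (value - black) / (white - black)
    # Anything darker than the black point or lighter than the white point clips.
    scaled = min(1.0, max(0.0, scaled))
    # Assumed convention: gamma > 1 lifts midtones and shadows, gamma < 1 darkens them.
    return round(255 * scaled ** (1.0 / gamma))


# A murky scan whose tones only run from 30 to 220:
print(apply_levels(30, 30, 220))        # darkest pixel becomes black
print(apply_levels(220, 30, 220))       # lightest pixel becomes white
print(apply_levels(125, 30, 220, 1.2))  # a midtone, brightened a little
```

Dragging in the black and white points corresponds to choosing `black` and `white`; the gamma slider corresponds to `gamma`.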

Apply some sharpening with Unsharp Mask, if there are enough hard edges in the photo that might benefit from it. Affinity Photo lets you set a Before/After slider, so you can adjust the intensity and see a preview.
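The idea behind Unsharp Mask is simple enough to sketch in Python: blur a copy of the image, then add back the difference between the original and the blur. This sketch works on a single row of grey values and uses a 3-pixel box blur for brevity; real tools typically use a Gaussian blur with an adjustable radius, and `unsharp_mask_row` is a hypothetical name, not anything from Affinity Photo.

```python
def unsharp_mask_row(row, amount=1.0):
    """Sharpen one row of 0-255 grey values using a 3-pixel box blur (a simplification)."""
    sharpened = []
    for i, value in enumerate(row):
        # Repeat the edge pixel where a neighbour would fall outside the row.
        left = row[max(i - 1, 0)]
        right = row[min(i + 1, len(row) - 1)]
        blurred = (left + value + right) / 3
        # Add back the difference between the original and its blurred copy.
        result = value + amount * (value - blurred)
        sharpened.append(min(255, max(0, round(result))))
    return sharpened


# A soft edge from dark to light: the dark side gets darker, the light side lighter.
print(unsharp_mask_row([50, 50, 50, 200, 200, 200]))
```

Flat areas are left alone and only the pixels next to an edge change, which is why sharpening helps most on photos with plenty of hard edges.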

Use File > Export to save it as a JPEG. Choosing a High Quality setting won’t visibly harm the image, and more than halves the file size (the full-quality Affinity file is 1.88 MB). Save it into a new folder specifically for fixed images, and leave the original file unchanged.

There are ways of automating most of this workflow: you can run batch jobs in Affinity Photo to convert photos to JPEGs, and record a macro that converts an image to black and white and runs a bit of Unsharp Mask. You’ll still need to crop and tweak your photos by hand though – no substitute for that.

I hope seeing this basic image processing workflow has been helpful, and that you picked up something useful. At some point I may cover the next step: a bulk upload to Wikimedia Commons.