Sunday, November 26, 2023

Discovering scala-cli while fixing my digital photo archive

Over the years I built up a nice digital photo library with my family. It is a messy process. Here are some of the things that can go wrong:

  • Digital cameras that add incompatible exif metadata.
  • Some files have exif tag CreateDate, others DateTimeOriginal.
  • Images shared via Whatsapp or Signal do not have an exif date tag at all.
  • Wrong rotation.
  • Fuzzy, yet memorable jpeg images wich take 15MB because of their resolution and high quality settings.
  • Badly supported ancient movie formats like 3gp and RIFF AVI.
  • Old movie formats that need 3 times more disk space than h.265.
  • Losing almost all your photos because you thought you could copy an Iphoto library using tar and cp (hint: you can’t). (This took a low-level harddisk scan and months of manual de-duplication work to recover the photos.)
  • Another low-level scan of an SD card to find accidentally deleted photos.
  • Date in image file name corresponds to import date, not creation date.
  • Weird file names that order the files differently than from creation date.
  • Images from 2015 are stored in the folder for 2009.
  • etc.

I wrote countless bash scripts to mold the collection into order. Unfortunately, to various success. However, now that I am ready to import the library into Immich (please, do sponsor them, they are building a very nice product!), I decided to start cleaning up everything.

So there I was, writing yet another bash script, struggling with parsing a date response from exiftool. And then I remembered the recent articles about scala-cli and decided to try it out.

The experience was amazing! Even without proper IDE support, I was able to crank out scripts that did more, more accurately and faster than I could ever have accomplished in bash.

Here are some of the things that I learned:

  • Take the time to learn os-lib.
  • If the scala code gets harder to write, open a proper IDE and use code completion. Then copy the code over to your .sc file.
  • One well placed .par (using scala-parallel-collections) can more than quadruple the performance of your script!
  • You will still spend a lot of time parsing the output from other programs (like exiftoool).
  • Scala-cli scripts run very well from Github actions as well.


Next time you open your editor to write a bash file, think again. Perhaps you should really write some scala instead.