Thursday, 17 April 2008

Swimming in metadata

Shooting from the bunker, Northumberland, March 2008

In a comment on my earlier post, Les Richardson asked:
However, you are now moving into a digital world... how are you going to integrate your film archive along with digital capture? Are you going to add metadata (even simple stuff) about your film collection? Do you have a unique number (or key) to identify each image? ... How are you going to avoid being drowned in data?
I felt these questions warranted their own response, slightly delayed by the effort of editing a backlog and actually having to go to the office for a couple of days.

Integrating the film archive with the digital is a tricky business. I've got a numbering system for my electronic files which specifically distinguishes between the two, in that I note the roll and type of film used. However, the way the files stack up is the same. The tricky part is that the film archive also has a physical counterpart, which digital does not. Maybe one day we'll all be converting our digital files to transparencies.

I'm not adding metadata for two reasons. One is I'm lazy and that seems like effort. The other is that all I'm really interested in is the what and when and that is all wrapped up in the filename I use. Plus, there is so much I've forgotten about the cluttered mass that is my film collection that metadata would be a pointless task.

If you hover over any of my web images you'll see my numbering system buried in the filename. I've had to tweak it slightly recently but essentially each frame is uniquely identified which makes it easy to search and cross-reference. Plus, by archiving my final work by shooting location I can easily track the where to the when and all related images. That returns maximum data usefulness for minimum effort, at least for me. My biggest headache is if something comes along that requires a change in naming convention: do I continue from that point or go back and rename the whole lot? I've not quite nailed that one yet.

As someone who deals with masses of data on a daily basis, I'm sure I could deal with more but I have also learnt the value of boiling down data collection to the bare essentials.


  1. Reading over your entry about your methodology, it sounds very logical and will certainly continue to work in the future. I assume that all derivative files will also be in a subfolder of the location shot with a similar name to the original with an additional descriptor.

    The only issue, IMO, is the one of 'reusability'. When we first started using computers at schools (I'm a teacher), we used them for notes and tests/assignments. We would then save this work (and it was a lot of extra work at the beginning), but in the next year (and the following years) it would pay huge dividends.

    In terms of photography, if there was some work done at the time of capture and entry to add some more information about the files (some bulk, some individual), then there would be a richer source for searching and reporting.

    For example, what about all pictures (across all locations) that have something in common? How would you quickly search for that?

    On the other hand, one could always say, "We don't need no stinking searches!" (grin).

    It all depends on the usage required for the images (or their deemed utility value... or their possible monetized value).

    My 2 cents worth.

  2. Good points, all.

    You're correct about my derivative files - sub-folders & descriptor tags.

    I'm still struggling with the linked searching business. I do use a type marker in the filename, though, which enables me to track across about 10 categories.

    The one thing I'm trying to avoid is reliance on any particular databasing software (e.g. Bridge or Lightroom). Once I get a handle on any further sub-categorization and a ready way to reference it all, it should be a relatively simple job to batch-tag my archive.

  3. So then there's the issue where you want to add a new category for searching... rewrite every file to add the attribute? Obviously at some point you're going to have to move to an external data store to store additional attributes for searching.

    Then the question becomes one of what software. My solution would be to use a standard SQL database to store this data. This would be either MySQL or PostgreSQL. Given the changes in MySQL recently, it might be time to start looking at PostgreSQL. Then we can start talking about applications that read and write to this database engine.

    What would be your basic needs for this kind of application if we were going to make it 'Dead Simple' for dummies?
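
As a rough illustration of the external data store idea, here is a minimal sketch in Python. It is an assumption-laden stand-in, not anyone's actual setup: it uses the standard-library `sqlite3` module instead of MySQL or PostgreSQL so that it runs without a server, and the table name `attributes`, the helper names `open_store`, `tag` and `find`, and the image IDs are all made up for the example. The point it tries to show is that a new search category becomes a new row rather than a renamed file.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the attribute store. ':memory:' keeps it throwaway."""
    conn = sqlite3.connect(path)
    # One row per (image, attribute name, value); hypothetical schema.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS attributes ("
        " image_id TEXT NOT NULL,"
        " name TEXT NOT NULL,"
        " value TEXT NOT NULL,"
        " PRIMARY KEY (image_id, name, value))"
    )
    return conn


def tag(conn, image_id, name, value):
    """Attach an attribute to an image; repeating the same tag is a no-op."""
    # INSERT OR IGNORE is SQLite syntax; other engines spell this differently.
    conn.execute(
        "INSERT OR IGNORE INTO attributes (image_id, name, value)"
        " VALUES (?, ?, ?)",
        (image_id, name, value),
    )


def find(conn, name, value):
    """Return the IDs of all images carrying the given attribute, sorted."""
    rows = conn.execute(
        "SELECT image_id FROM attributes"
        " WHERE name = ? AND value = ? ORDER BY image_id",
        (name, value),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = open_store()
    # Image IDs below are invented for illustration.
    tag(conn, "08012307", "location", "Northumberland")
    tag(conn, "08012307", "subject", "bunker")
    tag(conn, "08004512", "subject", "bunker")
    print(find(conn, "subject", "bunker"))
```

Because the attribute name is just data, adding a category such as "lens" later would not require touching the schema or the image files, which is the concern raised above.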

  4. Well, all the searching etc. is quite easy. I want to cross-reference the various files derived from a single source (i.e. the RAW through to the various end products), which can be done via the filenames. It would also be useful to reference similar metadata in the RAW (e.g. the lens used). I'd also want to add sub-division tags to groups of images and be able to link/search to similar tags from a given image.

    I've actually been working with a similar system in work recently (for cross-referencing engineering documents).

    Thus, from a single image, I could find the related images either from their origin, the shoot, the (sub)category. Apart from subcategories, the data is already in the files or their names anyway.

  5. easy.. i continue to use my numbering system from the days of film for digital.
    two digits year, four running digits for the film, two (or three if needed) more digits for exposures. my first film ever has a number like that attached.
    in digital its nice to be able to break up the "film" into locations, motives, days, etc. whatever suits.
    metadata for film was and still is camera, date, location, people. on special occasions i took technical notes, now the cameras mostly do that for me.
    ah, derived files get an underscore and a name attached.

    yep, thats it.
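
The numbering scheme described in this comment can be sketched as a small parser in Python. This is one reading of the description, with the details it leaves open filled in as assumptions: the digits are taken to run together with no separators, the exposure is taken to be two or three digits, and the names `Frame` and `parse_frame` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Two-digit year, four-digit running film number, two or three exposure
# digits, then an optional "_name" for a derived file (assumed layout).
FRAME_RE = re.compile(r"(\d{2})(\d{4})(\d{2,3})(?:_(.+))?")


class Frame(NamedTuple):
    year: int                  # two-digit year, e.g. 8 for 2008
    roll: int                  # running number of the film
    exposure: int              # frame number on that film
    derivative: Optional[str]  # name after the underscore, if any


def parse_frame(stem: str) -> Frame:
    """Split a filename (without extension) into its numbered parts."""
    match = FRAME_RE.fullmatch(stem)
    if match is None:
        raise ValueError(f"not a recognised frame name: {stem!r}")
    year, roll, exposure, derivative = match.groups()
    return Frame(int(year), int(roll), int(exposure), derivative)


if __name__ == "__main__":
    # Made-up example names.
    print(parse_frame("08001512"))
    print(parse_frame("08001512_web"))
```

Keeping the derivative name as a separate field means an original and everything derived from it share the same year, roll and exposure values.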

  6. grubernd - that's a simplified version of my naming system but I want to be able to track derivatives across multiple folders/logical/physical drives and cross reference.
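
Tracking derivatives across several folders or drives could look something like the following Python sketch. It is only a sketch under stated assumptions: the rule that everything before the first underscore in the filename identifies the frame is borrowed from the underscore convention mentioned above rather than from the actual naming system used on this blog, and `frame_id` and `index_roots` are hypothetical names.

```python
from collections import defaultdict
from pathlib import Path


def frame_id(path: Path) -> str:
    """Assumed rule: the part of the name before the first '_' is the frame."""
    return path.stem.split("_", 1)[0]


def index_roots(roots):
    """Walk every root folder and group the files found by frame ID."""
    index = defaultdict(list)
    for root in roots:
        for path in Path(root).rglob("*"):
            if path.is_file():
                index[frame_id(path)].append(path)
    # Sort each group so the output is stable between runs.
    for paths in index.values():
        paths.sort()
    return dict(index)


if __name__ == "__main__":
    # Scans the current folder as a stand-in for real archive roots.
    for frame, paths in sorted(index_roots(["."]).items()):
        print(frame, [str(p) for p in paths])
```

Since the index is built from the filenames alone, it does not depend on Bridge, Lightroom or any other cataloguing software, and the roots can sit on different logical or physical drives.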


