Last month i asked the question "What is the Hardest Content to Classify?" and promised additional posts on the subject based on my background of 13 years developing taxonomy and indexing solutions for still images libraries, so I am continuing my thoughts in this post focusing on the basic attributes of image classification.
In my opinion, images are the hardest content items to classify, but luckily for sanities sake not all image classification is equally demanding.
The easiest elements of image classification relate to what I'm going to call image attributes metadata. This area, for me, covers all the metadata about the image files themselves, rather than information describing what is depicted in images and what images are about.
Metadata aspects in this area cover many things and there are also layers to consider:
1, The original object
-- This could a statue, an oil painting, a glass plate negative, a digital original, or a photographic print
2, The second generation images
-- The archive image taken of the original object, plus any further images, cut-down image files, screen sizes, thumbnails, images in different formats, Jpeg, Tiff etc
The first thing to think about is the need to create a fully and useful metadata scheme, capturing everything you need to know to support what you need to do. This may be to support archiving and/or search and retrieval.
Then look at what data you may already have or can obtain. Analyse data for accuracy and completeness and use whatever you can. Look to the new generation of digital cameras to obtain metadata from them. Ask image creators to create basic attribute data at the time of creation.
You'll be interested in the following metadata types:
- Scanner types
- Image processing activities
- Creator names
- Creator dates
- Last modified names
- Last modified dates
- Image sizes and formats
- Creator roles - photographers, artists, sculptures
- Locations of original objects
- Locations at which second generation images were created
- Unique image id numbers and batch numbers
- Secondary image codes that may come from various legacy systems
- Techniques used in the images - grain, blur etc
- Whether the images are part of a series and where they fit in that series
- The type of image - photographic print, glass plate negative, colour images, black and white images
This data really gives you a lot of background on the original and on the various second generation images created during production. Much of this data can either be obtained freely or cheaply, lots of it will be quick and easy to grab and enter into your systems. It should also be objective and easy to check.
My next post will cover dealing with depicted content in images. Please feel free to leave comments or questions on the subject.
Image|Flickr|Daniel Y. Go