categorization
Passionate Geographers
Anonymous — August 10, 2009 - 3:21am
I recently came across a very interesting initiative, Project Geograph: Photograph Every Grid Square.
This project is working towards collecting and making available images depicting the geography of every square kilometre of the British Isles. This ambitious project seems to be progressing very well, with many good quality images loaded to the website.
Already over 8,900 contributors have submitted nearly 1,500,000 images, with an average of 5 images associated with each geographic square across England, Wales, Scotland and Ireland. This is a great resource, preserving in amazing detail what the British Isles looked like at the start of the 21st century. It is also a wonderful way to learn about the geography of these amazing islands and to dig deeply into their hills, valleys, towns and villages, and a superb source for genealogists looking at how a particular part of the British Isles looks today.
Back in 2007 I attended the Blogs and Social Media Conference 2.0 in London. One presentation that has stayed in my mind since then was Lee Bryant's "Engaging with Passionates". In his exceptional presentation, Lee described a ground-breaking social networking case study and talked about the energy that can be released when organisations successfully tap into a group of people who are truly passionate about a given topic.
I think you'd be hard pressed to find a better example of the power of passionates than the Geograph Project. The number of contributors, the amount of the British Isles covered, and the quality of the photography and metadata created make a clear point: find people who are passionate about a topic, people who are committed to a hobby or interest, engage them in the right way, and they will deliver time and again.
I wish everyone associated with the Geograph Project all the luck in the world. May they stay passionate and committed to what they do, and may their project benefit from their commitment.
Oh, and if you like what you see, submit a photograph, or start a similar initiative.
Ian
Classifying Images Part 3: Depicted Content
Anonymous — June 2, 2009 - 4:44am
Welcome back to my occasional image classification series.
The last time I raised the topic of image classification I discussed the basic attributes of images. This time I want to focus on the thornier issue of the content, or concepts, depicted in them.
There is a danger of treating an image like a piece of text and classifying its attributes: Who created it? When? What techniques were used? Then writing a title or caption and leaving it at that. Sometimes little more need be done to a document than record this kind of information, especially with free text searching, but lots more needs to be done to most images.
Image findability
Image findability is the process of using search and browse to access the images required. A major aspect of image findability relates to the things depicted in them. Image users often search for images based on the generic things in them and also the proper names of these things. Classifying images based on depicted content means considering anything and everything that is and can be depicted in an image. When considering this I like to focus my efforts on understanding the images I'm dealing with, the users who are trying to find and work with the images, and the ways in which these people need to search and browse for the images they need. After an assessment of these areas I then tailor my approach.
Broadly speaking people searching for depicted content are looking for a number of types:
- Places: cities, towns, villages, streets...
- Built works: parks, skyscrapers, cottages, walls, doors, windows...
- Topography: mountains, valleys...
- Groups and organisations: air forces, choirs, police departments...
- People: roles, occupations, ethnicity and nationality: mothers, doctors, Caucasians, French, Germans...
- Actions, activities and events: running, writing, laughing, smiling, birthdays, parties, book signings, meetings...
- Objects: a myriad of items...
- Animals and plants: common and scientific names...
- Anatomy and attributes of people, animals and plants: arms, legs, adults, leaves, trunks, paws, tails...
- Depicted text: signs or other writing shown in images...
Many of these generic types can also have proper named instances:
- Proper names of people, places, buildings, topography, organisations, animals etc
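One way to picture these types is as facets on an image record, with generic terms and proper names held side by side. The following is a minimal sketch only: the facet keys, the `ImageRecord` class and its methods are hypothetical names made up for illustration, not part of any particular system.

```python
from dataclasses import dataclass, field
from typing import Optional

# Facet keys loosely follow the list above; the exact names are an assumption.
FACETS = (
    "places", "built_works", "topography", "groups", "people",
    "actions", "objects", "animals_plants", "anatomy", "depicted_text",
)


@dataclass
class ImageRecord:
    image_id: str
    generic: dict = field(default_factory=dict)       # facet -> set of generic terms
    proper_names: dict = field(default_factory=dict)  # facet -> set of named instances

    def tag(self, facet: str, term: str, proper_name: Optional[str] = None) -> None:
        """Record a generic term and, optionally, the proper name of that instance."""
        if facet not in FACETS:
            raise ValueError(f"unknown facet: {facet}")
        self.generic.setdefault(facet, set()).add(term.lower())
        if proper_name:
            self.proper_names.setdefault(facet, set()).add(proper_name)

    def matches(self, query: str) -> bool:
        """True if the query equals a generic term or a proper name (case-insensitive)."""
        q = query.lower()
        in_generic = any(q in terms for terms in self.generic.values())
        in_names = any(
            q == name.lower()
            for names in self.proper_names.values()
            for name in names
        )
        return in_generic or in_names
```

With a record like this, tagging an image with the generic term 'skyscrapers' and the proper name 'Empire State Building' lets a searcher find it by either route.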
When dealing with depicted content I've found some of the biggest issues to be:
- Identification - knowing what is in an image
- Focus and specificity - knowing what to include and what to exclude
- Consistency - applying the same term in the same way for the same depicted content
Identification - knowing what is in an image
Depicted content can seem a relatively black and white area: a dog is depicted, so a dog is tagged. It might sound a little weird, but working out what is actually in an image can be a lot harder than you think.
Take a look at the image "Do You Know What This Is?" by Sister72.
This depicted content is fairly simple to see, but understanding what you're looking at is not that easy. Even if you know roughly what you're looking at, do you know what it's actually called?
One tip is to group similar images together when you're classifying them. Also, always start by assembling as much information as possible before you begin to classify images. It is especially important to gather together the information you have from the creator or custodians of the images.
Also important, when you have the luxury, is to get the image creator to add key metadata about the image at the point of creation, or soon after.
Focus and specificity
Knowing what to include and what to exclude, what to mention and what to ignore, is also much harder than it sounds.
Firstly, some image users will want a piece of depicted content tagged whenever it appears in an image, others will only want it tagged when the image shows a very good representation of that content, and of course many people will want something in between the two extremes.
Different users have different requirements. You need to understand the domain in which you're working and see the classification of depicted image content as supporting the needs of your users.
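One way to serve both kinds of user is to record how prominent each piece of depicted content is and let the searcher choose a cut-off. This is a minimal sketch under that assumption; the `prominence` score, the `DepictedTag` class, the image ids and the numbers are all hypothetical and invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DepictedTag:
    term: str
    prominence: float  # hypothetical score: 1.0 = main subject, near 0.0 = incidental


def find_images(index, term, min_prominence=0.0):
    """Return ids of images tagged with `term` at or above the chosen prominence."""
    wanted = term.lower()
    return sorted(
        image_id
        for image_id, tags in index.items()
        if any(t.term.lower() == wanted and t.prominence >= min_prominence for t in tags)
    )


# Invented example data: three images that all show a window to some degree.
index = {
    "messy-room": [DepictedTag("Rooms", 0.9), DepictedTag("Windows", 0.2)],
    "cat-at-window": [DepictedTag("Cats", 0.9), DepictedTag("Windows", 0.6)],
    "cottage-window": [DepictedTag("Windows", 0.95)],
}

everything = find_images(index, "windows")                        # all three images
good_examples = find_images(index, "windows", min_prominence=0.5)  # drops the messy room
```

The trade-off is extra effort at classification time: someone still has to judge how prominent each item is, and those judgements need the same kind of guidance as the terms themselves.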
For example, would you tag everything in this 'Messy Room' image?
What would you miss out and why?
Look at the image "Mountain Goats", from Thorne Enterprises.
Would you tag this with goats as well as mountains? Would this be helpful?
Let's look at four images depicting windows, including 'What Light Through Yonder Window Breaks?'.
Looking at these, it soon becomes clear that even deciding to apply a simple term like 'Windows' is not always easy.
Would you apply 'Windows' to the image of the cat looking out of the window? Is a window actually depicted in that image? If the image wasn't tagged with 'Windows' how else would anyone find an image of a cat looking out of a window?
The other three images show windows as parts of buildings, but is a building always depicted? Deciding when to apply a building type or the name of a building can be hard. Should you do this every time a part of a building is shown? Only when the whole building is shown? When enough of the building is visible? Or when a section of the building that to most people would represent the building is visible? For example, what part of the Empire State Building would you consider to depict that building? Rarely does anyone see it all - how much is enough? Would you treat the images of windows in a similar way and classify them all with a building type of 'Houses', or would you ignore the structure and focus on the parts - the window, the roof?
Consistency
Achieving consistent application of terms to images revolves partly around clear term definitions, well defined application rules and guidelines, and a robust quality assurance process.
Term definitions are very important. Defining the meaning of a term, and ensuring the people choosing which term to assign understand that meaning, can be crucial to term application. For example, creating a term such as 'Bow' without defining its meaning is not going to make it easy to apply.
Application rules that are well considered, thorough and clear are also very useful. Even a simple concept often needs some form of guidance linked to it. I remember a while ago needing two terms, 'Indoors' and 'Outdoors', to allow users to find images of people who were outside and inside - a simple concept you might think, one that people often need, and one that's easy to apply - who'd need guidelines for that? However, it soon became clear that guidelines were needed after I received a series of interesting questions: Is being on a train indoors? Should studio shots always be considered indoors? Does every shot of a person have to have indoors or outdoors assigned to it? If not, when should this term be used and when not? Is this a focus issue? If so, how much of a location needs to be seen before Indoors or Outdoors is used? A clear set of application guidelines followed an interesting meeting!
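To make the link between terms, definitions and guidelines concrete, here is a minimal sketch of a vocabulary that refuses to accept a term until it has both a definition and an application note. The `Term` and `Vocabulary` classes are hypothetical, and the wording of the 'Indoors' entry is a placeholder invented for illustration, not the guidelines that came out of that meeting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    label: str
    definition: str        # what the term means
    application_note: str  # when to apply it and when not to


class Vocabulary:
    def __init__(self):
        self._terms = {}

    def add(self, term):
        """Accept a term only if it carries a definition and an application note."""
        if not term.definition.strip() or not term.application_note.strip():
            raise ValueError(f"'{term.label}' needs a definition and an application note")
        self._terms[term.label.lower()] = term

    def lookup(self, label):
        """Return the controlled term, or raise KeyError for uncontrolled labels."""
        try:
            return self._terms[label.lower()]
        except KeyError:
            raise KeyError(f"'{label}' is not a controlled term") from None


vocab = Vocabulary()
vocab.add(Term(
    label="Indoors",
    definition="Placeholder: the main subject is inside an enclosed space.",
    application_note="Placeholder: state here how trains and studio shots are handled.",
))
```

An undefined term such as 'Bow' is rejected at the door, which forces the conversation about whether it means a knot, a weapon, part of a ship or a gesture before anyone starts applying it.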
Strong quality assurance processes are very valuable. People make mistakes and images generate interesting issues. Appointing staff to review a percentage of classification work based on clear guidelines, and then sharing findings with the people who assigned the terms to the images, is an important way of assessing how well the image classification is progressing and keeping a classification team synchronised.
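The review step can be sketched as drawing a random percentage of classified images and measuring how often a reviewer agrees with the original terms. This is only an illustration: the function names, the exact-match rule for agreement and the sample fraction are assumptions, and a real process would likely look at individual terms as well as whole images.

```python
import random


def sample_for_review(image_ids, fraction, seed=None):
    """Pick a random `fraction` (between 0 and 1) of classified images for review."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be greater than 0 and at most 1")
    ids = sorted(image_ids)
    # Review at least one image whenever there is anything to review.
    k = max(1, round(len(ids) * fraction)) if ids else 0
    return random.Random(seed).sample(ids, k)


def agreement_rate(original, reviewed):
    """Share of reviewed images where the reviewer's terms equal the original terms."""
    if not reviewed:
        return 0.0
    agreed = sum(
        1
        for image_id, terms in reviewed.items()
        if set(original.get(image_id, ())) == set(terms)
    )
    return agreed / len(reviewed)
```

The images where the two sets of terms differ are the useful output: they are the cases to share back with the classification team.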
Today I’ve talked a lot about content depicted in images. Next time I’ll focus on abstract concepts which are related to an image’s ‘aboutness’.
Content Based Image Retrieval - Google and Similar Image Search
Anonymous — April 24, 2009 - 12:46am
I was very interested to see Google experimenting with visual similarity in still images, what I usually call Content Based Image Retrieval or CBIR.
Google Labs have just launched an image search function based on visual similarity - Google Similar Images. This new offering allows searchers to start with an initial image and then find other images that look like their example picture.
I've been reviewing these types of systems on and off since the early '90s. They've always offered much, but I never saw any evidence that the delivery matched the hype.
I've always found that using pictures instead of text to find images works best on simple 2D images: carpet patterns, trademarks, simple shapes, colours and textures. Finding objects in images was always a struggle, and looking for abstract concepts (fear, excitement, gloom, isolation, solitude...) was never more than a vague possibility. Over the years a lot of work has been done in this area, and the search results I've seen have started to improve, but this technology is still young, and in my personal opinion still rarely delivers what most users want, need and expect.
Looking at Google Similar Images, I wonder how much of the back-end is pure content based image retrieval (CBIR), how much is using metadata in some way, and how the two are interacting. One thing that often appears to help produce a tight first page of results is simply pulling the same image from different sites. I also noticed that the 'similar images' option is not available for all images, which makes me wonder why. Have some images been processed in ways that others haven't?
Diving right into the experience, I entered a query for a place in the UK and didn't see any image results with the 'Similar Images' option. I wonder whether this is to do with the presence of the results on UK websites?
I persevered, and found some interesting images and got some interesting results.
I started with a fairly standard image of a beach scene, always a favourite with testers. As you can see I got a pretty good first screen back. However, the 5th and 6th images on the top row show no sea or beach, and neither do the first three images on the second row.
I moved on to an image of what looks like equipment at the top of a pole.
The results were much more mixed: studio shots of objects, fighting people, trucks etc. No images were returned that I would consider similar to the example picture.
Interesting results came from a similarity query on a clock face. A couple of the first results hit the mark, then the results set degenerated into image similarity based more on the colour and the black background than anything else.
My last attempt, before morning coffee called, was an image of a country road. I was hoping that the clear roadway might produce a pretty precise results set. However, I was a little disappointed by what I saw.
The first results page only produced one vague road on the bottom row, with most of the similarity seemingly related to colours instead of objects.
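To show how a purely visual measure can drift towards colour rather than objects, here is a minimal sketch of one classic CBIR feature, the colour histogram, compared by histogram intersection. I have no knowledge of what Google Similar Images actually does, so this is not a description of it; the function names and the tiny hand-made 'images' (lists of RGB pixels) are hypothetical and invented for illustration.

```python
def colour_histogram(pixels, bins_per_channel=4):
    """Normalised colour histogram for a list of (r, g, b) pixels with 0-255 channels."""
    step = 256 // bins_per_channel

    def bucket(value):
        # Clamp so that bin counts which do not divide 256 evenly stay in range.
        return min(value // step, bins_per_channel - 1)

    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        idx = (bucket(r) * bins_per_channel + bucket(g)) * bins_per_channel + bucket(b)
        hist[idx] += 1
    total = len(pixels) or 1
    return [count / total for count in hist]


def histogram_intersection(h1, h2):
    """1.0 for identical colour distributions, 0.0 when no colours are shared."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Invented pixel data: mostly green verge with a strip of grey tarmac,
# an all-green meadow with no road, and a grey street with some red.
country_road = [(40, 120, 40)] * 80 + [(90, 90, 90)] * 20
green_meadow = [(45, 125, 45)] * 100
city_street = [(95, 95, 95)] * 80 + [(200, 30, 30)] * 20

query = colour_histogram(country_road)
score_meadow = histogram_intersection(query, colour_histogram(green_meadow))
score_street = histogram_intersection(query, colour_histogram(city_street))
```

In this toy data the meadow, which contains no road at all, scores higher against the country road than the street does, simply because it shares more of the same colours. A measure like this knows nothing about what is depicted.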
From my less than scientific dip into this Google Labs offering, it looks like the highlighted images on the Google Similar Images home page produce good results - better results than I've seen other systems come up with. Many other image queries are sure to also produce results which may well impress. However, many of the results I saw did not match the initial level of accuracy I saw from the highlighted home page pictures.
I don't want to be picky; this is still a prototype after all, and well done to Google for introducing a wider audience to this type of image search. Hopefully, after more work, the results will increasingly make more sense to people, the access points offered to depicted content and conceptual aboutness will improve, and more images will be more findable for more people.
Until that time, visual search without text will help with image findability, but text, metadata and controlled vocabulary applied to images by people are, for me, still king, and will continue to offer the widest and deepest access to images for a long time to come.
Ian
Synaptica and Dow Jones Taxonomy Services Video Collection
Summary: Here you will find videos that have either been produced by Dow Jones or feature a Dow Jones employee or customer discussing the development, management and governance of controlled vocabularies. This includes customer case studies, conference presentations, panel discussions and product demonstrations.
November 2008 Synaptica: SharePoint Integration
In November we announced our new SharePoint Integration. This video takes you through a short demo of the SharePoint Integration: http://blip.tv/file/1475940
September 2008 Synaptica Case Study: ProQuest: Finding a Common Language: Bringing Complex and Disparate Vocabularies
Paula R McCoy, Manager, Taxonomy Development, ProQuest
Daniela Barbosa, Synaptica Business Development Manager, Dow Jones Client Solutions, Dow Jones & Company
This case study addresses the challenges ProQuest faced in managing multilingual controlled vocabularies using multiple Word documents and authority files maintained in an Oracle database. Speakers describe how implementing a thesaurus management tool helped ProQuest simplify and standardize its business semantic management to create a common language and connect disparate information assets as well as handling large and varied vocabularies and authority files, linking new and existing editorial systems and enabling hierarchical views, and automating thesaurus management tasks. This session was sponsored by Dow Jones Synaptica. http://blip.tv/file/1306890
September 2008 Centralized Taxonomy Management for Enterprise Information Systems
Daniela Barbosa, Synaptica Business Development Manager, Dow Jones Client Solutions, Dow Jones & Company
Paula R McCoy, Manager, Taxonomy Development, ProQuest
Now that you have built your taxonomies, you need to manage and maintain them in a centralized environment that can be leveraged by all of your enterprise applications including search tools, portals, and CMS/DMS systems. This session will review some best practices in centralized taxonomy management and go through the implementation of a thesaurus management tool at ProQuest, which enabled them to create a common language to connect disparate information assets using large and varied vocabularies and authority files linked to new and existing editorial systems. This session was sponsored by Dow Jones Synaptica. http://blip.tv/file/1307166
March 2008: iKMS: Marti Heyman on ROI Analysis for Taxonomy Programs
Video by: Patrick Lambe www.greenchameleon.com
In this talk for the Information and Knowledge Management Society of Singapore (www.ikms.org) on 13 March 2008, Marti Heyman, Director of Taxonomy Services at Dow Jones, discusses the problems associated with ROI for taxonomy programs, and the key steps in ROI analysis. In this first part she discusses the issues around ROI. This session was sponsored by Dow Jones Synaptica.
Part 1 of 3: http://blip.tv/file/917758/
Part 2 of 3: http://blip.tv/file/917962/
Part 3 of 3: http://blip.tv/file/917979/
March 2008: iKMS: Christine Connors on User Driven Taxonomies
Video by: Patrick Lambe www.greenchameleon.com
In this talk for the Information and Knowledge Management Society of Singapore (www.ikms.org) on 13 March 2008, Christine Connors, Director of Semantic Technologies at Dow Jones and Business Champion of Synaptica, explains the rationale for a hybrid approach to taxonomy development, harnessing user inputs and activity as well as the traditional controlled approach, giving examples from her pioneering work at Raytheon. This talk was sponsored by Dow Jones Synaptica. In the first part, Christine gives a general rationale for a more user driven approach.
Part 1 of 3: http://blip.tv/file/917603/
Part 2 of 3: http://blip.tv/file/917629/
Part 3 of 3: http://blip.tv/file/917691/
November 2007: Synaptica Case Study Abbott: From Taxonomy to Ontology: Laying the Groundwork for the Semantic Web
Presented by Jennifer Borrell, Associate Information Scientist at Abbott Laboratories
Jennifer takes us through how Abbott Laboratories uses Synaptica to build and maintain their ontologies. She presents a high-level overview of how Abbott views ontologies and how they are laying the groundwork to improve user productivity. Sponsored by Dow Jones Client Solutions. http://blip.tv/file/482545
August 2007: Using Tools to Manage Taxonomies
Video by: Patrick Lambe www.greenchameleon.com
Dave Clarke, CEO of Synaptica (Synaptica/Synapse co-founder)
In this video Dave Clarke describes how tools can be used to manage taxonomies, for an iKMS evening talk on 30 August 2007. In part one Dave describes how you can use tools to manage the collaboration required in building and maintaining taxonomies. In part two Dave describes how you can use tools to support the taxonomy creation process, and in part three Dave describes how taxonomy tools can link different enterprise applications, including legacy taxonomies.
Part 1 of 3: http://blip.tv/file/375135/
Part 2 of 3: http://blip.tv/file/375156/
Part 3 of 3: http://blip.tv/file/375196/






