What is the Hardest Content to Classify?

This week I saw that my first blog post on Synaptica Central had been published! After a few seconds of enjoyment, as a wave of pride washed over me, I realized that I now need to post pretty frequently to give our readers something interesting to read! So, without further ado…

The first topic that came to mind was the whole area of classifying different types of content: text, sound, video and images.

I often speak to clients who have a range of item types stored in a number of repositories. They're often looking to classify new content, or to work on older content in order to improve its findability. They are always looking to get more value from their content.

In these circumstances a content audit is often called for, to answer the 'What do you have?' question. This then leads to a general discussion of the content types and the ways in which they can be classified, usually using a controlled vocabulary either applied by a machine, by a person, or by a mixture of the two.

One assertion of mine that often prompts questions is that images are easily the hardest item type to deal with.

Why are Images the Hardest Content to Classify?

-Textual items contain text. Auto-categorising software, free-text storage and access, and so on, make organising and finding textual items relatively easy.

-Sound can be digitised and turned into text.

-Video often has an audio track that can be turned into text too, and computers can be used to identify scenes. Breaking a video into scenes and linking them to a synched, indexed soundtrack can provide pretty good access for many people (though there's a whole blog post to be written on the many access points to video that these processes don't provide).

Images, on the other hand, have no text and no scenes. All you have are individual images, with the meaning and access points held in the visuals.

Some will say that this is really not a problem: all you need to do is use content-based image retrieval software to identify colours, textures and shapes in your images, and you'll soon be searching for images without any manual indexing. However, whilst this technology is promising, it still leaves a lot to be desired.

Today, the way to provide a wide and deep level of access to still images continues to be using people to view images, write captions and assign keywords or tags to each image based on its 'depictions' and its 'aboutness and attributes'. This manual process often requires a controlled vocabulary to make the application of terms more consistent.
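As a rough illustration of how a controlled vocabulary supports consistent manual indexing, the sketch below checks the keywords an indexer has assigned to one image against an approved term list, grouped by the facets mentioned above. The vocabulary, facet names and function are hypothetical; real image vocabularies are far larger and usually hierarchical.

```python
# Hypothetical controlled vocabulary for still images, grouped by facet:
# what an image depicts, what it is about, and its visual attributes.
VOCABULARY = {
    "depiction": {"dog", "beach", "child"},
    "aboutness": {"freedom", "friendship"},
    "attribute": {"black and white", "close-up"},
}


def check_keywords(keywords: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return, per facet, the keywords an indexer used that are not in the vocabulary."""
    problems = {}
    for facet, terms in keywords.items():
        allowed = VOCABULARY.get(facet, set())  # unknown facets allow nothing
        unknown = [t for t in terms if t.lower() not in allowed]
        if unknown:
            problems[facet] = unknown
    return problems


if __name__ == "__main__":
    assigned = {"depiction": ["Dog", "puppy"], "aboutness": ["friendship"]}
    print(check_keywords(assigned))
```

Here "puppy" would be flagged, and the indexer would either choose the approved term or propose "puppy" as a new vocabulary entry.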

How this indexing is done, and what structures support it, will be the subject of further posts; I just wanted to get my thoughts out there!

So stay tuned.
Ian

Why My Dog is Like a Search Engine Without a Taxonomy

A couple of weeks back I asked my 130-pound dog Townes if he wanted a 'biscuit' and he ignored me. I had forgotten that he isn't the smartest dog in the world and that he cannot automatically make the association that a 'biscuit' is a 'cookie', so as I spoke, he just stared into another room. (Read: it really isn't that he is dumb, just that he wasn't trained.) Now he has become a bit of a taxonomy star, and it might be getting to his head.

Today I ran into someone who had recently read my personal blog for the first time and had seen my post titled "Why my dog is like a search engine without a taxonomy". She had loved the video I made and told me that she used it in an internal discussion about taxonomies. She wasn't the first person to say they loved the video, so I figured I would repost it here.

So that morning, when I asked him if he wanted a 'biscuit' instead of a 'cookie' and all I got was a blank stare, I told him he was "like a search engine without a taxonomy", and he looked even more confused. Then we made this video. He is a bit bored in it because it is the second take, so the initial reaction is not 100%, but hopefully you get the idea!

The moral of the story? As humans, we easily make associations between words; machines and dogs can't unless they are "trained".
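To make the moral concrete, here is a minimal sketch of the kind of 'training' a taxonomy gives a search engine: a synonym ring that maps entry terms to one preferred term, so that a query for 'biscuit' also finds items indexed under 'cookie'. The vocabulary, documents and function names are hypothetical and only for illustration.

```python
# Hypothetical synonym ring: each entry term points to one preferred term.
SYNONYMS = {
    "biscuit": "cookie",
    "treat": "cookie",
    "cookie": "cookie",
}

# Hypothetical documents, each indexed with preferred terms only.
DOCUMENTS = {
    "doc1": {"cookie", "dog"},
    "doc2": {"dog", "walk"},
}


def normalize(term: str) -> str:
    """Return the preferred term for a query term, or the term itself if unknown."""
    t = term.lower()
    return SYNONYMS.get(t, t)


def search(query: str, use_taxonomy: bool = True) -> list[str]:
    """Return ids of documents indexed with the (optionally normalized) query term."""
    term = normalize(query) if use_taxonomy else query.lower()
    return sorted(doc_id for doc_id, terms in DOCUMENTS.items() if term in terms)


if __name__ == "__main__":
    print(search("biscuit", use_taxonomy=False))  # no association, like Townes
    print(search("biscuit"))                      # 'biscuit' resolved to 'cookie'
```

Without the synonym ring, the 'biscuit' query returns nothing; with it, the query finds doc1.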

He is a 4-year-old Great Dane, in case you are wondering, and yes, he got lots of biscuits and cookies that morning! (As he does most days...)

Author Spotlight: Ian Davis, Global Project Delivery Manager, Taxonomy Delivery Team

My name is Ian Davis, and I'm a Global Project Delivery Manager working in the Client Solutions Taxonomy Delivery Team and based in our London office. I work to develop and deliver a range of content and information solutions for our global clients. Projects can include discovery assessments, taxonomy strategy and creation, taxonomy mapping, search support, information architecture and website development. I also assist in the marketing and deployment of Synaptica, a semantic management tool offered by Dow Jones, and the website www.taxonomywarehouse.com.

My particular areas of interest include developing taxonomies, thesauri and metadata schemas; manual and automated indexing of still and moving images; deploying and using Synaptica controlled vocabulary software; the challenges of managing teams of geographically dispersed information workers; website creation and development; and the localisation of content into multilingual environments.

I joined Dow Jones in February 2006, after 13 years developing taxonomy and indexing solutions for still image libraries at both Corbis Corporation and Photonica (formerly part of Amana Japan and now part of Getty Images). At Corbis, I served as head of the UK division’s image cataloguing department. At Photonica, I worked to create and implement the e-commerce website www.iconica.com and was responsible for the development of www.photonica.com. I also developed, implemented and maintained all vocabularies underpinning the classification and retrieval of Photonica's extensive digital image content. One aspect of this included creating an extensive English language thesaurus and managing the localisation of that controlled vocabulary into five European languages. I managed a team of ten still image indexers and five thesaurus developers. After leaving Photonica, I worked as an independent consultant for BUPA in the area of metadata and taxonomy creation and development, and the implementation of an enterprise search solution.

Most of my time is currently spent working on the delivery of a major client engagement in Asia. I'm managing a team of geographically dispersed staff who are working on the customisation of a large topical thesaurus and the creation of various browsable taxonomies. We're also creating a multilingual thesaurus by translating the large English thesaurus into three other languages and tying the whole lot together. As if that weren't enough, we're also mapping the vocabularies we're working on to both legacy internal client vocabularies and third-party ones. We're also starting to consider how to move these thesauri and taxonomies into the world of ontologies.

Audience Centric Taxonomies: Talk to Them Like You Know Who They Are

For many organisations, providing a single way to navigate information resources is unlikely to meet the increasingly complex needs of a diverse audience of users.

Forcing users to locate resources in a fixed manner is the main culprit in most failed taxonomy implementations. This problem is especially pronounced when users are migrating from their own information 'silos' to organisation-wide repositories. To reduce user resistance and ease the transition into an environment more conducive to knowledge sharing and collaboration, we highly recommend multiple taxonomies that reflect the preferred discovery patterns and terminology of different user groups.

Those who have implemented taxonomies may immediately balk at this idea, since it could lead to a maintenance nightmare. However, we have found that it is possible both to customize taxonomies for different user groups and to keep maintenance effort to a minimum. Of course, this requires proper planning, an understanding of user needs and the right tools.

And this is where Synaptica fits in perfectly. A key differentiating feature of the Synaptica taxonomy management tool is its ability to provide 'audience-centric' views to diverse sets of users while maintaining a single 'master' taxonomy. This nifty feature builds on the master taxonomy by allowing alternative preferred terminology to be defined for multiple audience segments via audience 'extensions'. Depending on which extensions are deployed to the navigation platform (e.g. an intranet or portal), terms that are not relevant to a particular audience can be suppressed, and additional detail or depth needed by another audience can be developed.

Synaptica automatically manages the relationships between the master taxonomy and its extensions, enabling consistent searching across diverse taxonomic views. More importantly, it makes maintenance seamless: a change to any term is automatically reflected in all 'audience-centric' views, eliminating the need to duplicate effort across multiple taxonomies.
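The master/extension idea can be illustrated with a small data-structure sketch. This is not Synaptica's actual data model or API: the classes, fields and example terms below are hypothetical. Each audience extension stores only overrides (alternative labels, suppressed terms, extra narrower terms), and every view is computed from the single master taxonomy, so an edit made once in the master appears in every view.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Term:
    term_id: str
    label: str                    # preferred label in the master taxonomy
    parent: Optional[str] = None  # id of the broader term; None for a top term


@dataclass
class Extension:
    """Hypothetical audience extension: holds only overrides, never a copy of the master."""
    labels: dict[str, str] = field(default_factory=dict)   # term_id -> audience label
    suppressed: set[str] = field(default_factory=set)      # term_ids hidden from this audience
    extra_terms: list[Term] = field(default_factory=list)  # extra depth for this audience


def build_view(master: dict[str, Term], ext: Extension) -> dict[str, str]:
    """Compute one audience view (term_id -> display label) from the master plus an extension."""
    terms = dict(master)
    terms.update({t.term_id: t for t in ext.extra_terms})

    def hidden(term_id: Optional[str]) -> bool:
        # A term is hidden if it, or any of its broader terms, is suppressed.
        while term_id is not None:
            if term_id in ext.suppressed:
                return True
            term_id = terms[term_id].parent
        return False

    return {
        tid: ext.labels.get(tid, term.label)
        for tid, term in terms.items()
        if not hidden(tid)
    }


if __name__ == "__main__":
    master = {
        "health": Term("health", "Health and Medical Sciences"),
        "cardio": Term("cardio", "Cardiovascular Diseases", parent="health"),
        "pkin": Term("pkin", "Pharmacokinetics", parent="health"),
    }
    public = Extension(labels={"cardio": "Heart Disease"}, suppressed={"pkin"})
    research = Extension(extra_terms=[Term("arrhy", "Cardiac Arrhythmias", parent="cardio")])

    print(build_view(master, public))
    print(build_view(master, research))

    # One edit to the master is reflected in every audience view.
    master["health"].label = "Health and Medicine"
    print(build_view(master, public)["health"], build_view(master, research)["health"])
```

Computing views on demand rather than storing copies is what keeps maintenance in one place in this sketch; a production tool would also need to handle cases such as an extension overriding a term that is later deleted from the master.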

With the help of the Taxonomy Services team at Dow Jones Client Solutions, various organisations are using, or planning to use, this feature in Synaptica to enhance the information discovery process and experience. For example, in a typical corporate setting, users from different departments can continue to browse their content using familiar terminology and views, even though they are really using a larger enterprise-wide taxonomy. Such functionality helps to lessen user resistance while fostering knowledge sharing.

In a wider context, such as a digital library designed for the public, 'audience-centric' views can provide optimized browsing for varied audience segments. For example, the browsing needs for a topic such as Health and Medical Sciences would be very different for the general public and for academic and research users. These are just some of the taxonomy design and maintenance considerations that Synaptica addresses.

What this all means is that with the help of an experienced taxonomy design team and the right tools, knowledge managers can truly lay the foundation for improved information discovery and sharing without having to worry about details that can already be addressed by using existing tools and best practices.

To learn more about audience-centric approaches, please see the paper 'Audience-Centric Taxonomy: Using Taxonomies to Support Heterogeneous User Communities'. It describes how the National Library Board in Singapore intends to use an audience-centric taxonomy to provide enhanced information access to its multilingual, multicultural user community.

This post was co-written by Tan Pei Jiun, a Dow Jones Senior Taxonomy Consultant based in Singapore.

Image|Flickr|stephanieasher

Does Taxonomy Management within Microsoft SharePoint Always Need to be Painful?

For companies that have adopted SharePoint, a known pain point is the lack of an ‘out of the box’ feature for classifying the ever-growing content they hold across multiple sites. As more companies adopt SharePoint globally, the grunts of pain are getting louder and louder; we certainly hear them, and we are working with our customers to ease that pain.

For an introduction to Dow Jones’ taxonomy services and a better understanding of the use and benefits of taxonomies within a SharePoint implementation, attend our upcoming webinar, Taxonomy and SharePoint: A Powerful Combination, on Thursday, September 11th. During this session you will learn some basic ways to manage controlled vocabularies using standard out-of-the-box features that you can use immediately, and you will also hear about our upcoming Synaptica integration into SharePoint.

Webinar: Taxonomy and SharePoint: A Powerful Combination
Date: September 11, 2008
Time: 11:00 – 12:00 EST

SharePoint helps your organization connect people to business-critical information and expertise, increasing productivity and reducing information overload by giving your employees the ability to find relevant content in a wide range of repositories and formats. Understanding and using taxonomies within a SharePoint implementation to help users find content is an essential part of ensuring a successful SharePoint deployment. A taxonomy can range from quite simple to very complex. In this session we will cover the basics of evaluating what you can do to create a simple taxonomy that yields the most benefit for your SharePoint implementations, and you will learn a range of best practices, from the basics of building a taxonomy to implementing it within a SharePoint site.

A recording of this webinar is available on demand.

Image|Flickr|By Vanessa Pike-Russell

We Hit the Airwaves with a ReadWriteTalk Podcast

In early June, I published an ebook about hybrid approaches to Folksonomies and Taxonomies in the Enterprise that has been very well received. Beautifully designed, it provides a high-level overview of why companies should be looking towards user tagging as part of their content strategies.

So last month, when I was approached by ReadWriteTalk about being interviewed for a podcast on the subject of the ebook, I was pretty excited and, of course, a bit nervous at the same time. ReadWriteTalk is one of the many podcasts I listen to every week, so I certainly did not want to embarrass myself! Stumbling over some words aside, I think I did a decent job of discussing the reasons I wrote the ebook and highlighting what it covers. Although it is my face on the cover of the ebook (I share some of the behind-the-scenes story of how that happened), I also spend some time talking about the design of the book and the wonderful team I worked with to get it produced and made available as a free download for everyone.

Sean Ammirati interviewed me and did a great job not only of prepping me for the interview but also of making me very comfortable as we began discussing his questions. The podcast was also transcribed, so if you would rather read it than hear me go on and on and on about why it is important to look at some of the benefits of hybrid approaches... you can.

If 'Search' is the Answer, You May Not Be Asking the Right Question

Daniela's latest data visualization post and a conversation with one of the Dow Jones Factiva Project Managers earlier today got me thinking about a presentation I did a couple of years back for the Taxonomy Community of Practice. Seth Earley, TaxoCoP's founder and chief moderator, had once asked me to talk about my experiences with the Google Search Appliance, and part of my argument was to not simply dismiss it - given the cost, ease of implementation, and quality of results, it may (and does) suit the needs of many organizations.

Our Commitment to Data Exchange Standards

The concept of machine-automated data exchange is certainly a road that keeps changing course, with many paths splitting off in this direction or that. Among the many "standards" that evolve, it is sometimes hard to discern which paths to follow and which will end in a dead end. Our own Daniela Barbosa participates in many discussions around data portability and can certainly attest to this.

In the development of Synaptica, we have traditionally followed a policy of being as technology-agnostic as we can, leaving data import and export as open-ended as possible to allow free and open exchange with external applications and other consumers of taxonomies, thesauri and other types of controlled vocabularies.


Are We Beauty Obsessed Even in Visualizing Data?

5 Kitties Need Homes, 4 Cute, 1 Ugly. Who went home first?

I found this picture on Flickr by Bethany King and couldn't resist. I would probably pick the ugly one and give it all the catnip it wanted, but we all know that the cute ones are probably the ones that got the most treats in the long run.

Showing live, 'clickable' examples of something I am trying to explain is usually the way to go, especially if I am not sitting in a conference room with the 'all-powerful' whiteboard around to draw pretty pictures (I am actually a horrible artist, but that's a whole other conversation!). During those meetings, if the examples are 'pretty' and demonstrate the point I am trying to make, even better, because who doesn't like 'cute and pretty'?

Upcoming Events With Dow Jones Synaptica and Taxonomy Services Team


Internally here at Dow Jones, we have taken to calling September 'Taxonomy Month' because of the many events we are participating in, sponsoring and organizing.

Below, listed by date, are our 'Taxonomy Month' events. We hope to see you at some of them, and please do not hesitate to contact us if you have any questions!