Synaptica Has Got Its Head in the Clouds

The way companies are using software has been shifting- and if your head hasn't been in the clouds over the last few years, I am sure you have noticed the shift to SaaS (Software as a Service) offerings and more services moving to the 'cloud'. From The Economist's recent 14-page Special Report on Corporate IT titled 'Let it Rise', focused on cloud computing, to Microsoft's recent Azure announcement indicating an even bigger investment in moving services to the 'cloud', the recent discussions around Tim O'Reilly's post Web 2.0 and Cloud Computing, and of course discussions about the economics of cloud computing in today's world, it is evident that these models- which are not really 'new'- are here to stay.

It is a little-known fact- one that I am trying hard to make sure the marketplace knows- but Synaptica is available as a hosted application with complete access to most of the available features (this includes access to robust Web Services). And in line with the recent buzz in the marketplace, access to Synaptica as a 'service' is something we have recently been getting more and more requests about.

Who has interest in a Taxonomy and Metadata Management tool as a hosted model? Well, it is not for everyone who needs a tool like ours, and those who are interested vary widely. For example:

  • Small to medium-sized corporate libraries or Product Manager/Marketing groups that manage various taxonomies and do not have a lot of IT resources for bringing a tool in-house, but can really benefit from a centralized taxonomy management tool that their global colleagues can access securely via the internet to work on the vocabularies collaboratively
  • Companies that have an urgent need for a tool but don't have the resources to bring it in-house quickly at that specific point, and choose a hosted model as a first phase to get their taxonomy development and deployment done
  • Companies whose technology architecture is based on the LAMP stack, which Synaptica at this point cannot fit nicely into
  • Start-ups that are building a consumer service that requires a tool to manage their controlled vocabularies (e.g. product categories, navigation taxonomy etc.) but that do not have the IT infrastructure to host an application like Synaptica (e.g. most of their stuff is already in the 'cloud')

So with our hosted model we can provide a company, at whatever tier it is at, with an affordable and secure way to manage an important part of its business.

And the best part? Well, coming in at the low end, with access to a Synaptica hosted annual license (with full access to all editorial and administrative features including Web Services), you can choose to use one of the premier taxonomy management tools in the marketplace or, if you are so inclined, you can instead spruce up your office by buying a Hyacinth Macaw Parrot, buy one of your employees a nice baby shower gift like this blinged-out Baby Pram, or even update your office's outdoor picnic patio with the Kalamazoo Bread Breaker Two Dual-Fuel grill- yes, it really is your choice.

A Project Taxonomy Can Avoid Hours of Frustration

Here at the Synaptica Central Blog most of our posts are focused on developing and managing complex taxonomies, which is what our taxonomy consultants are usually doing at client sites during the week... unless of course they are busy blogging here ;-).

There are certainly different levels of complexity depending on the client, but the business needs are typically robust enough that at some point the customer also looks for a tool to manage those vocabularies, and Synaptica fits the bill. Typically this is because of the need to maintain relationships between terms in a thesaurus (like Broader Term (BT) and Narrower Term (NT)) that are hard to manage in a spreadsheet, or relationships between different vocabularies, which many of the thesaurus management tools in the marketplace do not allow. They may also need to integrate these vocabularies into other systems like search engines, CMS/DMS, DAMs etc., going beyond sending Excel sheets around their company, which can be quite painful.
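To illustrate why BT/NT relationships outgrow a spreadsheet, here is a minimal sketch of a structure that keeps both sides of a relationship in sync. It is not how Synaptica works internally; the `Thesaurus` class, its method names, and the sample terms are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class Thesaurus:
    """Tiny in-memory thesaurus that keeps BT/NT links reciprocal (illustrative only)."""

    def __init__(self):
        self.broader = defaultdict(set)   # term -> its broader terms (BT)
        self.narrower = defaultdict(set)  # term -> its narrower terms (NT)

    def add_broader(self, term, broader_term):
        # Recording the BT on one term also records the NT on the other,
        # which is the bookkeeping that is easy to get wrong in a spreadsheet.
        self.broader[term].add(broader_term)
        self.narrower[broader_term].add(term)

    def ancestors(self, term):
        # Walk BT links upward, collecting every broader term exactly once.
        seen = set()
        stack = [term]
        while stack:
            for parent in self.broader[stack.pop()]:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


# Hypothetical sample terms
thesaurus = Thesaurus()
thesaurus.add_broader("Espresso", "Coffee")
thesaurus.add_broader("Coffee", "Beverages")

print(sorted(thesaurus.narrower["Coffee"]))
print(sorted(thesaurus.ancestors("Espresso")))
```

In a spreadsheet the NT side of each link would have to be typed in and kept current by hand; here a single call updates both directions.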

We have, however, also seen some pretty cool uses of the tool, like those in Jim's recent post Thinking Outside of the Synaptica Box about our own in-house usage. Our clients see the power of the tool and adopt it for their own needs- many times bringing users into the fold who never thought that they would be creating and maintaining a "taxonomy"!

This post on Project Management from the Developer's Perspective: Project Taxonomy by Stacey Mulcahy on the O'Reilly InsideRIA blog reminds me of some of the unique ways that customers are using the Synaptica tool.

In her post, Stacey does an awesome job of explaining how "Adopting a project taxonomy is one of the simplest pro-active ways to avoid hours of frustration caused by miscommunication. Once team members, regardless of discipline and role, utilize a shared vocabulary, interactions become more meaningful and ultimately more productive as more time is spent in communicating the message and less time clarifying its context."

Things like Synaptica's "MyWeb Views" allow Admins to quickly create 'Read Only' Views that put the whole organization on the same page- for example with links to images, as Stacey suggests in her post, so that everyone agrees on what a specific term means, whether for the organization as a whole or only for the specific project the team is working on.

It is a must-read post. If you are thinking about the different ways controlled vocabularies are being used in your enterprise, already have Synaptica in house, and want to get others in your organization to benefit from the tool, look to your Project Managers: let them know that you have a tool in house that can simplify the way they manage taxonomies with their project teams and help avoid hours of possible frustration.


E-commerce, Comercio Electrónico, Commerce en Ligne, Elektronischer Handel ...

According to a recent global survey conducted by The Nielsen Company about trends in online shopping, over 85 percent of the world’s online population has used the Internet to make a purchase.

Whether users find products and services on an e-commerce site is key to its success, regardless of what language the online shop operates in. The conversion rate of a search, i.e. the share of searches that actually lead to a purchase, is one of the central measures of how successful an e-commerce site is.
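As a rough worked example of that measure, assuming conversion is counted simply as purchases that followed a search divided by the number of searches (the function name and the figures below are made up for illustration):

```python
def search_conversion_rate(searches, purchases_from_search):
    """Share of searches that ended in a purchase; 0.0 when there were no searches."""
    if searches == 0:
        return 0.0
    return purchases_from_search / searches


# Hypothetical figures: 2,000 searches in a period, 90 of which led to a purchase
rate = search_conversion_rate(searches=2000, purchases_from_search=90)
print(f"Search conversion rate: {rate:.1%}")
```

Real sites may count sessions rather than individual searches, or attribute purchases differently, so the definition should be agreed before numbers are compared.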

The end-user expects an interface that is intuitive and easy to use as well as a navigation and search that directs him or her to relevant products and services. How the user's search terms are actually associated with the "right" search results is of no interest to the online shopper, but is a complex issue that all e-commerce sites and online shops have to deal with.

Having worked with many e-commerce customers in Europe, I have come across a lot of the complexities that optimizing the search capabilities of a site can bring, of which an end-user will only ever see the tip of the iceberg.

Among the content, controlled vocabulary, search metrics and process questions that need to be addressed, having the right tools to optimize a search is probably the simplest, but it is no less important.

Often, search engines focus on what they are made for: Searching. Managing vocabularies for search improvement is usually not one of the areas that vendors specialize in or focus on. The most relevant features we encounter that are often not covered by search engines are:

  • Central management of vocabularies (products, services, colours, materials, and other filters) to ensure that there is one version in place from which extensions can be built if needed
  • Letting different users contribute to a controlled vocabulary through different levels of access rights, for example working directly with content editors to share input
  • The possibility to add comments to terms (why has x been introduced as a synonym to y)
  • Being able to monitor the progress and changes that have been made
  • Being able to retrieve historical information
  • Creating Audience Centric Views
  • just to name but a few!

Alongside many other aspects, being able to manage controlled vocabularies in an efficient and effective way is one of the prerequisites for optimizing the search capabilities of an e-commerce site. Not only will it help drive online sales, because users will find the most relevant products and services, but it will also contribute to a positive shopping experience so that new shoppers will return.


An Overview of Semantic Technologies at Dow Jones

An overview of how the Dow Jones Enterprise Media Group uses semantic technologies / solutions for our own organization and for customers. This brief presentation was given at MIT to the Cambridge Semantic Web meetup on October 14, 2008.

Classifying Images Part 2: Basic Attributes

Last month I asked the question "What is the Hardest Content to Classify?" and promised additional posts on the subject based on my background of 13 years developing taxonomy and indexing solutions for still image libraries, so I am continuing my thoughts in this post, focusing on the basic attributes of image classification.

In my opinion, images are the hardest content items to classify, but luckily, for sanity's sake, not all image classification is equally demanding.

The easiest elements of image classification relate to what I'm going to call image attributes metadata. This area, for me, covers all the metadata about the image files themselves, rather than information describing what is depicted in images and what images are about.

Metadata aspects in this area cover many things and there are also layers to consider:

1. The original object
-- This could be a statue, an oil painting, a glass plate negative, a digital original, or a photographic print

2. The second generation images
-- The archive image taken of the original object, plus any further images: cut-down image files, screen sizes, thumbnails, and images in different formats (JPEG, TIFF etc.)

The first thing to think about is the need to create a full and useful metadata scheme, capturing everything you need to know to support what you need to do. This may be to support archiving and/or search and retrieval.

Then look at what data you may already have or can obtain. Analyse data for accuracy and completeness and use whatever you can. Look to the new generation of digital cameras to obtain metadata from them. Ask image creators to create basic attribute data at the time of creation.

You'll be interested in the following metadata types:

- Scanner types
- Image processing activities
- Creator names
- Creator dates
- Last modified names
- Last modified dates
- Image sizes and formats
- Creator roles - photographers, artists, sculptors
- Locations of original objects
- Locations at which second generation images were created
- Unique image id numbers and batch numbers
- Secondary image codes that may come from various legacy systems
- Techniques used in the images - grain, blur etc
- Whether the images are part of a series and where they fit in that series
- The type of image - photographic print, glass plate negative, colour images, black and white images

This data really gives you a lot of background on the original and on the various second generation images created during production. Much of this data can be obtained freely or cheaply, and lots of it will be quick and easy to grab and enter into your systems. It should also be objective and easy to check.
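A few of the attribute types listed above could be captured in a simple record like the sketch below. This is only one possible shape for such a scheme; the `ImageAttributes` class, its field names, and the sample values are assumptions made for illustration, not an established standard.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class ImageAttributes:
    """Objective metadata about an image file (hypothetical field names)."""
    image_id: str                    # unique image id number
    creator_name: str
    creator_role: str                # e.g. photographer, artist, sculptor
    created_on: date
    image_format: str                # e.g. JPEG, TIFF
    width_px: int
    height_px: int
    image_type: str                  # e.g. photographic print, glass plate negative
    original_location: Optional[str] = None
    legacy_codes: List[str] = field(default_factory=list)  # codes from legacy systems
    series_name: Optional[str] = None
    series_position: Optional[int] = None

    def missing_fields(self):
        # List attributes that are still empty, as a basic completeness check.
        return [name for name, value in vars(self).items() if value in (None, "", [])]


# Hypothetical sample record
record = ImageAttributes(
    image_id="IMG-0001",
    creator_name="A. Photographer",
    creator_role="photographer",
    created_on=date(2008, 10, 1),
    image_format="TIFF",
    width_px=4000,
    height_px=3000,
    image_type="black and white",
    original_location="Archive store 2",
)
print(record.missing_fields())
```

Because every field is objective, a completeness check like `missing_fields` is about all the validation this layer needs, which is part of what makes it the easy end of image classification.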

My next post will cover dealing with depicted content in images. Please feel free to leave comments or questions on the subject.

Image|Flickr|Daniel Y. Go

Synaptica Central : Dow Jones Video Library

Video might have killed the Radio Star but in today's video streaming world it certainly is helping distribute knowledge and that is why we are publishing a video page to augment our blog postings.

Very often I talk to clients who need information to learn about key concepts, or who just want to share a third-party view with their colleagues on specific topics around controlled vocabularies that I know someone on the team has presented or written about. It could be, for example, a white paper about Audience Centric Views, a video overview of Taxonomy Management Tools and how to use these tools to collaborate on developing controlled vocabularies, or a real-life case study of an existing client using Synaptica. In the past, I have kept these references in a .txt file on my desktop that I consult when I need to, but since this blog is being used as a resource both by us internally here at Dow Jones and by the community, I figured it would be a good time to start a Video Library of our Dow Jones public resources.

So without any further ado- our Dow Jones Video Library has been published.

This is just the start of turning Synaptica Central into a must-go-to resource for our community, so please watch this space for additional resource pages, ranging from recommended white papers and industry standards references to must-see videos, must-listen-to podcasts and must-read books!

Have suggestions of things we should make sure we add to our resource pages? Please leave them in the comments or drop me a note at

Image|Flickr|traed mawr

In Developing a Custom Taxonomy Only Time Can Tell

OK Quick Monday Quiz: How Many Minutes Does It Take to Create a Category (aka term, node, leaf, etc)???

I suspect that anyone who has worked on developing a taxonomy has heard this question or a variation of it. It seems like we get it daily! Once a client decides they need or want a taxonomy – they need or want it immediately so figuring out when becomes the next question.

After almost 30 years of being involved in the development of controlled vocabularies, thesauri and taxonomies I should be able to say it takes X minutes per term but I’m still forced to tell clients that it will depend on a number of things that are usually covered in the Assessment Phase of any engagement like:

• What is the topic of the taxonomy?
• What is its intended purpose?
• What systems will you use to develop and maintain it?

Once we’ve answered all these questions, the next one is frequently whether they could just use a taxonomy that is already developed. No matter what approach is ultimately chosen to create a taxonomy – it still takes time and the ultimate answer is that it depends on what the client needs, how many terms there will be, how technical those terms are and the taxonomy development tool that is being used.

Building a taxonomy for an area that you are familiar with can be done fairly quickly while building one on scientific, technical or medical areas might be much slower. Adding to the issue of the topic is the issue of the tool where the taxonomy is being built. The more efficient the tool the faster the development once terms have been decided upon and research for the terms completed.

Experience in developing taxonomies has given me some general metrics that can be used for pricing a taxonomy but the reality is that the best answer is that it all depends on what is needed.

So – how long does it take?? – it takes as long as necessary!!


Taxonomies are a Commodity

For some reason or another (lots of travel, several hats at home and work) I've had trouble finalizing this post. Earlier today though, I read Paul Miller's latest post on ZDNet. There seems to be some discussion about whether or not data is a commodity. I think there IS most definitely data that are a commodity.

Taxonomies are a valuable raw material in the management of information. A file that can be bought and sold and used to improve services. They can be generated by humans, machines, or even better: humans working with machines. Many taxonomies are a dime a dozen, with little to differentiate between versions of the same data. Some are like Kopi Luwak coffee - rare and extremely valuable. The word "taxonomy" is itself suffering from a kind of genericide. Classical definitions still apply: taxonomies have become commoditized.

The complexity of the controlled vocabulary will determine its value to a degree. A simple pick list should be easy and cheap to acquire - a list of countries, for example. Or colors, seasons, months - you get the idea. What is the value of a list of industries? Or companies? Maintenance is the primary cost factor - frequent changes require frequent updates, but an authority file in and of itself is not that complex. A broad and deep poly-hierarchical taxonomy I would expect to have more value. A poly-hierarchical taxonomy is one where a term in the taxonomy can have more than one parent term. Managing these relationships takes more time. An ontology - well, those aren't quite commodities yet, but they will get there. Why? Because they still require a great deal of thought and effort.
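To make the poly-hierarchy point concrete, here is a small sketch in which one term has two parents and therefore two routes to the top of the taxonomy. The terms, the `parents` mapping, and the `paths_to_root` helper are all hypothetical, and the sketch assumes the hierarchy contains no cycles.

```python
# Each term maps to the set of its parent terms (hypothetical sample data)
parents = {
    "Coffee": {"Beverages", "Agricultural commodities"},
    "Beverages": {"Consumer goods"},
    "Agricultural commodities": {"Commodities"},
}


def paths_to_root(term, parents):
    """Return every path from a term up to a top-level term (assumes no cycles)."""
    term_parents = parents.get(term, set())
    if not term_parents:
        return [[term]]
    paths = []
    for parent in sorted(term_parents):
        for path in paths_to_root(parent, parents):
            paths.append([term] + path)
    return paths


# Print each route from the top-level term down to "Coffee"
for path in paths_to_root("Coffee", parents):
    print(" > ".join(reversed(path)))
```

Every additional parent multiplies the routes that have to be reviewed whenever a term moves or is renamed, which is where the extra management time goes.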

The source of the data will also help determine its value. Data from trusted sources - for whom integrity is paramount - should be valued higher. Is the data accurate? Is it maintained? Is it in a usable format? Does it have high availability? (Many quality vendors can be found at

The uniqueness of the taxonomy will drive its value. Like our coffee example above, a taxonomy as ubiquitous as Starbucks will not be as valuable as, say, a pharmaceutical research vocabulary. Given the, uh, processes needed to produce Kopi Luwak, it is rare and therefore fetches a higher price, as would our R&D taxonomy.

Information security concerns also impact value. Our pharmaceutical company, or a financial services provider, is not about to release its vocabulary into the wild. It is a significant intellectual asset that merits a substantial IT effort to protect.

I actually like the fact that taxonomies have become commoditized. Why? Competition drives improvement - in quality, in focus, in security and in usability. These are areas that the semantic web community needs to focus on - in my experience, security and usability need attention NOW. Good fences make good neighbors, and when we've got good fences, we can make more links and learn to trust. Icing on the cake!

Flickr image by INeedCoffee

Synaptica Featured in New Report on Industry and Leading Vendors in the Semantic Web Space

The Synaptica team is in Denver this week doing strategic planning, or as I say, scheming! ;) There are a great number of really interesting problems in information management, and it's fun and rewarding to brainstorm ways of solving them. We're not the only ones scheming, and it's great to see the market itself growing.

David Provost this week published a Global Review of the Industry and Leading Vendors in the Semantic Web space titled On The Cusp: A Global Review of the Semantic Web Industry in which Synaptica from Dow Jones and Dow Jones Client Solutions were highlighted. Dow Jones is in a unique position as a software vendor, consulting services provider and deployer of semantic solutions, which made for a great conversation - I highly recommend you read the report. (Not just because I was involved!)

Paul Miller has posted a review of the report on his ZDNet blog, titled New report places Semantic Web ‘On the Cusp’ of something big. Paul adds some great commentary to his summary of the report, should you not be able to get through David's entire document at once.

Happy Reading!

Congratulations to Gabe Rivera on being named to Business Week's 25 Most Influential People on the Web

I am a huge fan of Techmeme and therefore of Gabe Rivera who just yesterday got named by Business Week as one of the 25 Most Influential People on the Web!!

So I recorded this message earlier today:

For the month of September, I managed to convince our wonderful marketing department to have our new Dow Jones Synaptica Central Blog as one of the sponsored posts on Techmeme, and we have certainly had success in driving traffic and interest in the Taxonomy side of the Dow Jones Client Solutions business.

I was an early fan of Techmeme - for example, here is a video of Robert Scoble and me in February 2007 showing Techmeme to Clare Hart, Dow Jones EVP, and you can hear how excited I get.

Well, today is the last day of the month- and we decided to skip the month of October as a sponsored post- but we hope to be back soon!

Congratulations once again Gabe and thanks for letting us be a sponsored guest to get our Synaptica Central blog off the ground!