Introduction to assistive tagging

This article introduces the topic of assistive tagging and describes some practical tools for implementing it. A warning in advance: this is a long article and, while I've tried to aim it at content management professionals, it does veer off into technical territory at times.

Classification of content is a key part of modern content management. Content items that have been classified using structured controlled vocabularies are inherently easier to find later on. But classification can undoubtedly be a painful process for authors and editors. It adds another layer of effort on top of the work involved in just writing and editing. It is also a classic case of WIIFM ("what's in it for me?"); the author or editor who painstakingly classifies their content against controlled vocabularies is rarely the person who benefits from it. So anything that we can do to reduce the burden of classification and make the process run more smoothly is likely to be beneficial, both strategically to the business and tactically to the author/editor.

An increasingly common response to this problem is auto-tagging. The idea is that the author or editor hands off the job of tagging to a software robot driven by an artificial intelligence program. Rather like a search engine robot, this analyses the content and automatically assigns the most appropriate taxonomy concepts.

However, auto-tagging is not a solution to this problem, any more than search engines are a solution to information discovery. I have worked with quite a few of these systems over the years and in my experience none of them are an effective substitute for a human subject matter expert.

While auto-tagging remains a long way from practical value, there is a half-way house: assistive tagging. Like auto-tagging, this uses a computer program to analyse your content and to suggest candidate tagging concepts from a taxonomy. Unlike auto-tagging, it leaves the final decision with a human being, who can judge whether the suggestions make sense and accept or reject them as appropriate.
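
The suggest-then-review loop described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how PoolParty or PowerTagging works internally: the Concept class, the suggest_concepts and review functions, and the sample taxonomy are all hypothetical, and plain label matching stands in for the much richer text analysis a real system would use.

```python
from dataclasses import dataclass
import re


@dataclass(frozen=True)
class Concept:
    """A hypothetical taxonomy concept: a preferred label plus synonyms."""
    pref_label: str
    alt_labels: tuple = ()


def suggest_concepts(text, taxonomy, limit=5):
    """Return (concept, hit_count) pairs, most frequent first.

    Counts whole-word, case-insensitive occurrences of each label.
    This naive matching is an assumption made for the sketch.
    """
    suggestions = []
    for concept in taxonomy:
        hits = 0
        for label in (concept.pref_label, *concept.alt_labels):
            pattern = r"\b" + re.escape(label) + r"\b"
            hits += len(re.findall(pattern, text, flags=re.IGNORECASE))
        if hits:
            suggestions.append((concept, hits))
    # Highest hit count first; ties broken alphabetically by label.
    suggestions.sort(key=lambda pair: (-pair[1], pair[0].pref_label))
    return suggestions[:limit]


def review(suggestions, accept):
    """Keep only the suggestions that the human reviewer accepts."""
    return [concept.pref_label for concept, _ in suggestions if accept(concept)]


# Illustrative data only.
taxonomy = [
    Concept("Taxonomy", ("controlled vocabulary",)),
    Concept("Content management", ("CMS",)),
    Concept("Search engine"),
]
body = (
    "Our CMS stores every article. Each article is tagged against a "
    "controlled vocabulary, and the taxonomy is reviewed each quarter."
)

suggested = suggest_concepts(body, taxonomy)
# Here the editor rejects "Content management" as too broad for this item.
tags = review(suggested, accept=lambda c: c.pref_label != "Content management")
```

The point of the design is the split between the two functions: the program only ever proposes, and nothing becomes a tag until the accept callback, which stands in for the editor's judgement, says so.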

I believe assistive tagging represents a pragmatic approach to effective classification of content in the real world, and this article introduces some practical tools to help you use it.

The component parts of assistive tagging

Assistive tagging requires a content management system, a taxonomy management system and something to link them together. One way of doing that is to use a product like our Content Graph Explorer (CGE); you may like to see the article and videos that we have produced on the CGE. Actually the CGE goes a lot further than assistive tagging, since it also uses a triple store as the basis for exploring classified content in a novel way.

Another example of assistive tagging is the use of a PowerTagging connector to act as the link between a content management system and PoolParty. That is what I'll be describing for the rest of this article.

Our components in this case are Drupal for the content management system, PoolParty for the taxonomy management system and the Semantic Drupal family of modules to provide the links. This set of modules is extremely powerful, but also quite complex, so I'm going to go through them one by one.

A brief aside on Drupal modules

Drupal is not just a content management system; it is a platform for developing content-rich web applications. Drupal's strength in this area is its module architecture; it provides a standard set of features and interfaces that allow anyone to create enhanced functionality for the content management system. This enhanced functionality appears as modules; they are created and installed into the Drupal environment by standard methods, and then the features become available for users of the system.

Returning to Semantic Drupal

The Semantic Drupal family of modules can be found at the Drupal project site. The Semantic Connector module is the central module and is a prerequisite for the others. Here are the Semantic Drupal modules:

Semantic Connector: This module manages the high-level connections between a Drupal installation and a PoolParty server.
PowerTagging: This provides the tools and user interface features for tagging a Drupal content item with one or more concepts in a PoolParty system. Once in place it dynamically suggests PoolParty concepts that match the content in the content item body.
PoolParty GraphSearch: The GraphSearch modules offer some support to search functions within Drupal. This is a complex set of tools, and the GraphSearch components of PoolParty are rapidly developing, so I'm not going to write anything about it here. I will cover it in a separate article.
PoolParty Taxonomy Manager: The Taxonomy Manager lets you populate a PoolParty taxonomy project from a Drupal vocabulary, and vice versa.
Smart Glossary: This module lets you display an A-Z browsable glossary based on a linked PoolParty project. There is an optional visual browser component.

The links above will take you in each case to the Drupal module project page. On that page you can find links to documentation and downloads.

I use drush (the Drupal Shell, a command line tool to speed up Drupal management) to install Drupal modules. This way I can download a whole set of modules in one command:

drush pm-download semantic_connector powertagging sonr_webmining pp_taxonomy_manager smart_glossary

Where a module has sub-modules (such as PowerTagging) these will automatically be downloaded too. Drupal modules need to be downloaded and also enabled. It is possible to use drush for this too (the command is drush pm-enable) but I tend to use the Drupal UI to enable modules, because you get a lot of good information there about any dependent modules or libraries that need to be installed. Here is the module section for PowerTagging for example; you can see its inbound and outbound dependencies.

Once a module is installed and enabled, the final step is to configure it where necessary (not all modules need configuration). I'll cover this for each module later on.