Monday, July 7, 2008

Content Management

Content management seems to have become a very large blanket term lately. It can cover anything from managing data on the back end to preparing information for delivery by a web server. In today's blog I will be covering Data Conversion, Publishing Systems, Content Management, and finally Web Delivery. This will be a brief overview, since I could probably write a book on data conversion alone. If you have any questions please feel free to contact me at

Data Conversion

Having cut my teeth in data conversion, I naturally believe the most important concept in delivering information to print or the web is having total control over your data (after all, your data IS your business). What is the best way to do this? It seems that even today a lot of organizations are storing their information in proprietary formats such as Word or Quark. While these tools might fit your current workflow well, they will fall apart as formats change or if the supplier's business model changes. Suppose Microsoft decided that Office was no longer where it wanted to be: it could discontinue Word, and all of your data would be stuck. My belief is that you need to place your information into XML (ten years ago I would have said SGML, which just goes to show that even these formats change).

What is nice about XML is that it is a highly structured, text-based markup language. Because it is plain text, you can read and edit it on a Windows PC, Mac, or Linux system with nothing more than a text editor. XML is concerned with the structure of the data rather than its formatting, and this is the key difference between XML and presentation-oriented formats such as HTML or Word. In this world, formatting is derived from structure, not from random bold and italic tags scattered throughout the data. I am not an XML purist, so I will allow emphasis tags to be inserted into my XML for bolding and such, but a purist will be strict about keeping markup purely structural.
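To make the "structure, not formatting" idea concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The article markup and its element names (`article`, `section`, `heading`, `para`, `emphasis`) are hypothetical examples, not a standard vocabulary. Nothing in the document says how anything should look; it only says what each piece of text is.

```python
import xml.etree.ElementTree as ET

# A hypothetical article marked up by structure, not by appearance.
# The element names are illustrative only.
ARTICLE = """<article>
  <title>Content Management</title>
  <section>
    <heading>Data Conversion</heading>
    <para>Your data <emphasis>is</emphasis> your business.</para>
  </section>
</article>"""


def outline(xml_text):
    """Return (depth, tag) pairs describing the document's element structure."""
    root = ET.fromstring(xml_text)
    result = []

    def walk(elem, depth):
        result.append((depth, elem.tag))
        for child in elem:  # iterate over direct child elements only
            walk(child, depth + 1)

    walk(root, 0)
    return result


if __name__ == "__main__":
    # Print an indented outline of the document's structure.
    for depth, tag in outline(ARTICLE):
        print("  " * depth + tag)
```

Because the file is plain text, the same few lines of parsing work unchanged on any operating system.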

It isn't easy to migrate your information into XML. You will need to spend time and money to have someone analyze the information and build a DTD (Document Type Definition, which describes the allowed structure of the XML; there are many existing DTDs that can be used or modified for your purposes). Then you need to have someone convert your data into XML that conforms to the DTD. Note that this isn't easy or cheap, and it requires a complete understanding of what you want to do with the information, but once it has been completed, your information should be set for the next 100 years. When you start this process, be ready for arguments, concessions, and revelations; it's an exciting process.
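As a rough illustration of what a DTD enforces, the sketch below checks which child elements each element may contain. It is only a simplified stand-in: Python's standard library does not validate against real DTDs (that needs a validating parser such as lxml or xmllint), and this check ignores element order, cardinality, and attributes. The rule table `ALLOWED_CHILDREN` and the element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for a DTD: the child elements each element may contain.
# A real DTD also constrains order, how many times an element may appear, and attributes.
ALLOWED_CHILDREN = {
    "article": {"title", "section"},
    "section": {"heading", "para"},
    "title": set(),
    "heading": set(),
    "para": {"emphasis"},
    "emphasis": set(),
}


def structure_errors(xml_text, rules=ALLOWED_CHILDREN):
    """Return a list of parent/child nesting violations found in xml_text."""
    errors = []
    root = ET.fromstring(xml_text)
    if root.tag not in rules:
        errors.append(f"unknown root element <{root.tag}>")
    # root.iter() visits the root and every descendant element.
    for parent in root.iter():
        allowed = rules.get(parent.tag, set())
        for child in parent:
            if child.tag not in allowed:
                errors.append(f"<{child.tag}> not allowed inside <{parent.tag}>")
    return errors


if __name__ == "__main__":
    # A stray formatting tag inside a title is rejected by the rules.
    print(structure_errors("<article><title><b>Hi</b></title></article>"))
```

Conversion work usually means running checks like this (or a real DTD validator) over every converted file until the error list comes back empty.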

Publishing System

The publishing system is where your information gets edited. An XML-based publishing system reads the XML and its associated style sheet, formats the information so that it is readable to the editor or writer, and then lets that editor or writer write their article without regard to the XML. The system enforces the structure of the XML, but the end user simply sees this as applying styles, as if they were using Word or Quark. A good system will let you apply meta-data to each file, understand whether you have a Print, Web, CD-ROM, or other delivery method that needs to be supported, and prepare the data for that method. Hopefully it will also eliminate the dreaded editing during print composition that always seems to happen.

Traditionally, print has driven the publishing industry. What happens is that the content is moved from the publishing system to a composition system, where the data is edited, but those edits never make it back to the source data. This means your other delivery methods are no longer up to date, which could be a dangerous issue depending on the information content you are working with. In a good system, print composition happens in the publishing system and updates the source data, which ensures that all data streams stay up to date.

Once your data has been modified, it goes to the next step in the flow; in our case we'll ignore Print, CD-ROM, and other outputs and go directly to the Web. When I publish to the Web, I prefer to convert my information to HTML before it gets to the Content Management System (CMS). Ultimately the information is delivered to the browser as HTML, so if you don't convert the data up front, either the user's browser or the CMS will need to do the conversion. Being a data control freak, I would rather have my system convert the data up front and not require the CMS to spin extra cycles converting data.
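Converting up front can be as simple as a batch step that maps structural elements to HTML elements before anything reaches the CMS. The sketch below uses only the standard library; the `TAG_MAP` table and source element names are hypothetical, and a production pipeline would more likely use XSLT and handle attributes, links, and images, which this sketch drops.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from structural source elements to HTML elements.
TAG_MAP = {
    "article": "div",
    "title": "h1",
    "section": "section",
    "heading": "h2",
    "para": "p",
    "emphasis": "em",
}


def to_html_element(src):
    """Recursively copy src into a new tree using HTML tag names.

    Text and tail text are preserved; attributes are not copied.
    Unknown elements fall back to <span>.
    """
    out = ET.Element(TAG_MAP.get(src.tag, "span"))
    out.text = src.text
    out.tail = src.tail
    for child in src:
        out.append(to_html_element(child))
    return out


def xml_to_html(xml_text):
    """Convert a source XML string into an HTML fragment string."""
    root = ET.fromstring(xml_text)
    return ET.tostring(to_html_element(root), encoding="unicode", method="html")


if __name__ == "__main__":
    print(xml_to_html("<article><title>Hi</title><para>Your data <emphasis>is</emphasis> your business.</para></article>"))
```

The CMS then only stores and serves finished HTML, which is the trade-off described above: conversion cost is paid once at publish time rather than on every request.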

Content Management

Content management is exactly what it sounds like: it manages your content. A good system will allow you to add, modify, and delete information on the fly, index information on the fly, and maintain security, either with a built-in Digital Rights Management System (DRMS) or by securing the information so users can't bypass the system's DRMS. This system will also maintain all meta-data for each document; meta-data can include the document's title, published date, expiration date, document hierarchy, topic, subtopic, and so on. The meta-data is important for displaying browse trees within the web interface and for searching the information.
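Here is a small sketch of how per-document meta-data can drive a browse tree. The `Document` fields mirror the examples in the paragraph above (title, published date, expiration date, topic, subtopic), but the class and the `browse_tree` function are hypothetical illustrations, not any particular CMS's data model.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    # Hypothetical meta-data fields, following the examples in the text.
    title: str
    published: date
    expires: date
    topic: str
    subtopic: str


def browse_tree(docs, today):
    """Group currently live documents as {topic: {subtopic: [titles]}}.

    A document is live if it has been published and has not yet expired.
    Titles within each subtopic are sorted alphabetically.
    """
    tree = {}
    for doc in sorted(docs, key=lambda d: d.title):
        if doc.published <= today < doc.expires:
            tree.setdefault(doc.topic, {}).setdefault(doc.subtopic, []).append(doc.title)
    return tree


if __name__ == "__main__":
    docs = [
        Document("XML Basics", date(2008, 1, 1), date(2009, 1, 1), "Data", "Conversion"),
        Document("Old News", date(2007, 1, 1), date(2008, 1, 1), "News", "Archive"),
    ]
    print(browse_tree(docs, date(2008, 7, 7)))
```

The same fields can feed a search index, which is why it pays to capture them consistently at publish time.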

A content management system can be very complex or very simple; it depends on how much money you have to spend and how much structure you want to enforce. There are free systems, such as Joomla, and there are systems out there that will cost you $100,000 or more. For many years my content management system consisted of either a SQL server that stored the data as blobs or a set of directories on the file system (these directories were not directly accessible via the web server). This wasn't great, since it didn't really allow us to index on the fly, but it did allow us to stand up a site quickly and cheaply. We also had an off-the-shelf indexing engine and a custom-built DRMS to manage access to the information.
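A bare-bones version of the "documents as blobs in a SQL database" approach might look like the sketch below. It uses Python's built-in `sqlite3` as a stand-in for a SQL server; the `documents` table and its columns are hypothetical. Note that, as described above, this gives you add, modify, and delete, but no indexing of the content itself.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a minimal document store; the schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        " doc_id TEXT PRIMARY KEY,"
        " title TEXT NOT NULL,"
        " body BLOB NOT NULL)"
    )
    return conn


def put_document(conn, doc_id, title, body_bytes):
    """Add or replace a document; INSERT OR REPLACE gives add/modify semantics."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO documents (doc_id, title, body) VALUES (?, ?, ?)",
            (doc_id, title, body_bytes),
        )


def get_document(conn, doc_id):
    """Return the stored bytes for doc_id, or None if it does not exist."""
    row = conn.execute(
        "SELECT body FROM documents WHERE doc_id = ?", (doc_id,)
    ).fetchone()
    return None if row is None else row[0]


def delete_document(conn, doc_id):
    with conn:
        conn.execute("DELETE FROM documents WHERE doc_id = ?", (doc_id,))


if __name__ == "__main__":
    conn = open_store()
    put_document(conn, "a1", "Content Management", b"<p>Your data is your business.</p>")
    print(get_document(conn, "a1"))
```

Searching inside the blobs would require a separate indexing engine, which matches the off-the-shelf indexer mentioned above.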

Either the DRMS or the CMS should be able to log access to your site. This should include not only file access but also queries and any special features, such as bookmarking, that have been built on top of your system.
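The point is that every kind of interaction gets recorded, not just file hits. A minimal in-memory sketch is shown below; the `AccessLog` class and the event kinds (`"file"`, `"query"`, `"bookmark"`) are hypothetical, and a real site would write these events to durable storage rather than a Python list.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AccessEvent:
    user: str
    kind: str    # hypothetical categories, e.g. "file", "query", "bookmark"
    detail: str  # file path, query text, bookmarked document id, etc.
    when: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class AccessLog:
    """Minimal in-memory access log; a real system would persist events."""

    def __init__(self):
        self.events = []

    def record(self, user, kind, detail):
        self.events.append(AccessEvent(user, kind, detail))

    def count_by_kind(self):
        """Return {kind: number of events of that kind}."""
        counts = {}
        for event in self.events:
            counts[event.kind] = counts.get(event.kind, 0) + 1
        return counts


if __name__ == "__main__":
    log = AccessLog()
    log.record("alice", "file", "/docs/a1")
    log.record("alice", "query", "xml dtd")
    log.record("bob", "bookmark", "a1")
    print(log.count_by_kind())
```

Logging queries alongside file access shows what users were looking for, not just what they eventually opened.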

Web Delivery

Web delivery seems to get the most attention, but to be honest, I believe it is the least complex part. When you build out your entire system you need to understand how you would like to deliver the information, but in the end this is just the cosmetic portion of the site: it's what looks pretty, and it's what will ultimately sell the site. For this you will need a graphic artist (note: your developer is NOT a graphic artist; if you want an ugly site, ask your developer to lay out the look and feel of the site). When you develop the interface, you will also need a knowledgeable systems architect on hand, someone who understands the entirety of your system. Given a sound system design and a finished graphic design, a new site should be able to be stood up in a couple of weeks once the data is ready.
