Metadata

When searching for an item, one always has to describe the characteristics that distinguish the desired item from others. These characteristics are then cross-checked against those of the available candidates to select the best match. Whether a person is asking for directions to a house or trying to find a digital document on the internet, this negotiation of characteristics is the basic element of searching.

When dealing specifically with data, these characteristics can be seen as data describing the data. The term metadata is generally used for this special type of information; the prefix meta comes from the Greek for 'after' or 'beyond' and is used here in the sense of 'about'. In everyday situations this information is completely distinct from the object it describes. Digital information, however, all boils down to numbers, whether it is content or description. This means that metadata can be seen both as a description of an object and as an object itself.

Traditionally, metadata has been kept separate from the objects it describes. The most widely used form of metadata is the filesystem structure used to index digital data on a hard drive. This metadata has the same fixed format for every file and is therefore extremely brief. Such an implementation may be sufficient for a local filesystem where a small group of people handles the data, but it is an impractical indexing method for large heterogeneous environments. The scale on which information is made available on the internet demands a degree of automation in resource identification beyond what basic filesystems offer.
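To make the brevity of filesystem metadata concrete, the following minimal Python sketch reads the attributes a typical filesystem stores for a file. The helper name file_metadata and the selection of fields are illustrative assumptions, not part of the original text; the exact attributes available depend on the operating system.

```python
import os
import tempfile


def file_metadata(path):
    """Return a few of the fixed attributes the filesystem keeps for `path`.

    Hypothetical helper for illustration; every file yields the same fields.
    """
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "size_bytes": info.st_size,
        "modified": info.st_mtime,  # seconds since the epoch
        "mode": oct(info.st_mode),  # file type and permission bits
    }


# Demonstrate on a throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as handle:
    handle.write(b"hello metadata")
    sample_path = handle.name

sample = file_metadata(sample_path)
os.remove(sample_path)
print(sample)
```

Every file yields the same handful of fields, and none of them says anything about what the file contains. That is the limitation that matters in large heterogeneous environments.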

Currently a few projects are underway to create a more flexible framework for describing data. Since standardization is essential for any system to become widespread, the initiatives backed by international organizations have the highest chance of success. The Resource Description Framework (RDF) (2), backed by the influential World Wide Web Consortium, is a semantic layer placed on top of XML syntax, which by itself carries no meaning. In RDF, meaning is given to data by coupling XML-formatted content to specialized dictionaries. It is already being used to identify and rebroadcast news messages on the internet and by the Open Directory Project (3) web directory. RDF has been designed specifically with the automation of internet resource discovery in mind. The developers expect RDF to become the framework underlying the so-called Semantic Web (4), a global information network where computers can autonomously index information.
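As an illustration of coupling XML content to a dictionary, the sketch below describes a document using properties from the Dublin Core element set, one widely used vocabulary, and reads the description back as (subject, property, value) statements. The document URL, the property values and the function name extract_statements are assumptions made for this example. The reader handles only this flat form of RDF/XML; a real application would use a full RDF parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Illustrative description; the example.org URL and the values are made up.
RDF_XML = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/docs/metadata.html">
    <dc:title>Metadata</dc:title>
    <dc:date>2002-08-28</dc:date>
  </rdf:Description>
</rdf:RDF>
"""


def extract_statements(rdf_xml):
    """Return (subject, property URI, value) tuples from flat RDF/XML."""
    root = ET.fromstring(rdf_xml)
    statements = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree reports tags as "{namespace}local"; joining the
            # two parts gives the URI that identifies the property.
            predicate = prop.tag[1:].replace("}", "", 1)
            statements.append((subject, predicate, prop.text))
    return statements


statements = extract_statements(RDF_XML)
for statement in statements:
    print(statement)
```

Because each property is identified by a URI from a shared vocabulary, software that has never seen this particular document can still recognize which value is the title and which is the date.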

Despite the efforts of various interest groups, most metadata is currently written in proprietary formats, which rules out interoperability between software systems. Future developers should use standardized frameworks such as RDF wherever possible if the internet is to remain a global information domain.

2002-08-28