Indexing Metadata

One of the issues that became obvious during testing was the lack of metadata persistence. For results to improve, we have to be able to learn from our mistakes. The test program uses a web client interface for the reasons already given. A drawback of this choice is that metadata has to be recreated for every new instance of the network. Other implementations do not have this problem, since they use a proprietary interface. Metadata persistence could easily be obtained by embedding the metadata in the file data itself, but this would corrupt many file types. Regardless of the advantages a web-client-based system may have over platform-specific interfaces, I'm afraid the lack of metadata persistence, and therefore the lack of metadata improvement, is such a serious drawback that a specialized interface is necessary after all, at least until an alternative is found.

Unless many specialized plug-ins are created, metadata has to be gathered from filenames. To obtain useful results without a plug-in for every file type, the keyword selection algorithm must be optimized. I expect the most promising approach to be based on some form of learning algorithm. How this should be implemented is not easily answered, and I believe the subject deserves closer examination.
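As an illustration of what gathering metadata from filenames can look like, here is a minimal sketch in Python. It is not the keyword selection algorithm of the test program: the function name `filename_keywords`, the stopword list and the tokenization rules are all assumptions made for this example, and no learning step is included.

```python
import re
from pathlib import PurePath

# Hypothetical stopword list: words that rarely describe a file's content.
STOPWORDS = {"the", "a", "an", "of", "and", "final", "copy", "new"}


def filename_keywords(path):
    """Return candidate keywords taken from the filename of `path`."""
    # Drop the directory part and the extension.
    stem = PurePath(path).stem
    # Insert a space at lower-to-upper case boundaries ("projectReport").
    stem = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", stem)
    keywords = []
    # Split on anything that is not a letter or a digit.
    for token in re.split(r"[^A-Za-z0-9]+", stem):
        token = token.lower()
        # Skip empty or one-character tokens, pure numbers and stopwords.
        if len(token) < 2 or token.isdigit() or token in STOPWORDS:
            continue
        if token not in keywords:
            keywords.append(token)
    return keywords


print(filename_keywords("music/Miles Davis - So What (1959).mp3"))
print(filename_keywords("projectReport_final_v2.doc"))
```

A learning algorithm could start from such a baseline, for example by adjusting the stopword list according to which keywords actually lead to useful search results.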

No indexing method can be proven successful, since the input is always human-created data and therefore inherently unreliable. Finding heuristics based on human-computer interaction is, I think, the only way of solving this problem. I will give two such heuristics as examples. First, exploit the fact that people order their data in a manner that is unknown to us yet most probably rational: look at the directory structure and try to find features shared by the elements of certain subsets. The fewer elements a directory contains, the more closely they will usually fit together in one way or another. Second, many people create, perhaps somewhat unconsciously, a personal encoding scheme for a subset of their data. For instance, all work-related files are numbered with their project code, all music files start with the name of the artist, or all files in the favorites directory reference web resources. These rules of thumb are by no means revolutionary, nor are they the only ones to be found. They are simply examples of everyday practice, learned from personal experience, that can be used to broaden the way metadata input is dealt with.
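The two heuristics can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the 60% threshold `min_share` and the simple tokenization are hypothetical choices, not part of the test program.

```python
import re
from collections import Counter
from pathlib import PurePath


def tokens(name):
    """Lower-case alphanumeric tokens of a filename, extension removed."""
    stem = PurePath(name).stem
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", stem) if t]


def shared_directory_keywords(filenames, min_share=0.6):
    """Heuristic 1: tokens shared by most files in one directory."""
    if not filenames:
        return []
    counts = Counter()
    for name in filenames:
        # Count each token once per file.
        counts.update(set(tokens(name)))
    threshold = min_share * len(filenames)
    return sorted(t for t, c in counts.items() if c >= threshold)


def leading_token_scheme(filenames, min_share=0.6):
    """Heuristic 2: a leading token (artist, project code) used by most files."""
    firsts = Counter(tokens(n)[0] for n in filenames if tokens(n))
    if not firsts:
        return None
    token, count = firsts.most_common(1)[0]
    return token if count >= min_share * len(filenames) else None


music = ["Coltrane - Giant Steps.mp3", "Coltrane - Naima.mp3",
         "Coltrane - Blue Train.mp3"]
print(shared_directory_keywords(music))
print(leading_token_scheme(music))
```

Keywords found this way could be attached to every file in the directory. Since small directories tend to be more coherent, one might also lower `min_share` as the number of files decreases, but that refinement is left out here.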


2002-08-28