Next: Networking Module Up: Indexing Module Previous: framework   Contents

Plug-in architecture

The plug-in architecture is deliberately simple. Every plug-in is a library that can be loaded and released at runtime. After loading a library, the framework executes one of two possible functions. The first returns an XML data structure, while the second returns XML-formatted data as a text string. Providing two separate functions lets the plug-in developer choose whether or not to work with the Microsoft XML Parser.
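The dispatch between the two entry points might be sketched as follows. This is only an illustration in Python, not the actual implementation: the plug-ins are modelled as in-process objects rather than runtime-loaded libraries, and the names `get_xml_tree`, `get_xml_text`, `FilesystemPlugin`, `TextPlugin` and `run_plugin` are all hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


class FilesystemPlugin:
    """Hypothetical plug-in that exposes the structured entry point."""

    def get_xml_tree(self, path):
        # Returns a parsed XML data structure directly.
        return ET.Element("entry", {"path": path})


class TextPlugin:
    """Hypothetical plug-in that exposes only the text entry point."""

    def get_xml_text(self, path):
        # Returns XML as a plain string; no XML library is needed here.
        return "<entry path=%s/>" % quoteattr(path)


def run_plugin(plugin, path):
    """Call whichever of the two entry points the plug-in provides."""
    if hasattr(plugin, "get_xml_tree"):
        return plugin.get_xml_tree(path)
    # Fall back to the text function and parse on the framework side.
    return ET.fromstring(plugin.get_xml_text(path))
```

Either way the framework ends up with the same kind of element, so a plug-in author who prefers to emit plain text never has to touch an XML parser.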

Since this is a test program, only a few plug-ins have been created. The simplest one reads data from the file system and creates an entry in a subtree that models the file system. The second plug-in uses file content to identify the file type and creates an entry in a MIME-based hierarchy. This module is currently a simple wrapper around a function exported by Microsoft Internet Explorer™ and identifies some 30 different types. A third plug-in was written specifically to read ID3, the proprietary metadata format often used in .mp3 audio files. This plug-in does not add items to the hierarchy; it only alters and extends existing entries.
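The second and third plug-ins could be approximated as in the sketch below. It is an assumption-laden illustration: instead of the Internet Explorer function mentioned above, it swaps in a small hand-written signature table covering only four types, and it reads only the ID3v1 layout (a 128-byte block at the end of the file that starts with `TAG`). The function names and the dictionary-based entry are hypothetical.

```python
# Assumed signature table; the real plug-in delegates to Internet Explorer.
MAGIC = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF8", "image/gif"),
    (b"%PDF-", "application/pdf"),
]


def sniff_mime(data):
    """Guess a MIME type from the leading bytes of the file content."""
    for magic, mime in MAGIC:
        if data.startswith(magic):
            return mime
    return "application/octet-stream"


def mime_path(mime):
    """Turn 'image/png' into the hierarchy path ['image', 'png']."""
    return mime.split("/")


def read_id3v1(data):
    """Return the ID3v1 fields found in the last 128 bytes, if any."""
    if len(data) < 128 or data[-128:-125] != b"TAG":
        return {}
    tag = data[-128:]

    def text(raw):
        # Fields are fixed width, padded with NUL bytes or spaces.
        return raw.split(b"\x00")[0].decode("latin-1").strip()

    return {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
    }


def extend_entry(entry, data):
    """Extend an existing entry in place; no new hierarchy item is made."""
    entry.update(read_id3v1(data))
    return entry
```

Note that `extend_entry` mirrors the behaviour described above: it only adds fields to an entry that an earlier plug-in already created.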

The last plug-in best demonstrates the strength of the system. It creates keywords from the information already added by the other plug-ins and uses these keywords to query an Internet directory. The directory currently queried is the Google directory (directory.google.com), which is equivalent to the Open Directory Project. From the answer, a local version of the directory is created that contains the processed files. This is where automatic categorizing becomes difficult: the results returned by the website depend on the keywords we selected, and therefore on the overall results of the earlier plug-ins. The test results, including this categorization, are discussed in the next section. Whatever the quality of the results obtained with our system, it can probably be improved by also incorporating domain-specific directories for areas where these exist.
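The two local steps of this plug-in, building keywords and filing a result into a local copy of the directory, might look roughly like the sketch below. The query to the directory itself is left out, and everything here is an assumption made for illustration: the stop-word list, the field names, the nested-dictionary tree, and the example category path are not taken from the actual system.

```python
import re

# Assumed stop words; the real keyword selection is not specified here.
STOPWORDS = {"the", "a", "an", "and", "of", "mp3"}


def make_keywords(entry, fields=("title", "artist", "album")):
    """Collect distinct lower-case words from the chosen entry fields."""
    words = []
    for field in fields:
        for word in re.findall(r"[a-z0-9]+", entry.get(field, "").lower()):
            if word not in STOPWORDS and word not in words:
                words.append(word)
    return words


def add_to_local_directory(tree, category, filename):
    """File `filename` under a category path such as 'Arts/Music/Bands'."""
    node = tree
    for part in category.strip("/").split("/"):
        # Create intermediate categories on demand.
        node = node.setdefault(part, {})
    node.setdefault("_files", []).append(filename)
    return tree
```

A hypothetical use would pass the keywords to the directory query and then call `add_to_local_directory(tree, category, "x.mp3")` with whatever category path comes back, so the quality of the local tree depends directly on the keywords chosen.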


2002-08-28