Starting with the most basic functionality, Napster and many of the programs it influenced allowed searching only on filesystem information. FastTrack(12) is a more advanced descendant of this line of programs: it filters information from the filesystem into a private metadata structure and allows users to add information manually. Its main strength is that metadata appears to be copied together with the main data, creating a persistent label and thereby unifying resources as much as possible. How this is implemented is unfortunately unclear, since the creators have made no information available.
A completely different approach is taken by the Audiogalaxy application. While it inherits the client/server query structure from Napster, it completely hides users from one another. The program installed at the network edges is a very simple file server and client in one, but all communication and searching is done through a central website. This use of a persistent web interface not only shields individual users from each other, it also provides a persistent query infrastructure separate from the client program. A central server remembers all data available on the network over a specified period of time and allows any of these files to be placed in a personal download queue. Only when both a supplier and a consumer of a specified resource become available is a transfer initiated. More importantly, no human interaction is necessary after a request is placed in the queue.
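Since the Audiogalaxy protocol itself was never published, the queueing behaviour described above can only be sketched from the outside. The following is a minimal, purely illustrative model of such a persistent download queue; all class and method names are assumptions, not part of the actual system.

```python
class QueueServer:
    """Illustrative central server holding persistent download requests,
    modelled loosely on the Audiogalaxy behaviour described in the text."""

    def __init__(self):
        self.queue = []   # (consumer, filename) pairs awaiting a match
        self.online = {}  # user -> set of files that user currently shares

    def request(self, consumer, filename):
        """Place a file in a user's download queue; after this point no
        further human interaction is needed."""
        self.queue.append((consumer, filename))
        return self.match()

    def connect(self, user, shared_files):
        """A client comes online and announces the files it shares."""
        self.online[user] = set(shared_files)
        return self.match()

    def disconnect(self, user):
        self.online.pop(user, None)

    def match(self):
        """Initiate a transfer for every queued request whose supplier and
        consumer are both online at this moment."""
        started = []
        for consumer, filename in list(self.queue):
            if consumer not in self.online:
                continue
            supplier = next((u for u, files in self.online.items()
                             if filename in files and u != consumer), None)
            if supplier is not None:
                self.queue.remove((consumer, filename))
                started.append((supplier, consumer, filename))
        return started
```

The key property the sketch captures is that a request survives both parties going offline: the transfer simply starts at the first moment a supplier and the consumer coincide.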
The Audiogalaxy network also features a very detailed structuring of the supplied data. Since only MP3 sound files are made accessible through the network, this task is relatively easy. The high level of knowledge of artists, music styles and the connections between them suggests that a great deal of manual labour has gone into creating and maintaining this information structure. Since no information is published about the client application or the central interface, there is no way of knowing to what extent automation in metadata gathering helped to create this system. The client/server approach makes this particular network very easy to use, but also extremely prone to legal attacks, especially since most of the information provided has a questionable legal status.
As mentioned earlier, most p2p applications are based on the Napster approach. Many alterations to the original program have been made, but most advances have gone into the network structure itself rather than into the information retrieval functionality. Given the many problems surrounding the network layer, this cannot come as a surprise. After trying various popular clients, the features mentioned above are the only ones that really improve searchability. The Audiogalaxy interface in particular works like a charm. Looking beyond p2p applications, though, we might also find other possible enhancements.
The manual addition of metadata, as it exists in the FastTrack software, is not a feasible solution on its own. One technique, used primarily in science but also seen in similar form on public websites, can improve the quality of metadata regardless of whether it is gathered manually or automatically: peer moderation. An example I want to mention explicitly is a website harvesting technology-related information. This website, Slashdot.org, uses a peer review system that is especially well adapted to categorizing and reviewing large quantities of relatively small messages. I personally know of no other system that has implemented this feature as effectively. One key issue when incorporating peer review, however, is that a heavily involved userbase is necessary.
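The essence of such a moderation scheme can be sketched in a few lines. The details below are assumptions chosen for illustration (a score clamped to the range -1..5, one vote per moderator per message, reader-side filtering by threshold); they resemble, but do not reproduce, Slashdot's actual rules.

```python
class Message:
    """A small message whose quality is rated by peer moderators.
    Scoring rules here are illustrative assumptions, not Slashdot's."""

    def __init__(self, text, score=1):
        self.text = text
        self.score = score
        self.voters = set()  # moderators who have already voted

    def moderate(self, moderator, delta):
        """Apply a +1 or -1 moderation; one vote per moderator,
        score clamped to the range [-1, 5]."""
        if moderator in self.voters or delta not in (-1, 1):
            return self.score
        self.voters.add(moderator)
        self.score = max(-1, min(5, self.score + delta))
        return self.score


def visible(messages, threshold=0):
    """Readers filter by score threshold, hiding poorly rated messages."""
    return [m for m in messages if m.score >= threshold]
```

The point of the sketch is that moderation produces metadata (a quality score) without any central editor: the "heavily involved userbase" does the work, and readers choose their own threshold.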
Combining the techniques mentioned above would be a relatively simple way of increasing search effectiveness. However, a few fundamental problems concern all of the programs mentioned. First of all, each program uses a proprietary format for specifying data characteristics. More importantly, no general algorithm is given for finding metadata: automatic metadata addition basically boils down to reading the filename. Apart from moderation, a case of human intervention, no integrity checking can be applied. As far as automated categorization is concerned, the few algorithms in use today deal only with very specific subsets of information, such as music files in a special format or webpages. These algorithms have generally been developed behind closed doors, eliminating the possibility of interoperating between programs and of learning from previous mistakes.
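To make the weakness concrete: "reading the filename" amounts to little more than the following sketch. The `Artist - Title.ext` naming convention is an assumption, and any file that deviates from it yields no usable metadata, which is exactly the fragility the text criticises.

```python
import os

def metadata_from_filename(path):
    """Naive metadata extraction from a filename alone, assuming the
    common 'Artist - Title.ext' convention. Anything else falls back
    to guesswork, illustrating why filename-only schemes are fragile."""
    name, ext = os.path.splitext(os.path.basename(path))
    meta = {"format": ext.lstrip(".").lower() or None}
    if " - " in name:
        artist, title = name.split(" - ", 1)
        meta.update(artist=artist.strip(), title=title.strip())
    else:
        # Convention broken: no reliable artist field can be recovered.
        meta.update(artist=None, title=name)
    return meta
```

Note that nothing here inspects the file's contents, so the extracted "metadata" cannot be integrity-checked against the data it claims to describe.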
I will now suggest a way to automatically add information to a data structure in a flexible and extensible fashion. How far we can go with automation, and whether it can be practically implemented, is discussed later on.