P2P Networking

Next: Practical Solutions Up: Background Previous: Metadata Contents

P2P Networking

There are probably as many definitions of what constitutes a p2p network as there are networks themselves. It would be unwise to follow a very narrow definition, since the underlying network isn't our primary concern. Therefore I've chosen the following definition as a basis for our understanding of the term.

P2P is a class of applications that takes advantage of resources - storage, cycles, content, human presence - available at the edges of the Internet. Because accessing these decentralized resources means operating in an environment of unstable connectivity and unpredictable IP addresses, P2P nodes must operate outside the DNS system and have significant or total autonomy from central servers.

(5)

From this definition it becomes clear that using resources at the edges of the Internet gives rise to new problems. Present day Internet protocols were not designed with such a high level of decentralization in mind. Different schemes have been tried to avoid the problems inherent to utilizing edge resources. Since our primary concern is with the optimization of information searches I will restrict my attention to networks that are concerned with content. The most important problem in this field is scaling the bandwidth needs when the network scales. Truly decentralized networks retransmit requests from peer to peer. Since every incoming message is generally rebroadcast to at least two other nodes this leads to an exponential growth in bandwidth demand for a linear growth in connected nodes. After the shutdown of the largely client/server based Napster network many users switched to the fully decentralized gNutella(6) network, resulting in a total breakdown of the network within weeks (7). Since then the network has been altered significantly and continues to grow (8).

Unfortunately little theoretical research has up until now been done in this field. Since network structures can easily be modeled using the formal systems found in fundamental computer science and mathematics one might expect implementations based on such research to be much more efficient than the applications we use today. As a primer, a mathematical analysis of the gNutella network has been undertaken (9). From the results obtained follows that such simple implementations cannot be scaled to potentially millions of nodes. Empirical data obtained by monitoring the gNutella network does give openings for more efficient implementations. One example of such research identifies substructures in the global net (10). Knowledge of a network's structure might lead to search algorithms that exploit the specific substructures to increase overall system throughput and decrease bandwidth usage.

Optimizing search results has a direct connection to the given problem. In an ideal situation all queries are handled in real-time, returning results based on all known resources within a very short timespan. In a fully decentralized network the exponential growth of messages sent between peers can not only seriously degrade overall network performance, it usually leads to a relatively long waiting time before a response can even be generated. Furthermore, it is not certain when a response is obtained whether this response is based on all the resources in the network. This uncertainty follows from measures imposed on the protocol to prevent loops in the message system and the general instability of the network configuration.

Increasing the quality of search results by making extensive references to metadata, as I have implemented, will always induce extra computational overhead. Although future networks might be more efficient in terms of network bandwidth utilization and processing speed, we cannot rely on this for now. How a possibly complex metadata processing layer on top of the network is to be implemented is therefore a second question we will have to ask ourselves. One solution can be caching results as is discussed by (11), but this can seriously reduce the effectiveness of searches when implemented incorrectly.

Next: Practical Solutions Up: Background Previous: Metadata Contents

2002-08-28