To check that our program matches what was intended, I will first repeat the research goal as it was written down in the design document:
Eliminating the differences in database and filesystem access may be the key in allowing more useful sharing of information using digital networks such as the internet as well as in making a local computer system more flexible and maintainable.
In this project, I plan to research the usability of the distributed part of this idea, using SQL based databases as the underlying datasources. The reason for using rDBMSs is the well thought out interface, based on a question and answer scheme. Instead of using binary large object fields containing data inside the database, only links to files are used, thus disabling part of the flexibility of the system. The local computer system is not altered, saving considerable time.
The formulation differs somewhat from that used in the introduction of this document, but I believe the central issue has remained the same throughout the research process.
I will refrain from copying the complete design document here. The full text can be found in the first appendix. To demonstrate how I planned to obtain my test results it is important, however, to identify the original design goals.
The test program was designed to consist of four separate modules: the datasource module, a driver accessing the underlying database management system; the networking module, containing all code necessary to communicate with peers; the spidering module, used to index information and add it to the database; and the user interface module, handling user requests by calling the other modules' methods.
The module that influences test outcomes the most is the indexing module (called the spidering module above), since it gathers all metadata. From the first stage of design, I believed a plug-in architecture was necessary here, because of the diversity of possible input data. Separating the general-purpose application from the specialized indexing code allows third-party modules to extend the product's functionality and allows competition among developers in real-world deployment. Many complex applications allow extensions to their system to accommodate specific user needs. Examples are Adobe Photoshop, Autodesk AutoCAD and Microsoft Visual Studio.
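To make the plug-in idea concrete, the following is a minimal sketch in modern C++, not the code of the test program itself. The names `IndexerPlugin`, `TextIndexer`, `IndexerRegistry` and the `Metadata` type are hypothetical and chosen only for illustration. Loading plug-ins from separate binaries is left out; here a plug-in is simply registered in-process.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical metadata record: key/value pairs describing one file.
using Metadata = std::map<std::string, std::string>;

// Hypothetical plug-in interface. Each plug-in decides which files it
// understands and produces metadata for them.
class IndexerPlugin {
public:
    virtual ~IndexerPlugin() = default;
    virtual bool canHandle(const std::string& path) const = 0;
    virtual Metadata extract(const std::string& path) const = 0;
};

// Example plug-in (an assumption for illustration): accepts ".txt" files.
class TextIndexer : public IndexerPlugin {
public:
    bool canHandle(const std::string& path) const override {
        const std::string ext = ".txt";
        return path.size() >= ext.size() &&
               path.compare(path.size() - ext.size(), ext.size(), ext) == 0;
    }
    Metadata extract(const std::string& path) const override {
        return {{"path", path}, {"type", "text"}};
    }
};

// The general-purpose application only knows the interface, never the
// concrete plug-in classes.
class IndexerRegistry {
public:
    void add(std::unique_ptr<IndexerPlugin> plugin) {
        assert(plugin != nullptr);
        plugins_.push_back(std::move(plugin));
    }

    // Returns metadata from the first plug-in that accepts the file,
    // or an empty record when no plug-in does.
    Metadata index(const std::string& path) const {
        for (const auto& plugin : plugins_) {
            if (plugin->canHandle(path)) return plugin->extract(path);
        }
        return {};
    }

private:
    std::vector<std::unique_ptr<IndexerPlugin>> plugins_;
};
```

With this layout, support for a new file type means adding one class that implements `IndexerPlugin`; the registry and the rest of the application stay unchanged.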
Data fed into the system has to be processed by the indexing module to create appropriate metadata. This metadata is then added to the database through calls to the datasource module, which acts as an intermediary. The intermediate layer exists so that the back-end, the database, can be changed without having to alter the main code.
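The role of the datasource module as an intermediary might be sketched as follows. This is an illustration under assumptions, not the actual driver: the interface `DataSource`, its methods `insert` and `findByField`, the in-memory stand-in `InMemoryDataSource` and the helper `storeAll` are all hypothetical names. A real driver would translate the same calls into statements for a specific database management system.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical metadata record: field name -> value.
using Record = std::map<std::string, std::string>;

// Hypothetical back-end interface. The main code only ever sees this type.
class DataSource {
public:
    virtual ~DataSource() = default;
    virtual void insert(const Record& record) = 0;
    virtual std::vector<Record> findByField(const std::string& field,
                                            const std::string& value) const = 0;
};

// Stand-in back-end that keeps records in memory. A real driver would
// forward these calls to a specific DBMS instead.
class InMemoryDataSource : public DataSource {
public:
    void insert(const Record& record) override { records_.push_back(record); }

    std::vector<Record> findByField(const std::string& field,
                                    const std::string& value) const override {
        std::vector<Record> hits;
        for (const Record& r : records_) {
            auto it = r.find(field);
            if (it != r.end() && it->second == value) hits.push_back(r);
        }
        return hits;
    }

private:
    std::vector<Record> records_;
};

// Example of "main code": it stores indexer output without knowing which
// back-end sits behind the interface. Returns the number of records stored.
std::size_t storeAll(DataSource& db, const std::vector<Record>& records) {
    for (const Record& r : records) db.insert(r);
    return records.size();
}
```

Because `storeAll` takes a `DataSource&`, swapping the in-memory stand-in for a driver of another database would not require touching the calling code, which is the property the design asks for.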
Queries are generated by the user interface module. Keeping in mind the connection with the underlying database, I suggested that SQL should be used as the query language. For testing purposes, both a command-line and a graphical version of the user interface had to be constructed. This way testing could get underway before a user-friendly interface was ready.
Queries are then processed and rebroadcast by the networking module, which interprets incoming packets and calls the appropriate datasource methods. The network itself would consist of identical peers with equal rights, leading to a Gnutella-like, fully distributed network, with all the problems related to such naively implemented networks. Please remember, though, that network issues are not our primary concern right now.
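The process-and-rebroadcast behaviour of such a peer network can be simulated in a few lines. The sketch below is an in-memory model only: the types `Query` and `Peer` are hypothetical, no sockets are involved, and the hop limit (`ttl`) and the per-peer list of already seen query identifiers are assumptions borrowed from the way Gnutella limits flooding, not details taken from the design document. Running the query against the local datasource is represented by recording the peer's name.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical query packet.
struct Query {
    unsigned id;      // unique identifier, used to recognise duplicates
    int ttl;          // remaining number of hops the query may travel
    std::string sql;  // query text as produced by the user interface
};

class Peer {
public:
    explicit Peer(std::string name) : name_(std::move(name)) {}

    // All peers are equal, so a connection is always symmetric.
    void connect(Peer& other) {
        neighbours_.push_back(&other);
        other.neighbours_.push_back(this);
    }

    // Process the query locally, then rebroadcast it to all neighbours.
    // 'visited' collects the names of the peers that handled the query.
    void receive(const Query& q, std::vector<std::string>& visited) {
        if (!seen_.insert(q.id).second) return;  // duplicate: drop it
        visited.push_back(name_);  // stand-in for the local datasource call
        if (q.ttl <= 0) return;    // hop limit reached: do not forward
        Query next = q;
        next.ttl = q.ttl - 1;
        for (Peer* n : neighbours_) n->receive(next, visited);
    }

private:
    std::string name_;
    std::vector<Peer*> neighbours_;
    std::set<unsigned> seen_;
};
```

Even this small model shows the cost of naive flooding: every peer forwards every new query to all of its neighbours, so without the duplicate check a query would circulate in any cyclic network until its hop limit ran out.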
All in all, the complete division of the application into modules should allow last-minute alterations and extensions to the system. This would enable us to react quickly to test results. How all this should be implemented was deliberately left out of the design document. Only the use of a relational database management system as back-end, TCP/IP as the underlying network environment and Microsoft Visual C++ as the development environment were specified.