Monday, July 10, 2006

The Internet as a Platform for Storing and “Merging” Data

We interrupt this ramble about a day in the life of Dr. Susan Snow, epidemiologist of Newtopia, to muse about some of the differences between current computer processing and a totally Internet-based document and data storage system. First, some assumptions:

1. For use in epidemiology (and many other fields), users would have to be authenticated more surely and more conveniently than with the present password gesture, and authentication would have to continue during computer use. Preferably a picture of an impostor would be stored for purposes of prosecution.
2. An Internet provider such as Google would offer to organizations and individuals secure storage of data and documents and interfaces for manipulating the contents—an extension of the already existing Gmail and Google Spreadsheet concepts.
3. Data, once stored, would never be discarded, although revisions would also be stored and applied as needed through an indexing process.
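
To make the third assumption a little more concrete, here is a minimal sketch in Python of a store that never discards anything: every revision is appended to a log, and an index decides which revision a reader sees. The class name `VersionedStore`, its methods, and the key format are all invented for illustration; nothing here describes how any real provider stores data.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class VersionedStore:
    """Hypothetical append-only store: revisions are added, never overwritten."""
    # Every (key, value) pair ever written, in arrival order.
    _log: List[Tuple[str, Any]] = field(default_factory=list)
    # The index: key -> positions in the log, oldest first.
    _index: Dict[str, List[int]] = field(default_factory=dict)

    def put(self, key: str, value: Any) -> int:
        """Append a new revision and return its revision number for this key."""
        self._log.append((key, value))
        self._index.setdefault(key, []).append(len(self._log) - 1)
        return len(self._index[key]) - 1

    def get(self, key: str, revision: Optional[int] = None) -> Any:
        """Return the latest revision by default, or an earlier one on request."""
        positions = self._index[key]
        position = positions[-1] if revision is None else positions[revision]
        return self._log[position][1]


store = VersionedStore()
store.put("county-a/week-24/cases", 17)
store.put("county-a/week-24/cases", 19)  # a correction; the original 17 is kept
latest = store.get("county-a/week-24/cases")
original = store.get("county-a/week-24/cases", revision=0)
```

Reading without a revision number returns the corrected value, while the first value remains available to anyone who asks for revision 0.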

If these bits of magic are granted, then Dr. Snow will be able to work completely from a browser, whether in a computer or a more mobile, cell-phone-like device, and to access her work from anywhere in the world that is sufficiently well connected to the Internet. Her data (meaning data and documents from now on) reside in something like the Google server farm, the hundred-thousand-plus microcomputers linked in parallel that underlie the Google web pages. A user has no way of knowing where his or her data are stored, but, at the right time, the Google database engine can find any particular item and deliver it for use, with due attention to its ownership, provision for backup, virus exclusion, etc.

Suppose, then, that the counties in Newtopia wish to share their data with the Newtopia Department of Health (NDH). In the old days (2006), this required either a central server at NDH, with connection requirements and considerable inflexibility for the individual counties, or periodic shipments of data from each county to NDH for merging with the central database. There was very little tolerance for variation in data format among the counties, and the weekly merging process often required manual intervention and telephone conversations among the participants.

With each dataset stored on the Internet server farm, however, merging can be merely a process of setting up pointers (with permission) so that the system knows the NDH database for week 24 to be the contributions of the counties in Newtopia for week 24 viewed as a whole, with possible overlays of correction from either NDH or a county, the latest of which prevails. Instead of shipping copies of files to NDH, each county provides only a weekly pointer to its database. Alternatively, the system can be set up so that NDH has permission to create a pointer to the county database; such details are left to future negotiation.
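
The pointer idea can be sketched in a few lines of Python. Everything named here is an assumption made for illustration: the `DATASETS` and `PERMISSIONS` tables stand in for whatever the storage provider would really keep, and `MergedView` and `Correction` are invented names. The point is only that the NDH "database" holds pointers and corrections, not county data.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

# Hypothetical county datasets held by the provider: dataset id -> {case id -> record}.
DATASETS: Dict[str, Dict[str, dict]] = {
    "adams/week-24": {"a1": {"disease": "measles", "age": 4}},
    "baker/week-24": {"b1": {"disease": "mumps", "age": 9}},
}
# Hypothetical permissions: dataset id -> parties allowed to point at it.
PERMISSIONS: Dict[str, Set[str]] = {
    "adams/week-24": {"NDH"},
    "baker/week-24": {"NDH"},
}


@dataclass
class Correction:
    timestamp: int   # later timestamps prevail
    dataset_id: str
    case_id: str
    changes: dict


class MergedView:
    """A 'database' that is only pointers plus an overlay of corrections."""

    def __init__(self, viewer: str) -> None:
        self.viewer = viewer
        self.pointers: List[str] = []
        self.corrections: List[Correction] = []

    def add_pointer(self, dataset_id: str) -> None:
        if self.viewer not in PERMISSIONS.get(dataset_id, set()):
            raise PermissionError(f"{self.viewer} may not read {dataset_id}")
        self.pointers.append(dataset_id)

    def add_correction(self, correction: Correction) -> None:
        self.corrections.append(correction)

    def records(self) -> Dict[Tuple[str, str], dict]:
        """Assemble the combined view at read time; county data is not modified."""
        merged: Dict[Tuple[str, str], dict] = {}
        for dataset_id in self.pointers:
            for case_id, record in DATASETS[dataset_id].items():
                # Copy only so that overlays do not alter the county's own record.
                merged[(dataset_id, case_id)] = dict(record)
        # Apply corrections oldest first, so the latest one prevails.
        for c in sorted(self.corrections, key=lambda c: c.timestamp):
            key = (c.dataset_id, c.case_id)
            if key in merged:
                merged[key].update(c.changes)
        return merged


ndh = MergedView("NDH")
ndh.add_pointer("adams/week-24")
ndh.add_pointer("baker/week-24")
ndh.add_correction(Correction(2, "adams/week-24", "a1", {"age": 5}))  # later, from NDH
ndh.add_correction(Correction(1, "adams/week-24", "a1", {"age": 6}))  # earlier, from the county
week_24 = ndh.records()
```

In this sketch the later correction wins regardless of the order in which corrections arrive, and the county's stored record still shows the age it originally reported.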

If you think this is wild and woolly, let’s follow the process to its ultimate conclusion, in which each individual in society has his or her own database on the Internet, and the database contains the person’s health records. For reasons ranging from voluntary participation to legal requirements, the individual grants NDH access to a portion of the health record. The records are combined by the Internet database engine into what looks like a single database, which is actually only an index to the relevant bits of data scattered throughout the personal databases of the citizens of Newtopia. Obviously, a great deal of social evolution and whole new legal mechanisms for sharing data will be needed before such a scheme could even be considered. The barriers, however, are more social, educational, legal, and political than they are technical.
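
As a rough illustration of "a single database that is actually only an index," the Python sketch below keeps each person's record in that person's own store and gives NDH nothing but a list of (person, field) pairs it has been granted. The data, the field names, and the functions `build_index` and `query` are all made up for the example; a real system would need far more than this, especially on the legal side.

```python
from typing import Dict, Iterator, List, Set, Tuple

# Hypothetical personal databases: person id -> that person's own health record.
PERSONAL_DBS: Dict[str, Dict[str, object]] = {
    "p1": {"immunizations": ["MMR"], "blood_type": "O+", "notes": "private"},
    "p2": {"immunizations": ["MMR", "DTaP"], "blood_type": "A-"},
}
# Hypothetical grants: person id -> agency -> fields that person has shared.
GRANTS: Dict[str, Dict[str, Set[str]]] = {
    "p1": {"NDH": {"immunizations"}},
    "p2": {"NDH": {"immunizations", "blood_type"}},
}


def build_index(agency: str) -> List[Tuple[str, str]]:
    """The agency's 'database' is only a list of (person, field) pointers."""
    return sorted(
        (person, field_name)
        for person, by_agency in GRANTS.items()
        for field_name in by_agency.get(agency, set())
    )


def query(index: List[Tuple[str, str]], field_name: str) -> Iterator[Tuple[str, object]]:
    """Follow the pointers at read time; values stay in the personal databases."""
    for person, granted_field in index:
        if granted_field == field_name and granted_field in PERSONAL_DBS[person]:
            yield person, PERSONAL_DBS[person][granted_field]


ndh_index = build_index("NDH")
immunizations = list(query(ndh_index, "immunizations"))
blood_types = list(query(ndh_index, "blood_type"))
```

Here NDH sees immunization data for both people but a blood type only for the one who shared it, and the field marked "notes" never appears in the index at all.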

Once you have the world’s data stored securely, combining portions of it no longer requires making copies—it is more a matter of manipulating indices (or is it indexes?) that allow the same data to be viewed many ways and to be used for many different purposes.


