A recent post on distributed file systems got me thinking about their use in academic environments. The article, which I found via Slashdot, was about Cleversafe, a distributed file system licensed under the GPL.
Libraries would be a perfect environment for distributed storage. Obviously, we always need space. We’ve got hundreds of computers that are usually under low load, and most have several gigabytes of disk space free.
All that’s needed is a server running on the public PCs that doesn’t interfere with normal use.
Wikipedia has some additional links on distributed parallel fault-tolerant file systems. Another free distributed fault-tolerant file system is the one in Hadoop (HDFS), which is modeled on GoogleFS.
Err, but isn’t LOCKSS a library distributed file system, in essence?
http://www.lockss.org/
Well, I’ve never used LOCKSS, but from what I understand it’s designed with journals in mind, with specific tools built for that purpose, such as crawlers and access control. With a generic file system, anything could be stored. We’ve got a lot of digital collections that need image or media storage space, and I don’t think LOCKSS could handle that.
Some of the ideas in LOCKSS could be useful for a generic distributed file system. For example, a hierarchical distributed file system could be created that not only distributes files among local machines, but also coordinates with machines at other institutions. Each institution could set aside a specific amount of space for outside institutions. Smaller organizations could then use space that larger ones have available.
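
To make the "set aside space for outside institutions" idea a bit more concrete, here is a rough Python sketch of how placement might work. It isn’t based on LOCKSS, Cleversafe, or Hadoop; the `Node`, `Pool`, and `store` names, the guest quota field, and the "most free space wins" rule are all made up for illustration, and a real system would also need replication and a network layer.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str          # hypothetical machine name, e.g. a public PC
    free_bytes: int    # spare disk space the machine offers


@dataclass
class Pool:
    home: str                  # institution that owns these machines
    guest_quota_bytes: int     # space set aside for outside institutions
    nodes: list = field(default_factory=list)
    guest_used_bytes: int = 0

    def place(self, size, owner):
        """Pick a node for a file of `size` bytes, or return None."""
        is_guest = owner != self.home
        # Outside institutions may only use the space set aside for them.
        if is_guest and self.guest_used_bytes + size > self.guest_quota_bytes:
            return None
        candidates = [n for n in self.nodes if n.free_bytes >= size]
        if not candidates:
            return None
        # Assumed policy: use the machine with the most free space.
        node = max(candidates, key=lambda n: n.free_bytes)
        node.free_bytes -= size
        if is_guest:
            self.guest_used_bytes += size
        return node


def store(size, owner, pools):
    """Try the owner's own pool first, then pools at other institutions."""
    ordered = sorted(pools, key=lambda p: p.home != owner)
    for pool in ordered:
        node = pool.place(size, owner)
        if node is not None:
            return pool.home, node.name
    return None
```

With a small college that has one nearly full PC and a large university that reserves a few gigabytes for guests, files from the college would land locally while they fit, spill over to the university until the guest quota is used up, and be refused after that.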