The afternoon session featured John Kunze and Sayeed Choudhury speaking on NSF DataNet: Curating Scientific Data. John started with a number of big data examples from the climate change arena. The four challenges he highlighted in this domain were:
- Dispersed Sources - agencies, data centres, individuals
- Diversity of Data Types
- Poor Practice
- Data Loss
He also described the DataONE project, which is designed to provide access to data about life on earth. The project will look at data types from the biological and environmental domains, and it looks like they will bring substantial existing research groups together into a single data curation context. "Data is like software, but even more sophisticated" was one quote used to illustrate the complexity of the data curation challenge.

John described digital preservation as being about building an outcome, not a place with a "deadly embrace". He talked about their efforts to, in essence, build a repository using the simple and effective tools available to them at the operating system level. I may have missed something, but it sounded suspiciously like they were kinda rebuilding Fedora? I'm sure there are good reasons for going the way they did, but I would be interested to see an initiative like this consider working with something like Fedora to achieve the outcome they are looking for: an open and flexible repository of research data. The benefit to the larger community would be considerable.
Sayeed talked about the Data Conservancy project, which I unfortunately missed :-(
Maybe you've already given the reason for the attempt to build a repository "using the simple and effective tools available to them at the operating system level": the benefit to the larger community.
To be honest, I have no idea about the efforts at CDL, but just as I'm always sceptical when the academic web strays from the mainstream web, I'm also sceptical about repository systems that differ too much from the data storage needs of the rest of the world (think file systems or JSR 170).
Posted by: robert forkel | May 19, 2009 at 04:54 AM