Matthias Razum (FIZ Karlsruhe) gave A Closer Look at Fedora's Ingest Performance. The group used vanilla hardware with a single processor and 2 GB RAM to look at ingest speeds and optimization options. The ingest consisted of about 4.9 million objects/500 million triples (PDFs from the patent database, which took 3 weeks to ingest). CPU was not really the limiting factor; I/O was. There was no difference between JDK 1.5 and 1.6. There was also no real difference between the various triplestores or no triplestore at all, meaning that using triples does not add significant overhead. The most promising area of optimization was Postgres tuning - they switched off Postgres's safeguards for recovering when the machine goes down during an operation. This resulted in a highly significant change in ingest rates (130ish ms compared to 40ish ms). With MySQL, tuning the InnoDB/MyISAM tables resulted in similar levels of performance improvement. Putting the DB on a separate machine, even with the network overhead, brought a significant improvement as well. Other findings: a growing number of objects had absolutely no impact, which speaks to the scalability of Fedora; and combining I/O tuning (on the database side) with other tuning can yield an improvement factor of 4. Another thing the group has considered is creating a number of Fedora instances and then merging the indexes later.
Dan Davis provided an update on the work with Sun and highlighted the conclusions of the Karlsruhe work. They will be using the open source Grinder app to create a testbed for ongoing work in this area.
Gert Schmeltz Pedersen (Technical University of Denmark) spoke about Fedora and GSearch in a Research Project about Integrated Search. Gert looked at integrating multiple Fedora/GSearch implementations in a federated-search kind of setup. I zoned out on this one - too much data on small slides.
Tom Cramer (Stanford University), Richard Green (University of Hull), Lynn McRae (Stanford University), Tim Sigmon (University of Virginia), Ross Wayland (University of Virginia) presented on Case Studies in Repository Workflows: Three Approaches. This is critical stuff for the Fedora community - workflows are what it is all about, and I think they will be an area of major activity for the next couple of years. One nice thing about the new Hydra project is the intention to build a standard workflow tool, and the fact that the 3 partners are each using a different framework for building workflows means they have a greater chance of coming up with something cool. See: being different is good :-)