March 4, 2009
Digitization gets a boost via reCAPTCHA
Luis von Ahn . . . helped develop the first captchas in 2000. Apparently, he had a revelation a few years later while sitting on a plane. . .
He realized that he had unwittingly created a system that was frittering away, in ten-second increments, millions of hours of a most precious resource: human brain cycles.
With the help of a MacArthur "genius" grant, von Ahn set out to make amends. Now a growing number of websites, from e-commerce (Ticketmaster) to social networking (Facebook) to blogging (Wordpress), have implemented the precocious professor's new tool, dubbed recaptcha. If you've visited those sites, your squiggly-letter-reading ability has been harnessed for a massive project that aims to scan and make freely available every out-of-copyright book in the world, by deciphering words from old texts that have stumped scanning software.
Full article here, which also includes a nice bit on the digitization project at the University of Toronto, overseen by our friend Jonathan Bengtson:
U of T is currently adding about 1,500 books a week -- and at that rate there's no need to be choosy about which ones to scan. "It's a real beast to feed, actually," says Jonathan Bengtson, the librarian who oversees the university's role. Entire subject areas are scanned by sorting for pre-1923 works (in accordance with US copyright laws), eliminating duplicates, and taking everything that's left. Scholars from around the world can also request books for ten cents a page, and typically see them online in less than twenty-four hours.
The most popular Toronto contribution, Juszel reports, is a 1475 edition of St. Augustine's De civitate Dei, downloaded a baffling 75,911 times (at press time). . .
For the newer books, OCR is about 90 percent accurate. But that success rate drops to as low as 60 percent for older texts, which often contain fonts that are blurry and less uniform. These troublesome scans are sent on to the reCAPTCHA servers at Carnegie Mellon University in Pittsburgh.
Hat tip to Mirabilis.
Cologne city archives update
More on yesterday's disaster, via Der Spiegel:
"It's an inconceivable loss," Eberhard Illner, a former archivist for the city, told the Kölner Stadt-Anzeiger newspaper. "It's a catastrophe, not just for the city of Cologne but for the history of Europe". . .
The archive's collection of original documents included thousands from Cologne's golden age. The founding charter of the University of Cologne, signed in 1388, was inside, along with the documents that established Cologne as a free imperial city under Emperor Friedrich III in 1475.
For historians trying to reconstruct the past, the greatest loss may be the more quotidian papers: Tens of thousands of receipts issued by the city government between 1350 and 1450, for example, or the 358 volumes of decisions and minutes of the Cologne City Council dating back 700 years. In total, the building had more than 18 kilometers of shelves.
The archives also contained the personal papers of almost 800 prominent German authors, politicians and composers, including Nobel Prize winner Heinrich Böll and Jacques Offenbach, a 19th century cellist and opera composer. Weimar Republic politician Wilhelm Marx and German-Jewish composer Ferdinand Hiller were among the other notables whose collections have been buried under tons of concrete. "These are fragile papers, that are now ground to dust," Illner told the daily.
And somewhere underneath the rubble lie the remains of 500,000 photographs of the city and its people, an irreplaceable visual record of life in Germany's fourth largest city. Likewise, more than 100,000 architectural drawings and plans may have been destroyed.
Almost nothing of all this was photographed or otherwise backed up, of course. The loss is all the greater given how thoroughly Cologne was destroyed during WW2; now the main record of what Cologne once was is gone, too.
ADDENDUM: Where is the NY Times on this? One small AP article yesterday, titled "2 Are Missing After Cologne Building Collapses", with absolutely no mention in the following text even hinting at the significance of the loss.
Those who read German may wish to keep up with this grim story via Archivalia.
March 3, 2009
Disaster in Cologne
In addition to the loss of life, today's catastrophic collapse of Cologne's municipal archives building has probably destroyed much of one of the most important collections of historical documents in Germany. Many stories about the collapse today; the AP story is here, with pictures. Deutsche Welle story here.
The Cologne city website's page on the archive is here; latest press release (in German) here. Immediate concern appears to be stabilization of the ruins, adjacent buildings, and a nearby construction crane, involving injection of concrete into underlying tunnels carved out for the city's subway system -- suggesting that initial denials that subway construction had anything to do with the collapse may well not hold up.