An interesting case study in large-scale content-based collaboration
"To make all the information accessible to more than 400 journalists, Cabra said the files were uploaded to Amazon, a lengthy process, but not as time-consuming as sorting data into searchable formats.
All the software used was open source, tweaked to suit the reporters’ needs. The search tool, allowing reporters to hunt for names like Putin or places like the British Virgin Isles, was based on Apache Solr, used by a large number of search-heavy organizations, including DuckDuckGo, a privacy-focused tool. Solr was combined with Apache’s Tika, an indexing software that can parse different file types, be they PDFs or emails as in the Panama Papers, drawing out the text from the non-essential data. Layered on top was the shiny interface, built using Blacklight, another open source development.
To understand what they were looking at, the reporters could use integrated data visualization, using a mix of graph database tech Neo4j with Linkurious to make the job of making connections between files easier."
From Encrypted Drives To Amazon's Cloud -- The Amazing Flight Of The Panama Papers - Forbes
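To make the search layer described in the quote a little more concrete, here is a minimal sketch of how a name or place lookup could be phrased against a Solr index over HTTP. It only builds the request URL for Solr's standard `/select` handler. The host, the core name `documents`, and the field name `text` are assumptions for illustration; the article does not describe the reporters' actual schema or deployment.

```python
from urllib.parse import urlencode


def build_solr_query_url(base_url, core, term, rows=10):
    """Build a URL for Solr's standard /select handler.

    The core name and the 'text' field are hypothetical; a real index
    would use whatever fields its schema defines.
    """
    # Escape backslashes and double quotes so the term stays one phrase query.
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    params = {
        "q": f'text:"{escaped}"',  # phrase search on an assumed field
        "rows": rows,              # maximum number of hits to return
        "wt": "json",              # ask Solr for a JSON response
    }
    return f"{base_url.rstrip('/')}/solr/{core}/select?{urlencode(params)}"


# Assumed local Solr instance and core name, for illustration only.
url = build_solr_query_url("http://localhost:8983", "documents", "British Virgin Islands")
print(url)
```

In a setup like the one quoted, Tika would already have extracted the text from PDFs and emails before indexing, so a query of this shape searches the extracted text rather than the raw files.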