Wednesday, May 10, 2017

Over 200 terabytes of the government web archived! | Internet Archive Blogs

On a much smaller but still significant scale, see "Chicago Mayor Emanuel posts EPA’s deleted climate change page" (Politico).

"In our December post, “Preserving U.S. Government Websites and Data as the Obama Term Ends,” we described our participation in the End of Term Web Archive project to preserve federal government websites and data at times of administration changes. We wanted to give a quick update on the project — we have archived a heck of a lot of data!

Between Fall 2016 and Spring 2017, the Internet Archive archived over 200 terabytes of government websites and data. This includes over 100TB of public websites and over 100TB of public data from federal FTP file servers totaling, together, over 350 million URLs/files. This includes over 70 million html pages, over 40 million PDFs and, towards the other end of the spectrum and for semantic web aficionados, 8 files of the text/turtle mime type. Other End of Term partners have also been vigorously preserving websites and data from the .gov/.mil web domains."