Making 10M government PDF documents searchable – FlowingData

Government organizations love to distribute documents as PDF files. They are easy to forward and print. The trouble comes later, when you need to find a specific one among millions of other files. GovScape, a research collaboration between the University of Washington and Boston University, provides a search interface for the PDFs collected in the End of Term Web Archive's 2020 crawl.

The code for GovScape is open source and available on GitHub. I have a feeling such a tool will grow more important going forward.
