Digital Humanities: Humanities Data
What is Humanities Data?
Although it can feel strange to think of primary sources or works of art and literature as "data," humanities data refers to the materials and sources that your project will explore or exhibit through computational means. Data in the humanities can include text, images, sound, maps, and 3D models, among other types.
To learn more about working with data in the humanities, check out:
Humanities Data: A Necessary Contradiction by Miriam Posner
Big? Smart? Clean? Messy? Data in the Humanities by Christof Schöch
Finding Humanities Data
These are some places to get started looking for humanities datasets:
Text Data
Text data can come from born-digital materials, such as novels in e-book form, or from digitized materials. Text from digitized materials is generated by a process called Optical Character Recognition (OCR). The quality of OCR text varies, and it sometimes needs cleaning before analysis.
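As a rough illustration of what cleaning OCR text can involve, here is a minimal Python sketch. The function name `clean_ocr_text` and the sample string are made up for this example, and the steps shown (normalizing ligatures, rejoining words split across line breaks, dropping page-number lines, collapsing extra spaces) are only a few common ones, not a complete or recommended workflow for any particular source.

```python
import re
import unicodedata


def clean_ocr_text(raw: str) -> str:
    """Apply a few conservative cleanup steps to OCR output (illustrative only)."""
    # Normalize compatibility characters, e.g. the single-character
    # "fi" ligature becomes the two letters "f" and "i".
    text = unicodedata.normalize("NFKC", raw)
    # Rejoin words hyphenated across a line break: "digi-\ntized" -> "digitized".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Drop lines that contain only digits (likely page numbers).
    lines = [ln for ln in text.split("\n") if not re.fullmatch(r"\s*\d+\s*", ln)]
    # Collapse runs of spaces and tabs inside each line and trim the ends.
    lines = [re.sub(r"[ \t]+", " ", ln).strip() for ln in lines]
    return "\n".join(lines)


# Hypothetical sample with a split word, a ligature, and a page-number line.
sample = "The digi-\ntized   text was \ufb01ne.\n  42  \nNext page."
cleaned = clean_ocr_text(sample)
print(cleaned)
```

Rules like these can also remove things you want to keep. For example, rejoining hyphenated line breaks will merge a genuine compound such as "well-known" if it happens to break across lines, so it is worth spot-checking the cleaned text against the page images.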
Here are some sources of text data that can be used for text analysis projects:
- Project Gutenberg - Online library of free public domain eBooks
- Oxford Text Archive - A repository of full-text literary and linguistic resources
- HathiTrust - Digital library with a mix of public domain and copyrighted books. Through UW-Madison's membership, faculty, students, and staff have access to the full corpus for analysis in a data capsule.
Other licensed data sources for text analysis at UW-Madison Libraries:
- Gale Digital Scholar Lab - Digital humanities tool that allows you to build datasets of primary sources and analyze them.
- TDM Studio - Online text and data mining platform that allows you to analyze current and historical newspapers, dissertations and theses, scholarly journals, and primary sources from a variety of fields.
Image Data
Here are a few places to get started building an image dataset for analysis:
Maps and Geospatial Data
These sources can help you get started on finding and working with maps and GIS data:
Using Humanities Data
As you collect datasets for your project, think about how you intend to use them and whether the sources you are drawing from allow that use. Will you be able to analyze the data with the methods you have in mind? Do you plan to publish the dataset on a website or other platform, and if so, is that allowed?
Here are some things to consider:
- Is the source of your data open access?
- Is the source of your data licensed, and if so, what are the terms of the license? (Library databases are examples of licensed sources; you can read more on our Responsible Use of Electronic Resources page.)
- Is the source of your data subject to copyright?
For more information on reuse of materials, see the Library's How to Use Others' Materials page by Carrie Nelson.