Fall Teaching: "Decolonizing (Digital) Humanities"

[Updated January 2022] 

I'm teaching a grad seminar on Digital Humanities this fall. I'm structuring most of the hands-on work around two Text Corpora I've been developing, one on African American Literature, and the other on Colonial South Asian Literature.

If the Canon has been the defining structure of traditional literary studies, in the DH framework the starting point is the Corpus. You can do a lot with a group of texts structured this way -- from Text Analysis, to Natural Language Processing, to thinking about Archives and Editions. As with the Canon, the questions you can ask and the knowledge you can produce are strongly determined by what's included or excluded from the Corpus. 

Course Description (short version): 

This course introduces students to the emerging field of digital humanities scholarship with an emphasis on social justice-oriented projects and practices. The course will begin with a pair of foundational units that aim to define digital humanities as a field, and also to frame what’s at stake. What are the Humanities and why do they matter in the 21st century? How might the advent of digital humanities methods impact how we read and interpret literary texts? Some topics we’ll consider include: Quantifying the Canon, Race, Empire & Gender in Digital Archives, and an introduction to Corpus Text Analysis. Along the way, we’ll explore specific Digital Humanities projects that exemplify those areas, and play and learn with digital tools and do some basic coding. The final weeks of the course will be devoted to collaborative, student-driven projects. No programming or web development experience is necessary, but a willingness to experiment and ‘break things’ is essential to the learning process envisioned in this course.


August 25


Matthew Kirschenbaum, “What Is Digital Humanities and What’s It Doing in English Departments?”

Roopika Risam, “Introduction: the Postcolonial Digital Record” (from New Digital Worlds)

Keywords: Digital Humanities, Postcolonial Studies, Postcolonial Digital Humanities (Risam), “Digital Canonical Humanities” (Risam)

Example in class (in support of Risam’s point about Digital Canonical Humanities). Compare the Charles Chesnutt Archive (http://chesnuttarchive.org) with the Walt Whitman Archive (http://whitmanarchive.org).

Getting our feet wet at home (20-30 minutes): Google Ngram viewer. Set for “English Fiction.” Recommend “Smoothing” set to 0.


Try: “Mars,moon,rocket,stars” (testing emergence of science fiction. Make sure Corpus is set to FICTION.)

Try: “colored girl,black girl,black woman,colored woman”

Try: “queer, homosexual, lesbian” 

Try: "working father, working mother" (for this one, turn the corpus to "English" -- includes non-fiction writing. try limiting 1900-2019)

Try: "lady doctor, female doctor, woman doctor" (again, using the corpus "English" -- try from 1850-2019)

Try: "public humanities, digital humanities, medical humanities, environmental humanities" (settings: 2000-2019; English corpus; smoothing set to 0)

Devise your own queries. Think carefully about setting parameters to help the data make a compelling point about how language use is changing/has changed. Get screen captures & share with the class on CourseSite Forum.  

Also find out & discuss: What is an Ngram? (You could start at the link below, not Wikipedia -- which has a definition that is far more technical than we need.)


What do you think might be the potential of this kind of analysis for our course? What might some limitations be?
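On the "What is an Ngram?" question, here is a minimal Python sketch of the basic idea: an n-gram is just a run of n consecutive words. The function name and the sample sentence are my own, for illustration only. Google's viewer does much more than this (it counts n-grams across millions of books, year by year).

```python
from collections import Counter

def ngrams(text, n):
    """Return every run of n consecutive words in text, as tuples."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

# Made-up sample sentence, only to show the mechanics.
sentence = "to the moon and to the stars"

print(ngrams(sentence, 1))  # 1-grams: single words
print(ngrams(sentence, 2))  # 2-grams ("bigrams"): pairs of adjacent words

# Counting how often each bigram occurs is the core of an n-gram frequency chart.
bigram_counts = Counter(ngrams(sentence, 2))
print(bigram_counts.most_common(3))
```

A query like "working mother" in the Ngram viewer is a 2-gram; "Mars" is a 1-gram.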

August 27


Risam, “Chapter One: The Stakes of Postcolonial Digital Humanities”

Ted Underwood, “Preface: the Curve of the Literary Horizon” from Distant Horizons

Keywords: Quantitative vs. Digital; Distant Reading vs. Close Reading; “Slaughterhouse of Literature”/“Great Unread”

Getting our feet wet with a Corpus of African American literature created by Deep (20-30 minutes exercise):

1. Download all of these files: 


Download the corpus onto your computer. 

2. Here is an explanation I put together explaining what this corpus is and how it was constructed: 


Believe it or not, this collection of texts is -- I believe -- the largest dedicated open-access collection of texts by Black writers freely available on the internet.   


3. Open the Metadata spreadsheet file in Excel or Google Sheets. You can find it at the Google Drive link above. 

Use "Keywords" to find some texts that look like they might be on topics of interest to you. 

4. Do some searches through the corpus. Try searching for: 

Liberia; Lynching (or Lynch, Lynched); Jamaica; India; Mulatto; Detective; Explorer; Passing; Obi (or Obeah -- an Afro-Caribbean syncretic religious practice); Hayti or Haiti

Sometimes the hits that come up might lead us to want to actually read certain books to find out more about how those themes are explored. 

There are more advanced things we can do with search (especially looking for “collocate” words -- pairs of words that show up together); we’ll play with these shortly as well using Voyant Tools.

5. What jumps out as you look at these materials for the first time? Any surprises? Any patterns you notice so far? 

Perhaps: Skim a few pages of a text at random -- does anything jump out at you? How does the author identify (or not) the race of their characters? Find any interesting passages? 

More broadly, at this early phase, what do you think we could do with a corpus of texts by African American authors? What questions might we want to *ask*? 
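For anyone curious about what the searches in step 4 look like in code, here is a rough Python sketch. It assumes the corpus is a folder of plain text (.txt) files; the function names and the sample sentence are hypothetical, and the tokenizer is deliberately crude (it ignores accented characters, for instance). Voyant Tools does all of this, and more, without any coding.

```python
import re
from collections import Counter
from pathlib import Path

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def collocates(text, keyword, window=5):
    """Count the words that appear within `window` words of `keyword`."""
    words = tokenize(text)
    keyword = keyword.lower()
    counts = Counter()
    for i, word in enumerate(words):
        if word == keyword:
            start = max(0, i - window)
            # Words just before and just after the keyword.
            neighbors = words[start:i] + words[i + 1:i + 1 + window]
            counts.update(neighbors)
    return counts

def search_corpus(folder, keyword):
    """Return {filename: number of hits} for every .txt file in folder."""
    hits = {}
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        n = tokenize(text).count(keyword.lower())
        if n:
            hits[path.name] = n
    return hits

# Made-up sentence, only to show what a collocate count looks like.
sample = "He sailed for Liberia in the spring. Liberia was a far country."
print(collocates(sample, "Liberia", window=2).most_common(5))
```

To run a search across the whole corpus, you would call search_corpus with the path to the folder where you saved the files, along with a search term such as "Liberia".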

September 1

Politics & Terminology in Literary Studies 

M.H. Abrams, “Canon of Literature” from A Glossary of Literary Terms 

Other Keywords Entries (read a selection according to interest): “Black Arts Movement,” “Feminist Criticism,” “Harlem Renaissance,” “New Criticism,” “New Historicism,” “Periods of American Literature,” “Periods of English Literature,” “Postcolonial Studies,” “Queer Theory” 

(In order to understand what quantitative scholars like Ted Underwood are aiming for with studies of genre using machine-learning algorithms [next session], we need to have a sense of how the Anglo-American Canon was understood, and also how literary periods have worked in English departments for the past 50 or so years.)


September 3

Digital Humanities and Literary History

Underwood, Chapter 1, “Do We Understand the Outlines of Literary History?” (From Distant Horizons)

Franco Moretti, “Graphs,” from Graphs, Maps, Trees (2007. On CourseSite)

Homework: Play with Voyant-Tools. For this exercise, let’s look at a second Text Corpus: Colonial South Asian Literature. Try some of the different tools: what can you learn? What kinds of corpora does it make sense to use for different tools? What questions do these tools answer?

September 8

Digital Humanities--Canonicity

Amy Earhart, “Can Information Be Unfettered? Race and the New Digital Humanities Canon”


Stephanie P. Browner, “Digital Humanities and the Study of Race and Ethnicity”


Underwood, Chapter 2 “The Life Spans of Genres” (from Distant Horizons)

On your own: New tool to explore: AntConc (downloadable software)


September 10

Quantifying the Expanding Canon

Studying Anthologies: Lehigh grad student Adam Heidebrink-Bruno’s work on American modernism. Zoom visit from Adam himself.

Open Syllabus Project: Who is being taught?

The Open Syllabus project is a collection of more than 9,000,000 syllabi from English-medium colleges and universities around the world. The team that created the project used Crawlers to find syllabi on the open web & incorporated what they found into their database. (Projects like this are a strong argument for uploading one's syllabi to the web. They can become part of the "digital cultural record." Incidentally, this project is still ongoing, so it makes sense to keep uploading syllabi for the next iteration of their project. You can also send them syllabi manually, as email attachments.)

Assignment: Do test queries on http://opensyllabus.org

African-American authors? Latinx authors? LGBTQ+ authors? Postcolonial authors? How would we quantify the results? How might we visualize them?

Pick a topic -- a grouping of authors that interests you. It could be a historical period (e.g. Romantic poets; Gothic novelists; Modernists, etc.), or something else. Do queries on Open Syllabus to see how often they're being taught.


One option for analyzing this data: Using either Microsoft Excel or Google Sheets, make a small spreadsheet with the results of your queries. See if you can make a simple graph of some kind to illustrate a point. (For example: One of you showed that Virginia Woolf shows up higher on Google Ngrams than does James Joyce. What does Open Syllabus show us about the same pairing? How often are women Romantic poets taught vs. men? Women modernists vs. men? Black early 20th century writers vs. their White peers?)  


If you like you can also drill down a bit and go from the author to individual texts by that author. So if you search Toni Morrison (and get 7900 hits from their database), click on her name and they'll show you the books by her that are taught most often. Click on one of the novel names ("Beloved") and they'll give you much more specific data. If you see anything interesting here, you can use that for your forum post. (One cool feature is that they give you "Taught most often with...") 

September 15

Hands-on project workshop: Playing with data -- either from the Corpora I posted on CourseSite or from other corpora you can find online. 

(If there’s a particular topical corpus -- say, Detective Fiction or Science Fiction -- you’re looking for, you could start by Googling it. But also feel free to ask me.)

I also recommend you read this primer for working with plain text files & getting started with processing those texts to make them useful:


September 17

Workshop continued.

Short analysis with data due: September 20 

September 22

Race and the Digital Humanities 1

Kim Gallon, “Making a Case for the Black Digital Humanities” (2016)

Safiya Umoja Noble, “Towards a Critical Black Digital Humanities” (2019)


September 24

Race and the Digital Humanities 2: Algorithms of Oppression

Noble, Safiya Umoja. Algorithms of Oppression: How Search Engines Reinforce Racism. NYU Press, 2018, doi: 10.2307/j.ctt1pwt9w5.

Noble, Algorithms of Oppression: Introduction
Noble, Algorithms of Oppression Chapter 1 

Risam, “What Passes for Human?” (2019) (Bringing the kinds of questions Noble asks to AI, Facial recognition, robotics)



September 29

Slavery and the Archive 1

Jessica Marie Johnson, “Markup Bodies: Black [Life] Studies and Slavery [Death] Studies at the Digital Crossroads” (2017) (CourseSite)

Gabrielle Foreman, “Writing About ‘Slavery’? This Might Help” (brief document with tips and dos & don’ts)


Colored Conventions Project


Hands-on work on creating custom maps: Possibly: using Named Entity Recognition to get Names and Maps from our African American Literature Corpus. [Can also just use "Dreamscape" in Voyant-Tools to get place-names & maps. Tool is limited / imperfect, but a lot easier than doing NER in other ways.] 
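If you want a feel for the place-name side of this before we try real tools, here is a very crude Python stand-in. To be clear, this is not Named Entity Recognition: it is a simple lookup against a hand-made list of place names (a "gazetteer"), so it only finds places you already thought to list. The list and the sample sentence below are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical starter list; extend it with the places you care about.
PLACES = ["Liberia", "Jamaica", "Haiti", "Hayti", "India", "New Orleans"]

def count_places(text, places=PLACES):
    """Count whole-word mentions of each listed place name in the text."""
    counts = Counter()
    for place in places:
        # \b keeps "India" from matching inside "Indiana".
        pattern = r"\b" + re.escape(place) + r"\b"
        n = len(re.findall(pattern, text))
        if n:
            counts[place] = n
    return counts

# Made-up sample text.
sample = "From New Orleans she wrote of Hayti, and of Jamaica. Jamaica again."
print(count_places(sample))
```

Counts like these could go into a spreadsheet alongside latitude and longitude to feed a mapping tool. A real NER system would also catch places that are not on your list, which is the main reason to use one.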

October 1

Slavery and the Archive 2: Jamaica

Vincent Brown, “A Slave Revolt in Jamaica”


Readings from Vincent Brown, Tacky’s Revolt (2020): “Prologue,” “Chapter 2: The Jamaica Garrison,” “Chapter 4: Tacky’s Revolt” 

October 6

Slavery and the Archive 3:

1. Getting our feet wet with a newspaper archive. 

Let's explore the African American Newspapers Series 1: 1827-1998 database. You'll need to log in through Lehigh’s library website using your Lehigh account credentials.



Try some sample queries, perhaps related to any topic you might be interested in -- abolition, emancipation, reconstruction, Lynch Law (or lynchings), "Mulattoes," Black soldiers in World War I, Black settlers in the western U.S., Liberian colonization schemes, etc.

Or: Since we've just been talking about Jamaica, you could try: Maroon, Coromantee, Claude McKay, Black Star Liner, Marcus Garvey, Obeah... (you won't find very much if you just look for Tacky, I don't think)

(Some of you are interested in Latinx and Caribbean stuff -- try searching for other Caribbean countries and contexts, including Cuba, Trinidad, the DR, or Puerto Rico, to see if anything interesting comes up?)

Or: Try some specific name-based queries for authors we've encountered: William Wells Brown, Charles Chesnutt, Frances Harper (sometimes Frances Ellen Watkins Harper), Pauline Hopkins, Oscar Micheaux, etc. 

2. Extracting a useful text. 

Pick an article or articles that you think others might want to read; let's imagine we're working on a digital archive related to a particular topic. Use the "Download" button in Newsbank to download it as a PDF. 

The PDFs on Newsbank are not 'machine-readable' -- they haven't had Optical Character Recognition (OCR) run on them. So let's try doing that ourselves. 

How you do the OCR is up to you. Google Drive has a built-in OCR program that runs when you upload a PDF and try to open it using Google Docs. Just upload the PDF and right-click to try and open it in Google Docs. It will run the OCR for you (sadly, when I tried it today with something, it wasn't very accurate at all). 

You can also try downloading software that can run OCR for you. The best OCR software I know of is called ABBYY FineReader, but it's not free after 30 days (or 100 pages). 

There are also free software packages out there, many of them using the Tesseract OCR engine. One I have tried is this one:


Which method is the most accurate for the text you've selected? You might find that anything you try ends up being so inaccurate there's no point in using OCR at all -- easier and faster just to re-type whatever text you have. 

3. Write up a short narrative (1-2 paragraphs) telling us about the topic you researched and the process(es) you used to try OCR. Did you learn anything interesting with respect to how to search the database effectively? Did you learn anything interesting from the *content* of what you read in individual articles? For the article(s) you picked to OCR, give us the date and source so we can track down the original as well. Finally, please paste as clean a version of the article you've selected as you can to this forum. 
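One way to put a number on the "which method is most accurate?" question from step 2: re-type a short passage by hand, then compare each OCR output against it. The Python sketch below uses the standard library's difflib to get a rough similarity score between 0 and 1. This is not a formal OCR metric like character error rate, and the sample strings are invented, but it is enough to rank methods against each other.

```python
from difflib import SequenceMatcher

def normalize(text):
    """Collapse whitespace so that line breaks don't count as errors."""
    return " ".join(text.split())

def similarity(ocr_text, reference_text):
    """Return a 0-1 similarity score between OCR output and a hand-typed reference."""
    a, b = normalize(ocr_text), normalize(reference_text)
    return SequenceMatcher(None, a, b, autojunk=False).ratio()

# Invented examples: a hand-typed reference and two imaginary OCR outputs.
reference = "The convention met in Philadelphia."
outputs = {
    "method A": "The convention met in Philadelphia.",
    "method B": "Tbe conventlon rnet in Phila delphia.",
}

# Print the methods from most to least similar to the reference.
ranked = sorted(outputs, key=lambda name: similarity(outputs[name], reference), reverse=True)
for name in ranked:
    print(name, round(similarity(outputs[name], reference), 3))
```

If every method scores poorly on a sample paragraph, that is a decent sign that re-typing will be faster than cleaning up the OCR.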

Digital Archives, Editions, Collections

Earhart, “The Era of the Archive” (Traces of the Old, Uses of the New, Chapter 2). Keywords: New Historicism; Digital Archive vs. Digital Edition

Kenneth M. Price, “Edition, Project, Database, Archive, Thematic Research Collection: What's in a Name?”

Risam, Chapter 2 of New Digital Worlds. “Colonial Violence and the Postcolonial Digital Archive” 


October 13

Analog Archives: What Are Archives For?

Terry Cook, “Evidence, Memory, Identity, and Community: Four Shifting Archival Paradigms” (2013) 

Kate Theimer, “Archives in Context and As Context” (2013)


(An analog archivist questions the way Digital Humanities scholars use the word “archive”; she posits “collection” might be more appropriate)


October 15

Digital Editions: Hands-on/Collaborative/Student-driven

Workshop for Second Project: Constructing a Basic Digital Edition in Scalar. Hands-on Introduction to the Scalar platform & Lehigh's Instance of Scalar.


Possible sources for producing Digital Editions/Collections in Scalar: African American Text Corpus, Colonial South Asian Literature

Student Output: a collaborative, minimalist edition of the play "Dessalines," by William Edgar Easton: 


Sourced from a PDF available on HathiTrust.

More student outputs: 


October 20

Students work collaboratively on building a Digital Edition of a text in Scalar, with introductory essay, notes, other relevant materials. More info. TBA.

Project Due Sunday October 25.

October 22

Digital Media Studies 1: Twitter -- Hashtag Activism

Jackson, Sarah J, Moya Bailey, and Brooke Foucault Welles. #Hashtag Activism: Networks of Race and Gender Justice, 2020. https://mitpress.mit.edu/books/hashtagactivism.

#Hashtag Activism, Introduction, “Making Race and Gender Politics on Twitter”

#Hashtag Activism, Chapter 5: “From Ferguson to #FalconHeights: The Networked Case for Black Lives”

October 27

Digital Media Studies 2: Twitter; Scraping

Marcia Chatelain, “Is Twitter Any Place for a [Black Academic] Lady?” [focus on #FergusonSyllabus and academic expectations/culture] (2019)


Hands-on work: Scraping hashtags on Twitter. Possibly using Python (will demonstrate how to do this)

October 29

Digital Media Studies 3: Instagram & Twitter

There are two ways to gather data from social media. 

One is of course to simply search for hashtags, keywords, or users you're interested in, and then capture & collect material that way. You can do close readings of individual posts or images, or "medium" level readings, where you might look at patterns within small sets of materials collected. The other is to gather data in bulk using scraping (the interpretation of which, using visualizations and other data analysis techniques, might be called "distant reading"). 

Two choices -- either work with Twitter or Instagram* [update: see below. Instagram searching is blocked right now & may not be usable] for this project. The goal is to create either a collection of data through scraping (preferably in .CSV or Excel format) or to do some manual searches, make a small collection that you think tells a story, and then present that to the class via the Forums.

Choice A: Twitter

Asking *you* to scrape Twitter is a little beyond our abilities for this class. Twitter has technically banned public scraping, so you need to do it through their "API Developers" system. However, I was recently approved as a "Developer," so I can do the simple scrape for you & send you a file (it should only take a few minutes on my end). It will of course be up to you to clean up the data & find ways to interpret it. 

If you're interested in scraping Hashtags from Twitter, please send me the Hashtag(s) you're interested in getting data for & the extent of the data. I will send you a messy file in the .JSON format (this is what Twitter gives you). It has way more data than you probably want. To make it useful, you need to run it through a "Parser." There are online Parsers that help you convert .JSON files to .CSV files. 

In addition to processing data, for this choice, write up a paragraph or two to explain what it is you were looking for and what you think the data shows. 
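If you would rather not rely on an online Parser, the JSON-to-CSV step can be sketched in a few lines of Python. The big assumption here is the shape of the file: I am assuming it is a list of tweet records with fields named "created_at", "text", "retweet_count", and a nested "user" record with a "screen_name". Check these against the file you actually receive and adjust the names as needed. The sample record below is made up.

```python
import csv
import io
import json

# Assumed field names; check them against the file you actually receive.
FIELDS = ["created_at", "screen_name", "text", "retweet_count"]

def flatten(tweet):
    """Pull a few fields out of one tweet record, leaving blanks if missing."""
    return {
        "created_at": tweet.get("created_at", ""),
        "screen_name": tweet.get("user", {}).get("screen_name", ""),
        # Replace line breaks so each tweet stays on one spreadsheet row.
        "text": tweet.get("text", "").replace("\n", " "),
        "retweet_count": tweet.get("retweet_count", ""),
    }

def tweets_to_csv(tweets, out_file):
    """Write a list of tweet records to an open file as CSV rows."""
    writer = csv.DictWriter(out_file, fieldnames=FIELDS)
    writer.writeheader()
    for tweet in tweets:
        writer.writerow(flatten(tweet))

# Made-up record, only to show the shape of the conversion.
raw = '[{"created_at": "2020-10-27", "text": "line one\\nline two", "user": {"screen_name": "example"}}]'
buffer = io.StringIO()
tweets_to_csv(json.loads(raw), buffer)
print(buffer.getvalue())
```

With a real file, you would load the records with json.load from the opened .JSON file and pass tweets_to_csv a file opened for writing (with newline="" so Excel doesn't show blank rows). The resulting .CSV opens directly in Excel or Google Sheets.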

Choice B. Hashtag searches on Instagram or Facebook

[NOTE: Instagram has blocked regular hashtag searches on its platform to discourage the spread of misinformation related to the upcoming election. However, Deep has access to Instagram through the CrowdTangle platform, so if you would like to work with Instagram data, please let me know which hashtags, users, or topics, and I should be able to extract some data for you.]

There are various ways to pull hashtag data from Instagram. 

As we discussed in class with Lehigh post-doc Amanda Greene, you don't necessarily need to do anything fancy here -- just searching for hashtags that interest you and studying them formally (including both visual rhetoric and textual captions) might provide enough material to do an interesting short reading.  

Another approach that could work might be to analyze the output of a single user you find interesting. For instance, in the past I have been interested in the Instagram feed of the Instapoet Rupi Kaur. I used a scraping tool to gather 800 of her Instagram posts with captions and interaction rates. I was able to create graphs showing the rise (and recently, fall) in engagement with her Instagram account over several years. 

Either use one of the above scrapers to create a CSV (Excel/Sheets) file with data, or do some manual hashtag searches and create a small collection. Write up a paragraph telling us what you did and what you learned. What story does it tell? What else could be done if you had more time to dig deeper? 

November 3

Digital Media Studies 4: Instagram -- InstaPoetry.

Lili Paquet, “Selfie Help: The Multimodal Appeal of Instagram Poetry” (2019) (CourseSite)

Instapoets: Rupi Kaur, others

Possibly: Analyzing our scraped data using Sentiment Analysis:



November 5

Intersectional Data Feminism 1

Lauren Klein and Catherine D’Ignazio, Data Feminism:
Introduction: “Why Data Science Needs Feminism”
Chapter 1: “The Power Chapter”

November 10

Intersectional Data Feminism 2

Klein and D’Ignazio, Data Feminism:
Chapter 3: “On Rational, Scientific, Objective Viewpoints from Mythical, Imaginary, Impossible Standpoints”
Chapter 4: “What Gets Counted Counts”
Hands-on work: To be announced

November 12

Intersectional Data Feminism 3

Ted Underwood, Chapter 4 of Distant Horizons: “Metamorphoses of Gender” 

Hands-on work: Can we replicate some of Underwood’s analyses? Also, can we apply some of this to the African American Literature Text Corpus or the Colonial South Asian Literature Corpus? Do texts by black and brown writers engage with gender the same way? Are there variations in the pattern? 

November 17

Digital Humanities Pedagogy

Roopika Risam, New Digital Worlds. Chapter 4.

Explore some of the tools Risam mentions. 

Hands-on exercise. Look at the Google Drive Folder I sent you with a collection of 50+ Digital Humanities syllabi. Fill out the entries on the following Google Sheets file to annotate the entries that are assigned to you:


November 19

Digital Humanities Pedagogy

Stefan Sinclair and Geoffrey Rockwell, “Teaching Computer-Assisted Text Analysis: Approaches to Learning New Methodologies” (from Digital Humanities Pedagogy)

Olin Bjork, “Digital Humanities and the First-Year Writing Course” (from Digital Humanities Pedagogy)

November 24-26

Thanksgiving Week (nothing scheduled)

December 1

(Fully Remote) Workshop: Final projects

December 3

(Fully Remote) Workshop: Final projects + Semester Wrap-up