Text Processing with Regular Expressions (RegEx): a Digital Humanities Work-Flow for Beginners (no coding)

I wrote the following as a primer for the students in my Digital Humanities seminar in 2020 and updated it in Spring 2025. If you have favorite RegEx commands and tips, I would welcome them in the comments, or hit me up on Bluesky. 


The most common situation in which people in literary studies – especially people working with digital collections and archives – need a bit of coding is when we have to format texts. This is less glamorous than making fancy visualizations or maps, but it can be incredibly useful and time-saving in many different contexts. Some of these skills may also translate into work outside of academia. 

For my own work, I frequently get messy files that have been scanned from old editions, and then OCRed. They need clean-up! 

Cleaning up a single 80-page collection of poetry by hand is not that big a deal, but we have been working with dozens of them. And a single 300-page novel can take hours if you don’t have any tools to speed it up. My rule of thumb is: when you find yourself doing the same repetitive task hundreds of times, that’s something that ought to be doable by a machine. 

Sometimes messy scanned texts have certain recognizable patterns. For instance, in scanned/OCRed poetry, you often see things like this: 

   I never see the burial place, 
   Where my dear mother lies ; 
   But that I think I see her face, 
   Peak at me through the skies. 

[And yes, it says “peak,” not “peek,” in the original. I don’t think this one had a copy-editor.] 


The quality of this is pretty good actually, but notice the extra space before the semi-colon on the second line. Let’s say you have that exact glitch 50 or 100 times in a collection… You would normally fix that with Find and Replace: 

   Find: [space]; 

   Replace: ; 

So now the second line should read: “Where my dear mother lies;” 

Chances are, if you see a space before a semi-colon, there are spaces before other punctuation marks as well. I would go through the same document and do Replace All changes for a space before a comma, a space before a period, a space before a question mark, and a space before an exclamation point. 
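Once you know a little RegEx, those separate Replace All passes can be collapsed into one. Here is a minimal Python sketch (the pattern and function name are my own illustration): ` +([;,.?!:])` matches one or more spaces followed by any one of the listed marks (I've added the colon), and the replacement `\1` puts back just the captured mark. The same Find/Replace pair should also work in a regex-aware editor like Notepad++. One caution: spaced-out ellipses (". . .") will get squeezed together too.

```python
import re

# One or more spaces immediately before ; , . ? ! or :
# The parentheses "capture" the punctuation mark so we can put it back.
SPACE_BEFORE_PUNCT = re.compile(r" +([;,.?!:])")


def fix_punctuation_spacing(text: str) -> str:
    """Remove the stray OCR space(s) in front of common punctuation marks."""
    # \1 in the replacement stands for whichever mark the group captured.
    return SPACE_BEFORE_PUNCT.sub(r"\1", text)


print(fix_punctuation_spacing("Where my dear mother lies ;"))
# → Where my dear mother lies;
```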

But what if you saw patterns with more complicated glitches – things that a simple find and replace couldn’t address? 

Say, for example, you wanted to get rid of all of the running headers in a novel – lines that begin with a page number and then end with the title of the novel. Again, not a big deal to do this 10 or 20 times by hand. But 300 times? That gets a little old. You can do it with a little snippet that looks like this: 

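   Find: \n\d+.*

   Replace with: [leave blank]

(Note: we're jumping ahead a bit here, but the bit of code above says: look for a line break (\n), followed by one or more digits (\d+), followed by any other text on that same line (.*). If you replace that with nothing, you delete every header line that fits that description, along with the line break in front of it, so the lines before and after it stay on separate lines. Because the pattern catches any line that begins with a number, check first that no genuine lines of the text start with digits.)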

Hit “Replace All” on the find and replace box, and you just saved yourself 300 manual edits. 
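For anyone who ends up doing this kind of cleanup in Python instead of an editor, here is a minimal sketch (the function name is my own). It matches a line break followed by one or more digits and the rest of that line (`\n\d+.*`) and replaces the match with nothing, so each header disappears along with the line break in front of it. Like the editor version, it removes any line that starts with a number, and a header on the very first line of a file (with no line break before it) won't be caught.

```python
import re

# A line break, then one or more digits, then the rest of that line.
# By default "." does not match a line break, so ".*" stops at the line's end.
RUNNING_HEADER = re.compile(r"\n\d+.*")


def remove_running_headers(text: str) -> str:
    """Delete every line that begins with a number, e.g. '37 THE TITLE'.

    The line break in front of each header goes with it, so the lines
    on either side of the header stay on separate lines.
    """
    return RUNNING_HEADER.sub("", text)


sample = "end of one page\n37 THE TITLE OF THE NOVEL\nstart of the next\n"
print(remove_running_headers(sample))
```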

Or, say you wanted to find all the lines that end with a word broken by a hyphen, and put the two halves of each word back together? 
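Here is a minimal Python sketch of one way to do that rejoining; the pattern and function name are my own illustration, not a standard recipe. It looks for a letter, a hyphen at the very end of a line, and the rest of the word at the start of the next line, then pulls that second half up to join the first, so line breaks in poetry stay roughly where they were. Be warned that it will also join genuine compounds split at a line end ("well-" / "known" becomes "wellknown"), so it's worth skimming the results.

```python
import re

# A word character, a hyphen at the very end of a line, then the rest of the
# broken word at the start of the next line, plus any spaces after it and an
# optional line break (so a fragment alone on its line doesn't leave a blank line).
BROKEN_WORD = re.compile(r"(\w)-\n(\w+)[ \t]*\n?")


def rejoin_hyphenated(text: str) -> str:
    """Pull the second half of a line-broken word up onto the first line."""
    # \1 is the letter before the hyphen, \2 the rest of the word;
    # the new line break moves to just after the rejoined word.
    return BROKEN_WORD.sub(r"\1\2\n", text)


# Prints "the old fashioned", then "house stood" on the next line.
print(rejoin_hyphenated("the old fash-\nioned house stood"))
```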

Or: you have a passage or a text that for whatever reason is in ALL CAPS. How can you convert it to conventional capitalization without retyping it? 
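A rough fix for ALL CAPS, sketched in Python under some simplifying assumptions (the function name is my own): lowercase everything, then capitalize the first letter of the passage and any letter that follows a period, question mark, or exclamation point plus whitespace. Proper nouns, the pronoun "I," and sentences that begin with a quotation mark will still need a hand pass.

```python
import re


def sentence_case(text: str) -> str:
    """Turn ALL CAPS text into rough sentence case.

    Lowercases everything, then capitalizes the first letter of the text and
    the first letter after each ., !, or ? that is followed by whitespace.
    """
    lowered = text.lower()
    # Group 1: start of text, or end punctuation plus whitespace.
    # Group 2: the lowercase letter that should become a capital.
    return re.sub(
        r"(^|[.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        lowered,
    )


print(sentence_case("IT WAS A DARK NIGHT. THE RAIN FELL!"))
# → It was a dark night. The rain fell!
```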

For those types of problems, you could use a coding system called Regular Expressions (Regex). You can use Regex codes directly in a Find and Replace box in a sophisticated text editor – you don’t need to know Python or R (though these commands do work within Python and R, and some people talking about Regex online are using it with Python). 

Technically, Google Docs has a regular expressions option in its Find and Replace box, though to be honest I've not used it much, mainly because Google Docs starts to run very slowly with larger documents. A 300-page novel runs super-slowly, because the software is constantly analyzing and indexing your file as you work and relaying everything back and forth with a remote server; it is also creating a ton of invisible stuff in your file related to formatting and special characters. 


I usually use a free piece of software called Notepad++ for this kind of work (or CotEditor on Mac). 300-page novels load quickly, and there are no hidden characters or invisible formatting. It’s also completely offline, though, so you have to remember to hit “Save” and then upload the file to its destination when you're done. 


Announcing An Open-Access African-American Literature Corpus, 1853-1929

Amardeep Singh, Lehigh University. On Twitter @electrostani
July 2020 (updated January 2025)

I’ve put together a small corpus of texts by Black authors of fiction and poetry in plain text format. The corpus is downloadable and researchers are free to modify it according to preference.

At present, the corpus consists of about 175 texts by African American writers, of which about 90 are works of fiction (about 5 million words) and 85 are books of poetry (about 700,000 words). It currently starts in 1853, the year of publication of William Wells Brown’s Clotel and Frederick Douglass’ short fiction “The Heroic Slave,” and ends in 1929, the year of Nella Larsen's Passing. Some of the files are admittedly still a little rough around the edges; cleaning and formatting will be an ongoing and long-term process. Still, I think the files are in good enough shape to start exploring them preliminarily with tools like AntConc or Voyant Tools.

Right now I’m making the collection available via Google Drive as well as on GitHub.


→ Download link. You can find the corpus here (Google Drive) or here (GitHub). (The Google Drive version is more recently updated.)


Sources: 

In the Metadata file I’ve created to accompany the collection, I indicate the origin of each text. Many come from Project Gutenberg, HathiTrust, the American Verse Project at the University of Michigan, the Library of Congress, and the History of Black Writing Novel Corpus. A few texts were present on multiple repositories; I generally used the text of the source that seemed cleanest and most convenient. 


Why Do This / My Background:

I started thinking about the relative paucity of online collections focused on people of color a few years ago (see my blog post on “Archive Gap” from 2015). I then initiated a couple of digital projects aimed at intervening in what I saw as the absence of Black writers in particular: “Claude McKay’s Early Poetry” and “Women of the Early Harlem Renaissance.” The latter project in particular opened my eyes to the wealth of materials that have essentially fallen off the radar of literary history. A limited quantity of this overlooked material is sampled in anthologies like Maureen Honey’s Shadowed Dreams: Women’s Poetry of the Harlem Renaissance or Double-Take: A Revisionist Harlem Renaissance Anthology. But there remains a fairly substantial ‘great unread’ in the African American literary tradition that could be brought to light, at least partly just by gathering materials that might have already been digitized in one form or another. 

Other corpora centered on Black writers do appear to exist, but they’re often restricted access. (For instance, the History of Black Writing Novel Corpus has 53 works available to the public, but the larger corpus of about 450 works is restricted for copyright reasons.) 

If corpora either don’t exist or aren’t readily available to scholars who don’t have access to password-protected university servers, that slows down research. At this point, Digital Humanities scholars have done impressive work analyzing large corpora of literature, but very few have applied computational methods to African American texts specifically. My hope is that this corpus might nudge more people to try. 


What’s included in the Corpus: 

In its current form, the corpus contains a mix of poetry and prose (for convenience, I’ve indicated whether a text is poetry or fiction in the title of each file). I’ve excluded slave narratives and other texts that are clearly not literary. (A large number of North American Slave Narratives are, in any case, collected here.) 

I included poetry alongside fiction in part because many of the topics in these materials that historically minded scholars might be interested in can be found in both formats. Many Black poets from this period wrote occasional poetry connected to historical events, including the Civil War and Emancipation, the Spanish-American War, World War I, the "Red Summer" of 1919, and so on. Admittedly, this mixing of formats might cause problems when studying these texts with certain software platforms (e.g., poetry and prose will be tokenized differently; they also need to be classified separately for word-frequency queries, and sentence-length queries won't be useful). 

For convenience, I've also created folders with "Just Poetry" and "Just Fiction" from the collection in the Google Drive folder link above. 
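If you'd rather keep the genres apart yourself when counting words, the split can happen before any counting. Here is a minimal Python sketch of that idea. It assumes, hypothetically, that each filename contains the word "poetry" or "fiction" (check the actual labels in the corpus files before relying on this), and its tokenizer, lowercased runs of letters and apostrophes, is a deliberately crude stand-in for what tools like AntConc or Voyant Tools do.

```python
import re
from collections import Counter
from pathlib import Path


def word_frequencies_by_genre(files: dict[str, str]) -> dict[str, Counter]:
    """Count word tokens separately for poetry files and fiction files.

    Assumption (hypothetical): each filename contains "poetry" or "fiction".
    Files whose names contain neither label are skipped.
    """
    counts = {"poetry": Counter(), "fiction": Counter()}
    for name, text in files.items():
        label = name.lower()
        if "poetry" in label:
            genre = "poetry"
        elif "fiction" in label:
            genre = "fiction"
        else:
            continue
        # Crude tokenizer: lowercase runs of letters and apostrophes.
        counts[genre].update(re.findall(r"[a-z']+", text.lower()))
    return counts


def load_folder(folder: str) -> dict[str, str]:
    """Read every .txt file in a folder into a {filename: text} dict."""
    return {p.name: p.read_text(encoding="utf-8") for p in Path(folder).glob("*.txt")}
```

With the corpus unzipped into a folder (here called "corpus", a placeholder name), `word_frequencies_by_genre(load_folder("corpus"))["poetry"].most_common(20)` would list the twenty most frequent words in the poetry files only.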

Gender issues: It might also be worth noting that during this time period there were many African American women publishing poetry -- but not as many publishing fiction. (The reasons for this are beyond the scope of a brief announcement.) Still, including poetry can also be seen as an intentional choice -- designed to bring writing by women into the field of view. It's also an invitation to other scholars using these materials to work with writing by women. 

Users of this corpus who disagree with my choices are welcome to modify the selection when they design their own queries. I would also welcome any and all feedback. 

Honoring Black Writers / Expanding the Canon:
I’ve been inspired by the statement the Colored Conventions Project asks users to agree to when they download the CCP corpus, especially the first three principles:

  • I honor CCP’s commitment to a use of data that humanizes and acknowledges the Black people whose collective organizational histories are assembled here. Although the subjects of datasets are often reduced to abstract data points, I will contextualize and narrate the conditions of the people who appear as “data” and to name them when possible.
  • I will include the above language in my first citation of any data I pull/use from the CCP Corpus.
  • I will be sensitive to a standard use of language that again reduces 19th-century Black people to being objects. Words like “item” and “object,” standard in digital humanities and data collection, fall into this category. (Link)
While I don’t ask users of this collection to sign an analogous statement, I encourage all users of these materials to adhere to the spirit of the request made by CCP of the users of their corpus. My goal in doing this type of work is to recognize and validate the work of African American writers as important contributors to world literature. One of the ways we can do that is to consider the work at scale, using computational tools like text analysis and stylistics.

"Some Have Happiness Thrust Upon Them": Playing With "Twelfth Night" in "A Suitable Boy" (2/3)

(Part 2 in a Series. See part 1 here. Mira Nair's adaptation of A Suitable Boy debuts on BBC One in the UK on 7/26; the U.S. broadcast dates are yet to be announced.)

Vikram Seth's A Suitable Boy, set just after Indian independence, is deeply concerned with what we might call "de-Anglicization" -- the process by which upper-class and -caste Indians began to shed the Anglophilia that had been thoroughly imposed upon them over two centuries of British rule in India.

Elite English culture was presented to Indians in modes of dress and eating; it was seen as a work ethic and a demeanor to aspire to ("stiff upper lip"); it was visible in architecture and social structures (the "Club"). But nowhere was the pursuit of Englishness more palpable than in the school system the British established and that Indians continued to propagate for several generations. Most major English-medium Indian schools and universities remain modeled on the British system; it's only recently that the American approach to "college" has begun to make inroads.

At the beating heart of that system of educative discipline is of course the Canon of English Literature. So it's not at all an accident that in A Suitable Boy one of the main characters is a young lecturer in English at the provincial (fictional) Brahmpur University. And his young sister-in-law, Lata -- the primary protagonist in the novel -- is herself an English major at the same university. 

It's not that the British are still hanging around at Brahmpur University in Seth's novel; even by the early 1950s, they've all departed. All of the faculty we meet are either fully Indian or mixed-race Anglo-Indian. There's no wizened British Department Chair to force the Indian faculty to toe the line and live and die by Shakespeare, Donne, Milton, and (Percy) Shelley. The Indian faculty enforce the Canon all the same. But the young people at least inhabit Shakespeare slightly differently than the British might have. And the audience receives the play differently than we might expect.

Revisiting "A Suitable Boy" in 2020 (1/2)

I'm excited about Mira Nair's six-part adaptation of Vikram Seth's A Suitable Boy, which will be premiering on BBC One and Netflix India on July 26th. (No word yet on when and how we'll be able to see it in the U.S.) As most people reading this probably know, I have a special interest in this project since I published a book-length study of Mira Nair's films. This is Nair's first feature film since Queen of Katwe (2016), and her first film set in South Asia since The Reluctant Fundamentalist (2012). Nair has a special eye and a gift for telling stories about India, and it's been too long since she's made a film there. Seth's novel, I think, seems like a great fit for picking up where Monsoon Wedding left off...

*

I actually re-read Seth's book in its entirety earlier this summer, partly in anticipation of the coming adaptation (I should also say that I'm thinking of writing an article or a book chapter on the novel...). As I did so, I felt a newfound appreciation for the book that I didn't have the first time I approached it. In the 1990s, as a young reader, I was interested in the shiny and topical style of writers like Rushdie. I wanted 'quick hits' -- ideas that can be encapsulated nicely in a seminar paper or conference talk. Later, as a young teacher, I tended to look for short books that work well with undergraduates; hence, I put A Suitable Boy away on a high shelf and left it alone. Today, I'm drawn much more to good storytelling and research, and Seth's novel has both.

For those who don't have the many, many hours required to read the whole thing, one possible angle you could try is the Dramatized Audiobook version, which condenses the story and uses a pretty well-known ensemble voice cast. It does downplay the politics and plays up the "Anglophile" parts of the plot a bit, but it's a high-quality dramatization and quite entertaining. I listened to it a couple of years ago, and it whetted my appetite to get back to the text itself.

Nair's television adaptation has a trailer that you can see here:



Thoughts about the trailer? To my eye, the trailer emphasizes two of the romantic plots (Lata-Kabir and Maan-Saeeda Bai), while deemphasizing some of the less glamorous characters and side-plots (Kabir Durrani is clearly there -- but where's Haresh Khanna?).


That said, I have heard (directly from the director!) that the adaptation is going to attend to the social and political upheaval described at length in the novel -- the tensions between urban and rural Indias, the caste politics, and communalism. I'm pleased about that; the novel is much more than a period piece and romantic drama. (If you look carefully at the trailer, you'll see some hints of the politics...)

In a series of three blog posts (one per week), I'll revisit this fine novel, and introduce it (without spoilers!) for people who've never read it.


Postcolonial Ecocriticism: A Preliminary Reading List

Two things motivate me in this blog post: first, I put together a small unit on postcolonial ecocriticism last fall in my Modernism/Postcolonial grad seminar, which turned out to be surprisingly effective for the students in the class. Second, I attended a panel on Postcolonial Ecocriticism at MLA earlier this month in Seattle.

In effect, most of the following is not necessarily material I've already read -- but stuff I want to read. If readers come across this list and would like to add their own suggestions, I would encourage people to use the comments function, or hit me up on Twitter (@electrostani).

1. In my grad seminar, we had been looking at books like E.M. Forster's Howards End, Arundhati Roy's The God of Small Things, and Jhumpa Lahiri's The Lowland. I had intended only our reading of The Lowland to be informed by postcolonial ecocritical thinking, but perhaps not surprisingly, we also had conversations about climate change -- more accurately, climate justice -- with those other books as well. Howards End is in part a book about the dreariness of polluted London and the value of rootedness and the local against the dehumanizing effects of transnational capitalism. The God of Small Things, for its part, has a surprising amount of concern for the ecosystem of the Kerala town where it is set -- including the river at the center of the plot -- which we see grow increasingly polluted as the novel moves forward in time from the late 1960s to the 1990s. It's not hard to see the seeds of Roy's later activism on issues like big dams in this novel; indeed, her essay, "The Greater Common Good," contains a critique of the rhetoric of the "big" and gestures towards resistance in small measures:
We have to support our small heroes. (Of these we have many. Many.) We have to fight specific wars in specific ways. Who knows, perhaps that’s what the 21st century has in store for us. The dismantling of the Big. Big bombs, big dams, big ideologies, big contradictions, big countries, big wars, big heroes, big mistakes. Perhaps it will be the Century of the Small. Perhaps right now, this very minute, there’s a small god up in heaven readying herself for us. Could it be? Could it possibly be? It sounds finger-licking good to me. (link)
The touchstone for our conversations was Rob Nixon's Slow Violence, Or the Environmentalism of the Poor, which is a remarkable book in many ways. It worked both as an introduction and as a roadmap for students who might want to go further -- with writers like Ken Saro-Wiwa, Wangari Maathai, Indra Sinha, and many others whom we couldn't fit on our syllabus last fall.


South Asian Modernism: in Blog Posts

I recently chaired a panel at the Modernist Studies Association (MSA) on South Asian Modernisms, with three wonderful scholars whose work was new to me, Jennifer Dubrow of the University of Washington (check out her book on Urdu print culture!), Preetha Mani of Rutgers (AMESALL), and a graduate student from U-Chicago, Supurna Dasgupta.

Our panel dealt with multi-lingual South Asian modernisms, from Saadat Hasan Manto (Urdu) to Krishan Chander (Hindi) to Jayakanthan (Tamil) to Bishnu Dey (Bengali). We had a small but very engaged audience on the last morning of the conference.

In gearing up for chairing the panel I started reviewing some of my many blog posts over the years related to South Asian Modernism. I realized I had done quite a number of them -- plus two academic articles... Between 2010 and 2012 or so I was, in truth, working on a book on Modernism in South Asia -- though I petered out and never quite got it together (I wrote something else instead).

Judging from the lack of awareness about the contours of South Asian modernism in the mainstream of the MSA (outside of Mulk Raj Anand, who is pretty well-known), it seems like a synthetic, internally comparative account of South Asian modernism might be helpful to have. At a minimum, I would hope that such a book would include: 1) an account of the Bengal Renaissance as a jumping-off point [not a modernism proper]; 2) the advent of the Progressive Writers Association; 3) Hindi modernism post-independence (Nayi Kavita, Nayi Kahani); 4) Urdu modernism post-independence (Jadeed Afsana; Manto); 5) Tamil and Kannada modernisms; 6) the Anglophone scene (from Anand to the Calcutta Writers Workshop); and 7) an account of South Asian writers from the 1920s-60s who worked and wrote abroad (Mulk Raj Anand, Nirmal Verma, G.V. Desani, Sajjad Zaheer, Ahmed Ali, among many others).

Owing to my language limitations I am not at all sure that I am the one to write such a book. Though who knows?

Still, in case it's helpful, below is a collection of blog posts I wrote dealing with South Asian modernism, mostly between 2006 and 2012.


Review: A Night in London by Sajjad Zaheer (2012; Urdu modernism in translation; South Asian writers abroad)

Revisiting Ahmed Ali: Twilight in Delhi (2011; Anglophone; South Asian writers abroad)

Gordon Roadarmel and Modern Hindi Fiction (2010; Nayi Kahani; Hindi short stories)

Revisiting the Calcutta Writers Workshop (2010; P. Lal; Anglophone)

Another Look at P. Lal (2010; Anglophone; influence of Anglo-American modernism on Anglophone modernism in India)

Modern Hindi Poetry (2010; on the Nayi Kavita movement; a review of sorts of Lucy Rosenstein's collection)

Why I Don't Like Mulk Raj Anand's "Untouchable"... (on representing caste in Indian fiction)

Mulk Raj Anand on the Language Debate (2010; on the status of the English language in Indian literature in the 1930s)

Saadat Hasan Manto's "Letters to Uncle Sam" (2006; early post on Manto; Urdu fiction)

Ismat Chughtai's Short Stories (2006; early post on Chughtai)


Academic articles: 

Progressivism and Modernism in South Asian Fiction: 1930-1970 (Literature Compass)

More than 'Priestly Mumbo-Jumbo': Religion and Authorship in All About H. Hatterr (Journal of Postcolonial Writing)


A Few Scattered Reflections at Mid-Career

Full Professor...

I just got word that the Board of Trustees approved my promotion: I am to be a Professor of English. Not Assistant, not Associate any longer: Full Professor.

Unfortunately, this post will not be a victory lap. Humanities fields are at a crossroads right now, and there are some serious issues to contend with. What follows are some things that have been on my mind this year as I've been considering what comes next for all of us as humanists.

To begin with, what does "Full Professor" mean? Friends and family have been asking me this over the past year or so since I submitted my file. (I am the only academic in my extended family -- and actually, the first and only Ph.D.) 

In the short run, all it means is that I don’t have to submit any more files listing all my activities for review by the university. Over the years, I put together three pre-tenure reappointment review files, two tenure review files (long story), two post-tenure triennial review files, and a full professor promotion file. Both the tenure review files and the full professor promotion file were also sent out to anonymous readers at other institutions who had to write letters. Every senior member of my department also had to write letters of support on my behalf; my chair had to write letters of support; the tenure and promotion committee did its own evaluations. At this point -- finally -- there are no more files to do, and no more evaluative letters have to be written on my behalf. I've apparently been evaluated enough!

In the long term, being a (full) Professor means you’re eligible for certain leadership roles in the department and in the university as a whole. It also means you’re a full citizen of the university and pretty much committed to the institution.

“Committed to the institution” should not be a shock, since I’ve now been on the faculty at Lehigh University for seventeen years! I am of course deeply grateful to everyone who helped me along the way. (For some particular names, see the acknowledgments page of my Mira Nair book. It could as well be an acknowledgments page for the past few years as a whole.)

I am lucky… I survived. 

I know how very lucky I am. I came out of a prestigious Ph.D. program and had a good amount of momentum going into the job market. I was also lucky to be doing it in a time of relative plenty in terms of job availability. I do not know how someone with my unwieldy dissertation project would fare if I had to do it again today. And the department where I landed has been flexible and supportive -- I didn’t really realize the extent of that support until I went up for tenure.

Admittedly, I have some war stories (I think we all do). Some of them I wrote about earlier, and I won’t rehash them here. My job is now pretty secure. But what about our graduate students, who face an academic job market that has been consistently shrinking? What about subsequent generations? Even as I celebrate getting this far, it’s hard not to think that the road ahead is troubled.

Syllabus: "New Brown America: Race and Identity in the 21st Century"

Spring 2019
Instructor: Amardeep Singh, English Department

Short Description

What does it mean to be brown in America in 2019? How have recent historical events -- from 9/11 to the election of Donald Trump -- impacted the status of immigrant communities? This course will explore a range of contemporary texts from popular culture, including novels, memoirs, films, stand-up comedy albums, poetry (both on the page and performed), and musical recordings, all of which explore the changing nature of identity. Many of our primary texts will explore points of intersection between different ethnic and racial groups, including black/Latino/Asian intersections, multiracial identities, and the broad, trans-racial appropriation of hip hop culture. We will also read from critical race theorists who will help students develop a conceptual vocabulary to engage these issues. Starting points will include Hasan Minhaj, Trevor Noah, Sharmila Sen, Eddie Huang, Rupi Kaur, and Mohsin Hamid. Students will be encouraged to bring their own interests and suggested materials to the course.

January 22       First Day of Class: Welcome.

What does Hasan Minhaj mean when he uses the phrase, “New Brown America”? How might his concept – which is poetic and moral – align with demographic trends, showing how a growing number of immigrants might be changing American society? Is the U.S. becoming more ‘brown’, or is it more accurate to say that ‘brown’ immigrants will eventually become ‘white’ – following the path of earlier immigrant communities?
                       
                        In class: Hasan Minhaj, Homecoming King: clips

                        U.S. Census Document, “Race & Ethnicity” (Definitions)
                        https://www.census.gov/mso/www/training/pdf/race-ethnicity-onepager.pdf


January 24       “New Brown America”: Defining Terms
           
What do we mean when we describe some groups as ‘races’ and others as ‘ethnicities’? What exactly do sociologists mean by ‘ethnicity’? Second, what exactly are the immigration trends that have conservative Americans so disturbed? Is the U.S. becoming more ‘brown’ or is it more accurate to say that new immigrants are becoming ‘white’?

Omi and Winant, Racial Formation in the United States: Preface and Introduction (2014 edition) (PDF CourseSite)
                       
Pew Research: “Facts on U.S. Immigrants, 2016”
                        http://www.pewhispanic.org/2018/09/14/facts-on-u-s-immigrants/

                        Thomas Edsall, “Who’s Afraid of a White Minority?”


January 29       Blackness/Whiteness

Where do the concepts of Whiteness and Blackness come from in American culture? How did waves of European immigrants ‘become’ white? How are white and black identities defined dialectically, historically? What might it mean to say that “whiteness is a lie” (as Baldwin and Coates both claim)? If whiteness is a lie, what does blackness mean?

Read: Claudia Rankine, Citizen (poetry)

Read: James Baldwin, “On Being ‘White’ and Other Lies” (1984) (PDF CourseSite)

Woody Deane, “Rethinking Whiteness Studies” (2014) (PDF CourseSite)             

Some Brief Notes on Sharmila Sen's "Not Quite Not White: Losing and Finding Race in America"

I picked up Sharmila Sen's book, Not Quite Not White: Losing and Finding Race in America, as I was beginning to prepare for my upcoming spring class, "New Brown America: Race and Identity in the 21st Century." I have been looking for writers who help us theorize an emerging concept of 'brownness' as an identity formation in the U.S. Here are some of the other books I've been looking at:

  • Kamal Al-Solaylee's Brown: What Being Brown in the World Today Means
  • Richard Rodriguez's Brown: the Last Discovery of America, and 
  • Steve Phillips' Brown is the New White: How the Demographic Revolution Has Created a New American Majority.

These are of course very different books. Phillips' book is really a political strategy essay -- pointing out how immigrant groups tend to lean Democratic, and what this ought to mean for the Democratic Party going forward. And Rodriguez' book is more a literary essay and memoir than it is a broadly applicable 'theory' of brownness as an emergent racial formation. Finally, Kamal Al-Solaylee's Brown -- a book I would actually strongly recommend -- is more globally focused than it is an account of race and ethnicity in the U.S. Al-Solaylee's book looks at migrant movements around the world and notes a striking pattern: there are 'brown' migrants working in the Middle East (think of the South Asians in Qatar and UAE) and in Chinese cities like Hong Kong (many of them Filipina maids and nannies), as well as in the U.S., Canada, and the UK. These workers are 'brown' mainly relationally: their brownness is a sign of their subordinate and migrant status. But they don't form a group or an identity; by and large they are defined only by their relationship to dominant communities wherever they are.

Taken together, these books, along with essays by people like Jose Munoz (who surely would have published his own book on brownness by now had he lived) and the performance and creative writing of people like Hasan Minhaj (Homecoming King), Elizabeth Acevedo (see "Afro-Latina"), Suheir Hammad, and others, are giving us a critical mass of conversation about an emerging 'brown' cultural moment.

As I see it, Sharmila Sen's book is an important part of that unwieldy, wide-ranging, and sometimes awkward conversation. As a community of writers and teachers, we don't quite know what we mean by 'brownness' yet -- but we're increasingly using the term in our conversations nonetheless. We don't yet know what the implications of demographic changes will be for American concepts of race and ethnicity (think: "Waiting for 2042"), and we don't yet know whether Trumpism will remain in place in our system (specifically after Trump himself is gone) as a counter to those changes.

In the interim, brownness remains an awkward subject position, a coalitional politics more than a coherent identity. (We need to keep working on it.)

A Long List of Works Now Out of Copyright: Let's Digitize Them?

Updated: thanks to everyone for their suggestions and additions. The list is now significantly longer than it was when I first started putting it together. 

A Note on Method: This list is cobbled together from magazine articles related to Public Domain Day, Wikipedia lists of books published in 1923, and Balfour Smith's extensive spreadsheet of works.

Works published in 1923 are now out of copyright (the reasons for this are complicated; look up the Sonny Bono Copyright Term Extension Act for more, or see this article in Smithsonian Magazine for a quick primer). I expected there would be a big rush of digitization to coincide with "Public Domain Day," but thus far there doesn't appear to have been all that much activity.

Perhaps one reason is that many texts now entering the public domain can already be viewed online in page view / PDF at Hathi Trust (see links below). A couple of Edgar Rice Burroughs novels show up at Project Gutenberg Australia. But the number of working plain text or HTML editions at sites hosted in the U.S. is quite small. Moreover, a number of major texts appear to have no digital versions available at all at present (see especially e.e. cummings' Tulips and Chimneys and Wallace Stevens' Harmonium).

It seems worth mentioning that a lot happened in 1923. The British authorities seized a copy of Ulysses in the mail and declared it obscene in January. The pulp magazine Weird Tales published its first issues. Sean O'Casey's The Shadow of a Gunman had its debut, as did George Bernard Shaw's Saint Joan. The Surrealists and the Dadaists had a riot in Paris, and decided to part ways. And all of the books below were published!

Here's a longish list of texts that were published in 1923, and that are now out of copyright. Where I've been able to find Hathi Trust, Gutenberg, or Archive.org links I've provided those. I will add to this list as I learn of more.

Works Published in 1923 -- Now in the Public Domain

Poetry

Jean Toomer, Cane (Hathi Trust). UPDATE: My bare-bones digital edition here.
e.e. cummings, Tulips and Chimneys (no edition as of yet)
Joseph Conrad, The Rover (Hathi Trust)
Robert Frost, New Hampshire (Gutenberg Edition: January 4, 2019)
Robert Frost, Selected Poems
William Carlos Williams, Spring and All (Wikipedia)
Wallace Stevens, Harmonium (Wikipedia)
Carl Sandburg, Rootabaga Pigeons (Hathi Trust)
Willa Cather, April Twilights and Other Poems
Vachel Lindsay, Collected Poems
Vachel Lindsay, Going-to-the-Sun
Edna St. Vincent Millay, The Harp-Weaver, and Other Poems
George Santayana, Poems, revised

Fiction (literary)

Aldous Huxley, Antic Hay (Hathi Trust)
D.H. Lawrence, Kangaroo (Hathi Trust)
D.H. Lawrence, Three Novellas (The Fox, The Captain's Doll, The Ladybird)
D.H. Lawrence, Birds, Beasts and Flowers
Ernest Hemingway, Three Stories and Ten Poems (Wikipedia)
Katherine Mansfield, The Doves' Nest and Other Stories (Hathi Trust) 
Katherine Mansfield, Bliss and Other Stories
Edith Wharton, A Son at the Front (Hathi Trust)
H.G. Wells, Men Like Gods (Hathi Trust; Wikipedia)
Willa Cather, A Lost Lady (Hathi Trust)
William Carlos Williams, The Great American Novel 
Samuel Hopkins Adams (publishing as Warner Fabian), Flaming Youth (Wikipedia Entry. Hathi Trust lists this as published 1924)
Sherwood Anderson, Many Marriages (Hathi Trust; Wikipedia entry)
Sukumar Ray, Abol Tabol (Wikipedia)
Dhan Gopal Mukerji, Jungle Beasts and Men
John Dos Passos, Streets of Night
Carl Van Vechten, The Blind Bow-Boy
Djuna Barnes, A Book
Gertrude Atherton, Black Oxen (Gutenberg)
Arnold Bennett, Riceyman Steps (class study involving shell shock; Wikipedia)
Elizabeth Bowen, Encounters (Archive.org; short stories)
John Galsworthy, Captures
John Galsworthy, The Burning Spear
Rudyard Kipling, Land and Sea Tales for Boys and Girls
Vita Sackville-West, Grey Wethers
Olive Schreiner, Stories, Dreams and Allegories
Virginia Woolf, "Mrs. Dalloway in Bond Street" (short story that would later feed into Mrs. Dalloway [1925])

Translations

Anton Chekhov, Love and Other Stories (trans. Constance Garnett)
Jules Verne, The Castaways of the Flag (first English-language edition)
Jules Verne, The Lighthouse at the End of the World (first English-language edition)
Colette, Green Wheat
Alexandre Dumas, The Three Musketeers (trans. Philip Schuyler Allen)
Nikolai Gogol, Dead Souls (trans. Constance Garnett)
Nikolai Gogol, The Overcoat and Other Stories (trans. Constance Garnett)
Maxim Gorky, My University Days (trans. Louis P. Lochner)
Knut Hamsun, Victoria (trans. Arthur G. Chater)
Heinrich Heine, Poems (trans. Louis Untermeyer)
Edmond Rostand, Cyrano de Bergerac (trans. Brian Hooker)

Notable Nonfiction

Dhan Gopal Mukerji, Caste and Outcast
Bertrand Russell, The Prospects of Industrial Civilization
G.K. Chesterton, Fancies Versus Fads
Winston Churchill, The World Crisis
Jessie Conrad, A Handbook of Cookery for a Small House 
Arthur Conan Doyle, Our American Adventure
Theodore Dreiser, The Color of a Great City
E. M. Forster, Pharos and Pharillon 
James G. Frazer, Folk-lore and the Old Testament (abridged edition)
Aldous Huxley, On the Margin: Notes and Essays
D.H. Lawrence, Studies in Classic American Literature
David Lloyd George, Where Are We Going?
H.L. Mencken, The American Language, 3rd revised edition
Thorstein Veblen, Absentee Ownership and Business Enterprise in Recent Times: The Case of America
Woodrow Wilson, The Road Away from Revolution

Popular fiction and Genre Fiction


L. Frank Baum, The Cowardly Lion of Oz
Edgar Rice Burroughs, Tarzan and the Golden Lion (Gutenberg Australia)
Edgar Rice Burroughs, The Girl From Hollywood (Gutenberg Australia; Wikipedia Entry)
Agatha Christie, The Murder on the Links
Marie Corelli, Love and the Philosopher
Austin Hall, People of the Comet (Science fiction serialized in Weird Tales in 1923)
Kahlil Gibran, The Prophet (Gutenberg)
Hermann Hesse, Demian (first English-language edition)
Dorothy Sayers, Whose Body? (Wikipedia)
P.G. Wodehouse, The Inimitable Jeeves (Archive.org)
P.G. Wodehouse, Leave It to Psmith (Hathi Trust)
P.G. Wodehouse, Mostly Sally
Maxwell Bodenheim, Blackguard
Thomas Alexander Boyd, Through the Wheat (an American World War I novel; Wikipedia entry)
Max Brand, Seven Trails
John Buchan, Midwinter (Gutenberg Australia)
James Branch Cabell, The High Place: a Comedy of Disenchantment (Wikipedia entry)
Hall Caine, The Woman of Knockaloe
Susan Ertz, Madame Claire
Jeffery Farnol, Sir John Dering
J.S. Fletcher, The Charing Cross Mystery (Gutenberg Canada)
Zona Gale, Faint Perfume
Garet Garrett, Cinder Buggy
Philip Gibbs, The Middle of the Road
Talbot Mundy, The Nine Unknown (orientalist fantasy involving the Emperor Ashoka and Kali worshippers)
Liam O'Flaherty, Thy Neighbour's Wife
Olive Higgins Prouty, Stella Dallas (Wikipedia entry)
William MacLeod Raine, Iron Heart
Rafael Sabatini, Fortune's Fool
May Sinclair, Uncanny Stories (illustrated by Jean de Bosschere)
James Stephens, Deirdre
Margaret Wilson, The Able McLaughlins (Wikipedia)
Anzia Yezierska, Salome of the Tenements



Introducing Mira Nair: a slideshow video



I put this video together to help introduce folks to Mira Nair. Some people know her films well, but I've found in recent months that many friends -- even those who know their world cinema -- often don't know the full range of her work.

Many of the images in this slideshow are also screen captures I use as illustrations in my book on the filmmaker.

The Films of Mira Nair: Diaspora Vérité is now available in paperback from Amazon.com

New Brown America: Revisiting Sepia Mutiny in 2018

[I'm giving the following as a conference talk at the Madison South Asia conference on Friday, 10/12/2018] 

I view Sepia Mutiny as a space where second-gen South Asian Americans worked through their identity issues publicly at a moment when a generation of talented artists and performers were on the cusp of emergence into the American mainstream. While the site is now defunct, I would argue that the debates it hosted have stayed alive since it went offline, often now in mainstream venues and an evolving set of social media frameworks.

Some of the key themes of Sepia Mutiny writing include:

1. The significance of emergent South Asian American identity in the broader North American context. What does it mean to be ‘brown’ in the U.S. in the early years of the 21st century? What terms do we use to name ourselves? (Do we, for example, use the word 'desi' or not?) How strong or weak are our alliances and affinities as a group (across religious, national, caste, and regional boundaries -- to name just four huge fault-lines)? How do South Asian Americans situate themselves against the white mainstream as well as other minority identities -- other Asian Americans, Arab and Persian immigrant communities, African Americans, Latinos? What kinds of cultural and artistic products document that emergence and work through some of the key obstacles we’ve faced – including especially 9/11 and the election of Trump in 2016?

2. The many, many ways of being hybrid, mixed, split. South Asian Americans are notably defined by generational and intra-cultural variation, but one thing we all seem to have in common is a kind of internal culture clash. How to connect ‘home’ tastes and values to the versions of ourselves we perform in public? How to position ourselves both with respect to mainstream western cultural icons and South Asian aesthetic worlds -- from Indian classical dance to Bollywood/Bhangra? How much does your identity really mean if, as a second-gen, you don’t speak a South Asian language very well or at all? What is your relationship to ‘home’?

3. The ongoing problem of appropriation as an indirect mode of racism and cultural diminishment. In the mainstream, this could be in the form of western performers appropriating Indian cultural or religious symbolism: a pop star wearing a bindi, or the complex appropriation of Hindu devotional practices in westernized versions of Yoga. It could, of course, also be a matter of accent appropriation -- and here, our own frequent criticisms of western appropriation of bad Indian accents clearly anticipated the kind of critique Hari Kondabolu would later make of Hank Azaria and the creators of The Simpsons in his 2017 documentary The Problem With Apu. I also can't help but think of the "Macaca" controversy of 2006, and the many, many examples of stereotyping and typecasting of South Asian Americans as either model minorities or terrorists.

* * *

Shades of Brown: Notes for a South Asian American Media Studies Project

I'm starting a sabbatical, and hoping to restart this blog with a series of posts related to the thinking I'm doing over the summer and into the fall. Here's the first of what I hope will be a series of meditations building towards what might become a new book project.

What does it mean to be 'brown'? What are the parameters and limits of brownness -- as a skin complexion, as a racial category in American life? Many Latinx people identify as 'brown'; and slogans like "Brown Power" have been part of the Latinx and Chicanx political vocabulary since the 1970s. South Asians identify as 'brown' as well -- and there's as much complexional variance amongst South Asians as there is amongst Latinx people. Are South Asians the same 'brown' as Latinx people? We need to explore this; we need to have a conversation about what we mean by brown. When is it a term of pride? What are the different browns -- moreno/a, mestizo/a, Indio/a -- and East Indian shades of brown?
"Boricua morena
Boricua morena
Boricua morena..."
-Big Pun

Admittedly, as a color (and not necessarily as a complexion), brown has its own values and aesthetic legacy in English. Brown can suggest mud, it can suggest shit, it can suggest a combination of too many colors (when painting, a mess, or a mistake). To claim brownness as a political and racial category is to push back against the ways in which the color is devalued (though it should be noted that the negation of 'brown' is different from that of 'black'). Brown is also a natural, intermediate, and inclusive color -- not an extreme color defined by purity of one kind or another. Brown is the earth -- the ground, out of which other brown things grow. Only some of us can be white or black (historically determined identities shaped specifically by anti-Blackness); potentially all of us have some shades of brown in our skin, including people who trace all of their ancestry to Europe. If brown is the American future, could 'brown' become the racial default, displacing 'white'? 

Arguably, large numbers of biracial and multiracial people might be understood as 'brown'. Some of them have also been understood, sometimes awkwardly, as black (Tiger Woods). Others proceed in their careers with a degree of ethnic and racial ambiguity (Dwayne Johnson, Vin Diesel). If we're going to have a conversation about brownness, we need to have a conversation about multiracial identities as well, especially given the rapid increase in the number of families who identify as multiracial in the past three decades. If America is turning brown, it's doing so as much through intermarriage as through immigration (sometimes both at once).



Unlike blackness or whiteness, brownness seems to be a porous category -- not a term historically shaped (as blackness is) by the legacy of the American slave trade or the one-drop rule. But what exactly is it? 

"Listen made intently when you make the sound
Tell you that it's all love,
They care about the browns
The truth is when you down,
They be out making the rounds
Like, brown boy, brown boy, what's up with that sound, boy?"
-Heems

Through the 1990s and into the 2000s, there were really two options for an aspiring South Asian diaspora performer -- mainstreaming (which usually entailed deracialization and assimilation to a state close to whiteness), or orientation to a small constituency of fellow South Asians (performing for other browns -- other desis). Much of the South Asian diaspora fusion Bhangra music that circulated in the 1990s and 2000s operated in this model, with independent music labels and a subcultural nightclub circuit. It was anchored in a vibrant college scene, with dance clubs on many campuses and intercollegiate competitions like Bhangra Blowout.


In many ways the model for minoritization came from the African American community, and that imprint is not unimportant. Blackness and black culture are a huge part of South Asian diaspora media culture. The musical idiom with the most cachet since the 1990s has of course been hip hop, with the play between minoritization and mainstreaming that has been central to that subculture playing out in the South Asian version as well. For every mainstream, crossover success (e.g., Panjabi MC), there are figures like Bohemia and Dr. Zeus, who stayed underground. A version of this pattern might apply to Latinx music as well, where Reggaeton in particular is deeply indebted to Afro-Jamaican dancehall reggae and hip hop. But hip hop is not just a musical idiom and a subculture; for 'brown' performers it's served as the primary pathway to mainstream legibility.

(And we could talk about some of the interesting brown cross-references that have occurred, as for instance when the Cuban-American rapper Pitbull, in his breakthrough 2004 single, "Culo," used the "Coolie Riddim" -- a dancehall beat with an East Indian sound. Or, conversely, the influence of salsa and other Latinx musical forms in Bollywood music...)

The debt to hip hop is sometimes fraught, as Heems discovered when he received pushback for tweeting lyrics to a song (by an African American rapper) that included the n-word. And, for her part, M.I.A. got into trouble when she questioned the racial singularity of the Black Lives Matter movement ("Is Beyoncé or Kendrick Lamar going to say Muslim Lives Matter? Or Syrian Lives Matter? Or this kid in Pakistan matters? That's a more interesting question"). More broadly, though, the advent of "brown rap" raises a question about the nature of the performance -- is a rapper like Heems performing "brownness" or "blackness" if and when he uses black vernacular phrases and cadences? What might it mean to engage with hip hop as a brown rapper and not attempt to mimic African American voices?

I'm not from here
Please be patient
I be ragin' face displacement
I'm obsessed with the space between spaces
Eh, f---ing racists
I get caged in a box cause I'm Asian  
-Heems 

Perhaps, sometime around 2008, a third option started to emerge in bits and pieces in mainstream American popular culture. That option might be described as the brown option. This option entails mainstreaming without necessarily disavowing ethnic or racial difference. Neither 'white' nor 'black' -- something else.

If you catch me at the border,
I've got visas in my name
-M.I.A., "Paper Planes" 

The year 2008 is imprecise, but it seems like a good yardstick. 2008 is the year Das Racist had its breakout hit with "Combination Pizza Hut and Taco Bell"; in a more mainstream setting, 2008 was the year British South Asian pop/R&B singer Jay Sean signed to Cash Money Records (he released "Down" in 2009 -- it went to #1 on the Billboard charts). 2008 is the year M.I.A.'s "Paper Planes" was a hit on American radio stations (though the song was actually released in 2007). Naveen Andrews was breaking hearts with his dreamy character Sayid on Lost in 2008.

2008 is just before Aziz Ansari hit the mainstream with Parks and Recreation (2009) and his cameo as "Randy" ("Raaaaaaaandy") in Funny People, though as of 2008 he was very much on the cusp. Kal Penn and John Cho's Harold and Kumar Go to White Castle was of course released earlier, but Harold and Kumar Escape from Guantanamo Bay, the more explicitly politicized and highly improbable sequel to the multicultural stoner classic, was released in 2008.  Also in 2008, Aasif Mandvi was a regular correspondent on "The Daily Show" with Jon Stewart, while he was also playing prominent roles on shows like Jericho. Sendhil Ramamurthy was one of the break-out stars on Heroes. And Mindy Kaling was a star writer and actor on The Office -- she got her own show in 2012.

And of course 2008 is the year of the biggest 'brown' mainstreaming event one could imagine: the Presidential campaign and election of Barack Obama. This was a campaign where South Asians were prominently and consistently aligned with the biracial ('brown') Presidential candidate. Barack Hussein Obama shared the problem, which many people of South Asian descent feel acutely, of the 'funny' name -- a name people might struggle initially to pronounce. He still ran for president on his own name (he could easily have presented himself to the world as 'Barry' -- the nickname he used as a young man). And won.

And yes, alongside Barack Obama, we should duly note that 2008 was the year Bobby Jindal was sworn in as the governor of Louisiana -- the first Indian-American governor in American history. Arguably, however, if people like Barack Obama or Aasif Mandvi were finding ways to enter the mainstream while embracing their complex identities and backgrounds (their 'brownness' and, in Obama's case, 'blackness' as well), people like Jindal seemed to be downplaying any signs of racial or religious difference.



The political legacy of these events has been beautifully and comprehensively discussed in Sangay Mishra's groundbreaking book, Desis Divided: the Political Lives of South Asian Americans. Mishra also uses the dual pathways I have been describing, though he uses a slightly different vocabulary ("pluralizing"/"assimilationist" vs "racializing"/"minoritarian"). He also limits his scope to politics -- here I'll be primarily interested in media figures, including actors, musicians, and stand-up comedians. I'll be interested in politicians like Jindal and Nikki Haley insofar as they perform versions of brownness in the American public sphere.

Through much of this period, I was writing about these issues on the internet with a very active group of readers and co-contributors. The site where we were having these conversations was a group blog called Sepia Mutiny. One of my goals, going forward, is to review the scope of the conversation we were having on Sepia Mutiny between 2005 and about 2010 to retrace our steps -- to find the contours of the evolving conversation about brownness and the emerging new forms of racialization in the American landscape.

Along the way I want to look at precursors to the 2008 moment -- the long tradition of South Asian American (and maybe also Latinx) media presence in the American landscape through the 20th century. And also think about what's happened since then -- Mindy Kaling (The Mindy Project), Aparna Nancherla, Kumail Nanjiani, Hasan Minhaj, Hari Kondabolu. I'm also interested in YouTube stars like Lilly Singh and the Instagram poetry sensation Rupi Kaur. How do all of these artists, in their respective fields, navigate brownness in the new media landscape?