Text Corpus: Colonial South Asian Literature

Recently, I announced a Text Corpus of African American Literature from 1853-1923 that I had put together.

I've also been putting together a Corpus of Colonial South Asian Literature from roughly the same period.  

The folder can be accessed here. I'll also be posting the files on GitHub soon.

This has been a much harder Corpus to compose. Whereas with the African American literature we have bibliographic lists of published works to serve as a guide (such as the one posted at the History of Black Writing at Kansas), there does not appear to be an equivalent list with respect to Colonial South Asia. 

Choices Made in Producing this Corpus:

1. Nationalities

I decided to include British as well as South Asian writers in the Corpus. Many of the writers were clearly in dialogue with one another; South Asian writers were clearly reading people like Rudyard Kipling, E.M. Forster, and Katherine Mayo. It's a little less clear which South Asian writers British and American writers were reading other than Tagore (and this itself might be studied). The publishing industries also overlapped to a considerable extent; while some South Asian writers published their works with publishers based in India, many aimed to publish with houses based in London. 

One possible line of inquiry with this material might be to compare fiction, poetry and drama by British authors with South Asian output in English. Such inquiry could either be historical and thematic (e.g., comparing the way British and South Asian writers reacted to historical events like the Sepoy Mutiny or the Famine of 1876), or it could be connected to matters of language and style. To do that it makes sense to have writers from different backgrounds represented in the Corpus.

I knew there was a fair amount of interest in colonial India in the U.S. at the time -- from the appreciation of Kipling to the American feminist fascination with Pandita Ramabai. However, while doing this research I was surprised to come across a large number of pulpy Indian adventure novels by an American writer named Talbot Mundy.

In the metadata file, I list the nationalities of the authors. Besides a few Americans in the collection, I would draw readers' attention to B.M. Croker (an Irish woman who lived in India and wrote many romance novels set in colonial India), and Sara Jeannette Duncan (a Canadian woman who also lived in India and wrote prolifically).

In addition to the nationality question, with South Asian writers who moved abroad there is also the question of destination. Cornelia Sorabji (who eventually moved to England) is of course pretty well known. Dhan Gopal Mukerji, who moved to the U.S. in the 1910s, is mainly known for his memoir Caste and Outcast, but he was quite a prolific literary writer, with several books of poetry and fiction that are worth looking at. 

2. Translations

I decided to include translations of works by South Asian writers like Bankim Chandra Chatterjee (Chattopadhyay) and Rabindranath Tagore in the Corpus. Tagore of course needs no explanation; he was one of the few South Asian writers to break through and achieve global acclaim in the early 20th century. Bankim Chandra Chatterjee (here, I'm using one of the spellings used at the time, aware of course that "Chatterjee" and "Chatterji" are colonial-era abbreviations of Chattopadhyay...) is slightly different. He is clearly historically important for Anandamath (here included in translation as Dawn Over India) and Rajmohan's Wife (thought to be the first English-language novel by an Indian author), but it seemed like it might be valuable to include some of his other Bengali novels in translation here. Several of these I found at Wikisource.

Alongside translations of works by South Asian writers, the corpus includes a few translations of historical South Asian texts made by British writers.

3. Fiction and Nonfiction

Right now there is a limited amount of nonfiction included in the corpus. This was a very tough decision, as there is a vast array of nonfiction colonial travel writing based in South Asia from this period. I've excluded that sort of writing for now, though I may include more of it as I continue to expand the corpus. 

However, I decided to include some nonfiction, mostly texts by literary authors who wrote occasional works of nonfiction (Dhan Gopal Mukerji's Caste and Outcast is included, as is Tagore's My Reminiscences). I've also included a plain text file of Pandita Ramabai's The High-Caste Hindu Woman, mainly because it seems like an important text that might be useful for researchers in this field. Any queries specifically structured around the stylistics of fiction or the colonial novel might want to exclude these nonfiction texts.
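For anyone who wants to exclude the nonfiction programmatically, here is a minimal sketch in Python. It assumes the metadata is available as a CSV with `filename` and `genre` columns; those column names, the sample filenames, and the `select_filenames` helper are all hypothetical, so adjust them to match the actual metadata file.

```python
import csv
import io

# Hypothetical metadata layout; the real file's columns and filenames may differ.
SAMPLE_METADATA = """filename,author,title,genre
mukerji_caste_and_outcast.txt,"Mukerji, Dhan Gopal",Caste and Outcast,nonfiction
tagore_gora.txt,"Tagore, Rabindranath",Gora,fiction
ramabai_high_caste_hindu_woman.txt,"Ramabai, Pandita",The High Caste Hindu Woman,nonfiction
croker_proper_pride.txt,"Croker, B.M.",Proper Pride,fiction
"""


def select_filenames(metadata_file, exclude_genres=("nonfiction",)):
    """Return the filenames whose genre is not in exclude_genres."""
    excluded = {genre.lower() for genre in exclude_genres}
    reader = csv.DictReader(metadata_file)
    return [
        row["filename"]
        for row in reader
        if row["genre"].strip().lower() not in excluded
    ]


# With a real file: open("metadata.csv", newline="", encoding="utf-8")
fiction_files = select_filenames(io.StringIO(SAMPLE_METADATA))
print(fiction_files)
```

The same function could filter on any other genre label the metadata happens to use, by passing a different `exclude_genres` tuple.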

4. Derivation; grunt work

As with my other Corpus, I pulled together materials from different repositories to assemble this corpus. Here, the lion's share of material comes from Project Gutenberg and HathiTrust. (Derivation is indicated in my metadata file.) 

The Gutenberg materials were in good shape; they've generally been proofread and formatted cleanly.

The HathiTrust materials required much more work. One can extract HathiTrust texts by requesting plain text, but these OCR page scans need quite a bit of processing to make them clean enough to use. A lot of the grunt work of assembling this collection has entailed doing that processing. 
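To give a sense of what that processing can involve, here is a rough sketch of a few common OCR cleanup steps. This is an illustration rather than the exact procedure used for this corpus: the `clean_ocr_pages` function and its rules (dropping page-number lines, rejoining words split across line breaks, collapsing extra whitespace) are assumptions about what a typical pass might look like.

```python
import re


def clean_ocr_pages(pages):
    """Join a list of OCR page strings into one lightly cleaned text."""
    cleaned_lines = []
    for page in pages:
        for line in page.splitlines():
            stripped = line.strip()
            # Drop lines that contain only a page number, e.g. "12" or "- 13 -".
            if re.fullmatch(r"[-\s]*\d{1,4}[-\s]*", stripped):
                continue
            cleaned_lines.append(stripped)
    text = "\n".join(cleaned_lines)
    # Rejoin words hyphenated across a line break ("fellow-\nship").
    # Caveat: this also joins genuinely hyphenated compounds that happen
    # to break at the hyphen, so the results still need a human check.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Collapse three or more newlines into one paragraph break.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


sample_pages = [
    "The fellow-\nship of the   East\n\n12",
    "began in\nearnest.\n- 13 -",
]
print(clean_ocr_pages(sample_pages))
```

Running headers, footnote markers, and misrecognized characters are not handled here and would need their own rules or manual correction.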

Here is a list of works I've imported from HathiTrust page scans thus far: 

Author | Title | Year
Arnold, W.D. | Oakfield; Or, Fellowship in the East | 1855
Bain, F.W. | A Hindoo Love Story | 1898
Candler, Edmund | Abdication | 1922
Candler, Edmund | Siri Ram, Revolutionist | 1911
Candler, Edmund | Mantle of the East | 1910
Candler, Edmund | Year of Chivalry | 1916
Chatterji, Bankim Chandra | Anandamath: Dawn Over India | 1882 (1941)
Chatterji, Bankim Chandra | Krishnakanta's Will | 1917
Croker, B.M. | Proper Pride | 1882
Croker, B.M. | Diana Barrington: A Romance of Central India | 1888
Croker, B.M. | A Rolling Stone | 1911
Diver, Maud | Lilamani: A Study in Possibilities | 1911
Diver, Maud | Unconquered | 1917
Derozio, Henry Louis Vivian | Poems of Henry Louis Vivian Derozio: A Forgotten Anglo-Indian Poet | 1923 (1831)
Duncan, Sara Jeannette | Burnt Offering | 1910
Dutt, Michael Madhusudan | Sermista; a drama in five acts | 1859
Dyer, Helen S. | Pandita Ramabai: The Story of Her Life | 1900
Kipling, Rudyard and Wolcott Balestier | The Naulahka: A Story of West and East | 1892
Mukerji, Dhan Gopal | Caste and Outcast | 1923
Mukerji, Dhan Gopal | Layla-Majnu: A Musical Play in Three Acts | 1916
Mukerji, Dhan Gopal | Rajani: Songs of the Night | 1916
Ramabai, Pandita | The High Caste Hindu Woman | 1888
Satthianadhan, Krupabai | Kamala: A Story of Hindu Life | 1894
Sorabji, Cornelia | Between the Twilights: Being Studies of Indian Women By one of Themselves | 1908
Sorabji, Cornelia | Indian Tales of the Great Ones Among Men, and Bird-People | 1916
Sorabji, Cornelia | Shubala-A Child Mother | 1920
Sorabji, Cornelia | Sun-Babies: Studies in the Child-Life of India | 1904
Tagore, Rabindranath | Gora | 1924 (1901)

Some of the highlights in the table above are in bold. As far as I know, these are the first plain text versions of the above texts to be made available online. 

You may notice that a couple of these texts are dated post-1923. I believe the 1941 translation of Anandamath (Dawn Over India) has fallen out of copyright in the U.S.

I should add that while I've cleaned up these files, I haven't proofread them. That is going to be a long-term project -- for which I would welcome collaborators!