Recently, I announced a Text Corpus I had put together, of African American Literature from 1853-1923.
I've also been putting together a Corpus of Colonial South Asian Literature from roughly the same period.
The folder can be accessed here. I'll also be posting the files on Github soon.
This has been a much harder Corpus to compose. Whereas with the African American literature we have bibliographic lists of published works to serve as a guide (such as the one posted at the History of Black Writing at Kansas), there does not appear to be an equivalent list with respect to Colonial South Asia.
Choices Made in Producing this Corpus:
I decided to include British as well as South Asian writers in the Corpus. Many of the writers were clearly in dialogue with one another; South Asian writers were clearly reading people like Rudyard Kipling, E.M. Forster, and Katherine Mayo. It's a little less clear which South Asian writers British and American writers were reading other than Tagore (and this itself might be studied). The publishing industries also overlapped to a considerable extent; while some South Asian writers published their works with publishers based in India, many aimed to publish with houses based in London.
One possible line of inquiry with this material might be to compare fiction, poetry and drama by British authors with South Asian output in English. Such inquiry could either be historical and thematic (e.g., comparing the way British and South Asian writers reacted to historical events like the Sepoy Mutiny or the Famine of 1876), or it could be connected to matters of language and style. To do that it makes sense to have writers from different backgrounds represented in the Corpus.
I knew there was a fair amount of interest in colonial India in the U.S. at the time -- from the appreciation of Kipling to the American feminist fascination with Pandita Ramabai. However, while doing this research I was surprised to come across a large number of pulpy Indian adventure novels by an American writer named Talbot Mundy.
In the metadata file, I list the nationalities of the authors. Besides a few Americans in the collection, I would draw readers' attention to B.M. Croker (an Irish woman who lived in India and wrote many romance novels based in colonial India), and Sara Jeannette Duncan (a Canadian woman who also lived in India and wrote prolifically).
In addition to the nationality question, with South Asian writers who moved abroad there is also the question of destination. Cornelia Sorabji (who eventually moved to England) is of course pretty well known. Dhan Gopal Mukerji, who moved to the U.S. in the 1910s, is mainly known for his memoir Caste and Outcast, but he was quite a prolific literary writer, with several books of poetry and fiction that are worth looking at.
I decided to include translations by South Asian writers like Bankim Chandra Chatterjee (Chattopadhyay) and Rabindranath Tagore in the Corpus. Tagore of course needs no explanation; he was one of the few South Asian writers to break through and achieve global acclaim in the early 20th century. Bankim Chandra Chatterjee (here, I'm using one of the spellings used at the time, aware of course that "Chatterjee" and "Chatterji" are colonial-era abbreviations of Chattopadhyay...) is slightly different. He is clearly historically important for Anandamath (here included in translation as Dawn Over India) and Rajmohan's Wife (thought to be the first English-language novel by an Indian author), but it seemed like it might be valuable to include some of his other Bengali novels in translation here. Several of these I found at Wikisource.
Alongside translations by South Asian writers, there are a few translations in the corpus of historical South Asian texts by British writers.
3. Fiction and Nonfiction
Right now there is a limited amount of nonfiction included in the corpus. This was a very tough decision, as there is a vast array of nonfiction colonial travel writing based in South Asia from this period. I've excluded that sort of writing for now, though I may include more of it as I continue to expand the corpus.
However, I decided to include some nonfiction, mostly texts by literary authors who wrote occasional works of nonfiction (Dhan Gopal Mukerji's Caste and Outcast is included, as is Tagore's My Reminiscences). I've also included a plain text file of Pandita Ramabai's The High-Caste Hindu Woman, mainly because it seems like an important text that might be useful for researchers in this field. Any queries specifically structured around the stylistics of fiction or the colonial novel might want to exclude these nonfiction texts.
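For anyone who wants to set up that kind of exclusion, here is a minimal sketch of filtering on a metadata file before running a query. The column names (`filename`, `author`, `nationality`, `genre`), the filenames, and the sample rows are assumptions made for illustration; they are not the actual layout of the corpus metadata, so adjust them to match the real headers.

```python
import csv
import io

# Hypothetical sample standing in for the metadata file. The real file's
# column names, filenames, and values may differ.
SAMPLE_METADATA = """filename,author,nationality,genre
mukerji_caste_and_outcast.txt,"Mukerji, Dhan Gopal",Indian,nonfiction
tagore_my_reminiscences.txt,"Tagore, Rabindranath",Indian,nonfiction
croker_diana_barrington.txt,"Croker, B.M.",Irish,fiction
"""

def select_texts(rows, exclude_genres=("nonfiction",), nationalities=None):
    """Return filenames whose genre is not excluded and, if
    `nationalities` is given, whose author nationality is in that set."""
    selected = []
    for row in rows:
        if row["genre"].strip().lower() in exclude_genres:
            continue
        if nationalities is not None and row["nationality"] not in nationalities:
            continue
        selected.append(row["filename"])
    return selected

rows = list(csv.DictReader(io.StringIO(SAMPLE_METADATA)))
fiction_only = select_texts(rows)
print(fiction_only)
```

The same function can narrow by nationality instead, for example `select_texts(rows, exclude_genres=(), nationalities={"Indian"})`, which is one way to build the comparison groups described above.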
4. Derivation; grunt work
As with my other Corpus, I pulled together materials from different repositories to assemble this corpus. Here, the lion's share of material comes from Project Gutenberg and HathiTrust. (Derivation is indicated in my metadata file.)
The Gutenberg materials were in good shape; they've generally been proofread and formatted cleanly.
The HathiTrust materials required much more work. One can extract HathiTrust texts by requesting plain text, but these OCR page scans need quite a bit of processing to make them clean enough to use. A lot of the grunt work of assembling this collection has entailed doing that processing.
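To give a sense of what that processing involves, here is a rough sketch of a cleanup pass for one page of OCR text. It is an illustration of the general kind of work, not a record of the exact steps used for this corpus: the rules (dropping bare page numbers, dropping a running header, rejoining words hyphenated across a line break), the function name `clean_page`, and the sample page are all assumptions.

```python
import re

def clean_page(page_text, running_header=None):
    """Clean one page of OCR text (illustrative rules only)."""
    kept = []
    for line in page_text.splitlines():
        stripped = line.strip()
        # Drop lines that consist only of a page number, e.g. "42".
        if re.fullmatch(r"\d{1,4}", stripped):
            continue
        # Drop the running header if the caller supplies one.
        if running_header and stripped.upper() == running_header.upper():
            continue
        kept.append(stripped)
    text = "\n".join(kept)
    # Rejoin words split across a line break: "recol-\nlection" -> "recollection".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of blank lines left behind by the removals.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()

# Invented sample page, not taken from any text in the corpus.
sample = "OAKFIELD\n\nHe had no recol-\nlection of the place.\n\n42\n"
print(clean_page(sample, running_header="Oakfield"))
```

One caveat: the hyphen rule also joins genuine compounds that happen to break at a line end ("well-" / "known" becomes "wellknown"), which is one reason a cleaned file still needs proofreading.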
Here is a list of works I've imported from HathiTrust page scans thus far:
| Author | Title |
| --- | --- |
| | Oakfield; Or, Fellowship in the East |
| | A Hindoo Love Story |
| | Siri Ram, Revolutionist |
| | Mantle of the East |
| | Year of Chivalry |
| Chatterji, Bankim Chandra | Anandamath: Dawn Over India |
| Chatterji, Bankim Chandra | |
| | Diana Barrington: A Romance of Central India |
| | A Rolling Stone |
| | Lilamani: A Study in Possibilities |
| Derozio, Henry Louis Vivian | Poems of Henry Louis Vivian Derozio: A Forgotten Anglo-Indian Poet |
| Duncan, Sara Jeannette | |
| Dutt, Michael Madhusudan | Sermista; a drama in five acts |
| Dyer, Helen S. | Pandita Ramabai: The Story of Her Life |
| Kipling, Rudyard and Wolcott Balestier | The Naulahka: A Story of West and East |
| Mukerji, Dhan Gopal | Caste and Outcast |
| Mukerji, Dhan Gopal | Layla-Majnu: A Musical Play in Three Acts |
| Mukerji, Dhan Gopal | Rajani: Songs of the Night |
| | The High Caste Hindu Woman |
| | Kamala: A Story of Hindu Life |
| | Between the Twilights: Being Studies of Indian Women By one of Themselves |
| | Indian Tales of the Great Ones Among Men, and Bird-People |
| | Shubala-A Child Mother |
| | Sun-Babies: Studies in the Child-Life of India |
Some of the highlights in the table above are in bold. As far as I know, these are the first plain text versions of the above texts to be made available online.
You may notice that a couple of these texts are dated post-1923. I believe the 1941 translation of Anandamath (Dawn Over India) has fallen out of copyright in the U.S.
I should add that while I've cleaned up these files, I haven't proofread them. That is going to be a long-term project -- for which I would welcome collaborators!