As readers may be aware, I've been periodically creating small, open-access textual corpora, collecting African American literature and literature from Colonial South Asia.
The Kitchen Sink, Carefully Collected and Labeled: Recently, I thought it might be a worthwhile project to create a larger textual corpus, collecting out-of-copyright materials from a broad range of authors from the early 20th century. The idea is to collect materials from recognizable modernists like Virginia Woolf and James Joyce, alongside African American writers, Indian writers like Rabindranath Tagore, as well as a sampling of genre fiction (including detective fiction, adventure fiction, science fiction, etc.). So: everything from Jack London to Edith Wharton to Langston Hughes.
The goal is to produce a collection that could be useful to people doing quantitative analyses of these materials, but also to scholars doing conventional historical scholarship on the literature of the period. I've tried to make the collection segmented, so that people interested in just writing by modernist women, for instance, could sort the collection that way (see the metadata below). Similarly, people interested in just African American poetry could sort the collection that way as well (using the African American Poetry folder).
Having these aspects of social and cultural identity represented in the metadata was important to me; it's one reason why I've found existing textual repositories online insufficient.
How to access the corpus? This is a work in progress. It can be found here for the moment.
As I've been going, I've been drawing largely on digital files at Project Gutenberg, Archive.org, and HathiTrust. (Note: the Gutenberg files will need to be "cleaned" to make them useful for quantitative queries; as of the present writing, I have not yet done that with the files, but it should be happening soon.)
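A minimal sketch of what that cleaning might involve, in Python. It assumes the usual Project Gutenberg plain-text layout, where the book itself sits between a "*** START OF THE PROJECT GUTENBERG EBOOK ... ***" line and a matching "*** END OF ..." line. The function name is hypothetical, and real files vary enough that the results should be spot-checked.

```python
import re

# Marker lines found in most Project Gutenberg plain-text files. The wording
# varies slightly between files ("THE" vs. "THIS"), so the pattern allows both.
START_RE = re.compile(
    r"^\*\*\* ?START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$",
    re.IGNORECASE | re.MULTILINE,
)
END_RE = re.compile(
    r"^\*\*\* ?END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$",
    re.IGNORECASE | re.MULTILINE,
)


def strip_gutenberg_boilerplate(text: str) -> str:
    """Return the text between the START and END marker lines.

    If a marker is missing, that side of the text is left untouched.
    (Hypothetical helper, not part of the corpus itself.)
    """
    start = START_RE.search(text)
    end = END_RE.search(text)
    begin = start.end() if start else 0
    finish = end.start() if end and end.start() >= begin else len(text)
    return text[begin:finish].strip()
```

This only removes the license header and footer. Transcriber's notes, tables of contents, and page-number residue inside the body would still need separate handling.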
As important as the collection itself (or more important) is the metadata file, with information about the texts. I'll say more about the metadata file below.
1. Folders:
Literary Fiction / High Modernism. Essentially what you would expect -- texts from 30-40 prominent modernist writers from the UK, Ireland, and the U.S., with a few less well-known figures like Hope Mirrlees.
Genre Fiction, including Science Fiction, Detective Fiction, Adventure, Romance, Horror. This period was of course the Golden Age of Detective Fiction, with Arthur Conan Doyle writing at the fin de siecle and writers like Agatha Christie and Dorothy Sayers emerging in the 1920s. Writers like Doyle and Wells both straddled the late 19th and early 20th centuries; ultimately, I will probably aim to put their pre-1900 works in an appropriate folder for people doing author-based work.
All Fiction. What it sounds like. A mix of "highbrow," "middlebrow" and popular fiction.
All Poetry. Canonical figures like
Drama. As of the present moment, I haven't been actively seeking out dramatists to include in this folder; it mostly consists of plays written by authors who were not primarily playwrights (such as Yeats).
African American Fiction. For more on this collection, see this earlier description of my African American materials.
African American Poetry. See the link above.
Colonial South Asian Texts. For more on this collection see here.
Nonfiction and Essays (including Travel narratives, Memoirs, and Literary Criticism).
2. Metadata.
We've been collecting the following information about the texts as we go. The metadata file (a work in progress) can be viewed here.
Author's name
Title
Year of First Publication
Year of Author's Birth
Publisher (first publisher)
Genre or Mode: Fiction, Nonfiction, Poetry, Short Fiction, Drama
Author's inferred gender: M, F, NB. As of now, I am understanding writers like Bryher and Radclyffe Hall to be nonbinary (NB). Others of course have complex relationships to gender expression (one thinks of Gertrude Stein). This category may be revised or rethought over time.
Author's nationality
Location in Corpus
Location of Publisher
Tags and Themes: WWI, Travel, LGBTQIA, Disability, Environmental, African American, South Asian, Indigenous
Provenance of Text: Gutenberg, HathiTrust, Archive.org, etc.
Again, the metadata file is very much a work in progress. Completing it may take weeks or even months, but I hope that when it's complete it will be useful to researchers.
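To give a sense of how the metadata could be used to sort the collection, here is a minimal sketch in Python. It assumes the file is exported as a CSV. The column names (author, title, year, genre, gender, tags), the semicolon-separated tags, and the sample rows are all illustrative assumptions, not taken from the actual metadata file.

```python
import csv
import io

# Illustrative rows only. Column names and tag assignments are assumptions
# modeled on the field list above, not the real metadata file.
SAMPLE_CSV = """\
author,title,year,genre,gender,tags
Virginia Woolf,Jacob's Room,1922,Fiction,F,WWI
Jack London,The Call of the Wild,1903,Fiction,M,
Edith Wharton,The Age of Innocence,1920,Fiction,F,
Langston Hughes,The Weary Blues,1926,Poetry,M,African American
"""


def filter_rows(rows, **criteria):
    """Keep rows whose columns exactly equal the given values."""
    return [r for r in rows if all(r.get(k) == v for k, v in criteria.items())]


def has_tag(row, tag):
    """True if `tag` appears in the row's semicolon-separated tags column."""
    return tag in [t.strip() for t in row["tags"].split(";") if t.strip()]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# Fiction by women writers.
women_fiction = filter_rows(rows, gender="F", genre="Fiction")
print([r["title"] for r in women_fiction])

# Anything tagged "African American", regardless of genre.
print([r["author"] for r in rows if has_tag(r, "African American")])
```

With a real export, io.StringIO(SAMPLE_CSV) would be replaced by an open file handle, and the column names adjusted to match whatever headers the file ends up using.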