The Archive Gap: Race, the Canon, and the Digital Humanities

[Update: A substantially revised version of this blog post was published in South Asian Review in 2019. That article can be found here]

Pioneering work by digital archivists like Jerome McGann of the University of Virginia helped lay the groundwork for the conceptualization of a set of best practices for online archives that have been widely replicated in subsequent projects. That said, some scholars associated with race studies and gender studies constituencies have raised questions about the ways in which the first wave of major digital archives essentially reinforced the Anglo-American canon. Authors like Walt Whitman, D.G. Rossetti, William Blake, Emily Dickinson, and Henry David Thoreau were, by about 2008, well-represented by thorough, thoughtfully designed, and technically sophisticated web archives. These archives frequently feature page images of manuscript drafts of the complete works of the authors in question, as well as (in the case of Whitman and Blake) exceptionally deep access to different versions and printings of key texts. This same level of attention was, generally, lacking with reference to minority writers. As Stephanie P. Browner puts it, “scholars of race and ethnicity do not yet get online and find themselves in a deep, comprehensive, well-linked and indexed world of materials.” [link]

Here, I will survey what I am calling the "Archive Gap," comparing and contrasting digital archives of canonical figures (especially Rossetti and Whitman) with those of American writers of color, lesser-known women writers, and writers from the colonial world. The Archive Gap as I am conceiving of it has several dimensions. For simplicity’s sake, I’ll split my consideration into two parts, one historical and the other contemporary.

The historical dimensions of the Archive Gap are largely outside of our control. To put it quite simply, we’ll never be able to recover what was never preserved. There are, however, strategies we can use to address the pattern of omission of artifacts related to marginalized writers, which I'll discuss briefly at the end of this post.



In our contemporary moment, I do think that the emphasis of digital archives and thematic research collections on canonical, Euro-American writers beginning in the 1990s has had an impact on the constitution of what we now think of as DH as a whole. I do not think it’s an irreversible problem (and indeed, there are signs that the field is changing in ways that may lead to the lessening of the Gap), but I haven't seen other scholars work through the precise problem that I am describing here. So I'll take a stab at it and hope I'm not repeating what everyone else already finds obvious.

Many of the primary editors of early digital archives published significant scholarly works describing their decision-making processes as well as the new tools and technologies that might have been developed to create the richest and most flexible possible resources. Scholars like Martha Nell Smith, Ed Folsom, Kenneth Price, and Jerome McGann have used their opportunities to talk about their projects to help theorize what digital archives can be, and they’ve written compelling arguments that the digital turn in thematic research collection editing ought to be seen not as an extension of print editing, but as a fundamental transformation -- the advent of a new textual paradigm. The move away from linear, codex-bound printing allows a much more straightforward, indexical presentation of multiple versions of texts and modes of text. The removal of printing cost considerations makes the inclusion of large numbers of images in digital collections much easier; this in turn enables much greater editorial transparency and inclusiveness than was typically practiced during the era of scholarly editing in printed texts. (In a digital context, we can avoid the disturbing situation described by Martha Nell Smith, where the editor of the Emily Dickinson Variorum, R.W. Franklin, silently omitted sizeable chunks of text in Dickinson’s letters. [link] ) And the open-endedness of digital collections allows frequent updating of collections as new manuscripts and texts might be uncovered after a project is already underway.

The particular publishing complexities associated with the work of several of these writers become a key feature in the design of the archives that have been built to present their work digitally. Since Blake and Rossetti were writers whose printed texts benefit from being presented alongside visual art they drew (or engraved), digital archives have been designed to help us access the visual text. And Dickinson used idiosyncratic punctuation and arranged her handwritten poems on the page in ways that are difficult to emulate in non-visual printing; she also sometimes inserted unusual artifacts (such as the famous postage stamp Dickinson placed in the middle of the page for her poem “Alone and in a Circumstance”). These visual elements are arguably inextricable from the content of the “text”; in a digital archive, we are no longer limited to a modified or reduced presentation of the work (assuming, of course, that we have the rights to use the materials).

Now, if we go to African American writers who were contemporaries of writers like Dickinson and Whitman, what kinds of archives do we find?

First, we find that important African American authors like Charles Chesnutt, Frances E.W. Harper, and Claude McKay were either completely omitted from the first wave of digital archive projects, or represented by extremely minimalist archives lacking the complex architecture, metadata, and ancillary texts seen in archives of more canonical figures.

(People who study 20th-century African American literature might well note that there are many important black writers whose names are not on the short list above. Keep in mind that the vast majority of the writings of Langston Hughes, Richard Wright, Ralph Ellison, Nella Larsen, Anne Spencer, and Zora Neale Hurston are still under copyright. So one way to reduce the Archive Gap might be to support measures that would weaken the stranglehold of copyright law. Even moving the public domain goalpost to the year 1930 would be incredibly important in terms of potential digital availability of key works in African American literature.)

Some of the limitations in the print archives are also due to structural issues that are beyond the scope of the DH community to change. Claude McKay, for instance, does not appear to have kept thorough manuscripts for his early poems. The Beinecke Library at Yale and the Schomburg Collection in New York City have collections of McKay’s papers, but McKay appears to have only preserved the carbon typescripts of early collections of his poems (“Harlem Shadows”), not handwritten manuscripts. Without manuscripts, we cannot analyze McKay’s revisions the way we can analyze those Whitman made on his own manuscripts of Leaves of Grass (as Folsom and Price have described it in Re-Scripting Whitman, those revisions tell fascinating stories regarding the representation of African Americans as well as homosexuality in Whitman’s work). Why didn’t more black writers from this period keep extensive collections of their handwritten manuscripts? It’s quite plausible that McKay threw out (or lost) most of his early manuscripts simply because he moved around so much (he lived much of his mature life in Europe and North Africa). But we might also speculate that it had something to do with the sense of self-importance Whitman had at the beginning of his career – the sense that these manuscripts he was producing beginning in the 1850s would be artifacts that he and others would one day want to see. A correlative personal imperative to preserve is lacking, for various reasons, for McKay.

In short, the Archive Gap is something we’ve inherited from earlier generations of editors and librarians – what kinds of materials from black writers did earlier librarians and collectors want to keep? It’s also something we’ve inherited from the authors themselves – the highly varied amounts of social capital they had access to may have affected the way they valued their own textual materials.

We’ve already mentioned that writers like Charles Chesnutt, William Wells Brown, and Frances Harper were largely overlooked in earlier digital archive projects. Even now, while the Charles Chesnutt archive is certainly very functional and elegant, it is a much more rudimentary project than are the various digital archives dedicated to more canonical American authors [update from 2022: this has been rectified. Check out the new Chesnutt Archive here]. For its part, the “North American Slave Narratives” project funded by the NEH (now nearly a decade ago) is impressively thorough and complete. But it lacks a strong sense of editorial presence, and its keyword search is very limited. If you wanted to search for “African American women as victims of sexual violence at the hands of slaveowners,” you wouldn’t be able to do that with this collection as currently organized. Perhaps, in addition to supporting and funding the initial building of these resources, it might be worth considering how and whether to continue updating and upgrading them so they continue to improve and grow in functionality. (To be sure, sites like the William Blake Archive are probably due for an update as well.)

Some early African American writers left behind legacies that are as complex and variegated as those of major canonical figures like Whitman or Dickinson. William Wells Brown’s Clotel, for instance, was published in four rather different versions, evolving and changing in parallel with the Civil War. Christopher Mulvey’s electronic scholarly edition of Clotel includes all four versions of the novel and gives us the ability to read them in parallel, but it’s striking that two out of sixteen installments of the serially published 1860-61 edition of the book are in fact completely lost. (In place of the text itself, Mulvey gives us a reconstructed synopsis of what must have been in the missing chapters.) This is a classic example of the Archive Gap I am describing. The newspaper in which Clotel was published in 1860-61 was an African American paper that ran for three years called The Weekly Anglo-African; it has been incompletely preserved. While this type of loss does occur from time to time amongst white writers who were Brown’s peers, it is not common. A newspaper like The New York Saturday Press, for instance, was much more thoroughly and carefully preserved by earlier generations of librarians and collectors than was The Weekly Anglo-African – giving today’s digital archivists much more working material to start from.

What are some strategies for navigating around the various dimensions of the Archive Gap I have been outlining?

One might be to work with the materials we do have and build on that. And in the American context, there remains a wealth of materials that haven’t been thoroughly studied. The “Colored Conventions” project at the University of Delaware might represent an example of an area of research where we apparently have extensive documentation of activist meetings held by African Americans throughout the 19th century. At least among literary historians, these papers may not have been looked at as closely as, say, slave narratives.

Another strategy might be to reconceive what we are aiming to do with the archives and the materials we do have. Major archives of canonical figures tend to emphasize the neutral and idealized presentation of the materials. Any references to politics, and any specific points of editorial advocacy are carefully downplayed. What if we reconceived our role as archivists and editors? Perhaps our role in presenting materials should be as much to advocate for the authors themselves – and along the way, offer actual interpretations of their works – as it is to present their textual materials?

A third strategy might entail rethinking the structure of digital archives and moving away from single-author archives towards thematic collections that show how larger groups of writers were linked together and interacting at a given point in time. Thus we make digital archives more inclusive and address the Archive Gap by conceiving of our work online as part of a broader project of challenging the canon. As I understand it, this is the impetus behind my colleague Ed Whitley's "The Vault at Pfaff's" and it informs my own thinking behind the project I have recently started, on "The Kiplings and India." In the Indian context, one way to complicate the myth of a heroic Kipling who invented a whole new genre of writing while working as an assistant editor at a forgotten newspaper in Lahore might be to show that he was not alone in Lahore. He had co-workers, including several Indian employees of the Civil & Military Gazette, whose ideas and voices informed his own. He had a father whose deep knowledge of India gave him a huge jump start. And he had a sister and mother who were both literary in their own right, and who frequently collaborated with the aspiring writer Rudyard Kipling.

As a final note, I spent some time today looking at the kinds of grants that have been funded by the NEH in recent years, and was struck by how many new projects dealing with issues of racially and otherwise marginalized writers are now underway. Upon completion, those projects will undoubtedly have an impact on how we see the digital humanities as a field. I would be remiss at this point if I didn't also acknowledge the work of scholars like Lauren Klein, who has a recent essay on "The Image of Absence: Archival Silence, Data Visualization, and James Hemings" that also takes an interest in what isn't in the archives. Klein is an example of a rising star in DH circles whose quest for new methodologies and modes of visualization is deeply connected to a social justice project.

So with developments like these, the Archive Gap may already be fading. But even as that change is underway, might it be worth our while to note that it existed in the first place?



[Updates from 2022: Since authoring this blog post originally in the fall of 2015, I've continued to develop digital projects that speak to the questions raised several years ago. See my corpus projects on African American Literature and Colonial South Asian Literature. Also see my latest project: African American Poetry: a Digital Anthology.]