Gendered Pronouns in Early 20th Century Fiction: A Simple Quantitative Study
I have long been fascinated by a DH paper published in 2018, “The Transformation of Gender in English-Language Fiction” (by Ted Underwood, David Bamman, and Sabrina Lee), which offers strong statistical evidence that men increasingly dominated the world of fiction in the late 19th and early 20th centuries. Between 1850 and 1950, the percentage of published novels authored by women dropped dramatically (from near parity to something more like a third or a quarter). Thus, at precisely the moment when we might have expected women to be gaining visibility and influence, with the early 20th-century suffrage movement and the appearance of important feminist voices like Virginia Woolf, they were actually losing ground in the publishing world as a whole.
According to the authors, the pattern only started to reverse in the second half of the twentieth century (today, of course, the publishing industry looks very different). The authors of “The Transformation of Gender” also find that, within their fiction, men writers tend to write more about men, while writers who are women come closer to gender parity in the attention they give to men and women characters. This particular tendency, they suggest, has changed much less over time.
Incidentally, this concern with the growing marginalization of writers who were women is not a new one. The authors of “The Transformation of Gender” cite a 1989 study, Edging Women Out: Victorian Novelists, Publishers, and Social Change (Gaye Tuchman and Nina Fortin), which reached similar findings through quantitative (but not digital!) scholarship. Tuchman and Fortin counted and classified entries in Leslie Stephen’s Dictionary of National Biography to compare how women writers were discussed relative to men writers. They found that while books by men were reviewed more frequently on the whole, the gender disparity among more recent (late 19th-century) authors became especially sharp for works of nonfiction.
The authors of “The Transformation of Gender” used a very large corpus of tens of thousands of novels from HathiTrust (and checked against the smaller University of Chicago novel corpus) as well as sophisticated modeling techniques built around Natural Language Processing (NLP) to infer gender within a text and derive percentages. Some years ago, I finally gained enough confidence in basic Python to explore some of these methods on my own, using David Bamman’s BookNLP software (sadly, that software does not appear to be working at present, so I will not be using it for the results below).
One other bit of background: in the revised version of the essay published in his book, Distant Horizons, Ted Underwood mentions the Gendered Language Visualizer, a simple but deceptively powerful tool that tracks the association between non-gendered words and gendered pronouns in works of fiction.
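I don't know how the Gendered Language Visualizer works internally, so the following is only a crude stand-in for the general idea of associating ordinary words with gendered pronouns. It assumes that "association" means the word immediately following "he" or "she" (often a verb, as in "he said" / "she wept"); the function names are hypothetical.

```python
import re
from collections import Counter


def words_after_pronouns(text):
    """Count the word immediately following each 'he' and each 'she'."""
    tokens = re.findall(r"[a-z]+", text.lower())
    after_he, after_she = Counter(), Counter()
    for prev, nxt in zip(tokens, tokens[1:]):
        if prev == "he":
            after_he[nxt] += 1
        elif prev == "she":
            after_she[nxt] += 1
    return after_he, after_she


def she_share(word, after_he, after_she):
    """Fraction of a word's post-pronoun uses that follow 'she'; None if it follows neither."""
    total = after_he[word] + after_she[word]
    return after_she[word] / total if total else None


# Tiny illustrative input, not drawn from the corpus.
he_counts, she_counts = words_after_pronouns(
    "He laughed. She laughed. She wept. He said nothing."
)
print(she_share("laughed", he_counts, she_counts))  # 0.5
print(she_share("wept", he_counts, she_counts))     # 1.0
```

Run over a whole novel, sorting words by this share would surface the verbs most strongly tied to one pronoun or the other.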
Earlier studies: I should say that the approach of Underwood et al. is a more complex version of a type of analysis scholars have been doing in stylistics for many years; studies going back to the 1990s aimed to predict the gender of a writer from their use of function words such as articles. Koppel et al. (2002) combined sophisticated statistical techniques with fairly straightforward counting to find that writers who are men tend to use a higher proportion of noun specifiers (a, the, that) and numbers in their fiction. They also claim that women tend to use more pronouns (she, herself), negation (not), and certain prepositions (for, with) and conjunctions (and). By lining up counts of these various parts of speech, the authors claim to be able to predict the gender of the author of an anonymized text with 80% accuracy. (For what it’s worth, I tried to replicate their results with my own small early 20th-century corpus, and failed.)
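The counting half of this kind of study can be sketched in a few lines. The feature groups below are hypothetical and contain only the example words quoted above; Koppel et al.'s real feature set was much larger, and their trained classifier (the part that turns these proportions into a prediction) is not reproduced here.

```python
import re
from collections import Counter

# Hypothetical feature groups built only from the example words quoted above.
FEATURE_GROUPS = {
    "noun_specifiers": {"a", "the", "that"},
    "pronouns": {"she", "herself"},
    "negation": {"not"},
    "prepositions": {"for", "with"},
    "conjunctions": {"and"},
}


def feature_profile(text):
    """Return each feature group's share of all word tokens in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # guard against empty input
    return {
        group: sum(counts[w] for w in words) / total
        for group, words in FEATURE_GROUPS.items()
    }


profile = feature_profile("The cat and the dog did not run with her.")
for group, share in profile.items():
    print(f"{group}: {share:.2f}")
```

A classifier would be trained on one such profile per text; the profile itself says nothing about gender until that step.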
Moving past binarized gender thinking: Admittedly, I am not so interested in this particular application for my own research; it’s almost never the case with 20th-century fiction that the gender identity of an author is unknown. In any case, I tend to be interested in writers who pushed against conventional gender roles and expectations, many of whom might be understood as LGBTQIA+ today: writers like Virginia Woolf, E.M. Forster, D.H. Lawrence, Radclyffe Hall, or Wallace Thurman. Most scholars today would find the "predict the gender" type of analysis overly restrictive and limited by binarized gender thinking. If E.M. Forster, for example, breaks with the expected pattern in novels that feature women protagonists (spoiler: he does!), that would be a more interesting finding than simply reconfirming, 80% of the time, that men are from Mars, as it were.
A simplified method for the present study: So what if we drastically simplified the query with a corpus of early 20th-century fiction? As a starting point for thinking about patterns with respect to gendered socialization, why not simply look at gendered pronouns: he/him/his and she/her/hers? If the conclusions by Underwood et al. are correct, we should expect to see a lopsided homosocial tendency in fiction by men (men mostly talking to and about other men, and only occasionally mentioning a woman), and maybe a more balanced gender representation in fiction by women. We might also see some interesting anomalies in the patterns that might be worth exploring.
Before doing this at a mid-range scale, I was curious to see how authors I know well would shake out. Over the past few months, I’ve been developing a custom corpus of early 20th-century texts. I have described the basic design of the corpus elsewhere; it contains about 1,000 texts in total, including about 100 that might be thought of as canonical high modernist texts, 130 texts by African American authors, and about 90 texts associated with colonial South Asia. It also contains a substantial amount of genre fiction. The results below reference only works of fiction, though the corpus also includes poetry, drama, and nonfiction.
With a little help from generative AI coding assistants, I devised a simple bit of code to count gendered pronouns (he, him, his vs. she, her, hers), first in a single novel and then in a batch of files, and to express each count as a percentage of the file’s total word count. I then compared the two percentages to get a ratio. Rather than overwhelm the reader with a vast array of raw data, I’ll start with some smaller findings, focused initially on gendered pronoun ratios in a small set of ‘high modernist’ works of fiction, mainly by white British and American authors. I’ll then expand the conversation to other authors and consider broadly why any of this might be significant.
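For readers who want to try something similar, here is a minimal sketch of the single-text step. It is not my actual script; it assumes plain text, lowercases everything, and counts only whole alphabetic words (so "the" never matches "he"). Because both percentages share the same denominator, dividing one by the other gives the same ratio as dividing the raw counts.

```python
import re
from collections import Counter

# Pronoun sets as described above.
MASCULINE = {"he", "him", "his"}
FEMININE = {"she", "her", "hers"}


def tokenize(text):
    """Lowercase the text and split it into purely alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def pronoun_stats(text):
    """Count gendered pronouns and express each group as a percentage of all tokens."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens)
    masc = sum(counts[w] for w in MASCULINE)
    fem = sum(counts[w] for w in FEMININE)
    return {
        "tokens": total,
        "masc": masc,
        "fem": fem,
        "masc_pct": 100 * masc / total if total else 0.0,
        "fem_pct": 100 * fem / total if total else 0.0,
    }


def masc_to_fem_ratio(stats):
    """Ratio of masculine to feminine pronouns (equal to masc_pct / fem_pct)."""
    if stats["fem"] == 0:
        # Avoid dividing by zero: unbounded if masculine pronouns occur, undefined otherwise.
        return float("inf") if stats["masc"] else float("nan")
    return stats["masc"] / stats["fem"]


sample = "He saw her. She gave him his hat, and he thanked her."
stats = pronoun_stats(sample)
print(stats["masc"], stats["fem"], round(masc_to_fem_ratio(stats), 2))  # 4 3 1.33
```

To report a feminine-to-masculine ratio, as in the woman-centered tables below, simply divide the other way.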
From my limited ‘high modernist’ collection, what are some texts that are especially lopsided towards men? (If you expected to see Ernest Hemingway on this list, you would be right!)
Text: ratio of masculine to feminine pronouns
Ernest Hemingway, Men Without Women: 11.4 to 1
Hemingway, In Our Time: 9.4 to 1
James Joyce, A Portrait of the Artist as a Young Man: 9.2 to 1
John Dos Passos, Three Soldiers: 7.3 to 1
D.H. Lawrence, Kangaroo: 4.2 to 1
Hemingway, The Sun Also Rises: 3.5 to 1
James Joyce, Ulysses: 3.0 to 1
James Joyce, Dubliners: 2.2 to 1
E.M. Forster, A Passage to India: 2.2 to 1
F. Scott Fitzgerald, The Great Gatsby: 2.0 to 1
What to make of the lopsided nature of some of these texts? I should say, off the bat, that I don’t think the lopsidedness necessarily serves as an indictment of someone like Hemingway. The relative absence of women in his various short stories is partly due to their settings (several deal with soldiers and World War I, and “The Undefeated,” about an aging Spanish bullfighter, is a pretty marvelous critique of dysfunctional masculinity). Similarly, A Portrait of the Artist as a Young Man is a coming-of-age narrative that follows Stephen Dedalus through schools that admit only boys and men and employ only men as teachers, so it’s not a huge surprise that the social world represented in the text is also pretty lopsided. (The imbalance might have been smaller had Joyce kept more of the romantic material from Stephen Hero, the earlier draft of the novel.) The lopsidedness of other writers (and other Joyce texts) is less extreme, though it’s striking to see novels by D.H. Lawrence and E.M. Forster here (especially since Forster, with Howards End, also appears on my second list).
Again, I don’t see it as an indictment per se, or as a reason to drop Hemingway or Joyce from my syllabus, though it is still worth knowing. (Do readers want or need to see characters who match their own gender identity or expression in order to connect with a text? Probably not, but my hunch is that it might help.) Still, the pattern does suggest a pretty limited role for women in the social worlds of these texts. Nor are the authors unaware of it: the title Men Without Women can be read as a symptomatic self-critique. These are men without women, and perhaps that’s why they are so broken.
And what about woman-centered texts?
Text: ratio of feminine to masculine pronouns
Dorothy Richardson, Pilgrimage, vol. 1 (Pointed Roofs): 13.1 to 1
Bryher, Development: 9.9 to 1
Richardson, Pilgrimage (other volumes): varies between 5 to 1 and 2 to 1
Nella Larsen, Passing: 4.9 to 1
Radclyffe Hall, The Unlit Lamp: 3.9 to 1
Radclyffe Hall, The Well of Loneliness: 2.8 to 1
Wallace Thurman, The Blacker the Berry: 2.8 to 1
Gertrude Stein, Three Lives: 1.6 to 1
Virginia Woolf, Mrs. Dalloway: 1.6 to 1
Katherine Mansfield, The Garden Party and Other Stories: 1.6 to 1
Mansfield, Bliss and Other Stories: 1.5 to 1
Virginia Woolf, The Voyage Out: 1.5 to 1
Woolf, Night and Day: 1.4 to 1
Woolf, Orlando: 1.4 to 1
Woolf, To the Lighthouse: 1.3 to 1
Forster, A Room With a View: 1.2 to 1
Forster, Howards End: 1.2 to 1
It was not hugely surprising to see Pilgrimage: Pointed Roofs as the most lopsided text in the high modernist selection from my corpus. Pointed Roofs is the story of a young woman teaching at a girls’ boarding school, so, as with A Portrait of the Artist above, it makes sense that it reflects a homosocial world populated largely by girls and women.
I was intrigued to see a book by a man, Wallace Thurman, rank fairly high on this list (2.8 to 1). I am not entirely sure what to make of it; the novel in question, which has a woman protagonist, is a thoughtful and often bitter account of colorism within the Black community.
The bigger takeaway might be that the pattern described by Underwood et al. appears to hold even within this small group of high modernist writers: writers who were women were, on the whole, less lopsided than their peers who were men. Instead of a ratio of 10 to 1, 4 to 1, or even 2 to 1, the median for writers like Woolf and Mansfield, two of the core authors in the modern feminist canon, is closer to 1.5 to 1.
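The "closer to 1.5 to 1" figure can be checked directly against the woman-centered table above. The dictionary below only copies the feminine-to-masculine ratios listed there for Woolf and Mansfield; keep in mind these are hand-picked titles, not a sample of everything they wrote.

```python
from statistics import median

# Feminine-to-masculine ratios copied from the table above (Woolf and Mansfield only).
woolf_mansfield = {
    "Mansfield, The Garden Party and Other Stories": 1.6,
    "Mansfield, Bliss and Other Stories": 1.5,
    "Woolf, Mrs. Dalloway": 1.6,
    "Woolf, The Voyage Out": 1.5,
    "Woolf, Night and Day": 1.4,
    "Woolf, Orlando": 1.4,
    "Woolf, To the Lighthouse": 1.3,
}

# With seven values, the median is the fourth one after sorting.
print(median(woolf_mansfield.values()))  # 1.5
```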
Expanding the Range of Authors: Genre Fiction Writers
Now, let’s move to the broader dataset. The first discovery might be that the gendered pronoun disparity can be wildly lopsided in adventure fiction and westerns:
Text: ratio of masculine to feminine pronouns
Zane Grey, The Young Pitcher: 630 to 1
Zane Grey, Ken Ward in the Jungle: 433 to 1
G. K. Chesterton, The Man Who Was Thursday: 139 to 1
H. G. Wells, The First Men in the Moon: 104 to 1
John Buchan, Prester John: 101 to 1
Lord Dunsany, The Gods of Pegana: 57 to 1
L. Frank Baum, The Master Key: 47 to 1
Dhan Gopal Mukerji, Kari the Elephant: 44 to 1
G. K. Chesterton, The Man Who Knew Too Much: 43 to 1
Jack London, The Call of the Wild: 18 to 1
John Buchan, The Thirty-Nine Steps: 15 to 1
Dorothy Sayers, Lord Peter Views the Body: 5.8 to 1
Agatha Christie, The Big Four: 4.0 to 1
The scale of the lopsidedness is pretty vast, and consistent: early 20th-century men who wrote westerns, science fiction, and detective fiction all depict highly lopsided social worlds. (I ran hundreds of titles for this study and am including only a few noteworthy ones in these tables; readers who want to see the raw data should contact me.) Even women who wrote detective fiction tended to show a version of the pattern, though Dorothy Sayers’ Lord Peter Views the Body (at 5.8 to 1) is still much less imbalanced than something like The Young Pitcher (another narrative of a young man at school, with no girls or women about).
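When running hundreds of titles, a short loop can score a whole folder and sort it so the extremes land at the top and bottom of the list. This is a sketch, not my actual script: it assumes one UTF-8 plain-text file per title, and the function names and the "corpus" folder are hypothetical. Files with no feminine pronouns get an infinite ratio rather than raising a division error, and files with no gendered pronouns at all are skipped.

```python
import re
from collections import Counter
from pathlib import Path

MASCULINE = {"he", "him", "his"}
FEMININE = {"she", "her", "hers"}


def count_pronouns(text):
    """Return (masculine, feminine) pronoun counts for one text."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return sum(counts[w] for w in MASCULINE), sum(counts[w] for w in FEMININE)


def rank_corpus(folder):
    """Score every .txt file in a folder, most masculine-leaning first.

    Each row is (filename, masculine, feminine, masc_to_fem_ratio).
    """
    rows = []
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        masc, fem = count_pronouns(text)
        if masc + fem == 0:
            continue  # no gendered pronouns at all; nothing to rank
        ratio = masc / fem if fem else float("inf")
        rows.append((path.name, masc, fem, ratio))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows


# Hypothetical usage with a folder of plain-text novels:
# rows = rank_corpus("corpus")
# print(rows[:10])   # most masculine-leaning titles
# print(rows[-10:])  # most feminine-leaning titles
```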
And what about woman-centered texts?
Text: ratio of feminine to masculine pronouns
Rokeya Hossain, Sultana’s Dream: 10.8 to 1
Vita Sackville-West, The King’s Daughter: 5.7 to 1
Edith Wharton, The Old Maid: 5.4 to 1
Elinor Glyn, Man and Maid: 5.0 to 1
Gertrude Atherton, The Living Present: 3.8 to 1
L. M. Montgomery, Anne of Green Gables: 3.2 to 1
Somerset Maugham, Liza of Lambeth: 2.9 to 1
Louis Bromfield, The Green Bay Tree: 2.4 to 1
Edith Wharton, The House of Mirth: 2.4 to 1
Zane Grey, The Call of the Canyon: 2.4 to 1
Temple Bailey, Judy: 2.0 to 1
H.G. Wells, Ann Veronica: 1.8 to 1
Again, while there are some texts that are highly woman-centered (Sultana’s Dream is a feminist utopia with men kept in enclosures, while women run the world), the imbalance for romance fiction writers like Elinor Glyn or girl-oriented children’s fiction writers like L.M. Montgomery is considerably less pronounced than with their counterparts who were men.
Given how lopsided Zane Grey generally is, it is interesting to see one of his novels here (a shell-shocked World War I veteran moves to Arizona and has to choose between two different women). It’s also noteworthy to see an instance of H.G. Wells’ “new woman” fiction here.
If anyone would like to see the full / raw data I would be happy to share it; please email me.