Humanities AI in 2025: Brief Reflections After a Conference

I’ll try to keep these reflections brief. There were maybe three threads I was following at the recent Humanities AI conference at Lehigh:

1) Critical AI

[Slide from Greg Reihman]

2) Academic Power Users. ("Generative AI is pretty useful for scholarship; here’s what I’m doing with it.")

3) Critically-engaged Best Practices. ("Yes, let's be critical of commercial generative AI, but insofar as scholars are going to use it in academic work, here are some best practices.")


Often it seems like group #1 and group #2 are engaged in non-intersecting conversations. People invested in critical AI have been hardening their position against the "plagiarism machine" in recent months, especially as we've seen it at work in the destructive DOGE cuts to the federal government. It is important to engage with those critiques -- though it might also be worth paying attention to the ways generative AI has been changing, and to alternative approaches to engaging with it.

I'm not sure that synthesis actually happened at the conference, but if we bring the observations of the critics together with the experience and knowledge base of regular practitioners, we might get to #3.




1) “Critical AI” is a phrase (and now a journal) associated with various lines of thought highlighting the many problems with the overhyped generative AI industry. The list of valid complaints is long and covers a broad array of topics:
  • Its domination by big tech companies jockeying for market position
  • Its status as business marketing ploy (“AI-powered coffee machine!”  No one needs this.)
  • The implicit biases contained within AI training data, and the crude fixes we’ve seen for those biases (see Meredith Broussard, Joy Buolamwini, Lauren Goodlad, etc.)
  • Its potential to be used for broad social surveillance and algorithmically-assisted policy harms, often sloppily deployed (e.g., the DOGE cuts)
  • The problem of not knowing what exactly is in that training data (including copyrighted texts). (Bigger issue: lack of transparency of closed platforms.)
  • The concern that our experience of it as a “magic black box” continues to lead people to react to it as both a miraculous thing (harbinger of “AGI”) and as catastrophic (AGI doomerism).
  • The magic black box is also of course a huge problem for teachers dealing with students who abuse the technology: it’s too easy to get answers and ‘good enough’ paper drafts. Generative AI as a way to avoid cognitive labor & the real and valuable struggle of trying to write.
  • Environmental costs – exorbitant water and power demands, often invisible to the average user
  • Its tendency to hallucinate – to create bad and made-up data, invent sources that don’t exist, and engage in faulty reasoning. (Gen AI as a linguistic statistical modeling machine…)
  • The danger of it intensifying the epidemic of social isolation, loneliness, and epistemic insularity that has already been underway since the advent of the smartphone. People are increasingly turning to generative AI for companionship and therapy. Some of those uses could be benign (maybe a few trial runs with an AI therapist could lead humans to realize they might benefit from seeking out a real, human therapist). But the companionship use-cases have a lot of depressing possible outcomes, including a growing risk of personal dependency on the machine.  One hears anecdotally about young people turning to AI in lieu of human romantic partners, or marriages breaking up, etc.

My favorite Critical AI moment at the conference came from my colleague Greg Reihman, who asked, “What if, instead of calling the technology we are talking about ‘Generative AI,’ we had called it ‘Computational Text Generation Devices’?”

My initial response to that rhetorical question was, “If we called it that, probably none of us would be here in this room...” Which is to say, if it weren’t generally referred to as “AI” -- with all the science fiction baggage and mythos associated with that term -- the topic would probably interest only a narrow slice of computational linguists and natural language processing people in Computer Science. We wouldn’t have a room full of academics in fields like Religion, Philosophy, History, Asian Studies, Art & Design, and English all talking about it. Our collective investment in this topic is a result of marketing, of hype, of mainstream awareness.





2) AI Power Users. 

Alongside several papers expressing some version of the above points, there were papers from people who use generative AI on a regular basis to do their academic research.

This conference had a substantial representation of Asian Studies scholars; some of them were using generative AI for translation and historical research: recent models have apparently made significant advances in translating from Chinese. Others were using it to pursue highly specialized topical research. I won't say too much about specific projects, though you can get a sense of how they were discussed from the titles and backgrounds of the presenters on the program.

My friends and collaborators Anna Preus and Melanie Walsh have been using it to classify poetry, with some interesting results; see their published article for details.

There was also a paper on “Vibe Coding” that walked through how people, both in the industry and academia, are using gen AI to bypass traditional software development and coding. This can work on a limited scale, but it comes with a lot of problems, especially if you’re building software that might need to be updated, maintained, or used by lots of people. 



3) Best Practices & A Couple of Useful Tools

Could we synthesize the critiques from group #1 with the observations made by people in group #2? If we are going to use generative AI for scholarly research, there is a set of best practices we might want to employ.

A) Open Access. There seems to be a consensus that we should turn to open-access models instead of commercial generative AI like ChatGPT. For academic research, look for models with specialized / tailored training data (like the "Historical Perspectives Language Model", which could be used to study how language and usage have changed over time). 

One big reason for this is that we don’t want to be subject to the whims and vagaries of whatever Sam Altman is Tweeting about today. We also don't want to be 'locked in' as consumers willing to pay whatever price OpenAI wants to charge ($200 a month???). This might address the big business / tech billionaire complaint to some extent. (Not entirely: Llama is an open-access model, but it is of course created by Meta with the long-term goal of helping the company make money.)

Another possible value of open models is that we can know much more about what's in those models and how they work.  This might address the lack of transparency complaint. 

But another good reason to do this is about our own costs: open-access models running offline are, as I understand it, free to use, so we could feel empowered to try queries that might otherwise burn through too many ‘tokens’.
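To make this concrete, here is a minimal sketch of what querying an open model locally can look like, using the Hugging Face transformers library in Python. The model id is an illustrative assumption (an OLMo instruct model from AllenAI); any openly documented model your hardware can run would work the same way.

```python
# A minimal sketch, assuming the Hugging Face `transformers` library is
# installed and that your machine can hold a ~7B-parameter model in memory.
# The model id below is an illustrative assumption; substitute any open,
# documented model that fits your research.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="allenai/OLMo-2-1124-7B-Instruct",  # assumed example of an open model
)

prompt = "In a sentence or two, how did the word 'awful' shift in meaning between 1700 and 1900?"
output = generator(prompt, max_new_tokens=120)

# The pipeline returns a list of dicts, each containing the generated text.
print(output[0]["generated_text"])
```

Because nothing leaves your machine, there is no per-query charge and no rate limit beyond your own hardware.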

B) Generative AI is constantly changing. It would probably be best if we stopped making very generalized claims along the lines of “AI simply can’t do that.” 


One feature of many papers I saw was an acute awareness that the platforms are constantly evolving. The weird image outputs DALL-E produced a couple of years ago are mostly gone, as my colleague Jenny Kowalski discussed in her presentation. In their place are pretty generic, perfectly average and acceptable images.

Some platforms still have trouble rendering text within images, but others (the latest ChatGPT, for instance) now do it very well.


Generative AI platforms also couldn’t do math very well a couple of years ago. Now they are much better at it (though still not perfect).

Overall, the commercial platforms are very aware that their long-term usefulness to a large swath of users – along with their commercial viability – will be greater if they can do a variety of tasks reasonably well, some involving text generation, others reasoning and data analysis. So, especially since DeepSeek emerged a few months ago, they appear to have been investing in building up those capabilities rather than simply amassing larger and larger datasets.

One task for scholars in the short term might be to try to keep track of what generative AI platforms are doing and how they’re changing. (Ideally, the platforms themselves would document all the changes they make with each version in plain English. But if they’re not doing that, maybe we should be doing it.)


C) There are tools that can help us look inside the training data used by models.

One is called “What’s in My Big Data?” (WIMBD). It looks inside large pre-training corpora such as Dolma, the open dataset AllenAI used to train its OLMo models. You can query specific pieces: is this particular novel in the training data? Instead of speculating or relying on the LibGen catalog, WIMBD lets you get under the hood of generative AI training data (and we can assume that what’s in Dolma is probably at least somewhat similar to what might be in other models’ data).

Unfortunately, WIMBD in its current form will be shutting down soon due to the costs of running it.

Another cool tool people were talking about is Ollama, which allows you to run various gen AI models from the command line on your own machine. You can also configure them to develop answers to queries based specifically on documents and libraries you have on your own hard drive.

This way, your use of GenAI remains “offline” – the companies aren’t taking your data, and you can explore queries that might be closely related to your main research area. 
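For those who want to see what this looks like in practice, here is a minimal sketch in Python using the `ollama` package. It assumes you have already installed Ollama and pulled a model (e.g. with `ollama pull llama3.2` on the command line); the notes file is a hypothetical stand-in for whatever documents you keep on your own hard drive.

```python
# A minimal sketch, assuming Ollama is installed and a model has already been
# pulled locally (e.g. `ollama pull llama3.2`). Everything runs on your own
# machine; no data is sent to a commercial platform.
import ollama

# Hypothetical local file standing in for documents on your own hard drive.
with open("my_research_notes.txt", encoding="utf-8") as f:
    notes = f.read()

response = ollama.chat(
    model="llama3.2",
    messages=[
        {"role": "system", "content": "Answer using only the notes provided."},
        {"role": "user", "content": f"Notes:\n{notes}\n\nQuestion: What themes recur across these notes?"},
    ],
)

print(response["message"]["content"])
```

More elaborate setups can index whole folders of texts, but even this simple approach of pasting your own documents into the prompt keeps the workflow entirely offline.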


Needless to say, if you're running one of these models locally, you don't have to pay for a subscription -- it's free. 

If we approach generative AI in these ways, we might be able to avoid some of the worst pitfalls of the commercial generative AI industry. We might also, hopefully, stop seeing it as a magic black box and start seeing it as a research tool that facilitates certain limited tasks: not a replacement for human expertise, but a supplement to it.