Automated Text Recognition with ChatGPT 4

Re-posted from the Greflinger Digitale Edition Weblog

During a presentation for the Digital Meetup at the Department of Archaeology, Conservation and History at the University of Oslo yesterday, the historian Ian Milligan (University of Waterloo) mentioned briefly the stunning results he got when using ChatGPT for automated text recognition of historical documents. I don’t remember which materials he had used, but I was keen to try it out and see how the recognition performs compared to Transkribus public models‘ since this is my text recognition platform of choice.

There were a couple of obstacles, though. Firstly, I tried the University of Oslo’s ChatGPT, which is currently the GPT-3.5 Turbo and can process data that I, as a researcher employed by UiO, cannot share with third parties unless they are vetted and approved by the IT Department. This version, though, does not allow for the upload of images or accessing image sources via a URL. I tried my private, free subscription to Open AI’s ChatGPT and GPT-3.5 model and was equally unsuccessful. What was I doing wrong?

In my confusion, I turned to my social network on Mastodon to ask for advice: My colleague Christof Schöch from the University of Trier responded and pointed me towards a test he had performed with a historical image with some text on where ChatGPT was able to recognize text and transcribe it correctly. However, there was no info on which version was used or how this was done. I began doubting my ability to see an upload button or any information on how I would get an image into ChatGPT. A quick Google Search pointed me mainly toward business applications which used the Open AI API, not something I felt I wanted to dive into right now. Then, I guess it was on Reddit that someone pointed out that there’s an image upload function in ChatGPT-4 with the Plus subscription. Okay, so I first checked the UiO version of ChatGPT-4, where this feature was unavailable. The Open AI platform led me to a subscription, and I decided to try out Plus since it has access to Beta features and other specifications. 25$ later, finally, I had access. After some tweaking of settings, I managed to get the little image upload button to show up and voilà!

First test: Letter from Christopher Hansteen, 1828

I decided to go all in and test the automated text recognition abilities of ChatGPT-4 with a letter by Norwegian astronomer Christopher Hansteen to his wife.1 The letter is part of the collection of the private correspondence held by the National Library of Norway and can be seen and downloaded from their website. Following Christopher’s example, I only prompted ChatGPT with the following image and instructions:

Digital image of a letter written by Christopher Hansteen to his wife Johanne Christine Andrea Hansteen on October 18th, 1828.

Source: National Library of Norway, Brevs. 291 Brev fra Hansteen, Christopher (1784-1873) til Hansteen, Johanne Cathrine Andrea (1787-1840) | url = http://urn.nb.no/URN:NBN:no-nb_digimanus_175851

What text do you recognize in the image?

ChatGPT answered:

The image you’ve provided appears to be a handwritten manuscript or letter, dated “Tübingen 15 October 1818.” The writing is in cursive and may be in German, given the date and location mentioned. However, due to the complexity of the handwriting and the possibility of old Germanic script, I cannot provide a full and accurate transcription of the content.

If you need information or interpretation regarding specific parts of the document, please point them out, and I will do my best to assist you.

Hmm, so that was anticlimactic. I tried with a slightly better prompt:

The image that I provided is a letter from Christopher Hansteen, astronomer, to his wife. Please transcribe the first to lines of text, below the date.

This time, the response was more promising:

I apologize for the oversight earlier. Let me take another look.

From the image you provided and the context you’ve given, the first two lines below the date appear to read:

“Geliebte Christe! Ich schreibe Dir in aller Eile einen …”

However, it’s worth noting that handwritten manuscripts, especially in cursive and from historical periods, can be challenging to transcribe accurately without expertise in the specific handwriting style and language. If this transcription is crucial, I recommend consulting with a historian or archivist who specializes in 19th-century German manuscripts.

Still, I couldn’t get more out of it without investing in significantly better prompt engineering and better knowledge of the text at hand!

Second test: Ethica Complementoria, 1674

So, instead, I turned to more familiar materials: the version of the Ethica Complementoria print I currently work on, the German print from 1674, printed in Copenhagen. I have already transcribed and proofread the text several times and trained and improved an HTR model for it in Transkribus. Here, I could use the results from the latest version of my model and the validation set to see whether ChatGPT-4 performs worse/equally well/better than my model, ethica1674_v2.

The State and University Library Hamburg, Germany, provides the image. It is page 46 of the print and shows a combination of German (printed in Fraktur) and Latin (printed in Roman cursive type).

I uploaded the image and gave ChatGPT the following context and instructions:

Ok, let’s try another image. The image I provide is of a 1674 print of the Ethica Complementoria. It’s in early new high German. The font is a Fraktur. Please take your time and transcribe the first 10 lines of text. Here is the image:

ChatGPT provided the output – funnily enough – as code (i.e. in a code box, in plain text):

Certainly. Given the characteristics you mentioned (early new high German, Fraktur font), I’ll do my best to transcribe the first 10 lines from the image you provided:

Ethica

und seyn will, hierinn nicht bewandert
wird er nicht unbillig veringert
wie jener Professor, welcher von einem
hochgelahrten Fürsten unterthänig
bat, die panaceam contra Peste ihm
gnädigst zu offenbahren; denn der selbige
Prinz ließ andeuten, daß hiero-
glyphicum ex Metamorphosi von Æsculapio
in Draconem converso, wie derselbe habe
die Pest curiret; also sollte es auch an-

Please note that while I’ve done my best to provide an accurate transcription, the nuances of early modern German and the Fraktur font can be tricky. It’s always a good idea to cross-check with an expert in historical German texts for a definitive transcription.

Okay, so that was impressive! There are a couple of mistakes, mainly small things like wrongly recognized characters and a couple of entirely made-up words, like “hochgelahrten” instead of “hocherleuchten” and the apparent normalization of historical typography to fit into a modern script, but impressive nonetheless.

I wondered whether I could improve its text recognition abilities by providing a good example of a properly-recognized text based on the image. I copied the recognized text from the Transkribus validation set (so it’s NOT 100% accurate, but almost) and gave it this prompt:

Look at your transcription and compare it to the transcription done with an expert tool, Transkribus. I have provided it below. Can you improve your transcription by learning from its transcription?

Ethica
und ſeyn will / hierinn nicht bewandert
iſt / wird er nicht unbillich verringert /
wie jener Profeſſor, welcher von einem
hocherleuchten Fuͤrſten unterthånig
bate / die Panaceam contra Peſtem ihm
gnådigſt zu offenbahren / dem der loͤb-
liche Printz ließ andeuten / das hiero-
glyphicum ex Metamorphoſi von Æſculapio
in Draconem converſo, wie derſelbe habe
die Peſt curiret / alſo ſolte er es auch an-

ChatGPT delivered and gave the following output:

Given the expert transcription from Transkribus, here’s an improved version of the first 10 lines:

Ethica
und ſeyn will / hierinn nicht bewandert
iſt / wird er nicht unbillich verringert /
wie jener Profeſſor, welcher von einem
hocherleuchten Fuͤrſten unterthånig
bate / die Panaceam contra Peſtem ihm
gnådigſt zu offenbahren / dem der loͤb-
liche Printz ließ andeuten / das hiero-
glyphicum ex Metamorphoſi von Æſculapio
in Draconem converſo, wie derſelbe habe
die Peſt curiret / alſo ſolte er es auch an-

Thank you for sharing the transcription from Transkribus. It’s always beneficial to compare and refine transcriptions for historical documents.

I realized my prompt was a bit misleading, so I reformulated what I actually wanted it to do:

Now that you know how an expert transcription looks like, can you transcribe lines 10 to 20 from the previous image in the same way as the expert?

Obediently, ChatGPT delivered:

Certainly. Using the provided expert transcription as a guide, I’ll transcribe lines 10 to 20 from the image:

ſchlagen / das iſt : Er ſolte in ſeiner
Kunſt wohl geuͤbt und erfahren ſeyn/
ſeilig ſtudiren / dabei ſtilſchweigen und
wohl nachdencken / ſapienti ſat dicium. Sie
lehren / iſt gut predigen. Es giebet
viel / mancher fuͤr einen groſſen Mann
auſſ / mit prächtigen Kleidern/ und hoch
trabenden Worten herprangen/ und
iſt doch ein muͤßiggänger Ignorant: phi-
loſophus nihil debet eſſe, qua eſſe ipſum

This transcription takes into account the nuances of the Fraktur font and the early new high German language.

Now I was really impressed! This is a stunningly accurate transcription of the page, especially given the change of font and language in the short passage. It has neatly applied the special characters and punctuation, too.

In my excitement, I posted a thread of my experiment to Mastodon, which, I guess, is the closest I can get to “going viral” (at the point of writing, I have 27 boosts/replies) given my network and reach. Christopher Pollin suggested chaining the text recognition with a transformation into XML/TEI like he has done using proper prompt engineering and thus creating a workflow from image to marked-up text using ChatGPT-4. I haven’t tried this part yet, but I will investigate it.

In Conclusion

Unexpectedly, ChatGPT-4 performed on a task I did not anticipate being able to do even, quite outstandingly. I guess that with sound prompt engineering, we can achieve a high enough performance for most use cases: searches in historical documents. It will, additionally, make the work of the textual scholar easier by taking away time-consuming and rather dull tasks like manual transcription and encoding. It is, however, questionable how it scales up. I used one page of text and let it transcribe ten lines each time.

I will contact the University of Oslo ChatGPT team and ask if and when they plan on integrating the image upload function into their service. This could be an option for researchers who need historical documents transcribed and post-processed (like: translated) quickly, without needing to set up specialized software or training HTR models.

1 Note: ChatGPTs transcription and subsequent attempt to read the letter are completely bollocks. The first line reads “Tobolsk 18. Oktober 1828”. The text is not written in German, but in Dano-Norwegian and says: “Tak mangfoldige Gange for Dine 2 trøstende kjerlige Breve. Ja vist har jeg været mere uroelig, end jeg har ladet mig mærke med;” Cited after the transcription by Kari Høgvold, accessible on the University of Oslo Library’s website as a PDF – What ChatGPT probably does here is that it assumes a letter written by a husband to his wife in that period of time will have a beginning like “Geliebte Christe! Ich schreibe Dir in aller Eile einen […]”. By the way: Hansteen’s wife’s name is Johanne Cathrine Andrea.