The Problem of Digital Dating, Part I

A significant and unsolved problem in digital resources for medieval and earlier material is how to represent dates or, rather, uncertain date periods. This is a problem that I have addressed in the past, particularly at the 2011 ISAS Conference in Wisconsin, and it has come up again as part of the recent Meeting for COST Action IS1005, 'Medieval Cultures and Technical Resources'. The problem is that we often do not know exactly when something happened: when a manuscript was written, when an artefact was constructed, when a coin was lost. This, of course, is normal, but it becomes a problem when we introduce the computer. Although so-called 'fuzzy logic' has been around for a while now, the fact remains that computers fundamentally are designed for 'clear' answers – the famous digital 'ones and zeros', 'yes or no'. But how does 'early eleventh century' fit into this? Does it come before or after 'somewhere between the years 1000 and 1015'? Does it include the year 999 (let alone the year 1000)? Who decides?

Conventions for Representing the Date of Manuscripts

To make sense of this discussion (and the DigiPal Database), it is important to be familiar with the main convention for dating manuscripts. Scholars normally use the formula 'saec.' or 's.' for the Latin saeculo, meaning 'in the century', followed by the century in question expressed in Roman numerals: the tenth century, for example, is therefore written 'saec. x'. 'Early', 'middle' and 'late' are again represented with Latin abbreviations, 'in.', 'med.' and 'ex.' respectively, and the turn of the century with 's. x/xi' (or equivalent). Finally, the first half of the century is represented with a suprascript '1', the second half with '2', and quarter-centuries with suprascript '1/4', '2/4', '3/4' and '4/4'. Thirds of centuries are sometimes also found ('1/3', '2/3' and '3/3').
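As a rough illustration of how regular this notation is, the formula can be split into its parts without yet assigning any years to it. The sketch below is only that: the names `parse_formula`, `roman_to_int` and `PATTERN` are my own inventions for illustration, and it assumes that suprascripts are typed inline, as they are in this post ('s. xv1/4').

```python
import re

ROMAN = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}

def roman_to_int(numeral):
    """Convert a Roman numeral such as 'xi' or 'ix' to an integer."""
    values = [ROMAN[ch] for ch in numeral.lower()]
    total = 0
    for i, value in enumerate(values):
        # A smaller value before a larger one is subtracted (the i in 'ix').
        if i + 1 < len(values) and value < values[i + 1]:
            total -= value
        else:
            total += value
    return total

# Hypothetical pattern for the convention described above.
PATTERN = re.compile(
    r"^(?:saec\.|s\.)\s*"                       # 'saec.' or 's.'
    r"([ivxlcdm]+)"                             # century in Roman numerals
    r"(?:/([ivxlcdm]+))?"                       # turn of century: 's. x/xi'
    r"\s*(in\.|med\.|ex\.|[1-4]/[34]|[12])?$",  # optional qualifier
    re.IGNORECASE,
)

def parse_formula(text):
    """Split a dating formula into century, second century and qualifier."""
    match = PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognised dating formula: {text!r}")
    first, second, qualifier = match.groups()
    return {
        "century": roman_to_int(first),
        "to_century": roman_to_int(second) if second else None,
        "qualifier": qualifier.lower() if qualifier else None,
    }
```

For example, `parse_formula("s. x/xi")` gives century 10 with a second century of 11, and `parse_formula("s. xv1/4")` gives century 15 with the qualifier '1/4'. Note that this records only what the cataloguer wrote; it deliberately says nothing about which years are meant.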

Why does it matter?

Part I: 'Pre-digital' palaeography

The system described here has worked well for a long time now, although it does raise the question of what we mean by 'early', 'middle' and 'late', and this has sometimes caused problems in the past. One clear example of this is the fierce debate between David Dumville and Kevin Kiernan over what Neil Ker meant when he dated the only surviving copy of Beowulf to 's. x/xi': the critical point being whether or not the years immediately after 1016 were included in this time-span (see References below). Furthermore, informal discussions as part of the COST Action and elsewhere reveal widely varying practices, particularly between different national traditions, where 'early' can mean anything from a ten-year to a fifty-year timespan. In other words, 'saec. xi in.' to some people means 1000×1010 (or 1001×1010), to others 1000×1050 (or 1001×1050, or 1001×1051), and to others something in between. This is clearly a problem when one wishes to combine catalogues online, since one must then decide how to deal with the discrepancies. In principle this is not difficult to solve as long as we know what each cataloguer meant, but it does make automatic union catalogues almost impossible. For future cataloguing, we could also decide on an international standard and advocate its use, and indeed a group in the COST Action, led by Matthew Driscoll of the Arnamagnæan Institute in Copenhagen, is working on just such a proposal. However, there is a much larger problem looming just beneath the surface.

Part II: 'Digital' palaeography

The problem is one of searching and statistics, and specifically, how people are going to use the digital resources being built. To illustrate this problem, let me present two proposals which are currently on the table for recommendation by the COST working group.

First, the MASTER standard (which was subsequently incorporated into the TEI P5 Manuscript Description Module) used the following convention (thanks to Matthew Driscoll for providing these figures):

Formula           Interpretation
Circa 1400        1390×1410
Saec. xv          1400×1499
Saec. xv in.      1400×1415
Saec. xv1/4       1400×1425
Saec. xv1         1400×1450
Saec. xv2/4       1425×1450
Saec. xv med.     1440×1460
Saec. xv3/4       1450×1475
Saec. xv4/4       1475×1499
Saec. xv ex.      1485×1499

At a recent COST workshop in Florence, an alternative was proposed (thanks again to Matthew Driscoll):

Formula           Interpretation
Circa 1400        1390×1410
Saec. xv          1401×1500
Saec. xv1/4       1401×1425
Saec. xv in.      1401×1433
Saec. xv1         1401×1450
Saec. xv2/4       1426×1450
Saec. xv med.     1434×1466
Saec. xv3/4       1451×1475
Saec. xv2         1451×1500
Saec. xv ex.      1467×1500
Saec. xv4/4       1476×1500

In many ways these are arbitrary numbers, and so it may seem almost irrelevant which one is chosen, but there are some critical implications that arise. Let me demonstrate with a third, hypothetical alternative which is almost the same as the Florence proposal but not quite:

Formula           Interpretation
Circa 1400        1390×1410
Saec. xv          1400×1500
Saec. xv1/4       1400×1425
Saec. xv in.      1400×1433
Saec. xv1         1400×1450
Saec. xv2/4       1425×1450
Saec. xv med.     1433×1466
Saec. xv3/4       1450×1475
Saec. xv2         1450×1500
Saec. xv ex.      1466×1500
Saec. xv4/4       1475×1500

These three proposals will behave very differently indeed in an online interface:

  1. If a manuscript is dated 1400, then according to MASTER it will appear in searches for c. 1400, s. xv, s. xv in., s. xv1/4 and s. xv1. According to the Florence proposal, it will appear in searches for c. 1400, s. xiv, s. xiv ex., s. xiv2, and s. xiv4/4. According to our hypothetical proposal it will appear in both the MASTER (fifteenth-century) and the Florence (fourteenth-century) bands, due to the overlap in dates.
  2. Similarly, according to the Florence proposal a manuscript dated to 1430 will appear in searches for both s. xv in. and s. xv2/4 (as well as for s. xv and s. xv1). According to MASTER it will appear in searches for s. xv2/4 (and for s. xv and s. xv1) but not for s. xv in.
  3. If we assume that the date-ranges should be sorted in order of their mid-points, then according to MASTER the fourth quarter of the century (mid-point 1487) comes before the late fifteenth century (mid-point 1492), but in the Florence and hypothetical proposals the order is reversed.
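These differences can be checked mechanically. The following is a minimal sketch, assuming that each band is stored as an inclusive pair of years copied from the three tables above; the names `bands_containing` and `sorted_by_midpoint` are hypothetical and do not belong to MASTER, the TEI or the Florence proposal. Only the fifteenth-century bands are tabulated above, so the fourteenth-century bands mentioned in the first point are not covered.

```python
# Inclusive (start, end) years, copied from the three tables above.
MASTER = {
    "c. 1400": (1390, 1410),
    "s. xv": (1400, 1499),
    "s. xv in.": (1400, 1415),
    "s. xv1/4": (1400, 1425),
    "s. xv1": (1400, 1450),
    "s. xv2/4": (1425, 1450),
    "s. xv med.": (1440, 1460),
    "s. xv3/4": (1450, 1475),
    "s. xv4/4": (1475, 1499),
    "s. xv ex.": (1485, 1499),
}
FLORENCE = {
    "c. 1400": (1390, 1410),
    "s. xv": (1401, 1500),
    "s. xv1/4": (1401, 1425),
    "s. xv in.": (1401, 1433),
    "s. xv1": (1401, 1450),
    "s. xv2/4": (1426, 1450),
    "s. xv med.": (1434, 1466),
    "s. xv3/4": (1451, 1475),
    "s. xv2": (1451, 1500),
    "s. xv ex.": (1467, 1500),
    "s. xv4/4": (1476, 1500),
}
HYPOTHETICAL = {
    "c. 1400": (1390, 1410),
    "s. xv": (1400, 1500),
    "s. xv1/4": (1400, 1425),
    "s. xv in.": (1400, 1433),
    "s. xv1": (1400, 1450),
    "s. xv2/4": (1425, 1450),
    "s. xv med.": (1433, 1466),
    "s. xv3/4": (1450, 1475),
    "s. xv2": (1450, 1500),
    "s. xv ex.": (1466, 1500),
    "s. xv4/4": (1475, 1500),
}

def bands_containing(year, scheme):
    """List every band in the scheme whose range includes the given year."""
    return [label for label, (start, end) in scheme.items()
            if start <= year <= end]

def sorted_by_midpoint(scheme):
    """Order the band labels by the mid-point of their year range."""
    return sorted(scheme, key=lambda label: sum(scheme[label]) / 2)
```

Here `bands_containing(1430, FLORENCE)` includes both 's. xv in.' and 's. xv2/4', whereas `bands_containing(1430, MASTER)` includes 's. xv2/4' but not 's. xv in.'. Likewise `sorted_by_midpoint(MASTER)` places 's. xv4/4' before 's. xv ex.', while the other two schemes place them the other way round.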

There are also some more complex consequences which arise from overlapping timespans:

  1. If we follow the Florence proposal, and I search for all manuscripts dated s. xv2/4, should I also get back all manuscripts dated to the middle of the century? On the one hand this is not what I asked for, but on the other hand any manuscript dated 's. xv med.' could have been written in s. xv2/4, and so I might want to know about it. (Of course this is not unique to the Florence proposal, either, but would manifest itself in different ways in all three.)
  2. Conversely, if I search for s. xv2/4 in the Florence system, then is it really true that I'm not interested in anything dated 1425 or 1451? What's so special about the year 1426 that I don't care about anything that was written even one day before it? Indeed, the same questions apply to the MASTER and hypothetical systems but with different year-boundaries, further emphasising this point. The sharp division between periods is not only arbitrary but is also potentially outright wrong when we remember also that the first of January was often not considered the first day of the year, let alone differences between Julian and Gregorian calendars (the reasons, by the way, why the UK tax year begins on 6 April). So when we say '1425', which calendar are we using? Do we even know?
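The two questions above correspond to different matching rules that a search interface might apply. The sketch below contrasts them using the Florence figures; the function names and the five-year margin are arbitrary assumptions of mine, chosen purely for illustration, and not a recommendation.

```python
# Florence figures from the table above, as inclusive (start, end) years.
XV_2_4 = (1426, 1450)   # s. xv2/4
XV_MED = (1434, 1466)   # s. xv med.

def contains(query, dated):
    """True if the dated range lies wholly inside the query range."""
    return query[0] <= dated[0] and dated[1] <= query[1]

def overlaps(query, dated):
    """True if the two inclusive ranges share at least one year."""
    return query[0] <= dated[1] and dated[0] <= query[1]

def widen(query, margin):
    """Extend a range by a number of years at both ends (a crude 'fuzz')."""
    return (query[0] - margin, query[1] + margin)

# Strict containment excludes 's. xv med.' from a search for 's. xv2/4',
# but an overlap test includes it.
strict_hit = contains(XV_2_4, XV_MED)     # False
overlap_hit = overlaps(XV_2_4, XV_MED)    # True

# A manuscript dated exactly 1425 is missed by the sharp boundary,
# and found only once the query is widened.
dated_1425 = (1425, 1425)
sharp_hit = overlaps(XV_2_4, dated_1425)             # False
fuzzy_hit = overlaps(widen(XV_2_4, 5), dated_1425)   # True
```

Neither rule is obviously the right one: containment answers exactly what was asked, overlap answers what might be relevant, and any margin is as arbitrary as the boundaries it softens.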

Which of these is right? The answer, of course, is that none of them is. If we date a manuscript to the 'early eleventh century' then the point is that we don't know exactly when it was written: if we did then we would say so. It's therefore wrong in many ways to define the dating periods closely like this. In principle, there is no reason why we could not proceed on this basis, storing each time-period in the computer simply as text and not assigning any numbers at all. However, this has other consequences, in particular that we then lose most of our potential for searching, visualising and manipulating results, which is surely why we are putting this material online to begin with. For example, in this approach the dates '1405', 's. xv in.', 'c. 1400', 's. xv', and 's. xv1/4' are all entirely unrelated – to the computer they are no more similar to each other than 's. vi ex.' is – and so we would need five different searches to find manuscripts in these bands, and there would be no way of sorting them or visualising them chronologically relative to other manuscripts. Unless we can radically rethink how we represent our manuscript dates in digital form, then, we must come to some sort of agreement over how long the periods extend and what sort of results we want when we search for manuscripts by date.
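To make the text-only case concrete, here is a small sketch in which dates are stored purely as strings. The shelfmarks and function names are invented for the example.

```python
# Hypothetical records: (shelfmark, date exactly as the cataloguer wrote it).
RECORDS = [
    ("MS A", "1405"),
    ("MS B", "s. xv in."),
    ("MS C", "c. 1400"),
    ("MS D", "s. xv"),
    ("MS E", "s. xv1/4"),
    ("MS F", "s. vi ex."),
]

def search_text(query, records):
    """Return shelfmarks whose date string is identical to the query."""
    return [shelfmark for shelfmark, date in records if date == query]

def sort_as_text(records):
    """Sort by the date string: character order, not chronological order."""
    return sorted(records, key=lambda record: record[1])
```

A search for 's. xv in.' finds only MS B, even though MS A (dated 1405) may well belong with it, and sorting the strings places the sixth-century MS F after the manuscript dated 1405.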

In the next instalment I will raise some more theoretical points which I think may help, including the concept of accuracy and precision and the issue of research questions, as well as some more unusual interfaces. In the meantime, which years do you consider these periods to encompass? Which of the scenarios above would you prefer? Do you see a way out? Please let us know.

References

  • D.N. Dumville, 'The Beowulf Manuscript and How Not to Date It', Medieval English Studies Newsletter [Tokyo] 39 (1998), 21–27.
  • D.N. Dumville, 'Beowulf Come Lately: Some Notes on the Palaeography of the Nowell Codex', Archiv für das Studium der neueren Sprachen und Literaturen 225 (1988), 49–63.
  • K. Kiernan, Beowulf and the Beowulf Manuscript, 2nd ed. (Ann Arbor, MI, 1996).
  • N. Ker, Catalogue of Manuscripts Containing Anglo-Saxon (Oxford, 1957).
