Describing Handwriting, Part IV: Recapitulation and Formal Model

In the previous post, I promised to give more concrete examples of how this system might work in practice. Before doing so, however, I first want to recap a bit and try to formalise the discussion so far. I was brought up with UML (Unified Modelling Language), a formal notation for expressing entities and the relationships between them in a way that a computer can understand. It is very technical and not at all easy to follow unless you’re already familiar with it, but I give it here partly to document what we are doing and partly for those of you who already know the notation. The diagram here doesn’t follow the rules of UML to the letter but should at least be close enough to capture the idea.

My first version of a UML Class Diagram for the conceptual model so far is as follows:


[Edit, 17 October 2011: The UML diagram has been altered from the original by associating Script, Scribe and Hand directly with Component as a way of capturing elements of style which are common across multiple characters.]

What does this mean?

  • Starting at the top, it states that a GRAPHEME is associated with one or more CHARACTERS (for ‘Character’ see below). A CHARACTER is made up of any number of COMPONENTS. A COMPONENT in turn can be found in any number of CHARACTERS and can have one or more FEATURES or indeed any number of further COMPONENTS.
  • A CHARACTER can also be manifested in one or more ALLOGRAPHS, and a set of ALLOGRAPHS together makes up a SCRIPT. ALLOGRAPHS themselves can have COMPONENTS which have FEATURES, but ALLOGRAPHS also have GENERAL FEATURES, which are the aspects of ‘style’ discussed in Part III.
  • Each ALLOGRAPH can be manifested in any number of IDIOGRAPHS (which in turn have COMPONENTS and GENERAL FEATURES). A set of IDIOGRAPHS makes up the practice of a SCRIBE.
  • Each IDIOGRAPH can appear on the PAGE as a GRAPH; GRAPHS have the usual set of GENERAL FEATURES and COMPONENTS, as well as a set of coordinates. The set of GRAPHS makes up a SCRIBAL HAND.
  • SCRIBAL HANDS are written by exactly one SCRIBE (but a SCRIBE can write many SCRIBAL HANDS); a SCRIBAL HAND may also be written in one or more SCRIPTS and may use one or more ALPHABETS.
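For readers who find code easier to follow than UML, the relationships listed above can be sketched as Python dataclasses. This is a minimal sketch under my own assumptions, not the diagram itself: all class and field names are hypothetical, ‘one or more’ is enforced with a simple check in `__post_init__`, ‘any number’ is a plain list, the 2011 change is reflected by giving SCRIPT, SCRIBE and SCRIBAL HAND their own COMPONENTS, and the layout of a GRAPH’s coordinates is invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# A sketch of the class diagram as Python dataclasses. All class and field
# names are hypothetical; cardinalities follow the prose description.


@dataclass
class Feature:
    name: str


@dataclass
class Component:
    name: str
    features: list[Feature] = field(default_factory=list)
    subcomponents: list[Component] = field(default_factory=list)  # further COMPONENTS


@dataclass
class Character:
    name: str
    components: list[Component] = field(default_factory=list)  # any number, possibly none


@dataclass
class Grapheme:
    name: str
    characters: list[Character]  # one or more

    def __post_init__(self) -> None:
        if not self.characters:
            raise ValueError("a GRAPHEME needs at least one CHARACTER")


@dataclass
class Allograph:
    character: Character  # the CHARACTER this form manifests
    components: list[Component] = field(default_factory=list)
    general_features: list[Feature] = field(default_factory=list)  # aspects of 'style'


@dataclass
class Idiograph:
    allograph: Allograph  # the ALLOGRAPH this scribe's form manifests
    components: list[Component] = field(default_factory=list)
    general_features: list[Feature] = field(default_factory=list)


@dataclass
class Graph:
    idiograph: Idiograph
    coordinates: tuple[int, int, int, int]  # assumed layout: x, y, width, height on the page
    components: list[Component] = field(default_factory=list)
    general_features: list[Feature] = field(default_factory=list)


@dataclass
class Script:
    name: str
    allographs: list[Allograph] = field(default_factory=list)
    components: list[Component] = field(default_factory=list)  # per the 2011 edit


@dataclass
class Scribe:
    name: str
    idiographs: list[Idiograph] = field(default_factory=list)
    components: list[Component] = field(default_factory=list)  # per the 2011 edit


@dataclass
class ScribalHand:
    scribe: Scribe  # exactly one SCRIBE
    scripts: list[Script]  # one or more SCRIPTS
    graphs: list[Graph] = field(default_factory=list)
    components: list[Component] = field(default_factory=list)  # per the 2011 edit

    def __post_init__(self) -> None:
        if not self.scripts:
            raise ValueError("a SCRIBAL HAND needs at least one SCRIPT")
```

Because `Character.components` defaults to an empty list, a punctus with no components is allowed here; if even a punctus is taken to have a ‘dot’ COMPONENT, the same kind of `__post_init__` check used for `Grapheme` would enforce ‘one or more’ instead. ALPHABET and the many-to-many link from COMPONENT back to CHARACTER are left out for brevity.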

Conceptual Questions

As far as I can see, this discussion raises the following conceptual questions:

  • Is GRAPH – IDIOGRAPH – ALLOGRAPH – CHARACTER – GRAPHEME a simple association or one of specialisation? I can see arguments for both.
  • Is SCRIBE the correct term here? Strictly the SCRIBE is a person, whereas the set of IDIOGRAPHS makes up a scribal practice.
  • Based on the discussion of Style in Part III, shouldn’t there be relationships from SCRIBE and SCRIBAL HAND directly to GENERAL STYLE (or some other entity)?
  • Are the cardinalities correct? Can a CHARACTER really have no COMPONENTS? I think so, at least in the abstract — what are the components of a punctus, for example? Or should ‘dot’ be considered a COMPONENT, in which case even a punctus is covered?
  • The relationship names are not in place, partly because I’m not entirely confident about the terminology I used previously. In what sense is an allograph related to an idiograph? Do we have a terminology for this? It feels to me very analogous to Group 1 entities in FRBR but I’m not convinced that the terms apply directly.
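The first question can be made more concrete by writing the link between ALLOGRAPH and CHARACTER both ways. The following is a minimal Python sketch; the class names and the example of long and round s are my own illustration, not part of the model.

```python
from dataclasses import dataclass


# Option 1: specialisation ("is a"). An allograph IS a character,
# refined with a description of its form.
@dataclass
class SpecialisedCharacter:
    name: str


@dataclass
class SpecialisedAllograph(SpecialisedCharacter):
    form: str


# Option 2: association ("manifests"). An allograph is a separate
# entity that REFERS to one shared character.
@dataclass
class AssociatedCharacter:
    name: str


@dataclass
class AssociatedAllograph:
    character: AssociatedCharacter
    form: str


# Specialisation: each allograph carries its own copy of the character's
# attributes, so 'these are both s' can only be inferred by comparing names.
long_s_spec = SpecialisedAllograph(name="s", form="long")
round_s_spec = SpecialisedAllograph(name="s", form="round")

# Association: both allographs point at the same character object.
s = AssociatedCharacter(name="s")
long_s = AssociatedAllograph(character=s, form="long")
round_s = AssociatedAllograph(character=s, form="round")
```

With specialisation every ALLOGRAPH is itself a CHARACTER (`isinstance(long_s_spec, SpecialisedCharacter)` is true), which suits the view that an allograph is simply a more specific kind of character. With association the shared CHARACTER is an object in its own right (`long_s.character is round_s.character`), which makes it easier to record information about the character once and to ask which allographs manifest it.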

Terminology (again): When is a Grapheme not a Grapheme? When it’s a Character?

I have also come to appreciate that graphemes are purely abstract and have no physical manifestation; they therefore cannot be described, or even considered, in a palaeographical context, and so we need another term. A useful source here is the Glossary of Unicode Terms and Section 4 in Chapter 3 of the Unicode Standard, since Unicode has been dealing with these concepts for a long time and with much more precision than palaeographers need. Here Unicode’s Character, or perhaps Abstract Character, seems more appropriate.

Note that, according to Sense (1) in the Unicode Standard, ‘character’ is ‘[t]he smallest component of written language that has semantic value’: it follows, therefore, that ‘A’ and ‘a’ are instances of the same character. This is not the usual definition in computer science, however, and doesn’t even seem to be the definition usually applied within Unicode itself, which seems closer to Sense (3), ‘The basic unit of encoding for the Unicode character encoding’ (sic). Presumably this is the sense meant when ‘LATIN CAPITAL LETTER A’ is described as a different character from ‘LATIN SMALL LETTER A’. Sense (3) does seem to work here, since in palaeography we must distinguish between minuscule and majuscule forms.

Unicode’s ‘Letter’ is not strictly appropriate: (a) because it excludes punctuation and other symbols such as the Tironian nota, and (b) because it is not clear to me, at least, whether it can include a visual form (what exactly is the ‘informative property’ of a character?). I therefore stick with ‘character’ for the time being, but am very open to suggestions.

In the next post, I will continue towards a concrete example, starting with lists of all the characters, components, features and some allographs of English Vernacular minuscule.
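The two senses of ‘character’ and the status of the Tironian nota can be checked against the Unicode Character Database. The following minimal sketch uses Python’s standard `unicodedata` module. Taking the general category (Lu, Ll, Po and so on) as a proxy for Unicode’s ‘Letter’, and using FULL STOP as a stand-in for the punctus, are my assumptions rather than anything the Standard prescribes.

```python
from __future__ import annotations

import unicodedata


def describe(ch: str) -> tuple[str, str, str]:
    """Return the code point, Unicode name and general category of a character."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch)


def is_unicode_letter(ch: str) -> bool:
    """Treat the general categories starting with 'L' (Lu, Ll, Lt, Lm, Lo) as letters."""
    return unicodedata.category(ch).startswith("L")


# FULL STOP stands in for the punctus (an assumption); U+204A is the Tironian et.
for ch in ["A", "a", ".", "\u204A"]:
    code, name, category = describe(ch)
    print(code, name, category, "letter" if is_unicode_letter(ch) else "not a letter")

# Sense (3): 'A' and 'a' are distinct encoded characters ...
print("A" == "a")  # False
# ... although case folding relates them, loosely echoing Sense (1).
print("A".casefold() == "a".casefold())  # True
```

Run under Python 3, this should report `Lu` and `Ll` for the two forms of A, and `Po` (punctuation) for both the full stop and U+204A TIRONIAN SIGN ET, so neither of the latter counts as a letter under this proxy.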

About Peter A. Stokes

Senior Lecturer in the Department of Digital Humanities at King's College London. After Honours degrees in Classics and English Literature and in Computer Engineering, Peter Stokes completed a PhD at Cambridge on English Vernacular minuscule ca 990-ca 1035. He was then Research Associate from 2005 to 2007 on the LangScape project of Anglo-Saxon boundary-clauses at CCH before receiving a Leverhulme Early Career Fellowship in Palaeography at the Department of Anglo-Saxon, Norse and Celtic in Cambridge, where he developed new methods of quantitative and computer-based palaeography. He then returned to CCH to work on the Anglo-Saxon Cluster and Electronic Sawyer projects before being awarded a major research grant from the European Research Council for his Digital Resource for Palaeography, Manuscript Studies and Diplomatic. He has spoken at conferences on name-studies, lexicography, Anglo-Saxon charters, image-processing and palaeography. He has been lecturing in palaeography and codicology at the University of Cambridge since 2004, in digital publishing at the Institute of English Studies since 2010, in material culture of the book at King's College London in 2011, and formerly in medieval history at the University of Leicester in 2009/10.

4 Responses to Describing Handwriting, Part IV: Recapitulation and Formal Model

  1. Susana Tavares Pedro says:

    I completely agree that the term “grapheme” should not be used. The Portuguese linguist António Emiliano wrote an article called “Issues in the Typographic Representation of Medieval Primary Sources” that is still forthcoming but available online (in galley proof with minor corrections) at http://www.fcsh.unl.pt/docentes/aemiliano/documentos_diversos/EMILIANO2011_web.pdf where he discusses the use of this and other concepts, such as letter, character, glyph and graph.
    I think the article provides a very useful background for the definition of a set of concepts and terms that are a prerequisite for the task of describing all the features that one can single out from actual handwriting.

  2. Paul Caton says:

    Some thoughts on your Part IV Conceptual Questions:

    Q. Is GRAPH – IDIOGRAPH – ALLOGRAPH – CHARACTER – GRAPHEME a simple association or one of specialisation?
    I think the arguments are stronger for association than for specialisation. (cf. my thoughts below about the relation between ALLOGRAPH and IDIOGRAPH)

    Q. Is SCRIBE the correct term here? Strictly the SCRIBE is a person, whereas the set of IDIOGRAPHS make up a scribal practice.
    I prefer to think in terms of a PERSON, among whose list of occupations (and often it’s the only one we know) is “scribe”. This is one of those things that is easier to model in an XML file than in a relational database.

    Q. Are the cardinalities correct? Can a CHARACTER really have no COMPONENTS? I think so, at least in the abstract — what are the components of a punctus, for example? Or should ‘dot’ be considered a COMPONENT, in which case even a punctus is covered?
    Here I would go with the latter suggestion. Conceptually, at the highest level of abstraction in the model is a completely abstract unit that has no shape and no form: nothing except the fact of its not being some other unit. We have (properly, I think) rejected the term GRAPHEME for such a unit, and LETTER is too narrow in scope, so I’m going to call it ONTOGRAPH for now. One level ‘down’ in the hierarchy is CHARACTER, where the ONTOGRAPH is conceived as an ideal shape, and that shape comes from a particular spatial arrangement of one or more COMPONENTs. So you cannot have a CHARACTER without at least one COMPONENT.

    Q. In what sense is an allograph related to an idiograph? Do we have a terminology for this? It feels to me very analogous to Group 1 entities in FRBR but I’m not convinced that the terms apply directly.

    You’re right to think of the FRBR Group 1 Entity Type case, because it has the same head-spinning chicken-and-egg dynamic. The WORK -> EXPRESSION -> MANIFESTATION -> ITEM hierarchy seems intuitive to us because it seems to mirror the sequence of production that begins with an idea in the mind of the author and ends with a book in the hands of a reader. And the dependency relations seem plausibly to follow the same sequence, i.e. no EXPRESSION is possible without a prior existing WORK, etc. However, if you ask “How do we know there’s a WORK called Moby Dick?” the answer would be “Because there’s an EXPRESSION of it”; to the question “How do we know there’s an EXPRESSION of Moby Dick?” the answer would be “Because there’s a MANIFESTATION of it”; and to the question “How do we know there’s a MANIFESTATION of Moby Dick?” the answer would be “Because I’m holding an ITEM of it in my hand!” Looked at this way, the chain of dependency is reversed: no MANIFESTATION without an ITEM, no EXPRESSION without a MANIFESTATION, no WORK without an EXPRESSION.

    And something similar pertains to ALLOGRAPH and IDIOGRAPH. If you think of it from the point of view of a scribe being taught by a master, the ALLOGRAPH seems to have priority as the model without which there could be no IDIOGRAPH. But if we ask “where does an ALLOGRAPH come from?”, then it seems to depend on IDIOGRAPHs, which in turn depend on GRAPHS.

    This is why it is so hard to specify the relations between these entities.

    Paul.

  3. This post is referred to on the following web page: AAA – ΑΔΛ : actualité paléographique et ontologie des formes alphabétiques « Paléographie médiévale

  4. This post is referred to on the following web page: Modélisation des signes graphiques (1) « Paléographie médiévale
