Further Details about the Vacancy

The vacancy for a Django/Python web developer closes soon. I have had some enquiries about the exact work that this person would be doing, so I thought it might be useful to give my own very personal thoughts about some of the things we want them to do. Applicants should of course focus on the formal Job Description and Person Specification in the application pack on the King's HR Website, as this is what will be used for selecting candidates, but I hope that the informal discussion here will also be useful and will encourage you to apply.

General Points

First, it's worth emphasising that the European Research Council, who fund this, are very keen on 'thinking big'. They have put up a lot of money and they want a lot in return: the goal of this project really is to transform the field, setting the standard for the next ten years or so. This is a lot of responsibility, but it's also an exciting opportunity, and it brings with it the time and resources to do something really well, something which could potentially change things for a lot of people.

Be sure to read all pages of the About section carefully. You will see that the idea of DigiPal is to present material about handwriting in new ways, combining both verbal description and visual images to allow searching, manipulation, and so on. So I want to be able to ask questions like 'show me images of all letters in manuscripts attributed to Canterbury which have ascenders [i.e. the parts of letters that go above the line in b, d, h, l etc.] so that I can see how Canterbury scribes wrote them, and let me compare those to ones from Worcester', and so on. The developer would be writing the software to allow all of this to happen.
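As a rough illustration of that kind of question, here is a minimal sketch in plain Python. The record fields (`manuscript`, `place`, `letter`, `image`) and the sample data are invented for the example and are not the actual DigiPal model; the real thing would be a Django query against the database.

```python
from dataclasses import dataclass

# Letters with ascenders, following the examples given above (b, d, h, l);
# the list is illustrative and would need extending.
ASCENDER_LETTERS = {"b", "d", "h", "l"}


@dataclass(frozen=True)
class LetterImage:
    """One annotated letter cut from a manuscript image (hypothetical fields)."""
    manuscript: str
    place: str   # place the manuscript is attributed to
    letter: str
    image: str   # identifier of the cropped image


def letters_with_ascenders(records, place):
    """Return the records attributed to `place` whose letter has an ascender."""
    return [r for r in records if r.place == place and r.letter in ASCENDER_LETTERS]


# Invented sample data, for illustration only.
SAMPLE = [
    LetterImage("MS A", "Canterbury", "b", "a-b-1.jp2"),
    LetterImage("MS A", "Canterbury", "a", "a-a-1.jp2"),
    LetterImage("MS B", "Worcester", "d", "b-d-1.jp2"),
    LetterImage("MS B", "Worcester", "h", "b-h-1.jp2"),
]

# The two result sets a user would want to see side by side.
canterbury = letters_with_ascenders(SAMPLE, "Canterbury")
worcester = letters_with_ascenders(SAMPLE, "Worcester")
```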

Two years full-time is quite a lot, and the developer will be dedicated to the project, so there shouldn't be any need to rush; rather, there should be plenty of opportunity to plan carefully, implement properly, document properly and so on. In fact, the project is being used as a flagship for how projects should be run in future. On the other hand, there are pressing needs in terms of getting a usable interface running so that the researchers can do their work, so incremental development will be needed.

The person would be the lead developer and would do most of the work in practice, but there would still be contributions from the existing project team. Specifically, one or two people will still be helping with the interface design and usability testing, one person on the mapping, and probably still one other doing some of the development as required.

One of the goals is to have a generalised framework, and the funding body are quite keen on that and have put up quite a lot of money for it. Although the focus needs to be on delivering a functional product, there is plenty of scope for imagination and things that aren't necessarily crucial to the main work on eleventh-century script but which would have clear benefit to others down the line.

Architecture and Data Model

DigiPal has a fairly standard PostgreSQL + Django + jQuery/JavaScript architecture (including OpenLayers) and a JPEG2000 image server. The relational database model is largely in place, so most of the work in practice will be Django development, plus some JavaScript.

The existing model is set up and operating for handwriting from England in the eleventh century, which is essentially the same alphabet we use today (for an extract see Describing Handwriting Part IV). However, we also need to extend the framework to allow other forms of writing. We already have two PhD students planning to use it, one for 12th-century Latin from Scandinavia and one for 15th-century Hebrew. We hope also to get additional funding to extend it for cuneiform tablets and possibly also Greek inscriptions. This suggests a modular design which would in turn require the developer to redesign what we have now.

The existing data model doesn't follow any current standard (simply because no standard comes even close to what we want to describe). However, we would like users to be able to download data exported to standard form, presumably some form of RDF and perhaps also TEI. (TEI isn't really appropriate but it's what a lot of our users will probably want.) I'd also be extremely happy to have a linked data web service of some sort.
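To give a flavour of what an export to a standard form might involve, here is a minimal sketch that writes statements out as N-Triples, one of the simplest RDF serialisations. It takes as its example the kind of statement the project records (a scribe habitually writing a letter with some feature). The `example.org` namespace, the `hasFeature` property and the way a scribe's letter is turned into a URI are all assumptions made up for the illustration, not an agreed vocabulary.

```python
from urllib.parse import quote

BASE = "http://example.org/digipal/"  # hypothetical namespace


def slug(text):
    """Turn a label such as 'long tail' into a URI-safe path segment."""
    return quote(text.strip().lower().replace(" ", "-"), safe="")


def to_ntriples(statements):
    """Serialise (scribe, letter, feature) tuples as N-Triples lines.

    Each scribe's letter becomes a subject resource, so the three-part
    statement fits RDF's subject/predicate/object shape. This is one
    possible modelling among several.
    """
    lines = []
    for scribe, letter, feature in statements:
        subject = f"<{BASE}scribe/{slug(scribe)}/letter/{slug(letter)}>"
        predicate = f"<{BASE}vocab/hasFeature>"
        obj = f"<{BASE}feature/{slug(feature)}>"
        lines.append(f"{subject} {predicate} {obj} .")
    return "\n".join(lines)


export = to_ntriples([("Scribe 123", "y", "long tail")])
```

A real export would more likely use an established RDF library and a properly designed vocabulary; the point here is only that the data can be flattened into statements of this shape.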

Data-Entry Interface

There will be a 'data-entry'/admin part and a 'public' part of the web interface. Both parts need to be designed carefully. The 'core' material of eleventh-century script will have about 1200 samples of handwriting. About half of these will have images, so we'll need to annotate the images and describe the letters we've annotated. For all 1200 we'll also need to describe the forms of letters which the scribes habitually wrote, though without images. For example, Scribe 123 might habitually write y with a long tail and hooked top, and so two scribe-letter-feature triples would need to be recorded for this ('Scribe 123 writes letter y with long tail', 'Scribe 123 writes letter y with hooked top'). We can therefore reasonably expect to have to enter somewhere in the region of 70,000+ scribe-letter-feature triples. This means that the data entry needs to be very quick and responsive: just a two-second wait per data-point will result in over a week of lost productivity throughout the life of the project!
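The sketch below shows the two example triples as simple Python records and works through the arithmetic behind the productivity estimate. The class and function names are hypothetical, and the length of a working week is not stated above, so treat the "about a week" reading as an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScribeLetterFeature:
    """One 'scribe writes letter with feature' statement (hypothetical class)."""
    scribe: str
    letter: str
    feature: str


# The two triples from the Scribe 123 example.
EXAMPLE = [
    ScribeLetterFeature("Scribe 123", "y", "long tail"),
    ScribeLetterFeature("Scribe 123", "y", "hooked top"),
]


def lost_hours(entries, wait_seconds):
    """Total time spent waiting, in hours, for a given number of entries."""
    return entries * wait_seconds / 3600


# 70,000 entries at 2 seconds each is 140,000 seconds, just under 39 hours.
hours = lost_hours(70_000, 2)
```

At 70,000 entries the total is a little under 39 hours, which is roughly a full working week if one assumes a week of 35 to 40 hours; any entries beyond 70,000 push it further.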

The data-entry end also needs to function offline, as researchers will need to work in libraries which won't always have internet access. We can safely assume that people won't be offline for more than a couple of days at a time, so presumably this will involve some sort of batch update. The existing system doesn't allow for this at all.
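One possible shape for such a batch update is sketched below: records entered offline are held in a local queue and sent together once a connection is available. Everything here is an assumption for illustration, including the `OfflineQueue` class and the `send` callable, which stands in for whatever would actually upload the batch to the server.

```python
import json
import uuid


class OfflineQueue:
    """Holds records entered offline until they can be sent in one batch."""

    def __init__(self):
        self.pending = []

    def add(self, scribe, letter, feature):
        """Queue one record, giving it a client-side ID so that a resent
        batch can be recognised as a duplicate by the receiving end."""
        record = {
            "id": str(uuid.uuid4()),
            "scribe": scribe,
            "letter": letter,
            "feature": feature,
        }
        self.pending.append(record)
        return record["id"]

    def sync(self, send):
        """Send all pending records as one JSON batch.

        `send` is a hypothetical callable that uploads the payload and
        raises an exception on failure; the queue is cleared only if it
        returns normally, so nothing is lost when the upload fails.
        """
        if not self.pending:
            return 0
        batch = list(self.pending)
        send(json.dumps(batch))
        self.pending.clear()
        return len(batch)
```

In a real system the queue would also need to be saved to disk between sessions and the server would need a policy for conflicting edits; both are left out of this sketch.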

Public Interface

The complex data model and unusual content raise all sorts of questions about interface design which the developer would also be involved in. For example, there is obvious scope for faceted browsing, mapping, timelines and so on ('Give me a map plotting all scribes who habitually wrote the letter g in this particular way' and so on). I'm sure there are other more creative things that could be done here as well, and I hope the person appointed would have lots of ideas about this.
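For anyone unfamiliar with faceted browsing, the idea can be sketched in a few lines of Python: filter the records by the values the user has selected, then count the remaining records under each value of another facet (here, place, which is what a map would plot). The records, field names and feature labels are invented for the example.

```python
from collections import Counter

# Invented sample data, for illustration only.
RECORDS = [
    {"scribe": "Scribe 1", "place": "Canterbury", "letter": "g", "feature": "open tail"},
    {"scribe": "Scribe 2", "place": "Worcester", "letter": "g", "feature": "open tail"},
    {"scribe": "Scribe 3", "place": "Canterbury", "letter": "g", "feature": "closed tail"},
    {"scribe": "Scribe 4", "place": "Canterbury", "letter": "y", "feature": "long tail"},
]


def filter_records(records, **selected):
    """Keep only the records matching every selected facet value."""
    return [r for r in records if all(r.get(k) == v for k, v in selected.items())]


def facet_counts(records, facet):
    """Count how many records fall under each value of the given facet."""
    return Counter(r[facet] for r in records)


# 'All scribes who habitually wrote the letter g in this particular way',
# grouped by place so that the result could be plotted on a map.
matches = filter_records(RECORDS, letter="g", feature="open tail")
by_place = facet_counts(matches, "place")
```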

Regarding the front end, we've built in quite a lot of scope for user testing, focus groups and so on, so that will need to be considered in the development schedule. Again, we need incremental development with regular testing, and ideally the developer would also play a role in the user testing itself (though we do have specialists for this).

As well as all the searching and manipulating, we'd also want users to be able to create accounts and then log in to save searches and so on. They should then be able to make those searches publicly visible if they want and with stable URLs, so they can refer to them from elsewhere as supporting evidence to their arguments.

Although not emphasised here, there is also a real opportunity for outreach, bringing this material beyond scholars and to the wider public. Again, the appointed person would be responsible for any development that is done here and ideally would contribute ideas as well.

Those are my thoughts at the moment. Of course it's still not the full spec, and there will be many smaller tasks such as scripts to refactor legacy content; there is also the potential for presenting at conferences. I'm sure there are other points as well, but does this help to give you some idea? Do let me know if not, or if you have any other questions.
