
Using CTS references

We use an adapted version of software provided by the Homer Multitext project to power our CTS identifiers "behind the scenes". At the time of writing, the Homer Multitext project is moving its CTS software from a Google App Engine implementation to one based on SPARQL and RDF. If you want to make your own CTS identifiers work, we recommend contacting the Homer Multitext project (Chris Blackwell, Neel Smith and colleagues) about their software.

Creating CTS URN identifiers for SAWS texts

CTS stands for Canonical Text Services.

CTS URN* identifiers follow this pattern:

"urn" : "cts" : [namespace] : [textgroup] . [work] . [edition/translation/version]

For example, the SAWS edition of Proclus, The Elements of Physics / Institutio Physica, as edited by Ritzenfeld, has the CTS ID urn:cts:greekLit:tlg4036.tlg006.saws01

This means:

CTS namespace (a collective name for a type of text) = "greekLit" (all texts in the Greek manuscript tradition)
CTS textgroup (a collective name for a group of related texts) = "tlg4036" (all texts attributed to Proclus, under the TLG identifier 4036)
CTS work (a work that is represented in the text) = "tlg006" (the TLG identifier for the work 'Elements of Physics', also known as 'Institutio Physica', which exists independently of any particular version or edition).
CTS version (a particular edition, translation or version of a text) = "saws01" (the first edition of this work by SAWS)
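
As a minimal illustration of the pattern above, this Python sketch joins the four components into a work-level URN. The function name make_cts_urn is hypothetical and is not part of the Homer Multitext software; it simply rejects components containing the separator characters ':' or '.', which would make the URN ambiguous.

```python
def make_cts_urn(namespace: str, textgroup: str, work: str, version: str) -> str:
    """Join CTS components as urn:cts:[namespace]:[textgroup].[work].[version]."""
    # ':' separates the URN fields and '.' separates textgroup, work and
    # version, so a component containing either would make the URN ambiguous.
    for part in (namespace, textgroup, work, version):
        if not part or ":" in part or "." in part:
            raise ValueError(f"invalid CTS component: {part!r}")
    return f"urn:cts:{namespace}:{textgroup}.{work}.{version}"


print(make_cts_urn("greekLit", "tlg4036", "tlg006", "saws01"))
# prints urn:cts:greekLit:tlg4036.tlg006.saws01
```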

To refer to individual sections of this document, CTS identifiers take a form such as urn:cts:greekLit:tlg4036.tlg006.saws01:Prop-45.ci1

This means that within this SAWS edition of Proclus' Institutio Physica there is a Section called Prop-45, and within that Section there is a ContentItem called ci1.
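
A passage-level URN like this can be taken apart again by splitting on ':' and then on '.'. The sketch below is a simplified parser written for illustration only: it assumes every URN has exactly the three-part textgroup.work.version form used on this page, whereas the full CTS URN specification also allows shorter work references and subreferences, which this code does not handle. The names CtsUrn and parse_cts_urn are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CtsUrn:
    namespace: str
    textgroup: str
    work: str
    version: str
    passage: Optional[str] = None  # e.g. "Prop-45.ci1"; None for a whole work


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split a CTS URN of the form urn:cts:ns:textgroup.work.version[:passage]."""
    fields = urn.split(":")
    if len(fields) not in (4, 5) or fields[:2] != ["urn", "cts"]:
        raise ValueError(f"not a CTS URN: {urn!r}")
    work_parts = fields[3].split(".")
    if len(work_parts) != 3:
        raise ValueError(f"expected textgroup.work.version, got {fields[3]!r}")
    passage = fields[4] if len(fields) == 5 else None
    return CtsUrn(fields[2], *work_parts, passage=passage)


ref = parse_cts_urn("urn:cts:greekLit:tlg4036.tlg006.saws01:Prop-45.ci1")
section, content_item = ref.passage.split(".")  # assumes a two-level passage
print(ref.textgroup, ref.work, ref.version)     # prints tlg4036 tlg006 saws01
print(section, content_item)                    # prints Prop-45 ci1
```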

More explanation and documentation for CTS (Canonical Text Services) references and CITE references can be found via the Homer Multitext project documentation pages.

Terms such as Section and ContentItem are defined in our SAWS ontology - a vocabulary for making statements about our documents. See the SAWS ontology documentation for more details.

Converting the CTS URNs to http:// identifiers

For SAWS we wanted to make a CTS URL*, an identifier starting with http://. We preferred this form of identifier because an http:// identifier can be entered into a web browser to retrieve a web page.

So each URN was prefixed with the path /cts/ under the SAWS web address. Taking our earlier example, urn:cts:greekLit:tlg4036.tlg006.saws01:Prop-45.ci1 becomes /cts/urn:cts:greekLit:tlg4036.tlg006.saws01:Prop-45.ci1
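
Because the conversion only adds a prefix, it can be reversed without loss. The sketch below assumes a placeholder host, http://example.org, standing in for the real SAWS web address; cts_urn_to_url and url_to_cts_urn are hypothetical helpers, not the SAWS implementation.

```python
from urllib.parse import quote, unquote

# Placeholder: substitute the real host that serves the /cts/ path.
BASE_URL = "http://example.org"


def cts_urn_to_url(urn: str, base_url: str = BASE_URL) -> str:
    """Prefix a CTS URN with <base_url>/cts/ to get a browsable address."""
    if not urn.startswith("urn:cts:"):
        raise ValueError(f"not a CTS URN: {urn!r}")
    # ':' is legal inside a URL path, so it is kept as-is; letters, digits
    # and '.', '-', '_', '~' are never escaped by quote(); anything else is.
    return f"{base_url.rstrip('/')}/cts/{quote(urn, safe=':')}"


def url_to_cts_urn(url: str, base_url: str = BASE_URL) -> str:
    """Strip the <base_url>/cts/ prefix to recover the original URN."""
    prefix = f"{base_url.rstrip('/')}/cts/"
    if not url.startswith(prefix):
        raise ValueError(f"not a CTS URL under {prefix}: {url!r}")
    return unquote(url[len(prefix):])


urn = "urn:cts:greekLit:tlg4036.tlg006.saws01:Prop-45.ci1"
url = cts_urn_to_url(urn)
print(url)
# prints http://example.org/cts/urn:cts:greekLit:tlg4036.tlg006.saws01:Prop-45.ci1
print(url_to_cts_urn(url) == urn)
# prints True
```

Keeping the URN unchanged inside the path means the web address stays readable and anyone can recover the CTS identifier from it by removing the prefix.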

This means that our CTS identifiers can now be used as web addresses. We can retrieve information for each identified piece of text using its unique identifier.

Allocating identifiers to SAWS texts using the CTS scheme: some points

Most SAWS texts do not fit so neatly into the above classification, and so required some further thought. In general, to allocate CTS identifiers for our texts:

We used TLG identifiers where they were available for textgroups and works, as in the Proclus example given above.
Where texts did not have a TLG identifier but a manuscript was being transcribed and that manuscript has an identifier, the manuscript identifier was used for the textgroup. For example, Kitab al-Haraka, in the Hacı Mahmud Efendi 5683 manuscript, becomes urn:cts:sawsTexts:HME5683.KHar.saws01
Where a work has a recognised collective name, e.g. Appendix Gnomica [Codex Vaticanus Graecus 742: 63v-70r], this name was used as part of the identifier: urn:cts:greekLit:VatGr742.ApG_Vat.saws01
Where no manuscript identifiers or collective names were available, we represented other aspects of the text description as closely as possible. For example, the SAWS edition of Miskawayh, Fawz al-asghar, excerpts, using Udayma's edition, becomes urn:cts:sawsTexts:Misk.Fawz.sawsUda01
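
The preference order in the list above can be written as a small decision function. This is only an illustrative sketch: the helper names and parameters are hypothetical, and because the namespace (greekLit or sawsTexts) and short labels such as KHar, ApG_Vat or Misk were chosen by hand, they are passed in rather than derived.

```python
from typing import Optional, Tuple


def pick_textgroup_and_work(
    work_label: str,
    fallback_label: str,
    tlg_ids: Optional[Tuple[str, str]] = None,
    manuscript_id: Optional[str] = None,
) -> Tuple[str, str]:
    """Return (textgroup, work) following the preference order described above.

    Hypothetical helper: work_label and fallback_label are chosen by an editor.
    """
    # 1. Use TLG identifiers for both textgroup and work where they exist
    #    (work_label is then ignored).
    if tlg_ids is not None:
        return tlg_ids
    # 2. Otherwise use the manuscript identifier as the textgroup, with a
    #    short label or recognised collective name for the work.
    if manuscript_id is not None:
        return manuscript_id, work_label
    # 3. Otherwise fall back to another part of the text description,
    #    e.g. an abbreviation of the author's name.
    return fallback_label, work_label


def work_urn(namespace: str, textgroup: str, work: str, version: str) -> str:
    return f"urn:cts:{namespace}:{textgroup}.{work}.{version}"


tg, wk = pick_textgroup_and_work("KHar", "", manuscript_id="HME5683")
print(work_urn("sawsTexts", tg, wk, "saws01"))
# prints urn:cts:sawsTexts:HME5683.KHar.saws01

tg, wk = pick_textgroup_and_work("Fawz", "Misk")
print(work_urn("sawsTexts", tg, wk, "sawsUda01"))
# prints urn:cts:sawsTexts:Misk.Fawz.sawsUda01
```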

A list of the CTS identifiers for SAWS texts is available on a separate page of this site.

NB: some technical explanations of terms

URN = Uniform Resource Name,
URI = Uniform Resource Identifier,
URL = Uniform Resource Locator.

Without going too far into the differences between these three things, they are all types of identifier used to uniquely identify something. URI is the generic term, covering both URLs and URNs: a URL identifies a resource by saying where to find it, while a URN identifies a resource by name. So a URL is also a URI, and a URN is also a URI. For our purposes, we are mostly interested in URLs starting with http://, which can be used to retrieve web pages when entered into a web browser.
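
Python's standard urllib.parse module can illustrate the distinction, since it splits both kinds of identifier by their scheme. In this sketch the classify function and the example.org host are illustrative assumptions, and the classification is deliberately rough.

```python
from urllib.parse import urlparse


def classify(identifier: str) -> str:
    """Roughly classify an identifier by its URI scheme."""
    parts = urlparse(identifier)
    if parts.scheme == "urn":
        # A URN names a resource but says nothing about where to fetch it.
        return "URN"
    if parts.scheme in ("http", "https") and parts.netloc:
        # An http(s) URL includes a host, so a browser can request it.
        return "URL"
    return "other URI"


urn = "urn:cts:greekLit:tlg4036.tlg006.saws01"
url = "http://example.org/cts/" + urn  # placeholder host

print(classify(urn), urlparse(urn).path)
# prints URN cts:greekLit:tlg4036.tlg006.saws01
print(classify(url), urlparse(url).netloc)
# prints URL example.org
```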