About the Project
Documentation
The Theme: Understanding Our Research Framework
As our team included members with a strong interest in Japanese culture — including a Japanese member — we were naturally drawn to selecting a theme rooted in Japan's rich cultural heritage. Among the many possibilities, the Japanese Tea Ceremony stood out for its symbolic depth and multifaceted nature.
The Tea Ceremony is far more than a ritual of serving and drinking tea. It embodies a complex system of values, aesthetics, philosophies (such as Zen), and traditional craftsmanship, all performed through carefully choreographed actions. This makes it an ideal subject for semantic modeling and Linked Open Data, where meaningful connections between diverse entities can be expressed and visualized.
Additionally, because the Tea Ceremony has been extensively documented in both Japanese and international contexts, it allows for the integration of existing open data resources, including museum collections, historical figures, and conceptual vocabularies. This supports both the preservation and reinterpretation of cultural heritage through digital tools.
Item Selections: Curating Quality Data
The item selection process began with a brainstorming session in which we explored various aspects of Japanese culture that interested us. Among the topics that emerged were kimono decorations, the aesthetic concept of wabi-sabi, ikebana (the art of flower arrangement), and traditional craftsmanship. As we discussed these, we realized that many of these themes were deeply embedded in the broader cultural domain of the Japanese tea ceremony. This ceremony, with its highly symbolic and multi-sensory nature, offered a unifying framework that could meaningfully encompass the other concepts we were drawn to. Therefore, we decided to focus our project on the theme of the Japanese tea ceremony.
To identify relevant cultural heritage items, we conducted targeted research across institutional collections such as those of The Met and the British Museum. ChatGPT provided support and suggestions, helping us refine our queries and discover objects already accompanied by metadata, in anticipation of the metadata analysis we would carry out later.
A central item that emerged during this process was The Book of Tea by Okakura Kakuzō. This short but rich text (56 pages) not only aligned perfectly with the core themes of our project but also served as a conceptual bridge between many of the original ideas we had discussed. Because of its foundational relevance, we decided to read the full text carefully, annotating passages that could inspire or justify the inclusion of specific objects in our collection. The version of the book with notes on connections can be found here (inside the GitHub repository).
To manage the item selection collaboratively, we used a shared Notion workspace throughout the project. This allowed us to discuss, compare, and evaluate potential objects in real time. We prioritized items that were clearly connected to the tea ceremony and could be directly linked to specific excerpts or ideas in The Book of Tea. This method proved to be especially helpful when we began building our theoretical model, as it ensured conceptual coherence between the textual and material dimensions of our project.
Metadata Analysis: Extracting Meaningful Information
This step involves the identification and examination of the metadata standards employed by the institutions providing the selected items.
Three of our items—the Portrait of Rikyū, the Teaspoon, and the Teabowl—originate from the Metropolitan Museum of Art (The Met). The Met holds an extensive collection of Asian art, a significant portion of which is Open Access, facilitating referencing and reuse. The Met offers an API that allows users to access and download detailed item information in JSON format, alongside a dataset available in CSV format. However, in terms of metadata standards, The Met does not adhere to a single recognized standard; instead, it employs a combination of vocabularies and approaches, and we could not find a detailed explanation of how these vocabularies were formed. We therefore decided to describe the data according to CDWA (Categories for the Description of Works of Art) in order to understand and organize the categorical structure of the descriptions. This served as a foundation for representing the data ontologically, in the RDF production step, using CIDOC-CRM, a formal ontology designed to model cultural heritage information.
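The Met's object endpoint returns a JSON record per item. The minimal sketch below parses an abridged, illustrative sample of such a record; the field names follow the public API, but the values here are invented for demonstration.

```python
import json

# Abridged sample of the JSON returned by The Met's object endpoint
# (https://collectionapi.metmuseum.org/public/collection/v1/objects/{objectID});
# field names follow the public API, the values are illustrative.
sample = json.loads("""
{
  "objectID": 12345,
  "title": "Teabowl",
  "medium": "Clay",
  "isPublicDomain": true
}
""")

# Pull out the descriptive fields that fed our CDWA-aligned tables.
record = {key: sample[key] for key in ("objectID", "title", "medium")}
print(record)
```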
For the items from the British Museum (the Woodblock print), the National Gallery of Art (the Japanese Footbridge), and the Museum of Fine Arts, Boston (Kimono), no identifiable metadata standards or downloadable metadata files were available. Consequently, we chose to align their descriptive information using the CDWA framework, as we had done with the items from The Met. We then applied the same methodology for ontology design using CIDOC-CRM.
The ukiyo-e painting titled Invitation to a Tea Ceremony was sourced from Europeana. Europeana provides an API that grants access, in JSON format, to an extensive range of digital collections from cultural heritage institutions across Europe. Europeana requires that all aggregators and data partners map their original metadata to the EDM (Europeana Data Model). EDM primarily utilizes its own schema, designated with the edm: namespace, but also incorporates widely adopted vocabularies such as dcterms and skos. As a result, we chose to describe the relationships surrounding this item using EDM.
Regarding the bibliographic item The Book of Tea, the eBook version was sourced from the Library of Congress, which provided metadata in MARCXML, DCTERMS, and MODS formats. For the purpose of understanding the metadata, we adopted MODS for description as it is human-readable. Then we applied BIBFRAME to describe classes and properties formally in our ontology.
The other two bibliographic items—The Illustrated Book of Ikebana and Mysterious Japan—were sourced from the Internet Archive, which also provides metadata in MARCXML. Consequently, we applied MODS for description and BIBFRAME for ontology representation to these items as well.
Developing the Theoretical Model: Building Conceptual Frameworks
While reading The Book of Tea, we began identifying recurring concepts and underlying themes within the text. During this process, we created an initial hand-drawn sketch to visualize the conceptual relationships between the main ideas explored in the book. This preliminary diagram later evolved into our formal Theoretical Model as well as an RDF graph representing the connections found in selected chapters, particularly the opening and closing ones. These diagrams illustrate how the different themes, referred to as “keywords/concepts” in our encoded TEI file, interact and inform one another throughout the text.
As our main source, The Book of Tea once again proved essential: not only was it the basis for identifying relationships between themes, but it also served as the central node around which all our selected items could be meaningfully connected. The book is widely regarded as a classic in Japanese cultural discourse, especially in regard to the philosophy and aesthetics surrounding tea, which further justified its central role in our theoretical model.
To create the visual model collaboratively, we used Miro, which allowed us to work together remotely and iteratively. Through Miro, we formalized the diagram by identifying the relationships between objects and themes using clear, human-readable predicates. This step laid the foundation for the later development of our conceptual model, offering a structured approach to organizing both textual and object-based data.
In order to ensure semantic consistency, we developed a shared list of predicates written in CamelCase format. For this, we initially consulted ChatGPT to obtain a list of commonly used predicates. We then adapted and refined this list based on our specific needs, renaming some of the predicates and tailoring others to better reflect the conceptual connections unique to our project. The final list of predicates used is included below.
- PREDICATES (all the arrows going in and out for ...):
for WHAT (object/general)
- isPartOf
- isUsedIn
- depicts
- hasClassification
- hasStyle
- isA
- isATypeOf
- isIllustratedIn
- isProducedBy
- hasGenre
- hasMaterial
- hasSubject
- isIntroducedIn
- describes
- isDepictedIn
- isRelatedTo
for WHERE (place)
- hasPlace
for WHO (people)
- isProducedBy
- isPublishedBy
- isInscriptedBy
- hasAuthor
- isMadeBy
- isAMasterOf
- isIntroducedIn
- isIllustratedIn
for WHEN (date)
- isPartOf
- hasDate
for SADO (domain)
- isPartOf
- isUsedIn
- isAMasterOf
- isExplainedIn
- illustratesEtiquetteOf
- isIllustratedIn
- isWornIn
for ITEMS (our 10 items)
- isUsedIn
- isPartOf
- isProducedBy
- depicts
- hasCover
- hasAuthor
- isIntroducedIn
- isExplainedIn
- isInscriptedBy
- describes
- isIllustratedIn
- isMadeBy
- illustratesEtiquetteOf
- hasSubject
- hasDate
- isPublishedBy
- hasPlace
- hasMaterial
- hasGenre
- isA
- hasClassification
- isProvidedBy
- usesTechnique
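When these human-readable predicates were later aligned with formal ontology properties, the correspondence can be pictured as a lookup table. The sketch below is illustrative only: the pairings shown are plausible examples (foaf:depicts and the dcterms properties are real vocabulary terms), not the project's final alignment, and the tea: fallback namespace is an assumption.

```python
# Illustrative (not exhaustive) mapping from our CamelCase predicates to
# CURIEs of properties in established vocabularies; these pairings are
# plausible examples, not the project's final alignment table.
PREDICATE_MAP = {
    "hasAuthor":  "dcterms:creator",
    "hasSubject": "dcterms:subject",
    "hasPlace":   "dcterms:spatial",
    "hasDate":    "dcterms:date",
    "depicts":    "foaf:depicts",
}

def to_curie(predicate: str) -> str:
    """Resolve a human-readable predicate to a CURIE, falling back to a
    project namespace (an assumption) when no mapping is defined."""
    return PREDICATE_MAP.get(predicate, "tea:" + predicate)

print(to_curie("hasAuthor"))  # mapped to a dcterms property
print(to_curie("isUsedIn"))   # falls back to the project namespace
```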
Developing the Conceptual Model: Practical Implementation
In the Theoretical Model phase, we identified and explored the relationships between the metadata of each item, the tea ceremony, and the associated concepts. After this analytical process, we moved on to the development of the Conceptual Model to represent the findings in a more formal structure.
As a starting point, we created custom URIs for the items selected for the project, after establishing the base URI.
The Base URI: https://w3id.org/a-lod-of-tea
Regarding the choice of schemas, we prioritized the metadata vocabularies identified in the earlier metadata analysis phase:
CIDOC-CRM for museum items
EDM for the item sourced from Europeana
BIBFRAME for books and archival materials
These were used as the primary frameworks, with adjustments made as needed in subsequent stages.
To formally represent the subject–predicate–object relationships, we began with the items themselves as subjects. For each predicate, we investigated how best to express the relationship identified in the theoretical model using appropriate properties from existing ontology schemas. The object was then modeled as an abstract class; some relations particularly relevant to the project's thematic focus were later elaborated into concrete entities during the RDF production phase.
While we prioritized domain-specific schemas aligned with each item's metadata standard (CIDOC-CRM, EDM, and BIBFRAME), in practice we encountered several cases where the intended relationships could not be adequately expressed within those frameworks alone. To address these gaps, we incorporated general-purpose ontologies: FOAF for modeling Person and Organization entities, and SKOS for representing Concept entities. This allowed for greater flexibility while maintaining semantic coherence across heterogeneous data sources.
Relationships related to the tea ceremony are rarely linear. The practice encompasses complex and interwoven concepts, often without clearly defined hierarchies or causal structures. For instance, the tea ceremony is influenced by Taoism and Zen, and is closely connected with the concept of wabi-sabi. Physical elements such as ikebana, the tearoom, and roji are interconnected with each of these concepts. These relationships cannot be captured through simple binary associations.
Moreover, the tea ceremony, like many traditional practices, has multiple schools and lineages, each with its own philosophies and interpretations. Our main reference, The Book of Tea, has served to shape the basis of our linking also in the conceptual modelling phase. Rather than focusing on the practical aspects of the ceremony, Okakura emphasizes its spiritual and conceptual significance. Inspired by this, we designed our ontology to reflect and incorporate these broader philosophical dimensions.
Full Text Analysis and Transformation: Deep Content Processing
For the full text analysis, we focused on The Book of Tea by Okakura Kakuzō as our central textual source; the other two book items consist mainly of illustrations or photographs, so this is the only full-text item. Our goal was to manually encode the text following the TEI (Text Encoding Initiative) P5 guidelines. We began by transcribing and encoding the text in a TEI/XML file, applying appropriate structural and semantic markup: first the mandatory sections, following the slides created by Professor D'Aquino, and then further annotations based on our encoding needs and on connecting the text to our domain as much as possible. This included encoding chapters, paragraphs, and significant concepts using tags such as "div", "p", "head", and "term"; the last marks the main keywords and concepts that are underlined in the book and relate to our domain. We also tagged names of people ("persName") and places ("placeName"), and added bibliographic references ("bibl"), since there are two publications and the encoded text refers to the transcribed version of the original, allowing for a more precise mapping of the knowledge embedded in the book. To enhance semantic richness, we linked the mentioned places to external authority files using GeoNames URIs, where available. Additionally, we compiled a structured list of the people and places referenced in the book using "listPerson" and "listPlace", making the encoded file both human- and machine-readable.
To convert the TEI file into RDF, we opted for an XSLT-based transformation. Instead of writing a transformation pipeline in Python, we used FreeFormatter to simplify the process. With guidance from ChatGPT (as suggested by the professor, given the complexity of XSLT), we wrote a custom XSLT stylesheet, ensuring the use of the correct XML and RDF namespaces. After testing and refining the stylesheet, we uploaded both the TEI/XML and XSLT files to the FreeFormatter tool. The transformation produced an RDF/Turtle file that captured essential entities and relationships from the encoded text, especially from the introductory and final chapters. To present the logic of this transformation clearly, we created an HTML version of the XSL file that shows the key mappings and semantic structure used in the conversion.
Additionally, as requested, we developed a Python script (with the help of ChatGPT) capable of transforming the TEI/XML file into RDF, offering an alternative method for data processing and enabling more flexibility in how we worked with the file. This script also made it easier to validate our RDF output, which we checked using RDF Grapher to ensure consistency and correctness.
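The extraction step at the heart of such a script can be sketched with the standard library alone. This is a minimal illustration, not the project's actual script: it collects "persName" and "placeName" elements from an invented TEI fragment, the raw material from which triples would then be built.

```python
import xml.etree.ElementTree as ET

# TEI elements live in the TEI namespace, so tags must be qualified.
TEI = "{http://www.tei-c.org/ns/1.0}"

# Invented TEI fragment for illustration only.
fragment = """
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><div>
    <p><persName>Okakura Kakuzo</persName> wrote about
    <term>Teaism</term> in <placeName>Japan</placeName>.</p>
  </div></body></text>
</TEI>
"""

root = ET.fromstring(fragment)
# Collect tagged people and places; these become RDF subjects/objects.
people = [el.text for el in root.iter(TEI + "persName")]
places = [el.text for el in root.iter(TEI + "placeName")]
print(people, places)
```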
During this process, we created a dedicated URI for our own digital edition of the book ("http://example.org/bookoftea/the-book-of-tea-digital-edition") to distinguish it from the URI used for the book's representation as a cultural heritage item, which served as a conceptual hub connecting the other selected objects. The RDF graph of this digital edition is therefore independent from the one referring to the entire book as an item, which is linked into the combined RDF graph comprising all the other items we selected for the project.
The produced files (view them separately following the links below):
View them all together in the dedicated folder here!
Ontology Design and RDF Production: Semantic Web Technologies
In parallel with the development of the Conceptual Model, we designed the ontology to support the transformation of data into RDF format. The ontology structure was informed by the metadata standards either adopted by our project or provided by partner institutions. Each item’s descriptive information was first organized into CSV files, reflecting both individual attributes and relationships derived from our conceptual and theoretical models.
To describe concretely the abstract classes modeled conceptually, in addition to the custom URIs of our items, we also decided to assign custom URIs to people, activities, places, and abstract concepts closely connected to the theme and items (such as Sen no Rikyū, Zen, wabi-sabi, or the Japanese tea ceremony itself). The full list of generated custom URIs is as follows, with https://w3id.org/a-lod-of-tea/ as the base URI:
item
- rikyu-portrait
- teabowl
- teascoop
- kimono
- book-of-tea
- samurai-woodblock
- teagarden
- ikebana-book
- tea-ceremony-painting
- japanese-bridge
activity
- tea-ceremony
- ikebana
concept
- taoism
- wabisabi
- zen
person
- sen-no-rikyu
- okakura-kakuzo
- honami-koetsu
place
- tearoom
- roji
Instances of more general real-world entities were instead aligned with existing external vocabularies such as Wikidata and GeoNames.
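The minting of these URIs can be sketched as below. Treating the category names above as path segments under the base URI is our assumption for illustration; the project files define the exact pattern.

```python
# Base URI as stated in this document.
BASE = "https://w3id.org/a-lod-of-tea/"

def mint(category: str, slug: str) -> str:
    """Build a custom URI from a category (e.g. 'item', 'person') and a
    slug; the category-as-path-segment pattern is an assumption."""
    return f"{BASE}{category}/{slug}"

print(mint("item", "teabowl"))
print(mint("person", "sen-no-rikyu"))
```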
Each item was treated as a subject in a triple structure, with relevant predicates and objects forming the full description. These descriptions were mapped to the ontology schemas also referenced in the conceptual model. Where a Wikidata entry for an item exists, we also added an owl:sameAs link.
In the case of non-item entities (such as abstract concepts or locations), separate triples were constructed with these entities as the subject, along with an rdf:type and an rdfs:label to enhance human readability and semantic clarity.
The transformation of the CSV-based data into RDF was implemented in Python using the rdflib library, and serialized in Turtle syntax. Turtle was chosen for its compact structure and readability, especially for URI-based data.
The final RDF dataset contains 127 triples, and was visualized using the RDF Grapher tool.
Prefixes Used (in Turtle Syntax)
@prefix tea: <https://w3id.org/a-lod-of-tea/> .
@prefix crm: <https://www.cidoc-crm.org/> .
@prefix edm: <http://www.europeana.eu/schemas/edm/> .
@prefix bf: <http://id.loc.gov/ontologies/bibframe/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix schema: <https://schema.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix wd: <https://www.wikidata.org/wiki/> .
@prefix gn: <http://www.geonames.org/> .
Finally, all RDF outputs were merged into a single RDF file using Python. While combining everything into one CSV file from the start would also have been possible, we opted for separate files per item to support a more structured and step-by-step modeling process.
The produced files:
CSV files
Triples
Python files
Turtle files
Full RDF data set
Resources: Research References and Sources
Research:
- The Book of Tea: https://www.loc.gov/resource/gdcebookspublic.2019299129/
Ontology schemas:
- Europeana Data Model: https://pro.europeana.eu/discover-the-data/about-europeana-eu
- CIDOC-CRM: https://cidoc-crm.org/
- BIBFRAME: https://id.loc.gov/ontologies/bibframe.html
- Dublin Core Terms: https://www.dublincore.org/specifications/dublin-core/dcmi-terms/
- SKOS: https://www.w3.org/2009/08/skos-reference/skos.html
- RDF: https://www.w3.org/TR/rdf-schema/
- FOAF: http://xmlns.com/foaf/0.1/
- Wikidata: https://www.wikidata.org/wiki/
- GeoNames: http://www.geonames.org/
- CDWA: https://www.getty.edu/publications/categories-description-works-art/
- MARCXML: https://www.loc.gov/standards/marcxml/
- DBPEDIA: http://dbpedia.org/ontology/
- dc elements: http://purl.org/dc/elements/1.1/
- Library of Congress Subject Headings: http://id.loc.gov/authorities/subjects/
- Schema.org: http://schema.org/
- TEI: http://www.tei-c.org/ns/1.0
- XSD: http://www.w3.org/2001/XMLSchema#
Additional ones for the full text:
Institutions:
- The Metropolitan Museum of Art: https://www.metmuseum.org/
- The Met API: https://metmuseum.github.io/
- Europeana: https://www.europeana.eu/
- Europeana API: https://www.europeana.eu/
- Museum of Fine Arts, Boston: https://www.mfa.org/
- British Museum: https://www.britishmuseum.org/
- National Gallery of Art: https://www.nga.gov/
- Library of Congress: https://www.loc.gov/
- Internet Archive: https://archive.org/
Tools:
- RDF Grapher: https://www.ldf.fi/service/rdf-grapher
- RDFLib: https://rdflib.readthedocs.io/en/stable/
- Miro: https://miro.com
- Notion: https://www.notion.com
- FreeFormatter: https://www.freeformatter.com
- ChatGPT: https://chatgpt.com
- Google Sheets: https://docs.google.com/spreadsheets/u/0/
- Canva (to make our logo): https://www.canva.com
Team
We are master’s students in Digital Humanities and Digital Knowledge at the University of Bologna, with a shared passion for culture and technology. We’re interested in making connections and finding meaning within the humanities and exploring how to shape and express them in digital contexts.