About the Project
Documentation
The Theme: Understanding Our Research Framework
As our team included members with a strong interest in Japanese culture — including a Japanese member — we were naturally drawn to selecting a theme rooted in Japan's rich cultural heritage. Among the many possibilities, the Japanese Tea Ceremony stood out for its symbolic depth and multifaceted nature.
The Tea Ceremony is far more than a ritual of serving and drinking tea. It embodies a complex system of values, aesthetics, philosophies (such as Zen), and traditional craftsmanship, all performed through carefully choreographed actions. This makes it an ideal subject for semantic modeling and Linked Open Data, where meaningful connections between diverse entities can be expressed and visualized.
Additionally, because the Tea Ceremony has been extensively documented in both Japanese and international contexts, it allows for the integration of existing open data resources, including museum collections, historical figures, and conceptual vocabularies. This supports both the preservation and reinterpretation of cultural heritage through digital tools.
Item Selections: Curating Quality Data
The item selection process began with a brainstorming session in which we explored various aspects of Japanese culture that interested us. Among the topics that emerged were kimono decorations, the aesthetic concept of wabi-sabi, ikebana (the art of flower arrangement), and traditional craftsmanship. As we discussed these, we realized that many of these themes were deeply embedded in the broader cultural domain of the Japanese tea ceremony. This ceremony, with its highly symbolic and multi-sensory nature, offered a unifying framework that could meaningfully encompass the other concepts we were drawn to. Therefore, we decided to focus our project on the theme of the Japanese tea ceremony.
To identify relevant cultural heritage items, we conducted targeted research across institutional collections such as those of The Met and the British Museum. ChatGPT provided support and suggestions, helping us refine our queries and discover objects already accompanied by metadata, in anticipation of the metadata analysis we would carry out later.
A central item that emerged during this process was The Book of Tea by Okakura Kakuzō. This short but rich text (56 pages) not only aligned perfectly with the core themes of our project but also served as a conceptual bridge between many of the original ideas we had discussed. Because of its foundational relevance, we decided to read the full text carefully, annotating passages that could inspire or justify the inclusion of specific objects in our collection. The version of the book with notes on connections can be found here (inside the GitHub repository).
To manage the item selection collaboratively, we used a shared Notion workspace throughout the project. This allowed us to discuss, compare, and evaluate potential objects in real time. We prioritized items that were clearly connected to the tea ceremony and could be directly linked to specific excerpts or ideas in The Book of Tea. This method proved to be especially helpful when we began building our theoretical model, as it ensured conceptual coherence between the textual and material dimensions of our project.
Metadata Analysis: Extracting Meaningful Information
This step involves the identification and examination of the metadata standards employed by the institutions providing the selected items.
Three of our items—the Portrait of Rikyū, the Teaspoon, and the Teabowl—originate from the Metropolitan Museum of Art (The Met). The Met holds an extensive collection of Asian art, a significant portion of which is Open Access, facilitating referencing and reuse. The Met offers an API that allows users to access and download detailed item information in JSON format, alongside a dataset available in CSV format. However, in terms of metadata standards, The Met does not adhere to a single recognized standard; instead, it employs a combination of vocabularies and approaches, and we could not find a detailed explanation of how these vocabularies were formed. We therefore decided to describe the data according to CDWA (Categories for the Description of Works of Art) in order to understand and organize the categorical structure of the descriptions. This served as a foundation for representing the data ontologically, in the RDF production step, using CIDOC-CRM, a formal ontology designed to model cultural heritage information.
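The Met's object endpoint returns a JSON record per item. The minimal sketch below parses an abridged, illustrative sample of such a record; the field names follow the public API, but the values here are invented for demonstration.

```python
import json

# Abridged sample of the JSON returned by The Met's object endpoint
# (https://collectionapi.metmuseum.org/public/collection/v1/objects/{objectID});
# field names follow the public API, the values are illustrative.
sample = json.loads("""
{
  "objectID": 12345,
  "title": "Teabowl",
  "medium": "Clay",
  "isPublicDomain": true
}
""")

# Pull out the descriptive fields that fed our CDWA-aligned tables.
record = {key: sample[key] for key in ("objectID", "title", "medium")}
print(record)
```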
For the items from the British Museum (the Woodblock print), the National Gallery of Art (the Japanese Footbridge), and the Museum of Fine Arts, Boston (Kimono), no identifiable metadata standards or downloadable metadata files were available. Consequently, we chose to align their descriptive information using the CDWA framework, as we had done with the items from The Met. We then applied the same methodology for ontology design using CIDOC-CRM.
The ukiyo-e painting titled Invitation to a Tea Ceremony was sourced from Europeana. Europeana provides an API that grants access, in JSON format, to an extensive range of digital collections from cultural heritage institutions across Europe. Europeana requires that all aggregators and data partners map their original metadata to the EDM (Europeana Data Model). EDM primarily utilizes its own schema, designated with the edm: namespace, but also incorporates widely adopted vocabularies such as dcterms and skos. As a result, we chose to describe the relationships surrounding this item using EDM.
Regarding the bibliographic item The Book of Tea, the eBook version was sourced from the Library of Congress, which provided metadata in MARCXML, DCTERMS, and MODS formats. For the purpose of understanding the metadata, we adopted MODS for description as it is human-readable. Then we applied BIBFRAME to describe classes and properties formally in our ontology.
The other two bibliographic items—The Illustrated Book of Ikebana and Mysterious Japan—were sourced from the Internet Archive, which also provides metadata in MARCXML. Consequently, we applied MODS for description and BIBFRAME for ontology representation to these items as well.
Developing the Theoretical Model: Building Conceptual Frameworks
While reading The Book of Tea, we began identifying recurring concepts and underlying themes within the text. During this process, we created an initial hand-drawn sketch to visualize the conceptual relationships between the main ideas explored in the book. This preliminary diagram later evolved into our formal Theoretical Model as well as an RDF graph representing the connections found in selected chapters, particularly the opening and closing ones. These diagrams illustrate how the different themes, referred to as “keywords/concepts” in our encoded TEI file, interact and inform one another throughout the text.
As our main source, The Book of Tea once again proved essential: not only was it the basis for identifying relationships between themes, but it also served as the central node around which all our selected items could be meaningfully connected. The book is widely regarded as a classic in Japanese cultural discourse, especially in regard to the philosophy and aesthetics surrounding tea, which further justified its central role in our theoretical model.
To create the visual model collaboratively, we used Miro, which allowed us to work together remotely and iteratively. Through Miro, we formalized the diagram by identifying the relationships between objects and themes using clear, human-readable predicates. This step laid the foundation for the later development of our conceptual model, offering a structured approach to organizing both textual and object-based data.
In order to ensure semantic consistency, we developed a shared list of predicates written in CamelCase format. For this, we initially consulted ChatGPT to obtain a list of commonly used predicates. We then adapted and refined this list based on our specific needs, renaming some of the predicates and tailoring others to better reflect the conceptual connections unique to our project. The final list of predicates used is included below.
- PREDICATES (all the arrows going in and out for ...):
for WHAT (object/general)
- isPartOf
- isUsedIn
- depicts
- hasClassification
- hasStyle
- isA
- isATypeOf
- isIllustratedIn
- isProducedBy
- hasGenre
- hasMaterial
- hasSubject
- isIntroducedIn
- describes
- isDepictedIn
- isRelatedTo
for WHERE (place)
- hasPlace
for WHO (people)
- isProducedBy
- isPublishedBy
- isInscriptedBy
- hasAuthor
- isMadeBy
- isAMasterOf
- isIntroducedIn
- isIllustratedIn
for WHEN (date)
- isPartOf
- hasDate
for SADO (domain)
- isPartOf
- isUsedIn
- isAMasterOf
- isExplainedIn
- illustratesEtiquetteOf
- isIllustratedIn
- isWornIn
for ITEMS (our 10 items)
- isUsedIn
- isPartOf
- isProducedBy
- depicts
- hasCover
- hasAuthor
- isIntroducedIn
- isExplainedIn
- isInscriptedBy
- describes
- isIllustratedIn
- isMadeBy
- illustratesEtiquetteOf
- hasSubject
- hasDate
- isPublishedBy
- hasPlace
- hasMaterial
- hasGenre
- isA
- hasClassification
- isProvidedBy
- usesTechnique
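When these human-readable predicates were later aligned with formal ontology properties, the correspondence can be pictured as a lookup table. The sketch below is illustrative only: the pairings shown are plausible examples (foaf:depicts and the dcterms properties are real vocabulary terms), not the project's final alignment, and the tea: fallback namespace is an assumption.

```python
# Illustrative (not exhaustive) mapping from our CamelCase predicates to
# CURIEs of properties in established vocabularies; these pairings are
# plausible examples, not the project's final alignment table.
PREDICATE_MAP = {
    "hasAuthor":  "dcterms:creator",
    "hasSubject": "dcterms:subject",
    "hasPlace":   "dcterms:spatial",
    "hasDate":    "dcterms:date",
    "depicts":    "foaf:depicts",
}

def to_curie(predicate: str) -> str:
    """Resolve a human-readable predicate to a CURIE, falling back to a
    project namespace (an assumption) when no mapping is defined."""
    return PREDICATE_MAP.get(predicate, "tea:" + predicate)

print(to_curie("hasAuthor"))  # mapped to a dcterms property
print(to_curie("isUsedIn"))   # falls back to the project namespace
```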
Developing the Conceptual Model: Practical Implementation
In the Theoretical Model phase, we identified and explored the relationships between the metadata of each item, the tea ceremony, and the associated concepts. After this analytical process, we moved on to the development of the Conceptual Model to represent the findings in a more formal structure.
As a starting point, we created custom URIs for the items selected for the project, after establishing the base URI.
The Base URI: https://w3id.org/a-lod-of-tea
Regarding the choice of schemas, we prioritized the metadata vocabularies identified in the earlier metadata analysis phase:
CIDOC-CRM for museum items
EDM for the item sourced from Europeana
BIBFRAME for books and archival materials
These were used as the primary frameworks, with adjustments made as needed in subsequent stages.
To formally represent the subject–predicate–object relationships, we began with the items themselves as subjects. For each predicate, we investigated how best to express the relationship identified in the theoretical model using appropriate properties from existing ontology schemas. The object was then modeled as an abstract class; some relations particularly relevant to the project's thematic focus were later elaborated into concrete entities during the RDF production phase.
While we prioritized domain-specific schemas aligned with each item's metadata standard (CIDOC-CRM, EDM, and BIBFRAME), in practice we encountered several cases where the intended relationships could not be adequately expressed within those frameworks alone. To address these gaps, we incorporated general-purpose ontologies: FOAF for modeling Person and Organization entities, and SKOS for representing Concept entities. This allowed for greater flexibility while maintaining semantic coherence across heterogeneous data sources.
Relationships related to the tea ceremony are rarely linear. The practice encompasses complex and interwoven concepts, often without clearly defined hierarchies or causal structures. For instance, the tea ceremony is influenced by Taoism and Zen, and is closely connected with the concept of wabi-sabi. Physical elements such as ikebana, the tearoom, and roji are interconnected with each of these concepts. These relationships cannot be captured through simple binary associations.
Moreover, the tea ceremony, like many traditional practices, has multiple schools and lineages, each with its own philosophies and interpretations. Our main reference, The Book of Tea, has served to shape the basis of our linking also in the conceptual modelling phase. Rather than focusing on the practical aspects of the ceremony, Okakura emphasizes its spiritual and conceptual significance. Inspired by this, we designed our ontology to reflect and incorporate these broader philosophical dimensions.
Full Text Analysis and Transformation: Deep Content Processing
For the full text analysis, we focused on The Book of Tea by Okakura Kakuzō as our central textual source; the other two book items consist mainly of illustrations or photographs, so this is the only full-text item. Our goal was to manually encode the text following the TEI (Text Encoding Initiative) P5 guidelines. We began by transcribing and encoding the text in a TEI/XML file, applying appropriate structural and semantic markup: first the mandatory sections, following the slides created by Professor D'Aquino, and then further annotations based on our encoding needs and on connecting the text to our domain as much as possible. This included encoding chapters, paragraphs, and significant concepts using tags such as "div", "p", "head", and "term"; the last marks the main keywords and concepts that are underlined in the book and relate to our domain. We also tagged names of people ("persName") and places ("placeName"), and added bibliographic references ("bibl"), since there are two publications and the encoded text refers to the transcribed version of the original, allowing for a more precise mapping of the knowledge embedded in the book. To enhance semantic richness, we linked the mentioned places to external authority files using GeoNames URIs, where available. Additionally, we compiled a structured list of the people and places referenced in the book using "listPerson" and "listPlace", making the encoded file both human- and machine-readable.
To convert the TEI file into RDF, we opted for an XSLT-based transformation. Instead of writing a transformation pipeline in Python, we used FreeFormatter to simplify the process. With guidance from ChatGPT (as suggested by the professor, given the complexity of XSLT), we wrote a custom XSLT stylesheet, ensuring the use of the correct XML and RDF namespaces. After testing and refining the stylesheet, we uploaded both the TEI/XML and XSLT files to the FreeFormatter tool. The transformation produced an RDF/Turtle file that captured essential entities and relationships from the encoded text, especially from the introductory and final chapters. To present the logic of this transformation clearly, we created an HTML version of the XSL file that shows the key mappings and semantic structure used in the conversion.
Additionally, as requested, we developed a Python script (with the help of ChatGPT) capable of transforming the TEI/XML file into RDF, offering an alternative method for data processing and enabling more flexibility in how we worked with the file. This script also made it easier to validate our RDF output, which we checked using RDF Grapher to ensure consistency and correctness.
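The extraction step at the heart of such a script can be sketched with the standard library alone. This is a minimal illustration, not the project's actual script: it collects "persName" and "placeName" elements from an invented TEI fragment, the raw material from which triples would then be built.

```python
import xml.etree.ElementTree as ET

# TEI elements live in the TEI namespace, so tags must be qualified.
TEI = "{http://www.tei-c.org/ns/1.0}"

# Invented TEI fragment for illustration only.
fragment = """
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><div>
    <p><persName>Okakura Kakuzo</persName> wrote about
    <term>Teaism</term> in <placeName>Japan</placeName>.</p>
  </div></body></text>
</TEI>
"""

root = ET.fromstring(fragment)
# Collect tagged people and places; these become RDF subjects/objects.
people = [el.text for el in root.iter(TEI + "persName")]
places = [el.text for el in root.iter(TEI + "placeName")]
print(people, places)
```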
During this process, we created a dedicated URI for our own digital edition of the book ("http://example.org/bookoftea/the-book-of-tea-digital-edition") to distinguish it from the URI used for the book's representation as a cultural heritage item, which served as a conceptual hub connecting the other selected objects. The RDF graph of this digital edition is therefore independent from the one referring to the entire book as an item, which is linked into the combined RDF graph comprising all the other items we selected for the project.
The produced files (view them separately following the links below):
View them all together in the dedicated folder here!
Ontology Design and RDF Production: Semantic Web Technologies
In parallel with the development of the Conceptual Model, we designed the ontology to support the transformation of data into RDF format. The ontology structure was informed by the metadata standards either adopted by our project or provided by partner institutions. Each item’s descriptive information was first organized into CSV files, reflecting both individual attributes and relationships derived from our conceptual and theoretical models.
To describe concretely the abstract classes modeled conceptually, in addition to the custom URIs of our items, we also decided to assign custom URIs to people, activities, places, and abstract concepts closely connected to the theme and items (such as Sen no Rikyū, Zen, wabi-sabi, or the Japanese tea ceremony itself). The full list of generated custom URIs is as follows, with https://w3id.org/a-lod-of-tea/ as the base URI:
item
- rikyu-portrait
- teabowl
- teascoop
- kimono
- book-of-tea
- samurai-woodblock
- teagarden
- ikebana-book
- tea-ceremony-painting
- japanese-bridge
activity
- tea-ceremony
- ikebana
concept
- taoism
- wabisabi
- zen
person
- sen-no-rikyu
- okakura-kakuzo
- honami-koetsu
place
- tearoom
- roji
Instances of more general real-world entities were instead aligned with existing external vocabularies such as Wikidata and GeoNames.
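The minting of these URIs can be sketched as below. Treating the category names above as path segments under the base URI is our assumption for illustration; the project files define the exact pattern.

```python
# Base URI as stated in this document.
BASE = "https://w3id.org/a-lod-of-tea/"

def mint(category: str, slug: str) -> str:
    """Build a custom URI from a category (e.g. 'item', 'person') and a
    slug; the category-as-path-segment pattern is an assumption."""
    return f"{BASE}{category}/{slug}"

print(mint("item", "teabowl"))
print(mint("person", "sen-no-rikyu"))
```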
Each item was treated as a subject in a triple structure, with relevant predicates and objects forming the full description. These descriptions were mapped to the ontology schemas also referenced in the conceptual model. Where a Wikidata entry for an item exists, we also added an owl:sameAs link.
In the case of non-item entities (such as abstract concepts or locations), separate triples were constructed with these entities as the subject, along with an rdf:type and an rdfs:label to enhance human readability and semantic clarity.
The transformation of the CSV-based data into RDF was implemented in Python using the rdflib library, and serialized in Turtle syntax. Turtle was chosen for its compact structure and readability, especially for URI-based data.
The final RDF dataset contains 127 triples, and was visualized using the RDF Grapher tool.
Prefixes Used (in Turtle Syntax)
@prefix tea: <https://w3id.org/a-lod-of-tea/> .
@prefix crm: <https://www.cidoc-crm.org/> .
@prefix edm: <http://www.europeana.eu/schemas/edm/> .
@prefix bf: <http://id.loc.gov/ontologies/bibframe/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix schema: <https://schema.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix wd: <https://www.wikidata.org/wiki/> .
@prefix gn: <http://www.geonames.org/> .
Finally, all RDF outputs were merged into a single RDF file using Python. While combining everything into one CSV file from the start would also have been possible, we opted for separate files per item to support a more structured and step-by-step modeling process.
The produced files:
CSV files
Triples
Python files
Turtle files
Full RDF data set
Resources: Research References and Sources
Research:
- The Book of Tea: https://www.loc.gov/resource/gdcebookspublic.2019299129/
Ontology schemas:
- Europeana Data Model: https://pro.europeana.eu/discover-the-data/about-europeana-eu
- CIDOC-CRM: https://cidoc-crm.org/
- BIBFRAME: https://id.loc.gov/ontologies/bibframe.html
- Dublin Core Terms: https://www.dublincore.org/specifications/dublin-core/dcmi-terms/
- SKOS: https://www.w3.org/2009/08/skos-reference/skos.html
- RDF: https://www.w3.org/TR/rdf-schema/
- FOAF: http://xmlns.com/foaf/0.1/
- Wikidata: https://www.wikidata.org/wiki/
- GeoNames: http://www.geonames.org/
- CDWA: https://www.getty.edu/publications/categories-description-works-art/
- MARCXML: https://www.loc.gov/standards/marcxml/
- DBPEDIA: http://dbpedia.org/ontology/
- dc elements: http://purl.org/dc/elements/1.1/
- Library of Congress Subject Headings: http://id.loc.gov/authorities/subjects/
- Schema.org: http://schema.org/
- TEI: http://www.tei-c.org/ns/1.0
- XSD: http://www.w3.org/2001/XMLSchema#
Additional ones for the full text:
Institutions:
- The Metropolitan Museum of Art: https://www.metmuseum.org/
- The Met API: https://metmuseum.github.io/
- Europeana: https://www.europeana.eu/
- Europeana API: https://www.europeana.eu/
- Museum of Fine Arts, Boston: https://www.mfa.org/
- British Museum: https://www.britishmuseum.org/
- National Gallery of Art: https://www.nga.gov/
- Library of Congress: https://www.loc.gov/
- Internet Archive: https://archive.org/
Tools:
- RDF Grapher: https://www.ldf.fi/service/rdf-grapher
- RDFLib: https://rdflib.readthedocs.io/en/stable/
- Miro: https://miro.com
- Notion: https://www.notion.com
- FreeFormatter: https://www.freeformatter.com
- ChatGPT: https://chatgpt.com
- Google Sheets: https://docs.google.com/spreadsheets/u/0/
- Canva (to make our logo): https://www.canva.com
Team
We are master’s students in Digital Humanities and Digital Knowledge at the University of Bologna, with a shared passion for culture and technology. We’re interested in making connections and finding meaning within the humanities and exploring how to shape and express them in digital contexts.