dokieli on the Authentic Web

More details about this document
Identifier
https://csarven.ca/presentations/dokieli-authentic-web
Author
Sarven Capadisli
Virginia Balseiro
María Ruiz de Assín de los Santos
Published
Modified
License
CC BY 4.0
Language
English
Document Type
Slideshow
Inbox
https://csarven.ca/presentations/inbox/dokieli-authentic-web/
Topics
Audience

dokieli on the Authentic Web

W3C Credible Web Community Group, Authentic Web, Bern, 2025-12-09

Sarven Capadisli https://csarven.ca/#i @csarven
Virginia Balseiro https://virginiabalseiro.com/#me @yesvirginia
María Ruiz de Assín de los Santos https://es.linkedin.com/in/maria-ruiz-de-assin

Knowledge Graph

View this presentation's knowledge graph.

Overview

  • About dokieli
  • Research
  • Implementation
  • Challenges
  • Discussion

dokieli

What is dokieli?
Free and open source software.
Principles: user autonomy, universal access.
Use cases: authoring, publishing, annotating, sharing, knowledge organisation, credibility assessment, visualisation.
Think of a modern-day WorldWideWeb browser-editor or Amaya.
Why does it exist?
Frustration with the state of scholarly communication and decentralised web publishing.
To demonstrate a standards-based solution.
Who is it for?
Readers, researchers, journalists, educators and speakers, bloggers and content creators, technical authors, developers and technologists.
Do you have a roadmap?
Yes (credibility assessment, internationalization and localization, web browser extension, modularization, collaborative editing, end-to-end encryption, mobile support, knowledge organisation)

See also: history of dokieli

Support and Collaboration

Challenges

General problem space
  • Information overload, inefficiency, uninformed decision-making or lack of meaningful context.
  • Echo chambers, misinformation, and disinformation.
  • Centralization, surveillance, and tracking.
  • Control/ownership of content and identity.
Our problem space
The web platform lacks built-in authenticity and credibility assessment methods.
Goals
  • Equip individuals with tools to critically engage with content, using contextual insights to assess credibility by highlighting gaps, providing supplemental data, and incorporating annotations.
  • Credibility assessment is performed by individuals or communities using dokieli's indicators, rather than dokieli itself. This design encourages digital literacy and supports independent critical thinking.

dokieli background research

Peer-reviewed articles:

Research Methodology

Mixed-methods approach

  • Online qualitative focus group discussions

    • 15 participants: creators (journalists), disseminators (teachers), consumers (citizens)
    • Digital skills among all participants
    • 3 focus groups: semi-structured interview protocol organized around three core thematic areas:

      • Digital Identity and Privacy
      • Information Consumption and Trust
      • Decentralisation
  • Online questionnaire

    • 40 participants representing diverse nationalities, age groups, professional backgrounds, and digital confidence levels.
    • IT and software development (67%)
    • Journalism (4%)
    • Education/content creation (2%)
    • Other professions (21%)
    • Digital confidence was notably high (82% reported high or very high confidence)
    • Same thematic structure as the focus groups

Key Findings

Results show consistent patterns across digital identity, information consumption, and decentralisation.

  • Digital identity

    • Participants feel their digital identity is only partly under their control
    • Perceived lack of control
    • Higher technical skills don’t reduce concern
    • Distinction between privacy and intimacy
    • Tools for protection of data
  • Information consumption

    • Comparison and verification of sources
    • News and information as symbolic violence (emotion)
    • Journalists have the most diversified and complex information ecosystem
    • (Poll): 70% recognise that algorithms influence what they see. This reveals inequalities in awareness.
  • Decentralisation

    • Most citizens are unfamiliar with decentralised technologies
    • (Poll): high familiarity among respondents
    • Main barrier they identify is that few people around them actually use decentralised tools, followed by usability issues and technical complexity
    • Digital identity, information consumption, and engagement with decentralised tools all reflect social and digital inequalities.
    • Gender imbalance in participation

Next Steps

  1. Concept refinement and requirement definition: sharpen the concept of the platform
  2. Continue feature development and refinement
  3. Iterative user testing ⟶ improvement cycles
  4. Pilot program: training, user feedback collection and analysis, impact assessment and case studies

Use cases

Readers
As a reader of online content, I want to see contextual credibility indicators while browsing articles in order to assess the trustworthiness of the information.
Creators
  • As a fact-checker (e.g., journalist, activist), I need to annotate and share fact-checked insights on web content in order to provide transparency and counter misinformation.
  • As a student or researcher, I want to highlight, organize, and reference credible sources directly within online content in order to streamline my research process.
  • As a content creator, I want to produce content for public consumption, and to assess and process information sources in order to maintain my content's credibility.
Disseminators
As a teacher or trainer, I want to make sure that the information I distribute is credible and trustworthy.

Technical approaches

Inspection
Web Annotations for assessing, bookmarking, classifying, commenting, highlighting, linking, replying, questioning, and tagging.
Citations: refutes, extends, cites as evidence, cites for information, cites as authority.
Memento and Robust Links: time travel, checking for content drift and link rot.
Corroboration
Notifications requesting claim verification. Fact-check-motivated Web Annotations.
Reputation
Annotations using personal storage and social graphs with provenance data.
Transparency
Labelling information and aggregating machine-readable disclosures.
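
To make the inspection approach above concrete, the sketch below creates a Web Annotation with the assessing motivation and submits it to an annotation container per the Web Annotation Protocol. This is a minimal TypeScript sketch: the article URL, WebID, container location, and body text are hypothetical, and dokieli's own implementation may differ.

  // Construct a Web Annotation (JSON-LD) that assesses a highlighted claim.
  // All URLs and the WebID are hypothetical.
  const annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    type: "Annotation",
    motivation: "assessing",                    // Web Annotation Vocabulary motivation
    creator: "https://example.org/profile#me",  // the annotator's WebID
    target: {
      source: "https://example.org/article",
      selector: {
        type: "TextQuoteSelector",              // anchors the annotation to an exact quote
        exact: "9 out of 10 dentists recommend eating as much candy as possible."
      }
    },
    body: {
      type: "TextualBody",
      value: "No supporting study is cited; see the linked sources.",
      format: "text/plain"
    }
  };

  // Submit it to an annotation container (Web Annotation Protocol).
  await fetch("https://example.org/annotations/", {
    method: "POST",
    headers: {
      "Content-Type": 'application/ld+json;profile="http://www.w3.org/ns/anno.jsonld"'
    },
    body: JSON.stringify(annotation)
  });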

NER

  • Client-side: lightweight and privacy-preserving, but slightly less accurate.
  • Server-side: evaluation can also be performed server-side (typically more accurate), depending on the user's preferences.
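
A minimal client-side sketch, assuming the transformers.js library and a publicly converted NER model; dokieli's actual pipeline may differ:

  import { pipeline } from "@xenova/transformers";

  // Load a token-classification (NER) model once; inference runs fully
  // in the browser, so the highlighted text never leaves the client.
  const ner = await pipeline("token-classification", "Xenova/bert-base-NER");

  // Tag named entities in a highlighted sentence.
  const entities = await ner("Nina Simone was born in 1933.");
  console.log(entities); // e.g. [{ word: "Nina", entity: "B-PER", score: 0.99, ... }, ...]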

Claim check

  • Can run server-side (faster, more accurate, smaller client footprint) or client-side (avoids server calls but increases file size).
  • Uses transformer-based text classification trained on the ClaimBuster dataset.
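
A minimal sketch of such a classifier, again assuming transformers.js; the model identifier is hypothetical, standing in for a transformer fine-tuned on ClaimBuster:

  import { pipeline } from "@xenova/transformers";

  // Hypothetical model id standing in for a ClaimBuster-trained classifier.
  // The same code runs server-side (Node) or client-side (browser).
  const classifier = await pipeline("text-classification", "example/claimbuster-distilbert");

  // Score how check-worthy each sentence is.
  const scores = await classifier([
    "Vaccines cause autism.",
    "Paris Hilton visited the Hilton in Paris."
  ]);
  console.log(scores); // e.g. [{ label: "check-worthy", score: 0.93 }, ...]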

Oh yeah?

Background: TimBL's Design Issues, The "Oh yeah?" button (1997).

  1. The user highlights a sentence on a webpage.
  2. Clicks the "Oh yeah?" button.
  3. The "Oh yeah?" panel shows:

    • Content analysis and knowledge extraction (topic and named entity recognition, sentiment, content warnings)
    • Content validation (missing citations, source validation)
    • Fact-checking and evidence (claim checks, Wikidata, nanopublications, WHOIS, relevant annotations)
    • Request claim check from social network and domain experts
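
A rough sketch of how the steps above could be wired together; all function names are illustrative stubs, not dokieli's actual API:

  // Hypothetical orchestration of the "Oh yeah?" panel: run independent
  // checks on the highlighted text in parallel, then render the results.
  type PanelData = { entities: unknown[]; claims: unknown[]; annotations: unknown[] };

  // Stubs standing in for the checks described above.
  const recogniseEntities = async (text: string) => [{ text, entity: "PER" }];
  const checkClaims = async (text: string) => [{ text, checkWorthy: 0.9 }];
  const fetchAnnotations = async (url: string) => [] as unknown[];
  const renderPanel = (data: PanelData) => console.log(data);

  async function ohYeah(selection: string, documentUrl: string): Promise<void> {
    const [entities, claims, annotations] = await Promise.all([
      recogniseEntities(selection),   // content analysis (NER, sentiment, warnings)
      checkClaims(selection),         // fact-checking (claim checks, evidence)
      fetchAnnotations(documentUrl)   // relevant Web Annotations
    ]);
    renderPanel({ entities, claims, annotations });
  }

  // e.g. triggered by the "Oh yeah?" button with the user's selection:
  ohYeah("The evidence shows that the Earth is flat.", "https://example.org/article");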

Example Oh yeah?

Video of credibility assessment in dokieli [WebM]

The dokieli project was presented live from Bern and Wiesbaden by Sarven and Virginia during the Authentic Web workshop held at W3C on December 9, 2025.

Paris Hilton visited the Hilton in Paris.

Herbert Van de Sompel is a Belgian librarian and information scientist, and time travelled in Austria.

Nina Simone was born in 1933.

Vaccines cause autism.

Climate change is caused by aliens.

The evidence shows that the Earth is flat.

9 out of 10 dentists recommend eating as much candy as possible.

99% of dogs are good boys and girls, and 1% are excellent boys and girls.

Standards

dokieli implements and uses an ocean of web standards:

  • Linked Data Platform, Solid Protocol - to discover interactions from the user's storage
  • Linked Data Notifications - to discover a document's inbox and notifications.
  • Web Annotation Protocol - to discover annotation services and collections of annotations.
  • Web Annotation Vocabulary - to use applicable annotations that are associated with a document or its parts.
  • ActivityPub - to send activities and discover interactions from the user's and their contacts' outboxes.
  • ActivityStreams - to use relevant activities and identify users.
  • WebID - to identify users and access their profile information, including preferences.
  • Solid Type Indexes - to locate specific resource types for both the user and their contacts.
  • Web Access Control - to set authorization rules on resources for different agents.
  • ODRL Information Model - to examine information about the storage (location, name, description, owners, URI persistence policy, and digital rights policies).
  • Memento - to access the memento (version history) of a document.
  • Robust Links - to help mitigate content drift and link rot, links can be enhanced with additional context.
  • SPARQL Query Language - to query structured data from public SPARQL endpoints.
  • SPAR Ontologies - to create and consume fine-grained (e.g., sentence-level) typed citations.
  • PROV: The PROV Ontology - to provide provenance information about performed activities, such as statistical data analysis.
  • ...
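
As one concrete example of these standards working together, the sketch below discovers a document's inbox and delivers an ActivityStreams notification over Linked Data Notifications. URLs are hypothetical, and inbox discovery here naively parses only the Link header (LDN also allows discovery from the resource body):

  const docUrl = "https://example.org/article";

  // 1. Inbox discovery: look for Link rel="http://www.w3.org/ns/ldp#inbox".
  const response = await fetch(docUrl, { method: "HEAD" });
  const link = response.headers.get("Link") ?? "";
  const inbox = link.match(/<([^>]+)>\s*;\s*rel="http:\/\/www\.w3\.org\/ns\/ldp#inbox"/)?.[1];

  // 2. Deliver a notification, e.g., announcing a new annotation of the document.
  if (inbox) {
    await fetch(inbox, {
      method: "POST",
      headers: { "Content-Type": "application/ld+json" },
      body: JSON.stringify({
        "@context": "https://www.w3.org/ns/activitystreams",
        type: "Announce",
        actor: "https://example.org/profile#me",
        object: "https://example.org/annotations/1",
        target: docUrl
      })
    });
  }
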
New standards?
A standardised Credibility Assessment Data Model
Development of an Argumentation Data Model
Development or extension of Web Annotation motivations
Next steps on Web Extensions (draft Working Group charter)

Threat model

Background: Dokieli Threat Modeling - STRIDE

Potential concerns
  • Moderation
  • Preventing harassment
  • Manipulation of the fact-checking features by bad actors
  • Selection or curation of data sources
  • Reliable way to identify biases

Let's make it so!