1. Linked Statistical Data Analysis

    eHumanities, Amsterdam, 2014-03-27

    #LinkedData

    Sarven Capadisli http://csarven.ca/#i @csarven

  2. Statistical Data

    Hypercube
                      2004-2006       2005-2007       2006-2008
                      Male  Female    Male  Female    Male  Female
    Newport           76.7  80.7      77.1  80.9      77.0  81.5
    Cardiff           78.7  83.3      78.6  83.7      78.7  83.4
    Monmouthshire     76.6  81.3      76.5  81.5      76.6  81.7
    Merthyr Tydfil    75.5  79.1      75.5  79.4      74.9  79.6
    • Dimensions: Time, Location, Sex
    • Measure: values
    • Attributes: unit of measurement, accuracy
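
    As a rough illustration of the dimension/measure split, the table on this slide can be unpivoted into one observation per cell. The type and function names below (Observation, to_observations, slice_cube) are hypothetical, and attributes such as the unit are left out because the slide does not state them.

```python
from typing import NamedTuple


class Observation(NamedTuple):
    period: str   # dimension: Time
    area: str     # dimension: Location
    sex: str      # dimension: Sex
    value: float  # measure


PERIODS = ["2004-2006", "2005-2007", "2006-2008"]
SEXES = ["Male", "Female"]

# Cell values copied from the table on this slide, in column order.
ROWS = {
    "Newport": [76.7, 80.7, 77.1, 80.9, 77.0, 81.5],
    "Cardiff": [78.7, 83.3, 78.6, 83.7, 78.7, 83.4],
    "Monmouthshire": [76.6, 81.3, 76.5, 81.5, 76.6, 81.7],
    "Merthyr Tydfil": [75.5, 79.1, 75.5, 79.4, 74.9, 79.6],
}


def to_observations(rows):
    """Unpivot the wide table into one observation per cell."""
    columns = [(period, sex) for period in PERIODS for sex in SEXES]
    return [
        Observation(period, area, sex, value)
        for area, values in rows.items()
        for (period, sex), value in zip(columns, values)
    ]


def slice_cube(observations, **fixed):
    """Keep the observations whose dimensions match every fixed value."""
    return [
        obs for obs in observations
        if all(getattr(obs, name) == wanted for name, wanted in fixed.items())
    ]


observations = to_observations(ROWS)
cardiff_female = slice_cube(observations, area="Cardiff", sex="Female")
```

    Fixing two of the three dimensions leaves a slice that varies only over time, which is the shape a time series chart needs.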
  3. Statistical Data on the Web (Characteristics)

    • Decentralized
    • Heterogeneous
    • Structured
    • High volume
    • Formats (e.g., CSV, Excel, PC-Axis, SDMX-ML, XML)
    • Distribution and Access

    Clean? Synchronised? Comparable? Provenance? Trustworthy? Analyses?

  4. A Linked Dataspace

    from Statistical Linked Dataspaces

  5. Statistical Linked Dataspaces

    ... from government, IGO, and NGO data

  6. Statistical Linked Dataspaces (2010-2011)

  7. Galway City page on DataGovIE

    Screenshot of Galway City page on DataGovIE
  8. School Explorer

    School Explorer Pilot screenshot
  9. Statistical Linked Dataspaces (2012)

  10. Central government debt indicator chart view for some countries

    Screenshot of World Bank indicator GC.DOD.TOTL.GD.ZS for countries CA,US,DE,CH,IE in worldbank.270a.info/view

    See also: http://worldbank.270a.info/view?indicator=GC.DOD.TOTL.GD.ZS&country=CA,US,DE,CH,IE

  11. Statistical Linked Dataspaces (2013)

    Source format? SDMX-ML

  12. Statistical Linked Dataspaces (2014+)

  13. External adoption of Linked SDMX

    • Swiss Federal Statistics Office / Bern University of Applied Sciences (pilot)
    • Italian National Institute of Statistics / SpazioDati (pilot)
    • LOD2 Statistical Workbench
    • ?
  14. Interlinking

  15. Provenance

  16. Provenance

  17. 270a Cloud (Statistical Linked Dataspaces)

  18. Interesting queries?

    • Number of people born in Bern before 1900
    • Inflation rate in Italy when the prime minister was ...
    • Development projects in low-middle income countries situated above the equator
  19. How about interesting analysis?

    • Statistically significant analysis of GDP and mortality rate
    • Strong correlations
    • Predicting or forecasting
    • Investigating the WHYs
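
    To make "strong correlations" concrete, here is a minimal sketch of a Pearson correlation and a least-squares line over two paired series. The sample numbers are made up for illustration and are not real GDP or mortality figures; pearson_r and linear_fit are hypothetical helper names.

```python
from math import sqrt


def _sums(xs, ys):
    """Return the means and the centred sums of squares and products."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return mean_x, mean_y, sxy, sxx, syy


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    _, _, sxy, sxx, syy = _sums(xs, ys)
    return sxy / sqrt(sxx * syy)


def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    mean_x, mean_y, sxy, sxx, _ = _sums(xs, ys)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up paired values, one pair per hypothetical country.
indicator_a = [1.0, 2.0, 3.0, 4.0, 5.0]
indicator_b = [9.8, 8.1, 6.2, 3.9, 2.0]

r = pearson_r(indicator_a, indicator_b)
slope, intercept = linear_fit(indicator_a, indicator_b)
```

    A coefficient near -1 or 1 only says the two series move together; it does not answer the WHYs.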
  20. stats.270a.info

    Citizen-centric interfaces for statistical stuff.

    Intended for data journalists, researchers, non-developers!

    ... and Linked Data friendly.

  21. Analysis user-interface (Plot) 1/3

    http://stats.270a.info/analysis/worldbank:SP.DYN.IMRT.IN/transparency:CPI2009/year:2009
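
    The analysis URL appears to follow a simple pattern: two prefixed dataset identifiers followed by a reference year. A small sketch of building such a URL, assuming that reading of the pattern is right (analysis_url is a hypothetical helper, not part of the site):

```python
BASE = "http://stats.270a.info/analysis"


def analysis_url(first, second, year):
    """Build an analysis URL from two 'prefix:identifier' names and a year."""
    for name in (first, second):
        prefix, sep, local = name.partition(":")
        if not (prefix and sep and local):
            raise ValueError(f"expected 'prefix:identifier', got {name!r}")
    return f"{BASE}/{first}/{second}/year:{year}"


url = analysis_url("worldbank:SP.DYN.IMRT.IN", "transparency:CPI2009", 2009)
```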

  22. Analysis user-interface (Summary) 2/3

    http://stats.270a.info/analysis/worldbank:SP.DYN.IMRT.IN/transparency:CPI2009/year:2009

  23. Oh yeah?

    Provenance user-interface

  24. Analysis user-interface (Provenance) 3/3

    http://stats.270a.info/provenance/fa698e46868fe348865678884e89ef84b0be6c64

  25. Adding some context

    Time Series Example
  26. stats.270a.info Toolkit

    • Shiny server (node)
    • R (Shiny, SPARQL packages)
    • Jena Fuseki
    • Apache
    • Linked Data Pages
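
    The toolkit queries a SPARQL endpoint from R; the sketch below shows the same kind of request in Python instead, built but not sent. The endpoint URL is a placeholder, and the use of the RDF Data Cube vocabulary (qb:) for the observations is an assumption rather than something stated on these slides.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: observations are typed with the RDF Data Cube vocabulary.
QUERY_TEMPLATE = """
PREFIX qb: <http://purl.org/linked-data/cube#>
SELECT ?observation
WHERE {
  ?observation a qb:Observation ;
               qb:dataSet <%s> .
}
LIMIT 10
"""


def build_sparql_request(endpoint, dataset_uri):
    """Prepare a SPARQL Protocol GET request asking for JSON results."""
    query = QUERY_TEMPLATE % dataset_uri
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(
    "http://example.org/sparql",  # placeholder endpoint, not a real service
    "http://worldbank.270a.info/dataset/world-bank-finances",
)
```

    Passing the prepared request to urllib.request.urlopen would send it; that step is left out here so the sketch does not depend on a live endpoint.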
  27. Challenges and Goals

    • Federated querying of statistical data
    • UI and URI design
    • Creating statistical artefacts
    • Finding purpose
  28. Some numbers

    • Over 1 billion triples in the 270a Cloud
    • Regression analysis: over 100 billion (estimate)
    • How about other analysis?
  29. ... but what is really interesting here?

  30. Let's take another step back

  31. Identifying things

    "Now! That should clear up a few things around here." (The Far Side)
  32. Linked Statistical Artefacts

    • Dataset: http://worldbank.270a.info/dataset/world-bank-finances
    • Observation: http://ecb.270a.info/dataset/SEE/A/AT/WBR0/EXT/X/E/2011
    • Dimension: http://oecd.270a.info/dimension/1.0/TIME
    • Measure: http://ecb.270a.info/measure/1.0/OBS_VALUE
    • Attribute: http://transparency.270a.info/classification/attribute/matching-percentiles
    • Concept: http://imf.270a.info/concept/1.0/PGI/REF_AREA
    • Code list: http://fao.270a.info/code/0.1/CL_UN_COUNTRY
    • Hierarchical code list: http://bfs.270a.info/code/1.0/HR_HGDE_HIST
    • Regression Analysis: http://stats.270a.info/analysis/worldbank:GC.DOD.TOTL.GD.ZS/transparency:CPI2009/year:2009

    Cool URIs? 1, 5, 100, 10000 years? Ha!
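
    The URIs listed on this slide share a visible shape: the first host label names the publisher and the first path segment names the kind of artefact. A minimal sketch that reads those parts back out, assuming that shape holds (describe is a hypothetical helper):

```python
from urllib.parse import urlparse


def describe(uri):
    """Split an artefact URI into publisher, artefact kind, and remaining path."""
    parsed = urlparse(uri)
    publisher = parsed.hostname.split(".")[0]
    segments = parsed.path.strip("/").split("/")
    return publisher, segments[0], segments[1:]


dimension = describe("http://oecd.270a.info/dimension/1.0/TIME")
code_list = describe("http://fao.270a.info/code/0.1/CL_UN_COUNTRY")
```

    Keeping the publisher and artefact kind in stable positions is part of what lets such identifiers stay predictable over time.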

  33. Linked Statistical Data Analysis

    Sarven Capadisli

    http://csarven.ca/#i

    @csarven

  34. Credits

  35. Credits