
DeTrusty as a Library

Installation

If you want to use DeTrusty as a library, you can install it from its source code on GitHub or download the package from PyPI.

Requirements

DeTrusty is implemented in Python 3. The Python versions supported by the current release are listed on the package's PyPI page. DeTrusty uses the requests library to manage the HTTP(S) requests to the SPARQL endpoints. The SPARQL parser is built with ply, and rdflib enables the generation of source descriptions from RML mappings.

Local Source Code

You can install DeTrusty from your local source code by performing the following steps.

git clone git@github.com:SDM-TIB/DeTrusty.git
cd DeTrusty
python -m pip install -e .

GitHub

DeTrusty can also be installed from its source code on GitHub without explicitly cloning the repository:

python -m pip install -e 'git+https://github.com/SDM-TIB/DeTrusty#egg=DeTrusty'

PyPI

The easiest way to install DeTrusty is to download the package from PyPI:

python -m pip install DeTrusty
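Whichever installation route is used, it can be useful to confirm that the package is visible to the current interpreter. The following is a minimal sketch that relies only on the standard library; the helper name installed_version is an illustrative assumption and not part of DeTrusty.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "DeTrusty") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed.

    `installed_version` is a hypothetical helper for this example;
    it only queries the metadata of the current Python environment.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("DeTrusty is not installed in this environment.")
    else:
        print(f"DeTrusty {version} is installed.")
```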

Executing Queries

Note

The goal of this section is to explain how to run SPARQL queries when using DeTrusty as a library. We refer the reader to the chapter Source Descriptions for more information about what source descriptions are and why they are needed. Additionally, we refer to Decomposition Types for more details about what decomposition is and an overview of different decomposition types.

DeTrusty offers the function run_query for executing SPARQL queries.

run_query(query, decomposition_type='STAR', sparql_one_dot_one=None, config=<DeTrusty.Molecule.MTManager.Config object>, join_stars_locally=True, print_result=True, yasqe=False, timeout=0)

Executes a SPARQL query over a federation of SPARQL endpoints.

The SPARQL query is decomposed based on the specified decomposition type. DeTrusty identifies the possible sources for each sub-query using the metadata collected previously. If the query contains the SERVICE clause, DeTrusty executes the sub-query at the specified endpoint instead.

Parameters:
  • query (str) – The SPARQL query to be executed. Either a string holding the SPARQL query itself or the path to a query file. The query file can be local or remote (accessible via GET request).

  • decomposition_type (str, optional) – The decomposition type to be used for decomposing the query. Possible values are ‘STAR’ for a star-shaped decomposition, ‘EG’ for exclusive groups decomposition, and ‘TRIPLE’ for a triple-wise decomposition, i.e., each triple pattern of the query produces a sub-query. Default is ‘STAR’.

  • sparql_one_dot_one (bool, optional) –

    Deprecated since version 0.15.0: No longer needed. DeTrusty is using one parser now.

  • config (DeTrusty.Molecule.MTManager.Config, optional) – The configuration holding the metadata about the federation over which the SPARQL query should be executed. If no value is specified, DeTrusty will attempt to load the configuration from ./Config/rdfmts.json.

  • join_stars_locally (bool, optional) – Indicates whether joins should be performed at the query engine. ‘True’ means joins are performed in DeTrusty; ‘False’ leads to joins being executed at the sources if possible. Default is ‘True’, i.e., join execution at the query engine level.

  • print_result (bool, optional) – Indicates whether the actual query result should be returned. ‘True’ means the result is included in the answer; ‘False’ returns only the metadata of the query result, such as the cardinality. Default is ‘True’.

  • yasqe (bool, optional) – Indicates whether the SPARQL query was sent from the YASGUI interface of DeTrusty’s Web interface. This is a workaround for YASQE not being able to show the query results when the validation data is included. Set to ‘True’ to omit the validation data. Default is ‘False’.

  • timeout (float, optional) –

    Warning

    This feature is currently considered experimental and will not produce partial results!

    Added in version 0.17.0.

    DeTrusty will stop the query execution once the timeout is reached. The timeout is specified in seconds. The default is 0 and reflects no limit.

Returns:

A dictionary including the query answer and additional metadata following the SPARQL protocol. It returns an error message in the ‘error’ field if something went wrong. Other metadata might be omitted in that case.

Return type:

dict

Examples

The example calls assume that an object config with the source descriptions exists.

>>> run_query('SELECT ?s WHERE { ?s a <http://example.com/Person> }', config=config)
>>> run_query('./query.rq', config=config)
>>> run_query('http://example.com/queries/query.rq', config=config)
>>> run_query('./query.rq', config=config, timeout=10)
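Because errors are reported through the ‘error’ field rather than an exception, callers may want to check the returned dictionary before using it. The sketch below shows one way to do that. It assumes that the answer follows the SPARQL 1.1 JSON results layout (results → bindings); the names QueryFailed and extract_bindings, as well as the sample data, are hypothetical and not part of DeTrusty.

```python
from typing import Any, Dict, List


class QueryFailed(RuntimeError):
    """Hypothetical exception raised when the result carries an 'error' field."""


def extract_bindings(result: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the solution bindings of a result dictionary.

    The 'error' field is described in the documentation above.
    ASSUMPTION: the answer uses the SPARQL 1.1 JSON results layout,
    i.e., the solutions are stored under result['results']['bindings'].
    """
    if result.get("error"):
        raise QueryFailed(str(result["error"]))
    # With print_result=False only metadata is returned, so fall back to [].
    return result.get("results", {}).get("bindings", [])


if __name__ == "__main__":
    # Made-up sample standing in for the return value of run_query(...).
    sample = {
        "head": {"vars": ["s"]},
        "results": {"bindings": [
            {"s": {"type": "uri", "value": "http://example.com/alice"}},
        ]},
    }
    for binding in extract_bindings(sample):
        print(binding["s"]["value"])
```

With DeTrusty installed, the same helper could be applied to a real call, for example extract_bindings(run_query('./query.rq', config=config, timeout=10)).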

Additionally, DeTrusty offers the function get_config to create an object with the internal representation of the source descriptions.

get_config(config_input)

Creates an object with the internal representation of the source descriptions.

Based on the type of input, this function will create a Config object either from a local file, remote file, or a list of dictionaries. The source descriptions are later used by DeTrusty for the source selection and decomposition. The main usage of this function is to read stored source descriptions from a file (local or remote) in order to reuse them for query execution over the same federation.

Parameters:

config_input (str | list[dict]) – The source description to transform into the internal representation. Might be a string holding the path to a configuration file. The configuration file can be local or remote (accessible via GET request). The source description can also be a parsed JSON, i.e., a list of Python dictionaries. Each dictionary represents a so-called RDF Molecule Template.

Returns:

The object holding the internal representation of the source descriptions. The result might be None if there was an issue while reading the source descriptions that did not lead to an exception.

Return type:

Config

Examples

The example calls assume that the files rdfmts.json and rdfmts.ttl are valid source description files created by DeTrusty. See Creating Source Descriptions for more information. Additionally, it is assumed that the SPARQL endpoint used is serving valid source descriptions.

>>> get_config('./rdfmts.json')
>>> get_config('./rdfmts.ttl')
>>> get_config('http://example.com/rdfmts.json')
>>> get_config('http://example.com/rdfmts.ttl')
>>> get_config('http://src_desc.example.com/sparql')
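The examples above pass paths and URLs. As described for config_input, the source description can also be handed over as parsed JSON, i.e., a list of dictionaries. The following sketch reads such a list from a local file and checks its overall shape before it would be passed on. The helper load_molecule_templates is hypothetical, and it deliberately makes no assumption about the keys inside each dictionary.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def load_molecule_templates(path: str) -> List[Dict[str, Any]]:
    """Parse a JSON file and ensure it holds a list of dictionaries.

    Hypothetical helper for this example; it only validates the outer
    shape (list of dicts), not the content of the individual entries.
    """
    with Path(path).open(encoding="utf-8") as handle:
        parsed = json.load(handle)
    if not isinstance(parsed, list) or not all(isinstance(item, dict) for item in parsed):
        raise ValueError(f"{path} does not contain a JSON list of objects")
    return parsed


# Usage sketch (requires DeTrusty and an existing ./rdfmts.json):
#
#   from DeTrusty import get_config
#   config = get_config(load_molecule_templates('./rdfmts.json'))
#   if config is None:  # get_config might return None, see "Returns" above
#       raise RuntimeError('The source descriptions could not be read.')
```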

Knowledge4COVID-19

To gain hands-on experience with DeTrusty, we provide a couple of examples. This particular example uses queries related to the Knowledge4COVID-19 KG [4]. Before running the examples below, please make sure that you have DeTrusty installed.

The federation of knowledge graphs for this example contains the following SPARQL endpoints:

Q1. COVID-19 Drugs for Patients with Asthma

“Retrieve from DBpedia the excretion rate, metabolism, and routes of administration of the COVID-19 drugs in the treatments to treat COVID-19 in patients with Asthma.”

PREFIX dbp: <http://dbpedia.org/property/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX k4covid: <http://research.tib.eu/covid-19/vocab/>
PREFIX k4covide: <http://research.tib.eu/covid-19/entity/>

SELECT DISTINCT ?treatment ?sameAsCovidDrug ?excretation ?metabolism ?routes WHERE {
    ?treatment k4covid:hasCovidDrug ?covidDrug.
    FILTER( ?comorbidity=k4covide:Asthma )
    ?treatment k4covid:hasComorbidity ?comorbidity.
    ?treatment k4covid:hasComorbidityDrug ?comorbidityDrug.
    ?comorbidityDrug k4covid:hasCUIAnnotation ?CUIComorbidityDrug.
    ?CUIComorbidityDrug owl:sameAs ?sameAsComorbidityDrug .
    ?covidDrug k4covid:hasCUIAnnotation ?CUICovidDrug.
    ?CUICovidDrug owl:sameAs ?sameAsCovidDrug .
    ?sameAsCovidDrug dbp:excretion ?excretation.
    ?sameAsCovidDrug dbp:metabolism ?metabolism.
    ?sameAsCovidDrug dbp:routesOfAdministration ?routes.
} ORDER BY ?treatment

This query collects data from the Knowledge4COVID-19 KG and DBpedia. See below how to execute the query with DeTrusty.

from DeTrusty import get_config, run_query

config = get_config('https://raw.githubusercontent.com/SDM-TIB/DeTrusty/master/example/K4COVID/rdfmts.ttl')
query = 'https://raw.githubusercontent.com/SDM-TIB/DeTrusty/master/example/K4COVID/Q1.rq'
result = run_query(query, config=config, join_stars_locally=False)
print(result)

Q2. COVID-19 Drugs for Patients with a Cardiopathy

“Retrieve from Wikidata the ChEMBL code, active ingredient, and mass of the comorbidity drugs in the treatments to treat COVID-19 in patients with a cardiopathy.”

PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX k4covid: <http://research.tib.eu/covid-19/vocab/>
PREFIX k4covide: <http://research.tib.eu/covid-19/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT DISTINCT ?treatment ?sameAsComorbidityDrug ?idDrug ?activeIngredient ?mass
WHERE {
    ?treatment k4covid:hasCovidDrug ?covidDrug.
    ?treatment k4covid:hasComorbidity k4covide:Cardiopathy .
    ?treatment k4covid:hasComorbidityDrug ?comorbidityDrug.
    ?comorbidityDrug k4covid:hasCUIAnnotation ?CUIComorbidityDrug.
    ?CUIComorbidityDrug owl:sameAs ?sameAsComorbidityDrug .
    ?covidDrug k4covid:hasCUIAnnotation ?CUICovidDrug.
    ?CUICovidDrug owl:sameAs ?sameAsCovidDrug .
    ?sameAsComorbidityDrug wdt:P592 ?idDrug .
    ?sameAsComorbidityDrug wdt:P3780 ?activeIngredient .
    ?sameAsComorbidityDrug  wdt:P2067 ?mass .
}

This query collects data from the Knowledge4COVID-19 KG and Wikidata. See below how to execute the query with DeTrusty.

from DeTrusty import get_config, run_query

config = get_config('https://raw.githubusercontent.com/SDM-TIB/DeTrusty/master/example/K4COVID/rdfmts.ttl')
query = 'https://raw.githubusercontent.com/SDM-TIB/DeTrusty/master/example/K4COVID/Q2.rq'
result = run_query(query, config=config, join_stars_locally=False)
print(result)

Q3. Detailed COVID-19 Drug Information for Patients with Asthma

“Retrieve from DBpedia the excretion rate, metabolism, and routes of administration, ChEMBL and KEGG codes, SMILES notation, and trade name of the COVID-19 drugs in the treatments to treat COVID-19 in patients with Asthma.”

PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX k4covid: <http://research.tib.eu/covid-19/vocab/>
PREFIX k4covide: <http://research.tib.eu/covid-19/entity/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>

SELECT DISTINCT ?treatment ?sameAsCovidDrug ?excretion ?metabolism ?routes ?dbDrugBank ?dbSmiles ?tradeName
WHERE {
    ?treatment k4covid:hasCovidDrug ?covidDrug.
    ?treatment k4covid:hasComorbidity k4covide:Asthma.
    ?treatment k4covid:hasComorbidityDrug ?comorbidityDrug.
    ?comorbidityDrug k4covid:hasCUIAnnotation ?CUIComorbidityDrug.
    ?CUIComorbidityDrug owl:sameAs ?sameAsComorbidityDrug .
    ?covidDrug k4covid:hasCUIAnnotation ?CUICovidDrug.
    ?CUICovidDrug owl:sameAs ?sameAsCovidDrug .
    ?sameAsCovidDrug dbp:excretion ?excretion.
    ?sameAsCovidDrug dbp:metabolism ?metabolism.
    ?sameAsCovidDrug dbp:routesOfAdministration ?routes.
    ?sameAsCovidDrug dbp:chembl ?dbCheml.
    ?sameAsCovidDrug dbp:kegg ?dbKegg.
    ?sameAsCovidDrug dbo:drugbank ?dbDrugBank .
    ?sameAsCovidDrug dbp:smiles ?dbSmiles .
    ?sameAsCovidDrug dbp:tradename ?tradeName.
} ORDER BY ?treatment

This query collects data from the Knowledge4COVID-19 KG and DBpedia. See below how to execute the query with DeTrusty.

from DeTrusty import get_config, run_query

config = get_config('https://raw.githubusercontent.com/SDM-TIB/DeTrusty/master/example/K4COVID/rdfmts.ttl')
query = 'https://raw.githubusercontent.com/SDM-TIB/DeTrusty/master/example/K4COVID/Q3.rq'
result = run_query(query, config=config, join_stars_locally=False)
print(result)

Q4. COVID-19 Comorbidity Information

“Retrieve from DBpedia the disease label, ICD-10 and MeSH codes, and risks of the comorbidities of the COVID-19 treatments.”

PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX k4covid: <http://research.tib.eu/covid-19/vocab/>
PREFIX k4covide: <http://research.tib.eu/covid-19/entity/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>

SELECT DISTINCT ?sameAsComorbidity ?icd10 ?risk WHERE {
    ?treatment k4covid:hasCovidDrug ?covidDrug.
    ?treatment k4covid:hasComorbidity ?comorbidity.
    ?comorbidity k4covid:hasCUIAnnotation ?annotationComorbidity .
    ?annotationComorbidity owl:sameAs ?sameAsComorbidity .
    ?sameAsComorbidity dbo:icd10 ?icd10.
    ?sameAsComorbidity dbo:meshId ?meshID .
    ?sameAsComorbidity dbp:risks ?risk
}

This query collects data from the Knowledge4COVID-19 KG and DBpedia. See below how to execute the query with DeTrusty.

from DeTrusty import get_config, run_query

config = get_config('https://raw.githubusercontent.com/SDM-TIB/DeTrusty/master/example/K4COVID/rdfmts.ttl')
query = 'https://raw.githubusercontent.com/SDM-TIB/DeTrusty/master/example/K4COVID/Q4.rq'
result = run_query(query, config=config, join_stars_locally=False)
print(result)

Q5. COVID-19 Comorbidity Drugs

“Retrieve the COVID-19 and comorbidity drugs on a treatment and the ChEMBL code, mass, and excretion route for the comorbidity drugs.”

PREFIX dbp: <http://dbpedia.org/property/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX k4covid: <http://research.tib.eu/covid-19/vocab/>
PREFIX k4covide: <http://research.tib.eu/covid-19/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT DISTINCT ?treatment ?sameAsComorbidityDrug ?sameAsCovidDrug ?idDrug ?mass ?excretation
WHERE {
  ?treatment k4covid:hasCovidDrug ?covidDrug.
  ?treatment k4covid:hasComorbidity k4covide:Cardiopathy .
  ?treatment k4covid:hasComorbidityDrug ?comorbidityDrug.
  ?comorbidityDrug k4covid:hasCUIAnnotation ?CUIComorbidityDrug.
  ?CUIComorbidityDrug owl:sameAs ?sameAsComorbidityDrug .
  ?covidDrug k4covid:hasCUIAnnotation ?CUICovidDrug.
  ?CUICovidDrug owl:sameAs ?sameAsCovidDrug .
  {
    ?sameAsComorbidityDrug wdt:P592 ?idDrug .
    ?sameAsComorbidityDrug wdt:P2067 ?mass .
  }
  UNION
  {
    ?sameAsComorbidityDrug dbp:excretion ?excretation .
  }
}

This query collects data from the Knowledge4COVID-19 KG, DBpedia, and Wikidata. See below how to execute the query with DeTrusty.

from DeTrusty import get_config, run_query

config = get_config('https://raw.githubusercontent.com/SDM-TIB/DeTrusty/master/example/K4COVID/rdfmts.ttl')
query = 'https://raw.githubusercontent.com/SDM-TIB/DeTrusty/master/example/K4COVID/Q5.rq'
result = run_query(query, config=config, join_stars_locally=False)
print(result)
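The five Knowledge4COVID-19 examples differ only in the query file. A possible way to run them in one go is sketched below. The URL pattern is taken from the examples above; the helpers query_url and run_queries are hypothetical, and run_query is passed in as an argument so that the sketch itself does not depend on DeTrusty being importable.

```python
from typing import Any, Callable, Dict, Iterable

# Base URL of the example folder, as used in the snippets above.
BASE_URL = "https://raw.githubusercontent.com/SDM-TIB/DeTrusty/master/example/K4COVID"


def query_url(number: int) -> str:
    """Build the URL of example query Q<number>.rq (hypothetical helper)."""
    return f"{BASE_URL}/Q{number}.rq"


def run_queries(
    run_query: Callable[..., Dict[str, Any]],
    config: Any,
    numbers: Iterable[int] = range(1, 6),
) -> Dict[str, Dict[str, Any]]:
    """Execute the example queries one after another and collect the results.

    `run_query` is expected to behave like DeTrusty's run_query; the keyword
    arguments mirror the calls shown in the examples above.
    """
    results = {}
    for number in numbers:
        results[f"Q{number}"] = run_query(
            query_url(number), config=config, join_stars_locally=False
        )
    return results


# Usage sketch (requires DeTrusty and network access):
#
#   from DeTrusty import get_config, run_query
#   config = get_config(f"{BASE_URL}/rdfmts.ttl")
#   for name, result in run_queries(run_query, config).items():
#       print(name, result.get("error", "ok"))
```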

CoyPu

This particular example uses queries related to the project CoyPu. Before running the examples below, please make sure that you have DeTrusty installed.

The federation of knowledge graphs for this example contains the following SPARQL endpoints:

Q1. Life Expectancy in 2017

“Retrieve the life expectancy for all countries in 2017 as reported by World Bank and Wikidata.”

PREFIX wb: <http://worldbank.org/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX time: <http://www.w3.org/2006/time#>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX ps: <http://www.wikidata.org/prop/statement/>
PREFIX pq: <http://www.wikidata.org/prop/qualifier/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?country_code ?country_name ?year (AVG(?year_exp_WB) AS ?exp_wb) (AVG(?year_exp) AS ?exp_wiki)
WHERE {
    ?country a wb:Country .
    ?country dc:identifier ?country_code .
    ?country rdfs:label ?country_name.
    ?country owl:sameAs ?sameAsCountry .
    ?country wb:hasAnnualIndicatorEntry ?annualIndicator .
    ?annualIndicator wb:hasIndicator <http://worldbank.org/Indicator/SP.DYN.LE00.IN> .
    ?annualIndicator owl:hasValue ?year_exp_WB .
    ?annualIndicator time:year ?year .

    ?sameAsCountry p:P2250 ?itemLifeExpectancy .
    ?itemLifeExpectancy ps:P2250 ?year_exp .
    ?itemLifeExpectancy pq:P585 ?time .
    BIND(year(?time) AS ?year)
    FILTER(?year=2017)
}
GROUP BY ?country_code ?country_name ?year
ORDER BY ?country_code

This query collects data from the World Bank KG and Wikidata. See below how to execute the query with DeTrusty.

from DeTrusty import get_config, run_query

config = get_config('https://raw.githubusercontent.com/SDM-TIB/DeTrusty/master/example/CoyPu/rdfmts.ttl')
query = 'https://raw.githubusercontent.com/SDM-TIB/DeTrusty/master/example/CoyPu/Q1.rq'
result = run_query(query, config=config, join_stars_locally=False)
print(result)

Q2. GDP and CO2 Emission per Capita in Germany

“Retrieve the GDP per capita and carbon emission per capita for Germany per year.”

PREFIX wb: <http://worldbank.org/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX time: <http://www.w3.org/2006/time#>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX ps: <http://www.wikidata.org/prop/statement/>
PREFIX pq: <http://www.wikidata.org/prop/qualifier/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT ?year (AVG(?value/?population) AS ?gdp_per_capita) (AVG(?value1/?population)*1000000 AS ?carbon_per_capita)
WHERE {
    ?indicator a wb:AnnualIndicatorEntry .
    ?indicator wb:hasIndicator <http://worldbank.org/Indicator/NY.GDP.MKTP.CD> .
    ?indicator wb:hasCountry ?country .
    ?indicator owl:hasValue ?value .
    ?indicator time:year ?year .
    ?country dc:identifier 'DEU' .

    ?indicator1 a wb:AnnualIndicatorEntry .
    ?indicator1 wb:hasIndicator <http://worldbank.org/Indicator/EN.ATM.CO2E.KT> .
    ?indicator1 wb:hasCountry ?country1 .
    ?indicator1 owl:hasValue ?value1 .
    ?indicator1 time:year ?year .
    ?country1 dc:identifier 'DEU' .

    ?countryWiki wdt:P298 'DEU' .
    ?countryWiki p:P1082 ?itemP .
    ?itemP pq:P585 ?time.
    ?itemP ps:P1082 ?population .
    BIND(year(?time) AS ?year)
}
GROUP BY ?year
ORDER BY ?year

This query collects data from the World Bank KG and Wikidata. See below how to execute the query with DeTrusty.

from DeTrusty import get_config, run_query

config = get_config('https://raw.githubusercontent.com/SDM-TIB/DeTrusty/master/example/CoyPu/rdfmts.ttl')
query = 'https://raw.githubusercontent.com/SDM-TIB/DeTrusty/master/example/CoyPu/Q2.rq'
result = run_query(query, config=config, join_stars_locally=False)
print(result)
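Printing the raw dictionary is enough for a first look, but for further processing the aggregated values are usually needed as plain numbers. The sketch below converts the answer of the query above into rows with float values. It assumes the SPARQL 1.1 JSON results layout, where every bound variable is an object with a "value" string; the helper names and the sample data are hypothetical and do not reflect real query output.

```python
from typing import Any, Dict, List, Optional


def to_float(term: Optional[Dict[str, Any]]) -> Optional[float]:
    """Convert an RDF term object to float; return None if unbound or not numeric."""
    if term is None:
        return None
    try:
        return float(term["value"])
    except (KeyError, TypeError, ValueError):
        return None


def per_capita_rows(result: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn the bindings for ?year, ?gdp_per_capita, ?carbon_per_capita into rows.

    ASSUMPTION: solutions are stored under result['results']['bindings'].
    """
    rows = []
    for binding in result.get("results", {}).get("bindings", []):
        rows.append({
            "year": binding.get("year", {}).get("value"),
            "gdp_per_capita": to_float(binding.get("gdp_per_capita")),
            "carbon_per_capita": to_float(binding.get("carbon_per_capita")),
        })
    return rows


if __name__ == "__main__":
    # Made-up numbers standing in for the return value of run_query(...).
    sample = {"results": {"bindings": [{
        "year": {"type": "literal", "value": "2015"},
        "gdp_per_capita": {"type": "literal", "value": "41000.5"},
        "carbon_per_capita": {"type": "literal", "value": "8.9"},
    }]}}
    for row in per_capita_rows(sample):
        print(row)
```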

Source Descriptions

We refer the reader to the chapter Source Descriptions for more information about what source descriptions are and why they are needed. A detailed explanation of how to create them is given in Creating Source Descriptions.