
DeTrusty as a Library

Installation

If you want to use DeTrusty as a library, you can install it from its source code on GitHub or download the package from PyPI.

Requirements

DeTrusty is implemented in Python 3. It uses the requests library to manage the HTTP(S) requests to the SPARQL endpoints, ply for the SPARQL parser, and rdflib to generate source descriptions from RML mappings.

Local Source Code

You can install DeTrusty from your local source code by performing the following steps.

git clone git@github.com:SDM-TIB/DeTrusty.git
cd DeTrusty
python -m pip install -e .

GitHub

DeTrusty can also be installed from its source code in GitHub without explicitly cloning the repository:

python -m pip install -e 'git+https://github.com/SDM-TIB/DeTrusty#egg=DeTrusty'

PyPI

The easiest way to install DeTrusty is to download the package from PyPI:

python -m pip install DeTrusty
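
Whichever installation route is chosen, it can be useful to confirm that the package is visible to the current Python environment. The following is a minimal sketch using only the standard library; the helper name installed_version is hypothetical, and the distribution name 'DeTrusty' is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# 'DeTrusty' is the distribution name used in the pip commands above.
detrusty_version = installed_version('DeTrusty')
if detrusty_version is None:
    print('DeTrusty is not installed in this environment.')
else:
    print(f'DeTrusty {detrusty_version} is installed.')
```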

Executing Queries

Note

The goal of this section is to explain how to run SPARQL queries when using DeTrusty as a library. We refer the reader to the chapter Source Descriptions for more information about what source descriptions are and why they are needed. Additionally, we refer to Decomposition Types for more details about what decomposition is and an overview of different decomposition types.

DeTrusty offers the function run_query for executing SPARQL queries.

run_query(query, decomposition_type='STAR', sparql_one_dot_one=None, config=<DeTrusty.Molecule.MTManager.Config object>, join_stars_locally=True, print_result=True, yasqe=False, timeout=0)

Executes a SPARQL query over a federation of SPARQL endpoints.

The SPARQL query is decomposed based on the specified decomposition type. DeTrusty identifies the possible sources for each sub-query using the metadata collected previously. If the query contains the SERVICE clause, DeTrusty executes the sub-query at the specified endpoint instead.

Parameters:
  • query (str) – The SPARQL query to be executed. Either a string holding the SPARQL query or the path to a query file. The query file can be local or remote (accessible via GET request).

  • decomposition_type (str, optional) – The decomposition type to be used for decomposing the query. Possible values are ‘STAR’ for a star-shaped decomposition, ‘EG’ for exclusive groups decomposition, and ‘TRIPLE’ for a triple-wise decomposition, i.e., each triple pattern of the query produces a sub-query. Default is ‘STAR’.

  • sparql_one_dot_one (bool, optional) –

    Deprecated since version 0.15.0: No longer needed. DeTrusty is using one parser now.

  • config (DeTrusty.Molecule.MTManager.Config, optional) – The configuration holding the metadata about the federation over which the SPARQL query should be executed. If no value is specified, DeTrusty will attempt to load the configuration from ./Config/rdfmts.json.

  • join_stars_locally (bool, optional) – Indicates whether joins should be performed at the query engine. If ‘True’, joins are performed in DeTrusty; if ‘False’, joins are executed at the sources where possible. Default is ‘True’, i.e., join execution at the query engine level.

  • print_result (bool, optional) – Indicates whether the actual query result should be returned. If ‘True’, the result is included in the answer; if ‘False’, only the metadata of the query result, such as the cardinality, is returned. Default is ‘True’.

  • yasqe (bool, optional) – Indicates whether the SPARQL query was sent from the YASGUI interface of DeTrusty’s Web interface. This is a workaround for YASQE not being able to show the query results when the validation data is included. Set to ‘True’ to omit the validation data. Default is ‘False’.

  • timeout (float, optional) –

    Warning

    This feature is currently considered experimental and will not produce partial results!

    Added in version 0.17.0.

    DeTrusty will stop the query execution once the timeout is reached. The timeout is specified in seconds. The default is 0 and reflects no limit.

Returns:

A dictionary including the query answer and additional metadata following the SPARQL protocol. It returns an error message in the ‘error’ field if something went wrong. Other metadata might be omitted in that case.

Return type:

dict
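
As a hedged illustration of how the returned dictionary might be consumed, the sketch below turns it into a list of plain rows. Only the ‘error’ field is documented above; the ‘head’/‘results’ layout is an assumption based on the SPARQL 1.1 Query Results JSON Format, and rows_from_result as well as the sample data are hypothetical.

```python
def rows_from_result(result):
    """Turn a SPARQL-JSON-style result dictionary into a list of plain rows.

    Assumption: apart from the documented 'error' field, the dictionary uses
    the 'head'/'results' layout of the SPARQL 1.1 Query Results JSON Format.
    """
    if result.get('error'):
        raise RuntimeError(f"Query failed: {result['error']}")
    variables = result.get('head', {}).get('vars', [])
    bindings = result.get('results', {}).get('bindings', [])
    # Unbound variables are absent from a binding, so they map to None here.
    return [
        {var: binding[var]['value'] if var in binding else None for var in variables}
        for binding in bindings
    ]


# Hand-written sample standing in for the return value of run_query.
sample = {
    'head': {'vars': ['s', 'name']},
    'results': {'bindings': [
        {'s': {'type': 'uri', 'value': 'http://example.com/alice'},
         'name': {'type': 'literal', 'value': 'Alice'}},
        {'s': {'type': 'uri', 'value': 'http://example.com/bob'}},
    ]},
}
for row in rows_from_result(sample):
    print(row)
```

Raising an exception on the ‘error’ field is a design choice of this sketch; callers that prefer to inspect the message can check the field directly instead.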

Examples

The example calls assume that an object config with the source descriptions exists.

>>> run_query('SELECT ?s WHERE { ?s a <http://example.com/Person> }', config=config)
>>> run_query('./query.rq', config=config)
>>> run_query('http://example.com/queries/query.rq', config=config)
>>> run_query('./query.rq', config=config, timeout=10)
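
To make the decomposition_type parameter more tangible, the following toy sketch groups triple patterns the way the ‘STAR’ and ‘TRIPLE’ options are described above. It is not DeTrusty's implementation: it assumes the common definition of a star-shaped sub-query as the triple patterns sharing one subject, it represents triple patterns as plain tuples, and it leaves out ‘EG’ because exclusive groups depend on knowledge about the sources.

```python
from collections import defaultdict


def decompose(triple_patterns, decomposition_type='STAR'):
    """Group triple patterns into sub-queries (toy illustration only)."""
    if decomposition_type == 'TRIPLE':
        # Triple-wise: every triple pattern becomes its own sub-query.
        return [[tp] for tp in triple_patterns]
    if decomposition_type == 'STAR':
        # Star-shaped: triple patterns sharing a subject form one sub-query.
        stars = defaultdict(list)
        for tp in triple_patterns:
            stars[tp[0]].append(tp)
        return list(stars.values())
    raise ValueError(f'Unsupported in this sketch: {decomposition_type!r}')


patterns = [
    ('?treatment', 'k4covid:hasCovidDrug', '?covidDrug'),
    ('?treatment', 'k4covid:hasComorbidity', '?comorbidity'),
    ('?covidDrug', 'k4covid:hasCUIAnnotation', '?cui'),
]
print(len(decompose(patterns, 'STAR')), 'star-shaped sub-queries')
print(len(decompose(patterns, 'TRIPLE')), 'triple-wise sub-queries')
```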

Additionally, DeTrusty offers the function get_config to create an object with the internal representation of the source descriptions.

get_config(config_input)

Creates an object with the internal representation of the source descriptions.

Based on the type of input, this function will create a Config object either from a local file, remote file, or a list of dictionaries. The source descriptions are later used by DeTrusty for the source selection and decomposition. The main usage of this function is to read stored source descriptions from a file (local or remote) in order to reuse them for query execution over the same federation.

Parameters:

config_input (str | list[dict]) – The source description to transform into the internal representation. Might be a string holding the path to a configuration file. The configuration file can be local or remote (accessible via GET request). The source description can also be a parsed JSON, i.e., a list of Python dictionaries. Each dictionary represents a so-called RDF Molecule Template.

Returns:

The object holding the internal representation of the source descriptions. The result might be None if there was an issue while reading the source descriptions that did not lead to an exception.

Return type:

Config

Examples

The example calls assume that the files rdfmts.json and rdfmts.ttl are valid source description files created by DeTrusty. See Creating Source Descriptions for more information.

>>> get_config('./rdfmts.json')
>>> get_config('./rdfmts.ttl')
>>> get_config('http://example.com/rdfmts.json')
>>> get_config('http://example.com/rdfmts.ttl')
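
The three kinds of input accepted by get_config can be told apart with a few checks. The sketch below only illustrates that distinction; it is not the library's own dispatch logic, and the helper name describe_config_input is hypothetical.

```python
from urllib.parse import urlparse


def describe_config_input(config_input):
    """Classify an input as 'parsed', 'remote', or 'local' (illustration only)."""
    if isinstance(config_input, list):
        if not all(isinstance(item, dict) for item in config_input):
            raise TypeError('A parsed source description must be a list of dictionaries.')
        return 'parsed'
    if isinstance(config_input, str):
        # Anything with an HTTP(S) scheme is treated as a remote file.
        scheme = urlparse(config_input).scheme
        return 'remote' if scheme in ('http', 'https') else 'local'
    raise TypeError(f'Unsupported input type: {type(config_input).__name__}')


for candidate in ('./rdfmts.json', 'http://example.com/rdfmts.ttl', [{}]):
    print(describe_config_input(candidate))
```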

Knowledge4COVID-19

In order to gain hands-on experience with DeTrusty, we provide a couple of examples. This particular example uses the queries related to the Knowledge4COVID-19 KG [4]. Before running the examples below, please make sure that you have DeTrusty installed.

The federation of knowledge graphs for this example includes SPARQL endpoints for the sources queried below: the Knowledge4COVID-19 KG, DBpedia, and Wikidata.

Q1. COVID-19 Drugs for Patients with Asthma

“Retrieve from DBpedia the excretion rate, metabolism, and routes of administration of the COVID-19 drugs in the treatments to treat COVID-19 in patients with Asthma.”

PREFIX dbp: <http://dbpedia.org/property/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX k4covid: <http://research.tib.eu/covid-19/vocab/>
PREFIX k4covide: <http://research.tib.eu/covid-19/entity/>

SELECT DISTINCT ?treatment ?sameAsCovidDrug ?excretation ?metabolism ?routes WHERE {
    ?treatment k4covid:hasCovidDrug ?covidDrug.
    FILTER( ?comorbidity=k4covide:Asthma )
    ?treatment k4covid:hasComorbidity ?comorbidity.
    ?treatment k4covid:hasComorbidityDrug ?comorbidityDrug.
    ?comorbidityDrug k4covid:hasCUIAnnotation ?CUIComorbidityDrug.
    ?CUIComorbidityDrug owl:sameAs ?sameAsComorbidityDrug .
    ?covidDrug k4covid:hasCUIAnnotation ?CUICovidDrug.
    ?CUICovidDrug owl:sameAs ?sameAsCovidDrug .
    ?sameAsCovidDrug dbp:excretion ?excretation.
    ?sameAsCovidDrug dbp:metabolism ?metabolism.
    ?sameAsCovidDrug dbp:routesOfAdministration ?routes.
} ORDER BY ?treatment

This query collects data from the Knowledge4COVID-19 KG and DBpedia. See below how to execute the query with DeTrusty.

from DeTrusty import get_config, run_query

config = get_config('https://raw.githubusercontent.com/SDM-TIB/DeTrusty/master/example/K4COVID/rdfmts.ttl')
query = 'https://raw.githubusercontent.com/SDM-TIB/DeTrusty/master/example/K4COVID/Q1.rq'
result = run_query(query, config=config, join_stars_locally=False)
print(result)

Q2. COVID-19 Drugs for Patients with a Cardiopathy

“Retrieve from Wikidata the ChEMBL code, metabolism, and routes of administration of the COVID-19 drugs in the treatments to treat COVID-19 in patients with a cardiopathy.”

PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX k4covid: <http://research.tib.eu/covid-19/vocab/>
PREFIX k4covide: <http://research.tib.eu/covid-19/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT DISTINCT ?treatment ?sameAsComorbidityDrug ?idDrug ?activeIngredient ?mass
WHERE {
    ?treatment k4covid:hasCovidDrug ?covidDrug.
    ?treatment k4covid:hasComorbidity k4covide:Cardiopathy .
    ?treatment k4covid:hasComorbidityDrug ?comorbidityDrug.
    ?comorbidityDrug k4covid:hasCUIAnnotation ?CUIComorbidityDrug.
    ?CUIComorbidityDrug owl:sameAs ?sameAsComorbidityDrug .
    ?covidDrug k4covid:hasCUIAnnotation ?CUICovidDrug.
    ?CUICovidDrug owl:sameAs ?sameAsCovidDrug .
    ?sameAsComorbidityDrug wdt:P592 ?idDrug .
    ?sameAsComorbidityDrug wdt:P3780 ?activeIngredient .
    ?sameAsComorbidityDrug  wdt:P2067 ?mass .
}

This query collects data from the Knowledge4COVID-19 KG and Wikidata. See below how to execute the query with DeTrusty.

from DeTrusty import get_config, run_query

config = get_config('https://raw.githubusercontent.com/SDM-TIB/DeTrusty/master/example/K4COVID/rdfmts.ttl')
query = 'https://raw.githubusercontent.com/SDM-TIB/DeTrusty/master/example/K4COVID/Q2.rq'
result = run_query(query, config=config, join_stars_locally=False)
print(result)

Q3. Detailed COVID-19 Drug Information for Patients with Asthma

“Retrieve from DBpedia the excretion rate, metabolism, and routes of administration, ChEMBL and KEGG codes, SMILES notation, and trade name of the COVID-19 drugs in the treatments to treat COVID-19 in patients with Asthma.”

PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX k4covid: <http://research.tib.eu/covid-19/vocab/>
PREFIX k4covide: <http://research.tib.eu/covid-19/entity/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>

SELECT DISTINCT ?treatment ?sameAsCovidDrug ?excretion ?metabolism ?routes ?dbDrugBank ?dbSmiles ?tradeName
WHERE {
    ?treatment k4covid:hasCovidDrug ?covidDrug.
    ?treatment k4covid:hasComorbidity k4covide:Asthma.
    ?treatment k4covid:hasComorbidityDrug ?comorbidityDrug.
    ?comorbidityDrug k4covid:hasCUIAnnotation ?CUIComorbidityDrug.
    ?CUIComorbidityDrug owl:sameAs ?sameAsComorbidityDrug .
    ?covidDrug k4covid:hasCUIAnnotation ?CUICovidDrug.
    ?CUICovidDrug owl:sameAs ?sameAsCovidDrug .
    ?sameAsCovidDrug dbp:excretion ?excretion.
    ?sameAsCovidDrug dbp:metabolism ?metabolism.
    ?sameAsCovidDrug dbp:routesOfAdministration ?routes.
    ?sameAsCovidDrug dbp:chembl ?dbCheml.
    ?sameAsCovidDrug dbp:kegg ?dbKegg.
    ?sameAsCovidDrug dbo:drugbank ?dbDrugBank .
    ?sameAsCovidDrug dbp:smiles ?dbSmiles .
    ?sameAsCovidDrug dbp:tradename ?tradeName.
} ORDER BY ?treatment

This query collects data from the Knowledge4COVID-19 KG and DBpedia. See below how to execute the query with DeTrusty.

from DeTrusty import get_config, run_query

config = get_config('https://raw.githubusercontent.com/SDM-TIB/DeTrusty/master/example/K4COVID/rdfmts.ttl')
query = 'https://raw.githubusercontent.com/SDM-TIB/DeTrusty/master/example/K4COVID/Q3.rq'
result = run_query(query, config=config, join_stars_locally=False)
print(result)

Q4. COVID-19 Comorbidity Information

“Retrieve from DBpedia the disease label, ICD-10 and mesh codes, and risks of the comorbidities of the COVID-19 treatments.”

PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX k4covid: <http://research.tib.eu/covid-19/vocab/>
PREFIX k4covide: <http://research.tib.eu/covid-19/entity/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>

SELECT DISTINCT ?sameAsComorbidity ?icd10 ?risk WHERE {
    ?treatment k4covid:hasCovidDrug ?covidDrug.
    ?treatment k4covid:hasComorbidity ?comorbidity.
    ?comorbidity k4covid:hasCUIAnnotation ?annotationComorbidity .
    ?annotationComorbidity owl:sameAs ?sameAsComorbidity .
    ?sameAsComorbidity dbo:icd10 ?icd10.
    ?sameAsComorbidity dbo:meshId ?meshID .
    ?sameAsComorbidity dbp:risks ?risk
}

This query collects data from the Knowledge4COVID-19 KG and DBpedia. See below how to execute the query with DeTrusty.

from DeTrusty import get_config, run_query

config = get_config('https://raw.githubusercontent.com/SDM-TIB/DeTrusty/master/example/K4COVID/rdfmts.ttl')
query = 'https://raw.githubusercontent.com/SDM-TIB/DeTrusty/master/example/K4COVID/Q4.rq'
result = run_query(query, config=config, join_stars_locally=False)
print(result)

Q5. COVID-19 Comorbidity Drugs

“Retrieve the COVID-19 and comorbidity drugs on a treatment and the ChEMBL code, mass, and excretion route for the comorbidity drugs.”

PREFIX dbp: <http://dbpedia.org/property/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX k4covid: <http://research.tib.eu/covid-19/vocab/>
PREFIX k4covide: <http://research.tib.eu/covid-19/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT DISTINCT ?treatment ?sameAsComorbidityDrug ?sameAsCovidDrug ?idDrug ?mass ?excretation
WHERE {
  ?treatment k4covid:hasCovidDrug ?covidDrug.
  ?treatment k4covid:hasComorbidity k4covide:Cardiopathy .
  ?treatment k4covid:hasComorbidityDrug ?comorbidityDrug.
  ?comorbidityDrug k4covid:hasCUIAnnotation ?CUIComorbidityDrug.
  ?CUIComorbidityDrug owl:sameAs ?sameAsComorbidityDrug .
  ?covidDrug k4covid:hasCUIAnnotation ?CUICovidDrug.
  ?CUICovidDrug owl:sameAs ?sameAsCovidDrug .
  {
    ?sameAsComorbidityDrug wdt:P592 ?idDrug .
    ?sameAsComorbidityDrug wdt:P2067 ?mass .
  }
  UNION
  {
    ?sameAsComorbidityDrug dbp:excretion ?excretation .
  }
}

This query collects data from the Knowledge4COVID-19 KG, DBpedia, and Wikidata. See below how to execute the query with DeTrusty.

from DeTrusty import get_config, run_query

config = get_config('https://raw.githubusercontent.com/SDM-TIB/DeTrusty/master/example/K4COVID/rdfmts.ttl')
query = 'https://raw.githubusercontent.com/SDM-TIB/DeTrusty/master/example/K4COVID/Q5.rq'
result = run_query(query, config=config, join_stars_locally=False)
print(result)
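
All five queries above use the same source descriptions, so the configuration only needs to be loaded once. The sketch below shows one way to loop over the query files; the helper run_all is hypothetical and receives run_query as an argument so that the loop itself does not depend on DeTrusty being importable. The query URLs follow the pattern of the examples above.

```python
BASE_URL = 'https://raw.githubusercontent.com/SDM-TIB/DeTrusty/master/example/K4COVID'


def run_all(run_query, config, query_urls):
    """Execute each query with a shared configuration and collect the answers."""
    results = {}
    for url in query_urls:
        # Same keyword arguments as in the single-query examples above.
        results[url] = run_query(url, config=config, join_stars_locally=False)
    return results


query_urls = [f'{BASE_URL}/Q{number}.rq' for number in range(1, 6)]

# With DeTrusty installed and network access, the call would look like:
#   from DeTrusty import get_config, run_query
#   config = get_config(f'{BASE_URL}/rdfmts.ttl')
#   results = run_all(run_query, config, query_urls)
```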

CoyPu

This particular example uses queries related to the project CoyPu. Before running the examples below, please make sure that you have DeTrusty installed.

The federation of knowledge graphs for this example includes SPARQL endpoints for the sources queried below: the World Bank KG and Wikidata.

Q1. Life Expectancy in 2017

“Retrieve the life expectancy for all countries in 2017 as reported by World Bank and Wikidata.”

PREFIX wb: <http://worldbank.org/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX time: <http://www.w3.org/2006/time#>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX ps: <http://www.wikidata.org/prop/statement/>
PREFIX pq: <http://www.wikidata.org/prop/qualifier/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?country_code ?country_name ?year (AVG(?year_exp_WB) AS ?exp_wb) (AVG(?year_exp) AS ?exp_wiki)
WHERE {
    ?country a wb:Country .
    ?country dc:identifier ?country_code .
    ?country rdfs:label ?country_name.
    ?country owl:sameAs ?sameAsCountry .
    ?country wb:hasAnnualIndicatorEntry ?annualIndicator .
    ?annualIndicator wb:hasIndicator <http://worldbank.org/Indicator/SP.DYN.LE00.IN> .
    ?annualIndicator owl:hasValue ?year_exp_WB .
    ?annualIndicator time:year ?year .

    ?sameAsCountry p:P2250 ?itemLifeExpectancy .
    ?itemLifeExpectancy ps:P2250 ?year_exp .
    ?itemLifeExpectancy pq:P585 ?time .
    BIND(year(?time) AS ?year)
    FILTER(?year=2017)
}
GROUP BY ?country_code ?country_name ?year
ORDER BY ?country_code

This query collects data from the World Bank KG and Wikidata. See below how to execute the query with DeTrusty.

from DeTrusty import get_config, run_query

config = get_config('https://raw.githubusercontent.com/SDM-TIB/DeTrusty/master/example/CoyPu/rdfmts.ttl')
query = 'https://raw.githubusercontent.com/SDM-TIB/DeTrusty/master/example/CoyPu/Q1.rq'
result = run_query(query, config=config, join_stars_locally=False)
print(result)

Q2. GDP and CO2 Emission per Capita in Germany

“Retrieve the GDP per capita and carbon emission per capita for Germany per year.”

PREFIX wb: <http://worldbank.org/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX time: <http://www.w3.org/2006/time#>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX ps: <http://www.wikidata.org/prop/statement/>
PREFIX pq: <http://www.wikidata.org/prop/qualifier/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT ?year (AVG(?value/?population) AS ?gdp_per_capita) (AVG(?value1/?population)*1000000 AS ?carbon_per_capita)
WHERE {
    ?indicator a wb:AnnualIndicatorEntry .
    ?indicator wb:hasIndicator <http://worldbank.org/Indicator/NY.GDP.MKTP.CD> .
    ?indicator wb:hasCountry ?country .
    ?indicator owl:hasValue ?value .
    ?indicator time:year ?year .
    ?country dc:identifier 'DEU' .

    ?indicator1 a wb:AnnualIndicatorEntry .
    ?indicator1 wb:hasIndicator <http://worldbank.org/Indicator/EN.ATM.CO2E.KT> .
    ?indicator1 wb:hasCountry ?country1 .
    ?indicator1 owl:hasValue ?value1 .
    ?indicator1 time:year ?year .
    ?country1 dc:identifier 'DEU' .

    ?countryWiki wdt:P298 'DEU' .
    ?countryWiki p:P1082 ?itemP .
    ?itemP pq:P585 ?time.
    ?itemP ps:P1082 ?population .
    BIND(year(?time) AS ?year)
}
GROUP BY ?year
ORDER BY ?year

This query collects data from the World Bank KG and Wikidata. See below how to execute the query with DeTrusty.

from DeTrusty import get_config, run_query

config = get_config('https://raw.githubusercontent.com/SDM-TIB/DeTrusty/master/example/CoyPu/rdfmts.ttl')
query = 'https://raw.githubusercontent.com/SDM-TIB/DeTrusty/master/example/CoyPu/Q2.rq'
result = run_query(query, config=config, join_stars_locally=False)
print(result)
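
Aggregated answers such as the ones produced by these queries are often easier to inspect in tabular form. The following sketch writes the bindings of a result dictionary as CSV text. It assumes the answer follows the ‘head’/‘results’ layout of the SPARQL 1.1 Query Results JSON Format; the helper result_to_csv is hypothetical, and the sample values are made-up placeholders, not real query output.

```python
import csv
import io


def result_to_csv(result):
    """Render the bindings of a SPARQL-JSON-style result as CSV text."""
    variables = result['head']['vars']
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator='\n')
    writer.writerow(variables)
    for binding in result['results']['bindings']:
        # Unbound variables are written as empty cells.
        writer.writerow([binding.get(var, {}).get('value', '') for var in variables])
    return buffer.getvalue()


# Made-up placeholder data standing in for the return value of run_query.
sample = {
    'head': {'vars': ['country_code', 'exp_wb']},
    'results': {'bindings': [
        {'country_code': {'type': 'literal', 'value': 'AAA'},
         'exp_wb': {'type': 'literal', 'value': '1.0'}},
        {'country_code': {'type': 'literal', 'value': 'BBB'}},
    ]},
}
print(result_to_csv(sample))
```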

Creating Source Descriptions

Note

The goal of this section is to explain how to create the source descriptions when using DeTrusty as a library. We refer the reader to the chapter Source Descriptions for more information about what source descriptions are and why they are needed.

DeTrusty offers the function create_rdfmts for the source description creation.

create_rdfmts(endpoints, output='/DeTrusty/Config/rdfmts.ttl')

Generates the source descriptions (rdfmts.ttl), which need to be supplied to run_query during query execution.

Parameters:
  • endpoints (list or dict) – The endpoints from which information will be collected.

  • output (str, optional) –

    Path for the generated configuration to be saved at.

    • If not provided: the default path will be used, i.e., path/to/DeTrusty-installation/Config/rdfmts.ttl

    • In case of None: the config will not be saved, instead a TTLConfig object is returned.

Returns:

A value is only returned when output=None; otherwise, a file with the configuration is saved at the location given by the parameter output.

Return type:

TTLConfig, optional

The remainder of this section provides examples showing how to use the function in different scenarios.

Standard Case

The standard case is to include only public SPARQL endpoints in the federation, collect the source description via SPARQL queries, and consider all classes in the endpoints. The following code snippet collects the source description of two public endpoints and saves it in the file ./Config/rdfmts.ttl.

from DeTrusty.Molecule.MTCreation import create_rdfmts

endpoints = ['http://url_to_endpoint_1', 'https://url_to_endpoint_2:port/sparql']
create_rdfmts(endpoints, './Config/rdfmts.ttl')

Alternatively, None can be passed instead of a path. In that case, create_rdfmts returns the source description in DeTrusty’s internal structure. This is helpful if several queries are to be executed in order to avoid reading the source description from a file prior to executing each query.

from DeTrusty.Molecule.MTCreation import create_rdfmts

endpoints = ['http://url_to_endpoint_1', 'https://url_to_endpoint_2:port/sparql']
config = create_rdfmts(endpoints, None)  # returns the configuration instead of writing it to a file

Additionally, the source description object provides a method for saving it to a file.

config.saveToFile('/path/for/new_mts')

Private Endpoints

Starting with version 0.6.0, DeTrusty can handle private endpoints that require authentication. Currently, basic authentication with Base64 encoding as well as token-based authentication via a token server are supported. In the case of token-based authentication, DeTrusty expects the token server to return the token in the response field access_token. Additionally, DeTrusty expects the lifespan of the token in seconds to be returned in the field expires_in. The configuration changes slightly compared to the standard case:

from DeTrusty.Molecule.MTCreation import create_rdfmts

endpoints = {
  'https://url_to_endpoint_1': {
    'keycloak': 'https://url_to_token_server',
    'username': 'YOUR_USERNAME',
    'password': 'YOUR_PASSWORD'
  }
}
create_rdfmts(endpoints, './Config/rdfmts.ttl')

The keys of endpoints are the URLs of the SPARQL endpoints. Each endpoint is itself represented as a dictionary holding all parameters as (key, value) pairs. keycloak is the URL of the token server; username and password are the credentials for the token server or the SPARQL endpoint.

Note

If your SPARQL endpoint uses basic authentication with Base64 encoding instead of a token server, simply omit keycloak.
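
To keep credentials out of the source code, the endpoint dictionary can be assembled from environment variables. The sketch below is one possible approach under stated assumptions: the helper private_endpoint and the variable names SPARQL_USERNAME, SPARQL_PASSWORD, and SPARQL_TOKEN_SERVER are hypothetical, while the keys keycloak, username, and password are the ones described above.

```python
import os


def private_endpoint(url, env=None):
    """Build the endpoints entry for a private endpoint from environment variables."""
    env = os.environ if env is None else env
    settings = {
        'username': env['SPARQL_USERNAME'],
        'password': env['SPARQL_PASSWORD'],
    }
    # Without a token server, 'keycloak' is omitted (basic authentication).
    token_server = env.get('SPARQL_TOKEN_SERVER')
    if token_server:
        settings['keycloak'] = token_server
    return {url: settings}


# A plain dictionary stands in for os.environ in this demonstration.
demo_env = {'SPARQL_USERNAME': 'YOUR_USERNAME', 'SPARQL_PASSWORD': 'YOUR_PASSWORD'}
endpoints = private_endpoint('https://url_to_endpoint_1', env=demo_env)
print(sorted(endpoints['https://url_to_endpoint_1']))
```

The resulting dictionary has the same shape as the one passed to create_rdfmts in the example above.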

Source Descriptions from RML Mappings

Starting with version 0.6.0, DeTrusty can collect the metadata necessary for source selection and decomposition from RML mappings. Collecting metadata from RML mappings instead of the SPARQL endpoint considerably increases the performance of the metadata collection process. Of course, this is only feasible for endpoints that were created using RML mappings.

from DeTrusty.Molecule.MTCreation import create_rdfmts

endpoints = {
  'https://url_to_endpoint_1': {
    'mappings': [
      'path/to/mapping/file1',
      'path/to/mapping/file2'
    ]
  }
}
create_rdfmts(endpoints, './Config/rdfmts.ttl')

The key mappings holds a list of paths to the mapping files that were used to create the RDF data served by the SPARQL endpoint.
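
When an endpoint was built from many mapping files, listing them by hand becomes tedious. As a sketch, the list can be collected from a directory. The helper endpoint_with_mappings is hypothetical, and the assumption that the mapping files carry the extension .ttl may not hold for every project.

```python
import tempfile
from pathlib import Path


def endpoint_with_mappings(url, mapping_dir, pattern='*.ttl'):
    """Build the endpoints entry for one endpoint from a directory of mappings."""
    # Sorting keeps the order of the mapping files stable between runs.
    paths = sorted(str(path) for path in Path(mapping_dir).glob(pattern))
    if not paths:
        raise FileNotFoundError(f'No files matching {pattern!r} in {mapping_dir}')
    return {url: {'mappings': paths}}


# Demonstration with a throwaway directory holding two empty mapping files.
with tempfile.TemporaryDirectory() as mapping_dir:
    for name in ('persons.ttl', 'drugs.ttl', 'README.txt'):
        (Path(mapping_dir) / name).touch()
    endpoints = endpoint_with_mappings('https://url_to_endpoint_1', mapping_dir)

mapping_names = [Path(p).name for p in endpoints['https://url_to_endpoint_1']['mappings']]
print(mapping_names)
```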

Restricting Classes of an Endpoint

Starting with version 0.7.0, DeTrusty can restrict the collection of metadata to specific classes of an endpoint.

Warning

The classes that are to be included in the source description creation process need to be specified using their full URIs.

from DeTrusty.Molecule.MTCreation import create_rdfmts

endpoints = {
  'https://url_to_endpoint_1': {
    'types': [
      'http://example.com/ontology/ClassA',
      'http://example.com/ontology/ClassB'
    ]
  }
}
create_rdfmts(endpoints, './Config/rdfmts.ttl')

The key types holds a list of all the classes of the endpoint that should be considered for the source description creation process.
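
Since the classes must be given as full URIs, prefixed names as used in SPARQL queries have to be expanded before they are placed in the types list. The following sketch shows one way to do that; the helper expand_class and the prefix map are hypothetical and not part of DeTrusty.

```python
def expand_class(name, prefixes):
    """Expand a prefixed name such as 'ex:ClassA' to a full URI."""
    if name.startswith(('http://', 'https://')):
        return name  # already a full URI
    prefix, separator, local_name = name.partition(':')
    if not separator or prefix not in prefixes:
        raise ValueError(f'Cannot expand {name!r} to a full URI')
    return prefixes[prefix] + local_name


prefixes = {'ex': 'http://example.com/ontology/'}
types = [expand_class(name, prefixes)
         for name in ('ex:ClassA', 'http://example.com/ontology/ClassB')]
endpoints = {'https://url_to_endpoint_1': {'types': types}}
print(endpoints)
```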