Trav-SHACL as a Library
Installation
If you want to use Trav-SHACL as a library, you can install it from its source code on GitHub or download the package from PyPI.
Requirements
Trav-SHACL is implemented in Python 3.
The current version supports .
Trav-SHACL uses the rdflib library for parsing the SHACL shape schema and the SPARQLWrapper library for contacting SPARQL endpoints.
Local Source Code
You can install Trav-SHACL from your local source code by performing the following steps.
git clone git@github.com:SDM-TIB/Trav-SHACL.git
cd Trav-SHACL
python -m pip install -e .
GitHub
Trav-SHACL can also be installed from its source code on GitHub without explicitly cloning the repository:
python -m pip install -e 'git+https://github.com/SDM-TIB/Trav-SHACL#egg=TravSHACL'
PyPI
The easiest way to install Trav-SHACL is to download the package from PyPI:
python -m pip install TravSHACL
Testing
In order to run the test suite, clone the Trav-SHACL repository from GitHub as explained above (see Local Source Code). Install the production and development dependencies as shown below.
pip3 install -r requirements.txt -r requirements-dev.txt
Then, start the Docker container with the test data.
If you change the host port of the test data container, make sure to also change the port in test_cases.py.
Otherwise, Trav-SHACL cannot connect to the SPARQL endpoint provided by the container and the test suite will fail.
docker-compose -f tests/docker-compose.yml up -d
Finally, run the tests by executing the following command.
pytest
Example
You can run Trav-SHACL over example data provided in the GitHub repository. While it is necessary to clone the repository (or download the example folder) from GitHub, you can still install Trav-SHACL from PyPI. Note that you might want to create a virtual environment for Trav-SHACL. In order to serve the example data, you need to have Docker installed.
Preparing the Data
Assuming your current working directory contains the example folder, you can start the Docker container serving the SPARQL endpoint with the example data as shown below:
docker-compose -f ./example/docker-compose.yml up -d example_data
Note
The SPARQL endpoint might take a few seconds to start. You can check whether it is accessible by opening http://localhost:9090/sparql in your browser.
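If you prefer to check the endpoint from a script rather than a browser, the following is a minimal sketch that polls it with a trivial ASK query using only the Python standard library. The helper names endpoint_is_up and wait_for_endpoint, the retry count, and the delay are illustrative assumptions and not part of Trav-SHACL.

```python
import http.client
import time
import urllib.parse
import urllib.request


def endpoint_is_up(url, timeout=5.0):
    """Return True if the endpoint answers a trivial ASK query with HTTP 200."""
    query = urllib.parse.urlencode({'query': 'ASK { ?s ?p ?o }'})
    request = urllib.request.Request(
        url + '?' + query,
        headers={'Accept': 'application/sparql-results+json'},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, and HTTP error responses.
        return False


def wait_for_endpoint(url, retries=10, delay=2.0, probe=endpoint_is_up):
    """Poll the endpoint until it responds or the retries are exhausted."""
    for attempt in range(retries):
        if probe(url):
            return True
        if attempt < retries - 1:
            time.sleep(delay)  # give the container time to finish starting
    return False
```

Calling wait_for_endpoint('http://localhost:9090/sparql') before running the validation returns True once the container answers and False if it is still unreachable after all retries.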
Code
Now the data is accessible and you can validate it against the provided example shapes.
from TravSHACL import parse_heuristics, GraphTraversal, ShapeSchema

prio_target = 'TARGET'  # shapes with target definition are preferred, alternative value: ''
prio_degree = 'IN'  # shapes with a higher in-degree are prioritized, alternative value: 'OUT'
prio_number = 'BIG'  # shapes with many constraints are evaluated first, alternative value: 'SMALL'

shape_schema = ShapeSchema(
    schema_dir='./shapes/LUBM',
    endpoint='http://localhost:9090/sparql',
    endpoint_user=None,  # username if validating a private endpoint
    endpoint_password=None,  # password if validating a private endpoint
    graph_traversal=GraphTraversal.DFS,
    heuristics=parse_heuristics(prio_target + ' ' + prio_degree + ' ' + prio_number),
    use_selective_queries=True,
    max_split_size=256,
    output_dir='./result/',  # directory where the output files will be stored
    order_by_in_queries=False,  # sort the results of SPARQL queries in order to ensure the same order across several runs
    save_outputs=True  # save outputs to output_dir, alternative value: False
)

result = shape_schema.validate()  # validate the SHACL shape schema
print(result)
Note
All parameters of ShapeSchema are keyword-only. The only required parameters are schema_dir and endpoint.
Instead of passing a directory containing the shapes, Trav-SHACL also accepts instances of rdflib.Graph.
The following code executes the example from above by passing an rdflib.Graph.
import os

from rdflib import Graph
from TravSHACL import parse_heuristics, GraphTraversal, ShapeSchema

prio_target = 'TARGET'  # shapes with target definition are preferred, alternative value: ''
prio_degree = 'IN'  # shapes with a higher in-degree are prioritized, alternative value: 'OUT'
prio_number = 'BIG'  # shapes with many constraints are evaluated first, alternative value: 'SMALL'

schema_dir = './shapes/LUBM'
shapes_graph = Graph()
for shape_file in os.listdir(schema_dir):
    shapes_graph.parse(os.path.join(schema_dir, shape_file))  # reading all shape files into the shapes graph

shape_schema = ShapeSchema(
    schema_dir=shapes_graph,  # passing an rdflib graph containing the shapes
    endpoint='http://localhost:9090/sparql',
    graph_traversal=GraphTraversal.DFS,
    heuristics=parse_heuristics(prio_target + ' ' + prio_degree + ' ' + prio_number),
    use_selective_queries=True,
    max_split_size=256,
    output_dir='./result/',  # directory where the output files will be stored
    order_by_in_queries=False,  # sort the results of SPARQL queries in order to ensure the same order across several runs
    save_outputs=True  # save outputs to output_dir, alternative value: False
)

result = shape_schema.validate()  # validate the SHACL shape schema
print(result)
Parameters
Before executing the above script, let us have a look at the different parameters.
schema_dir
path to the directory containing the shape files (or an rdflib graph)
endpoint
URL of the endpoint to be evaluated; alternatively, an rdflib graph can be passed
endpoint_user
(optional) username if validating a private endpoint; default: None
endpoint_password
(optional) password if validating a private endpoint; default: None
graph_traversal
(optional) defines the graph traversal algorithm to be used, is one of [GraphTraversal.BFS, GraphTraversal.DFS]; default: GraphTraversal.DFS
heuristics
(optional) used to determine the seed shape. Use the function parse_heuristics with a string in order to set the desired heuristics; default: parse_heuristics('TARGET IN BIG'). The string combines the following options:
    TARGET if shapes with a target definition should be prioritized, otherwise omit
    prioritize in- or out-degree of shapes, one of [IN, OUT] or to be omitted
    prioritize shapes based on their number of constraints, one of [BIG, SMALL] or to be omitted
use_selective_queries
(optional) use more selective constraint queries, is one of [True, False]; default: True
max_split_size
(optional) maximum number of entities in FILTER or VALUES clause of a SPARQL query; default: 256
output_dir
(optional) directory where the output files will be stored; default: None
order_by_in_queries
(optional) sort the results of all SPARQL queries, ensures the same order in the result logs over several runs, is one of [True, False]; default: False
save_outputs
(optional) creates one file each for violated and validated targets, otherwise only statistics and traces will be stored, is one of [True, False]; default: False
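Since the heuristics string is assembled from up to three optional tokens, a small helper can keep the combinations tidy. The sketch below is only an illustration: build_heuristics_string is a hypothetical function, not part of Trav-SHACL, and it merely produces the string that would then be handed to parse_heuristics.

```python
def build_heuristics_string(target=True, degree='IN', size='BIG'):
    """Assemble the option string; None or '' omits the respective option.

    Hypothetical helper for illustration, not part of the Trav-SHACL API.
    """
    if degree not in (None, '', 'IN', 'OUT'):
        raise ValueError("degree must be 'IN', 'OUT', or omitted")
    if size not in (None, '', 'BIG', 'SMALL'):
        raise ValueError("size must be 'BIG', 'SMALL', or omitted")
    parts = ['TARGET' if target else '', degree or '', size or '']
    # Join only the options that are actually set.
    return ' '.join(part for part in parts if part)


print(build_heuristics_string())                       # TARGET IN BIG
print(build_heuristics_string(False, 'OUT', None))     # OUT
print(build_heuristics_string(True, None, 'SMALL'))    # TARGET SMALL
```

The result can be passed on as heuristics=parse_heuristics(build_heuristics_string(...)) when constructing ShapeSchema.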
Results: Internal Structure
Executing the above code from within the example folder will print the validation result using the internal representation.
More insights can be found in the various files that are generated in result.
Let us discuss the printed result first.
{
    '<http://example.org/GraduateCourseShape>': {
        'valid_instances': {
            ('<http://example.org/FullProfessorShape>', 'http://www.Department0.University0.edu/FullProfessor9', True),
            ('<http://example.org/GraduateCourseShape>', 'http://www.Department0.University0.edu/GraduateCourse3', True),
            ('<http://example.org/GraduateCourseShape>', 'http://www.Department0.University0.edu/GraduateCourse16', True),
            ('<http://example.org/GraduateCourseShape>', 'http://www.Department0.University0.edu/GraduateCourse33', True),
            ('<http://example.org/GraduateCourseShape>', 'http://www.Department0.University0.edu/GraduateCourse42', True),
            ('<http://example.org/GraduateStudentShape>', 'http://www.Department0.University0.edu/GraduateStudent0', True),
            ('<http://example.org/GraduateStudentShape>', 'http://www.Department0.University0.edu/GraduateStudent91', True),
            ('<http://example.org/FullProfessorShape>', 'http://www.Department9.University0.edu/FullProfessor1', True),
            ('<http://example.org/GraduateCourseShape>', 'http://www.Department9.University0.edu/GraduateCourse1', True),
            ('<http://example.org/GraduateStudentShape>', 'http://www.Department9.University0.edu/GraduateStudent28', True)
        },
        'invalid_instances': {
            ('<http://example.org/GraduateStudentShape>', 'http://www.Department9.University0.edu/GraduateStudent5', True)
        }
    },
    '<http://example.org/UniversityShape>': {
        'valid_instances': {
            ('<http://example.org/DepartmentShape>', 'http://www.Department0.University0.edu', True),
            ('<http://example.org/DepartmentShape>', 'http://www.Department1.University0.edu', True),
            ('<http://example.org/DepartmentShape>', 'http://www.Department9.University0.edu', True),
            ('<http://example.org/UniversityShape>', 'http://www.University0.edu', True)
        },
        'invalid_instances': {
            ('<http://example.org/UniversityShape>', 'http://www.University1.edu', True),
            ('<http://example.org/UniversityShape>', 'http://www.University2.edu', True),
            ('<http://example.org/UniversityShape>', 'http://www.University3.edu', True),
            ('<http://example.org/UniversityShape>', 'http://www.University4.edu', True)
        }
    },
    '<http://example.org/DepartmentShape>': {
        'valid_instances': set(),
        'invalid_instances': set()
    },
    '<http://example.org/GraduateStudentShape>': {
        'valid_instances': set(),
        'invalid_instances': {
            ('<http://example.org/GraduateStudentShape>', 'http://www.Department2.University0.edu/GraduateStudent7', True)
        }
    },
    '<http://example.org/FullProfessorShape>': {
        'valid_instances': set(),
        'invalid_instances': {
            ('<http://example.org/FullProfessorShape>', 'http://www.Department1.University0.edu/FullProfessor0', True),
            ('<http://example.org/FullProfessorShape>', 'http://www.Department2.University0.edu/FullProfessor2', True),
            ('<http://example.org/FullProfessorShape>', 'http://www.Department5.University0.edu/FullProfessor3', True)
        }
    },
    'unbound': {
        'valid_instances': set()
    }
}
The keys of the dictionary correspond to the names of the validated shapes.
For each shape, Trav-SHACL records the entities that were found to be valid (valid_instances) or invalid (invalid_instances).
Such a record is a tuple containing the name of the shape which the entity belongs to, the identifier of the entity itself, and True.
Trav-SHACL keeps this structure since the validation result of an entity may rely on the satisfaction of an entity from another shape.
If the other entity has not yet been validated, the validation is postponed until the needed validation result is available.
After evaluating all shapes, entities with pending decisions are marked as valid since no violations were found.
These entities are recorded in unbound.
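Because the records under one key can belong to a different shape than the key itself, it is often handy to regroup them by the shape named inside each tuple. The following sketch counts valid and invalid entities per shape; the function summarize is a hypothetical helper, the sample dictionary is a shortened excerpt of the result shown above, and the code assumes every record is a three-element tuple as described.

```python
from collections import Counter


def summarize(result):
    """Count valid and invalid entities per shape named inside each record."""
    valid, invalid = Counter(), Counter()
    for report in result.values():
        # 'unbound' has no 'invalid_instances' key, hence the defaults.
        for shape, _entity, _flag in report.get('valid_instances', set()):
            valid[shape] += 1
        for shape, _entity, _flag in report.get('invalid_instances', set()):
            invalid[shape] += 1
    return valid, invalid


# Shortened excerpt of the printed result, used as sample input.
sample = {
    '<http://example.org/UniversityShape>': {
        'valid_instances': {
            ('<http://example.org/DepartmentShape>', 'http://www.Department0.University0.edu', True),
            ('<http://example.org/UniversityShape>', 'http://www.University0.edu', True),
        },
        'invalid_instances': {
            ('<http://example.org/UniversityShape>', 'http://www.University1.edu', True),
        },
    },
    '<http://example.org/DepartmentShape>': {
        'valid_instances': set(),
        'invalid_instances': set(),
    },
    'unbound': {'valid_instances': set()},
}

valid, invalid = summarize(sample)
for shape in sorted(set(valid) | set(invalid)):
    print(shape, 'valid:', valid[shape], 'invalid:', invalid[shape])
```

With the real output, summarize(result) can be called directly on the value returned by shape_schema.validate().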
Results: Output Files
Additionally, Trav-SHACL stores the following files:
stats.txt
contains statistics about the validation, like
    number of targets
    number of valid targets
    number of invalid targets
    number of executed queries
    number of generated rules
    maximum query execution time (in ms)
    total query execution time (in ms)
    total saturation time (in ms)
    total validation time (in ms)
targets_valid.log
contains all valid targets (one per line) in the form shape_name(entity)
targets_invalid.log
contains all invalid targets (one per line) in the form shape_name(entity)
validation.log
log file containing the node order, executed queries, etc.
validationReport.ttl
a validation report in Turtle format that adheres to the SHACL specification
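The target logs can be post-processed with a few lines of Python. The sketch below groups entities by shape, assuming each line has exactly the form shape_name(entity) described above. The helper read_targets and the sample lines are made up for illustration; real shape names may be written differently (for example as full IRIs), so the pattern may need adjusting.

```python
import io
import re
from collections import defaultdict

# shape name: everything before the first '(', entity: everything up to the final ')'
LINE_PATTERN = re.compile(r'^(?P<shape>[^(]+)\((?P<entity>.*)\)$')


def read_targets(lines):
    """Group the entities of a targets log by shape name."""
    targets = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        match = LINE_PATTERN.match(line)
        if match is None:
            raise ValueError(f'unexpected line format: {line!r}')
        targets[match.group('shape')].append(match.group('entity'))
    return dict(targets)


# Made-up sample content standing in for targets_valid.log.
sample_log = io.StringIO(
    'UniversityShape(http://www.University0.edu)\n'
    'DepartmentShape(http://www.Department0.University0.edu)\n'
    'DepartmentShape(http://www.Department1.University0.edu)\n'
)
targets = read_targets(sample_log)
print({shape: len(entities) for shape, entities in targets.items()})
```

To process a real run, pass an open file instead of the sample, for example open('./result/targets_valid.log') when output_dir was set to './result/' as in the example.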