3. API Documentation

Python library for computing diefficiency metrics dief@t and dief@k.

The metrics dief@t and dief@k allow for measuring the diefficiency during an elapsed time period t or while k answers are produced, respectively. Both metrics rely on the computation of the area under the curve (AUC) of answer traces and thus capture the answer rate concentration over a time interval.
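
The AUC idea can be illustrated with a minimal NumPy sketch (not the library's internal code) over a made-up answer trace:

```python
import numpy as np

# Hypothetical answer trace for one approach:
# answers[i] cumulative answers had been produced at time times[i] (seconds)
times = np.array([0.0, 1.0, 2.0, 4.0, 5.0])
answers = np.array([0, 1, 2, 3, 4])

# Trapezoidal approximation of the area under the answer trace;
# a larger area means answers were concentrated earlier in time.
auc = np.sum((answers[:-1] + answers[1:]) / 2 * np.diff(times))
print(auc)  # 10.5
```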

DEFAULT_COLORS = ('#ECC30B', '#D56062', '#84BCDA')

Default colors used for plotting: yellow, red, and blue.

continuous_efficiency_with_diefk(traces)

Compares dief@k at different answer completeness percentages.

This function reproduces the results reported in “Experiment 2” (see [1]). “Experiment 2” measures the continuous efficiency of approaches when producing the first 25%, 50%, 75%, and 100% of the answers.

Parameters

traces (ndarray) – Dataframe with the answer trace. Attributes of the dataframe: test, approach, answer, time.

Returns

Dataframe with all the metrics. The structure is: test, approach, diefk25, diefk50, diefk75, diefk100.

Return type

ndarray

Examples

>>> continuous_efficiency_with_diefk(traces)
diefk(inputtrace, inputtest, k=-1)

Computes the dief@k metric for a specific test at a given number of answers k.

dief@k measures the diefficiency while k answers are produced by computing the area under the curve of the answer traces. By default, the function computes the minimum of the total number of answers produced by the approaches.

Parameters
  • inputtrace (ndarray) – Dataframe with the answer trace. Attributes of the dataframe: test, approach, answer, time.

  • inputtest (str) – The test from the answer trace to analyze.

  • k (int) – Number of answers to compute dief@k for. By default, the function computes the minimum of the total number of answers produced by the approaches.

Returns

Dataframe with the dief@k values for each approach. Attributes of the dataframe: test, approach, diefk.

Return type

ndarray

Examples

>>> diefk(traces, "Q9.sparql")
>>> diefk(traces, "Q9.sparql", 1000)
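
As a hedged sketch of what dief@k captures (illustrative only, not the library's implementation), the trapezoidal area under a single approach's trace can be cut off at the k-th answer:

```python
import numpy as np

# Hypothetical trace for one approach: time at which each answer arrived
times = np.array([0.5, 1.0, 1.5, 3.0, 4.0])
answers = np.arange(1, len(times) + 1)

k = 3  # consider only the first k answers
t_k, a_k = times[:k], answers[:k]

# Trapezoidal area under the trace until the k-th answer is produced
diefk_value = np.sum((a_k[:-1] + a_k[1:]) / 2 * np.diff(t_k))
print(diefk_value)  # 2.0
```
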
diefk2(inputtrace, inputtest, kp=-1.0)

Computes the dief@k metric for a specific test at a given percentage of answers kp.

dief@k measures the diefficiency while the first kp percent of answers are produced by computing the area under the curve of the answer traces. By default and when kp = 1.0, this function behaves the same as diefk: it computes the portion kp of the minimum number of answers produced by the approaches.

Parameters
  • inputtrace (ndarray) – Dataframe with the answer trace. Attributes of the dataframe: test, approach, answer, time.

  • inputtest (str) – The test from the answer trace to analyze.

  • kp (float) – Ratio of answers to compute dief@k for (kp in [0.0;1.0]). By default and when kp=1.0, this function behaves the same as diefk. It computes the kp portion of the minimum number of answers produced by the approaches.

Returns

Dataframe with the dief@k values for each approach. Attributes of the dataframe: test, approach, diefk.

Return type

ndarray

Examples

>>> diefk2(traces, "Q9.sparql")
>>> diefk2(traces, "Q9.sparql", 0.25)
dieft(inputtrace, inputtest, t=-1.0, continue_to_end=True)

Computes the dief@t metric for a specific test at a given time point t.

dief@t measures the diefficiency during an elapsed time period t by computing the area under the curve of the answer traces. By default, the function computes the maximum of the execution time among the approaches in the answer trace, i.e., until the point in time when the slowest approach finishes.

Parameters
  • inputtrace (ndarray) – Dataframe with the answer trace. Attributes of the dataframe: test, approach, answer, time.

  • inputtest (str) – The test from the answer trace to analyze.

  • t (float) – Point in time to compute dief@t for. By default, the function computes the maximum of the execution time among the approaches in the answer trace.

  • continue_to_end (bool) – Indicates whether the AUC should be continued until the end of the time frame.

Returns

Dataframe with the dief@t values for each approach. Attributes of the dataframe: test, approach, dieft.

Return type

ndarray

Examples

>>> dieft(traces, "Q9.sparql")
>>> dieft(traces, "Q9.sparql", 7.5)
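
A hedged sketch of the dief@t idea for one approach (illustrative only): the trace is cut off at time t, extending the last observed answer count up to t, similar in spirit to continue_to_end:

```python
import numpy as np

# Hypothetical trace: times at which answers 1..5 arrived
times = np.array([1.0, 2.0, 3.0, 6.0, 8.0])
answers = np.arange(1, len(times) + 1)

t = 5.0                                  # time point for dief@t
mask = times <= t
t_cut = np.append(times[mask], t)        # extend the curve up to t
a_cut = np.append(answers[mask], answers[mask][-1])

# Trapezoidal area under the trace during the period [0, t]
dieft_value = np.sum((a_cut[:-1] + a_cut[1:]) / 2 * np.diff(t_cut))
print(dieft_value)  # 10.0
```
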
load_metrics(filename)

Reads the other metrics from a CSV file.

Conventional query performance measurements. The attributes of the file specified in the header are expected to be:

  • test: the name of the executed test

  • approach: the name of the approach executed

  • tfft: time elapsed until the first answer was generated

  • totaltime: time elapsed until the last answer was generated

  • comp: number of answers produced

Parameters

filename (str) – Path to the CSV file that contains the other metrics. Attributes of the file specified in the header: test, approach, tfft, totaltime, comp.

Returns

Dataframe with the other metrics. Attributes of the dataframe: test, approach, tfft, totaltime, comp.

Return type

ndarray

Examples

>>> load_metrics("data/metrics.csv")
load_trace(filename)

Reads answer traces from a CSV file.

Answer traces record the points in time when an approach produces an answer. The attributes of the file specified in the header are expected to be:

  • test: the name of the executed test

  • approach: the name of the approach executed

  • answer: the number of the answer produced

  • time: time elapsed from the start of the execution until the generation of the answer

Parameters

filename (str) – Path to the CSV file that contains the answer traces. Attributes of the file specified in the header: test, approach, answer, time.

Returns

Dataframe with the answer trace. Attributes of the dataframe: test, approach, answer, time.

Return type

ndarray

Examples

>>> load_trace("data/traces.csv")
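
The expected CSV layout can be sketched with the standard library alone; the file contents below are made up purely to illustrate the header and row format:

```python
import csv
import io

# A small, made-up answer-trace CSV in the expected format
# (header: test, approach, answer, time)
data = """test,approach,answer,time
Q9.sparql,Selective,1,0.12
Q9.sparql,Selective,2,0.25
Q9.sparql,Random,1,0.30
"""

rows = list(csv.DictReader(io.StringIO(data)))
print(len(rows), rows[0]["approach"])  # 3 Selective
```
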
performance_of_approaches_with_dieft(traces, metrics, continue_to_end=True)

Compares dief@t with other conventional metrics used in query performance analysis.

This function reproduces the results reported in “Experiment 1” of [1]. “Experiment 1” compares the performance of the tested approaches when using metrics defined in the literature (total execution time, time for the first tuple, throughput, and completeness) and the metric dief@t.

Parameters
  • traces (ndarray) – Dataframe with the answer trace. Attributes of the dataframe: test, approach, answer, time.

  • metrics (ndarray) – Metrics dataframe with the result of the other metrics. The structure is as follows: test, approach, tfft, totaltime, comp.

  • continue_to_end (bool) – Indicates whether the AUC should be continued until the end of the time frame.

Returns

Dataframe with all the metrics. The structure is: test, approach, tfft, totaltime, comp, throughput, invtfft, invtotaltime, dieft.

Return type

ndarray

Examples

>>> performance_of_approaches_with_dieft(traces, metrics)
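
A hedged sketch of how the derived columns could relate to the input metrics; the definitions below (throughput = comp / totaltime, invtfft = 1 / tfft, invtotaltime = 1 / totaltime) are assumptions chosen so that "higher is better" holds for every measure, and the data is made up:

```python
# Made-up conventional metrics (attribute names as in load_metrics)
rows = [
    {"test": "Q9.sparql", "approach": "Selective",
     "tfft": 0.5, "totaltime": 10.0, "comp": 100},
    {"test": "Q9.sparql", "approach": "Random",
     "tfft": 2.0, "totaltime": 8.0, "comp": 80},
]

# Assumed definitions of the derived columns (see lead-in above)
for r in rows:
    r["throughput"] = r["comp"] / r["totaltime"]
    r["invtfft"] = 1.0 / r["tfft"]
    r["invtotaltime"] = 1.0 / r["totaltime"]

print(rows[0]["throughput"], rows[1]["invtfft"])  # 10.0 0.5
```
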
plot_all_answer_traces(inputtrace, colors=('#ECC30B', '#D56062', '#84BCDA'))

Plots the answer traces of all tests; one plot per test.

Answer traces record the points in time when an approach produces an answer. This function generates one plot per test showing the answer traces of all approaches for that specific test.

Parameters
  • inputtrace (ndarray) – Dataframe with the answer trace. Attributes of the dataframe: test, approach, answer, time.

  • colors (list) – List of colors to use for the different approaches.

Returns

Plot of the answer traces of each approach when evaluating the input test.

Return type

list

Examples

>>> plot_all_answer_traces(traces)
>>> plot_all_answer_traces(traces, ["#ECC30B","#D56062","#84BCDA"])
plot_all_continuous_efficiency_with_diefk(diefkDF, colors=('#ECC30B', '#D56062', '#84BCDA'))

Generates radar plots that compare dief@k at different answer completeness percentages; one per test.

This function plots the results reported in “Experiment 2” (see [1]). “Experiment 2” measures the continuous efficiency of approaches when producing the first 25%, 50%, 75%, and 100% of the answers.

Parameters
  • diefkDF (ndarray) – Dataframe with the results from “Experiment 2”.

  • colors (list) – List of colors to use for the different approaches.

Returns

List of matplotlib plots (one per test) over the provided metrics.

Return type

list

Examples

>>> plot_all_continuous_efficiency_with_diefk(diefkDF)
>>> plot_all_continuous_efficiency_with_diefk(diefkDF, ["#ECC30B","#D56062","#84BCDA"])
plot_all_performance_of_approaches_with_dieft(allmetrics, colors=('#ECC30B', '#D56062', '#84BCDA'))

Generates radar plots that compare dief@t with conventional metrics; one plot per test.

This function plots the results reported in “Experiment 1” (see [1]). “Experiment 1” compares the performance of the tested approaches when using metrics defined in the literature (total execution time, time for the first tuple, throughput, and completeness) and the metric dief@t.

Parameters
  • allmetrics (ndarray) – Dataframe with all the metrics from “Experiment 1”.

  • colors (list) – List of colors to use for the different approaches.

Returns

List of matplotlib radar plots (one per test) over the provided metrics.

Return type

list

Examples

>>> plot_all_performance_of_approaches_with_dieft(extended_metrics)
>>> plot_all_performance_of_approaches_with_dieft(extended_metrics, ["#ECC30B","#D56062","#84BCDA"])
plot_answer_trace(inputtrace, inputtest, colors=('#ECC30B', '#D56062', '#84BCDA'))

Plots the answer trace of a given test for all approaches.

Answer traces record the points in time when an approach produces an answer. The plot generated by this function shows the answer traces of all approaches for the same test, e.g., execution of a specific query.

Parameters
  • inputtrace (ndarray) – Dataframe with the answer trace. Attributes of the dataframe: test, approach, answer, time.

  • inputtest (str) – The test from the answer trace to analyze.

  • colors (list) – List of colors to use for the different approaches.

Returns

Plot of the answer traces of each approach when evaluating the input test.

Return type

Figure

Examples

>>> plot_answer_trace(traces, "Q9.sparql")
>>> plot_answer_trace(traces, "Q9.sparql", ["#ECC30B","#D56062","#84BCDA"])
plot_continuous_efficiency_with_diefk(diefkDF, q, colors=('#ECC30B', '#D56062', '#84BCDA'))

Generates a radar plot that compares dief@k at different answer completeness percentages for a specific test.

This function plots the results reported for a single given test in “Experiment 2” (see [1]). “Experiment 2” measures the continuous efficiency of approaches when producing the first 25%, 50%, 75%, and 100% of the answers.

Parameters
  • diefkDF (ndarray) – Dataframe with the results from “Experiment 2”.

  • q (str) – ID of the selected test to plot.

  • colors (list) – List of colors to use for the different approaches.

Returns

Matplotlib plot for the specified test over the provided metrics.

Return type

Figure

Examples

>>> plot_continuous_efficiency_with_diefk(diefkDF, "Q9.sparql")
>>> plot_continuous_efficiency_with_diefk(diefkDF, "Q9.sparql", ["#ECC30B","#D56062","#84BCDA"])
plot_execution_time(metrics, colors=('#ECC30B', '#D56062', '#84BCDA'), log_scale=False)

Creates a bar chart with the overall execution time for all the tests and approaches in the metrics data.

Bar chart presenting the conventional performance measure execution time. Each test is shown as a group of bars, one bar per approach.

Parameters
  • metrics (ndarray) – Dataframe with the metrics. Attributes of the dataframe: test, approach, tfft, totaltime, comp.

  • colors (list) – List of colors to use for the different approaches.

  • log_scale (bool) – (optional) If set to True, a logarithmic scale is used for the y-axis.

Returns

Plot of the execution time for all tests and approaches in the metrics data provided.

Return type

Figure

Examples

>>> plot_execution_time(metrics)
>>> plot_execution_time(metrics, ["#ECC30B","#D56062","#84BCDA"])
>>> plot_execution_time(metrics, log_scale=True)
>>> plot_execution_time(metrics, ["#ECC30B","#D56062","#84BCDA"], log_scale=True)
plot_performance_of_approaches_with_dieft(allmetrics, q, colors=('#ECC30B', '#D56062', '#84BCDA'))

Generates a radar plot that compares dief@t with conventional metrics for a specific test.

This function plots the results reported for a single given test in “Experiment 1” (see [1]). “Experiment 1” compares the performance of the tested approaches when using metrics defined in the literature (total execution time, time for the first tuple, throughput, and completeness) and the metric dief@t.

Parameters
  • allmetrics (ndarray) – Dataframe with all the metrics from “Experiment 1”.

  • q (str) – ID of the selected test to plot.

  • colors (list) – List of colors to use for the different approaches.

Returns

Matplotlib radar plot for the specified test over the provided metrics.

Return type

Figure

Examples

>>> plot_performance_of_approaches_with_dieft(extended_metrics, "Q9.sparql")
>>> plot_performance_of_approaches_with_dieft(extended_metrics, "Q9.sparql", ["#ECC30B","#D56062","#84BCDA"])
sorted_alphanumeric(list_)

Sorts a list alphanumerically.

The given list is sorted alphanumerically. This is done using two lambda expressions for the sorting key.

Parameters

list_ (list) – The list to be sorted.

Returns

The alphanumerically sorted list.

Examples

>>> sorted_alphanumeric(['Q1', 'Q3', 'Q10', 'Q2'])
['Q1', 'Q2', 'Q3', 'Q10']
>>> sorted_alphanumeric(['A', '1', '10', '2', '100', 'B', 'a', 'Hello', 'Q1', 'Q2'])
['1', '2', '10', '100', 'A', 'a', 'B', 'Hello', 'Q1', 'Q2']
>>> sorted_alphanumeric(['1.0.0', '0.9.0', '1.2.1', '1.2.0'])
['0.9.0', '1.0.0', '1.2.0', '1.2.1']
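
One way such an alphanumeric key can be built is sketched below; the helper name `alnum_sorted` is hypothetical and this is not the library's exact code, which the docstring above describes as using two lambda expressions:

```python
import re

def alnum_sorted(items):
    """Illustrative alphanumeric sort: digit runs compare as
    integers, text runs compare case-insensitively."""
    def key(s):
        # re.split with a capturing group keeps the digit runs
        return [int(part) if part.isdigit() else part.lower()
                for part in re.split(r"(\d+)", s)]
    return sorted(items, key=key)

print(alnum_sorted(['Q1', 'Q3', 'Q10', 'Q2']))  # ['Q1', 'Q2', 'Q3', 'Q10']
```
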