3. API Documentation
Python library for computing diefficiency metrics dief@t and dief@k.
The metrics dief@t and dief@k allow for measuring the diefficiency during an elapsed time period t or while k answers are produced, respectively. dief@t and dief@k rely on the computation of the area under the curve (AUC) of answer traces, thereby capturing the answer rate concentration over a time interval.
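For orientation, a typical session combines the loading, metric, and plotting functions documented below (the file path and test name are placeholders):

>>> traces = load_trace("data/traces.csv")
>>> dieft(traces, "Q9.sparql")    # diefficiency until the slowest approach finishes
>>> diefk(traces, "Q9.sparql")    # diefficiency while k answers are produced
>>> plot_answer_trace(traces, "Q9.sparql")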
- DEFAULT_COLORS = ('#ECC30B', '#D56062', '#84BCDA')
Default colors for plotting: yellow, red, blue
- continuous_efficiency_with_diefk(traces)
Compares dief@k at different answer completeness percentages.
This function reproduces the results reported in “Experiment 2” (see [1]). “Experiment 2” measures the continuous efficiency of approaches when producing the first 25%, 50%, 75%, and 100% of the answers.
- Parameters
traces (ndarray) – Dataframe with the answer trace. Attributes of the dataframe: test, approach, answer, time.
- Returns
Dataframe with all the metrics. The structure is: test, approach, diefk25, diefk50, diefk75, diefk100.
- Return type
ndarray
Examples
>>> continuous_efficiency_with_diefk(traces)
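The resulting dataframe has the structure expected by the radar plot functions documented below, for example:

>>> diefkDF = continuous_efficiency_with_diefk(traces)
>>> plot_all_continuous_efficiency_with_diefk(diefkDF)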
- diefk(inputtrace, inputtest, k=-1)
Computes the dief@k metric for a specific test at a given number of answers k.
dief@k measures the diefficiency while k answers are produced by computing the area under the curve of the answer traces. By default, the function computes the minimum of the total number of answers produced by the approaches.
- Parameters
inputtrace (ndarray) – Dataframe with the answer trace. Attributes of the dataframe: test, approach, answer, time.
inputtest (str) – Specifies the test to analyze from the answer trace.
k (int) – Number of answers to compute dief@k for. By default, the function computes the minimum of the total number of answers produced by the approaches.
- Returns
Dataframe with the dief@k values for each approach. Attributes of the dataframe: test, approach, diefk.
- Return type
ndarray
Examples
>>> diefk(traces, "Q9.sparql")
>>> diefk(traces, "Q9.sparql", 1000)
- diefk2(inputtrace, inputtest, kp=-1.0)
Computes the dief@k metric for a specific test at a given percentage of answers kp.
dief@k measures the diefficiency while the first kp percent of the answers are produced by computing the area under the curve of the answer traces. By default and for kp = 1.0, this function behaves the same as diefk: it computes the portion kp of the minimum number of answers produced by the approaches.
- Parameters
inputtrace (ndarray) – Dataframe with the answer trace. Attributes of the dataframe: test, approach, answer, time.
inputtest (str) – Specifies the test to analyze from the answer trace.
kp (float) – Ratio of answers to compute dief@k for (kp in [0.0;1.0]). By default and when kp=1.0, this function behaves the same as diefk. It computes the kp portion of the minimum number of answers produced by the approaches.
- Returns
Dataframe with the dief@k values for each approach. Attributes of the dataframe: test, approach, diefk.
- Return type
ndarray
Examples
>>> diefk2(traces, "Q9.sparql")
>>> diefk2(traces, "Q9.sparql", 0.25)
- dieft(inputtrace, inputtest, t=-1.0, continue_to_end=True)
Computes the dief@t metric for a specific test at a given time point t.
dief@t measures the diefficiency during an elapsed time period t by computing the area under the curve of the answer traces. By default, the function computes the maximum of the execution time among the approaches in the answer trace, i.e., until the point in time when the slowest approach finishes.
- Parameters
inputtrace (ndarray) – Dataframe with the answer trace. Attributes of the dataframe: test, approach, answer, time.
inputtest (str) – Specifies the test to analyze from the answer trace.
t (float) – Point in time to compute dief@t for. By default, the function computes the maximum of the execution time among the approaches in the answer trace.
continue_to_end (bool) – Indicates whether the AUC should be continued until the end of the time frame.
- Returns
Dataframe with the dief@t values for each approach. Attributes of the dataframe: test, approach, dieft.
- Return type
ndarray
Examples
>>> dieft(traces, "Q9.sparql")
>>> dieft(traces, "Q9.sparql", 7.5)
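To illustrate the idea behind dief@t (this is a minimal sketch, not the library's implementation; dieft_sketch and trace_times are hypothetical names), the AUC of a single approach's answer trace up to time t can be approximated with trapezoidal integration:

def dieft_sketch(trace_times, t):
    # trace_times: sorted points in time at which the approach produced answers
    times = [x for x in trace_times if x <= t]
    # answer trace points: (time of the i-th answer, i), closed at time t
    xs = times + [t]
    ys = list(range(1, len(times) + 1)) + [len(times)]
    # trapezoidal area under the answer trace over the time period
    return sum((xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2.0
               for i in range(1, len(xs)))

A higher value indicates that answers were concentrated earlier in the time period, i.e., higher continuous efficiency.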
- load_metrics(filename)
Reads the other metrics from a CSV file.
These are conventional query performance measurements. The attributes of the file specified in the header are expected to be:
test: the name of the executed test
approach: the name of the approach executed
tfft: time elapsed until the first answer was generated
totaltime: time elapsed until the last answer was generated
comp: number of answers produced
- Parameters
filename (str) – Path to the CSV file that contains the other metrics. Attributes of the file specified in the header: test, approach, tfft, totaltime, comp.
- Returns
Dataframe with the other metrics. Attributes of the dataframe: test, approach, tfft, totaltime, comp.
- Return type
ndarray
Examples
>>> load_metrics("data/metrics.csv")
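For illustration, a metrics file with the expected header could look as follows (approach names and values are made up):

test,approach,tfft,totaltime,comp
Q9.sparql,ApproachA,0.15,11.2,25
Q9.sparql,ApproachB,0.72,16.8,25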
- load_trace(filename)
Reads answer traces from a CSV file.
Answer traces record the points in time when an approach produces an answer. The attributes of the file specified in the header are expected to be:
test: the name of the executed test
approach: the name of the approach executed
answer: the number of the answer produced
time: time elapsed from the start of the execution until the generation of the answer
- Parameters
filename (str) – Path to the CSV file that contains the answer traces. Attributes of the file specified in the header: test, approach, answer, time.
- Returns
Dataframe with the answer trace. Attributes of the dataframe: test, approach, answer, time.
- Return type
ndarray
Examples
>>> load_trace("data/traces.csv")
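For illustration, an answer trace file with the expected header could look as follows (approach names and values are made up):

test,approach,answer,time
Q9.sparql,ApproachA,1,0.15
Q9.sparql,ApproachA,2,0.17
Q9.sparql,ApproachB,1,0.72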
- performance_of_approaches_with_dieft(traces, metrics, continue_to_end=True)
Compares dief@t with other conventional metrics used in query performance analysis.
This function reproduces the results reported in “Experiment 1” of [1]. “Experiment 1” compares the performance of the tested approaches when using metrics defined in the literature (total execution time, time for the first tuple, throughput, and completeness) and the metric dief@t.
- Parameters
traces (ndarray) – Dataframe with the answer trace. Attributes of the dataframe: test, approach, answer, time.
metrics (ndarray) – Metrics dataframe with the result of the other metrics. The structure is as follows: test, approach, tfft, totaltime, comp.
continue_to_end (bool) – Indicates whether the AUC should be continued until the end of the time frame
- Returns
Dataframe with all the metrics. The structure is: test, approach, tfft, totaltime, comp, throughput, invtfft, invtotaltime, dieft.
- Return type
ndarray
Examples
>>> performance_of_approaches_with_dieft(traces, metrics)
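Combined with the loading and plotting functions documented in this section, this yields the full “Experiment 1” pipeline:

>>> traces = load_trace("data/traces.csv")
>>> metrics = load_metrics("data/metrics.csv")
>>> extended_metrics = performance_of_approaches_with_dieft(traces, metrics)
>>> plot_all_performance_of_approaches_with_dieft(extended_metrics)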
- plot_all_answer_traces(inputtrace, colors=('#ECC30B', '#D56062', '#84BCDA'))
Plots the answer traces of all tests; one plot per test.
Answer traces record the points in time when an approach produces an answer. This function generates one plot per test showing the answer traces of all approaches for that specific test.
- Parameters
inputtrace (ndarray) – Dataframe with the answer trace. Attributes of the dataframe: test, approach, answer, time.
colors (list) – List of colors to use for the different approaches.
- Returns
List of plots of the answer traces of each approach; one plot per test.
- Return type
list
Examples
>>> plot_all_answer_traces(traces)
>>> plot_all_answer_traces(traces, ["#ECC30B","#D56062","#84BCDA"])
- plot_all_continuous_efficiency_with_diefk(diefkDF, colors=('#ECC30B', '#D56062', '#84BCDA'))
Generates radar plots that compare dief@k at different answer completeness percentages; one per test.
This function plots the results reported in “Experiment 2” (see [1]). “Experiment 2” measures the continuous efficiency of approaches when producing the first 25%, 50%, 75%, and 100% of the answers.
- Parameters
diefkDF (ndarray) – Dataframe with the dief@k values at different answer completeness percentages. The structure is: test, approach, diefk25, diefk50, diefk75, diefk100.
colors (list) – List of colors to use for the different approaches.
- Returns
List of matplotlib plots (one per test) over the provided metrics.
- Return type
list
Examples
>>> plot_all_continuous_efficiency_with_diefk(diefkDF)
>>> plot_all_continuous_efficiency_with_diefk(diefkDF, ["#ECC30B","#D56062","#84BCDA"])
- plot_all_performance_of_approaches_with_dieft(allmetrics, colors=('#ECC30B', '#D56062', '#84BCDA'))
Generates radar plots that compare dief@t with conventional metrics; one plot per test.
This function plots the results reported in “Experiment 1” (see [1]). “Experiment 1” compares the performance of the tested approaches when using metrics defined in the literature (total execution time, time for the first tuple, throughput, and completeness) and the metric dief@t.
- Parameters
allmetrics (ndarray) – Dataframe with the metrics and dief@t values. The structure is: test, approach, tfft, totaltime, comp, throughput, invtfft, invtotaltime, dieft.
colors (list) – List of colors to use for the different approaches.
- Returns
List of matplotlib radar plots (one per test) over the provided metrics.
- Return type
list
Examples
>>> plot_all_performance_of_approaches_with_dieft(extended_metrics)
>>> plot_all_performance_of_approaches_with_dieft(extended_metrics, ["#ECC30B","#D56062","#84BCDA"])
- plot_answer_trace(inputtrace, inputtest, colors=('#ECC30B', '#D56062', '#84BCDA'))
Plots the answer trace of a given test for all approaches.
Answer traces record the points in time when an approach produces an answer. The plot generated by this function shows the answer traces of all approaches for the same test, e.g., execution of a specific query.
- Parameters
inputtrace (ndarray) – Dataframe with the answer trace. Attributes of the dataframe: test, approach, answer, time.
inputtest (str) – Specifies the test to analyze from the answer trace.
colors (list) – List of colors to use for the different approaches.
- Returns
Plot of the answer traces of each approach when evaluating the input test.
- Return type
matplotlib.pyplot
Examples
>>> plot_answer_trace(traces, "Q9.sparql")
>>> plot_answer_trace(traces, "Q9.sparql", ["#ECC30B","#D56062","#84BCDA"])
- plot_continuous_efficiency_with_diefk(diefkDF, q, colors=('#ECC30B', '#D56062', '#84BCDA'))
Generates a radar plot that compares dief@k at different answer completeness percentages for a specific test.
This function plots the results reported for a single given test in “Experiment 2” (see [1]). “Experiment 2” measures the continuous efficiency of approaches when producing the first 25%, 50%, 75%, and 100% of the answers.
- Parameters
diefkDF (ndarray) – Dataframe with the dief@k values at different answer completeness percentages. The structure is: test, approach, diefk25, diefk50, diefk75, diefk100.
q (str) – Specifies the test to plot from the dataframe.
colors (list) – List of colors to use for the different approaches.
- Returns
Matplotlib plot for the specified test over the provided metrics.
- Return type
matplotlib.pyplot
Examples
>>> plot_continuous_efficiency_with_diefk(diefkDF, "Q9.sparql")
>>> plot_continuous_efficiency_with_diefk(diefkDF, "Q9.sparql", ["#ECC30B","#D56062","#84BCDA"])
- plot_execution_time(metrics, colors=('#ECC30B', '#D56062', '#84BCDA'), log_scale=False)
Creates a bar chart with the overall execution time for all the tests and approaches in the metrics data.
The bar chart presents the conventional performance measure execution time. Each test is represented as a group of bars, one bar per approach.
- Parameters
metrics (ndarray) – Dataframe with the metrics. Attributes of the dataframe: test, approach, tfft, totaltime, comp.
colors (list) – List of colors to use for the different approaches.
log_scale (bool) – Indicates whether the execution time should be plotted on a logarithmic scale.
- Returns
Plot of the execution time for all tests and approaches in the metrics data provided.
- Return type
matplotlib.pyplot
Examples
>>> plot_execution_time(metrics)
>>> plot_execution_time(metrics, ["#ECC30B","#D56062","#84BCDA"])
>>> plot_execution_time(metrics, log_scale=True)
>>> plot_execution_time(metrics, ["#ECC30B","#D56062","#84BCDA"], log_scale=True)
- plot_performance_of_approaches_with_dieft(allmetrics, q, colors=('#ECC30B', '#D56062', '#84BCDA'))
Generates a radar plot that compares dief@t with conventional metrics for a specific test.
This function plots the results reported for a single given test in “Experiment 1” (see [1]). “Experiment 1” compares the performance of the tested approaches when using metrics defined in the literature (total execution time, time for the first tuple, throughput, and completeness) and the metric dief@t.
- Parameters
allmetrics (ndarray) – Dataframe with the metrics and dief@t values. The structure is: test, approach, tfft, totaltime, comp, throughput, invtfft, invtotaltime, dieft.
q (str) – Specifies the test to plot from the dataframe.
colors (list) – List of colors to use for the different approaches.
- Returns
Matplotlib radar plot for the specified test over the provided metrics.
- Return type
matplotlib.pyplot
Examples
>>> plot_performance_of_approaches_with_dieft(extended_metrics, "Q9.sparql")
>>> plot_performance_of_approaches_with_dieft(extended_metrics, "Q9.sparql", ["#ECC30B","#D56062","#84BCDA"])
- sorted_alphanumeric(list_)
Sorts a list alphanumerically.
The given list is sorted alphanumerically. This is done using two lambda expressions for the sorting key.
- Parameters
list_ (list) – The list to be sorted.
- Returns
The alphanumerically sorted list.
Examples
>>> sorted_alphanumeric(['Q1', 'Q3', 'Q10', 'Q2'])
['Q1', 'Q2', 'Q3', 'Q10']
>>> sorted_alphanumeric(['A', '1', '10', '2', '100', 'B', 'a', 'Hello', 'Q1', 'Q2'])
['1', '2', '10', '100', 'A', 'a', 'B', 'Hello', 'Q1', 'Q2']
>>> sorted_alphanumeric(['1.0.0', '0.9.0', '1.2.1', '1.2.0'])
['0.9.0', '1.0.0', '1.2.0', '1.2.1']
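The sorting key described above can be sketched with the classic two-lambda recipe (a minimal illustration, not necessarily the library's exact code):

import re

# digit runs compare numerically, everything else case-insensitively
convert = lambda text: int(text) if text.isdigit() else text.lower()
# split each element into alternating non-digit and digit chunks
alphanum_key = lambda key: [convert(c) for c in re.split('([0-9]+)', key)]

sorted(['Q1', 'Q3', 'Q10', 'Q2'], key=alphanum_key)  # ['Q1', 'Q2', 'Q3', 'Q10']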