Iterative Optimization Heuristics Profiler
The performance analyzer for Iterative Optimization Heuristics (IOHs). It provides:

- a web-based interface to analyze and visualize the empirical performance of IOHs
- interactive plots
- statistical evaluation
- report generation
- R programming interfaces in the backend
Development Team
- Hao Wang, Leiden Institute of Advanced Computer Science.
- Diederick Vermetten, Leiden Institute of Advanced Computer Science.
- Furong Ye, Leiden Institute of Advanced Computer Science.
- Carola Doerr, CNRS and Sorbonne University.
- Ofer Shir, Migal - The Galilee Research Institute, Tel-Hai College.
- Thomas Bäck, Leiden Institute of Advanced Computer Science.
Installation
Software dependencies

- [mandatory] R: as IOHanalyzer is written as an R package, the R environment has to be installed first. The binaries and installation manual for R can be found at https://cran.r-project.org/.
- [optional] orca: required to download plotly figures. Please see https://github.com/plotly/orca for the installation instructions.
- [optional] inkscape: required to support the PDF and EPS figure formats when downloading. Please visit the Inkscape Wiki page http://wiki.inkscape.org/wiki/index.php/Installing_Inkscape for details.
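With R in place, the released package can be installed from CRAN and the web interface launched locally. A minimal sketch, assuming the `R` executable is on the PATH and using the package's `runServer()` entry point:

```shell
# Install the CRAN release of IOHanalyzer, then launch the web interface.
R -e 'install.packages("IOHanalyzer", repos = "https://cran.r-project.org")'
R -e 'IOHanalyzer::runServer()'
```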
Host it online yourself?
We provide a Dockerfile for deploying IOHanalyzer on a server. Please see https://github.com/IOHprofiler/IOHanalyzer-docker for details.
Reference
When using IOHprofiler and parts thereof, please kindly cite this work as
Carola Doerr, Hao Wang, Furong Ye, Sander van Rijn, Thomas Bäck: IOHprofiler: A Benchmarking and Profiling Tool for Iterative Optimization Heuristics, arXiv e-prints:1810.05281, 2018.
@ARTICLE{IOHprofiler,
author = {Carola Doerr and Hao Wang and Furong Ye and Sander van Rijn and Thomas B{\"a}ck},
title = {{IOHprofiler: A Benchmarking and Profiling Tool for Iterative Optimization Heuristics}},
journal = {arXiv e-prints:1810.05281},
archivePrefix = "arXiv",
eprint = {1810.05281},
year = 2018,
month = oct,
keywords = {Computer Science - Neural and Evolutionary Computing},
url = {https://arxiv.org/abs/1810.05281}
}
Acknowledgements
This work was supported by the Chinese scholarship council (CSC No. 201706310143), a public grant as part of the Investissement d'avenir project, reference ANR-11-LABX-0056-LMH, LabEx LMH, in a joint call with the Gaspard Monge Program for optimization, operations research, and their interactions with data sciences, by the Paris Ile-de-France Region, and by COST Action CA15140 "Improving Applicability of Nature-Inspired Optimisation by Joining Theory and Practice (ImAppNIO)".
License
This application is governed by the BSD 3-Clause license.
BSD 3-Clause License
Copyright © 2018, All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Welcome to IOHanalyzer!
This is the analyzer for the empirical performance of Iterative Optimization Heuristics (IOHs). It provides two perspectives on the empirical performance:

- Fixed-Target running time: focuses on the distribution of running times (a.k.a. the number of function evaluations) at various given target values (hence "fixed-target"), functions, and dimensions.
- Fixed-Budget results: focuses on the distribution of the best function values at various given budget values (hence "fixed-budget"), functions, and dimensions.
To get started, you can either:

- upload your own dataset using the Upload Data box on the bottom left (the supported format is specified in the data format tab), or
- choose one of the preloaded datasets in the Load Data from Repository box on the bottom right and directly explore the analyzer.
In detail, the following functionalities are provided:

|  | Data Summary | Expected Values | Probability Density/Mass Function | Empirical Cumulative Distribution (ECDF) |
|---|---|---|---|---|
| Fixed-Target running time | Summary of running times | Expected Running Time (ERT) | Probability Mass Function of running times | ECDF of running times |
| Fixed-Budget results | Summary of function values | Expected Target Value | Probability Density Function of target values | ECDF of target values |
For more information on the features of IOHanalyzer and how to use them, as well as a full specification of the accepted data formats, please visit our wiki.
Upload Data
Load from repositories
Data Processing Prompt
List of Processed Data
Overview of All Available Functions
Overview of Selected Function
Data Overview
This table provides an overview of the function values for all algorithms chosen on the left:

- worst recorded: the worst \(f(x)\) value ever recorded across all iterations,
- worst reached: the worst \(f(x)\) value reached in the last iterations,
- best reached: the best \(f(x)\) value reached in the last iterations,
- mean reached: the mean \(f(x)\) value reached in the last iterations,
- median reached: the median \(f(x)\) value reached in the last iterations,
- succ: the number of runs that successfully hit the best reached \(f(x)\).
Runtime Statistics at Chosen Target Values
This table summarizes, for each algorithm and each target value chosen on the left:

- runs: the number of runs that have found at least one solution of the required target quality \(f(x)\),
- mean: the average number of function evaluations needed to find a solution of function value at least \(f(x)\),
- median, \(2\%, 5\%, \ldots, 98\%\): the quantiles of these first-hitting times.

When not all runs managed to find the target value, the statistics hold only for those runs that did. That is, the mean value is the mean of the successful runs, and likewise for the quantiles. An alternative version with simulated restarts is currently in preparation.
Original Runtime Samples
This table shows, for each selected algorithm \(A\), each selected target value \(f(x)\), and each run \(r\), the number \(T(A,f(x),r)\) of evaluations performed by the algorithm until it evaluated, for the first time, a solution of quality at least \(f(x)\).
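As an illustration of this quantity, the sketch below computes the first-hitting time of a single run. This is Python rather than the package's R code, and the trace format of (evaluations, best-so-far \(f(x)\)) pairs is a simplifying assumption (maximization is assumed):

```python
import math

def first_hitting_time(trace, target):
    """Evaluation count at which the run first reached quality >= target,
    or infinity if the target was never hit."""
    for evals, fx in trace:
        if fx >= target:
            return evals
    return math.inf

# One run: best-so-far value after 1, 5, 20, and 100 evaluations.
run = [(1, 0.2), (5, 0.5), (20, 0.8), (100, 0.95)]
print(first_hitting_time(run, 0.8))   # 20
print(first_hitting_time(run, 0.99))  # inf
```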
Expected Runtime (ERT): single function
The mean, median, standard deviation, and ERT of the runtime samples are depicted against the best objective values. The displayed elements (mean, median, standard deviation, and ERT) can be switched on and off by clicking on the legend on the right. A tooltip and toolbar appear when hovering over the figure.
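The ERT itself is commonly estimated as the total number of evaluations spent across all runs divided by the number of successful runs. A sketch of that estimator in Python (the app computes it in R; names here are illustrative):

```python
import math

def ert(hitting_times, budget):
    """Expected running time: unsuccessful runs (hitting time = inf)
    contribute their full budget; divide by the number of successes."""
    successes = sum(1 for t in hitting_times if t != math.inf)
    if successes == 0:
        return math.inf
    spent = sum(min(t, budget) for t in hitting_times)
    return spent / successes

# Two runs hit the target after 10 and 30 evaluations; one failed (budget 100).
print(ert([10, 30, math.inf], budget=100))  # (10 + 30 + 100) / 2 = 70.0
```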
Expected Runtime Comparisons (across dimensions)
The ERT of the runtime samples across all functions. The ERT is determined by the target values in the table below; the default is the best \(f(x)\) reached by any of the selected algorithms. Infinite ERTs are shown as separate dots on the graph.
The chosen target values per dimension are as follows (double-click an entry to edit it):
The raw ERT values are:
Expected Runtime (ERT): all functions
The ERT is shown against the target values for all functions in the selected dimension.
Expected Runtime Comparisons (across functions on one dimension)
The ERT of the runtime samples across all functions. The ERT is determined by the target values in the table below; the default is the best \(f(x)\) reached by any of the selected algorithms. When using a line plot, infinite ERTs are shown as non-connected dots on the graph.
The chosen target values per function are as follows (double-click an entry to edit it):
The raw ERT values are:
Histogram of Fixed-Target Runtimes
This histogram counts how many runs needed between \(t\) and \(t+1\) function evaluations. The bins \([t,t+1)\) are chosen automatically; the bin size is determined by the so-called Freedman–Diaconis rule: \(\text{Bin size}= 2\frac{Q_3 - Q_1}{\sqrt[3]{n}}\), where \(Q_1, Q_3\) are the \(25\%\) and \(75\%\) percentiles of the runtimes and \(n\) is the sample size. The displayed algorithms can be selected by clicking on the legend on the right. A tooltip and toolbar appear when hovering over the figure.
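The bin-size rule can be sketched as follows. Note that Python's `statistics.quantiles` uses an exclusive quantile convention, which may differ slightly from the convention used inside IOHanalyzer:

```python
import statistics

def fd_bin_size(samples):
    """Freedman-Diaconis rule: bin size = 2 * IQR / n^(1/3)."""
    n = len(samples)
    q1, _, q3 = statistics.quantiles(samples, n=4)  # 25%, 50%, 75% cut points
    return 2 * (q3 - q1) / n ** (1 / 3)

print(fd_bin_size(list(range(1, 9))))  # IQR = 4.5, n = 8, so 2*4.5/2 = 4.5
```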
Empirical Probability Mass Function of the Runtime
Warning! The probability mass function of the runtime is approximated by treating the runtime as a continuous random variable and applying kernel density estimation (KDE):
The plot shows the distribution of the first-hitting times of the individual runs (dots), and an estimated distribution of the probability mass function. The displayed algorithms can be selected by clicking on the legend on the right. A tooltip and toolbar appear when hovering over the figure. This also includes the option to download the plot as a PNG file. A CSV file with the runtime data can be downloaded from the Data Summary tab.
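A minimal sketch of the smoothing idea, using a Gaussian kernel with a fixed bandwidth; the app relies on R's density estimation, whose automatic bandwidth selection this fragment does not reproduce:

```python
import math

def kde(samples, x, bandwidth=1.0):
    """Kernel density estimate at point x with a Gaussian kernel."""
    gauss = lambda u: math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)
    return sum(gauss((x - s) / bandwidth) for s in samples) / (len(samples) * bandwidth)

# Density estimate from two runtime samples, evaluated midway between them.
print(kde([10.0, 12.0], 11.0))
```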
Cumulative Difference Plot of Fixed-Target Runtimes
The cumulative difference plot can be useful when box/violin plots fail to provide a clear result. The plot can be used to compare exactly two algorithms (two IDs).
The cumulative difference plot compares the samples from two algorithms through first-order stochastic dominance. On the left side of the x-axis, the shortest runtimes of the algorithms are compared, while on the right side, the longest runtimes are compared.
If the curve does not cross the x-axis, the algorithm on the same side as the curve stochastically dominates the other one. If the curve is both positive and negative, then one of the algorithms might be better at achieving short runtimes (if, in the left part of the plot, the curve is on the side of that algorithm) or better at avoiding long runtimes (if, in the right part of the plot, the curve is on the side of that algorithm). For more information, refer to the paper or the GitHub repo.
Empirical Cumulative Distribution: Single target
Each ECDF curve shows the proportion of the runs that have found a solution of at least the required target value within the budget given by the \(x\)-axis. The displayed curves can be selected by clicking on the legend on the right. A tooltip and toolbar appear when hovering over the figure. This also includes the option to download the plot as a PNG file.
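The value of such a curve at a given budget can be sketched directly from the first-hitting times. An illustrative Python fragment, not the app's R implementation:

```python
import math

def ecdf_at(hitting_times, budget):
    """Proportion of runs whose first-hitting time is within the budget."""
    return sum(1 for t in hitting_times if t <= budget) / len(hitting_times)

times = [10, 30, 250, math.inf]  # the last run never hit the target
print(ecdf_at(times, 100))  # 2 of 4 runs hit within 100 evaluations -> 0.5
```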
The approximated Area Under the Curve of this displayed single-target ECDF is:
Aggregated Empirical Cumulative Distribution: Single function
The evenly spaced target values are:
The fraction of (run, target value) pairs \((i,v)\) satisfying that the best solution that the algorithm has found in the \(i\)-th run within the given time budget \(t\) has quality at least \(v\) is plotted against the available budget \(t\). The displayed elements can be switched on and off by clicking on the legend on the right. A tooltip and toolbar appear when hovering over the figure.
The approximated Area Under the Curve of this displayed single-function ECDF is:
Aggregated Empirical Cumulative Distribution: All functions
The fraction of (run, target value) pairs \((i,v)\), taken over functions and dimensions, satisfying that the best solution that the algorithm has found in the \(i\)-th run (of function \(f\) in dimension \(d\)) within the given time budget \(t\) has quality at least \(v\) is plotted against the available budget \(t\). The displayed elements can be switched on and off by clicking on the legend on the right. A tooltip and toolbar appear when hovering over the figure. Aggregation over functions and dimensions can be switched on or off using the checkboxes on the left; when aggregation is off, the selected function/dimension is chosen according to the value in the bottom-left selection box.
The approximated Area Under the Curve of this displayed ECDF is:
The selected targets are:
Parameter Statistics at Chosen Target Values
This table summarizes, for each algorithm and each target value chosen on the left:

- runs: the number of runs in which non-missing parameter values are observed for each required target value \(f(x)\),
- mean: the average value of the specified parameter when the target value \(f(x)\) is hit,
- median, \(2\%, 5\%, \ldots, 98\%\): the quantiles of these parameter values.

When not all runs managed to find the target value, the statistics hold only for those runs that did. That is, the mean value is the mean of the successful runs, and likewise for the quantiles. An alternative version with simulated restarts is currently in preparation.
Expected Parameter Value (per function)
The mean or median of the algorithm's internal parameters at a fixed budget of evaluations is depicted against the budget. The displayed elements can be switched on and off by clicking on the legend on the right. A tooltip and toolbar appear when hovering over the figure.
Parameter Sample at Chosen Target Values
This table shows for each selected algorithm \(A\), each selected target value \(f(x)\), and each run \(r\) the parameter value observed when the target value \(f(x)\) is reached for the first time.
Hypothesis Testing
The Kolmogorov-Smirnov test is performed on the empirical CDFs of running times for each pair of algorithms, in order to determine which algorithm gives a significantly smaller running time distribution. The resulting \(p\)-values are arranged in a matrix, where each cell \((i, j)\) contains the \(p\)-value from the test with the alternative hypothesis that the running time of algorithm \(i\) is smaller (thus better) than that of \(j\).
Decisions of the test, based on the \(p\)-value matrix and the \(\alpha\) value, are visualized in a heatmap (left) and a network (right). In each cell (row, column) of the heatmap, the alternative hypothesis is again that the row algorithm has a smaller (better) running time than the column. The color indicates:

- Red: the row algorithm is better than the column,
- Blue: the row algorithm is worse than the column,
- Gray: no significant distinction between the row and the column.
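The statistic underlying the test can be sketched as the maximum vertical distance between the two empirical CDFs; the \(p\)-values shown in the app come from R's `ks.test`, which this fragment does not reproduce:

```python
def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: sup_x |F_a(x) - F_b(x)|."""
    ecdf = lambda s, x: sum(1 for v in s if v <= x) / len(s)
    points = sorted(set(a) | set(b))
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in points)

# Completely separated runtime samples give the maximal distance of 1.
print(ks_statistic([1, 2, 3], [10, 20, 30]))  # 1.0
```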
Deep Statistical Comparison (DSC) analysis - Ranking per Function
The DSC comparison is described in the paper 'DSCTool: A web-service-based framework for statistical comparison of stochastic optimization algorithms' by T. Eftimov et al. This is the first of the three parts of the process: the per-function ranking procedure. The other two parts are the omnibus test and the post-hoc processing, which are shown in the two boxes below this one. Note that both of these are affected by the settings selected for this ranking procedure!
The chosen budget values per (function, dimension) pair are as follows (double-click an entry to edit it):
The results of the ranking are shown in the following plot, using the visualization techniques described in the paper 'PerformViz: A Machine Learning Approach to Visualize and Understand the Performance of Single-objective Optimization Algorithms' by T. Eftimov et al. PerformViz allows one to see clearly, from a single plot, which algorithms are best suited for a given problem, the influence of each problem on overall algorithm performance, and similarities among both algorithms and problems.
Deep Statistical Comparison (DSC) analysis - Omnibus Test
This is the result of the omnibus test on the data from the ranking procedure above. Note that this test has to be performed before the post-hoc comparison!
Deep Statistical Comparison (DSC) analysis - Post-hoc comparison
The results of the posthoc comparison are:
Winning-fraction-based heatmaps
The algorithms are compared pairwise according to the fraction of times their mean performance is better over all selected (function, dimension) pairs.
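The pairwise comparison can be sketched as follows; the per-cell means and the assumption of minimized running times are illustrative, not the app's R implementation:

```python
def winning_fraction(means_a, means_b):
    """Fraction of (function, dimension) cells where algorithm A's mean
    running time is strictly better (smaller) than algorithm B's."""
    wins = sum(1 for a, b in zip(means_a, means_b) if a < b)
    return wins / len(means_a)

# Means over four (function, dimension) cells for two algorithms.
print(winning_fraction([1.0, 5.0, 2.0, 9.0], [2.0, 4.0, 3.0, 10.0]))  # 0.75
```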
The chosen target function values per (function, dimension) pair are as follows (double-click an entry to edit it):
The results of the ranking are:
Glicko-2-based ranking
This procedure ranks algorithms using the Glicko-2 rating system. In every round, for every function and dimension in the dataset list, each pair of algorithms competes: a random runtime is sampled for the provided target (the best target reached by any of the algorithms), and the algorithm with the lower runtime wins the game. The Glicko-2 ratings computed from these games then determine the ranking.
The chosen target function values per (function, dimension) pair are as follows (double-click an entry to edit it):
The results of the ranking are:
MultiFunction Statistics
MultiFunction Hitting Times
Contribution to portfolio (Shapley values)
This figure shows the approximated Shapley values, based on each algorithm's contribution to the overall portfolio's ECDF.
The selected targets are:
Data Overview
Target Statistics at Chosen Budget Values
This table summarizes, for each algorithm and each budget \(B\) chosen on the left:

- runs: the number of runs that have performed at least \(B\) evaluations,
- mean: the average best-so-far function value obtained within a budget of \(B\) evaluations,
- median, \(2\%, 5\%, \ldots, 98\%\): the quantiles of the best function values found within the first \(B\) evaluations.

When not all runs evaluated at least \(B\) search points, the statistics hold for the subset of runs that did. Alternative statistics using simulated restarted algorithms are in preparation.
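The underlying best-so-far value for one run can be sketched as follows (illustrative Python, assuming maximization and a trace of (evaluations, \(f(x)\)) pairs):

```python
def best_within_budget(trace, budget):
    """Best f(x) seen within the first `budget` evaluations, or None
    if the run has no recorded evaluation within that budget."""
    values = [fx for evals, fx in trace if evals <= budget]
    return max(values) if values else None

run = [(1, 0.2), (5, 0.5), (20, 0.8), (100, 0.95)]
print(best_within_budget(run, 50))  # 0.8
```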
Original Target Samples
Expected Target Value (per function)
The mean, median, and standard deviation of the best function values found with a fixed budget of evaluations are depicted against the budget. The displayed elements can be switched on and off by clicking on the legend on the right. A tooltip and toolbar appear when hovering over the figure.
Expected Function values: all functions
Expected Function Value comparisons
The chosen budget values per function are as follows (double-click an entry to edit it):
The raw function values are:
Histogram of Fixed-Budget Targets
This histogram counts the number of runs whose best-so-far function value within the first \(B\) evaluations is between \(v_i\) and \(v_{i+1}\). The buckets \([v_i,v_{i+1})\) are chosen automatically according to the so-called Freedman–Diaconis rule: \(\text{Bin size}= 2\frac{Q_3 - Q_1}{\sqrt[3]{n}}\), where \(Q_1, Q_3\) are the \(25\%\) and \(75\%\) percentiles of the function values and \(n\) is the sample size. The displayed IDs can be selected by clicking on the legend on the right. A tooltip and toolbar appear when hovering over the figure.
Empirical Probability Density Function of FixedBudget Function Values
The plot shows, for the budget selected on the left, the distribution of the best-so-far function values of the individual runs (dots), and an estimated probability density function. The displayed IDs can be selected by clicking on the legend on the right. A tooltip and toolbar appear when hovering over the figure. A CSV file with the underlying data can be downloaded from the Data Summary tab.
Cumulative Difference Plot of Fixed-Budget Function Values
The cumulative difference plot can be useful when box/violin plots fail to provide a clear result. The plot can be used to compare exactly two algorithms (two IDs).
The cumulative difference plot compares the samples from two algorithms through first-order stochastic dominance. On the left side of the x-axis, the best values (small in minimization tasks, large in maximization tasks) produced by the algorithms are compared; on the right side, the worst values are compared.
If the curve does not cross the x-axis, the algorithm on the same side as the curve stochastically dominates the other one. If the curve is both positive and negative, then one of the algorithms might be better at achieving good values (if, in the left part of the plot, the curve is on the side of that algorithm) or better at avoiding bad values (if, in the right part of the plot, the curve is on the side of that algorithm). For more information, refer to the paper or the GitHub repo.
Empirical Cumulative Distribution of the Fixed-Budget Values: Single Budgets
Each ECDF curve shows the proportion of the runs that have found, within the given budget \(B\), a solution of at least the required target value given by the x-axis. The displayed curves can be selected by clicking on the legend on the right. A tooltip and toolbar appear when hovering over the figure.
Empirical Cumulative Distribution of the Fixed-Budget Values: Aggregation
The evenly spaced budget values are:
The fraction of (run, budget) pairs \((i,B)\) satisfying that the best solution that the algorithm has found in the \(i\)-th run within the first \(B\) evaluations has quality at most \(v\) is plotted against the target value \(v\). The displayed elements can be switched on and off by clicking on the legend on the right. A tooltip and toolbar appear when hovering over the figure.
Area Under the ECDF
The area under the ECDF is calculated for the sequence of budget values specified on the left. The displayed values are normalized against the maximal target value recorded for each algorithm. Intuitively, the smaller the area, the better the algorithm. The displayed IDs can be selected by clicking on the legend on the right. A tooltip and toolbar appear when hovering over the figure.
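One way to approximate such an area is the trapezoidal rule over the chosen budget sequence; the exact normalization used in the app may differ from the sketch below:

```python
def normalized_auc(budgets, ecdf_values):
    """Trapezoidal area under an ECDF curve, normalized by the budget range
    so that the result lies in [0, 1]."""
    area = 0.0
    for i in range(1, len(budgets)):
        width = budgets[i] - budgets[i - 1]
        area += 0.5 * (ecdf_values[i] + ecdf_values[i - 1]) * width
    return area / (budgets[-1] - budgets[0])

# A linearly increasing ECDF has a normalized area of 0.5.
print(normalized_auc([0, 10, 20], [0.0, 0.5, 1.0]))  # 0.5
```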
Expected Parameter Value (per function)
The mean or median of the algorithm's internal parameters at a fixed budget of evaluations is depicted against the budget. The displayed elements can be switched on and off by clicking on the legend on the right. A tooltip and toolbar appear when hovering over the figure.
Parameter Statistics at Chosen Runtime Values
This table summarizes, for each algorithm and each runtime value chosen on the left:

- runs: the number of runs in which non-missing parameter values are observed for each required runtime value,
- mean: the average value of the specified parameter after the specified number of function evaluations,
- median, \(2\%, 5\%, \ldots, 98\%\): the quantiles of these parameter values.

When not all runs used the specified runtime value, the statistics hold only for those runs that did. That is, the mean value is the mean of the applicable runs, and likewise for the quantiles.
Correlation of parameters over time
Parameter Sample at Chosen Runtime Values
This table shows, for each selected algorithm \(A\), each selected runtime value \(b\), and each run \(r\), the parameter value observed after a total budget of \(b\) function evaluations has been used.
Hypothesis Testing
The Kolmogorov-Smirnov test is performed on the empirical CDFs of running times for each pair of algorithms, in order to determine which algorithm gives a significantly smaller running time distribution. The resulting \(p\)-values are arranged in a matrix, where each cell \((i, j)\) contains the \(p\)-value from the test with the alternative hypothesis that the running time of algorithm \(i\) is smaller (thus better) than that of \(j\).
Decisions of the test, based on the \(p\)-value matrix and the \(\alpha\) value, are visualized in a heatmap (left) and a network (right). In each cell (row, column) of the heatmap, the alternative hypothesis is again that the row algorithm has a smaller (better) running time than the column. The color indicates:

- Red: the row algorithm is better than the column,
- Blue: the row algorithm is worse than the column,
- Gray: no significant distinction between the row and the column.
MultiFunction Statistics
MultiFunction Hitting Times
Deep Statistical Comparison (DSC) analysis - Ranking per Function
The DSC comparison is described in the paper 'DSCTool: A web-service-based framework for statistical comparison of stochastic optimization algorithms' by T. Eftimov et al. This is the first of the three parts of the process: the per-function ranking procedure. The other two parts are the omnibus test and the post-hoc processing, which are shown in the two boxes below this one. Note that both of these are affected by the settings selected for this ranking procedure!
The chosen budget values per (function, dimension) pair are as follows (double-click an entry to edit it):
The results of the ranking are shown in the following plot, using the visualization techniques described in the paper 'PerformViz: A Machine Learning Approach to Visualize and Understand the Performance of Single-objective Optimization Algorithms' by T. Eftimov et al. PerformViz allows one to see clearly, from a single plot, which algorithms are best suited for a given problem, the influence of each problem on overall algorithm performance, and similarities among both algorithms and problems.
Deep Statistical Comparison (DSC) analysis - Omnibus Test
This is the result of the omnibus test on the data from the ranking procedure above. Note that this test has to be performed before the post-hoc comparison!
Deep Statistical Comparison (DSC) analysis - Post-hoc comparison
The results of the posthoc comparison are:
Winning-fraction-based heatmaps
The algorithms are compared pairwise according to the fraction of times their mean performance is better over all selected (function, dimension) pairs.
The chosen target function values per (function, dimension) pair are as follows (double-click an entry to edit it):
The results of the ranking are:
Glicko-2-based ranking
This procedure ranks algorithms using the Glicko-2 rating system. In every round, for every function and dimension in the dataset list, each pair of algorithms competes: a random function value is sampled for the provided runtime (the maximum used by any of the algorithms), and the algorithm with the better function value wins the game. The Glicko-2 ratings computed from these games then determine the ranking.
The chosen budget values per (function, dimension) pair are as follows (double-click an entry to edit it):
The results of the ranking are:
Parallel coordinate plot of final position
The location of the best solution found by the algorithm in each of the runs. Each x-value corresponds to one coordinate of the solution.
Empirical Attainment Function
The empirical attainment function (EAF) estimates the percentage of runs that attain an arbitrary target value not later than a given runtime. For more information on the EAF, please see https://mlopezibanez.github.io/eaf/
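The attainment estimate at a single (runtime, target) point can be sketched as the fraction of runs whose trace reaches the target within the runtime. This is an illustrative Python fragment assuming maximization and a hypothetical trace layout, not the eaf machinery referenced above:

```python
import math

def eaf_point(traces, runtime, target):
    """Fraction of runs whose best f(x) within `runtime` evaluations
    reaches at least `target`."""
    hits = 0
    for trace in traces:  # each trace: (evaluations, f(x)) pairs
        best = max((fx for evals, fx in trace if evals <= runtime),
                   default=-math.inf)
        if best >= target:
            hits += 1
    return hits / len(traces)

runs = [[(1, 0.1), (10, 0.9)], [(1, 0.2), (50, 0.9)]]
print(eaf_point(runs, 20, 0.8))  # only the first run attains (20, 0.8) -> 0.5
```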
EAF-based differences
The empirical attainment function (EAF) estimates the percentage of runs that attain an arbitrary target value not later than a given runtime. By taking the difference between two EAFs, we can see areas of the (runtime, target)space where one algorithm dominates other algorithms. For more information on the EAF, please see https://mlopezibanez.github.io/eaf/
Empirical Attainment Function Partial Integral (`improved` ECDF)
The empirical attainment function (EAF) estimates the percentage of runs that attain an arbitrary target value not later than a given runtime. Taking the partial integral of the EAF results in a more accurate version of the Empirical Cumulative Distribution Function, since it does not rely on discretization of the targets. For more information on the EAF, please see https://mlopezibanez.github.io/eaf/
Empirical Attainment Function
The empirical attainment function (EAF) estimates the percentage of runs that attain an arbitrary target value not later than a given runtime. For more information on the EAF, please see https://mlopezibanez.github.io/eaf/
Empirical Attainment Function Partial Integral (`improved` ECDF)
The empirical attainment function (EAF) estimates the percentage of runs that attain an arbitrary target value not later than a given runtime. Taking the partial integral of the EAF results in a more accurate version of the Empirical Cumulative Distribution Function, since it does not rely on discretization of the targets. For more information on the EAF, please see https://mlopezibanez.github.io/eaf/
The approximated Area Over / Under the EAF is:
Color Settings
An example of the current color scheme.
General settings
Set the figure download properties
Set the figure font sizes
Subplot Options