CODE REPOSITORY

Kevin H. Knuth's Code Repository

optBINS

optBINS is a data-based algorithm for determining the optimal number of bins to use in a histogram. Strictly speaking, what is estimated from the data is not a histogram but a piecewise-constant density function. The algorithms are written in Matlab and impose no artificial limit on dimensionality: in principle they accept data of any dimension, and the only limit is computer memory. In practice, memory requirements restrict the algorithm to approximately four dimensions.

The optBINS algorithm is based on Bayesian methodology, in which the data are used to derive a posterior probability. This Posterior Probability is proportional to the product of the Likelihood Function and the Prior Probability. As the number of bins is increased, the density model describes the data better, which increases the Likelihood Function. However, the probability mass becomes more spread out as the number of bins is increased, which decreases the Prior Probability. The result is a balance between describing the data and keeping the model as simple as possible.
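For reference, the one-dimensional posterior for the number of bins, as given in the paper cited below (Knuth 2006), can be written as follows. The symbols are taken from that paper rather than from this page: M is the number of equal-width bins, N the number of data points, n_k the count in bin k, V the range of the data, and K a constant that does not depend on M.

```latex
p(M \mid \mathbf{d}, I) \;\propto\;
\left(\frac{M}{V}\right)^{N}
\frac{\Gamma\!\left(\frac{M}{2}\right)}{\Gamma\!\left(\frac{1}{2}\right)^{M}}
\frac{\prod_{k=1}^{M} \Gamma\!\left(n_k + \frac{1}{2}\right)}
     {\Gamma\!\left(N + \frac{M}{2}\right)}

\log p(M \mid \mathbf{d}, I) =
N \log M
+ \log \Gamma\!\left(\tfrac{M}{2}\right)
- M \log \Gamma\!\left(\tfrac{1}{2}\right)
- \log \Gamma\!\left(N + \tfrac{M}{2}\right)
+ \sum_{k=1}^{M} \log \Gamma\!\left(n_k + \tfrac{1}{2}\right)
+ K
```

The optimal number of bins is the value of M that maximizes this expression.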

There are three main algorithms in the package:

optBINS is a brute-force search algorithm that works with one-dimensional data sets and returns the most probable number of bins.

moptBINS is a brute-force search algorithm that works with multi-dimensional data sets. It is quite slow.

nsoptBINS is an algorithm based on John Skilling's Nested Sampling algorithm. It can handle both one-dimensional and multi-dimensional data sets. It is much faster than the brute-force searches, but because it is a sampling algorithm, it may occasionally miss the optimal solution. One can also choose to obtain either the most probable number of bins or the mean number of bins (along with the standard deviations).
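The brute-force idea behind optBINS and moptBINS can be sketched in a few lines. The following is a minimal Python illustration, not the Matlab code in the package: the function names (`log_posterior`, `brute_force_search`, `summarize_1d`) are hypothetical, the score is the relative log posterior from the cited paper (Knuth 2006) applied to equal-width bins spanning the data range, and storing counts in a sparse dictionary is an assumption of this sketch. The summary function obtains the mode, mean, and standard deviation by direct enumeration; it does not implement Nested Sampling.

```python
import itertools
import math


def log_posterior(points, bins):
    """Relative log posterior (up to an additive constant) for an equal-width grid.

    points: list of equal-length tuples; bins: number of bins per dimension.
    """
    n = len(points)
    dims = len(bins)
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]
    if any(lo == hi for lo, hi in zip(lows, highs)):
        raise ValueError("data must have nonzero range in every dimension")
    counts = {}  # sparse counts: only occupied cells are stored (sketch assumption)
    for p in points:
        cell = tuple(
            # Clamp so that the maximum value falls in the last bin.
            min(int((p[d] - lows[d]) / (highs[d] - lows[d]) * bins[d]), bins[d] - 1)
            for d in range(dims)
        )
        counts[cell] = counts.get(cell, 0) + 1
    m = math.prod(bins)      # total number of cells in the grid
    empty = m - len(counts)  # each empty cell contributes lgamma(0.5)
    return (
        n * math.log(m)
        + math.lgamma(m / 2)
        - m * math.lgamma(0.5)
        - math.lgamma(n + m / 2)
        + sum(math.lgamma(c + 0.5) for c in counts.values())
        + empty * math.lgamma(0.5)
    )


def brute_force_search(points, max_bins):
    """Score every grid with 1..max_bins bins per dimension; return {bins: score}."""
    dims = len(points[0])
    grids = itertools.product(range(1, max_bins + 1), repeat=dims)
    return {bins: log_posterior(points, bins) for bins in grids}


def summarize_1d(scores):
    """Mode, mean, and standard deviation of the bin count for 1-D scores."""
    peak = max(scores.values())  # subtract the peak before exponentiating
    weights = {b[0]: math.exp(s - peak) for b, s in scores.items()}
    total = sum(weights.values())
    mode = max(weights, key=weights.get)
    mean = sum(m * w for m, w in weights.items()) / total
    var = sum((m - mean) ** 2 * w for m, w in weights.items()) / total
    return mode, mean, math.sqrt(var)


# Example with made-up one-dimensional data (each point is a 1-tuple).
data = [(math.sin(i) ** 3,) for i in range(500)]
scores = brute_force_search(data, 30)
mode, mean, std = summarize_1d(scores)
```

The number of grids searched grows as `max_bins` raised to the number of dimensions, which is one reason an exhaustive multi-dimensional search is slow and a sampling approach becomes attractive.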


DOWNLOAD OPTBINS HERE

optBINS Package
Copyright GNU General Public License: (C) Kevin H. Knuth 2006

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.

In the event that optBINS is used in published research, please cite the following reference:
Knuth K.H. 2006. Optimal Data-Based Binning for Histograms. [physics/0605197]

 
Contact

Dr. Kevin H. Knuth
PH 211
Department of Physics
University at Albany
Albany NY 12222
USA

Phone: +1-518-442-4653
FAX: +1-518-442-5260
Email: kknuth@albany.edu