Abstract:

In this paper, we introduce several graphical tools for visualizing the distribution of the selected model, including the G-plot, the H-plot, the scatter plot, and the heatmap. To the best of our knowledge, this is the first attempt to visualize such a distribution.

Keywords:

distribution of the selected model

Our Purpose:

The model selected by a model selection procedure can be regarded as a random "point estimate" of the true model. It is therefore important to understand its random behavior through its distribution, i.e., the distribution of the selected model. As a first attempt, we introduce several graphical tools that visualize this distribution and help in understanding model selection uncertainty. The proposed visualization is useful for comparing different selection methods graphically, giving analysts a sense of the level of randomness that each method carries. We define the most frequently selected model as the mode model, denoted by \(m^\ast\), \[ m^\ast = \arg\max_{m \in \mathcal{M}} \mathbb{P}(\widehat{m}=m), \] where \(\widehat{m}\) denotes the selected model and \(\mathcal{M}\) the set of candidate models.
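As a rough illustration of the mode model, the following minimal sketch estimates \(\mathbb{P}(\widehat{m}=m)\) by relative frequency and then takes the arg max. It assumes that a collection of selected models is already available from repeated runs of a selection procedure (how those runs are generated, for instance by resampling, is an assumption and is not specified here). The function names `empirical_model_distribution` and `mode_model`, the variable labels, and the example data are hypothetical.

```python
from collections import Counter


def empirical_model_distribution(selected_models):
    """Estimate P(m_hat = m) by the relative frequency of each selected model.

    Each model is represented as a set of variable labels (an assumption of
    this sketch); it is converted to a frozenset so it can be used as a key.
    """
    counts = Counter(frozenset(m) for m in selected_models)
    total = sum(counts.values())
    return {model: count / total for model, count in counts.items()}


def mode_model(distribution):
    """Return the model with the highest estimated selection probability.

    Ties go to whichever tied model appears first in the dictionary.
    """
    return max(distribution, key=distribution.get)


# Hypothetical selections from five repeated runs of some selection method.
selections = [
    {"x1", "x2"},
    {"x1"},
    {"x1", "x2"},
    {"x1", "x2", "x3"},
    {"x1", "x2"},
]

dist = empirical_model_distribution(selections)
m_star = mode_model(dist)
print(sorted(m_star), dist[m_star])
```

In this toy example the model containing `x1` and `x2` is selected in three of the five runs, so it is the empirical mode model. The spread of the remaining mass over other models is what the graphical tools are meant to convey.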

Our Main Results:

Naive Visualization of the Distribution of the Selected Model

The distribution of the selected model is generally hard to visualize, because its support consists of all possible models, and these models have complex relationships among themselves. We first present a naive visualization of such a distribution to illustrate the difficulty. In this visualization, each circle represents one unique model, and the vertical axis shows the model complexity. Models in the same row are arranged in descending order of their selection frequencies. A line connects two models if the larger model \(m_2\) consists of the smaller model \(m_1\) plus exactly one extra variable, i.e., \(m_2\supset m_1\) and \(| m_2 \setminus m_1|=1\).
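The layout rules described above can be sketched as follows. This is only an illustrative reading of the description, not the authors' implementation: it assumes that model complexity is measured by the number of variables in the model, and that the distribution is given as a dictionary mapping each model (a `frozenset` of variable labels) to its selection frequency. The function name `naive_layout` and the example frequencies are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def naive_layout(distribution):
    """Compute circle positions and connecting lines for the naive plot.

    distribution: dict mapping frozenset of variable labels -> frequency.
    Returns (positions, edges), where positions[model] = (rank_in_row, size)
    and edges is a list of (smaller_model, larger_model) pairs.
    """
    # Group models into rows by size (assumed measure of model complexity).
    rows = defaultdict(list)
    for model, freq in distribution.items():
        rows[len(model)].append((model, freq))

    # Within each row, order models by descending selection frequency.
    positions = {}
    for size, members in rows.items():
        members.sort(key=lambda pair: -pair[1])
        for rank, (model, _) in enumerate(members):
            positions[model] = (rank, size)

    # Connect m1 and m2 when m2 is a proper superset of m1 with one extra variable.
    edges = []
    for a, b in combinations(distribution, 2):
        small, large = sorted((a, b), key=len)
        if len(large) - len(small) == 1 and small < large:
            edges.append((small, large))
    return positions, edges


# Hypothetical selection frequencies over four models.
example = {
    frozenset({"x1"}): 0.2,
    frozenset({"x1", "x2"}): 0.5,
    frozenset({"x1", "x3"}): 0.1,
    frozenset({"x1", "x2", "x3"}): 0.2,
}
positions, edges = naive_layout(example)
```

The returned positions and edges can be passed to any plotting library to draw the circles and lines. Even in this small example, four models already produce four connecting lines, which hints at how quickly the naive picture becomes cluttered as the number of candidate models grows.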