EVENTS



PAST EVENTS:

Fall 2008 Courses
Date: May 20, 2008 -
A schedule of classes is now available for the Fall 2008 semester. See the "Upcoming" link under Courses to view our fall course offerings.

Colloquium Guest Speaker Douglas Steinley
Date: April 28, 2008 - IMU - Persimmon Room 12pm - 1pm
A variance-to-range ratio variable weighting procedure is proposed. The method is theoretically grounded in the inherent variability found in data exhibiting cluster structure. In addition, a variable selection procedure is proposed to operate in conjunction with the weighting technique. The performance of these procedures is compared to existing methods in the literature.
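As a rough illustration only: the abstract does not state the formula, so the sketch below assumes that a variable's index is its variance divided by its squared range, and that weights are these indices rescaled by the smallest one. The function names and data are hypothetical.

```python
from statistics import pvariance


def variance_to_range_ratio(values):
    """Assumed index: population variance divided by the squared range."""
    spread = max(values) - min(values)
    if spread == 0:
        raise ValueError("a constant variable has no range")
    return pvariance(values) / spread ** 2


def relative_weights(columns):
    """Rescale each ratio by the smallest one (an assumption of this sketch)."""
    ratios = [variance_to_range_ratio(col) for col in columns]
    smallest = min(ratios)
    return [r / smallest for r in ratios]


# Hypothetical data: one variable with two tight clumps, one evenly spread.
clustered = [0.0, 0.1, 0.2, 9.8, 9.9, 10.0]
evenly_spread = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
weights = relative_weights([clustered, evenly_spread])
```

Under these assumptions the clumped variable receives the larger weight, since two well-separated clumps inflate the variance relative to the range.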

Colloquium Guest Speaker Richard Charnigo
Date: April 24, 2008 - IMU - Walnut Room 4 pm - 5 pm
Greater epidemiologic understanding of the relationships among fetal-infant mortality and its prognostic factors, including birthweight, could have vast public health implications. A key step toward that understanding is a realistic and tractable framework for analyzing birthweight distributions and heterogeneity in fetal-infant mortality. We propose describing a birthweight distribution using a normal mixture model in which the number of components is determined from the data, then estimating birthweight-specific mortality curves within each component of the normal mixture. We emphasize both methodological issues (e.g., How should the number of components be determined?) and interpretive issues (e.g., What do the components represent?). Data from the National Center for Health Statistics Public-Use Perinatal Mortality Data Files are used to compare our analytic framework to existing frameworks as well as to assess the reproducibility across repeated sampling of results obtained through our framework. (This talk is based on work with Lorie Wayne Chesnut, Tony LoBianco, and Russell S. Kirby.)

Colloquium Guest Speaker Michael Carbon
Date: April 07, 2008 - IMU - Walnut Room 12 pm - 1 pm
The purpose of this talk is to investigate the frequency polygon as a density estimator for stationary random fields indexed by multidimensional lattice points in space. Optimal bin widths that asymptotically minimize the integrated mean square error (IMSE) are derived. Under weak conditions, frequency polygons achieve the optimal uniform rate of convergence. Rates of almost sure convergence are also given. Finally, asymptotic normality of the frequency polygon estimator is established.
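For readers unfamiliar with the estimator, here is a minimal sketch of a frequency polygon in the simplest setting: one-dimensional data, not the random-field setting of the talk. It linearly interpolates histogram heights between neighboring bin midpoints. The function name, origin, bin width, and data are illustrative assumptions.

```python
import math


def frequency_polygon(data, origin, binwidth):
    """Return a density function that interpolates histogram bin midpoints."""
    n = len(data)
    counts = {}
    for x in data:
        k = math.floor((x - origin) / binwidth)
        counts[k] = counts.get(k, 0) + 1

    def height(k):
        # Histogram density height of bin k (zero for empty bins).
        return counts.get(k, 0) / (n * binwidth)

    def density(x):
        # The midpoint of bin k sits at origin + (k + 0.5) * binwidth,
        # so t is an integer exactly when x is a bin midpoint.
        t = (x - origin) / binwidth - 0.5
        k = math.floor(t)
        frac = t - k
        return (1 - frac) * height(k) + frac * height(k + 1)

    return density


# Hypothetical sample with bins [0, 1), [1, 2), [2, 3).
f = frequency_polygon([0.2, 0.4, 1.1, 1.5, 1.7, 2.3], origin=0.0, binwidth=1.0)
```

At a bin midpoint the estimate equals the histogram height; between midpoints it is the straight line joining the two neighboring heights.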

Colloquium Guest Speaker Zongwu Cai
Date: March 20, 2008 - TBA - 4:00 pm - 5:00 pm
Motivated by forecasting the inflation rate through nonstationary variables and efficient tests of stock return predictability as well as forecasts of the equity premium, this talk will focus on how to use nonparametric or semiparametric regression techniques to analyze nonstationary time series data. Development of a nonparametric approach to estimate the functionals will be discussed, as well as how the consistency and asymptotic normality of the proposed estimators are obtained. The asymptotic results show that the asymptotic bias is the same for all estimators of functionals, but that the convergence rates are totally different for stationary and nonstationary covariates. These findings appear to be new in the literature.

Colloquium Guest Speaker Michael Levine
Date: March 03, 2008 - IMU Walnut Room 12pm-1pm
   We consider a new separable nonparametric volatility model that allows for “interactions” in both the mean and the conditional variance (volatility) functions. It can be concisely described as an additive-interactive nonlinear ARCH model. We propose this model as a possible alternative to the generalized additive nonlinear ARCH (GANARCH) model of Kim and Linton (2004), with which it shares a common origin. Unlike the GANARCH model, it does not assume a known link function but instead includes second-order interaction terms in both the mean and variance functions. This yields a much more data-driven model than the GANARCH model of Kim and Linton (2004), since we do not assume that anything is known about the data distribution. This is very beneficial since, in practice, the data distribution has to be selected based on exploratory data analysis, which is very difficult for multivariate data. Thus, the proposed model is much more flexible than GANARCH.
   Motivated by the local instrumental variable estimation method (LIVE), also introduced in Kim and Linton (2004), we propose instrumental variable-based estimators of the components of the mean and volatility functions. The estimators are shown to be consistent and asymptotically normal. Explicit expressions for asymptotic means and variances of these estimators are also obtained. Several simulation experiments are conducted that show a very good performance of our algorithm for moderate sample sizes. Finally, the method is applied to the real data set of currency exchange rates where it leads to some interesting conclusions.
   Historically, multiple functional component testing in nonparametric models has been a fairly difficult problem. We introduce a novel F-type approach to testing the significance of the two-way interactive terms in the mean function based on the unbalanced design ANOVA with unequal variances. Simulation studies show that the method performs very well for sample sizes of about 5000, which are easily available in financial applications.

Colloquium Guest Speaker Joanne Peng
Date: February 25, 2008 - IMU Walnut Room 12pm-1pm
For the past 25 years, advances have been made in missing data methods. Most published work has focused on missing data in the dependent variable under various conditions. The present study sought to fill the void by comparing two approaches for handling missing data in categorical covariates in logistic regression. These two approaches were the EM method of weights and multiple imputation (MI).
Sample data were first drawn randomly from a population with known characteristics. Missing data on covariates were subsequently simulated under two conditions: missing completely at random and missing at random with different missing rates. A logistic regression model was fit to each sample using either the EM or the MI approach. The performance of these two approaches was assessed on four criteria: bias, efficiency, coverage, and rejection rate.
Results generally favored MI over EM. Practical issues such as implementation, inclusion of continuous covariates, and interactions between covariates were discussed.

Colloquium Guest Speaker Vivekananda Roy
Date: February 14, 2008 - IMU Walnut Room 4 pm - 5 pm
We study Markov chain Monte Carlo algorithms for exploring the intractable posterior density that results when a probit regression likelihood is combined with a flat prior on the regression coefficient. We prove that the data augmentation algorithm of Albert and Chib (1993) and the PX-DA algorithm of Liu and Wu (1999) both converge at a geometric rate, which ensures the existence of central limit theorems (CLTs) for ergodic averages under a second moment condition. While these two algorithms are essentially equivalent in terms of computational complexity, we show that the PX-DA algorithm is theoretically more efficient in the sense that the asymptotic variance in the CLT under the PX-DA algorithm is no larger than that under Albert and Chib's algorithm. A simple, consistent estimator of the asymptotic variance in the CLT is constructed using regeneration. As an illustration, we apply our results to the lupus data from van Dyk and Meng (2001). In this particular example, the estimated asymptotic relative efficiency of the PX-DA algorithm with respect to Albert and Chib's algorithm is about 65, which demonstrates that huge gains in efficiency are possible by using the PX-DA algorithm.
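A minimal sketch of the kind of data augmentation sampler discussed above, written for the simplest case: a single covariate, no intercept, and a flat prior on the coefficient. It alternates between drawing latent truncated normal variables and drawing the coefficient from its normal conditional. It does not implement PX-DA or the regeneration-based variance estimator, and the data and function names are hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def draw_truncated_normal(mean, positive, rng):
    """Draw from N(mean, 1) restricted to (0, inf) if positive, else to (-inf, 0]."""
    cut = STD_NORMAL.cdf(-mean)  # P(Z <= 0) when Z ~ N(mean, 1)
    if positive:
        u = cut + (1.0 - cut) * rng.random()
    else:
        u = cut * rng.random()
    # Numerical guard so inv_cdf stays inside (0, 1); adequate for moderate means.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return mean + STD_NORMAL.inv_cdf(u)


def data_augmentation_probit(x, y, n_iter, seed=0):
    """Gibbs-style sampler for a scalar probit coefficient under a flat prior."""
    rng = random.Random(seed)
    sxx = sum(xi * xi for xi in x)
    beta = 0.0
    draws = []
    for _ in range(n_iter):
        # Step 1: latent z_i ~ N(x_i * beta, 1), truncated by the sign implied by y_i.
        z = [draw_truncated_normal(xi * beta, yi == 1, rng) for xi, yi in zip(x, y)]
        # Step 2: beta | z ~ N(sum(x z) / sum(x^2), 1 / sum(x^2)).
        beta_hat = sum(xi * zi for xi, zi in zip(x, z)) / sxx
        beta = rng.gauss(beta_hat, sxx ** -0.5)
        draws.append(beta)
    return draws


# Hypothetical data, deliberately not perfectly separated so the posterior is proper.
x = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
y = [0, 0, 1, 0, 1, 0, 1, 1]
draws = data_augmentation_probit(x, y, n_iter=2000)
```

Ergodic averages of `draws` after a burn-in period approximate posterior expectations; the talk's results concern how quickly such averages settle down.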

Colloquium Guest Speaker Jien Chen
Date: February 04, 2008 - IMU Walnut Room 12pm-1pm
As a non-parametric method, Empirical Likelihood (EL) has been attracting serious attention from researchers in statistics, econometrics, engineering and biostatistics. By defining the estimation equations in EL appropriately, we can extend EL to various data settings, especially those in which parametric likelihoods are absent. In this talk, I will provide two examples of such extensions: quantile estimation and longitudinal data analysis. Quantile estimation for discrete data analysis has not been well studied. For a given 0 < p < 1, the commonly used sample quantile may or may not be consistent for the pth quantile, depending on whether or not the underlying distribution has a plateau at the level of p. I propose an EL-based categorization procedure that not only helps determine the shape of the true distribution at level p, but also provides a way of formulating a new estimator that is consistent in any case. For non-Gaussian longitudinal data, generalized estimating equations (GEE) are a popular class of marginal models. While the GEE estimator is consistent and robust, it may suffer significant loss of efficiency if the working correlation structure is misspecified. I consider the use of EL to select working correlations for GEE models, for which parametric likelihoods are absent and quasi-likelihoods are difficult to construct.

Colloquium Guest Speaker Brian Reich
Date: January 31, 2008 - IMU Walnut Room 4 pm - 5 pm
Storm surge, the onshore rush of sea water caused by the high winds and low pressure associated with a hurricane, can compound the effects of inland flooding caused by rainfall, leading to loss of property and loss of life for residents of coastal areas. Numerical ocean models are essential for creating storm surge forecasts for coastal areas. These models are driven primarily by the surface wind forcings. Currently, the gridded wind fields used by ocean models are specified by deterministic formulas that are based on the central pressure and location of the storm center. While these equations incorporate important physical knowledge about the structure of hurricane surface wind fields, they cannot always capture the asymmetric and dynamic nature of a hurricane. A new Bayesian multivariate spatial statistical modeling framework is introduced that combines data with physical knowledge about the wind fields to improve the estimation of the wind vectors. Many spatial models assume the data follow a Gaussian distribution. However, this may be overly restrictive for wind field data, which often display erratic behavior, such as sudden changes in time or space. In this paper we develop a semiparametric multivariate spatial model for these data. Our model builds on the stick-breaking prior, which is frequently used in Bayesian modeling to capture uncertainty in the parametric form of an outcome. The stick-breaking prior is extended to the spatial setting by assigning each location a different, unknown distribution, and smoothing the distributions in space with a series of kernel functions. This semiparametric spatial model is shown to improve prediction compared to usual Bayesian Kriging methods for the wind field of Hurricane Ivan.
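To make the stick-breaking construction concrete, here is a small sketch of the non-spatial building block only. It assumes the common Beta(1, concentration) break proportions and a finite truncation; the kernel-weighted spatial extension described in the abstract is not shown, and all names are illustrative.

```python
import random


def stick_breaking_weights(concentration, n_atoms, seed=0):
    """Generate mixture weights by repeatedly breaking off pieces of a unit stick."""
    rng = random.Random(seed)
    weights = []
    remaining = 1.0
    for _ in range(n_atoms - 1):
        # Break off a Beta(1, concentration) fraction of what is left.
        v = rng.betavariate(1.0, concentration)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    # The last atom takes the leftover stick so the weights sum to one.
    weights.append(remaining)
    return weights


weights = stick_breaking_weights(concentration=2.0, n_atoms=10)
```

Smaller concentration values put most of the mass on the first few atoms; larger values spread it over many.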

Colloquium Guest Speaker Guilherme Rocha
Date: January 28, 2008 - IMU Maple Room - 12pm - 1pm
  Extracting useful information from high-dimensional data is an important focus of today's statistical research and practice. Penalized loss function minimization has been shown to be effective for this task. Quasi-norms on model parameters are frequently used as a penalty. Classical examples are AIC and BIC where the L0 quasi-norm (model dimension) is used as a penalty.
  More recently, penalization by the L1-norm (lasso) has enjoyed a lot of attention. L1-penalized estimates are cheaper to compute (convex optimization) and lead to more stable model estimates than their L0 counterparts.
  In this talk, I will present the Composite Absolute Penalties (CAP) family of penalties. CAP penalties allow given grouping and hierarchical relationships between the predictors to be expressed. They are built by defining groups of variables and combining the properties of norm penalties at the across group and within group levels. Grouped selection occurs for non-overlapping groups. Hierarchical variable selection is reached by defining groups with particular overlapping patterns.
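One way to picture how such a penalty combines norms at two levels is sketched below. It assumes the penalty is the sum, over groups, of a within-group L-p norm raised to an across-group exponent; this form is a reading of the description above, not something the abstract states, and the function names and numbers are hypothetical.

```python
def lp_norm(values, p):
    """L-p norm of a list of numbers; p may be float('inf')."""
    if p == float("inf"):
        return max(abs(v) for v in values)
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


def composite_penalty(beta, groups, within_norms, across_exponent):
    """Sum over groups of (within-group norm) ** across_exponent (assumed form)."""
    total = 0.0
    for indices, p in zip(groups, within_norms):
        group_coefs = [beta[i] for i in indices]
        total += lp_norm(group_coefs, p) ** across_exponent
    return total


beta = [3.0, -4.0, 0.0, 2.0]

# Non-overlapping groups with L2 inside and exponent 1 across: a grouped penalty.
grouped = composite_penalty(beta, [[0, 1], [2, 3]], [2, 2], 1)

# Singleton groups with exponent 1 across: reduces to the L1 (lasso) penalty.
lasso_like = composite_penalty(beta, [[0], [1], [2], [3]], [1, 1, 1, 1], 1)

# Overlapping groups, as would be used to encode a hierarchy between predictors.
nested = composite_penalty(beta, [[0, 1], [1]], [float("inf"), 1], 1)
```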
  Under easily verifiable assumptions, CAP penalties are convex: an attractive property from a computational standpoint. Within this subfamily, unbiased estimates of the degrees of freedom (df) exist, so the regularization parameter can be selected without cross-validation.
  Simulation results show that CAP improves on the predictive performance of the LASSO for cases with p>>n and mis-specified groupings.
  This is joint work with Peng Zhao and Bin Yu.

Colloquium Guest Speaker Junhui Wang
Date: January 14, 2008 - IMU Maple Room - 12pm - 1pm
Hierarchical classification is critical to knowledge and context management as well as knowledge exploration, as in gene function classification and discovery and document categorization. In hierarchical classification, an input is classified by a structured hierarchy. In such a situation, the central issue is how to effectively utilize inter-class relationships to improve on the generalization performance of flat classification, which ignores such dependency. In this talk, a novel large margin method based on constraints characterizing multi-path hierarchy is presented within the framework of regularization. In particular, I will discuss three aspects: (1) the idea and methodology development; (2) computational tools; (3) a statistical learning theory. Numerical examples will be provided to demonstrate the advantage of our proposed methodology over existing competitors. An application to gene function prediction and discovery will be discussed.

Colloquium Guest Speaker Guang Cheng
Date: January 10, 2008 - IMU Walnut Room 4 pm - 5 pm
Semiparametric modeling is an excellent framework due to its flexibility to model some features parametrically without making assumptions on the other features. However, the infinite-dimensional nuisance parameter in the semiparametric models generally poses several challenges for making maximum likelihood inference for the parameter of interest at both theoretical and methodological levels. We will consider a series of profile likelihood based semiparametric inference procedures either based on numerical methods, i.e. K-step MLE, or through MCMC sampling, i.e. the Profile Sampler and the Penalized Profile Sampler. All the above profile likelihood based methods avoid evaluation of the infinite-dimensional operator and are easy to implement. Furthermore, we investigate their second order asymptotic behaviors, which are proven to be related to the convergence rate of the nuisance parameter and thus adjustable.

Colloquium Guest Speaker William Cleveland
Date: November 26, 2007 - IMU Walnut Room 12pm-1pm
Large, complex data sets are ubiquitous, the standard now rather than the exception. They present challenging problems of analysis because of their size and the complexity of their data structures and patterns. One approach is to compute summary statistics at the outset to reduce the complexity, but this expedient risks losing important information in the data. The goal should be lossless analysis: analyze the data at a level of detail and comprehensiveness that does not sacrifice information.
Achieving lossless analysis of complex data today is immensely challenging. New fundamental approaches and methods are needed for each of the different areas that come into play in the analysis of the data: databases, data processing, data structures, statistical models and methods, machine learning algorithms, data visualization, computational algorithms, software environments, and hardware environments. In fact, it has never been harder to achieve lossless analysis because complexity has increased faster than our innovations in these areas.
Nothing serves lossless analysis better than data visualization, the only practical way to absorb large amounts of information in detail. But for today's complex sets we must visualize far larger amounts than in the past. We must be ready to accept large displays each covering tens or even hundreds of screensful (pages). For a single data set it is reasonable to have hundreds of such displays. These displays become a new database produced from the data that is queried and studied. For a display of 500 pages, we might query and study all or just a few of the pages depending on the task.
Producing, querying, and studying a visualization database needs new ideas. There are different modes of viewing the many pages and panels per page of a large display, from slow focused study to very rapid scans. We need creative interfaces to facilitate the different modes. We cannot fuss with very large displays, interacting with the micro-elements to get them right, because there is too much; instead there should be smart automation algorithms that get the large display right the first time. We must consider the physical screen space, its size and resolution, to make it work most effectively for the visual study. We need methods of display that result in pre-attentive visual formation of gestalts that show instantaneously the relevant patterns in the data. This necessitates, strangely, more displays, starting with broad brush looks to derivative displays whose redesigns show specific aspects of the broad brush more effectively. It also requires the study of visual perception.

Colloquium Guest Speaker Tonu Kollo
Date: November 12, 2007 - IMU Walnut Room 12pm-1pm
In the last ten years, remarkable development has occurred in the area of skewed multivariate distributions. The skew normal distribution was introduced in 1996 by Azzalini & Dalla Valle (Biometrika). Azzalini’s construction of the distribution was very fruitful and was later successfully applied to many other elliptical families of distributions. A random vector X is skew normally distributed with parameters α (a p-vector, the shape parameter) and Σ (a positive definite p×p matrix, the scale parameter) when the density function of X is the product of the density function of N(0,Σ) and the distribution function of the standard normal distribution, with the shape parameter appearing in the argument of the distribution function.
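The construction just described can be illustrated in the univariate case (p = 1, unit scale), where the density is 2 φ(x) Φ(αx). The constant 2 is what makes the product integrate to one. The sketch below is illustrative only; the function name and the grid used for the numerical check are assumptions.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def skew_normal_pdf(x, alpha):
    """Univariate skew normal density: 2 * phi(x) * Phi(alpha * x)."""
    return 2.0 * STD_NORMAL.pdf(x) * STD_NORMAL.cdf(alpha * x)


def riemann_integral(func, lower, upper, step):
    """Crude midpoint Riemann sum, enough to sanity-check normalization."""
    n_steps = int(round((upper - lower) / step))
    return sum(func(lower + (i + 0.5) * step) for i in range(n_steps)) * step


total_mass = riemann_integral(lambda x: skew_normal_pdf(x, 3.0), -10.0, 10.0, 0.001)
```

With shape parameter 0 the density reduces to the standard normal; positive values skew the distribution to the right.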
In the talk, basic properties of the skew normal family will be discussed and other frequently used families of skewed elliptical distributions will be examined (the multivariate skew t-distribution, for instance). With new families of distributions, new estimation and testing problems have arisen. Classical estimation methods often do not work: the maximum likelihood method can give wrong estimates, and the method of moments can be heavily biased.
Another type of skewed multivariate distribution is represented by the asymmetric Laplace distribution, which was carefully examined by T. Kozubowski in a series of papers at the end of the 1990s. In this case, we do not have an explicit expression for the density function, and estimation, testing and fitting problems have to be solved on the basis of the characteristic function. This distribution will also be considered in more detail.

Colloquium Guest Speaker Victor Goodman
Date: November 05, 2007 - IMU Walnut Room 12pm-1pm
Forward interest rates are simultaneously measured using up-to-the-minute bond trading quotes. The time variation of these rates determines a high-dimensional covariance matrix that might be used to model bond yields within a country's government bond market. Several PC analyses, with 1989-92 data in the U.K. market, 1887-94 data in the U.S. market, and 2001-05 data in the U.S. market, reveal a striking pattern involving the first three eigenvectors of each covariance matrix. In this talk I describe the pattern and make the (well-known) case for using a three-factor Gaussian model to describe bond trading.
It is difficult to implement models based on PC estimates for the first eigenvector. An initial attempt to produce a model in 1988 ended in failure since the model had a financial inconsistency. A more recent model behaves better; its covariance has the desired eigenvector and the model is arbitrage-free. Surprisingly, the new model appears when we condition prices not to collapse in the old model. This suggests that an issue of survivorship may arise even in no-default bond markets.

Colloquium Guest Speaker Jerome Busemeyer
Date: October 29, 2007 - IMU Walnut Room 12pm-1pm
Social and behavioral scientists face some of the same measurement problems that forced physicists to abandon classical probability theory. Their measurements are often incompatible, and the first measurement may disturb a second measurement. Thus only partial information about a complex system can be obtained at any point in time. Combining partial information about a system into a coherent understanding of the entire system is the hallmark of quantum theory. Quantum theory provides a fundamentally different approach to logic, reasoning, and probabilistic inference. For example, quantum logic does not always follow the distributive axiom of Boolean logic; quantum probabilities do not always obey the Kolmogorov law of total probability; quantum reasoning does not always obey the principle of monotonic reasoning.
For this talk, I will present a tutorial of the basic assumptions of classic versus quantum probability theories. These basic assumptions will be examined, side by side, in a parallel and elementary manner. Classic theory will emerge as a possibly overly restrictive case of the more general quantum theory. The fundamental implications of these contrasting assumptions for measurement in the social and behavioral sciences will be examined.
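The remark that quantum probabilities need not obey the law of total probability can be checked with a toy calculation. The sketch below assumes a real two-dimensional state space in which each yes/no measurement projects onto an axis at some angle; the angles and function name are hypothetical and chosen only to make the arithmetic clean.

```python
import math


def projection_prob(state_angle, axis_angle):
    """Probability that a unit vector at state_angle is found along axis_angle."""
    return math.cos(state_angle - axis_angle) ** 2


state = 0.0               # initial state vector along the horizontal axis
a_axis = math.pi / 4      # "yes" axis for measurement A
b_axis = math.pi / 2      # "yes" axis for measurement B

# Probability of "yes" on B when B is measured directly.
p_b_direct = projection_prob(state, b_axis)

# Probability of "yes" on B when A is measured first. A "yes" on A leaves the
# state along a_axis; a "no" leaves it along the orthogonal axis.
p_a_yes = projection_prob(state, a_axis)
p_b_after_a = (
    p_a_yes * projection_prob(a_axis, b_axis)
    + (1.0 - p_a_yes) * projection_prob(a_axis + math.pi / 2, b_axis)
)
```

Here `p_b_direct` is essentially 0 while `p_b_after_a` is 0.5: measuring A first disturbs the state, so summing over the outcomes of A does not recover the direct probability of B, as it would under classical probability.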

Colloquium Guest Speaker Juan Carlos Escanciano
Date: October 22, 2007 - IMU Redbud Room
SPECIAL TIME: 11:30AM - 12:30PM
A general method for testing the martingale difference hypothesis is proposed. The new tests are data-driven smooth tests based on the principal components of certain marked empirical processes that are asymptotically distribution-free, with critical values that are already tabulated. The smooth tests are shown to be optimal in a semiparametric sense discussed in the paper, and they are robust to conditional heteroscedasticity of unknown form. A simulation study shows that the data-driven smooth tests perform very well for a wide range of realistic alternatives and have more power than omnibus and other competing tests. Finally, two empirical examples highlight the merits of our approach.

Spring 2008 Courses
Date: October 05, 2007 -
Click here to view the list of Spring 2008 courses.