UPCOMING EVENTS:
|
Colloquium - Speaker Yoosoon Chang Date: November 30, 2009 - IMU Persimmon Room 3 PM - 4 PM This paper develops a new framework and tools to reexamine Fama-French regressions. For Fama-French portfolios, we consider a continuous-time factor model with a specific error component structure implied by the underlying asset pricing theory. The model is then analyzed as a continuous-time multivariate regression with a general martingale differential error. Our framework is broad enough to accommodate some of the important common features of the errors in this type of regression. In particular, we allow for time-varying or stochastic volatilities that are persistent and have strong leverage effects. It is well known that such nonstandard features would make the standard inferential procedure invalid. We overcome this difficulty by using samples collected at random intervals, which are set by a clock running inversely to the market volatility, instead of samples collected at fixed intervals such as monthly or yearly. Under our sampling scheme, Fama-French regressions may simply be regarded as the classical regressions having normal errors with variance given by the averaged quadratic variation of the martingale differential error. Various tests, which have been used to evaluate Fama-French factors, are extended and evaluated in the paper.
|
Colloquium Speaker - Chen Yu Date: December 07, 2009 - IMU Persimmon Room 3 PM - 4 PM
|
PAST EVENTS:
|
Colloquium Speaker - Zaichao Du Date: November 16, 2009 - IMU Persimmon Room 3 PM - 4 PM In this paper, we propose a modified Box-Pierce test for conditional goodness-of-fit. Our method is based on the fact that, under the correct specification of the conditional distribution, the generalized errors obtained after the probability integral transformation are iid U[0,1]. Our test explicitly takes into account the parameter estimation effect; as a result, it has a convenient standard chi-square limit distribution. Our test is applicable to a wide class of models, including but not limited to the ARMA-GARCH model, the Hansen (1994) skewed t model and the autoregressive conditional duration model. A simulation study shows that our test has satisfactory size and power performance. An empirical application to the Hang Seng Index data highlights the merits of the proposed test.
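The fact this abstract builds on can be illustrated with a small sketch. It is not the speaker's modified test: it applies the probability integral transformation to simulated standard normal data under a correctly specified model, then computes the classical (unmodified) Box-Pierce statistic, which ignores the parameter estimation effect the talk addresses. The function names, the seed and the lag choice are assumptions made for illustration.

```python
import math
import random


def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution, written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def pit(sample, cdf):
    """Probability integral transformation: map each observation through the model CDF."""
    return [cdf(x) for x in sample]


def autocorr(u, lag):
    """Sample autocorrelation of the sequence u at the given lag."""
    n = len(u)
    mean = sum(u) / n
    denom = sum((v - mean) ** 2 for v in u)
    num = sum((u[t] - mean) * (u[t - lag] - mean) for t in range(lag, n))
    return num / denom


def box_pierce(u, max_lag):
    """Classical Box-Pierce statistic: n times the sum of squared autocorrelations."""
    n = len(u)
    return n * sum(autocorr(u, k) ** 2 for k in range(1, max_lag + 1))


rng = random.Random(0)                      # fixed seed, illustration only
data = [rng.gauss(0.0, 1.0) for _ in range(2000)]
u = pit(data, normal_cdf)                   # roughly iid U[0,1] when the model is correct
q = box_pierce(u, max_lag=5)
print(round(sum(u) / len(u), 3), round(q, 3))
```

When the assumed distribution is wrong, the transformed values drift away from uniformity or independence, which is what a test of this kind is designed to detect.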
|
|
Colloquium - Speaker Tao Shi Date: October 26, 2009 - IMU State Room East 3pm - 4pm In this talk, we focus on obtaining clustering information in a distribution when iid data are given. First, we develop theoretical results for understanding and using clustering information contained in the eigenvectors of data adjacency matrices based on a radial kernel function (with a sufficiently fast tail decay). We study which eigenvectors should be used and when the clustering information for the distribution can be recovered from the data. Second, we use heuristics from these analyses to design the Data Spectroscopic clustering (DaSpec) algorithm. Our findings not only extend and go beyond the intuitions underlying existing spectral techniques (e.g. spectral clustering and Kernel Principal Components Analysis), but also provide insights about their usability and modes of failure. Simulation studies and experiments on real world data are conducted to show the promise of our proposed data spectroscopy clustering algorithm relative to k-means and one spectral method. In particular, DaSpec seems to be able to handle unbalanced groups and recover clusters of different shapes better than competing methods. This is joint work with Prof. Mikhail Belkin (Ohio State University) and Prof. Bin Yu (University of California, Berkeley).
|
|
Colloquium Speaker - Shankar Bhamidi Date: October 12, 2009 - IMU Persimmon Room 3 PM - 4 PM The last few years have seen an explosion in the amount of data on many real world networks. This has resulted in an interdisciplinary effort in formulating models to understand the data. On the theory side, we shall look at how powerful techniques in modern probability theory can be used in this context via the following three problems:
1. Reconstruction of routing trees: In a number of problems that arise from trying to discover the underlying structure of the Internet, it is often impossible to take direct measurements at the routers. We shall describe progress in trying to reconstruct the "Multicast" tree exactly using only "end-to-end" measurements. Surprisingly, using deep facts from Phylogenetics, we show that this can be done using very few samples.
2. MCMC simulation of exponential random graphs: Exponential random graphs are one of the most used models in social network theory. The basic idea is the following: In social networks we see more triangles, cliques, etc. than we would expect in a random graph, basically because if A is a friend of B and A is a friend of C, then it is quite likely that B and C are friends. One way to model such a phenomenon is to attach, for every graph G, a Hamiltonian given by, say,
H(G) = β#E(G) + γ#T(G)
where E(G) and T(G) are the number of edges and triangles respectively and then looking at the Gibbs distribution induced by this Hamiltonian. Simulating from these models is of paramount interest.
Using the modern-day theory of Markov chains, we find, in the ferromagnetic setup, exactly when one can simulate from this model efficiently and when it would take exponentially long to simulate from this model.
3. Spectral distribution of adjacency matrices: How good is the spectral distribution of the adjacency matrix of a network in estimating key features of the network? Given two samples of networks, can we always tell them apart just by looking at their spectral distribution? These sorts of questions lead us into analyzing the asymptotics of the spectral distribution of a number of random tree models.
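The Hamiltonian in the second problem, H(G) = β#E(G) + γ#T(G), can be made concrete with a short sketch. It assumes the Gibbs distribution gives a graph G probability proportional to exp(H(G)) (the abstract does not state the sign convention) and uses a plain single-edge Metropolis update. This is an illustrative sampler under those assumptions, not the speaker's analysis; the function names and parameter values are hypothetical.

```python
import itertools
import math
import random


def count_triangles(n, edges):
    """Count triangles in a graph on nodes 0..n-1; edges are tuples (i, j) with i < j."""
    return sum(
        1
        for a, b, c in itertools.combinations(range(n), 3)
        if (a, b) in edges and (a, c) in edges and (b, c) in edges
    )


def hamiltonian(n, edges, beta, gamma):
    """H(G) = beta * (number of edges) + gamma * (number of triangles)."""
    return beta * len(edges) + gamma * count_triangles(n, edges)


def metropolis_step(n, edges, beta, gamma, rng):
    """Toggle one uniformly chosen pair of nodes; accept with probability min(1, exp(dH))."""
    i, j = sorted(rng.sample(range(n), 2))
    proposal = set(edges)
    proposal ^= {(i, j)}                    # add the edge if absent, remove it if present
    delta = hamiltonian(n, proposal, beta, gamma) - hamiltonian(n, edges, beta, gamma)
    if delta >= 0 or rng.random() < math.exp(delta):
        return proposal
    return edges


rng = random.Random(1)                      # fixed seed, illustration only
graph = set()
for _ in range(500):
    graph = metropolis_step(6, graph, beta=-0.5, gamma=0.8, rng=rng)
print(len(graph), count_triangles(6, graph))
```

Recomputing the triangle count from scratch at every step is wasteful; it keeps the sketch short. How many such steps are needed before the chain is close to the Gibbs distribution is the mixing-time question the abstract raises.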
|
|
Colloquium Speaker - Daniel Crichton Date: September 28, 2009 - IMU Persimmon Room 3 PM - 4 PM Modern science research is requiring collaboration amongst geographically distributed scientists and their data assets. Because of this, a new era of scientific discovery exists in which science data is shared and validated across research institutions. More than ever, computing infrastructures that span these multiple institutions must work together to support collaborative research. At NASA’s Jet Propulsion Laboratory, we have been working in multiple disciplines to support the movement towards highly distributed scientific data systems that span, end-to-end, the data pipeline from instruments all the way to analysis. JPL developed a software technology called the Object Oriented Data Technology (OODT) framework that provides building blocks for construction of distributed data systems, addressing the challenges of the distributed, data-intensive domain. OODT has been successfully infused into planetary science, earth science and cancer research programs. This talk will explore the commonalities of developing data system infrastructures in these disciplines as well as our experiences, results and plans for data systems in the future.
|
|
Colloquium Speaker Alan Karr Date: April 13, 2009 - IMU Walnut Room - 3 PM -4 PM Government agencies and businesses face a multitude of tensions between protecting confidentiality of their data (for legal, quality and other reasons) and allowing legitimate uses of the data (for policy, research or other purposes). The technical problems involve the statistical, mathematical and computational sciences, as well as domain science, all immersed in difficult legal and societal issues.
In this talk, I will outline a decision-theoretic formulation of data confidentiality problems as tradeoffs between quantified measures of data disclosure risk and data utility. Then, I will focus on two problems lying at the intersection of statistics and computer science. The first is methods and systems for secure, principled statistical analysis of distributed data. The second is verification servers, which provide users of publicly released data that have been altered to protect confidentiality with information about the fidelity of their analyses as compared to analyses of the original data.
|
|
Colloquium Speaker David Marchette Date: April 06, 2009 - IMU Walnut Room - 3 PM -4 PM Implicit translation is the association of documents in different languages which are on the same topic, in the absence of a translation dictionary. The multilingual Wikipediae will be used as a test-bed to explore some techniques based on the embeddings of the Wikipedia graphs. I will discuss several standard graph embeddings, and a novel random graph embedding. These are applied to a subset of the English and French Wikipediae, showing that using the graph information alone is sufficient to obtain good results. The incorporation of the language information via word-count histograms or related bag-of-words approaches, in conjunction with the graph information, will be discussed briefly.
|
|
Flury Lecture - Guest Speaker David Scott Date: March 03, 2009 - Swain East 140 4:00 PM - 5:00 PM Modern science relies on ever more complex models to understand data. Presenting the confidence of model predictions is a grand challenge. Faced with potentially hundreds or thousands of parameters, scientists often perform sensitivity analyses in order to assess the robustness of model predictions.
Such one-at-a-time calculations are useful but limited. Visualization techniques can provide a fuller picture, but the availability of immersive technologies is still expensive and not commonplace. We examine some simple data and discuss the presentation of uncertainty. Avenues for research are described.
|
|
Colloquium Speaker Haimeng Zhang Date: February 23, 2009 - IMU Persimmon Room 3 PM - 4 PM The Cox hazard regression model is a popular method used in epidemiological research to quantify the effects of prognostic factors on survival for a cohort of individuals followed over time. In practice, it is often difficult or expensive to collect complete data when dealing with large cohorts. Therefore, a number of practical sampling designs have been put forward. However, it is not always clear that the estimators in those designs use the given sampled data in the most efficient manner. In this presentation, I will discuss the asymptotic efficiency of the estimators from two popular sampling designs - case cohort and nested-case control sampling. In addition, by comparing the theoretical lower bound with the limiting distribution of the estimators, I will indicate in what instances the estimators achieve the lower bound, and what situations make for large efficiency losses.
|
|
Colloquium Speaker Jim Koehler Date: February 09, 2009 - IMU Persimmon Room 3:00 PM - 4:00 PM I'll provide a glimpse into the hidden life of statisticians inside of Google. I'll start by providing a general overview of Google's advertising system and the variety of roles for statisticians. Then I'll describe some specific business problems and how statistical methods contribute to their solutions. Finally, I'll introduce Google's efforts to partner with universities through the Google Online Marketing Challenge (student competition) and the Google University Research Awards.
|
|
Colloquium Guest Speaker George Mohler Date: December 01, 2008 - IMU Maple Room 3:00 - 4:00 PM Self-exciting spatial point process methods are well established in fields such as seismology, where the occurrence of an event increases the likelihood of another event nearby in space and time (in the case of earthquakes, aftershocks often follow a large event). This self-exciting behavior gives rise to particular types of data clustering and, surprisingly, such clustering is also observed in crime data (as it turns out, burglars will often return to the same house, or a house nearby, shortly after a prior offense and commit another burglary).
In this talk we show how self-exciting point processes can be used for the purposes of crime pattern modeling, simulation, and forecasting. We will first discuss the application of standard point process models, which involves background intensity estimation, maximum likelihood estimation of parameters, and model evaluation using tests for clustering. Next we will show how behavioral dynamics, present in recent agent-based models of crime, can be incorporated into the point process framework. For this purpose we use state-dependent stochastic differential equations, which can be viewed as generalizations of kernel-based models. We conclude by discussing several practical applications.
|
|
Colloquium - Speaker Guilherme Rocha Date: November 17, 2008 - IMU Maple Room 3:00 - 4:00 PM A 51-node Wireless Sensor Network has been installed on the Golden Gate Bridge for Structural Health Monitoring. There is a mismatch between the rate at which the data are collected at each node (~4 Kbps, kilo-bytes/second) and the rate at which they can be transmitted (~0.5 Kbps). The ultimate goal is to develop a data reduction scheme so the WSN can perform real time monitoring of the dynamic properties of the bridge. For the temperature data (~0.80 Kbps/sensor), lossless run length coding achieves a significant reduction in the data rate (~0.04 Kbps/ sensor). For the acceleration data, our strategy is to construct a restricted parametric model for the bridge and continuously adjust it as data become available. The restrictions applied to the model reflect both physical considerations and communication constraints. We report the results of such strategies on simulated data sets. Joint work with David Culler, James Demmel, Gregory Fenves, Sukum Kim, Shamim Pakzad, and Bin Yu.
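The temperature figures quoted above (about 0.80 Kbps reduced to about 0.04 Kbps per sensor) amount to a roughly 20-fold reduction. The idea behind lossless run length coding can be sketched as follows: consecutive repeated readings are stored as (value, count) pairs and expanded back exactly. This is a generic illustration with made-up readings and hypothetical function names, not the encoder used on the bridge network.

```python
from itertools import groupby


def rle_encode(readings):
    """Collapse each run of identical consecutive readings into a (value, run_length) pair."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(readings)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]


# Made-up temperature readings: slowly varying signals produce long runs.
readings = [21, 21, 21, 21, 22, 22, 22, 21, 21, 21]
encoded = rle_encode(readings)
print(encoded)
assert rle_decode(encoded) == readings      # lossless: decoding recovers the input exactly
```

The scheme only pays off when runs are long, which is plausible for slowly changing temperature readings and much less so for acceleration data.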
|
|
Colloquium Guest Speaker Chuan Goh Date: November 03, 2008 - IMU Maple Room 12:00 - 1:00 PM This paper proposes a test for the correct specification of a dynamic time-series model that is taken to be stationary about a deterministic linear trend function with no more than a finite number of discontinuities in the vector of trend coefficients. The test avoids the consideration of explicit alternatives to the null of trend stability. The proposal also does not involve the detailed modelling of the data-generating process of the stochastic component, which is simply assumed to satisfy a certain strong invariance principle for weakly dependent processes. As such, the resulting inference procedure is effectively an omnibus specification test for segmented linear trend stationarity. The test is of Wald-type, and is based on an asymptotically linear estimator of the vector of total-variation norms of the trend parameters whose influence function coincides with the efficient influence function.
Simulations illustrate the utility of this procedure to detect discrete breaks or continuous variation in the trend parameter as well as alternatives where the trend coefficients change randomly each period. This paper also includes an application examining the adequacy of a linear trend-stationary specification with infrequent trend breaks for the historical evolution of U.S. real output.
|
|
Colloquium - Speaker Karen Kafadar Date: October 20, 2008 - IMU Maple Room 3:00 - 4:00 PM Microarray technology has made available large data sets that can provide information on gene expression when cells are subjected to various treatments. Before proceeding with a formal statistical analysis, many biological and procedural aspects should be considered. These aspects may guide the analysis and subsequent statistical inference. Several of these issues are discussed in connection with the analysis of oligonucleotide and cDNA microarray experiments. The particular focus in this article is on effects caused by the cDNA slide manufacturing process, appropriate transformations of the data, and on adjustments for background. A prescription for the analysis of microarray data is proposed and demonstrated using data from a cDNA experiment comparing the genetic expressions in two mouse cell lines; a candidate set of genes is identified for further study. The prescription may be modified for oligonucleotide microarray data.
|
|
Colloquium - Speaker Chunfeng Huang Date: October 06, 2008 - IMU Walnut Room - 12 PM -1 PM In the study of isotropic intrinsically stationary spatial processes, a new nonparametric variogram estimator is proposed through its spectral representation. The spectrum estimation is formulated in terms of solving a regularized inverse problem. A numerical implementation is presented through quadratic programming. We demonstrate our method in a simulation study and a dataset of temperature changes over America.
|
|
Colloquium - Speaker Steen Andersson Date: September 15, 2008 - IMU Persimmon Room 12 PM - 1 PM Classical Wishart distributions on the open convex cones of positive definite matrices and their fundamental features are extended to generalized Riesz and Wishart distributions associated with decomposable undirected graphs using the basic theory of exponential families. The families of these distributions are parameterized by their expectations/natural parameter and multivariate shape parameter and have a non-trivial overlap with the generalized Wishart distributions defined in Andersson and Wojnar (2004a,b). This work also extends the Wishart distributions of type I in Letac and Massam (2007) and, more importantly, presents an alternative point of view on the latter paper.
|
|
Fall 2008 Courses Date: May 20, 2008 - A schedule of classes is now available for the Fall 2008 semester.
|
|
Colloquium Guest Speaker Douglas Steinley Date: April 28, 2008 - IMU - Persimmon Room 12pm - 1pm A variance-to-range ratio variable weighting procedure is proposed. The method is theoretically grounded in the inherent variability found in data exhibiting cluster structure. In addition, a variable selection procedure is proposed to operate in conjunction with the weighting technique. The performance of these procedures is compared to existing methods in the literature.
|
|
Colloquium Guest Speaker Richard Charnigo Date: April 24, 2008 - IMU - Walnut Room 4 pm - 5 pm Greater epidemiologic understanding of the relationships among fetal-infant mortality and its prognostic factors, including birthweight, could have vast public health implications. A key step toward that understanding is a realistic and tractable framework for analyzing birthweight distributions and heterogeneity in fetal-infant mortality. We propose describing a birthweight distribution using a normal mixture model in which the number of components is determined from the data, then estimating birthweight-specific mortality curves within each component of the normal mixture. We emphasize both methodological issues (e.g., How should the number of components be determined?) and interpretive issues (e.g., What do the components represent?). Data from the National Center for Health Statistics Public-Use Perinatal Mortality Data Files are used to compare our analytic framework to existing frameworks as well as to assess the reproducibility across repeated sampling of results obtained through our framework. (This talk is based on work with Lorie Wayne Chesnut, Tony LoBianco, and Russell S. Kirby.)
|
|
Colloquium Guest Speaker Michael Carbon Date: April 07, 2008 - IMU - Walnut Room 12 pm - 1 pm The purpose of this talk is to investigate the Frequency Polygon as a density estimator for stationary random fields indexed by multidimensional lattice points in space. Optimal binwidths which asymptotically minimize integrated mean square errors (IMSE) are derived. Under weak conditions, frequency polygons achieve the optimal uniform rate of convergence. Rates of the a.s. convergence are given too. Finally, asymptotic normality of the frequency polygon estimator is established.
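For readers unfamiliar with the estimator, a frequency polygon is obtained by linearly interpolating histogram heights between adjacent bin midpoints. The sketch below covers only the simplest one-dimensional case with a fixed binwidth; the random-field setting and the optimal binwidth choice discussed in the talk are not attempted, and the function names and data are hypothetical.

```python
import math


def frequency_polygon(data, binwidth, origin=0.0):
    """Return a density estimate f(x) built by interpolating histogram heights at bin midpoints."""
    n = len(data)
    counts = {}
    for x in data:
        k = math.floor((x - origin) / binwidth)   # index of the bin containing x
        counts[k] = counts.get(k, 0) + 1

    def height(k):
        # Histogram density height of bin k (zero for empty bins).
        return counts.get(k, 0) / (n * binwidth)

    def estimate(x):
        # k indexes the bin whose midpoint lies at or just to the left of x.
        k = math.floor((x - origin) / binwidth - 0.5)
        left_mid = origin + (k + 0.5) * binwidth
        w = (x - left_mid) / binwidth             # interpolation weight in [0, 1)
        return (1.0 - w) * height(k) + w * height(k + 1)

    return estimate


f = frequency_polygon([0.5, 0.5, 1.5, 1.5], binwidth=1.0)
print(f(0.5), f(1.0), f(2.0))
```

At a bin midpoint the estimate equals the histogram height, and between midpoints it is a weighted average of the two neighbouring heights, so the estimate is continuous where the histogram is not.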
|
|
Colloquium Guest Speaker Zongwu Cai Date: March 20, 2008 - TBA - 4:00 pm - 5:00 pm Motivated by forecasting the inflation rate through nonstationary variables and efficient tests of stock return predictability as well as forecasts of the equity premium, this talk will focus on how to use nonparametric or semiparametric regression techniques to analyze nonstationary time series data. Development of a nonparametric approach to estimate the functionals will be discussed, as well as how the consistency and asymptotic normality of the proposed estimators are obtained. The asymptotic results show that the asymptotic bias is the same for all estimators of functionals, but that the convergence rates are totally different for stationary and nonstationary covariates. These findings seem innovative in the literature.
|
|
Colloquium Guest Speaker Michael Levine Date: March 03, 2008 - IMU Walnut Room 12pm-1pm We consider a new separable nonparametric volatility model that allows for “interactions” in both the mean and the conditional variance (volatility) functions. It can be concisely described as an additive-interactive nonlinear ARCH model. We propose this model as a possible alternative to the generalized additive nonlinear ARCH (GANARCH) model of Kim and Linton (2004), with which it shares a common origin. Unlike the GANARCH model, it does not assume a known link function but instead includes second-order interaction terms in both mean and variance functions. This ensures a much more data-driven model compared to GANARCH of Kim and Linton (2004), since we do not assume that anything is known about the data distribution. This is very beneficial since, in practice, the data distribution has to be selected based on exploratory data analysis, which is very difficult for multivariate data. Thus, the proposed model is much more flexible compared to GANARCH.
Motivated by the local instrumental variable estimation method (LIVE), also introduced in Kim and Linton (2004), we propose instrumental variable-based estimators of the components of the mean and volatility functions. The estimators are shown to be consistent and asymptotically normal. Explicit expressions for asymptotic means and variances of these estimators are also obtained. Several simulation experiments are conducted that show a very good performance of our algorithm for moderate sample sizes. Finally, the method is applied to the real data set of currency exchange rates where it leads to some interesting conclusions.
Historically, multiple functional component testing in nonparametric models has been a fairly difficult problem. We introduce a novel F-type approach to testing the significance of the two-way interactive terms in the mean function based on the unbalanced design ANOVA with unequal variances. Simulation studies show that the method performs very well for sample sizes of about 5000, which are easily available in financial applications.
|
|
Colloquium Guest Speaker Joanne Peng Date: February 25, 2008 - IMU Walnut Room 12pm-1pm For the past 25 years, advances have been made in missing data methods. Most published work has focused on missing data in the dependent variable under various conditions. The present study sought to fill the void by comparing two approaches for handling missing data in categorical covariates in logistic regression. These two approaches were EM method of weights and multiple imputation.
Sample data were first drawn randomly from a population with known characteristics. Missing data on covariates were subsequently simulated under two conditions: missing completely at random and missing at random with different missing rates. A logistic regression model was fit to each sample using either the EM or the MI approach. The performance of these two approaches was assessed on four criteria: bias, efficiency, coverage, and rejection rate.
Results generally favored MI over EM. Practical issues such as implementation, inclusion of continuous covariates, and interactions between covariates were discussed.
|
|
Colloquium Guest Speaker Vivekananda Roy Date: February 14, 2008 - IMU Walnut Room 4 pm - 5 pm We study Markov chain Monte Carlo algorithms for exploring the intractable posterior density that results when a probit regression likelihood is combined with a flat prior on the regression coefficient. We prove that the data augmentation algorithm of Albert and Chib (1993) and the PX-DA algorithm of Liu and Wu (1999) both converge at a geometric rate, which ensures the existence of central limit theorems (CLTs) for ergodic averages under a second moment condition. While these two algorithms are essentially equivalent in terms of computational complexity, we show that the PX-DA algorithm is theoretically more efficient in the sense that the asymptotic variance in the CLT under the PX-DA algorithm is no larger than that under Albert and Chib's algorithm. A simple, consistent estimator of the asymptotic variance in the CLT is constructed using regeneration. As an illustration, we apply our results to the lupus data from van Dyk and Meng (2001). In this particular example, the estimated asymptotic relative efficiency of the PX-DA algorithm with respect to Albert and Chib's algorithm is about 65, which demonstrates that huge gains in efficiency are possible by using the PX-DA algorithm.
|
|
Colloquium Guest Speaker Jien Chen Date: February 04, 2008 - IMU Walnut Room 12pm-1pm As a non-parametric method, Empirical Likelihood (EL) has been attracting serious attention from researchers in statistics, econometrics, engineering and biostatistics. By defining the estimation equations in EL appropriately, we can extend EL to various data settings, especially those in which parametric likelihoods are absent. In this talk, I will provide two examples of such extensions: quantile estimation and longitudinal data analysis. Quantile estimation for discrete data analysis has not been well studied. For a given 0 < p < 1, the commonly used sample quantile may or may not be consistent for the pth quantile, depending on whether or not the underlying distribution has a plateau at the level of p. I propose an EL-based categorization procedure that not only helps determine the shape of the true distribution at level p, but also provides a way of formulating a new estimator that is consistent in any case. For non-Gaussian longitudinal data, generalized estimating equations (GEE) are a popular class of marginal models. While the GEE estimator is consistent and robust, it may suffer significant loss of efficiency if the working correlation structure is misspecified. I consider the use of EL to select working correlations for GEE models, for which parametric likelihoods are absent and quasi-likelihoods are difficult to construct.
|
|
Colloquium Guest Speaker Brian Reich Date: January 31, 2008 - IMU Walnut Room 4 pm - 5 pm Storm surge, the onshore rush of sea water caused by the high winds and low pressure associated with a hurricane, can compound the effects of inland flooding caused by rainfall, leading to loss of property and loss of life for residents of coastal areas. Numerical ocean models are essential for creating storm surge forecasts for coastal areas. These models are driven primarily by the surface wind forcings. Currently, the gridded wind fields used by ocean models are specified by deterministic formulas that are based on the central pressure and location of the storm center. While these equations incorporate important physical knowledge about the structure of hurricane surface wind fields, they cannot always capture the asymmetric and dynamic nature of a hurricane. A new Bayesian multivariate spatial statistical modeling framework is introduced combining data with physical knowledge about the wind fields to improve the estimation of the wind vectors. Many spatial models assume the data follow a Gaussian distribution. However, this may be overly-restrictive for wind fields data which often display erratic behavior, such as sudden changes in time or space. In this paper we develop a semiparametric multivariate spatial model for these data. Our model builds on the stick-breaking prior, which is frequently used in Bayesian modeling to capture uncertainty in the parametric form of an outcome. The stick-breaking prior is extended to the spatial setting by assigning each location a different, unknown distribution, and smoothing the distributions in space with a series of kernel functions. This semiparametric spatial model is shown to improve prediction compared to usual Bayesian Kriging methods for the wind field of Hurricane Ivan.
|
|
Colloquium Guest Speaker Guilherme Rocha Date: January 28, 2008 - IMU Maple Room - 12pm - 1pm Extracting useful information from high-dimensional data is an important focus of today's statistical research and practice. Penalized loss function minimization has been shown to be effective for this task. Quasi-norms on model parameters are frequently used as a penalty. Classical examples are AIC and BIC where the L0 quasi-norm (model dimension) is used as a penalty.
More recently, penalization by the L1-norm (lasso) has enjoyed a lot of attention. L1-penalized estimates are cheaper to compute (convex optimization) and lead to more stable model estimates than their L0 counterparts.
In this talk, I will present the Composite Absolute Penalties (CAP) family of penalties. CAP penalties allow given grouping and hierarchical relationships between the predictors to be expressed. They are built by defining groups of variables and combining the properties of norm penalties at the across group and within group levels. Grouped selection occurs for non-overlapping groups. Hierarchical variable selection is reached by defining groups with particular overlapping patterns.
Under easily verifiable assumptions, CAP penalties are convex: an attractive property from a computational stand-point. Within this subfamily, unbiased estimates of the degrees of freedom (df) exist so the regularization parameter is selected without cross-validation.
Simulation results show that CAP improves on the predictive performance of the LASSO for cases with p>>n and mis-specified groupings.
This is joint work with Peng Zhao and Bin Yu.
|
|
Colloquium Guest Speaker Junhui Wang Date: January 14, 2008 - IMU Maple Room - 12pm - 1pm Hierarchical classification is critical to knowledge and context management as well as knowledge exploration, as in gene function classification and discovery and document categorization. In hierarchical classification, an input is classified by a structured hierarchy. In such a situation, the central issue is how to effectively utilize inter-class relationships to improve on the generalization performance of flat classification, which ignores such dependency. In this talk, a novel large margin method based on constraints characterizing multi-path hierarchy is presented within the framework of regularization. In particular, I will discuss three aspects: (1) the idea and methodology development; (2) computational tools; (3) a statistical learning theory. Numerical examples will be provided to demonstrate the advantage of our proposed methodology against other existing competitors. An application to gene function prediction and discovery will be discussed.
|
|
Colloquium Guest Speaker Guang Cheng Date: January 10, 2008 - IMU Walnut Room 4 pm - 5 pm Semiparametric modeling is an excellent framework due to its flexibility to model some features parametrically without making assumptions on the other features. However, the infinite-dimensional nuisance parameter in the semiparametric models generally poses several challenges for making maximum likelihood inference for the parameter of interest at both theoretical and methodological levels. We will consider a series of profile likelihood based semiparametric inference procedures either based on numerical methods, i.e. K-step MLE, or through MCMC sampling, i.e. the Profile Sampler and the Penalized Profile Sampler. All the above profile likelihood based methods avoid evaluation of the infinite-dimensional operator and are easy to implement. Furthermore, we investigate their second order asymptotic behaviors, which are proven to be related to the convergence rate of the nuisance parameter and thus adjustable.
|
|
Guest Colloquium Speaker William Cleveland Date: November 26, 2007 - IMU Walnut Room 12pm-1pm Large, complex data sets are ubiquitous, the standard now rather than the exception. They present challenging problems of analysis because of their size and the complexity of their data structures and patterns. One approach is to compute summary statistics at the outset to reduce the complexity, but this expedient risks losing important information in the data. The goal should be lossless analysis: analyze the data at a level of detail and comprehensiveness that does not sacrifice information.
Achieving lossless analysis of complex data today is immensely challenging. New fundamental approaches and methods are needed for each of the different areas that come into play in the analysis of the data: databases, data processing, data structures, statistical models and methods, machine learning algorithms, data visualization, computational algorithms, software environments, and hardware environments. In fact, it has never been harder to achieve lossless analysis because complexity has increased faster than our innovations in these areas.
Nothing serves lossless analysis better than data visualization, the only practical way to absorb large amounts of information in detail. But for today's complex sets we must visualize far larger amounts than in the past. We must be ready to accept large displays each covering tens or even hundreds of screensful (pages). For a single data set it is reasonable to have hundreds of such displays. These displays become a new database produced from the data that is queried and studied. For a display of 500 pages, we might query and study all or just a few of the pages depending on the task.
Producing, querying, and studying a visualization database needs new ideas. There are different modes of viewing the many pages and panels per page of a large display, from slow focused study to very rapid scans. We need creative interfaces to facilitate the different modes. We cannot fuss with very large displays, interacting with the micro-elements to get them right, because there is too much; instead there should be smart automation algorithms that get the large display right the first time. We must consider the physical screen space, its size and resolution, to make it work most effectively for the visual study. We need methods of display that result in pre-attentive visual formation of gestalts that show instantaneously the relevant patterns in the data. This necessitates, strangely, more displays, starting with broad brush looks to derivative displays whose redesigns show specific aspects of the broad brush more effectively. It also requires the study of visual perception.
|
|
Colloquium Guest Speaker Tonu Kollo Date: November 12, 2007 - IMU Walnut Room 12pm-1pm In the last ten years, remarkable development has occurred in the area of skewed multivariate distributions. The skew normal distribution was introduced in 1996 by Azzalini & Dalla Valle (Biometrika). Azzalini’s construction of the distribution was very fruitful and was later successfully applied to many other elliptical families of distributions. A random vector X is skew normally distributed with parameters α (a p-vector, the shape parameter) and Σ (a positive definite p×p matrix, the scale parameter) when the density function of X is the product of the density function of N(0,Σ) and the distribution function of the standard normal distribution, with the shape parameter appearing in the argument of the distribution function.
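The density described in words above can be written out as a short sketch. The notation is assumed here rather than taken from the abstract: φ_p(x; Σ) stands for the N(0,Σ) density and Φ for the standard normal distribution function. The factor 2 is the normalizing constant, which the verbal description leaves implicit, and the argument αᵀx is one common form; some parameterizations rescale x before applying α.

```latex
% Skew normal density (sketch; notation assumed, not from the abstract)
f_X(x) \;=\; 2\,\phi_p(x;\Sigma)\,\Phi\!\left(\alpha^{\top}x\right),
\qquad x \in \mathbb{R}^p,
% where
\phi_p(x;\Sigma) \;=\; (2\pi)^{-p/2}\,\lvert\Sigma\rvert^{-1/2}
\exp\!\left(-\tfrac{1}{2}\,x^{\top}\Sigma^{-1}x\right),
\qquad
\Phi(t) \;=\; \int_{-\infty}^{t}\frac{1}{\sqrt{2\pi}}\,e^{-u^{2}/2}\,du .
```

Setting α = 0 gives Φ(0) = 1/2, so the density reduces to the ordinary N(0,Σ) density; nonzero α introduces the skewness.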
In the talk, basic properties of the skew normal family will be discussed and other commonly used families of skewed elliptical distributions will be examined (the multivariate skew t-distribution, for instance). With new families of distributions, new estimation and testing problems have arisen. Classical estimation methods often do not work: the maximum likelihood method can give wrong estimates, and the method of moments can produce heavily biased estimates.
Another type of skewed multivariate distribution is represented by the asymmetric Laplace distribution, which was carefully examined by T. Kozubowski in a series of papers at the end of the 1990s. In this case, we do not have an explicit expression for the density function, so estimation, testing and fitting problems have to be solved on the basis of the characteristic function. This distribution will also be considered in more detail.
|
|
Colloquium Guest Speaker Victor Goodman Date: November 05, 2007 - IMU Walnut Room 12pm-1pm Forward interest rates are simultaneously measured using up-to-the-minute bond trading quotes. The time variation of these rates determines a high-dimensional covariance matrix that might be used to model bond yields within a country's government bond market. Several principal component (PC) analyses, with 1989-92 data in the U.K. market, 1887-94 data in the U.S. market, and 2001-05 data in the U.S. market, reveal a striking pattern involving the first three eigenvectors of each covariance matrix. In this talk I describe the pattern and make the (well-known) case for using a three-factor Gaussian model to describe bond trading.
It is difficult to implement models based on PC estimates for the first eigenvector. An initial attempt to produce a model in 1988 ended in failure since the model had a financial inconsistency. A more recent model behaves better; its covariance has the desired eigenvector and the model is arbitrage-free. Surprisingly, the new model appears when we condition prices not to collapse in the old model. This suggests that an issue of survivorship may arise even in no-default bond markets.
|
|
Colloquium Guest Speaker Jerome Busemeyer Date: October 29, 2007 - IMU Walnut Room 12pm-1pm Social and behavioral scientists face some of the same measurement problems that forced physicists to abandon classical probability theory. Their measurements are often incompatible, and the first measurement may disturb a second measurement. Thus only partial information about a complex system can be obtained at any point in time. Combining partial information about a system into a coherent understanding of the entire system is the hallmark of quantum theory. Quantum theory provides a fundamentally different approach to logic, reasoning, and probabilistic inference. For example, quantum logic does not always follow the distributive axiom of Boolean logic; quantum probabilities do not always obey the Kolmogorov law of total probability; quantum reasoning does not always obey the principle of monotonic reasoning.
For this talk, I will present a tutorial on the basic assumptions of classical versus quantum probability theories. These basic assumptions will be examined, side by side, in a parallel and elementary manner. Classical theory will emerge as a possibly overly restrictive case of the more general quantum theory. The fundamental implications of these contrasting assumptions for measurement in the social and behavioral sciences will be examined.
|
|
Colloquium Guest Speaker Juan Carlos Escanciano Date: October 22, 2007 - IMU Redbud Room SPECIAL TIME: 11:30AM - 12:30PM A general method for testing the martingale difference hypothesis is proposed. The new tests are data-driven smooth tests based on the principal components of certain marked empirical processes that are asymptotically distribution-free, with critical values that are already tabulated. The smooth tests are shown to be optimal in a semiparametric sense discussed in the paper, and they are robust to conditional heteroscedasticity of unknown form. A simulation study shows that the data-driven smooth tests perform very well for a wide range of realistic alternatives and have more power than omnibus and other competing tests. Finally, two empirical examples highlight the merits of our approach.
|
|
Spring 2008 Courses Date: October 05, 2007 - Click here to view list of Spring 2008 Courses.
|
|
|