Statistical Learning with Sparsity

The Lasso and Generalizations

Author: Trevor Hastie, Robert Tibshirani, Martin Wainwright

Publisher: CRC Press

ISBN: 1498712177

Category: Business & Economics

Page: 367

Discover New Methods for Dealing with High-Dimensional Data. A sparse statistical model has only a small number of nonzero parameters or weights; therefore, it is much easier to estimate and interpret than a dense model. Statistical Learning with Sparsity: The Lasso and Generalizations presents methods that exploit sparsity to help recover the underlying signal in a set of data. Top experts in this rapidly evolving field, the authors describe the lasso for linear regression and a simple coordinate descent algorithm for its computation. They discuss the application of l1 penalties to generalized linear models and support vector machines, cover generalized penalties such as the elastic net and group lasso, and review numerical methods for optimization. They also present statistical inference methods for fitted (lasso) models, including the bootstrap, Bayesian methods, and recently developed approaches. In addition, the book examines matrix decomposition, sparse multivariate analysis, graphical models, and compressed sensing. It concludes with a survey of theoretical results for the lasso. In this age of big data, the number of features measured on a person or object can be large and might be larger than the number of observations. This book shows how the sparsity assumption allows us to tackle these problems and extract useful and reproducible patterns from big datasets. Data analysts, computer scientists, and theorists will appreciate this thorough and up-to-date treatment of sparse statistical modeling.
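
As a rough, self-contained illustration of the "simple coordinate descent algorithm" the blurb mentions (a minimal sketch under assumed conventions, not code from the book; the objective scaling, fixed iteration count, and synthetic data are assumptions):

```python
import numpy as np

def soft_threshold(z, gamma):
    """Soft-thresholding operator, the key step of lasso coordinate descent."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for (1/(2n))||y - X b||^2 + lam * ||b||_1.

    Assumes the columns of X are standardized to mean 0 and variance 1.
    """
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual with the j-th predictor removed from the fit.
            r_j = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r_j / n
            beta[j] = soft_threshold(rho, lam)  # unit-variance columns: no rescaling
    return beta

# Toy example with two truly nonzero coefficients (hypothetical data).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
X = (X - X.mean(axis=0)) / X.std(axis=0)
beta_true = np.array([3.0, -2.0] + [0.0] * 8)
y = X @ beta_true + rng.standard_normal(200)
print(lasso_coordinate_descent(X, y, lam=0.1))
```

In practice one would also check convergence and compute solutions over a decreasing grid of lam values to trace out a regularization path, but the update above is the core idea.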

The Elements of Statistical Learning

Data Mining, Inference, and Prediction

Author: Trevor Hastie, Robert Tibshirani, Jerome Friedman

Publisher: Springer Science & Business Media

ISBN: 0387216065

Category: Mathematics

Page: 536

During the past decade there has been an explosion in computation and information technology. With it have come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It should be a valuable resource for statisticians and anyone interested in data mining in science or industry. The book’s coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting---the first comprehensive treatment of this topic in any book. This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression & path algorithms for the lasso, non-negative matrix factorization, and spectral clustering. There is also a chapter on methods for “wide” data (p bigger than n), including multiple testing and false discovery rates. Trevor Hastie, Robert Tibshirani, and Jerome Friedman are professors of statistics at Stanford University. They are prominent researchers in this area: Hastie and Tibshirani developed generalized additive models and wrote a popular book of that title. Hastie co-developed much of the statistical modeling software and environment in R/S-PLUS and invented principal curves and surfaces. Tibshirani proposed the lasso and is co-author of the very successful An Introduction to the Bootstrap. Friedman is the co-inventor of many data-mining tools including CART, MARS, projection pursuit and gradient boosting.

An Introduction to Statistical Learning

with Applications in R

Author: Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani

Publisher: Springer Science & Business Media

ISBN: 1461471389

Category: Mathematics

Page: 426

An Introduction to Statistical Learning provides an accessible overview of the field of statistical learning, an essential toolset for making sense of the vast and complex data sets that have emerged in fields ranging from biology to finance to marketing to astrophysics in the past twenty years. This book presents some of the most important modeling and prediction techniques, along with relevant applications. Topics include linear regression, classification, resampling methods, shrinkage approaches, tree-based methods, support vector machines, clustering, and more. Color graphics and real-world examples are used to illustrate the methods presented. Since the goal of this textbook is to facilitate the use of these statistical learning techniques by practitioners in science, industry, and other fields, each chapter contains a tutorial on implementing the analyses and methods presented in R, an extremely popular open source statistical software platform. Two of the authors co-wrote The Elements of Statistical Learning (Hastie, Tibshirani and Friedman, 2nd edition 2009), a popular reference book for statistics and machine learning researchers. An Introduction to Statistical Learning covers many of the same topics, but at a level accessible to a much broader audience. This book is targeted at statisticians and non-statisticians alike who wish to use cutting-edge statistical learning techniques to analyze their data. The text assumes only a previous course in linear regression and no knowledge of matrix algebra.

Introduction to High-Dimensional Statistics

Author: Christophe Giraud

Publisher: CRC Press

ISBN: 1482237954

Category: Business & Economics

Page: 270

Ever-greater computing technologies have given rise to an exponentially growing volume of data. Today massive data sets (with potentially thousands of variables) play an important role in almost every branch of modern human activity, including networks, finance, and genetics. However, analyzing such data has presented a challenge for statisticians and data analysts and has required the development of new statistical methods capable of separating the signal from the noise. Introduction to High-Dimensional Statistics is a concise guide to state-of-the-art models, techniques, and approaches for handling high-dimensional data. The book is intended to expose the reader to the key concepts and ideas in the most simple settings possible while avoiding unnecessary technicalities. Offering a succinct presentation of the mathematical foundations of high-dimensional statistics, this highly accessible text describes the challenges related to the analysis of high-dimensional data; covers cutting-edge statistical methods including model selection, sparsity and the lasso, aggregation, and learning theory; provides detailed exercises at the end of every chapter, with collaborative solutions on a wikisite; and illustrates concepts with simple but clear practical examples. Introduction to High-Dimensional Statistics is suitable for graduate students and researchers interested in discovering modern statistics for massive data. It can be used as a graduate text or for self-study.

Computer Age Statistical Inference

Algorithms, Evidence, and Data Science

Author: Bradley Efron, Trevor Hastie

Publisher: Cambridge University Press

ISBN: 1108107958

Category: Mathematics

Page: N.A

The twenty-first century has seen a breathtaking expansion of statistical methodology, both in scope and in influence. 'Big data', 'data science', and 'machine learning' have become familiar terms in the news, as statistical methods are brought to bear upon the enormous data sets of modern science and commerce. How did we get here? And where are we going? This book takes us on an exhilarating journey through the revolution in data analysis following the introduction of electronic computation in the 1950s. Beginning with classical inferential theories - Bayesian, frequentist, Fisherian - individual chapters take up a series of influential topics: survival analysis, logistic regression, empirical Bayes, the jackknife and bootstrap, random forests, neural networks, Markov chain Monte Carlo, inference after model selection, and dozens more. The distinctly modern approach integrates methodology and algorithms with statistical inference. The book ends with speculation on the future direction of statistics and data science.

Statistics for High-Dimensional Data

Methods, Theory and Applications

Author: Peter Bühlmann, Sara van de Geer

Publisher: Springer Science & Business Media

ISBN: 364220192X

Category: Mathematics

Page: 558

Modern statistics deals with large and complex data sets, and consequently with models containing a large number of parameters. This book presents a detailed account of recently developed approaches, including the Lasso and versions of it for various models, boosting methods, undirected graphical modeling, and procedures controlling false positive selections. A special characteristic of the book is that it contains comprehensive mathematical theory on high-dimensional statistics combined with methodology, algorithms and illustrations with real data examples. This in-depth approach highlights the methods’ great potential and practical applicability in a variety of settings. As such, it is a valuable resource for researchers, graduate students and experts in statistics, applied mathematics and computer science.

Generalized Additive Models

Author: T.J. Hastie

Publisher: Routledge

ISBN: 1351445960

Category: Mathematics

Page: 352

This book describes an array of power tools for data analysis that are based on nonparametric regression and smoothing techniques. These methods relax the linear assumption of many standard models and allow analysts to uncover structure in the data that might otherwise have been missed. While McCullagh and Nelder's Generalized Linear Models shows how to extend the usual linear methodology to cover analysis of a range of data types, Generalized Additive Models enhances this methodology even further by incorporating the flexibility of nonparametric regression. Clear prose, exercises in each chapter, and case studies enhance this popular text.
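
As a concrete (and deliberately crude) picture of how an additive fit can be computed, the following backfitting sketch uses a simple running-mean smoother; it is an illustration under assumptions, not the smoothers or software developed in the book, and the function names, window size, and toy data are made up:

```python
import numpy as np

def running_mean_smoother(x, y, window=15):
    """Crude nearest-neighbour running-mean smoother evaluated at the data points."""
    order = np.argsort(x)
    fitted = np.empty_like(y)
    for rank, i in enumerate(order):
        lo = max(0, rank - window // 2)
        hi = min(len(x), rank + window // 2 + 1)
        fitted[i] = y[order[lo:hi]].mean()
    return fitted

def backfit_additive(X, y, n_iter=20, window=15):
    """Backfitting: cycle over predictors, smoothing partial residuals each time."""
    n, p = X.shape
    alpha = y.mean()
    f = np.zeros((n, p))
    for _ in range(n_iter):
        for j in range(p):
            partial = y - alpha - f.sum(axis=1) + f[:, j]
            f[:, j] = running_mean_smoother(X[:, j], partial, window)
            f[:, j] -= f[:, j].mean()  # centre each component for identifiability
    return alpha, f

# Toy example: additive signal in two of three predictors (hypothetical data).
rng = np.random.default_rng(3)
X = rng.uniform(-2, 2, size=(300, 3))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.3 * rng.standard_normal(300)
alpha, f = backfit_additive(X, y)
print("fitted intercept:", round(alpha, 2))
```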

An Introduction to the Bootstrap

Author: Bradley Efron, R.J. Tibshirani

Publisher: CRC Press

ISBN: 9780412042317

Category: Mathematics

Page: 456

Statistics is a subject of many uses and surprisingly few effective practitioners. The traditional road to statistical knowledge is blocked, for most, by a formidable wall of mathematics. The approach in An Introduction to the Bootstrap avoids that wall. It arms scientists and engineers, as well as statisticians, with the computational techniques they need to analyze and understand complicated data sets.
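
As a small illustration of the resampling idea the book develops (a hedged sketch, not the authors' code; the toy data, statistic, and replication count are assumptions):

```python
import numpy as np

def bootstrap_se(data, statistic, n_boot=2000, seed=0):
    """Estimate the standard error of a statistic by resampling with replacement."""
    rng = np.random.default_rng(seed)
    n = len(data)
    replicates = np.empty(n_boot)
    for b in range(n_boot):
        resample = rng.choice(data, size=n, replace=True)
        replicates[b] = statistic(resample)
    return replicates.std(ddof=1), replicates

# Example: bootstrap standard error and a simple percentile interval for the median.
data = np.array([3.2, 4.1, 2.8, 5.0, 3.9, 4.4, 2.5, 3.7, 4.8, 3.3])
se, reps = bootstrap_se(data, np.median)
print("bootstrap SE of the median:", round(se, 3))
print("95% percentile interval:", np.percentile(reps, [2.5, 97.5]))
```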

Sparse and Redundant Representations

From Theory to Applications in Signal and Image Processing

Author: Michael Elad

Publisher: Springer Science & Business Media

ISBN: 1441970118

Category: Mathematics

Page: 376

A long long time ago, echoing philosophical and aesthetic principles that existed since antiquity, William of Ockham enounced the principle of parsimony, better known today as Ockham’s razor: “Entities should not be multiplied without necessity.” This principle enabled scientists to select the “best” physical laws and theories to explain the workings of the Universe and continued to guide scientific research, leading to beautiful results like the minimal description length approach to statistical inference and the related Kolmogorov complexity approach to pattern recognition. However, notions of complexity and description length are subjective concepts and depend on the language “spoken” when presenting ideas and results. The field of sparse representations, that recently underwent a Big Bang-like expansion, explicitly deals with the Yin-Yang interplay between the parsimony of descriptions and the “language” or “dictionary” used in them, and it became an extremely exciting area of investigation. It already yielded a rich crop of mathematically pleasing, deep and beautiful results that quickly translated into a wealth of practical engineering applications. You are holding in your hands the first guide book to Sparseland, and I am sure you’ll find in it both familiar and new landscapes to see and admire, as well as excellent pointers that will help you find further valuable treasures. Enjoy the journey to Sparseland! (Alfred M. Bruckstein, Haifa, Israel, December 2009.) This book was originally written to serve as the material for an advanced one-semester (fourteen 2-hour lectures) graduate course for engineering students at the Technion, Israel.

Inferential Models

Reasoning with Uncertainty

Author: Ryan Martin, Chuanhai Liu

Publisher: CRC Press

ISBN: 1439886512

Category: Mathematics

Page: 256

A New Approach to Sound Statistical Reasoning. Inferential Models: Reasoning with Uncertainty introduces the authors’ recently developed approach to inference: the inferential model (IM) framework. This logical framework for exact probabilistic inference does not require the user to input prior information. The authors show how an IM produces meaningful prior-free probabilistic inference at a high level. The book covers the foundational motivations for this new IM approach, the basic theory behind its calibration properties, a number of important applications, and new directions for research. It discusses alternative, meaningful probabilistic interpretations of some common inferential summaries, such as p-values. It also constructs posterior probabilistic inferential summaries without a prior or Bayes’ formula and offers insight on the interesting and challenging problems of conditional and marginal inference. This book delves into statistical inference at a foundational level, addressing what the goals of statistical inference should be. It explores a new way of thinking compared to existing schools of thought on statistical inference and encourages you to think carefully about the correct approach to scientific inference.

Subset Selection in Regression

Author: Alan Miller

Publisher: CRC Press

ISBN: 1420035932

Category: Mathematics

Page: 256

Originally published in 1990, the first edition of Subset Selection in Regression filled a significant gap in the literature, and its critical and popular success has continued for more than a decade. Thoroughly revised to reflect progress in theory, methods, and computing power, the second edition promises to continue that tradition. The author has thoroughly updated each chapter, incorporated new material on recent developments, and included more examples and references. New in the Second Edition: a separate chapter on Bayesian methods; complete revision of the chapter on estimation; a major example from the field of near infrared spectroscopy; more emphasis on cross-validation; greater focus on bootstrapping; stochastic algorithms for finding good subsets from large numbers of predictors when an exhaustive search is not feasible; software available on the Internet for implementing many of the algorithms presented; and more examples. Subset Selection in Regression, Second Edition remains dedicated to the techniques for fitting and choosing models that are linear in their parameters and to understanding and correcting the bias introduced by selecting a model that fits only slightly better than others. The presentation is clear, concise, and belongs on the shelf of anyone researching, using, or teaching subset selection techniques.
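
To make the flavor of these techniques concrete, here is a plain greedy forward-selection sketch (an illustrative stand-in rather than any of Miller's algorithms; the synthetic data and subset size are assumptions): at each step it adds the predictor that most reduces the residual sum of squares.

```python
import numpy as np

def forward_selection(X, y, k):
    """Greedy forward subset selection by residual sum of squares (RSS)."""
    n, p = X.shape
    selected = []
    for _ in range(k):
        best_j, best_rss = None, np.inf
        for j in range(p):
            if j in selected:
                continue
            cols = selected + [j]
            beta, rss, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            rss = rss[0] if rss.size else np.sum((y - X[:, cols] @ beta) ** 2)
            if rss < best_rss:
                best_j, best_rss = j, rss
        selected.append(best_j)
    return selected

# Toy example with two truly active predictors (hypothetical data).
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 8))
y = 2.0 * X[:, 3] - 1.5 * X[:, 6] + rng.standard_normal(100)
print(forward_selection(X, y, k=2))  # expected to pick columns 3 and 6
```

Greedy search is not guaranteed to find the best subset, and the selected model's apparent fit is optimistically biased, which is exactly the kind of issue the book examines.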

Statistical Learning and Data Science

Author: Mireille Gettler Summa, Leon Bottou, Bernard Goldfarb, Fionn Murtagh, Catherine Pardoux, Myriam Touati

Publisher: CRC Press

ISBN: 143986764X

Category: Business & Economics

Page: 243

Data analysis is changing fast. Driven by a vast range of application domains and affordable tools, machine learning has become mainstream. Unsupervised data analysis, including cluster analysis, factor analysis, and continually updated low-dimensionality mapping methods, has reached new heights of achievement in the incredibly rich data world that we inhabit. Statistical Learning and Data Science is a work of reference in the rapidly evolving context of converging methodologies. It gathers contributions from some of the foundational thinkers in the different fields of data analysis, covering the major theoretical results in the domain. On the methodological front, the volume includes conformal prediction and frameworks for assessing confidence in outputs, together with attendant risk. It illustrates a wide range of applications, including semantics, credit risk, energy production, genomics, and ecology. The book also addresses issues of origin and evolution in the unsupervised data analysis arena, and presents some approaches for time series, symbolic data, and functional data. Over the history of multidimensional data analysis, more and more complex data have become available for processing. Supervised machine learning, semi-supervised analysis approaches, and unsupervised data analysis provide great capability for addressing the digital data deluge. Exploring the foundations and recent breakthroughs in the field, Statistical Learning and Data Science demonstrates how data analysis can improve personal and collective health and the well-being of our social, business, and physical environments.

Algebraic Geometry and Statistical Learning Theory

Author: Sumio Watanabe

Publisher: Cambridge University Press

ISBN: 0521864674

Category: Computers

Page: 286

Sure to be influential, Watanabe's book lays the foundations for the use of algebraic geometry in statistical learning theory. Many models/machines are singular: mixture models, neural networks, HMMs, Bayesian networks, and stochastic context-free grammars are major examples. The theory achieved here underpins accurate estimation techniques in the presence of singularities.

Support Vector Machines

Optimization Based Theory, Algorithms, and Extensions

Author: Naiyang Deng, Yingjie Tian, Chunhua Zhang

Publisher: CRC Press

ISBN: 1439857938

Category: Business & Economics

Page: 363

Support Vector Machines: Optimization Based Theory, Algorithms, and Extensions presents an accessible treatment of the two main components of support vector machines (SVMs)—classification problems and regression problems. The book emphasizes the close connection between optimization theory and SVMs since optimization is one of the pillars on which SVMs are built. The authors share insight on many of their research achievements. They give a precise interpretation of statistical learning theory for C-support vector classification. They also discuss regularized twin SVMs for binary classification problems, SVMs for solving multi-classification problems based on ordinal regression, SVMs for semi-supervised problems, and SVMs for problems with perturbations. To improve readability, concepts, methods, and results are introduced graphically and with clear explanations. For important concepts and algorithms, such as the Crammer-Singer SVM for multi-class classification problems, the text provides geometric interpretations that are not depicted in current literature. Enabling a sound understanding of SVMs, this book gives beginners as well as more experienced researchers and engineers the tools to solve real-world problems using SVMs.

Nonparametric and Semiparametric Models

Author: Wolfgang Härdle, Marlene Müller, Stefan Sperlich, Axel Werwatz

Publisher: Springer Science & Business Media

ISBN: 364217146X

Category: Mathematics

Page: 300

The statistical and mathematical principles of smoothing, with a focus on applicable techniques, are presented in this book. It naturally splits into two parts: the first part is intended for undergraduate students majoring in mathematics, statistics, econometrics or biometrics, whereas the second part is intended for master's and PhD students and researchers. The material is easy to work through, since the e-book character of the text allows maximum flexibility in the intensity of learning (and teaching).

Gaussian Markov Random Fields

Theory and Applications

Author: Havard Rue, Leonhard Held

Publisher: CRC Press

ISBN: 9780203492024

Category: Mathematics

Page: 280

Gaussian Markov Random Field (GMRF) models are most widely used in spatial statistics - a very active area of research in which few up-to-date reference works are available. This is the first book on the subject that provides a unified framework of GMRFs with particular emphasis on the computational aspects. This book includes extensive case studies and, online, a C library for fast and exact simulation. With chapters contributed by leading researchers in the field, this volume is essential reading for statisticians working in spatial theory and its applications, as well as quantitative researchers in a wide range of science fields where spatial data analysis is important.
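
The "fast and exact simulation" mentioned above rests on factorizing the precision matrix. Below is a minimal dense-matrix sketch of that idea (an illustration under assumptions, not the book's C library; the example precision matrix is made up), using the fact that if Q = L L^T then solving L^T x = z with standard normal z gives x ~ N(0, Q^{-1}):

```python
import numpy as np

def sample_gmrf(Q, rng):
    """Draw one exact sample x ~ N(0, Q^{-1}) from a GMRF with precision matrix Q."""
    L = np.linalg.cholesky(Q)           # Q = L @ L.T with L lower triangular
    z = rng.standard_normal(Q.shape[0])
    return np.linalg.solve(L.T, z)      # Cov(x) = L^{-T} L^{-1} = Q^{-1}

# Example: a first-order random-walk style precision on a chain graph, plus a
# small ridge term so the matrix is strictly positive definite.
n = 100
Q = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1) + 0.1 * np.eye(n)
x = sample_gmrf(Q, np.random.default_rng(4))
print(x[:5])
```

For large sparse precision matrices the same computation would use a sparse Cholesky factorization, which is where the speed of GMRF methods comes from.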

High-Dimensional Covariance Estimation

With High-Dimensional Data

Author: Mohsen Pourahmadi

Publisher: John Wiley & Sons

ISBN: 1118573668

Category: Mathematics

Page: 208

Methods for estimating sparse and large covariance matrices. Covariance and correlation matrices play fundamental roles in every aspect of the analysis of multivariate data collected from a variety of fields including business and economics, health care, engineering, and environmental and physical sciences. High-Dimensional Covariance Estimation provides accessible and comprehensive coverage of the classical and modern approaches for estimating covariance matrices as well as their applications to the rapidly developing areas lying at the intersection of statistics and machine learning. Recently, the classical sample covariance methodologies have been modified and improved upon to meet the needs of statisticians and researchers dealing with large correlated datasets. High-Dimensional Covariance Estimation focuses on the methodologies based on shrinkage, thresholding, and penalized likelihood with applications to Gaussian graphical models, prediction, and mean-variance portfolio management. The book relies heavily on regression-based ideas and interpretations to connect and unify many existing methods and algorithms for the task. High-Dimensional Covariance Estimation features chapters on Data, Sparsity, and Regularization; Regularizing the Eigenstructure; Banding, Tapering, and Thresholding Covariance Matrices; Sparse Gaussian Graphical Models; and Multivariate Regression. The book is an ideal resource for researchers in statistics, mathematics, business and economics, computer sciences, and engineering, as well as a useful text or supplement for graduate-level courses in multivariate analysis, covariance estimation, statistical learning, and high-dimensional data analysis.
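
As a rough illustration of the thresholding family of estimators covered in the book (a hedged sketch, not the book's code; the threshold level and toy data are assumptions), soft-thresholding the off-diagonal entries of the sample covariance matrix yields a sparse estimate:

```python
import numpy as np

def soft_threshold_covariance(X, lam):
    """Sparse covariance estimate: soft-threshold off-diagonal sample covariances."""
    S = np.cov(X, rowvar=False)
    T = np.sign(S) * np.maximum(np.abs(S) - lam, 0.0)
    np.fill_diagonal(T, np.diag(S))  # leave the variances on the diagonal untouched
    return T

# Example: many weakly correlated noise variables plus one strongly correlated pair.
rng = np.random.default_rng(2)
X = rng.standard_normal((50, 20))
X[:, 1] = X[:, 0] + 0.1 * rng.standard_normal(50)
S_hat = soft_threshold_covariance(X, lam=0.3)
print("nonzero off-diagonal entries:", int((np.abs(S_hat) > 0).sum() - len(S_hat)))
```

The threshold lam trades off how many spurious small covariances are set to zero against how much true signal is shrunk; in practice it would be chosen by cross-validation.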

Machine Learning and Medical Imaging

Author: Guorong Wu, Dinggang Shen, Mert Sabuncu

Publisher: Academic Press

ISBN: 0128041145

Category: Technology & Engineering

Page: 512

Machine Learning and Medical Imaging presents state-of-the-art machine learning methods in medical image analysis. It first summarizes cutting-edge machine learning algorithms in medical imaging, including not only classical probabilistic modeling and learning methods, but also recent breakthroughs in deep learning, sparse representation/coding, and big data hashing. In the second part, leading research groups around the world present a wide spectrum of machine learning methods with application to different medical imaging modalities, clinical domains, and organs. The biomedical imaging modalities include ultrasound, magnetic resonance imaging (MRI), computed tomography (CT), histology, and microscopy images. The targeted organs span the lung, liver, brain, and prostate, while there is also a treatment of examining genetic associations. Machine Learning and Medical Imaging is an ideal reference for medical imaging researchers, industry scientists and engineers, advanced undergraduate and graduate students, and clinicians. It demonstrates the application of cutting-edge machine learning techniques to medical imaging problems; covers an array of medical imaging applications including computer-assisted diagnosis, image-guided radiation therapy, landmark detection, imaging genomics, and brain connectomics; features self-contained chapters with a thorough literature review; and assesses the development of future machine learning techniques and the further application of existing techniques.

Neural Networks and Statistical Learning

Author: Ke-Lin Du, M. N. S. Swamy

Publisher: Springer Science & Business Media

ISBN: 1447155718

Category: Computers

Page: 824

Providing a broad but in-depth introduction to neural networks and machine learning in a statistical framework, this book offers a single, comprehensive resource for study and further research. All the major popular neural network models and statistical learning approaches are covered with examples and exercises in every chapter to develop a practical working understanding of the content. Each of the twenty-five chapters includes state-of-the-art descriptions and important research results on the respective topics. The broad coverage includes the multilayer perceptron, the Hopfield network, associative memory models, clustering models and algorithms, the radial basis function network, recurrent neural networks, principal component analysis, nonnegative matrix factorization, independent component analysis, discriminant analysis, support vector machines, kernel methods, reinforcement learning, probabilistic and Bayesian networks, data fusion and ensemble learning, fuzzy sets and logic, neurofuzzy models, hardware implementations, and some machine learning topics. Applications to biometrics/bioinformatics and data mining are also included. Focusing on the prominent accomplishments and their practical aspects, this book provides academic and technical staff, graduate students, and researchers with a solid foundation and encompassing reference for the fields of neural networks, pattern recognition, signal processing, machine learning, computational intelligence, and data mining.