23º SINAPE - Simpósio Nacional de Probabilidade e Estatística

Paper Details


Title

SPARSE BAYESIAN MODEL OF BINARY RESPONSE WITH ASYMMETRIC LINK FUNCTION FOR TEXT CATEGORIZATION

Abstract

A common problem with datasets that have many covariates and a small sample size is
estimating the parameter associated with each covariate satisfactorily. When the number of
covariates greatly exceeds the sample size, parameter estimation becomes very difficult. In
several application areas, such as text categorization, it is necessary to select the
important covariates and to avoid overfitting the model.
In this work, we develop a sparse Bayesian binary regression model with an asymmetric
link function for text categorization. We assign a sparsity-inducing prior distribution
(double exponential) to the regression parameters to favor sparsity and to reduce the number
of covariates in the model. The performance of the proposed model is demonstrated on a real
dataset, the Reuters R8 corpus, which contains the eight most frequent classes from the
Reuters-21578 collection of newswire articles. The eight classes range in size from 51 to
3923 documents, for a total of 7674 texts.
Parameters are estimated by Hamiltonian Monte Carlo with the No-U-Turn Sampler (NUTS)
extension, using the Stan software from R.
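
To make the model structure concrete, the following is a minimal sketch of an unnormalized
log posterior that combines a Bernoulli likelihood with independent double exponential
(Laplace) priors. The abstract does not state which asymmetric link is used, so the
complementary log-log link here is only an assumed stand-in, and the names cloglog_inverse,
log_posterior, and scale are hypothetical. It is not the authors' Stan implementation.

```python
import math


def cloglog_inverse(eta):
    """Inverse complementary log-log link, p = 1 - exp(-exp(eta)).

    Assumed example of an asymmetric link: unlike the logit or probit,
    it does not satisfy p(eta) + p(-eta) = 1.
    """
    return 1.0 - math.exp(-math.exp(eta))


def log_posterior(beta, X, y, scale=1.0):
    """Unnormalized log posterior for binary regression.

    beta  : list of regression coefficients
    X     : list of covariate rows (each the same length as beta)
    y     : list of 0/1 responses
    scale : Laplace prior scale (hypothetical default); smaller values
            shrink coefficients more strongly toward zero
    """
    eps = 1e-12  # keeps log() finite when p is numerically 0 or 1
    log_lik = 0.0
    for row, label in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, row))
        p = min(max(cloglog_inverse(eta), eps), 1.0 - eps)
        log_lik += label * math.log(p) + (1 - label) * math.log(1.0 - p)
    # Independent Laplace(0, scale) prior on each coefficient.
    log_prior = sum(-math.log(2.0 * scale) - abs(b) / scale for b in beta)
    return log_lik + log_prior
```

A sampler such as NUTS would explore this function over beta. The absolute-value term in the
prior is what pulls the coefficients of uninformative covariates toward zero.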

Keywords

Bayesian lasso, Skew link, Sparsity, Text categorization.

Area

Bayesian Inference

Authors

Hugo Miguel Agurto Mejía, Márcia D'Elia Branco