Stata User Group Meeting 2016 – Sydney

29 - 30 September 2016

Stata User Group Meeting Presentations

Robert Borotkanics, Macquarie University

Meta-analysis of self-control study design: Methods and associated application of metan

Self-control designs are necessary where the effectiveness of an intervention must be assessed in a small study population. One such situation is the assessment of a treatment modality called abdominal functional electrical stimulation (Abdominal FES) and whether its application improves respiratory function in patients suffering from paralysis. Further, meta-analysis of studies applying a self-control study design with repeat measures requires adaptation of established methods in order to perform scientifically sound analyses. In this study, we applied a methodology using a specific adaptation of metan to carry out this complex statistical analysis. Studies that met inclusion criteria were classified into two broad categories: acute and chronic. Acute studies compared respiratory function prior to and during Abdominal FES. Chronic studies measured the chronic effect of Abdominal FES training. For both acute and chronic studies, analyses were carried out using either fixed effects models, using the inverse-variance (I-V) approach, or random effects models, using the DerSimonian and Laird (D-L) approach. Model choice was determined by the between-study heterogeneity of pooled results, using the I² statistic. Due to differences in baseline function between studies, estimates of effect were made using the Standardised Mean Difference (SMD), applying Glass’s Δ. This method is preferred where the intervention may potentially alter observed variability and is less susceptible to small sample bias than other SMD techniques. Multiple models were applied to compare time points in the self-control chronic studies, with similar analyses applied to RCTs at equal time points. A descriptive approach was used to analyse trends observed in the chronic studies, with data normalised based on minimum within-study values for each measure of respiratory function. Publication bias was assessed using the Begg and Mazumdar test and the Egger approach.
All statistical analyses were carried out using Stata 14. This methodology was successfully applied in a paper now in press: McCaughey, et al. Abdominal functional electrical stimulation to improve respiratory function after spinal cord injury: a systematic review and meta-analysis. Spinal Cord [in press, accepted 2015]. The methodology, applying computational methods enabled by Stata, represents an important approach to the meta-analysis of self-control study designs.
See full presentation
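
The pooling steps named in the abstract above (Glass's standardised mean difference, inverse-variance fixed effects, DerSimonian and Laird random effects, and the I² statistic) can be sketched in a few lines. This is a minimal illustration in Python, not the authors' adaptation of metan: the function names are hypothetical, the baseline standard deviation is assumed to play the role of the control SD in a self-control design, and the per-study variances are taken as given rather than derived for repeat measures.

```python
def glass_smd(mean_during, mean_baseline, sd_baseline):
    """Glass-style SMD: mean change scaled by the baseline SD only (assumption)."""
    return (mean_during - mean_baseline) / sd_baseline


def pool(effects, variances):
    """Pool study effects with I-V fixed effects and D-L random effects."""
    k = len(effects)
    w = [1.0 / v for v in variances]              # inverse-variance weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q measures between-study heterogeneity around the fixed estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1

    # DerSimonian-Laird moment estimate of the between-study variance tau^2.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    random_est = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)

    # I^2: percentage of total variation attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    return {"fixed": fixed, "random": random_est, "q": q, "tau2": tau2, "i2": i2}
```

Calling pool() with a list of study SMDs and their variances returns both pooled estimates together with I², so the choice between the fixed and random effects result can be made from the reported heterogeneity, as the abstract describes.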

Demetris Christodoulou, The University of Sydney

optaspect: Heuristic rules for finding the optimal aspect ratio in a two-variable line plot

Line plots encode a series of slopes from adjoining coordinates and aim to reveal suggestive patterns in the sequential rates of change. The judged prevalence of patterns in the bivariate series and the degree of steepness in the rates of change are largely determined by the choice of aspect ratio that is imposed on the line plot. Choosing an appropriate aspect ratio is key in designing informative line plots. The command optaspect calculates the optimal aspect ratio in a two-variable line graph using a number of heuristic criteria. This paper has been accepted for publication in The Stata Journal in 2017. Anyone interested in obtaining the paper, the related ado-files, help files and supporting datasets should contact Demetris directly at

Demetris Christodoulou, The University of Sydney

visflow: Workflow for data visualisation

visflow describes a practical workflow for data visualization using Stata. The workflow applies a principle-based approach that is informed by graph theory and standards of practice. Although it offers a conceptual framework, it retains a strictly practical approach to data visualization. visflow brings structure to graph syntax by reconciling graph tools and encoding approaches under a unifying framework that can be applied in a systematic manner across a wide range of graphing questions. visflow comprises a collection of help files that can be navigated using cross-referenced hyperlinks, and offers more than 100 graph examples whose code is provided in the accompanying visflowgr ado-files. Note: There is no paper for this presentation. The presentation was based on a collection of help files. Please contact Demetris directly at

Joanna Dipnall, Deakin University

The development of a risk index for depression using Stata’s GSEM with complex survey data

Depression is a common mental illness worldwide. The World Health Organization (WHO) estimates that 350 million people of all ages suffer from depression globally. This illness affects a person’s wellbeing, ability to work and social interactions. However, many suffer undiagnosed. The aim of this analysis was to develop a risk index for depression using a well-known US population-based sample. Depression was measured using a self-report diagnostic and dichotomised into those with and without depression. A number of generalized structural equation models (GSEM) were developed in Stata 14 with depression as the outcome to form a final path model for the index. The models used a set of statistical techniques to measure and analyze relationships between depression and a set of observed biomarker, lifestyle and medical symptom indicators (path analysis) and a latent diet variable (confirmatory factor analysis). Linear causal relationships among variables were examined, while simultaneously accounting for measurement error. Using Stata’s GSEM with the complex multi-stage survey sample meant the point estimates, standard errors and tests were adjusted accordingly. The final model consisted of more than one dependent variable with multiple direct and indirect effects. The model was tested across certain key demographic groups to ensure configural invariance.

Susan Donath, Murdoch Children's Research Institute

table1: A program to create a customisable table of summary statistics

table1 is a Stata ado program which produces one- and two-way tables of summary statistics for a list of numeric variables. The rows of the table are formed from the list of specified variables. If no byvariable is specified, the table has only one column of results. If a byvariable is specified, the table has a column of results for each level of the byvariable, with an optional additional totals column. Unlike other Stata tabulation commands (such as tabulate, table, tabstat), the row variables can be a mixture of continuous variables (summarized by mean, standard deviation, etc) and categorical variables (summarized by percentages and frequencies). Additional features include: (i) several different options for displaying missing and non-missing counts; (ii) considerable flexibility in the way the results are displayed, in particular, the summary statistics and their presentation can be different for each row variable; (iii) results can be restricted to subgroups of the data for individual row variables; and (iv) the contents of the table can be saved as a Stata data file or text file, or exported to Excel. The motivation for table1 is the descriptive table commonly seen in health research publications in which the baseline characteristics of two or more groups are compared. This descriptive table usually has only one column for each group, generally with at least 2 summary statistics in each column (for example, mean and standard deviation for continuous variables, or percentage and frequency for categorical variables). The output of table1 therefore differs from that of tabout in that there is only a single column for each group. The aim of table1 is to assist with reproducible research by enabling creation of a table whose contents can be used unchanged in publications.
See full presentation
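
To make the layout described above concrete, the following Python sketch builds one cell per group for a row variable, using mean (SD) for continuous variables and frequency (percentage) for categorical ones. It only illustrates the idea of mixing the two kinds of summary in a single column per group; the function name, argument names and data layout are hypothetical and do not reflect the syntax or output of the table1 ado program.

```python
from collections import Counter
from statistics import mean, stdev


def summarise(rows, var, kind, by):
    """Return {group: cell} for one row variable, one column per level of `by`."""
    cells = {}
    for group in sorted({r[by] for r in rows}):
        # Missing values (None) are dropped before summarising.
        values = [r[var] for r in rows if r[by] == group and r[var] is not None]
        if kind == "continuous":
            cells[group] = f"{mean(values):.1f} ({stdev(values):.1f})"
        else:
            counts = Counter(values)
            cells[group] = {
                level: f"{n} ({100 * n / len(values):.0f}%)"
                for level, n in sorted(counts.items())
            }
    return cells


# Hypothetical data: two trial arms with one continuous and one categorical variable.
patients = [
    {"arm": "A", "age": 30, "sex": "F"},
    {"arm": "A", "age": 40, "sex": "M"},
    {"arm": "B", "age": 50, "sex": "F"},
    {"arm": "B", "age": 60, "sex": "F"},
]
age_row = summarise(patients, "age", "continuous", "arm")
sex_row = summarise(patients, "sex", "categorical", "arm")
```

Each call yields a single column of results per group, which is the property the abstract contrasts with tabout.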

Le Ma, University of Technology Sydney

getpatent: Web scraping patent data into Stata

The command getpatent crawls websites that hold patent-related information, saves the page source code, and then uses regular expressions to scrape key patent data into Stata, gradually building a database. The database holds observations on official patent application numbers and dates, the granting date, the inventors and the patent’s name, classification codes and patent claims, plus cross-referencing data on the number of backward and forward patent citations.
See full presentation
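
The scraping step described above, applying regular expressions to stored page source, can be sketched as follows. This is a Python illustration only and does not reproduce getpatent: the HTML snippet, the field names and every pattern are hypothetical, and a real patent site would need its own patterns.

```python
import re

# Hypothetical page source, standing in for HTML saved by a crawler.
SAMPLE_PAGE = """
<html><body>
<h1 class="title">Widget fastening apparatus</h1>
<span class="app-no">Application No: 2016/123456</span>
<span class="filed">Filed: 2016-03-01</span>
<span class="granted">Granted: 2017-08-15</span>
<li class="inventor">Jane Example</li>
<li class="inventor">John Sample</li>
</body></html>
"""

# One pattern per single-valued field; each captures the value in group 1.
PATTERNS = {
    "title": re.compile(r'<h1 class="title">(.*?)</h1>'),
    "application_number": re.compile(r"Application No:\s*([\d/]+)"),
    "filed": re.compile(r"Filed:\s*(\d{4}-\d{2}-\d{2})"),
    "granted": re.compile(r"Granted:\s*(\d{4}-\d{2}-\d{2})"),
}
INVENTOR = re.compile(r'<li class="inventor">(.*?)</li>')


def scrape_patent(html):
    """Extract one patent record from page source; missing fields become None."""
    record = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(html)
        record[field] = match.group(1) if match else None
    # Inventors can repeat, so collect every match rather than the first.
    record["inventors"] = INVENTOR.findall(html)
    return record


record = scrape_patent(SAMPLE_PAGE)
```

Appending one such record per crawled page is what gradually builds the database; fields that a page lacks are stored as missing rather than raising an error.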

Con Menictas, University of Newcastle

Using Stata for segmentation

Tagging a segmentation solution for large data sets is problematic when the segmentation is built on soft variables such as attitudes, interests and opinions, because the tagging variables are usually demographics. We examine an alternative approach to increasing the tagging success of soft variable segmentations for large data sets.

Philip S. Morrison, Victoria University of Wellington

Estimating contextual effects in social science using multi-level modelling in Stata

An increasing number of social sciences are now paying much closer attention to the effect of context on behaviour: how the characteristics of the neighbourhood moderate the behaviour of residents, for example, or the degree to which characteristics of the workplace condition job satisfaction. The classic application in the social sciences is how the performance of pupils is moderated by characteristics of their class and their school. In each case level 1 units of analysis (usually individuals) are nested within level 2 or level 3 categories. In each of the above examples individuals are clustered either spatially or organisationally (or both). Multi-level modelling is now a standard way of addressing not only the need to recognise the lack of statistical independence which joint membership of given contexts usually brings, but also the role that context plays theoretically. My presentation will introduce the capabilities of two commands, mixed-effects linear regression (mixed) and mixed-effects binary regression (melogit). Special attention will be paid to post-estimation and the graphical representation of intercept and slope effects, including the use of margins. I will reflect on how much additional information about specific behaviours I have learned by applying these commands in Stata 14 in my home discipline of human geography.
See full presentation

Paul Mwebaze, CSIRO

Socio-economic factors influencing productivity among cassava farmers in East Africa

Cassava is the second most important food crop in Africa after maize. It is a major staple crop for more than 200 million people in East and Central Africa, most of them living in poverty in rural areas. However, its production is undermined by several factors, particularly the problem of emerging and endemic pests and diseases. We conducted a comprehensive socio-economic study covering Uganda, Tanzania and Malawi to determine the status of cassava production with the following specific objectives and research questions: What is the present status of cassava production/productivity? How efficient are cassava producers? What is the current adoption rate of improved cassava production technologies? What is the economic impact of B. tabaci complex on smallholder farmers? Primary data for this study were collected from cassava farmers in Uganda, Tanzania and Malawi, using a pre-tested survey questionnaire which was orally administered to individual farmers. A total of 800 respondents were selected and interviewed using a multi-stage random sampling technique. Using Stata, the data are analysed with a stochastic frontier production model to evaluate the costs, returns and productivity of cassava farmers in this region. Here we present some of the preliminary results and discuss the implications and the further work required. Further research is being conducted in this area by Paul and the CSIRO. The results will be available directly from the author when they are complete.

Enrique Pinzon, StataCorp

Dealing with endogeneity using Stata

Stata has multiple estimators that account for endogeneity. I will briefly discuss these estimators and their assumptions. My main focus, however, will be on estimators that account for endogeneity that are not built into Stata but can be implemented using gsem and gmm.
See full presentation

Steve Quinn, Swinburne University of Technology

The unweighted sum of squares goodness-of-fit statistic for binary regression

The statistic most commonly used to evaluate the adequacy of a logistic regression model is the Hosmer-Lemeshow statistic. Hosmer and Lemeshow proposed a goodness-of-fit test based on partitioning the fitted probabilities into a number of groups and comparing observed events to expected events within each group. They showed via simulations that the resulting statistic follows a chi-squared distribution with degrees of freedom approximately equal to the number of groups minus two. The normalised unweighted sum of squares (USOS) test also assesses model adequacy and is based on a statistic originally proposed by Copas. In this talk the Hosmer-Lemeshow and USOS statistics are compared in binary regression models fitted with the complementary log-log link, and we describe the usos command that calculates the statistic.
See full presentation

Bill Rising, StataCorp

Dynamic documents in Stata: Many routes to the same goal

Do you suffer from the tedium of moving statistical results by hand from Stata into your research documents or reports? Have you ever had the nightmare of updating a document because of changes to your analysis only to find that you missed some results? Have you ever dreamed of automating production of otherwise brain-numbing standardized reports? If so, you need dynamic documents. Dynamic documents get their name from their ability to update their statistical results when they are created, ensuring complete reproducibility and minimal maintenance. In the world of Stata, there are quite a few user-written packages for creating dynamic documents, both from within Stata and from within other applications which call back to Stata. In this talk, I’ll briefly demonstrate a few different packages, each with their own strengths. You can then choose your package, get more done, and sleep more easily at night. Additional data files available here.
See full presentation

Philip Morrison, Victoria University of Wellington, Joanna Sikora, Australian National University, and Bill Rising, StataCorp

Teaching Stata to students taking their first steps in methodological training at the university level

In this seminar we talk about the challenges of teaching Stata to students from non-science (and science) backgrounds who are taking first steps in their methodological training at the university level. We are newcomers to Stata as a teaching tool, although we have used it for years for our research. In social sciences, such as sociology or criminology, a typical introductory course covers the rudiments of statistical theory and analytical methods ranging from cross-tabulations, through Pearson product-moment correlations, to ordinary least squares regressions. Stata offers a simple command language to execute the analyses needed to generate the relevant tables, but the output for these procedures is not easy to control in Stata. More advanced users of Stata employ user-written procedures such as tabout or estout to produce publication-quality tables. However, for our students these procedures are too complex to use, or so we believe at the moment, having perused the standard documentation and examples for these procedures. We would like to start a conversation about the best ways of creating publication-quality tables easily, using Stata output. In our experience even the standard “right-click” and COPY TABLE solution often does not work in practice as it should in theory. We start the conversation by showing three examples of tables we need to easily generate in Stata.
See full presentation

Ian Watson, Macquarie University

Publication quality tables in Stata using tabout

Ian Watson presents a comprehensive overview of his tabout module, a Stata ado program for the batch production of publication quality tables. He explains the philosophy behind the program, touching on issues of aesthetics, functionality and reproducible research. Ian demonstrates the use of tabout to show how easy it is to produce publication quality multidimensional tables in a number of different formats and styles. tabout does not cover estimation tables. Extending tabout by incorporating more advanced Stata features, such as macros and loops, is also explained and Stata users are encouraged to extend their skills in this area. In the final part of the presentation Ian will provide an overview of some forthcoming changes in tabout. These incorporate a number of new advanced features, as well as some long overdue enhancements, such as removing unwanted columns. Many of these new features are designed to make tabout more efficient and flexible. These include the use of configuration files, where users can save customised sets of tabout options in files which can be loaded when tabout runs. Better integration with word processors, such as Microsoft Word, will also be incorporated into the new version of tabout. This will allow users to streamline their exporting of tabout output into their word processor and prescribe the formatting of that output. While word processors will never be as versatile as LaTeX, some of the efficiencies of the latter can be realised within a word processor environment and Ian’s presentation of the new version of tabout will illustrate this. Ian will conclude by inviting existing users of tabout to provide feedback on their use of the program and to suggest enhancements they would like to see in future versions.
See full presentation