USEFUL STATA COMMANDS: inspect
By Andre Lee (Georgetown MIDP 18)
On a recent project, the client wanted a sense of the skew of each of a large number of variables. The data came from a satisfaction survey (1 = very dissatisfied; 5 = very satisfied). On our Excel presentation sheet, we had to choose one of the following options to describe the population’s view of each variable: right-skewed (generally very dissatisfied), left-skewed (generally very satisfied), U-shaped (most were either very dissatisfied or very satisfied, with few being neutral), or normal-shaped (most were neutral, with few being either very dissatisfied or very satisfied).
Our first instinct was to plot the distribution of each variable as a histogram or box plot. Cue the lines of code and loops that generated a folder full of .png files, which we then had to click through and match to our Excel presentation sheet.
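The looping approach described above might be sketched as follows. This is a reconstruction under assumptions, not the project's actual code: the variable names q1 to q3 and the simulated responses are hypothetical stand-ins for the survey items.

```stata
* Hypothetical stand-in for the survey: three simulated 1-5 items.
clear
set seed 2017
set obs 300
forvalues i = 1/3 {
    generate byte q`i' = ceil(5 * runiform())
}

* One histogram per variable, each exported to its own .png file.
foreach v of varlist q1-q3 {
    histogram `v', discrete percent title("`v'")
    graph export "`v'.png", replace
}
```

With a large number of variables, this leaves one image file per variable to open and review by hand.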
In retrospect, a much simpler way to solve this problem was the “inspect” command. “inspect” provides a quick summary of numeric variables, which makes it well suited to a first look at Likert-scaled data. For each variable it reports the number of negative, zero, positive, and missing values, how many of the values are integers and nonintegers, the number of unique values, and a small histogram. A caveat: “inspect” is not a substitute for “summarize” or “tabulate”. Its purpose is not analytical, but it lets you quickly gain familiarity with unknown data and insight into the values stored in a variable.
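For Likert-style items like those in the project, a minimal sketch might look like the following. The data are simulated, and the variable names dissat and sat are hypothetical. The call to “summarize” is an optional numeric cross-check, in keeping with the caveat that “inspect” does not replace it.

```stata
* Simulated 1-5 responses (hypothetical data, for illustration only).
clear
set seed 2017
set obs 500
generate double u = runiform()
generate byte dissat = ceil(5 * u^2)   // mass near 1: right-skewed
generate byte sat = 6 - dissat         // mirror image: left-skewed

* One call prints a summary block and a small histogram for each item.
inspect dissat sat

* Optional cross-check: summarize, detail stores the skewness in r(skewness).
summarize dissat, detail
display "skewness of dissat = " r(skewness)
```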
Consider, for example, the histograms that “inspect” quickly generates for the variables price, mpg, headroom, and trunk.
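The output itself is not reproduced here. A minimal sketch of the commands, assuming these four variables come from the auto example dataset that ships with Stata (the text does not name the dataset):

```stata
* Load the example dataset installed with Stata (assumed source of the variables).
sysuse auto, clear

* One call prints a summary block and a small text histogram per variable.
inspect price mpg headroom trunk
```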
All the histograms appear in a single output, which eliminates the need to open a slew of .png files.
The “inspect” command also allows a cursory check for signs that something may be wrong with a dataset.
In the example here, which inspects the region variable of a census dataset, the last line of the “inspect” output warns us that something may be wrong.
Running “tabulate” on the variable confirms this suspicion.
The region variable takes on five unique values (1-5). It has a value label (North East, North Central, South, and West), but one of the observed values is not documented in the label and appears only as “5”. Perhaps there is a typographical error.
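The check described above can be reproduced on a small scale. The sketch below builds a hypothetical five-observation dataset rather than loading the manual's census file, and the label name cenreg is an assumption made for illustration.

```stata
* Hypothetical miniature of the problem: five region codes, four labels.
clear
input byte region
1
2
3
4
5
end
label define cenreg 1 "North East" 2 "North Central" 3 "South" 4 "West"
label values region cenreg

inspect region      // the final line of the output flags unlabeled values
tabulate region     // the unlabeled code appears as a bare 5
```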
In this hypothetical example from the Stata manual, the person handling the census data was notified of the error and fixed it.
He returned the data to us, and the problem is now solved.
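The text does not say how the error was corrected. As one possible sketch, assume purely for illustration that the stray 5 was a miskeyed 4. The toy data and the names cenreg and region_txt are hypothetical, and the “assert” line is an added safeguard rather than part of the manual's example.

```stata
* Hypothetical miniature dataset with one unlabeled code.
clear
input byte region
1
2
3
4
5
end
label define cenreg 1 "North East" 2 "North Central" 3 "South" 4 "West"
label values region cenreg

* Assumption for illustration only: the stray 5 was a miskeyed 4.
replace region = 4 if region == 5

* decode leaves the new string empty for any value that has no label,
* so this assert stops with an error if an unlabeled code remains.
decode region, generate(region_txt)
assert region_txt != "" if !missing(region)

* Re-inspect to confirm that no unlabeled values are reported.
inspect region
```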
In conclusion, the “inspect” command is great for reviewing a large number of histograms for skew in a single output and for conducting a cursory check for potential data errors.
Reference:
StataCorp. 2013. Stata Statistical Software: Release 13. College Station, TX: StataCorp LP.
Retrieved June 21, 2017, from http://www.stata.com/manuals13/dinspect.pdf#dinspect
*The Stata commands from the manual were recreated in Stata 14.2, and additional commands were added for clarity.