Aggregate Dialect Difference

Some preliminary results from my final project for Applied Spatial Statistics, taught by Brian Reich.

Starting with the point-referenced data from each of the 122 questions in the Harvard Dialect Survey, by Bert Vaux and Scott Golder, we used a k-nearest neighbor smoothing algorithm to estimate the probability of seeing a particular answer (e.g., whether a person would say soda, pop, or coke) at every point in the continental US.
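To make the smoothing step concrete, here is a minimal sketch in Python (the project itself was coded in R). It assumes the simplest form of k-nearest neighbor smoothing: the estimated probability of an answer at a location is the unweighted share of that answer among the k closest survey responses. The function names, the great-circle distance, the value of k, and the toy responses are all illustrative assumptions; the actual model may weight neighbors or choose k differently, as described in the poster.

```python
import math
from collections import Counter


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def knn_answer_probs(responses, query, k):
    """Estimate answer probabilities at `query` from its k nearest responses.

    `responses` is a list of ((lat, lon), answer) pairs. Each of the k
    nearest responses counts equally (an assumption of this sketch).
    """
    nearest = sorted(responses, key=lambda r: haversine_km(r[0], query))[:k]
    counts = Counter(answer for _, answer in nearest)
    return {answer: n / len(nearest) for answer, n in counts.items()}


# Made-up responses for illustration only; not survey data.
toy_responses = [
    ((35.78, -78.64), "soda"),
    ((35.99, -78.90), "soda"),
    ((36.07, -79.79), "coke"),
    ((41.88, -87.63), "pop"),
    ((44.98, -93.27), "pop"),
]

probs = knn_answer_probs(toy_responses, query=(35.9, -78.8), k=3)
print(probs)
```

Evaluating a function like this over a grid of points covering the continental US would give one probability surface per answer, which is the kind of input a smoothed dialect map needs.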

For a particular question, we can quantify the difference in dialect between two locations as one minus the overlap between their estimated answer probabilities, taken category by category. Summing these per-question differences over all questions then gives a rough measure of the aggregate dialect difference, which is plotted in the map at right.
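As a rough illustration of this measure, the Python sketch below assumes that the "overlap" for a question is the sum, over answer categories, of the smaller of the two locations' estimated probabilities. That reading is an assumption on our part; the text above does not spell out the formula. The function names, question labels, and probabilities are hypothetical.

```python
def question_difference(p, q):
    """One minus the overlap between two answer distributions.

    `p` and `q` map answer categories to probabilities. Overlap is taken
    here as sum(min(p[c], q[c])) over categories (an assumed definition).
    """
    categories = set(p) | set(q)
    overlap = sum(min(p.get(c, 0.0), q.get(c, 0.0)) for c in categories)
    return 1.0 - overlap


def aggregate_difference(loc_a, loc_b):
    """Sum the per-question differences over questions present at both locations."""
    shared = loc_a.keys() & loc_b.keys()
    return sum(question_difference(loc_a[qid], loc_b[qid]) for qid in shared)


# Made-up probabilities for two hypothetical locations and two questions.
city_a = {
    "soft_drink": {"soda": 0.7, "pop": 0.2, "coke": 0.1},
    "you_plural": {"you guys": 0.5, "y'all": 0.5},
}
city_b = {
    "soft_drink": {"soda": 0.1, "pop": 0.8, "coke": 0.1},
    "you_plural": {"you guys": 0.9, "y'all": 0.1},
}

total = aggregate_difference(city_a, city_b)
print(round(total, 3))
```

Under this definition each per-question difference lies between 0 (identical distributions) and 1 (no shared answers), so the aggregate over 122 questions would range from 0 to 122.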

Note: The “most similar” and “least similar” cities are limited to those with a population of at least 200,000 (city data from the R package maps). Other dialect maps and further details on the model's construction can be found in the accompanying poster.

All coding was done in R / Shiny.

NEW! What kind of dialect do you have? Take the survey and find out.

Joshua Katz

Dept. of Statistics

NC State University