In a previous post I analyzed the Social Security Administration’s (SSA) name dataset to determine what information could be obtained from just a person’s first name. It was interesting to discover examples of how pop culture and historical events shaped the evolution of particular names. In this follow-up I expanded that post into an interactive application. Type in a name to get started. The four panels illustrate how the submitted name’s popularity varies over time and by geographic region, and how it compares to other names. Tap on the help button for more information.
How does it work?
The application combines three datasets:
- The original name dataset referenced in the previous blog post. This data lists the number of individuals born with any given name, for the years 1910-2020.
- An expanded dataset that breaks down the first dataset by state.
- US actuarial tables, from which I derived survival curves. These let me estimate how many individuals with a given name, born in any given year, are still alive (for the 'Age' panel).
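To make the survival-curve step concrete, here is a minimal sketch of how births and an actuarial table could be combined. It is an illustration under stated assumptions, not necessarily the code behind the application: the function names `survival_curve` and `estimate_alive` are hypothetical, the table is assumed to be a list of per-age death probabilities, and a single table is applied to every birth year (real tables are usually split by sex and vary by year).

```python
def survival_curve(death_probs):
    """Turn per-age death probabilities q_x into a survival curve.

    curve[x] is the probability of surviving from birth to age x,
    so curve[0] is always 1.0.
    """
    curve = [1.0]
    for q in death_probs:
        curve.append(curve[-1] * (1.0 - q))
    return curve


def estimate_alive(births_by_year, curve, current_year):
    """Estimate how many people born in each year are alive in current_year.

    births_by_year maps birth year -> number of births with a given name.
    Ages beyond the end of the curve are treated as having no survivors
    (an assumption of this sketch).
    """
    alive = {}
    for year, births in births_by_year.items():
        age = current_year - year
        if 0 <= age < len(curve):
            alive[year] = births * curve[age]
        else:
            alive[year] = 0.0
    return alive


# Toy numbers, made up purely for illustration.
toy_curve = survival_curve([0.1, 0.5])
toy_alive = estimate_alive({2018: 1000, 2019: 200, 2020: 100}, toy_curve, 2020)
```

Summing the values of `toy_alive` would give the estimated number of living people with that name, and plotting them by birth year gives an age distribution of the kind an 'Age' panel could show.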
One interesting outcome we can derive is the most prominent name for each state. If we were to look at the most popular names for each state, they would likely all be similar (in recent years: ‘Noah’ and ‘Emma’). In contrast, prominence looks at how distinctive a name is to a given state, irrespective of its overall popularity. You can think of a state’s most prominent name as the one that best distinguishes it from the rest of the country.
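One simple way to quantify prominence is to compare a name's share of births within a state to its share nationwide, and pick the name with the largest ratio. The sketch below assumes that definition; the exact metric used in the application may differ, and the function name `most_prominent` and the toy counts are hypothetical.

```python
def most_prominent(state_counts):
    """Return the most prominent name for each state.

    state_counts maps state -> {name: births}.
    Prominence here is (name's share within the state) divided by
    (name's share across all states), which is an assumed definition.
    """
    # Aggregate national totals per name.
    national = {}
    for counts in state_counts.values():
        for name, n in counts.items():
            national[name] = national.get(name, 0) + n
    national_total = sum(national.values())

    result = {}
    for state, counts in state_counts.items():
        state_total = sum(counts.values())
        if state_total == 0:
            continue  # no births recorded for this state, nothing to rank

        def ratio(name):
            state_share = counts[name] / state_total
            national_share = national[name] / national_total
            return state_share / national_share

        result[state] = max(counts, key=ratio)
    return result


# Toy counts, made up purely for illustration.
toy_counts = {
    "CO": {"Emma": 100, "Aspen": 20},
    "NY": {"Emma": 300, "Chaim": 30, "Aspen": 2},
}
toy_prominent = most_prominent(toy_counts)
```

In this toy data ‘Emma’ is the most popular name in both states, but its share is about the same everywhere, so it does not distinguish either one. A ratio like this can be noisy for very rare names, so a real implementation would likely need a minimum-count threshold.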
Most Prominent Name (Per State)
Some of these prominent names relate to a state’s natural environment, such as ‘Aspen’ in Colorado, or ‘Orion’ and ‘Aurora’ in Alaska. Other prominent names are associated with a state’s demographics: New York has ‘Chaim’ and ‘Chaya’ owing to its large Orthodox Jewish population. Similarly, Michigan (‘Hassan’) and Texas (‘Santos’) have names corresponding to their Muslim and Hispanic populations, respectively.
- Only the top 1000 male and female names (as measured from 2000-2020) are considered due to performance constraints. Because this website is served statically, keeping download sizes small was a challenge in developing this application. For comparison, there are approximately 250,000 unique names in the name database, half of which have fewer than five individuals.
- As the names come from the SSA, the dataset only covers names of individuals born in the US.
- For privacy reasons, any names with fewer than five individuals in a given year are omitted. This means that if you have a particularly obscure name, you might not see any births registered in your birth year (assuming it's in the dataset at all).
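The first limitation above, keeping only the top 1000 names per sex as measured over 2000-2020, could be implemented roughly as follows. This is a sketch under assumptions: the record layout `(name, sex, year, births)` and the function name `top_names_by_sex` are hypothetical, and ties are broken however `Counter.most_common` orders them.

```python
from collections import Counter, defaultdict


def top_names_by_sex(records, n=1000, start=2000, end=2020):
    """Return the n most common names per sex within [start, end].

    records is an iterable of (name, sex, year, births) tuples.
    Births outside the year range are ignored when ranking.
    """
    totals = defaultdict(Counter)
    for name, sex, year, births in records:
        if start <= year <= end:
            totals[sex][name] += births
    return {
        sex: [name for name, _ in counter.most_common(n)]
        for sex, counter in totals.items()
    }


# Toy records, made up purely for illustration.
toy_records = [
    ("Noah", "M", 2010, 50),
    ("Liam", "M", 2010, 40),
    ("Gary", "M", 1950, 900),  # outside 2000-2020, so not counted
    ("Emma", "F", 2015, 60),
    ("Olivia", "F", 2015, 30),
]
toy_top = top_names_by_sex(toy_records, n=1)
```

Names that are common only in earlier decades drop out under this rule, which is the trade-off for a smaller download.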
Want to learn more? Join our meetup group! Bethesda Data Science Meetup