In preparation for tomorrow, Election Day in the United States, I took the time to learn how to build a dashboard in Tableau. Having only used the Seaborn and Matplotlib libraries for my previous data analyses, I thought the best way to learn was to jump in the deep end.

Why North Carolina? North Carolina is a swing state, and in addition to the presidential election, it has competitive House races in its 8th, 9th, and 11th congressional districts, where once-safe House Republicans suddenly find themselves in competitive races following the 2019 redistricting of the state's congressional districts.

Additionally, North Carolina is one of the most transparent states when it comes to public election data, with multiple CSV files available for both counties and congressional districts (the hardest data to find, as district boundaries are subject to change with redistricting and often split counties). …

A Quick Guide to Upgrade Your Data Visualizations using Seaborn and Matplotlib

“If I see one more basic blue bar plot…”

After completing the first module in my studies at Flatiron School NYC, I started playing with plot customizations and design using Seaborn and Matplotlib. Much like doodling during class, I started coding other styled plots in our Jupyter notebooks.

After reading this article, you should have at least one quick styled-plot snippet in mind for every notebook.

No more default, store brand, basic plots, please!

If you can do nothing else, use Seaborn.

You have five seconds to make a decent looking plot or the world will implode; use Seaborn!

Seaborn, which is built on top of Matplotlib, can be an instant design upgrade. It automatically assigns labels from your x and y values and applies a default color scheme that's less… basic. (IMO: it rewards good, clear, well-formatted column labeling and thorough data cleaning.) Matplotlib does not do this automatically, but it also does not require x and y to be defined at all times, depending on what you are looking to plot. …
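As a quick illustration of that difference, here is a minimal sketch (not from the original article) comparing the two defaults side by side. The DataFrame and its column names (borough, avg_price) are made up purely for the example.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical example data; the column names are illustrative only.
df = pd.DataFrame({
    "borough": ["Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island"],
    "avg_price": [87, 124, 196, 96, 89],
})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Plain Matplotlib: no axis labels unless you add them yourself.
ax1.bar(df["borough"], df["avg_price"])
ax1.set_title("Matplotlib default")

# Seaborn: picks up the column names as axis labels automatically.
sns.barplot(data=df, x="borough", y="avg_price", ax=ax2)
ax2.set_title("Seaborn default")

plt.tight_layout()
plt.show()
```

Clear column names pay off here: whatever you call the columns is exactly what Seaborn prints on the axes.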

Once upon a time, in a Pandas DataFrame far far away, lived observant little objects, interesting integers, and scintillating strings that lived together in perfect harmony. But one day, the Jupyter notebook grew dark and cold as they realized many of their friends were missing. In their place they found wild and ominous NaN and null values creeping into their perfect kingdom. King Pandas was outraged! At once all the king's cursors and all the king's modifiers were dispatched to root out the null values with their frighteningly powerful swords of .isna() and their commander, the wielder of domain knowledge, Sir Np.Where. Fighting through pd.to_datetime, AM & PM, they won their exhausting fight, imputing new values and shedding light on the lurking nulls. The 'Cleaning War' was over, and the DataFrame was once again whole. In the days after, lovingly called the Days of EDA, features started coming out to interact again; some in linear ways, and others' relationships shown more by their growing positive coefficients and perfectly modeled classifications. …
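In plain code, the battle above roughly maps to a few Pandas moves. This is a minimal sketch, not taken from the article; the DataFrame and its column names (price, room_type, last_review) are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical messy DataFrame; column names are illustrative only.
df = pd.DataFrame({
    "price": [120.0, np.nan, 95.0, np.nan],
    "room_type": ["Entire home", None, "Private room", "Entire home"],
    "last_review": ["2020-01-05", "2020-03-14", None, "2020-06-30"],
})

# Scout the damage: count null values per column.
print(df.isna().sum())

# Impute with a bit of domain knowledge via np.where:
# fill missing prices with the column median.
df["price"] = np.where(df["price"].isna(), df["price"].median(), df["price"])

# Fill missing categories with a sentinel value.
df["room_type"] = df["room_type"].fillna("Unknown")

# Standardize the date column; missing entries become NaT instead of raising.
df["last_review"] = pd.to_datetime(df["last_review"], errors="coerce")
```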

A Small Intro to an ML Project

Zebra, Botlierskop Game Reserve, South Africa. Image source: copyright Dolci Key Photography 2020

As I am currently enrolled in Flatiron NYC's Data Science intensive, I prepared a project for the end of the second module. It was a brutal module, with more information to process than a 15-credit-hour year of college neatly packaged into 1.5 weeks. The module focuses primarily on statistics and statistical testing, and introduces machine learning with linear regression closer to the end, the cherry on top.

For this solo project, we were asked to find data sets with over 10,000 rows and more than 10 features. Good free data is hard to find. After many exciting sets failed to meet the criteria (no time series, at least 10 features, etc.), I settled on using Inside Airbnb. I used a data set from Cape Town, South Africa, as I have some domain knowledge of what tourism is like there. …

Statistics is a crucial tool for data scientists.

In data science, one of the most important tools we use is statistical testing and hypothesis testing. Generally, we want to know whether one measure we have has a statistically significant effect on our target, and if it does, maybe we want to use that feature in a machine learning model. Statistical testing is an important tool for feature selection, especially if we have many features and are unsure what really matters to our target variable (the variable of interest, i.e., what we are looking to predict).

Before we run the statistical tests themselves, we need to know what we are testing. This is where the hypothesis testing setup comes into play. Sometimes this can be very intuitive, but sometimes not so much, especially in a bootcamp environment where you are learning all of this in a day or two. …
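For a concrete picture of what that setup looks like, here is a minimal sketch, not from the article, using a hypothetical Airbnb-style DataFrame and a Welch's t-test from scipy.stats. The column names, the simulated numbers, and the 0.05 threshold are all assumptions for illustration.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Simulate a hypothetical listings table with two room types.
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "price": np.concatenate([rng.normal(150, 30, 200), rng.normal(110, 30, 200)]),
    "room_type": ["Entire home"] * 200 + ["Private room"] * 200,
})

# H0: mean price is the same for both room types.
# H1: mean prices differ (two-sided test).
entire = df.loc[df["room_type"] == "Entire home", "price"]
private = df.loc[df["room_type"] == "Private room", "price"]

# Welch's t-test (does not assume equal variances).
t_stat, p_value = stats.ttest_ind(entire, private, equal_var=False)

alpha = 0.05
if p_value < alpha:
    print(f"p = {p_value:.4f} < {alpha}: reject H0; room_type looks worth keeping")
else:
    print(f"p = {p_value:.4f} >= {alpha}: fail to reject H0")
```

Writing the null and alternative hypotheses down before running the test is the part that keeps the result interpretable.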

Invisible Women by Caroline Criado Perez, one of the many sources mentioned in this article (image from Amazon.com).

A few weekends ago, I watched Hamilton with the original cast — the first time I was lucky enough to see the Broadway musical by Lin-Manuel Miranda. In what has become one of the most popular numbers from the show, the character Aaron Burr sings “No one else was in the room where it happened,” referring to a historic meeting known as the Compromise of 1790. This ‘Dinner Table Bargain’ between three founding fathers exchanged the passage of the Assumption Act for the passage of the Residence Act, responsible for moving the US capital from New York City to Washington D.C. …

About

dolcikey

Curve model and Data Scientist based in New York City
