Rule 1: Include a properly scaled y-axis.
This week I used some natural language processing techniques and built a web app for exploring the tweets of the U.S. presidential candidates, or, as I call them, candidential tweets.
After scraping data on over 4000 films from IMDB, it was time to start making predictions. I set out to predict the quality of films, and I wanted to know which attributes of a film were predictive of quality. Are some film studios better than others? Do studios release their best films at certain times of the year?
For my second data science project at Metis, I used web scraping to gather data on over 4000 films released in the past 15 years. BeautifulSoup is a great Python package for web scraping: it parses a website's HTML and makes the elements on the page selectable so that they can be stored as data. A well-designed website is generally easy to scrape because its HTML includes many classes, ids, and metadata that can be used as selectors. I scraped my data from The Internet Movie Database, aka IMDB.
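To make the idea of selecting elements by class and id concrete, here is a minimal sketch of how BeautifulSoup can turn HTML into rows of data. The markup, the class names (`film`, `title`, `year`, `rating`), and the `parse_films` helper are all made up for illustration; they are not IMDB's actual page structure or the code used in the project.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; this is NOT IMDB's real HTML.
SAMPLE_HTML = """
<div class="film" id="film-1">
  <h2 class="title">Example Film</h2>
  <span class="year">2010</span>
  <span class="rating">7.5</span>
</div>
<div class="film" id="film-2">
  <h2 class="title">Another Film</h2>
  <span class="year">2014</span>
  <span class="rating">6.1</span>
</div>
"""


def parse_films(html):
    """Return one dict per <div class="film"> block found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    films = []
    # Classes and ids act as selectors for the elements we want to keep.
    for node in soup.find_all("div", class_="film"):
        films.append({
            "id": node["id"],
            "title": node.find(class_="title").get_text(strip=True),
            "year": int(node.find(class_="year").get_text(strip=True)),
            "rating": float(node.find(class_="rating").get_text(strip=True)),
        })
    return films


films = parse_films(SAMPLE_HTML)
for film in films:
    print(film)
```

On a real site the HTML would come from an HTTP request rather than a string, and the selectors would have to match whatever classes and ids that site actually uses.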
Working with data can be messy. Very messy. This week at Metis we worked on a project using MTA turnstile data. The bulk of my time was spent processing, cleaning, and reshaping the data to see what sorts of stories it could tell.
Yes, that's a typo in the title. Hello, wrold! I'm Emily Schuch, currently studying data science at Metis, an intensive twelve-week bootcamp program in NYC.