Find & fill missing values in R | Python

R

Load a dataset into R. Note that budget has some NAs, while mpaa is sometimes blank.

How many missing values are in each column?

Note that the blanks are not counted. To count them, we could do:

Fill all NA values (in any column) with 0:

Fill mpaa with unknown:

Python

Load the dataset into Python:
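A minimal sketch of the load step, assuming the data is a CSV file. The rows below are made up for illustration (the real file is not shown here) and are read from an in-memory string so the snippet runs on its own; with a real file you would pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Made-up stand-in for the movies file: budget is NA in places, mpaa is blank in places.
csv_text = """title,budget,mpaa
Movie A,1000000,PG
Movie B,NA,
Movie C,250000,R
Movie D,NA,PG-13
"""

# By default read_csv treats both the string "NA" and an empty field as missing (NaN).
movies = pd.read_csv(io.StringIO(csv_text))
print(movies)
```

Because pandas reads blank fields as NaN, the blank mpaa values end up as missing values rather than empty strings.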

How many missing values are in each column? Note this counts the missing values in budget and mpaa:
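One way to get the per-column counts, sketched on a small made-up data frame (the column names follow the post, the values are invented):

```python
import numpy as np
import pandas as pd

# Invented rows; budget and mpaa both contain missing values.
movies = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "budget": [1000000.0, np.nan, 250000.0, np.nan],
    "mpaa": ["PG", None, "R", "PG-13"],
})

# isnull() marks every missing cell as True; sum() then counts the Trues per column.
missing_per_column = movies.isnull().sum()
print(missing_per_column)
```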

Fill the missing values with 0. Note that this fills both budget and mpaa:
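A sketch of the fill on invented data. The mpaa column is created with an explicit object dtype here (an assumption made so that a number can be written into a text column regardless of pandas version). The per-column variant at the end is an optional alternative, not something the post itself does in Python.

```python
import numpy as np
import pandas as pd

# Invented rows; mpaa is stored as object dtype so it can hold the filled-in 0.
movies = pd.DataFrame({
    "budget": [1000000.0, np.nan, 250000.0, np.nan],
    "mpaa": pd.Series(["PG", None, "R", "PG-13"], dtype=object),
})

# fillna(0) replaces every missing cell in every column with 0.
movies_filled = movies.fillna(0)

# Alternative: pass a dict to choose a different fill value per column.
movies_by_column = movies.fillna({"budget": 0, "mpaa": "unknown"})

print(movies_filled)
print(movies_by_column)
```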

 

 

Dealing with “ValueError: feature_names mismatch” using XGBoost in Python

Possible causes for this error:

  1. The test set has values in its categorical variables that the training set never saw. Once each value is expanded into its own column, these become columns that do not exist in the training set
  2. The test set lacks some of the values that appear in the training set's categorical variables, and is therefore missing columns
  3. The test and training sets have the same columns, but not in the same order
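The three causes can be checked before the model is ever called. The sketch below uses only pandas and assumes the categorical variables are expanded with `pd.get_dummies`; the tiny train and test frames are invented for illustration.

```python
import pandas as pd

# Invented data: test has a colour train never saw (green), lacks one train has (blue),
# and lists its columns in a different order.
train = pd.DataFrame({"color": ["red", "blue", "red"], "size": [1, 2, 3]})
test = pd.DataFrame({"size": [4, 5], "color": ["green", "red"]})

train_X = pd.get_dummies(train)
test_X = pd.get_dummies(test)

only_in_test = set(test_X.columns) - set(train_X.columns)    # cause 1
only_in_train = set(train_X.columns) - set(test_X.columns)   # cause 2
same_order = list(train_X.columns) == list(test_X.columns)   # cause 3

print("only in test:", only_in_test)
print("only in train:", only_in_train)
print("same columns in same order:", same_order)
```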

The following example throws:

ValueError: feature_names mismatch: […] expected … in input data

training data did not have the following fields: …

We can fix it by deleting the columns in the test set which did not exist in the training set:
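A sketch of that step on two invented, already-encoded frames (the names `train_X` and `test_X` are assumptions, not taken from the original script):

```python
import pandas as pd

# Invented encoded frames: color_green exists only in the test set.
train_X = pd.DataFrame({"size": [1, 2], "color_blue": [1, 0], "color_red": [0, 1]})
test_X = pd.DataFrame({"size": [3, 4], "color_green": [1, 0], "color_red": [0, 1]})

# Columns the model was never trained on.
extra_columns = [c for c in test_X.columns if c not in train_X.columns]

# Drop them from the test set.
test_X = test_X.drop(columns=extra_columns)
print(list(test_X.columns))
```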

And by adding columns to the test set which exist in the training set but are missing from test:
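A sketch of that step, again on invented frames. Filling the new columns with 0 assumes they are indicator columns, where 0 means "this category is not present in the row".

```python
import pandas as pd

# Invented encoded frames: color_blue exists only in the training set.
train_X = pd.DataFrame({"size": [1, 2], "color_blue": [1, 0], "color_red": [0, 1]})
test_X = pd.DataFrame({"size": [3, 4], "color_red": [0, 1]})

# Columns the model expects but the test set lacks.
missing_columns = [c for c in train_X.columns if c not in test_X.columns]

# Add each one, filled with 0.
for column in missing_columns:
    test_X[column] = 0

print(test_X)
```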

Finally we need to make sure that the columns are in the same order in both sets:
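Once both frames hold the same set of columns, indexing the test frame with the training frame's columns puts them in the training order. A sketch with invented frames:

```python
import pandas as pd

# Invented frames with identical columns in a different order.
train_X = pd.DataFrame({"size": [1, 2], "color_blue": [1, 0], "color_red": [0, 1]})
test_X = pd.DataFrame({"color_red": [0, 1], "size": [3, 4], "color_blue": [0, 0]})

# Select the test columns in exactly the order the training set uses.
test_X = test_X[train_X.columns]
print(list(test_X.columns))
```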

Adding these in the appropriate places gives us a working script:
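This is not the original script. It is a compact alternative sketch that does all three fixes in one call with `DataFrame.reindex`. The helper name `align_to_train` is hypothetical, and the 0 fill again assumes indicator columns.

```python
import pandas as pd


def align_to_train(train_X: pd.DataFrame, test_X: pd.DataFrame) -> pd.DataFrame:
    """Return test_X with exactly the columns of train_X, in the same order.

    Columns only in test_X are dropped; columns only in train_X are added as 0.
    """
    return test_X.reindex(columns=train_X.columns, fill_value=0)


# Invented encoded frames that show all three problems at once.
train_X = pd.DataFrame({"size": [1, 2], "color_blue": [1, 0], "color_red": [0, 1]})
test_X = pd.DataFrame({"color_red": [0, 1], "color_green": [1, 0], "size": [3, 4]})

aligned = align_to_train(train_X, test_X)
print(aligned)
```

The aligned frame can then be passed to the model's predict step in place of the raw test frame.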

 

Make a histogram with Python | R

Make some data in R:

A quick and dirty histogram in R:

A quick and slightly less dirty histogram in R:

Make some data in Python:
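The original data is not shown, so this sketch assumes 1,000 draws from a standard normal distribution. The seed is arbitrary and only there to make the numbers repeatable.

```python
import numpy as np

# Seeded generator so the same values come out on every run.
rng = np.random.default_rng(42)

# 1,000 draws from a normal distribution with mean 0 and standard deviation 1.
data = rng.normal(loc=0.0, scale=1.0, size=1000)
print(data[:5])
```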

A quick and dirty histogram in Python:
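A sketch using matplotlib with made-up normal data. The Agg backend and the output file name are assumptions so that the script runs without a display; in an interactive session you would more likely call `plt.show()`.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend: draw to a file, no window needed

import matplotlib.pyplot as plt
import numpy as np

# Made-up data: 1,000 draws from a standard normal distribution.
rng = np.random.default_rng(42)
data = rng.normal(size=1000)

# All defaults: 10 equal-width bins, no labels.
counts, bin_edges, patches = plt.hist(data)
plt.savefig("quick_hist.png")
```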

A quick and slightly less dirty histogram in Python:
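A sketch of what "slightly less dirty" might look like: more bins, bar outlines, a title and axis labels. The data, colours, labels and file name are all illustrative choices, not the original plot.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend: draw to a file, no window needed

import matplotlib.pyplot as plt
import numpy as np

# Made-up data: 1,000 draws from a standard normal distribution.
rng = np.random.default_rng(42)
data = rng.normal(size=1000)

fig, ax = plt.subplots(figsize=(6, 4))

# 30 bins with white edges so neighbouring bars are easy to tell apart.
counts, bin_edges, _ = ax.hist(data, bins=30, color="steelblue", edgecolor="white")

ax.set_title("Histogram of simulated data")
ax.set_xlabel("Value")
ax.set_ylabel("Count")

fig.tight_layout()
fig.savefig("nicer_hist.png", dpi=150)
```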

What is the best code syntax highlighter for WordPress?

After an exhaustive search (25 minutes of googling and installing 2 WordPress plugins), the winner for best code syntax highlighter is … Crayon. I tried WP Code Prettify, but it did not work (possibly because it is not compatible with the most current version of WordPress). Crayon looks good enough for the time being:

Python:

more Python:

How does R code look?

How does SQL look?