
Is there a way to plot while preserving the dataframe? For example:

    mydata = df.dropna(how="any", subset=)
    # plot a scatter of col1 by col2, with sizes according to col3

Similarly, imagine that you wanted to filter or color each point differently depending on the values of some of its columns. For example, what if you wanted to automatically plot the labels of the points that meet a certain cutoff on col1, col2 alongside them (where the labels are stored in another column of the df), or color these points differently, like people do with dataframes in R? For example:

    mydata = df.dropna(how="any", subset=)
    # Plot in red, with smaller size, all the points that
    Myscatter.replot(mydata > 0.5, color="red", s=0.5)
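
For what it's worth, something close to the second use case can be sketched with a boolean mask on the dataframe, without leaving pandas. This is only a sketch under assumptions: the example data, the `label` column, the choice of col2 as the cutoff column, and `subset=["col1", "col2"]` are all made up for illustration, and there is no `replot` method in matplotlib, so the highlighted points are simply drawn a second time.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical example data; "label" is an assumed column holding point labels.
df = pd.DataFrame({
    "col1": [0.1, 0.4, np.nan, 0.8, 0.9],
    "col2": [0.2, np.nan, 0.5, 0.7, 0.6],
    "col3": [10.0, 20.0, 30.0, 40.0, 50.0],
    "label": ["a", "b", "c", "d", "e"],
})

# Drop rows where col1 or col2 is NA, but keep every other column.
mydata = df.dropna(how="any", subset=["col1", "col2"])

fig, ax = plt.subplots()
# Scatter of col1 by col2, with marker sizes taken from col3.
ax.scatter(mydata["col1"], mydata["col2"], s=mydata["col3"])

# Assumed cutoff: highlight the points whose col2 value is greater than 0.5.
above = mydata[mydata["col2"] > 0.5]
ax.scatter(above["col1"], above["col2"], color="red", s=above["col3"] * 0.5)

# Write each highlighted point's label next to it.
for row in above.itertuples(index=False):
    ax.annotate(row.label, (row.col1, row.col2))

fig.savefig("highlighted.png")
```

Because `above` is still a dataframe, the labels and sizes stay aligned with the filtered points automatically.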

The problem with converting everything to an array before plotting is that it forces you to break out of dataframes.

Consider these two use cases where having the full dataframe is essential to plotting:

For example, what if you wanted to now look at all the values of col3 for the corresponding values that you plotted in the call to scatter, and color (or size) each point by that value? You'd have to go back, pull out the non-NA values of col1 and col2, and check what their corresponding col3 values are.
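
As a rough illustration of the first use case, the rows can be filtered on col1 and col2 while col3 stays attached, so the colors line up with the plotted points. This is a sketch under assumptions: the example data and `subset=["col1", "col2"]` are invented here, and the colormap is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical example data; the column names follow the question.
df = pd.DataFrame({
    "col1": [0.1, 0.4, np.nan, 0.8, 0.9],
    "col2": [0.2, np.nan, 0.5, 0.7, 0.3],
    "col3": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Drop rows where col1 or col2 is NA; col3 is kept alongside them.
mydata = df.dropna(how="any", subset=["col1", "col2"])

fig, ax = plt.subplots()
# Color each point by its col3 value.
points = ax.scatter(mydata["col1"], mydata["col2"], c=mydata["col3"], cmap="viridis")
fig.colorbar(points, ax=ax, label="col3")
fig.savefig("colored_by_col3.png")
```

Passing `s=mydata["col3"]` instead of `c=` would size the points by col3 in the same way.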

What is the best way to make a series of scatter plots using matplotlib from a pandas dataframe in Python?

For example, if I have a dataframe df that has some columns of interest, I find myself typically converting everything to arrays:

    import matplotlib.pylab as plt
    # and drop na rows if any of the columns are NA
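
To make the array-based workflow concrete, here is a minimal sketch of what it might look like end to end. The example data is hypothetical, and the choice of col1 and col2 as the columns of interest is an assumption taken from the rest of the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical example data; the column names follow the question.
df = pd.DataFrame({
    "col1": [0.1, 0.4, np.nan, 0.8, 0.9],
    "col2": [0.2, np.nan, 0.5, 0.7, 0.3],
    "col3": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Fetch col1 and col2, and drop rows if either column is NA.
mydata = df[["col1", "col2"]].dropna(how="any")

# Convert to a plain array; from this point the link back to col3 is lost.
vals = mydata.to_numpy()

fig, ax = plt.subplots()
ax.scatter(vals[:, 0], vals[:, 1])
fig.savefig("array_scatter.png")
```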
