I have a df that looks like this: time val 0 1 1 1 2 2 3 3 4 1 5 2 How do I create new columns that hold the cumulative sum of occurrences of a condition? In this case, I want to create a column for each unique value in val that holds ..
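A minimal sketch of one way to get a running count per unique value, assuming the flattened table above is six rows of `time` and `val`. The `cum` prefix and the `result` name are illustrative choices, not from the question.

```python
import pandas as pd

# Sample frame rebuilt from the question's flattened table (an assumption).
df = pd.DataFrame({"time": [0, 1, 2, 3, 4, 5], "val": [1, 1, 2, 3, 1, 2]})

# One 0/1 indicator column per unique value in val, then a running total.
counts = pd.get_dummies(df["val"], prefix="cum").astype(int).cumsum()

# Attach the running counts next to the original columns.
result = pd.concat([df, counts], axis=1)
print(result)
```

Each `cum_<value>` column holds how many times that value has appeared up to and including the current row.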
I am doing feature selection for a linear regression model. I have the training data and the test data in separate dataframes. The training data frame ends at the end of January and the test data frame at the end of February. The goal is to select a promising set of features to train ..
I have a dataset with some missing data. I would like to maintain the missingness within the data while performing pd.get_dummies(). Here is an example dataset: Table 1. someCol A B NA C D I would expect pd.get_dummies(df, dummy_na=True) to transform the data into something like this: Table 2. someCol_A someCol_B someCol_NA someCol_C someCol_D 1 ..
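Since the expected output is cut off, here is a hedged sketch of two possible readings, assuming the NA entry in Table 1 is a real `NaN`. The names `with_na` and `masked` are illustrative only.

```python
import numpy as np
import pandas as pd

# Table 1 from the question, with NA taken to be a real missing value.
df = pd.DataFrame({"someCol": ["A", "B", np.nan, "C", "D"]})

# Reading 1: dummy_na=True adds one extra indicator column for missing values.
with_na = pd.get_dummies(df, columns=["someCol"], dummy_na=True, dtype=int)

# Reading 2: keep the missingness itself by blanking the whole dummy row.
masked = pd.get_dummies(df["someCol"], prefix="someCol", dtype=float)
masked.loc[df["someCol"].isna(), :] = np.nan

print(with_na)
print(masked)
```

The first reading records that a value was missing; the second leaves the row missing in every dummy column so that downstream imputation can still see it.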
I am trying to do a multi-level grouping, with the end result built into lists and dictionaries. For example, if my DataFrame looks like this: X Y Z AAAA BBBB CCCCC AAAA BBBB DDDDD AAAA BBBB EEEEE FFFF GGGG HHHHH FFFF GGGG IIIII JJJJ KKKK LLLLL I am trying to merge Y and ..
I would like to read only the CSV files created in the last 7 days from a directory into pandas and concatenate them into one big DataFrame. I have not been able to figure it out though. Here is what I have so far: from datetime import datetime, timedelta import pandas as pd import glob fileday = ..
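One possible approach, sketched under assumptions: file modification time (`st_mtime`) stands in for creation time, because creation time is not exposed consistently across platforms. The function name `read_recent_csvs` is hypothetical.

```python
from datetime import datetime, timedelta
from pathlib import Path

import pandas as pd


def read_recent_csvs(directory, days=7):
    """Concatenate the CSV files in `directory` modified in the last `days` days."""
    cutoff = (datetime.now() - timedelta(days=days)).timestamp()
    # Keep only files whose modification time is at or after the cutoff.
    recent = [
        path
        for path in sorted(Path(directory).glob("*.csv"))
        if path.stat().st_mtime >= cutoff
    ]
    if not recent:
        # pd.concat raises on an empty sequence, so return an empty frame.
        return pd.DataFrame()
    return pd.concat((pd.read_csv(path) for path in recent), ignore_index=True)
```

If the file names carry a date, filtering on the parsed name may be more reliable than filesystem timestamps, which change when files are copied.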
I have two separate columns in my dataframe: one has the month and the other has the year when the store received a competitor. What I am trying to do is join those columns and then subtract them from the date to get the values day by day. But running the code that I leave below, I ..
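The code and column names are cut off, so this is only a sketch with hypothetical columns `date`, `comp_month`, and `comp_year`. It assumes the competitor is treated as arriving on the first day of the given month.

```python
import pandas as pd

# Hypothetical sample data; the real column names are not shown in the question.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2020-01-31", "2020-03-01"]),
        "comp_month": [1, 2],
        "comp_year": [2020, 2020],
    }
)

# pd.to_datetime can assemble dates from a frame with year/month/day columns.
comp_since = pd.to_datetime(
    pd.DataFrame({"year": df["comp_year"], "month": df["comp_month"], "day": 1})
)

# Days elapsed between the competitor's arrival and each row's date.
df["days_since_comp"] = (df["date"] - comp_since).dt.days
print(df)
```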
I have multiple files containing unique ids. How do I write a program in python utilizing regex that will read the files, identify the unique ids and store them as ‘fileName’ and ‘ID’ in a separate variable/dataframe that will be output as a csv file? Source: Python..
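A rough sketch of the idea. The question does not show the ID format, so the pattern `ID-\d+` below is purely a placeholder, and `collect_ids` is a hypothetical helper name.

```python
import re
from pathlib import Path

import pandas as pd

# Placeholder pattern: the real ID format is not given in the question.
ID_PATTERN = re.compile(r"\bID-\d+\b")


def collect_ids(paths, pattern=ID_PATTERN):
    """Return a DataFrame with one row per distinct ID found in each file."""
    rows = []
    for path in map(Path, paths):
        text = path.read_text(encoding="utf-8")
        # dict.fromkeys drops duplicates while keeping first-seen order.
        for found in dict.fromkeys(pattern.findall(text)):
            rows.append({"fileName": path.name, "ID": found})
    return pd.DataFrame(rows, columns=["fileName", "ID"])
```

The result can then be written out with `collect_ids(paths).to_csv("ids.csv", index=False)`, where `ids.csv` is an arbitrary output name.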
I have the following code that prints the output to the console, but along with the follower and friend account names, I need the screen name of the user account. The following code works well for one screen_name, but I need to collect follower and following accounts for multiple users along with each user's screen name. for user in tweepy.Cursor(api.followers, screen_name="Billgates").items(): ..
I have a dataframe: event job_id_num JOB_START 12345 — — JOB_END 12345 RETOOL 99999 JOB_START 12346 — — JOB_END 12346 Between the JOB_START and JOB_END events there can be x number of rows depending on what job steps take place for a given job. The JOB_START and END events are marked with the job_id_number, but ..
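The question is cut off, so this sketch assumes the goal is to label every row between a JOB_START and its JOB_END with that job's id. The `STEP` event name and the `job` column are hypothetical stand-ins for the unmarked rows and the output.

```python
import pandas as pd

# Sample frame modelled on the question; STEP rows stand in for unmarked rows.
df = pd.DataFrame(
    {
        "event": ["JOB_START", "STEP", "JOB_END", "RETOOL",
                  "JOB_START", "STEP", "JOB_END"],
        "job_id_num": [12345, None, 12345, 99999, 12346, None, 12346],
    }
)

# A row is inside a job if more starts than (earlier) ends have been seen.
starts = df["event"].eq("JOB_START").cumsum()
ends = df["event"].eq("JOB_END").cumsum().shift(fill_value=0)
in_job = starts > ends

# Carry the id from each JOB_START forward, then blank rows outside any job.
start_ids = df["job_id_num"].where(df["event"].eq("JOB_START"))
df["job"] = start_ids.ffill().where(in_job)
print(df)
```

Rows that fall outside any job, such as RETOOL here, are left as `NaN` in the new column.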
My graph doesn't update. When I try to use y_sel and x_sel in the plot, it throws an error. Sometimes it is a ValueError for 'Countries', and sometimes it is something like too many inputs for one value. ydata = online_act_major.loc[:, ['socionet', 'info_good_serv', 'banking', 'selling', 'news', 'learn_act']] xdata = online_act_major.index.values # gca stands for 'get current axis' dropdown ..