What's a simple and efficient way to shuffle a dataframe in pandas, by rows or by columns? That is, how do I write a function shuffle(df, n, axis=0) that takes a dataframe, a number of shuffles n, and an axis (axis=0 is rows, axis=1 is columns) and returns a copy of the dataframe that has been shuffled n times?

Edit: the key is to do this without destroying the row/column labels of the dataframe. If you just shuffle df.index, that loses all that information. I want the resulting df to be the same as the original except with the order of rows or order of columns different.

Edit 2: My question was unclear. When I say shuffle the rows, I mean shuffle each row independently. So if you have two columns a and b, I want each row shuffled on its own, so that you don't have the same associations between a and b as you do if you just re-order each row as a whole. Something like looping for 1...n, but hopefully more efficient than naive looping.

This does not work for me:

    def shuffle(df, n, axis=0):
        ...
        shuffled_df.apply(np.random.shuffle(shuffled_df.values), axis=axis)
        ...

    df = pandas.DataFrame()

Compare %timeit df.apply(..., axis=1) with a loop over views of the underlying array, for either axis:

    for view in numpy.rollaxis(df.values, 1):
        ...

    for view in numpy.rollaxis(df.values, 0):
        ...

Note that numpy.rollaxis brings the specified axis to the first dimension and then lets us iterate over arrays with the remaining dimensions. That is, if we want to shuffle along the first dimension (columns), we need to roll the second dimension to the front, so that we apply the shuffling to views over the first dimension.

    Out: (10, 2)  # we can iterate over 10 arrays with shape (2,) (rows)
    Out: (2, 10)  # we can iterate over 2 arrays with shape (10,) (columns)

Your final function then uses a trick to bring the result in line with the expectation for applying a function to an axis:

    def shuffle(df, n=1, axis=0):
        ...
        axis = int(not axis)  # pandas.DataFrame is always 2D
        ...
        for view in numpy.rollaxis(df.values, axis):
            ...

I know the question is for a pandas df, but when the shuffle occurs by row (column order changed, row order unchanged), the column names no longer matter and it could be interesting to use an np.array instead. In that case np.apply_along_axis() will be what you are looking for. If that is acceptable, then this would be helpful; note that it is easy to switch the axis along which the data is shuffled.

If your pandas dataframe is named df, maybe you can first get the values of the dataframe with values = df.values, then shuffle the np.array by row or column (for example with np.apply_along_axis()), and finally recreate a new (shuffled) pandas df from the shuffled np.array.
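As a rough illustration of the shuffle(df, n, axis) idea discussed in this thread, here is a minimal sketch. It is one possible reading, not the exact code from any of the answers. It assumes every column shares one dtype, and it shuffles a private copy of the values rather than df.values itself, because df.values is not guaranteed to be a writable view of the frame. The rng parameter is a hypothetical addition for reproducibility and is not part of the signature in the question.

```python
import numpy as np
import pandas as pd


def shuffle(df, n=1, axis=0, rng=None):
    """Return a copy of df with its values shuffled n times along axis.

    axis=0 shuffles the values inside each column independently,
    axis=1 shuffles the values inside each row independently.
    Row and column labels are left untouched.
    """
    # rng is an illustrative extra parameter (assumption, not in the question)
    rng = np.random.default_rng() if rng is None else rng
    # Assumption: homogeneous dtype; copy so the input frame is never modified
    values = df.to_numpy(copy=True)
    roll = int(not axis)  # a DataFrame is always 2D, so this flips 0 <-> 1
    for _ in range(n):
        # Rolling the other axis to the front lets us iterate over
        # 1-D views that run along the axis we want to shuffle.
        for view in np.rollaxis(values, roll):
            rng.shuffle(view)  # in place, writes through to `values`
    return pd.DataFrame(values, index=df.index, columns=df.columns)


df = pd.DataFrame({"A": range(10), "B": range(10, 20)})
shuffled = shuffle(df, n=5, axis=0, rng=np.random.default_rng(0))
print(shuffled)
```

With axis=0 each column keeps its own set of values but in a new order, so the pairing between A and B in a given row is broken, which is what Edit 2 of the question asks for. With axis=1 the values are permuted within each row instead.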