Should group be an optional input to e.g. mean?
There are currently 3 group functions: group_ranking, group_mean, group_median. But there are many more we could add: group_zscore, group_sum, group_std, etc.
Instead of filling up the namespace, would it be better to get rid of the group methods and instead add group as an optional input to the corresponding non-group methods? For example, we could get rid of group_mean and the signature for mean could become:
larry.mean(self, axis=None, group=None)
One implementation detail is that the group methods do not currently allow an axis argument. That would have to be fixed.
Is this design better or worse?
Another possibility is to add group_reduce as an optional input. For example if y is N by T and there are G groups, then
larry.mean(self, axis=None, group=group, group_reduce=True)
would be G by T.
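To make the shapes concrete, here is a minimal NumPy-only sketch of what group_reduce could mean. It is not larry code: the standalone function group_mean below, its return values, and the choice to ignore NaNs are all assumptions made for illustration.

```python
import numpy as np

def group_mean(y, group, group_reduce=False):
    """Hypothetical sketch: per-group mean over the rows of a 2d array y (N by T).

    group is a length-N sequence of labels. NaNs are not handled.
    """
    y = np.asarray(y, dtype=float)
    labels, inverse = np.unique(np.asarray(group), return_inverse=True)
    inverse = inverse.ravel()
    # Accumulate the rows of each group, then divide by the group sizes.
    sums = np.zeros((labels.size, y.shape[1]))
    np.add.at(sums, inverse, y)
    counts = np.bincount(inverse, minlength=labels.size)
    means = sums / counts[:, None]
    if group_reduce:
        # One row per group: G by T, plus the labels that name the rows.
        return labels, means
    # Same shape as the input: each row is replaced by its group's mean.
    return means[inverse]
```

With group_reduce=False the result stays N by T, so it lines up with the original labels. With group_reduce=True the row labels change from the N original labels to the G group labels, which is why this sketch returns the labels as well.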
Groups are not a central feature of larry. So I don't know how I feel about cluttering up the signatures of a lot of larry methods with group options.
Another design alternative: remove all group methods from larry, convert them to functions, and place them in a module called group. For example:
la.group.mean(lar, group, axis=None, group_reduce=False)
see also: extend-
Blueprint information
- Status: Not started
- Approver: None
- Priority: Undefined
- Drafter: None
- Direction: Needs approval
- Assignee: None
- Definition: New
- Series goal: None
- Implementation: Unknown
- Milestone target: None
Whiteboard
Methods like mean and demean are not so complicated that an additional keyword would be confusing.
Some statistical packages (which?) have a "by" or "groupby" keyword,
e.g.
lar.demean(axis=1, groupby=None)
lar.demean(axis=1, groupby=
The default groupby=None means no groups.
The groupby larry is checked for dimension: it must be either 1d or the same as the larry (see the other blueprint on enhancements for changing group membership).
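A rough sketch of the groupby keyword on plain NumPy arrays, just to pin down one possible behaviour. The function demean below is a stand-in, not the larry method. It assumes that a 1d groupby labels the elements along axis and that each element is demeaned by the mean of its group along that axis. The case where groupby has the same dimension as the larry is left out.

```python
import numpy as np

def demean(x, axis=1, groupby=None):
    """Hypothetical sketch: subtract the mean along axis, optionally by group."""
    x = np.asarray(x, dtype=float)
    if groupby is None:
        # No groups: ordinary demean along the axis.
        return x - x.mean(axis=axis, keepdims=True)
    groupby = np.asarray(groupby)
    # Dimension check: only the 1d case is sketched here.
    if groupby.ndim != 1 or groupby.shape[0] != x.shape[axis]:
        raise ValueError("groupby must be 1d with one label per element along axis")
    out = np.empty_like(x)
    # Views with the grouped axis first, so a boolean mask picks group members.
    xv = np.moveaxis(x, axis, 0)
    ov = np.moveaxis(out, axis, 0)
    for label in np.unique(groupby):
        mask = groupby == label
        ov[mask] = xv[mask] - xv[mask].mean(axis=0)
    return out
```

The loop over labels is the simple version. A specialised implementation could likely avoid it, but that is outside this sketch.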
--------
A generic function like
apply_along_
might help. The attr in your example would be 'demean'. Or
apply_along_
We could then use apply_along_groups for all methods that only take an axis. We could add a **kwargs input to pass to lar.method for methods, like movingsum, that take more input than just axis.
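Something along these lines is what a generic version might look like on plain NumPy arrays. The signature is a guess, since none is settled above: arr, group, attr, axis and **kwargs are all assumed names. To keep it short, the sketch only covers reducing ndarray methods (mean, sum, std, ...) standing in for larry methods. Methods that keep the shape, like demean, would need the pieces put back in their original positions instead of stacked.

```python
import numpy as np

def apply_along_groups(arr, group, attr, axis=0, **kwargs):
    """Hypothetical sketch: call the reducing method named attr on each group.

    group is a 1d sequence with one label per element along axis.
    Extra keyword arguments are passed through to the method.
    """
    arr = np.asarray(arr, dtype=float)
    group = np.asarray(group)
    labels = np.unique(group)
    pieces = []
    for label in labels:
        # Select the members of this group along the chosen axis.
        sub = np.compress(group == label, arr, axis=axis)
        # Look the method up by name, e.g. 'mean' -> sub.mean(axis=axis).
        pieces.append(getattr(sub, attr)(axis=axis, **kwargs))
    # One slice per group along axis, in the order given by labels.
    return labels, np.stack(pieces, axis=axis)
```

For example, apply_along_groups(arr, group, 'std', ddof=1) passes ddof through to the method, which is the role **kwargs would play for methods that take more than just axis.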
Would that work?
---
I think this should work and would be very flexible. Do you want it as a substitute for a groupby keyword, or for the generic implementation?
I would like the most common groupby methods to be available directly, e.g. mean, demean, and maybe sum and count.
I expect that some of the implementations for specific groupby methods can be made faster than a generic function.