Sorter
class bonobo_trans.sorter.Sorter(*args, **kwargs)
The Sorter transformation sorts rows and can de-duplicate data.
Description of the options:
- keys_sort
The keys_sort option is a dictionary whose keys refer to the keys in the incoming row and whose values give the sort direction, ascending or descending. Direction can be one of the following:
- 'ASC', 'ASCENDING', True, 1
- 'DESC', 'DESCENDING', False, any number except 1
Example:
{'year':'ASC', 'month':'DESC', 'day':'ASC'}
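The direction rules and the multi-key ordering described above can be sketched in plain Python. This is only an illustration, not the Sorter's actual implementation: the helper names `is_ascending` and `sort_rows` are hypothetical, and rejecting unknown strings with a `ValueError` is an assumption.

```python
from operator import itemgetter


def is_ascending(direction):
    """Hypothetical helper: True for an ascending direction, False for descending."""
    if isinstance(direction, str):
        if direction in ("ASC", "ASCENDING"):
            return True
        if direction in ("DESC", "DESCENDING"):
            return False
        # Assumption: any other string is treated as an error.
        raise ValueError("unknown sort direction: %r" % direction)
    # True == 1 in Python, so this covers both True and 1.
    # False and any number other than 1 are descending.
    return direction == 1


def sort_rows(rows, keys_sort):
    """Sort a list of row dicts by several keys with mixed directions."""
    result = list(rows)
    # Python's sort is stable, so sorting by the least significant key first
    # and the most significant key last yields the combined ordering.
    for key, direction in reversed(list(keys_sort.items())):
        result.sort(key=itemgetter(key), reverse=not is_ascending(direction))
    return result


rows = [
    {"year": 2019, "month": 1, "day": 2},
    {"year": 2019, "month": 2, "day": 1},
    {"year": 2018, "month": 5, "day": 3},
]
ordered = sort_rows(rows, {"year": "ASC", "month": "DESC", "day": "ASC"})
```

Within 2019 the months come out in descending order, while the years themselves ascend.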
- name
- Name of the transformation. Mainly used for identification in logging.
- distinct, keys_dedup
The sorter transformation allows for removal of duplicate rows. There are different strategies to choose from:
Values for distinct:
- SRT_DUP_KEEP: Don’t remove duplicates
- SRT_DUP_DISTINCT_ROW: Remove identical rows
- SRT_DUP_KEY_FIRST: Remove duplicate key, keep first
- SRT_DUP_KEY_LAST: Remove duplicate key, keep last
By default duplicates are not removed (SRT_DUP_KEEP).
SRT_DUP_DISTINCT_ROW
Remove identical rows. This is similar to the SQL “DISTINCT” keyword. This setting removes rows in which all values are identical to those of another row.
SRT_DUP_KEY_FIRST, SRT_DUP_KEY_LAST
Remove rows that have duplicate keys. This behaviour is akin to an aggregator’s FIRST and LAST functions: of the rows that share a key, only the first or the last is kept.
You can specify the de-duplication key as a subset of the sort key using the keys_dedup option. It accepts a list of keys (str). If you don’t specify keys_dedup, the first row will be kept, but this gives you less control and certainty, because the result depends on the order in which the rows enter this transformation.
Example:
'distinct' = SRT_DUP_KEY_FIRST
'keys_sort' = {'year':'ASC', 'month':'ASC', 'day':'ASC'}
'keys_dedup' = ['year', 'month']
Input rows:
2019,02,15,'Friday'
2019,02,16,'Saturday'
2019,02,17,'Sunday'
Output rows:
2019,02,15,'Friday'
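The de-duplication strategies can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the Sorter's own code: the functions `distinct_rows` and `dedup_by_key` are hypothetical, the `keep` argument ("first" or "last") stands in for SRT_DUP_KEY_FIRST and SRT_DUP_KEY_LAST, and the rows are assumed to be already sorted and to hold hashable values.

```python
def distinct_rows(rows):
    """Drop rows that are identical in every key and value (like SQL DISTINCT)."""
    seen = set()
    result = []
    for row in rows:
        # A hashable fingerprint of the whole row; assumes hashable values.
        fingerprint = tuple(sorted(row.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            result.append(row)
    return result


def dedup_by_key(sorted_rows, keys_dedup, keep="first"):
    """Keep one row per de-duplication key: the first or the last one seen."""
    kept = {}
    for row in sorted_rows:
        dedup_key = tuple(row[name] for name in keys_dedup)
        if keep == "first":
            kept.setdefault(dedup_key, row)  # ignore later rows with this key
        else:
            kept[dedup_key] = row  # later rows replace earlier ones
    return list(kept.values())


# The rows from the example above, already sorted by year, month, day.
rows = [
    {"year": 2019, "month": 2, "day": 15, "weekday": "Friday"},
    {"year": 2019, "month": 2, "day": 16, "weekday": "Saturday"},
    {"year": 2019, "month": 2, "day": 17, "weekday": "Sunday"},
]
first = dedup_by_key(rows, ["year", "month"], keep="first")
last = dedup_by_key(rows, ["year", "month"], keep="last")
```

With the key ['year', 'month'] all three rows share one key, so `first` holds only the Friday row and `last` only the Sunday row.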
- case_sensitive
- TODO!
- null_is_last
- This option determines whether None/Null values appear at the top or at the bottom of the sorted output. The default is True, which places None values at the bottom.
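One common way to get this behaviour in plain Python is to sort on a (flag, value) tuple, so that None is never compared directly with a real value. The sketch below is an illustration only, for a single key in ascending order; the function name `sort_with_nulls` is hypothetical and does not come from bonobo_trans.

```python
def sort_with_nulls(rows, key, null_is_last=True):
    """Sort row dicts ascending on one key, placing None values last or first."""

    def sort_key(row):
        value = row[key]
        is_null = value is None
        # False sorts before True, so the flag decides which group comes first.
        flag = is_null if null_is_last else not is_null
        # Use a constant in place of None so nulls never compare with values.
        return (flag, 0 if is_null else value)

    return sorted(rows, key=sort_key)


rows = [{"v": 3}, {"v": None}, {"v": 1}]
nulls_last = sort_with_nulls(rows, "v")
nulls_first = sort_with_nulls(rows, "v", null_is_last=False)
```

Here `nulls_last` orders the values as 1, 3, None, and `nulls_first` as None, 1, 3.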
- ToDo:
- [Q] (How) could we create a Deduplicator Class transformation as subclass of the sorter? Would that be nice?
- Args:
- d_row_in (dict)
d_row_in is a dictionary containing row data.
- Returns:
- d_row_out (dict)
d_row_out contains all the keys of the incoming dictionary without any changes or additions.
Only the order of the rows will change.
Parameters:
- keys_sort (dict)
- name (str)
- distinct (int)
- keys_dedup (list)
- case_sensitive (bool)
- null_is_last (bool)