Source

class bonobo_trans.source.DbSource(*args, **kwargs)

The DbSource transformation extracts data from a SQL database.

Note

Database connectivity is provided by SQLAlchemy. The engine must be provided via a Bonobo Service, containing an ‘sqlalchemy.engine’.

Tip

From a Bonobo point of view this transformation is not a true “extractor”, the tranformation allows (but does not require) to pass one(!) row to it. The keys of this rows will be appended to all outgoing rows.

Configuration options

Required:

  • table_name (str)

Optional:

  • sql_select (str or sqlalchemy.sql)
  • sql_pre (str or sqlalchemy.sql)
  • sql_filter (str or sqlalchemy.where)
  • bp_select (dict)
  • bp_pre_sql (dict)
  • distinct (bool)
  • ordered_cols (int or list of str)
  • verbose_sql (bool), default: False
  • verbose_data (bool), default: False
  • streaming (bool), default: False
  • row_counters (bool), default: True
  • keep_alive (bool), default: False

Description of the selection options:

sql_select, sql_pre
Either provide a ‘raw’ SQL statement or a SQLAlchemy sql object. All columns and records will be selected if no selection is provided.
sql_filter

Source filters can be specified in the sql_select parameter and/or in the sql_filter parameter. (Additional) filters specified in the sql_filter will be appended to the where clause (AND?).

Important

It is not possible to specify an sql_filter when the sql_select contains an ORDER BY clause!

distinct
When True, applies the SQL distinct to source query. Not available if the sql_select is a plain string.
ordered_cols

An ORDER BY should ideally be specified via the ordered_cols option rather than via an “ORDER BY”-clause in the sql_select option. The ordered_cols parameter can take any of the following formats:

  • a list of columns (str)
  • a list of column positions (int)
  • an integer specifying how many columns are sorted, eg. “3”, would order by the first three columns.

Important

When using the ordered_cols, you cannot already have an order by clause in the sql_select.

bp_select, bp_pre_sql

Bind parameters for the selection and pre-sql statements.

Attention

Bind parameters must be provided using a dictionary. The format for this is: {'parameter': 'value'}

The name of the parameter depends on the your wether you’re using a ‘raw’ SQL statement or an SQLAlchemy.sql object.

Raw SQL:

For ‘raw’ SQL, bound parameters are specified by name. For example:

sql_select = 'SELECT * FROM users WHERE id=:user_id'
bp_select = { 'user_id':555 }
SQLAlchemy:

For an sqlalchemy.sql object, bound parameters can be specified in various different manners. Make sure the key in the dictionary matches. For example:

sql_select = select([users]).where(users.c.id == bindparam('user_id'))
bp_select = { 'user_id':555 }
name
Name of the transformation, for identification and logging purposes.
verbose_sql
If ‘verbose_sql’ is True, the generated SQL statement will be printed.
verbose_data
If ‘verbose_data’ is True, every outgoing row is printed.
streaming
If ‘streaming’ is False, the query is executed and fetched. Only after it is received completely will it start ‘yielding’ the data. This enables use of the __ROW_NR_TOTAL__ field. If this setting is True, a total row count will not be available. Set to True for large datasets. Set to False to use the total row count, or to avoid reading and writing the same database table.
row_counters
If ‘row_counters’ is True, the additional keys __ROW_NR__ and __ROW_NR_TOTAL__ will be added to the outgoing row.
keep_alive
If ‘keep_alive’ is True it will keep the connection open. This might block access on single user databases like SQLite.
Args:
  • d_row_in (dict)
Returns:
  • d_row_out (dict)

d_row_out is a dictionary containing all the keys of the incoming dictionary plus keys (columns) of the selected data. Any keys in the incoming record that are identical to selected columns are overwritten.

Additionally the following special keys may be added:

  • __ROW_NR__ (int) Row counter.
  • __ROW_NR_TOTAL__ (int) Total number of selected rows.
  • __ORDERED__ (bool) Data is sorted (not implemented yet).
Parameters:
  • engine (str) –
  • table_name (str) –
  • sql_select
  • sql_pre
  • sql_filter
  • bp_select (dict) –
  • bp_pre_sql (dict) –
  • distinct (bool) –
  • ordered_cols
  • name (str) –
  • verbose_sql (bool) –
  • verbose_data (bool) –
  • streaming (bool) –
  • row_counters (bool) –
  • keep_alive (bool) –