Python API Reference¶

This page gives the Python API reference of xgboost. Please also refer to the Python Package Introduction for more information about the Python package.

Core Data Structure
Learning API

Core Data Structure¶

Core XGBoost Library.

class securexgboost.DMatrix(data, encrypted=False, label=None, missing=None, weight=None, silent=False, feature_names=None, feature_types=None, nthread=None)¶

Bases: object

Data Matrix used in XGBoost.

DMatrix is an internal data structure used by XGBoost, optimized for both memory efficiency and training speed. You can construct a DMatrix from numpy arrays.

Parameters:

data (string/numpy.array/scipy.sparse/pd.DataFrame/dt.Frame) – Data source of DMatrix. When data is a string, it is the path to a libsvm-format text file or to a binary file that xgboost can read.
label (list or numpy 1-D array, optional) – Label of the training data.
missing (float, optional) – Value in the data to be treated as a missing value. If None, defaults to np.nan.
weight (list or numpy 1-D array, optional) – Weight for each instance.

Note

For ranking tasks, weights are per-group.

In a ranking task, one weight is assigned to each group (not each data point). This is because we only care about the relative ordering of data points within each group, so it doesn’t make sense to assign weights to individual data points.
silent (boolean, optional) – Whether to print messages during construction.
feature_names (list, optional) – Set names for features.
feature_types (list, optional) – Set types for features.
nthread (integer, optional) – Number of threads to use for loading data from numpy array. If -1, uses maximum threads available on the system.
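When data is a string, it points at a libsvm-format text file. The following is a minimal sketch of writing such a file with the standard library, assuming the usual `label index:value ...` line layout. The helper names, the feature indices, and the file name are illustrative only and are not part of securexgboost.

```python
import os
import tempfile


def to_libsvm_line(label, features):
    """Format one row as 'label index:value ...' (libsvm text format).

    `features` maps a feature index to its value. Indices are written in
    ascending order, and indices that are absent are simply left out.
    """
    pairs = " ".join(f"{idx}:{val}" for idx, val in sorted(features.items()))
    return f"{label} {pairs}".rstrip()


def write_libsvm(path, rows):
    """Write (label, features) rows to `path`, one libsvm line per row."""
    with open(path, "w") as handle:
        for label, features in rows:
            handle.write(to_libsvm_line(label, features) + "\n")


# Two made-up rows, used only to show the layout.
rows = [(1, {1: 0.5, 4: 2.0}), (0, {2: 1.5})]
path = os.path.join(tempfile.mkdtemp(), "train.libsvm")
write_libsvm(path, rows)

# Hypothetical follow-up, assuming securexgboost is installed and that the
# constructor behaves as documented above:
#   import securexgboost as xgb
#   dtrain = xgb.DMatrix(path)
```

The resulting path is what would be passed as the data argument.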

feature_names¶

Get feature names (column labels).

Returns:	feature_names
Return type:	list or None

feature_types¶

Get feature types (column types).

Returns:	feature_types
Return type:	list or None

num_col()¶

Get the number of columns (features) in the DMatrix.

Returns:	number of columns
Return type:	int

num_row()¶

Get the number of rows in the DMatrix.

Returns:	number of rows
Return type:	int

class securexgboost.Booster(params=None, cache=(), model_file=None)¶

Bases: object

A Booster of XGBoost.

Booster is the model of xgboost; it contains low-level routines for training, prediction, and evaluation.

Parameters:

params (dict) – Parameters for boosters.
cache (list) – List of cache items.
model_file (string) – Path to the model file.

eval(data, name='eval', iteration=0)¶

Evaluate the model on data.

Parameters:

data (DMatrix) – The dmatrix storing the input.
name (str, optional) – The name of the dataset.
iteration (int, optional) – The current iteration number.

Returns:	result – Evaluation result string.
Return type:	str

eval_set(evals, iteration=0, feval=None)¶

Evaluate a set of data.

Parameters:

evals (list of tuples (DMatrix, string)) – List of items to be evaluated.
iteration (int) – Current iteration.
feval (function) – Custom evaluation function.

Returns:	result – Evaluation result string.
Return type:	str
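The evaluation result comes back as a single string. The sketch below parses such a string under the assumption that it follows the upstream XGBoost layout, `[iteration]<TAB>name-metric:value<TAB>...`. This page does not confirm that layout for securexgboost, and `parse_eval_line` is a hypothetical helper.

```python
import re


def parse_eval_line(line):
    """Split '[iter]<TAB>name-metric:value ...' into (iteration, scores).

    `scores` maps (dataset name, metric name) to a float. The dataset name
    is taken to end at the first '-' in each field.
    """
    fields = line.strip().split("\t")
    match = re.fullmatch(r"\[(\d+)\]", fields[0])
    if match is None:
        raise ValueError(f"unexpected iteration field: {fields[0]!r}")
    scores = {}
    for field in fields[1:]:
        key, value = field.rsplit(":", 1)
        name, metric = key.split("-", 1)
        scores[(name, metric)] = float(value)
    return int(match.group(1)), scores


# Made-up result string in the assumed layout.
iteration, scores = parse_eval_line("[3]\teval-logloss:0.357756\ttrain-logloss:0.35953")
```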

predict(data, output_margin=False, ntree_limit=0, pred_leaf=False, pred_contribs=False, approx_contribs=False, pred_interactions=False, validate_features=True)¶

Predict with data.

Note

This function is not thread safe.

For each booster object, predict can only be called from one thread. If you want to run prediction using multiple threads, call bst.copy() to make copies of the model object and then call predict() on each copy.
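The copy-per-thread pattern can be sketched as follows. `StubBooster` is a stand-in written for this example so that it runs without an enclave; it is not the securexgboost class, and only its copy() and predict() methods mirror the names used above.

```python
from concurrent.futures import ThreadPoolExecutor


class StubBooster:
    """Stand-in for a trained Booster; NOT the securexgboost class."""

    def __init__(self, offset):
        self.offset = offset

    def copy(self):
        return StubBooster(self.offset)

    def predict(self, rows):
        return [value + self.offset for value in rows]


def predict_in_threads(bst, batches):
    """Give every batch its own model copy so no object is shared by threads."""
    jobs = [(bst.copy(), batch) for batch in batches]
    with ThreadPoolExecutor(max_workers=max(1, len(jobs))) as pool:
        # pool.map keeps the results in the same order as `jobs`.
        return list(pool.map(lambda job: job[0].predict(job[1]), jobs))


results = predict_in_threads(StubBooster(1), [[1, 2], [3]])
```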

Note

Using predict() with DART booster

If the booster object is DART type, predict() will perform dropouts, i.e. only some of the trees will be evaluated. This will produce incorrect results if data is not the training data. To obtain correct results on test sets, set ntree_limit to a nonzero value, e.g.

preds = bst.predict(dtest, ntree_limit=num_round)

Parameters:

data (DMatrix) – The dmatrix storing the input.
output_margin (bool) – Whether to output the raw untransformed margin value.
ntree_limit (int) – Limit number of trees in the prediction; defaults to 0 (use all trees).
pred_leaf (bool) – When this option is on, the output will be a matrix of (nsample, ntrees) with each record indicating the predicted leaf index of each sample in each tree. Note that the leaf index of a tree is unique per tree, so you may find leaf 1 in both tree 1 and tree 0.
pred_contribs (bool) – When this is True the output will be a matrix of size (nsample, nfeats + 1) with each record indicating the feature contributions (SHAP values) for that prediction. The sum of all feature contributions is equal to the raw untransformed margin value of the prediction. Note the final column is the bias term.
approx_contribs (bool) – Approximate the contributions of each feature
pred_interactions (bool) – When this is True the output will be a matrix of size (nsample, nfeats + 1, nfeats + 1) indicating the SHAP interaction values for each pair of features. The sum of each row (or column) of the interaction values equals the corresponding SHAP value (from pred_contribs), and the sum of the entire matrix equals the raw untransformed margin value of the prediction. Note the last row and column correspond to the bias term.
validate_features (bool) – When this is True, validate that the Booster’s and data’s feature_names are identical. Otherwise, it is assumed that the feature_names are the same.
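The relationships stated for pred_contribs and pred_interactions can be checked on a toy example. The numbers below are made up for a single sample with two features and do not come from a real model; the helper names are hypothetical.

```python
def contribs_from_interactions(interactions):
    """Row sums of one sample's (nfeats + 1) x (nfeats + 1) interaction matrix."""
    return [sum(row) for row in interactions]


def margin_from_contribs(contribs):
    """Sum of the feature contributions plus the bias term (the final entry)."""
    return sum(contribs)


# Made-up, symmetric interaction values for one sample with two features.
# The last row and column stand for the bias term.
interactions = [
    [0.50, 0.25, 0.0],
    [0.25, -1.00, 0.0],
    [0.0, 0.0, 0.125],
]

contribs = contribs_from_interactions(interactions)  # length nfeats + 1
margin = margin_from_contribs(contribs)              # raw untransformed margin
total = sum(sum(row) for row in interactions)        # sum of the whole matrix
```

Here the row sums give the per-feature contributions, and both the sum of the contributions and the sum of the whole matrix give the same margin.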

Returns:

prediction (numpy array)
num_preds (number of predictions)

set_param(params, value=None)¶

Set parameters into the Booster.

Parameters:

params (dict/list/str) – List of (key, value) pairs, dict of key to value, or simply a str key.
value (optional) – Value of the specified parameter, when params is a str key.
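The three accepted shapes of params can be reduced to one list of pairs. This is a sketch of that normalization, not the library's own code; `normalize_params` is a hypothetical helper, and raising on a str key without a value is a choice made for this example.

```python
from collections.abc import Mapping


def normalize_params(params, value=None):
    """Return a list of (key, value) pairs for any of the three accepted forms."""
    if isinstance(params, Mapping):
        return list(params.items())
    if isinstance(params, str):
        if value is None:
            raise ValueError("a str key needs an accompanying value")
        return [(params, value)]
    return [tuple(pair) for pair in params]


from_dict = normalize_params({"eta": 0.1})
from_key = normalize_params("eta", 0.1)
from_list = normalize_params([("eta", 0.1), ("max_depth", 3)])
```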

update(dtrain, iteration, fobj=None)¶

Update for one iteration, with objective function calculated internally. This function should not be called directly by users.

Parameters:

dtrain (DMatrix) – Training data.
iteration (int) – Current iteration number.
fobj (function) – Customized objective function.

class securexgboost.Enclave(enclave_image=None, flags=3, create_enclave=True, log_verbosity=0)¶

Bases: object

An enclave.

A trusted execution environment used for secure XGBoost.

get_remote_report_with_pubkey()¶

Get remote attestation report and public key of enclave

get_report_attrs()¶

Get the enclave public key and remote report

To be called by the RPC service

Must be called after get_remote_report_with_pubkey() is called

Returns:

pem_key (proto.NDArray)
key_size (int)
remote_report (proto.NDArray)
remote_report_size (int)

set_report_attrs(pem_key, key_size, remote_report, remote_report_size)¶

Set the enclave public key and remote report

To be used by RPC client during verification

verify_remote_report_and_set_pubkey()¶

Verify the received attestation report and set the public key
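The ordering constraints among the Enclave methods can be sketched as one function. It assumes an RPC service holds one Enclave object and a client holds another; the RPC transport itself is left out. `relay_attestation` is a hypothetical helper, and the `Mock` objects only stand in for real enclaves so that the sketch runs anywhere.

```python
from unittest.mock import Mock


def relay_attestation(server_enclave, client_enclave):
    """Move the enclave's public key and report from the service to a client."""
    # Service side: the report must exist before its attributes are read.
    server_enclave.get_remote_report_with_pubkey()
    pem_key, key_size, remote_report, remote_report_size = (
        server_enclave.get_report_attrs()
    )

    # Client side: install the received values, then verify them.
    client_enclave.set_report_attrs(
        pem_key, key_size, remote_report, remote_report_size
    )
    client_enclave.verify_remote_report_and_set_pubkey()


# Stand-ins with made-up return values; NOT real securexgboost enclaves.
server, client = Mock(), Mock()
server.get_report_attrs.return_value = (b"pem", 3, b"report", 6)
relay_attestation(server, client)
```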

class securexgboost.CryptoUtils¶

Bases: object

Crypto utils class

add_client_key(data, data_len, signature, sig_len)¶

Add the client symmetric key used to encrypt client files

Parameters:

data (proto.NDArray) – key used to encrypt client files
data_len (int) – length of data
signature (proto.NDArray) – signature over data, signed with client private key
sig_len (int) – length of signature

decrypt_predictions(key, encrypted_preds, num_preds)¶

Decrypt encrypted predictions

Parameters:

key (byte array) – key used to encrypt client files
encrypted_preds (c_char_p) – encrypted predictions
num_preds (int) – number of predictions

Returns:	preds – plaintext predictions
Return type:	numpy array
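Booster.predict() returns a prediction object together with num_preds, and decrypt_predictions() takes exactly such a pair plus the key. A sketch of chaining the two follows, under the assumption (not stated on this page) that predict() hands back its predictions in encrypted form. `predict_plaintext` is a hypothetical helper, and the `Mock` objects with made-up values replace the real Booster and CryptoUtils.

```python
from unittest.mock import Mock


def predict_plaintext(bst, dtest, crypto, key):
    """Run predict() and pass both of its return values to decrypt_predictions()."""
    encrypted_preds, num_preds = bst.predict(dtest)
    return crypto.decrypt_predictions(key, encrypted_preds, num_preds)


# Stand-ins with made-up values; NOT real securexgboost objects.
bst = Mock()
bst.predict.return_value = (b"\x00\x01", 2)
crypto = Mock()
crypto.decrypt_predictions.return_value = [0.25, 0.75]

preds = predict_plaintext(bst, "dtest", crypto, b"key")
```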

encrypt_data_with_pk(data, data_len, pem_key, key_size)¶

Parameters:

data (byte array) –
data_len (int) –
pem_key (proto) –
key_size (int) –

Returns:

encrypted_data (proto.NDArray)
encrypted_data_size_as_int (int)

encrypt_file(input_file, output_file, key_file)¶

Encrypt a file

Parameters:

input_file (str) – path to file to be encrypted
output_file (str) – path to which encrypted file will be saved
key_file (str) – path to key used to encrypt file

generate_client_key(path_to_key)¶

Generate a new key and save it to path_to_key

Parameters:	path_to_key (str) – path to which key will be saved
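Generating a key and encrypting a file with it are naturally done back to back. The sketch below assumes a CryptoUtils-like object is passed in; `prepare_encrypted_file` and the file names are hypothetical, and a `Mock` replaces the real object so that no key or file is touched.

```python
from unittest.mock import Mock


def prepare_encrypted_file(crypto, input_file, output_file, key_file):
    """Create a fresh key at key_file, then encrypt input_file with it."""
    crypto.generate_client_key(key_file)
    crypto.encrypt_file(input_file, output_file, key_file)
    return output_file


# Stand-in for a CryptoUtils object; the file names are illustrative only.
crypto = Mock()
encrypted_path = prepare_encrypted_file(crypto, "train.csv", "train.enc", "key.txt")
```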

sign_data(keyfile, data, data_size)¶

Parameters:

keyfile (str) –
data (proto.NDArray) –
data_size (int) –

Returns:

signature (proto.NDArray)
sig_len_as_int (int)
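The return values of encrypt_data_with_pk() and sign_data() line up with the arguments of add_client_key(). One plausible way to chain them is sketched below. It assumes that the symmetric key is first encrypted under the enclave public key and that the signature is computed over that encrypted key; this page does not spell out either step. `upload_client_key` and the key file name are hypothetical, and the `Mock` objects return made-up values.

```python
from unittest.mock import Mock


def upload_client_key(enclave, crypto, sym_key, private_key_file):
    """Encrypt sym_key for the enclave, sign the result, and register it."""
    # Only the public key and its size are needed from the report attributes.
    pem_key, key_size, _, _ = enclave.get_report_attrs()
    enc_key, enc_key_size = crypto.encrypt_data_with_pk(
        sym_key, len(sym_key), pem_key, key_size
    )
    signature, sig_len = crypto.sign_data(private_key_file, enc_key, enc_key_size)
    crypto.add_client_key(enc_key, enc_key_size, signature, sig_len)


# Stand-ins with made-up return values; NOT real securexgboost objects.
enclave = Mock()
enclave.get_report_attrs.return_value = (b"pem", 3, b"report", 6)
crypto = Mock()
crypto.encrypt_data_with_pk.return_value = (b"enc", 3)
crypto.sign_data.return_value = (b"sig", 3)

upload_client_key(enclave, crypto, b"0123456789abcdef", "private.pem")
```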

Learning API¶

Training Library containing training routines.

securexgboost.train(params, dtrain, num_boost_round=10, evals=(), early_stopping_rounds=None, evals_result=None, verbose_eval=True, callbacks=None, learning_rates=None)¶

Train a booster with given parameters.

Parameters:

params (dict) – Booster params.
dtrain (DMatrix) – Data to be trained.
num_boost_round (int) – Number of boosting iterations.
evals (list of pairs (DMatrix, string)) – List of items to be evaluated during training; this allows the user to watch performance on the validation set.
early_stopping_rounds (int) – Activates early stopping. Validation error needs to decrease at least every early_stopping_rounds round(s) to continue training. Requires at least one item in evals. If there’s more than one, will use the last. Returns the model from the last iteration (not the best one). If early stopping occurs, the model will have three additional fields: `bst.best_score`, `bst.best_iteration` and `bst.best_ntree_limit`. (Use `bst.best_ntree_limit` to get the correct value if `num_parallel_tree` and/or `num_class` appears in the parameters.)
evals_result (dict) – This dictionary stores the evaluation results of all the items in watchlist. Example: with a watchlist containing `[(dtest,'eval'), (dtrain,'train')]` and a parameter containing `{'eval_metric': 'logloss'}`, the evals_result returns {'train': {'logloss': ['0.48253', '0.35953']}, 'eval': {'logloss': ['0.480385', '0.357756']}}
verbose_eval (bool or int) – Requires at least one item in evals. If verbose_eval is True, the evaluation metric on the validation set is printed at each boosting stage. If verbose_eval is an integer, the evaluation metric on the validation set is printed at every given verbose_eval boosting stage. The last boosting stage / the boosting stage found by using early_stopping_rounds is also printed. Example: with `verbose_eval=4` and at least one item in evals, an evaluation metric is printed every 4 boosting stages, instead of every boosting stage.
learning_rates (list or function (deprecated - use callback API instead)) – List of learning rates for each boosting round, or a customized function that calculates eta in terms of the current round number and the total number of boosting rounds (e.g. yields learning rate decay).
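A learning_rates function receives the current round number and the total number of boosting rounds and returns eta. The following is a minimal sketch of one such decay schedule. The name `exponential_decay` and the default start and end values are assumptions made for this example, not library defaults.

```python
def exponential_decay(current_round, num_boost_round, base_eta=0.3, final_eta=0.03):
    """Interpolate eta geometrically from base_eta (round 0) to final_eta (last round)."""
    if num_boost_round <= 1:
        return base_eta
    fraction = current_round / (num_boost_round - 1)
    return base_eta * (final_eta / base_eta) ** fraction


# One eta per boosting round for a 10-round run.
schedule = [exponential_decay(i, 10) for i in range(10)]
```

The same values could be passed as a precomputed list, since the parameter accepts either form.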
Returns:	a trained booster model
Return type:	Booster
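The early stopping rule described for early_stopping_rounds can be sketched in plain Python for a metric that should decrease. The scores are made up and stored as strings, as in the evals_result example; `find_stopping_point` is a hypothetical helper, not part of securexgboost.

```python
def find_stopping_point(scores, early_stopping_rounds):
    """Return (last_iteration, best_iteration, best_score) for a metric to minimize.

    Training stops at the first iteration that lies early_stopping_rounds
    rounds past the best one without having improved on it.
    """
    best_score, best_iter = float("inf"), 0
    for i, score in enumerate(scores):
        if score < best_score:
            best_score, best_iter = score, i
        elif i - best_iter >= early_stopping_rounds:
            return i, best_iter, best_score
    return len(scores) - 1, best_iter, best_score


# Made-up validation scores in the string form shown for evals_result.
evals_result = {"eval": {"logloss": ["0.5", "0.4", "0.45", "0.41", "0.42"]}}
scores = [float(value) for value in evals_result["eval"]["logloss"]]
last_iter, best_iter, best_score = find_stopping_point(scores, early_stopping_rounds=2)
```

In this run the last iteration is later than the best one, which is why the returned model is not the best model and the best_* fields are needed.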
