Python API Reference
This page gives the Python API reference of xgboost. Please also refer to the Python Package Introduction for more information about the Python package.
Core Data Structure
Core XGBoost Library.
-
class
securexgboost.DMatrix(data, encrypted=False, label=None, missing=None, weight=None, silent=False, feature_names=None, feature_types=None, nthread=None)
Bases: object
Data Matrix used in XGBoost.
DMatrix is an internal data structure used by XGBoost.
It is optimized for both memory efficiency and training speed.
You can construct a DMatrix from numpy arrays.
-
feature_names
Get feature names (column labels).
| Returns: | feature_names |
| Return type: | list or None |
-
feature_types
Get feature types (column types).
| Returns: | feature_types |
| Return type: | list or None |
-
num_col()
Get the number of columns (features) in the DMatrix.
| Returns: | number of columns |
| Return type: | int |
-
num_row()
Get the number of rows in the DMatrix.
| Returns: | number of rows |
| Return type: | int |
-
class
securexgboost.Booster(params=None, cache=(), model_file=None)
Bases: object
A Booster of XGBoost.
Booster is the model of xgboost; it contains low-level routines for
training, prediction and evaluation.
| Parameters: |
- params (dict) – Parameters for boosters.
- cache (list) – List of cache items.
- model_file (string) – Path to the model file.
|
-
eval(data, name='eval', iteration=0)
Evaluate the model on data.
| Parameters: |
- data (DMatrix) – The dmatrix storing the input.
- name (str, optional) – The name of the dataset.
- iteration (int, optional) – The current iteration number.
|
| Returns: | result – Evaluation result string.
|
| Return type: | str
|
-
eval_set(evals, iteration=0, feval=None)
Evaluate a set of data.
| Parameters: |
- evals (list of tuples (DMatrix, string)) – List of items to be evaluated.
- iteration (int) – Current iteration.
- feval (function) – Custom evaluation function.
|
| Returns: | result – Evaluation result string.
|
| Return type: | str
|
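Both eval() and eval_set() return their result as a string. The sketch below turns such a string into a dictionary. It assumes the upstream XGBoost layout, `[iteration]` followed by tab-separated `name-metric:value` fields; this page does not specify the layout, so treat it as an assumption. The helper name `parse_eval_result` and the sample string are hypothetical.

```python
def parse_eval_result(result):
    # Assumed layout: "[3]<TAB>train-logloss:0.35953<TAB>eval-logloss:0.357756"
    head, *fields = result.strip().split("\t")
    iteration = int(head.strip("[]"))
    metrics = {}
    for field in fields:
        # Split on the last colon so dataset or metric names may contain colons.
        key, _, value = field.rpartition(":")
        metrics[key] = float(value)
    return iteration, metrics


# Hypothetical result string, built from the logloss values used elsewhere on this page.
iteration, metrics = parse_eval_result("[3]\ttrain-logloss:0.35953\teval-logloss:0.357756")
```

If the string returned by your build differs from the assumed layout, only the splitting logic needs to change.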
-
predict(data, output_margin=False, ntree_limit=0, pred_leaf=False, pred_contribs=False, approx_contribs=False, pred_interactions=False, validate_features=True)
Predict with data.
Note
This function is not thread safe.
For each booster object, predict can only be called from one thread.
If you want to run prediction using multiple threads, call bst.copy() to make copies
of model object and then call predict().
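The one-copy-per-thread pattern described in this note might look like the sketch below. `predict_in_threads` is a hypothetical helper, and `StandInBooster` is a stand-in that exists only so the sketch runs without an enclave; neither is part of securexgboost. With a real Booster, only copy() and predict() are called on it.

```python
from concurrent.futures import ThreadPoolExecutor


def predict_in_threads(bst, batches):
    """Call predict() on each batch in its own thread, one Booster copy per batch."""
    batches = list(batches)
    # Make the copies up front, in the calling thread, so workers never share a model.
    copies = [bst.copy() for _ in batches]
    with ThreadPoolExecutor(max_workers=max(1, len(batches))) as pool:
        futures = [pool.submit(model.predict, batch) for model, batch in zip(copies, batches)]
        return [future.result() for future in futures]


class StandInBooster:
    """Hypothetical stand-in; it only mimics the copy()/predict() method names."""

    def copy(self):
        return StandInBooster()

    def predict(self, data):
        return [2 * x for x in data]


results = predict_in_threads(StandInBooster(), [[1, 2], [3]])
```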
Note
Using predict() with DART booster
If the booster object is DART type, predict() will perform dropouts, i.e. only
some of the trees will be evaluated. This will produce incorrect results if data is
not the training data. To obtain correct results on test sets, set ntree_limit to
a nonzero value, e.g.
preds = bst.predict(dtest, ntree_limit=num_round)
| Parameters: |
- data (DMatrix) – The dmatrix storing the input.
- output_margin (bool) – Whether to output the raw untransformed margin value.
- ntree_limit (int) – Limit number of trees in the prediction; defaults to 0 (use all trees).
- pred_leaf (bool) – When this option is on, the output will be a matrix of (nsample, ntrees)
with each record indicating the predicted leaf index of each sample in each tree.
Note that the leaf index of a tree is unique per tree, so you may find leaf 1
in both tree 1 and tree 0.
- pred_contribs (bool) – When this is True the output will be a matrix of size (nsample, nfeats + 1)
with each record indicating the feature contributions (SHAP values) for that
prediction. The sum of all feature contributions is equal to the raw untransformed
margin value of the prediction. Note the final column is the bias term.
- approx_contribs (bool) – Approximate the contributions of each feature
- pred_interactions (bool) – When this is True the output will be a matrix of size (nsample, nfeats + 1, nfeats + 1)
indicating the SHAP interaction values for each pair of features. The sum of each
row (or column) of the interaction values equals the corresponding SHAP value (from
pred_contribs), and the sum of the entire matrix equals the raw untransformed margin
value of the prediction. Note the last row and column correspond to the bias term.
- validate_features (bool) – When this is True, validate that the Booster’s and data’s feature_names are identical.
Otherwise, it is assumed that the feature_names are the same.
|
| Returns: |
- prediction (numpy array)
- num_preds (number of predictions)
|
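The relationships between pred_interactions, pred_contribs and the raw margin can be checked by hand. The sketch below uses made-up numbers for a single sample with two features (they do not come from a real model) and plain Python lists in place of numpy arrays. The final sigmoid step is an assumption that only applies to a binary:logistic objective.

```python
import math

# Hypothetical pred_interactions output for ONE sample with nfeats = 2.
# Shape is (nfeats + 1) x (nfeats + 1); the last row and column are the bias term.
interactions = [
    [0.30, 0.10, 0.00],
    [0.10, -0.20, 0.00],
    [0.00, 0.00, 0.50],
]

# Summing each row gives the matching pred_contribs row:
# one contribution per feature, followed by the bias in the final column.
contribs = [sum(row) for row in interactions]

# Summing the contributions gives the raw untransformed margin,
# i.e. what predict() returns with output_margin=True.
margin = sum(contribs)

# Assumption: for a binary:logistic objective the default prediction is sigmoid(margin).
probability = 1.0 / (1.0 + math.exp(-margin))
```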
-
set_param(params, value=None)
Set parameters into the Booster.
| Parameters: |
- params (dict/list/str) – list of key,value pairs, dict of key to value or simply str key
- value (optional) – value of the specified parameter, when params is str key
|
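The three accepted forms of params can be pictured as different spellings of the same list of key/value pairs. The helper below is a hypothetical illustration of that equivalence, not the library's implementation; in particular, raising ValueError for a str key without a value is a choice made for this sketch.

```python
def normalize_params(params, value=None):
    """Return a list of (key, value) pairs from any of the three accepted forms."""
    if isinstance(params, dict):
        return list(params.items())
    if isinstance(params, str):
        if value is None:
            raise ValueError("a str key needs an accompanying value")
        return [(params, value)]
    # Otherwise treat params as an iterable of (key, value) pairs.
    return [(key, val) for key, val in params]


pairs_from_dict = normalize_params({"max_depth": 3, "eta": 0.1})
pairs_from_list = normalize_params([("eval_metric", "auc"), ("eval_metric", "logloss")])
pairs_from_str = normalize_params("nthread", 4)
```

Unlike a dict, the list form can repeat a key, as the two `eval_metric` entries show.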
-
update(dtrain, iteration, fobj=None)
Update for one iteration, with objective function calculated
internally. This function should not be called directly by users.
| Parameters: |
- dtrain (DMatrix) – Training data.
- iteration (int) – Current iteration number.
- fobj (function) – Customized objective function.
|
-
class
securexgboost.Enclave(enclave_image=None, flags=3, create_enclave=True, log_verbosity=0)
Bases: object
An enclave.
A trusted execution environment used for secure XGBoost.
-
get_remote_report_with_pubkey()
Get remote attestation report and public key of enclave
-
get_report_attrs()
Get the enclave public key and remote report
To be called by the RPC service
Must be called after get_remote_report_with_pubkey() is called
| Returns: |
- pem_key (proto.NDArray)
- key_size (int)
- remote_report (proto.NDArray)
- remote_report_size (int)
|
-
set_report_attrs(pem_key, key_size, remote_report, remote_report_size)
Set the enclave public key and remote report
To be used by RPC client during verification
-
verify_remote_report_and_set_pubkey()
Verify the received attestation report and set the public key
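Read together, the four methods above suggest a call order for attestation. The sketch below strings them together under that assumption; the RPC transport between the service and the client is omitted. `attest` and `RecordingEnclave` are hypothetical names: the stand-in only records which method was called, so the sketch runs without an enclave.

```python
def attest(server_enclave, client_enclave):
    """Run the documented attestation calls in order (RPC transport omitted)."""
    # RPC service side: produce the report and public key first ...
    server_enclave.get_remote_report_with_pubkey()
    # ... then read out the four values to send to the client.
    pem_key, key_size, remote_report, remote_report_size = server_enclave.get_report_attrs()
    # RPC client side: install the received values, then verify them.
    client_enclave.set_report_attrs(pem_key, key_size, remote_report, remote_report_size)
    client_enclave.verify_remote_report_and_set_pubkey()
    return pem_key, key_size


class RecordingEnclave:
    """Hypothetical stand-in that records call order; not part of securexgboost."""

    def __init__(self, log):
        self.log = log

    def get_remote_report_with_pubkey(self):
        self.log.append("get_remote_report_with_pubkey")

    def get_report_attrs(self):
        self.log.append("get_report_attrs")
        return b"pem", 3, b"report", 6  # placeholder values

    def set_report_attrs(self, pem_key, key_size, remote_report, remote_report_size):
        self.log.append("set_report_attrs")

    def verify_remote_report_and_set_pubkey(self):
        self.log.append("verify_remote_report_and_set_pubkey")


calls = []
attest(RecordingEnclave(calls), RecordingEnclave(calls))
```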
-
class
securexgboost.CryptoUtils
Bases: object
Crypto utils class
-
add_client_key(data, data_len, signature, sig_len)
Add the client symmetric key used to encrypt the client's files.
| Parameters: |
- data (proto.NDArray) – key used to encrypt client files
- data_len (int) – length of data
- signature (proto.NDArray) – signature over data, signed with client private key
- sig_len (int) – length of signature
|
-
decrypt_predictions(key, encrypted_preds, num_preds)
Decrypt encrypted predictions
| Parameters: |
- key (byte array) – key used to encrypt client files
- encrypted_preds (c_char_p) – encrypted predictions
- num_preds (int) – number of predictions
|
| Returns: | preds – plaintext predictions
|
| Return type: | numpy array
|
-
encrypt_data_with_pk(data, data_len, pem_key, key_size)
| Parameters: |
- data (byte array) –
- data_len (int) –
- pem_key (proto) –
- key_size (int) –
|
| Returns: |
- encrypted_data (proto.NDArray)
- encrypted_data_size_as_int (int)
|
-
encrypt_file(input_file, output_file, key_file)
Encrypt a file
| Parameters: |
- input_file (str) – path to file to be encrypted
- output_file (str) – path to which encrypted file will be saved
- key_file (str) – path to key used to encrypt file
|
-
generate_client_key(path_to_key)
Generate a new key and save it to path_to_key
| Parameters: | path_to_key (str) – path to which key will be saved |
-
sign_data(keyfile, data, data_size)
| Parameters: |
- keyfile (str) –
- data (proto.NDArray) –
- data_size (int) –
|
| Returns: |
- signature (proto.NDArray)
- sig_len_as_int (int)
|
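The parameter and return types above chain together: encrypt_data_with_pk() returns the data and size that sign_data() accepts, and add_client_key() takes that data plus the resulting signature. The sketch below follows that chain. The ordering is inferred from the signatures, not stated on this page. `provision_client_key`, `StandInCrypto` and the file name `client_private.pem` are hypothetical; the stand-in only records calls so the sketch runs without an enclave.

```python
def provision_client_key(crypto, sym_key, private_key_file, pem_key, key_size):
    """Encrypt the symmetric key with the enclave public key, sign it, and register it."""
    enc_key, enc_key_size = crypto.encrypt_data_with_pk(sym_key, len(sym_key), pem_key, key_size)
    signature, sig_len = crypto.sign_data(private_key_file, enc_key, enc_key_size)
    crypto.add_client_key(enc_key, enc_key_size, signature, sig_len)
    return enc_key_size, sig_len


class StandInCrypto:
    """Hypothetical stand-in for CryptoUtils; performs no real cryptography."""

    def __init__(self):
        self.calls = []

    def encrypt_data_with_pk(self, data, data_len, pem_key, key_size):
        self.calls.append("encrypt_data_with_pk")
        return b"E" * data_len, data_len  # placeholder "ciphertext"

    def sign_data(self, keyfile, data, data_size):
        self.calls.append("sign_data")
        return b"S" * 4, 4  # placeholder "signature"

    def add_client_key(self, data, data_len, signature, sig_len):
        self.calls.append("add_client_key")


crypto = StandInCrypto()
sizes = provision_client_key(crypto, b"0123456789abcdef", "client_private.pem", b"pem", 3)
```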
Learning API
Training Library containing training routines.
-
securexgboost.train(params, dtrain, num_boost_round=10, evals=(), early_stopping_rounds=None, evals_result=None, verbose_eval=True, callbacks=None, learning_rates=None)
Train a booster with given parameters.
| Parameters: |
- params (dict) – Booster params.
- dtrain (DMatrix) – Data to be trained.
- num_boost_round (int) – Number of boosting iterations.
- evals (list of pairs (DMatrix, string)) – List of items to be evaluated during training. This allows the user to watch
performance on the validation set.
- early_stopping_rounds (int) – Activates early stopping. Validation error needs to decrease at least
every early_stopping_rounds round(s) to continue training.
Requires at least one item in evals.
If there is more than one item in evals, the last one is used.
Returns the model from the last iteration (not the best one).
If early stopping occurs, the model will have three additional fields:
bst.best_score, bst.best_iteration and bst.best_ntree_limit.
(Use bst.best_ntree_limit to get the correct value if
num_parallel_tree and/or num_class appears in the parameters)
- evals_result (dict) –
This dictionary stores the evaluation results of all the items in watchlist.
Example: with a watchlist containing
[(dtest,'eval'), (dtrain,'train')] and
params containing {'eval_metric': 'logloss'},
the evals_result returns
{'train': {'logloss': ['0.48253', '0.35953']},
'eval': {'logloss': ['0.480385', '0.357756']}}
- verbose_eval (bool or int) – Requires at least one item in evals.
If verbose_eval is True then the evaluation metric on the validation set is
printed at each boosting stage.
If verbose_eval is an integer then the evaluation metric on the validation set
is printed at every given verbose_eval boosting stage. The last boosting stage
/ the boosting stage found by using early_stopping_rounds is also printed.
Example: with
verbose_eval=4 and at least one item in evals, an evaluation metric
is printed every 4 boosting stages, instead of every boosting stage.
- learning_rates (list or function (deprecated - use callback API instead)) – List of learning rates, one per boosting round,
or a customized function that calculates eta from the
current round number and the total number of boosting rounds (e.g. to apply
learning rate decay).
|
| Returns: | a trained booster model
|
| Return type: | Booster
|
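Two of the parameters above involve plain Python values that can be sketched without an enclave. The first function is one possible shape for a learning_rates callable, assuming it is called with the current round index (starting at 0) and the total number of rounds; `exponential_decay` and its default rates are hypothetical. The second reads an evals_result dictionary like the example above, where the metric values are strings, and assumes a metric for which lower is better (such as logloss).

```python
def exponential_decay(current_round, total_rounds, base_eta=0.3, final_eta=0.05):
    """Learning rate for one round, decaying geometrically from base_eta to final_eta."""
    if total_rounds <= 1:
        return base_eta
    fraction = current_round / (total_rounds - 1)
    return base_eta * (final_eta / base_eta) ** fraction


def best_round(evals_result, dataset, metric):
    """Return (round_index, value) for the lowest value of a lower-is-better metric."""
    # Values may be stored as strings, so convert before comparing.
    values = [float(v) for v in evals_result[dataset][metric]]
    index = min(range(len(values)), key=values.__getitem__)
    return index, values[index]


# The evals_result example from the parameter description above.
evals_result = {
    "train": {"logloss": ["0.48253", "0.35953"]},
    "eval": {"logloss": ["0.480385", "0.357756"]},
}

schedule = [exponential_decay(i, 4) for i in range(4)]
best = best_round(evals_result, "eval", "logloss")
```

Because learning_rates is marked as deprecated, a schedule like this may need to be passed through the callback API instead.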