src package

Subpackages

Submodules

src.encoders module

class src.encoders.ALBERTEncoder(max_seq_length=512)

Bases: src.encoders.Encoder

encode(text, context=None, string_type='response'): Encode an iterable of strings.

finetune_weights(question, answer, margin=0.3, loss='triplet', context=[], neg_answer=[], neg_answer_context=[], label=[])

Finetune the model with GradientTape.

Parameters:

question (list of str) – List of string queries
answer (list of str) – List of string responses
context (list of str) – List of string response contexts; applicable to the USE model
neg_answer (list of str) – List of string responses that do not match the queries; applicable to triplet / contrastive loss
neg_answer_context (list of str) – Contexts for neg_answer, for the USE model to ingest
label (list of int) – List of integer labels
margin (float) – Margin tuning parameter for triplet / contrastive loss
loss (str) – Name of the loss function to use (default ‘triplet’)

Returns:

NumPy array containing the mean loss value
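The `label` and `margin` parameters come into play with a contrastive loss. As a rough sketch of that technique, here is one common formulation in plain Python. It assumes that a label of 1 marks a matching question/answer pair and 0 a non-matching pair, and that pairwise distances have already been computed; `contrastive_loss` is a hypothetical helper, not the code behind `finetune_weights`.

```python
from typing import Sequence


def contrastive_loss(distances: Sequence[float], labels: Sequence[int], margin: float = 0.3) -> float:
    """Mean contrastive loss over a batch of pair distances (hypothetical helper).

    Assumption: label 1 = matching pair (pulled together),
    label 0 = non-matching pair (pushed at least `margin` apart).
    """
    if len(distances) != len(labels) or not distances:
        raise ValueError("distances and labels must be non-empty and of equal length")
    total = 0.0
    for d, y in zip(distances, labels):
        if y == 1:
            total += d ** 2  # any distance between matching pairs is penalized
        else:
            total += max(margin - d, 0.0) ** 2  # non-matches are penalized only inside the margin
    return total / len(distances)


# A matching pair at distance 0.1 and a non-matching pair at distance 0.2
loss = contrastive_loss([0.1, 0.2], [1, 0], margin=0.3)
print(loss)
```

Non-matching pairs that are already farther apart than `margin` contribute nothing, which is why the margin acts as a tuning knob for how hard negatives are pushed away.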

init_signatures(): Re-initialize references to layers and model attributes. When the model is restored, the references to the vocab file and layers are lost.

restore_weights(save_dir=None): Load weights from save_dir.

save_weights(save_dir=None): Save the ALBERT model weights into the save_dir directory.

class src.encoders.BERTEncoder(max_seq_length=512)

Bases: src.encoders.Encoder

encode(text, context=None, string_type='response')

Return a tensor containing the embedding of the input text. string_type can be ‘query’ or ‘response’.

Parameters:

text (str or iterable of str) – The text to be encoded
string_type (str) – Either ‘response’ or ‘query’. Default is ‘response’. In the case of BERT, this argument is ignored

Returns:

a tf.Tensor containing the 768-dimensional encoding of the input text

finetune_weights(question, answer, margin=0.3, loss='triplet', context=[], neg_answer=[], neg_answer_context=[], label=[])

Finetune the model with GradientTape.

Parameters:

question (list of str) – List of string queries
answer (list of str) – List of string responses
context (list of str) – List of string response contexts; applicable to the USE model
neg_answer (list of str) – List of string responses that do not match the queries; applicable to triplet / contrastive loss
neg_answer_context (list of str) – Contexts for neg_answer, for the USE model to ingest
label (list of int) – List of integer labels
margin (float) – Margin tuning parameter for triplet / contrastive loss
loss (str) – Name of the loss function to use (default ‘triplet’)

Returns:

NumPy array containing the mean loss value

init_signatures(): Re-initialize references to layers and model attributes. When the model is restored, the references to the vocab file and layers are lost.

restore_weights(save_dir=None): Load the saved model from save_dir.

save_weights(save_dir=None): Save the BERT model into the save_dir directory.

class src.encoders.Encoder

Bases: abc.ABC

A shared encoder interface. Each encoder should provide encode(), finetune_weights(), restore_weights() and save_weights() methods.

encode()

finetune_weights()

restore_weights()

save_weights()
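To illustrate what a concrete encoder has to supply, the sketch below re-declares an abstract base class with the four documented method names and implements a toy, dependency-free encoder. The ABC here is a stand-in rather than an import of src.encoders.Encoder, and `HashingEncoder`, its hashing scheme, and the JSON file it writes are hypothetical substitutes for the real TensorFlow-based encoders; only the method names follow the interface above.

```python
import hashlib
import json
import math
import os
from abc import ABC, abstractmethod


class Encoder(ABC):
    """Stand-in mirroring the documented interface (not imported from src.encoders)."""

    @abstractmethod
    def encode(self, text, context=None, string_type="response"): ...

    @abstractmethod
    def finetune_weights(self, question, answer, **kwargs): ...

    @abstractmethod
    def restore_weights(self, save_dir=None): ...

    @abstractmethod
    def save_weights(self, save_dir=None): ...


class HashingEncoder(Encoder):
    """Hypothetical toy encoder: hashes tokens into a fixed-size, L2-normalized vector."""

    def __init__(self, dim: int = 16):
        self.dim = dim

    def encode(self, text, context=None, string_type="response"):
        texts = [text] if isinstance(text, str) else list(text)
        vectors = []
        for t in texts:
            vec = [0.0] * self.dim
            for token in t.lower().split():
                # md5 gives a bucket that is stable across runs (unlike the salted built-in hash())
                bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % self.dim
                vec[bucket] += 1.0
            norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero for empty text
            vectors.append([v / norm for v in vec])
        return vectors

    def finetune_weights(self, question, answer, **kwargs):
        return 0.0  # nothing to learn in this toy model

    def save_weights(self, save_dir=None):
        os.makedirs(save_dir, exist_ok=True)
        with open(os.path.join(save_dir, "config.json"), "w") as f:
            json.dump({"dim": self.dim}, f)

    def restore_weights(self, save_dir=None):
        with open(os.path.join(save_dir, "config.json")) as f:
            self.dim = json.load(f)["dim"]


enc = HashingEncoder(dim=8)
vecs = enc.encode(["hello world", "hello"])
print(len(vecs), len(vecs[0]))  # 2 8
```

Because every abstract method must be overridden, instantiating `Encoder` directly raises `TypeError`, which is how the ABC enforces the shared interface.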

class src.encoders.USEEncoder(max_seq_length=None, **kwargs)

Bases: src.encoders.Encoder

encode(text, context=None, string_type=None)

finetune_weights(question, answer, margin=0.3, loss='triplet', context=[], neg_answer=[], neg_answer_context=[], label=[])

Finetune the model with GradientTape.

Parameters:

question (list of str) – List of string queries
answer (list of str) – List of string responses
context (list of str) – List of string response contexts; applicable to the USE model
neg_answer (list of str) – List of string responses that do not match the queries; applicable to triplet / contrastive loss
neg_answer_context (list of str) – Contexts for neg_answer, for the USE model to ingest
label (list of int) – List of integer labels
margin (float) – Margin tuning parameter for triplet / contrastive loss
loss (str) – Name of the loss function to use (default ‘triplet’)

Returns:

NumPy array containing the mean loss value

init_signatures()

restore_weights(save_dir=None): Restore weights from save_dir. Signatures need to be re-initialized after the weights are loaded.

save_weights(save_dir=None): Save the model weights into the save_dir directory.

src.loss_functions module

src.loss_functions.triplet_loss(anchor_vector, positive_vector, negative_vector, metric='cosine_dist', margin=0.009)

Computes the triplet loss with semi-hard negative mining. The loss encourages the positive distances (between a pair of embeddings with the same label) to be smaller than the minimum negative distance among those negatives whose distance is greater than the positive distance plus the margin constant (called semi-hard negatives) in the mini-batch. If no such negative exists, the largest negative distance is used instead. See: https://arxiv.org/abs/1503.03832.

Parameters:

anchor_vector (tf.Tensor) – The anchor vector in this use case should be the encoded query.
positive_vector (tf.Tensor) – The positive vector in this use case should be the encoded response.
negative_vector (tf.Tensor) – The negative vector in this use case should be the wrong encoded response.
metric (str) – Distance metric used to compare the vectors (default ‘cosine_dist’)
margin (float) – Margin parameter in loss function. See link above.

Returns:

the triplet loss value, as a tf.float32 scalar.
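Since the function receives explicit anchor, positive and negative vectors, the core per-triplet term can be sketched in plain Python as the mean of max(d(a, p) - d(a, n) + margin, 0), using cosine distance to match the default `metric='cosine_dist'`. This is a simplified illustration of the basic triplet hinge loss from the FaceNet paper linked above, not the module's implementation: it performs no semi-hard negative mining, and `cosine_distance` / `triplet_loss` here are hypothetical pure-Python helpers rather than TensorFlow code.

```python
import math
from typing import Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)


def triplet_loss(anchors, positives, negatives, margin: float = 0.009) -> float:
    """Mean of max(d(a, p) - d(a, n) + margin, 0) over the batch (no negative mining)."""
    losses = [
        max(cosine_distance(a, p) - cosine_distance(a, n) + margin, 0.0)
        for a, p, n in zip(anchors, positives, negatives)
    ]
    return sum(losses) / len(losses)


# Triplet 1 is already well separated (loss 0); triplet 2 has positive and negative swapped.
anchors = [[1.0, 0.0], [1.0, 0.0]]
positives = [[1.0, 0.0], [0.0, 1.0]]
negatives = [[0.0, 1.0], [1.0, 0.0]]
print(triplet_loss(anchors, positives, negatives))
```

A triplet only contributes when the negative is not at least `margin` farther from the anchor than the positive, so well-separated triplets add zero loss.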

src.minio_handler module

class src.minio_handler.MinioClient(url_endpoint, access_key, secret_key)

Bases: object

download_emb_index(bucket_name, emb_obj_name, emb_file_path)

download_model_weights(bucket_name, model_obj_name, model_file_path)

make_bucket(bucket_name)

rm_bucket(bucket_name)

upload_emb_index(bucket_name, emb_obj_name, emb_file_path)

upload_model_weights(bucket_name, model_obj_name, model_file_path)

src.models module

class src.models.GoldenRetriever(encoder)

Bases: src.models.Model

export_encoder(save_dir): Export the encoder to save_dir. The path should include a partial filename. See https://www.tensorflow.org/api_docs/python/tf/saved_model/save

finetune(question, answer, margin=0.3, loss='triplet', context=[], neg_answer=[], neg_answer_context=[], label=[]): Finetune the encoder.

load_kb(kb_)

Load one or more knowledge bases.

Parameters:	kb_ – knowledge base object as defined in kb_handler

make_query(querystring, top_k=5, index=False, predict_type='query', kb_name='default_kb')

Make a query against the stored vectorized knowledge.

Parameters:

top_k (int) – Number of top matches to return
predict_type (str) – Either ‘query’ or ‘response’; used to compare statements
kb_name (str) – Name of the knowledge base in the knowledge dictionary
index (bool) – Set index=True to return the sorted indices of the matches

Returns:

Top K vectorized answers and their scores
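The ranking step behind a query like this can be sketched as a brute-force cosine-similarity search over pre-encoded responses. This is a plain-Python illustration under assumptions: it takes cosine similarity as the score and vectors that are already encoded, and `rank_responses` is a hypothetical helper rather than part of src.models, whose actual scoring may differ.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_responses(query_vec, response_vecs, responses, top_k=5, index=False):
    """Return the top_k responses (or their indices if index=True) and their scores."""
    scores = [cosine_similarity(query_vec, r) for r in response_vecs]
    # Sort positions by descending similarity and keep the best top_k
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    top_scores = [scores[i] for i in order]
    if index:
        return order, top_scores
    return [responses[i] for i in order], top_scores


responses = ["Reset your password under Settings.", "Our office opens at 9am.", "Email support for help."]
response_vecs = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.5, 0.0, 0.5]]
print(rank_responses([1.0, 0.0, 0.0], response_vecs, responses, top_k=2))
```

A brute-force scan is linear in the size of the knowledge base; an approximate index such as the SimpleNNIndex documented below trades exactness for faster lookups on large collections.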

predict(text, context=None, string_type='response'): Vectorize texts using the encoder's encode method.

restore_encoder(save_dir): Restore the encoder from save_dir. Signatures need to be re-initialized after the weights are loaded.

class src.models.Model

Bases: abc.ABC

A shared model interface. Each model should provide finetune, predict, load_kb, make_query, export_encoder and restore_encoder methods.

export_encoder(): Export finetuned weights.

finetune(): Finetune the encoder.

load_kb(): Load and encode knowledge bases against which predictions are returned.

make_query(): Use the predict method to vectorize texts and return relevant responses to the user based on the given specifications (e.g. number of responses).

predict(): Vectorize texts using the encoder's encode method.

restore_encoder(): Restore the encoder with finetuned weights.

src.prebuilt_index module

class src.prebuilt_index.SimpleNNIndex(emb_dim_size, metric='angular')

Bases: simpleneighbors.SimpleNeighbors

Simple Neighbors index for calculating similarity between queries and responses vectorized by Golden Retriever.

This class wraps the SimpleNeighbors Python package. SimpleNeighbors selects a backend implementation depending on which packages are available in your environment, so it is recommended to install Annoy (pip install annoy) to enable the Annoy backend.

Parameters:

emb_dim_size – number of dimensions in the data (e.g. 512)
metric – distance metric to use. Default is ‘angular’, which is a monotonic function of cosine distance and therefore ranks neighbors in the same order
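Annoy documents its ‘angular’ distance as the Euclidean distance between normalized vectors, sqrt(2(1 - cos(u, v))). Because this grows monotonically with cosine distance, nearest-neighbor rankings agree under either measure. The plain-Python check below illustrates this; the helper names and sample vectors are hypothetical, and other SimpleNeighbors backends may implement ‘angular’ differently.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def angular_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between the L2-normalized vectors (Annoy's 'angular' metric)."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return math.sqrt(sum((a / nu - b / nv) ** 2 for a, b in zip(u, v)))


query = [3.0, 4.0]
candidates = {"a": [4.0, 3.0], "b": [0.0, 1.0], "c": [-1.0, 0.5]}

# The closed form sqrt(2 * (1 - cos)) matches the normalized Euclidean distance
print(angular_distance(query, candidates["a"]),
      math.sqrt(2 * (1 - cosine_similarity(query, candidates["a"]))))

# Ranking by angular distance gives the same order as ranking by cosine similarity
by_angular = sorted(candidates, key=lambda k: angular_distance(query, candidates[k]))
by_cosine = sorted(candidates, key=lambda k: -cosine_similarity(query, candidates[k]))
print(by_angular, by_cosine)
```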

build(sentences, sentence_embeddings)

Build a precomputed vector index from QA responses. Uses the Annoy library by default.

Parameters:

sentences – responses in string form
sentence_embeddings – responses in embedding form

Returns:

simpleneighbors index for nearest-neighbor vector lookup

classmethod load(prefix)

Restore a previously saved index.

Parameters:	prefix – prefix used when saving index
Returns:	SimpleNNIndex object restored from specified files

query(query_embeddings, num_nbrs)

Find the responses closest to the query vector.

The query vector should have the same number of dimensions as the index. Search is limited to the given number of items, and results are returned in order of proximity.

Parameters:

query_embeddings – query in embedding form
num_nbrs – number of results to return

Returns:

list of items sorted by proximity

save(index_prefix)

Save the index to disk. With the Annoy backend, two files are produced: the serialized Annoy index and a pickle with other data from the object.

Parameters:	index_prefix – filename prefix for the Annoy index and object data
Returns:	None

Module contents