text_classification_query()


Automatically fits a text classification model to your dataset. All standard text modification procedures are applied automatically if applicable. Stored as 'text_classification' in models dictionary.

Dataset Guidelines


Arguments
instruction=None An English language statement that represents the task you would like to be completed. eg: 'predict the median house value' or 'please estimate the number of households'. Should correspond to the column of text that you want to classify.
label_column='label' Represents the column name in which your label exists. If the name is already similar to 'label' then this parameter is not required.
preprocess=True Whether you want your dataset to be intelligently preprocessed.
test_size=0.2 The proportion of your entire dataset that is used for testing.
random_state=49 The random seed used to make shuffling and splitting reproducible.
learning_rate=1e-2 The learning rate used for gradient descent.
epochs=20 Number of epochs. This applies to every model created in the process.
monitor='val_loss' The parameter that you want the query to minimize/maximize. For example, the default setting will try to minimize your validation loss.
generate_plots=True Whether you want libra to create accuracy and loss plots for you.
batch_size=32 The number of dataset points that will be provided to your model in every pass.
max_text_length=200 The maximum number of tokens used for classification; longer text is truncated.
max_features=20000 The size of the input embedding layer in the model.
save_model=False Whether to save the model weights and architecture as .json and .h5 files.
save_path=os.getcwd() The directory where the saved model files are stored. Defaults to the current working directory.
new_client = client('path_to_csv')
new_client.text_classification_query('Please estimate the sentiment')
new_client.classify_text('new text to classify')
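The test_size and random_state arguments above control how the dataset is partitioned before training. The following is an illustrative sketch of that behavior (not libra's internals): a fixed seed makes the shuffle reproducible, and the last test_size fraction is held out.

```python
# Illustrative sketch (not libra internals): how test_size=0.2 and
# random_state=49 partition a dataset before training.
import random

def train_test_split(rows, test_size=0.2, random_state=49):
    """Shuffle deterministically, then hold out the last test_size fraction."""
    rng = random.Random(random_state)           # fixed seed -> reproducible split
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_size))  # index separating train from test
    return shuffled[:cut], shuffled[cut:]

train, test = train_test_split(list(range(100)))
print(len(train), len(test))  # 80 20
```

Because the seed is fixed, running the split twice yields the same partition, which is what makes experiments with random_state reproducible.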

summarization_query()


Automatically fits a transfer-learning Document Summarization model to your dataset. This model will have frozen layers with pretrained weights to help with small dataset sizes. Stored as 'doc_summarization' in models dictionary.

Dataset Guidelines


Arguments
drop=None Columns to drop manually, such as columns containing links, oddly formatted numbers, or other unusable data.
epochs=10 Number of epochs. This is for every model that's created in the process.
batch_size=32 The number of dataset points that will be provided to your model in every pass.
learning_rate=1e-2 The learning rate used for gradient descent.
max_text_length=512 The maximum number of tokens that can be summarized; longer text is truncated.
max_summary_length=150 Maximum length of the generated summary. Longer summaries tend to be less accurate.
gpu=False Whether to run on the GPU instead of the CPU.
generate_plots=True Whether you want libra to create accuracy and loss plots for you.
save_model=False Whether to save the model weights and architecture as .json and .h5 files.
save_path=os.getcwd() The directory where the saved model files are stored. Defaults to the current working directory.
newClient = client('path_to_csv')
newClient.summarization_query("Please summarize original text")
newClient.get_summary('new text to summarize')
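The max_text_length and max_summary_length arguments act as hard caps on token counts. The snippet below is an illustrative sketch (not libra's internals) of that truncation behavior:

```python
# Illustrative sketch (not libra internals): how max_text_length=512 caps
# the input. Longer documents are cut off at the token limit.
def truncate_tokens(text, max_tokens):
    """Keep only the first max_tokens whitespace-separated tokens."""
    tokens = text.split()
    return " ".join(tokens[:max_tokens])

document = "word " * 1000            # a 1000-token document
clipped = truncate_tokens(document, 512)
print(len(clipped.split()))          # 512
```

max_summary_length applies the same kind of cap on the output side, limiting how long the generated summary may grow.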

image_caption_query()


Automatically fits a caption-generation transfer learning model to your dataset. This model will have frozen layers with pretrained weights to help with small dataset sizes. Stored as 'image_caption' in models dictionary.

Dataset Guidelines


Arguments
instruction An English language statement that represents the task you would like to be completed. eg: 'predict the median house value' or 'please estimate the number of households'. Should correspond to the column of captions in the dataset.
drop=None Columns to drop manually, such as columns containing links, oddly formatted numbers, or other unusable data.
epochs=10 Number of epochs. This is for every model that's created in the process.
preprocess=True Whether you want your dataset to be intelligently preprocessed.
random_state=49 The random seed used to make shuffling and splitting reproducible.
buffer_size=1000 Maximum number of elements buffered when shuffling the data for caption selection.
embedding_dim=256 Sets the size of the word embedding mapping.
units=512 The exact number of recurrent units present in the decoder.
gpu=False Whether to run on the GPU instead of the CPU.
generate_plots=True Whether you want libra to create accuracy and loss plots for you.
newClient = client('path_to_csv')
newClient.image_caption_query('Generate image captions')
newClient.generate_caption('path to image')
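The embedding_dim argument sets how many numbers represent each vocabulary word before the decoder consumes them. The sketch below is a toy illustration (not libra's internals) of what an embedding table with embedding_dim=256 looks like:

```python
# Toy illustration (not libra internals): embedding_dim=256 means each
# vocabulary word maps to a fixed-length vector of 256 floats.
import random

def build_embedding(vocab, embedding_dim=256, seed=49):
    rng = random.Random(seed)
    # one fixed-length vector per word; in a real model these are learned
    return {w: [rng.uniform(-1, 1) for _ in range(embedding_dim)]
            for w in vocab}

table = build_embedding(["a", "cat", "sat"], embedding_dim=256)
print(len(table["cat"]))  # 256
```

The units argument plays the analogous role on the decoder side, fixing the width of its recurrent state.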

generate_text()


Automatically generates text of specified length based on initial prefix text. Stored as 'generated_text' in models dictionary.

Dataset Guidelines


Arguments
file_data=True Whether the client file you provided should be used as the prefix for generating text.
prefix=None The prefix string to use when file_data is False.
temperature=0.3 The sampling temperature; lower values make the next-word probability distribution sharper (float).
maxLength=512 The maximum length of the generated text.
top_k=50 Only the top_k most probable next tokens are considered at each sampling step.
top_p=0.9 Nucleus sampling threshold: only the smallest set of tokens whose cumulative probability exceeds top_p is considered.
return_sequences=2 The number of generated variations to return.
newClient = client('path_to_txt')
newClient.generate_text("generate text", file_data=False, prefix="Hello there!")
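The temperature, top_k and top_p arguments all reshape the next-word probability distribution before a word is sampled. The following is an illustrative sketch of that filtering pipeline (not libra's internals, which delegate to a pretrained language model):

```python
# Illustrative sketch (not libra internals): how temperature, top_k and top_p
# successively filter a next-word probability distribution.
import math

def filter_distribution(logits, temperature=0.3, top_k=50, top_p=0.9):
    # temperature < 1 sharpens the distribution; > 1 flattens it
    scaled = {w: l / temperature for w, l in logits.items()}
    z = sum(math.exp(v) for v in scaled.values())
    probs = {w: math.exp(v) / z for w, v in scaled.items()}
    # top_k: keep only the k most probable words
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # top_p: keep the smallest prefix whose cumulative probability reaches top_p
    kept, total = [], 0.0
    for word, p in ranked:
        kept.append((word, p))
        total += p
        if total >= top_p:
            break
    z = sum(p for _, p in kept)  # renormalize the surviving words
    return {w: p / z for w, p in kept}

dist = filter_distribution({"the": 2.0, "a": 1.0, "cat": 0.1}, top_k=2)
```

With the sharp default temperature of 0.3, the most probable word dominates and nucleus filtering keeps only it; raising temperature or top_p lets more candidates survive, producing more varied text.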

named_entity_query()


Automatically detects named entities such as person names, geographic locations, organizations/companies, and addresses from the label column containing text. Stored as 'named_entity_recognition' in models dictionary.

Dataset Guidelines


Arguments
instruction An English language statement that represents the task you would like to be completed. eg: 'predict the median house value' or 'please estimate the number of households'. Should correspond to the column of text from which entities should be detected.
newClient = client('path_to_txt')
newClient.named_entity_query('detect from text')
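To make the output format concrete, here is a toy illustration of the kind of span labelling a named entity recognizer produces. This gazetteer lookup is purely illustrative and is not how libra's model works; real NER uses a learned sequence model.

```python
# Toy illustration (not libra's model): labelling known phrases in text
# with entity types such as PERSON, LOCATION and ORGANIZATION.
def tag_entities(text, gazetteer):
    """Label each known phrase found in the text with its entity type."""
    found = []
    for phrase, label in gazetteer.items():
        if phrase in text:
            found.append((phrase, label))
    return sorted(found)

gazetteer = {"Ada Lovelace": "PERSON",
             "London": "LOCATION",
             "Google": "ORGANIZATION"}
print(tag_entities("Ada Lovelace was born in London.", gazetteer))
# [('Ada Lovelace', 'PERSON'), ('London', 'LOCATION')]
```

A trained model generalizes beyond a fixed phrase list, recognizing unseen names from context rather than exact matches.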