text_classification_query()


Automatically fits a text classification model to your dataset. All standard text modification procedures are applied automatically if applicable. Stored as 'text_classification' in models dictionary.

Dataset Guidelines


Arguments
instruction=None An English language statement that represents the task you would like to be completed. eg: 'predict the median house value' or 'please estimate the number of households'. Should correspond to the column of text that you want to classify.
label_column='label' Represents the column name in which your label exists. If the name is already similar to 'label' then this parameter is not required.
preprocess=True Whether you want your dataset to be intelligently preprocessed.
test_size=0.2 The proportion of your entire dataset that is used for testing.
random_state=49 The random seed used to make shuffling and splitting reproducible.
learning_rate=1e-2 The learning rate used for gradient descent.
epochs=20 Number of epochs. This applies to every model created in the process.
monitor='val_loss' The parameter that you want the query to minimize/maximize. For example, the default setting will try to minimize your validation loss.
generate_plots=True Whether you want libra to create accuracy and loss plots for you.
batch_size=32 The number of dataset points that will be provided to your model in every pass.
max_text_length=200 The maximum number of tokens used for classification; longer text is truncated.
max_features=20000 The size of the input embedding layer in the model.
save_model=False Whether to save the model weights and architecture as .json and .h5 files.
save_path=os.getcwd() The directory where the saved model files are stored. Defaults to the current working directory.
new_client = client('path_to_csv')
new_client.text_classification_query('Please estimate the sentiment')
new_client.classify_text('new text to classify')
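The test_size and random_state arguments above control how the dataset is partitioned before training. The following is an illustrative sketch of that behavior (not libra's internals): a fixed seed makes the shuffle reproducible, and the last test_size fraction is held out.

```python
# Illustrative sketch (not libra internals): how test_size=0.2 and
# random_state=49 partition a dataset before training.
import random

def train_test_split(rows, test_size=0.2, random_state=49):
    """Shuffle deterministically, then hold out the last test_size fraction."""
    rng = random.Random(random_state)           # fixed seed -> reproducible split
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_size))  # index separating train from test
    return shuffled[:cut], shuffled[cut:]

train, test = train_test_split(list(range(100)))
print(len(train), len(test))  # 80 20
```

Because the seed is fixed, running the split twice yields the same partition, which is what makes experiments with random_state reproducible.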

summarization_query()


Automatically fits a transfer-learning Document Summarization model to your dataset. This model will have frozen layers with pretrained weights to help with small dataset sizes. Stored as 'doc_summarization' in models dictionary.

Dataset Guidelines


Arguments
drop=None Columns to drop manually, such as columns containing links, oddly formatted numbers, or other unusable data.
epochs=10 Number of epochs. This is for every model that's created in the process.
batch_size=32 The number of dataset points that will be provided to your model in every pass.
learning_rate=1e-2 The learning rate used for gradient descent.
max_text_length=512 The maximum number of tokens that can be summarized; longer text is truncated.
max_summary_length=150 Maximum length of the generated summary. Longer summaries tend to be less accurate.
gpu=False Whether to run on the GPU instead of the CPU.
generate_plots=True Whether you want libra to create accuracy and loss plots for you.
save_model=False Whether to save the model weights and architecture as .json and .h5 files.
save_path=os.getcwd() The directory where the saved model files are stored. Defaults to the current working directory.
newClient = client('path_to_csv')
newClient.summarization_query("Please summarize original text")
newClient.get_summary('new text to summarize')
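The max_text_length and max_summary_length arguments act as hard caps on token counts. The snippet below is an illustrative sketch (not libra's internals) of that truncation behavior:

```python
# Illustrative sketch (not libra internals): how max_text_length=512 caps
# the input. Longer documents are cut off at the token limit.
def truncate_tokens(text, max_tokens):
    """Keep only the first max_tokens whitespace-separated tokens."""
    tokens = text.split()
    return " ".join(tokens[:max_tokens])

document = "word " * 1000            # a 1000-token document
clipped = truncate_tokens(document, 512)
print(len(clipped.split()))          # 512
```

max_summary_length applies the same kind of cap on the output side, limiting how long the generated summary may grow.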

image_caption_query()


Automatically fits a caption-generation transfer learning model to your dataset. This model will have frozen layers with pretrained weights to help with small dataset sizes. Stored as 'image_caption' in models dictionary.

Dataset Guidelines


Arguments
instruction An English language statement that represents the task you would like to be completed. eg: 'predict the median house value' or 'please estimate the number of households'. Should correspond to the column of captions in the dataset.
drop=None Columns to drop manually, such as columns containing links, oddly formatted numbers, or other unusable data.
epochs=10 Number of epochs. This is for every model that's created in the process.
preprocess=True Whether you want your dataset to be intelligently preprocessed.
random_state=49 The random seed used to make shuffling and splitting reproducible.
buffer_size=1000 Maximum number of elements buffered when shuffling the data for caption selection.
embedding_dim=256 Sets the size of the word embedding mapping.
units=512 The exact number of recurrent units present in the decoder.
gpu=False Whether to run on the GPU instead of the CPU.
generate_plots=True Whether you want libra to create accuracy and loss plots for you.
newClient = client('path_to_csv')
newClient.image_caption_query('Generate image captions')
newClient.generate_caption('path to image')
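The embedding_dim argument sets how many numbers represent each vocabulary word before the decoder consumes them. The sketch below is a toy illustration (not libra's internals) of what an embedding table with embedding_dim=256 looks like:

```python
# Toy illustration (not libra internals): embedding_dim=256 means each
# vocabulary word maps to a fixed-length vector of 256 floats.
import random

def build_embedding(vocab, embedding_dim=256, seed=49):
    rng = random.Random(seed)
    # one fixed-length vector per word; in a real model these are learned
    return {w: [rng.uniform(-1, 1) for _ in range(embedding_dim)]
            for w in vocab}

table = build_embedding(["a", "cat", "sat"], embedding_dim=256)
print(len(table["cat"]))  # 256
```

The units argument plays the analogous role on the decoder side, fixing the width of its recurrent state.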

generate_text()


Automatically generates text of specified length based on initial prefix text. Stored as 'generated_text' in models dictionary.

Dataset Guidelines


Arguments
file_data=True Whether the client file you provided should be used as the prefix for generating text.
prefix=None The prefix string to use when file_data is False.
temperature=0.3 The sampling temperature; lower values make the next-word probability distribution sharper (float).
maxLength=512 The maximum length of the generated text.
top_k=50 Only the top_k most probable next tokens are considered at each sampling step.
top_p=0.9 Nucleus sampling threshold: only the smallest set of tokens whose cumulative probability exceeds top_p is considered.
return_sequences=2 The number of generated variations to return.
newClient = client('path_to_txt')
newClient.generate_text("generate text", file_data=False, prefix="Hello there!")
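The temperature, top_k and top_p arguments all reshape the next-word probability distribution before a word is sampled. The following is an illustrative sketch of that filtering pipeline (not libra's internals, which delegate to a pretrained language model):

```python
# Illustrative sketch (not libra internals): how temperature, top_k and top_p
# successively filter a next-word probability distribution.
import math

def filter_distribution(logits, temperature=0.3, top_k=50, top_p=0.9):
    # temperature < 1 sharpens the distribution; > 1 flattens it
    scaled = {w: l / temperature for w, l in logits.items()}
    z = sum(math.exp(v) for v in scaled.values())
    probs = {w: math.exp(v) / z for w, v in scaled.items()}
    # top_k: keep only the k most probable words
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # top_p: keep the smallest prefix whose cumulative probability reaches top_p
    kept, total = [], 0.0
    for word, p in ranked:
        kept.append((word, p))
        total += p
        if total >= top_p:
            break
    z = sum(p for _, p in kept)  # renormalize the surviving words
    return {w: p / z for w, p in kept}

dist = filter_distribution({"the": 2.0, "a": 1.0, "cat": 0.1}, top_k=2)
```

With the sharp default temperature of 0.3, the most probable word dominates and nucleus filtering keeps only it; raising temperature or top_p lets more candidates survive, producing more varied text.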

named_entity_query()


Automatically detects named entities such as person names, geographic locations, organizations/companies, and addresses from the label column containing text. Stored as 'named_entity_recognition' in models dictionary.

Dataset Guidelines


Arguments
instruction An English language statement that represents the task you would like to be completed. eg: 'predict the median house value' or 'please estimate the number of households'. Should correspond to the column of text from which entities should be detected.
newClient = client('path_to_txt')
newClient.named_entity_query('detect from text')
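To make the output format concrete, here is a toy illustration of the kind of span labelling a named entity recognizer produces. This gazetteer lookup is purely illustrative and is not how libra's model works; real NER uses a learned sequence model.

```python
# Toy illustration (not libra's model): labelling known phrases in text
# with entity types such as PERSON, LOCATION and ORGANIZATION.
def tag_entities(text, gazetteer):
    """Label each known phrase found in the text with its entity type."""
    found = []
    for phrase, label in gazetteer.items():
        if phrase in text:
            found.append((phrase, label))
    return sorted(found)

gazetteer = {"Ada Lovelace": "PERSON",
             "London": "LOCATION",
             "Google": "ORGANIZATION"}
print(tag_entities("Ada Lovelace was born in London.", gazetteer))
# [('Ada Lovelace', 'PERSON'), ('London', 'LOCATION')]
```

A trained model generalizes beyond a fixed phrase list, recognizing unseen names from context rather than exact matches.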