Documentation
Installation
To install the SDK, run the following command in a terminal:
pip install viturka
To install the deep learning library, viturka_nn, run the following command in a terminal:
pip install viturka_nn
viturka.FM_als
1. handle_missing_values
Handles missing values in a DataFrame by dropping or imputing columns based on the percentage of NaN values. Includes options for advanced imputation.
Parameters:
- df (pd.DataFrame): The input DataFrame.
- drop_threshold (float): Fraction of missing values above which columns are dropped (default: 0.7).
- fill_threshold (float): Fraction of missing values below which columns are imputed (default: 0.3).
- advanced_imputation (str or None): Type of advanced imputation ('knn', 'regression', or None).
Returns:
A DataFrame with missing values handled.
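The threshold logic described here can be sketched with plain pandas. This is a minimal illustration, not the viturka implementation: the function name `handle_missing_sketch`, the median/mode fill rule, and leaving columns between the two thresholds untouched are all assumptions made for the example.

```python
import pandas as pd


def handle_missing_sketch(df, drop_threshold=0.7, fill_threshold=0.3):
    """Hypothetical sketch: drop sparse columns, impute mostly complete ones."""
    out = df.copy()
    missing_fraction = out.isna().mean()  # fraction of NaN values per column

    # Drop columns whose missing fraction exceeds drop_threshold.
    to_drop = missing_fraction[missing_fraction > drop_threshold].index
    out = out.drop(columns=to_drop)

    # Impute columns that have some missing values, but fewer than fill_threshold.
    for col in out.columns:
        frac = missing_fraction[col]
        if 0 < frac < fill_threshold:
            if pd.api.types.is_numeric_dtype(out[col]):
                out[col] = out[col].fillna(out[col].median())  # assumed fill rule
            else:
                out[col] = out[col].fillna(out[col].mode().iloc[0])  # assumed fill rule
    return out


df = pd.DataFrame({
    "mostly_missing": [None, None, None, None, 1.0],  # 80% missing: dropped
    "few_missing": [1.0, 2.0, None, 4.0, 5.0],        # 20% missing: imputed
    "complete": ["a", "b", "c", "d", "e"],            # untouched
})
cleaned = handle_missing_sketch(df)
print(cleaned)
```

Columns whose missing fraction falls between the two thresholds are left as they are in this sketch; the source does not say how the library treats them.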
2. _regression_imputation
Performs regression-based imputation for a column with missing values.
Parameters:
- df (pd.DataFrame): The DataFrame containing the column to be imputed.
- target_column (str): The column to be imputed.
Returns:
The DataFrame with the target column imputed.
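One way regression-based imputation can work is to fit a linear model on the rows where the target is observed and predict the rows where it is missing. The sketch below assumes ordinary least squares and assumes that every other numeric column is a complete predictor; `regression_impute_sketch` is a hypothetical name, and the library's actual regressor is not documented here.

```python
import numpy as np
import pandas as pd


def regression_impute_sketch(df, target_column):
    """Hypothetical sketch: fill NaNs in target_column from a linear fit."""
    out = df.copy()
    # Assumption: all other numeric columns are predictors and contain no NaNs.
    predictors = [c for c in out.select_dtypes("number").columns if c != target_column]
    known = out[target_column].notna().to_numpy()

    # Design matrix with an intercept column.
    X = np.column_stack([np.ones(len(out)), out[predictors].to_numpy(dtype=float)])
    y = out.loc[known, target_column].to_numpy(dtype=float)

    # Fit ordinary least squares on the rows where the target is observed.
    coef, *_ = np.linalg.lstsq(X[known], y, rcond=None)

    # Predict the missing rows and write the predictions back.
    out.loc[~known, target_column] = X[~known] @ coef
    return out


df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "y": [2.0, 4.0, None, 8.0]})
imputed = regression_impute_sketch(df, "y")
print(imputed)
```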
3. correlation_matrix
Plots the correlation matrix for selected features in a CSV file.
Parameters:
- file (str): Path to the CSV file.
- selected_columns (list): List of columns for which to compute correlations.
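The computation behind such a plot can be sketched as follows. The example uses an in-memory CSV with made-up column names instead of a file path, and it stops at the correlation table; how the library draws the plot is not documented here, so plotting is left out.

```python
import io

import pandas as pd

# In-memory stand-in for a CSV file; the column names are invented for this example.
csv_data = io.StringIO("price,rating,views\n10,1,100\n20,2,200\n30,3,250\n")
selected_columns = ["price", "rating"]

df = pd.read_csv(csv_data)
corr = df[selected_columns].corr()  # Pearson correlation by default
print(corr)
```

The resulting square DataFrame can then be handed to a plotting library to render a heatmap.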
4. find_similar_items
Finds the most similar items to a given item using cosine similarity and latent vectors from a trained model.
Parameters:
- model: A trained factorization machine model.
- dict_vectorizer: A vectorizer for categorical features.
- item_features (pd.DataFrame): Dataset containing item features.
- item_id: The ID of the target item.
- item_id_column (str): Column name for item IDs.
- numerical_columns (list): List of numerical feature columns.
- categorical_columns (list): List of categorical feature columns.
- top_k (int): Number of similar items to return (default: 5).
Returns:
A list of top-k similar items with their similarity scores.
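The ranking step can be illustrated independently of the model. In the sketch below, the latent vectors are given directly as a small array; how they are extracted from the trained model is not shown, and `top_k_similar` is a hypothetical helper rather than part of the SDK.

```python
import numpy as np


def top_k_similar(latent_vectors, item_ids, target_id, top_k=5):
    """Hypothetical sketch: rank items by cosine similarity to target_id."""
    vectors = np.asarray(latent_vectors, dtype=float)
    target = vectors[item_ids.index(target_id)]

    # Cosine similarity = dot product divided by the product of the norms.
    norms = np.linalg.norm(vectors, axis=1) * np.linalg.norm(target)
    sims = vectors @ target / np.where(norms == 0, 1.0, norms)

    # Sort by descending similarity and skip the target item itself.
    order = np.argsort(-sims)
    ranked = [(item_ids[i], float(sims[i])) for i in order if item_ids[i] != target_id]
    return ranked[:top_k]


item_ids = ["A", "B", "C", "D"]
latent_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
similar = top_k_similar(latent_vectors, item_ids, "A", top_k=2)
print(similar)
```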
5. train_model
Trains a factorization machine model on a dataset while avoiding data leakage.
Parameters:
- file (str): Path to the CSV dataset.
- target_column (str): Column name for the target variable.
- numerical_columns (list): List of numerical feature columns.
- categorical_columns (list): List of categorical feature columns.
- item_id_column (str): Column name for item IDs.
- n_iter (int): Number of iterations (default: 100).
- Other parameters for model configuration.
Returns:
Trained model, combined training/test data, test target values, vectorizer, scaler, and processed DataFrame.
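The source does not say how train_model avoids data leakage. A common pattern, shown here as an assumption rather than a description of the library, is to split the data first and compute preprocessing statistics from the training rows only.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(100, 3))  # synthetic feature matrix

# 1. Split first, so the test rows never influence preprocessing.
indices = rng.permutation(len(X))
train_idx, test_idx = indices[:80], indices[80:]
X_train, X_test = X[train_idx], X[test_idx]

# 2. Compute the scaling statistics from the training rows only.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)

# 3. Apply the same training statistics to both splits.
X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std
```

Fitting the statistics on the full dataset instead would let information from the test rows leak into training.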
6. evaluate_model
Evaluates the FM model using mean squared error (MSE).
Parameters:
- model: Trained FM model.
- X_test: Test feature matrix.
- y_test: True target values for the test set.
- scaler_target: Scaler used for the target variable.
Returns:
Predicted values and MSE score.
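A small worked example of the metric follows. It assumes the target scaler is a standardization (subtract the mean, divide by the standard deviation) and that predictions are mapped back to the original units before the error is computed; neither detail is confirmed by the source, and the numbers are invented.

```python
import numpy as np

# Assumed target scaling: standardization with statistics from the training targets.
target_mean, target_std = 50.0, 10.0

y_test = np.array([40.0, 50.0, 70.0])       # true values, original units
y_pred_scaled = np.array([-1.0, 0.5, 1.5])  # model output, scaled units

# Undo the scaling so predictions and targets share the same units.
y_pred = y_pred_scaled * target_std + target_mean

# Mean squared error: the average of the squared differences.
mse = float(np.mean((y_test - y_pred) ** 2))
print(y_pred, mse)
```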
viturka.client
This document provides an overview of the ModelUploader class in the client module, which facilitates uploading models to a server, performing aggregation, and integrating the received global model.
Class: ModelUploader
The ModelUploader class is responsible for handling model uploads, interacting with the server, and performing local aggregation with global models.
Attributes
- api_key: The API key for authenticating requests to the server.
- server_url: The endpoint for uploading models to the server. Default is https://example.com/upload_model.
Methods
__init__(self, api_key)
Initializes the ModelUploader object with the provided API key.
Parameters:
- api_key (str): The API key for authenticating server requests.
pad_to_match_shape(self, model1_weights, model2_weights)
Pads the smaller weight array with zeros to match the size of the larger array.
Parameters:
- model1_weights (array): The first model's weights.
- model2_weights (array): The second model's weights.
Returns:
- A tuple of two arrays with matching shapes.
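Zero-padding to a common shape can be sketched with NumPy as below. The standalone function `pad_to_match_sketch` is hypothetical, and padding at the end of each axis is an assumption; the method's actual padding position is not documented here.

```python
import numpy as np


def pad_to_match_sketch(a, b):
    """Hypothetical sketch: zero-pad two arrays to a common shape."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    if a.ndim != b.ndim:
        raise ValueError("arrays must have the same number of dimensions")

    # The target shape is the element-wise maximum of both shapes.
    target = tuple(max(x, y) for x, y in zip(a.shape, b.shape))

    def pad(arr):
        # Pad only at the end of each axis, filling with zeros.
        widths = [(0, t - s) for s, t in zip(arr.shape, target)]
        return np.pad(arr, widths, mode="constant", constant_values=0.0)

    return pad(a), pad(b)


w1, w2 = pad_to_match_sketch([1.0, 2.0, 3.0], [4.0, 5.0])
print(w1, w2)
```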
upload_model(self, model, model_type, vectorizer=None)
Uploads a model to the server, retrieves the global model, and performs local aggregation.
Parameters:
- model: The local model to be uploaded.
- model_type (str): The type of the model (e.g., "linear").
- vectorizer (optional): Additional vectorizer object for preprocessing, if applicable.
Returns:
- The locally aggregated model after merging with the global model.
Process:
- Serializes the local model and optional vectorizer using pickle.
- Sends the serialized data to the server via a POST request.
- Deserializes the received global model and performs local aggregation on weights, latent factor matrix, and bias term.
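The steps above can be illustrated without a network call. In this sketch the models are plain dictionaries, the key names `w`, `V`, and `w0` are invented, the server response is simulated locally, and the aggregation rule is assumed to be an element-wise mean; the rule the SDK actually applies is not stated in this document.

```python
import pickle

import numpy as np

# Hypothetical model parameters; the key names are assumptions for this example.
local = {"w": np.array([1.0, 2.0]), "V": np.ones((2, 2)), "w0": 0.5}
server_side = {"w": np.array([3.0, 4.0]), "V": np.zeros((2, 2)), "w0": 1.5}

# Serialize the local model, as would happen before the POST request.
payload = pickle.dumps(local)

# Stand-in for the server response: a pickled global model.
response_bytes = pickle.dumps(server_side)
global_model = pickle.loads(response_bytes)  # only unpickle data from a trusted source

# Assumed aggregation rule: element-wise mean of local and global parameters.
aggregated = {key: (local[key] + global_model[key]) / 2 for key in local}
print(aggregated)
```

Because unpickling can execute arbitrary code, this pattern is only safe when the server is trusted.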
Usage Example
from viturka.client import ModelUploader
# Initialize the uploader
uploader = ModelUploader(api_key="your_api_key")
# Define your model and optional vectorizer
local_model = ... # Example: A scikit-learn or custom model object
vectorizer = ... # Example: A vectorizer
# Upload and aggregate the model
aggregated_model = uploader.upload_model(local_model, model_type="linear", vectorizer=vectorizer)
Viturka API
The following are examples of API payloads:
Bash (using curl)
curl -X POST \
  "https://viturka.com/upload_model" \
  -H "Authorization: Bearer ${api_key}" \
  -F "model=@model.pkl;filename=model.pkl" \
  -F "model_type=${model_type}"
cURL (command-line tool)
curl -X POST \
  -H "Content-Type: multipart/form-data" \
  -F "api_key=${api_key}" \
  -F "model_type=${model_type}" \
  -F "model=@model.pkl;filename=model.pkl" \
  "https://viturka.com/upload_model"
JavaScript (using fetch API)
const formData = new FormData();
formData.append('api_key', api_key);
formData.append('model_type', model_type);
formData.append('model', new Blob([model_data]), 'model.pkl');
fetch("https://viturka.com/upload_model", {
  method: 'POST',
  body: formData
})
  .then(response => response.json())
  .then(data => {
    // Process response data
  })
  .catch(error => {
    console.error(error);
  });
Python (using requests)
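import requests
# Read the serialized model from disk (model.pkl, as in the other examples).
with open("model.pkl", "rb") as f:
    model_data = f.read()
# api_key and model_type must be defined by the caller.
response = requests.post(
    "https://viturka.com/upload_model",
    files={'model': ('model.pkl', model_data)},
    data={'api_key': api_key, 'model_type': model_type}
)
response.raise_for_status()  # Raise an exception for 4xx/5xx responses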
JSON
{
  "api_key": "${api_key}",
  "model_type": "${model_type}",
  "model": "data:application/octet-stream;base64,"
}
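The "model" field in the JSON payload is a data URI whose base64 content is left empty in the template. A minimal sketch of filling it in is shown below; the byte string stands in for the contents of model.pkl, the key and model type values are placeholders, and whether the server accepts this JSON form is an assumption.

```python
import base64
import json

model_bytes = b"example model bytes"  # stand-in for the contents of model.pkl

# Base64-encode the binary model so it can travel inside a JSON string.
encoded = base64.b64encode(model_bytes).decode("ascii")
payload = {
    "api_key": "your_api_key",  # placeholder
    "model_type": "linear",     # placeholder
    "model": "data:application/octet-stream;base64," + encoded,
}
body = json.dumps(payload)

# Decoding the part after the comma recovers the original bytes.
recovered = base64.b64decode(json.loads(body)["model"].split(",", 1)[1])
print(recovered == model_bytes)
```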
Viturka offers the following collaborative filtering recommender model types, listed with their required parameters. For more collaborative models, check out the Model Registry.
Parameters for an e-commerce platform:
Parameter | Description |
---|---|
User ID | Unique identifier for each user interacting with the platform. |
City | The location of the user during interaction, which can help in geo-targeted recommendations. |
Age | The age of the user, important for personalized product recommendations based on age groups. |
Gender | The gender of the user, which can influence product recommendations in certain categories. |
Product Name | The name or title of the product that the user interacts with. |
Product Category | Classification of the product into categories such as electronics, clothing, etc. |
Purchase Status | Indicates whether the user purchased the product (yes/no). |
View Time | The time the user spent viewing or interacting with the product page. |
Added to Cart | Whether the product was added to the user’s shopping cart (yes/no). |
Interaction Date | The date the user interacted with the product, useful for time-sensitive recommendations. |
Interaction Time | The time of day when the user interacted with the product, helpful for analyzing browsing habits. |
Parameters for a video streaming platform:
Parameter | Description |
---|---|
User ID | Unique identifier for each user watching videos on the platform. |
City | The location of the user during streaming, useful for regional content recommendations. |
Age | The age of the user, important for recommending age-appropriate content. |
Gender | The gender of the user, which may influence genre recommendations. |
Video Title | The title of the video content that the user watches. |
Genre | The genre of the video, such as action, drama, comedy, etc. |
Watch Completion | Indicates whether the user watched the video to the end (yes/no). |
Paused | Indicates if the user paused the video while watching (yes/no). |
Watch Time | The total time the user spent watching the video. |
Interaction Date | The date the user interacted with the video, useful for analyzing viewing patterns. |
Interaction Time | The time of day when the user watched the video, useful for predicting peak streaming hours. |
Parameters for a music streaming platform:
Parameter | Description |
---|---|
User ID | Unique identifier for each user listening to music on the platform. |
City | The location of the user during music streaming, useful for regional or local music recommendations. |
Age | The age of the user, relevant for tailoring music recommendations. |
Gender | The gender of the user, which may influence personalized music suggestions. |
Track Name | The name of the song or track that the user listens to. |
Artist | The name of the artist or band that performed the track. |
Listened Completely | Indicates whether the user listened to the entire song (yes/no). |
Skipped | Indicates if the user skipped the track before it finished (yes/no). |
Play Count | Total number of times the user played the track. |
Interaction Date | The date the user interacted with the song, useful for seasonal recommendations. |
Interaction Time | The time of day when the user listened to the track, useful for mood-based playlists. |
Parameters for a fashion platform:
Parameter | Description |
---|---|
User ID | Unique identifier for each user interacting with the fashion platform. |
City | The location of the user, useful for suggesting region-specific trends. |
Age | The age of the user, relevant for recommending fashion styles appropriate for their demographic. |
Gender | The gender of the user, essential for recommending gender-specific clothing. |
Clothing Item | The name of the clothing item the user interacts with. |
Category | The type of clothing, such as shirts, dresses, etc. |
Added to Wishlist | Indicates if the user added the clothing item to their wishlist (yes/no). |
Purchase Status | Indicates whether the user bought the item (yes/no). |
Interaction Date | The date the user interacted with the clothing item, useful for seasonal trends analysis. |
Interaction Time | The time of day when the user browsed the clothing item. |
Parameters for a learning platform:
Parameter | Description |
---|---|
User ID | Unique identifier for each user on the learning platform. |
City | The location of the user, useful for language-based or region-specific courses. |
Age | The age of the user, which can influence course recommendations. |
Gender | The gender of the user, which may inform certain career-based recommendations. |
Course Name | The name of the course the user enrolled in or interacted with. |
Category | The course category, such as data science, business, personal development, etc. |
Course Completion | Indicates whether the user completed the course (yes/no). |
Lesson Watched | Indicates whether the user watched a specific lesson (yes/no). |
Quiz Attempted | Whether the user attempted the course quiz or exam (yes/no). |
Interaction Date | The date the user interacted with the course content. |
Interaction Time | The time of day when the user engaged with the course. |
Parameters for a dating platform:
Parameter | Description |
---|---|
User ID | Unique identifier for each user on the dating platform. |
City | The location of the user, useful for local match recommendations. |
Age | The age of the user, crucial for suggesting age-appropriate matches. |
Gender | The gender of the user, essential for matching preferences. |
Match Occurred | Indicates whether the user successfully matched with another user (yes/no). |
Message Sent | Whether the user sent a message to their match (yes/no). |
Date Scheduled | Indicates whether the user scheduled a date with their match (yes/no). |
Interaction Date | The date the user interacted with another profile. |
Interaction Time | The time of day when the user engaged with a match. |