January 18, 2024 (January 18, 2024)
An overview of using Python’s pickle module for object serialization and deserialization, with file paths specified using pathlib.
Object Serialization #
Serialize (save) objects to a file using pickle.

import pickle
from pathlib import Path

# Create an object to serialize
my_data = {'key': 'value', 'number': 42}

# Define the save path and serialize the object
path = Path('data.pkl')
with path.open('wb') as file:
    pickle.dump(my_data, file)

Object Deserialization #
Deserialize (load) objects from a file using pickle.
...
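As a rough illustration of the full round trip described in this excerpt, the sketch below saves an object and loads it back. The helper names `save_object` and `load_object` and the temporary directory are assumptions made for the example, not part of the original post.

```python
import pickle
import tempfile
from pathlib import Path


def save_object(obj, path: Path) -> None:
    """Serialize obj to path in binary mode (hypothetical helper)."""
    with path.open("wb") as file:
        pickle.dump(obj, file)


def load_object(path: Path):
    """Deserialize and return the object stored at path (hypothetical helper).

    Only unpickle data from sources you trust: loading a pickle
    can execute arbitrary code.
    """
    with path.open("rb") as file:
        return pickle.load(file)


# Use a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "data.pkl"
    my_data = {"key": "value", "number": 42}
    save_object(my_data, path)
    restored = load_object(path)

# The restored object is an equal copy, not the same object.
print(restored == my_data, restored is my_data)
```

Opening the file with `'rb'` mirrors the `'wb'` mode used when saving; pickle data is binary, so text mode would fail.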
December 7, 2023 (December 8, 2023)
The split_list_into_n_chunks function in Python allows for dividing a given list into n equally sized sublists. If the list’s length is not perfectly divisible by n, it adjusts some sublist sizes slightly to achieve as even a division as possible.
Function Definition #

def split_list_into_n_chunks(original_list: list, split_num: int) -> list:
    chunk_size, remainder = divmod(len(original_list), split_num)
    chunks = []
    start = 0
    for _ in range(split_num):
        end = start + chunk_size + (1 if remainder > 0 else 0)
        chunks.
...
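Since the excerpt cuts off mid-function, here is a minimal sketch of the same idea: `divmod` gives a base chunk size and a remainder, and the first `remainder` chunks each take one extra element. The function name `split_into_n_chunks_sketch` and the argument check are assumptions for this example; the original function may differ in its details.

```python
def split_into_n_chunks_sketch(items: list, n: int) -> list:
    """Split items into n contiguous sublists whose lengths differ by at most 1.

    Hypothetical stand-in for the truncated function in the excerpt.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")

    chunk_size, remainder = divmod(len(items), n)
    chunks = []
    start = 0
    for i in range(n):
        # The first `remainder` chunks absorb one leftover element each.
        end = start + chunk_size + (1 if i < remainder else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


print(split_into_n_chunks_sketch(list(range(10)), 3))
# [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

When `n` exceeds the list length, the trailing chunks in this sketch are simply empty lists.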
July 14, 2023 (August 23, 2023)
In Machine Learning, a common task is to generate images for metrics such as the confusion matrix and the classification report, which are useful for evaluating model performance.
Here, I will demonstrate how to generate and save these metrics as images using Python’s scikit-learn, matplotlib and seaborn.
Confusion matrix #
First, here’s the function for the confusion matrix:

import matplotlib.figure
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.
...
January 4, 2023 (September 12, 2023)
Polars #
Read #

import polars as pl

# Eager evaluation
data_df = pl.read_ndjson("file.jsonl")
print(data_df.describe())

# Lazy evaluation
data_df = pl.scan_ndjson("file.jsonl")
# Materialize the LazyFrame before calling describe().
# collect() runs the query on the whole file; fetch() only reads a limited number of rows.
data_df = data_df.collect()
print(data_df.describe())

Write #

import polars as pl

# Sample data: list[dict]
data_list = [{"name": "alice", "age": "18"}, {"name": "bob", "age": "17"}]
data_df = pl.DataFrame(data_list)
data_df.write_ndjson("file.jsonl")

Pandas #
Read #

import pandas as pd

data_df = pd.read_json("file.jsonl", orient="records", lines=True)
print(data_df.
...
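To show what the JSON Lines file handled by these calls looks like on disk, the sketch below reads and writes the same sample records using only the standard library `json` module rather than Polars or pandas. The helper names `write_jsonl` and `read_jsonl` are assumptions for illustration.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(records: list, path: Path) -> None:
    """Write one JSON object per line (hypothetical helper)."""
    with path.open("w", encoding="utf-8") as file:
        for record in records:
            file.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path: Path) -> list:
    """Parse each non-empty line as its own JSON document (hypothetical helper)."""
    with path.open(encoding="utf-8") as file:
        return [json.loads(line) for line in file if line.strip()]


data_list = [{"name": "alice", "age": "18"}, {"name": "bob", "age": "17"}]

# Use a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "file.jsonl"
    write_jsonl(data_list, path)
    raw_text = path.read_text(encoding="utf-8")
    loaded = read_jsonl(path)

print(raw_text, end="")
print(loaded == data_list)
```

Each record sits on its own line, which is why the pandas call needs `lines=True` and why Polars offers separate `ndjson` readers and writers.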