Data Science Wizardry Blog by Attila Vajda

Shelter from the storm

## You gotta get up, before you get down

"Do it, do it, do it, disco highlife!"

Six students can sit at six desks in 6! ways. By the fundamental counting principle, the first student has 6 desks to choose from, the second has 5, and so on: $_6P_6 = 6! = 6 \times 5 \times 4 \times 3 \times 2 \times 1 = 720$.
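A quick way to check this count is to enumerate the seatings directly. The sketch below is only an illustration; the student names are made up, and each permutation is read as "student in position k sits at desk k".

```python
import math
from itertools import permutations

# Hypothetical student names, used only for illustration.
students = ["A", "B", "C", "D", "E", "F"]

# Each permutation is one seating: the k-th student in the tuple sits at desk k.
seatings = list(permutations(students))

print(len(seatings))      # 720
print(math.factorial(6))  # 720
print(math.perm(6, 6))    # 720, the number of 6-permutations of 6 items
```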

I'm trying to set up a multiagent LLM synergy with AutoGenStudio.

$9! = \int_{0}^{\infty} e^{-t} t^{9} \, dt$
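This identity is the gamma function evaluated at 10, since $\Gamma(n + 1) = n!$. A rough numerical check, assuming a plain midpoint rule on $[0, 60]$ is accurate enough (the tail beyond 60 is negligible), could look like this:

```python
import math

def integrand(t):
    """The function under the integral: e^(-t) * t^9."""
    return math.exp(-t) * t**9

# Midpoint rule on [0, 60]; the cutoff and step size are arbitrary choices.
step = 0.01
n_steps = 6000
approx = sum(integrand((k + 0.5) * step) for k in range(n_steps)) * step

print(approx)             # numerical estimate of the integral
print(math.gamma(10))     # Gamma(10), which equals 9!
print(math.factorial(9))  # 362880
```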

## StratifiedGroupKFold not ensuring Stratified splits #28218

Picture a baker trying to distribute different types of cookies 🍪 (classes) among several cookie boxes 📦 (folds), so that every box ends up with roughly the same mix.

By Mixtral:

Below I list some key technical terms and corresponding code snippets associated with the StratifiedGroupKFold class, accompanied by relatable analogies to clarify their roles in achieving successful cross-validation.

  1. Cross-Validation: Think of it as taste-testing dishes prepared during a cooking competition before judging begins. Cross-validation estimates how well a machine learning model generalizes by partitioning the data into smaller chunks called folds, then repeatedly training on all folds but one and evaluating on the held-out fold.

    Term: n_splits – sets the number of folds (at least 2).

    Code Snippet:

    sgkf = StratifiedGroupKFold(n_splits=2, ...)
  2. Stratification: Keeping the ratio of target labels roughly constant across folds resembles serving tasting portions that contain the same proportions of ingredients as the full dish. Each fold then reflects the global label distribution, which gives more robust estimates of model performance. Because a group cannot be split, StratifiedGroupKFold preserves the class proportions only as far as the group constraint allows.

    Term: y – the array of target labels whose class proportions are preserved across folds. It is passed to split(), not to the constructor.

    Code Snippet:

    splits = sgkf.split(X, y, groups)
  3. Groups: Keeping observations together based on shared characteristics mirrors organizing food items by common traits, such as vegan, vegetarian, or non-vegetarian dishes. All samples from one group land in the same fold, so the model is always evaluated on groups it did not see during training.

    Term: groups – the array of group labels, one per sample, determining which samples belong together. It is passed to split(), not to the constructor.

    Code Snippet:

    splits = sgkf.split(X, y, groups=groups)
  4. Random State: Introducing controlled variability through a seeded pseudo-random process corresponds to adding spices randomly to batches of dishes served to judges. It introduces slight variations while keeping experiments repeatable.

    Term: random_state – seeds the pseudorandom generator that controls shuffling. It requires shuffle=True.

    Code Snippet:

    sgkf = StratifiedGroupKFold(..., shuffle=True, random_state=42)

Understanding these concepts and their respective code counterparts enables users to fine-tune the behavior of the StratifiedGroupKFold class effectively, ultimately yielding more informative insights about their chosen machine learning algorithms.



## Code puzzle puzzling

```python
wibble = ['xjy987']
wozzle = ['x', 'j', 'y', '9', '8', '7']
mixed_up = list(zip(wibble, wozzle))
print(len(mixed_up))
```

I asked Mixtral to make the logic of this puzzle more challenging. The original is a variation of a puzzle from the "99 Tricky Python Puzzles Coffee Break" book, an excellent way to learn Python. Mixtral returned the puzzle below, which took 24 minutes to solve:

```python
import random

def generate_strings():
    length = random.randint(5, 10)
    chars = []
    for _ in range(length):
        char = chr(random.randint(ord('A'), ord('Z')))
        chars.append(char)
    return ''.join(chars)

wobbles = [generate_strings() for _ in range(10)]
flibbles = set([' '.join([w[i], w[i + 1]]) for i, w in enumerate(wobbles[:-1])])

result = []
for fibble in flibbles:
    pair = [word for word in wobbles if fibble in word][0].split(fibble)
    combined = pair[0][:-1] + fibble + pair[1][1:]
    result.append((combined, len(list(filter(lambda x: x == combined, wobbles)))))

max_occurrences = max(result, key=lambda x: x[-1])[1]
common_combinations = sorted([pair for pair, count in result if count == max_occurrences], reverse=True)

print("Maximum occurrence:", max_occurrences)
print("Common combinations:", common_combinations)
```

## Computer science books in baby steps

Flibble = RandomJungleEstimator(numberTrees=97).mojo(Zorg[:,startIndex:endIndex], Zorg[:,"finalColumn"])

Before guessing the value of the original, I try to guess the value of a gibberish version of the puzzle.

```python
>>> import numpy as np
>>> Gobbledegook = np.array([[4, 3, 2, "wibblyWobbly"],
...                          [3, 1, 2, "wibblyWobbly"],
...                          [4, 4, 4, "wibblyWobbly"],
...                          [1, 5, 4, "woozleWhiffler"],
...                          [1, 4, 1, "woozleWhiffler"],
...                          [3, 4, 5, "frobnitzFrippery"],
...                          [1, 1, 4, "frobnitzFrippery"]])
>>> Gobbledegook[:, :-1]
array([['4', '3', '2'],
       ['3', '1', '2'],
       ['4', '4', '4'],
       ['1', '5', '4'],
       ['1', '4', '1'],
       ['3', '4', '5'],
       ['1', '1', '4']], dtype='<U21')
```
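Two things stand out in that output: the slice `[:, :-1]` keeps every row but drops the last column, and the numbers come back as strings because NumPy coerces a mixed array to a single string dtype. A small sketch, assuming the last column is meant as a label column (the names `features` and `labels` are mine, not from the puzzle), separates the two parts and restores the integers:

```python
import numpy as np

Gobbledegook = np.array([[4, 3, 2, "wibblyWobbly"],
                         [3, 1, 2, "wibblyWobbly"],
                         [4, 4, 4, "wibblyWobbly"],
                         [1, 5, 4, "woozleWhiffler"],
                         [1, 4, 1, "woozleWhiffler"],
                         [3, 4, 5, "frobnitzFrippery"],
                         [1, 1, 4, "frobnitzFrippery"]])

# All rows, every column except the last; cast the strings back to integers.
features = Gobbledegook[:, :-1].astype(int)

# All rows, last column only.
labels = Gobbledegook[:, -1]

print(features.shape)  # (7, 3)
print(labels[:3])
```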