Data Science Wizardry Blog by Attila Vajda

Using data tools on data tools.

How can I understand the inner workings of /numpy/, /scikit-learn/, /pandas/, /matplotlib/ and /sympy/?

grep -r -o -w -h -E "\b([A-Za-z_][A-Za-z_0-9]*)\b" ./sympy | sort | uniq -c | sort -nr | head -n 10
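Read left to right, this pipeline has five stages. Here is a minimal sketch of the same command, wrapped in a shell function with one comment per stage. The function name `top_words` and its optional second argument are hypothetical additions for illustration. GNU grep is assumed, since `\b` inside an `-E` pattern is a GNU extension rather than POSIX.

```shell
# Print the N most frequent identifier-like words found under DIR.
# Usage: top_words DIR [N]    (N defaults to 10; name is hypothetical)
top_words() {
  dir=$1
  n=${2:-10}
  # -r  recurse into every file below DIR
  # -o  print each match on its own line instead of the whole line
  # -w  only accept matches that form whole words
  # -h  leave out the file name in front of each match
  # -E  extended regex: a letter or underscore, then letters, digits or underscores
  grep -r -o -w -h -E '\b([A-Za-z_][A-Za-z_0-9]*)\b' "$dir" |
    sort |          # bring identical words next to each other
    uniq -c |       # collapse each run into "COUNT WORD"
    sort -nr |      # numeric sort on the count, largest first
    head -n "$n"    # keep only the first N lines
}
```

Run from the directory that holds the checkouts, `top_words ./numpy` should print the same ten lines as the transcript below, and `top_words ./sympy 20` would list the twenty most common words in SymPy.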

Writing lines of code is an actionable leading measure of a programmer. Writing scientific papers is an actionable leading measure of a scientist.[^1] I am writing code, and I am learning to write.

Now that I am learning data science, I find bash and regular expressions useful. I always liked the feel of these languages, but I kept getting stuck whenever I tried to learn more of them. It is awesome to finally use them for something meaningful.

Ah, this is very cool. I typed the incantation into the CLI, and then came a wait, a moment of suspense. In the dark early-winter silence, only the hum of the computer's fan could be heard. Then the results emerged.

~/PROJECTS/Datasci $ grep -r -o -w -h -E "\b([A-Za-z_][A-Za-z_0-9]*)\b" ./numpy | sort | uniq -c | sort -nr | head -n 10
74777 np
67060 the
53041 a
39175 if
36411 of
31063 to
30144 is
27425 self
27380 numpy
26478 for

I wonder whether these counts could be drawn as proportional bars of stars with a one-liner!
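One way to get there is to pipe the counts into awk. Below is a minimal sketch, assuming the input is the `COUNT WORD` format that `uniq -c` prints, already sorted largest first by `sort -nr`. The function name `stars` and the default width of 50 stars are my own, hypothetical choices.

```shell
# Draw each "COUNT WORD" line from stdin as WORD, COUNT and a bar of stars.
# Assumes input sorted by count, largest first (as `sort -nr` produces),
# so the first line sets the scale and gets WIDTH stars (default 50).
stars() {
  awk -v width="${1:-50}" '
    NR == 1 { max = $1 }                # largest count comes first
    {
      n = int($1 * width / max)         # bar length, rounded down
      bar = ""
      for (i = 0; i < n; i++) bar = bar "*"
      printf "%-10s %8d %s\n", $2, $1, bar
    }'
}
```

Appending `| stars` to the numpy pipeline above draws one bar per word, with `np` getting the full 50 stars and every other word scaled against it (rounded down). To keep it a true one-liner, the awk program can be pasted directly after `head -n 10` instead of defining a function.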

[^1]: Christian Mayer, The Art of Clean Code: Best Practices to Eliminate Complexity and Simplify Your Life (No Starch Press, 2022)