
Pythons Advent Calendar
Data Science Edition
Day 1!
Let's start with a warm-up task! If you are not familiar with NumPy yet, go to the resources section and check them out.
Write a function called generate_arr(start, end) that takes as input two integers start, end and returns a NumPy array (1-dimensional) containing all integers between those values, including start and end.
If you are not a beginner, provide a one-line solution here :)
Note that you can test your function using check_funcion().
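One possible one-line sketch, assuming start <= end (np.arange leaves out the stop value, hence the + 1):

```python
import numpy as np

def generate_arr(start, end):
    # np.arange excludes the stop value, so add 1 to include `end`.
    return np.arange(start, end + 1)

print(generate_arr(2, 5))  # [2 3 4 5]
```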
Day 2!
Missing data is an everyday problem that data scientists need to deal with.
Write a function called replace_nans(array)
that takes as input a NumPy array
and returns it after replacing all np.nan (numpy.nan)
values with -1.
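A minimal sketch of one way to do this. It assumes the input is numeric and returns a float copy instead of changing the caller's array in place; the task does not say which of the two is expected.

```python
import numpy as np

def replace_nans(array):
    # Assumption: work on a float copy so the input array stays untouched.
    result = np.array(array, dtype=float)
    # np.isnan builds a boolean mask that is True wherever the value is NaN.
    result[np.isnan(result)] = -1
    return result

print(replace_nans(np.array([1.0, np.nan, 3.0])))
```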
Day 3!
Write a function called generate_matrix_9x9()
that generates a 9x9 matrix with all
elements equal to 2 except the one in the middle, which should be equal to 0. If you are not a beginner,
set only the boundary entries to 2 and the remaining ones to 0 :)
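Both variants could look roughly like this. The name generate_boundary_matrix_9x9 is made up here for the non-beginner version; the task itself only names generate_matrix_9x9().

```python
import numpy as np

def generate_matrix_9x9():
    # Start with a 9x9 matrix of 2s, then zero out the centre entry.
    matrix = np.full((9, 9), 2)
    matrix[4, 4] = 0
    return matrix

def generate_boundary_matrix_9x9():
    # Hypothetical name: zeros inside, 2s on the border.
    matrix = np.zeros((9, 9), dtype=int)
    matrix[[0, -1], :] = 2  # first and last row
    matrix[:, [0, -1]] = 2  # first and last column
    return matrix

print(generate_matrix_9x9())
print(generate_boundary_matrix_9x9())
```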
Day 4!
Write a function called get_elements(arr)
that returns the median of all the elements of the input
array which are greater than 2. If you are able to, make sure that this function works well even if one of the elements is numpy.nan :)
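A short sketch based on boolean masking. It relies on the fact that any comparison with NaN is False, so NaN entries drop out of the selection on their own. What should happen when no element is greater than 2 is not specified, so that case is left unhandled here.

```python
import numpy as np

def get_elements(arr):
    values = np.asarray(arr, dtype=float)
    # `values > 2` is False for NaN, so NaN entries are filtered out too.
    selected = values[values > 2]
    return float(np.median(selected))

print(get_elements(np.array([1, 3, 5, np.nan, 7])))  # 5.0
```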
Day 5!
Write a function called compute_percentage_unique_elements(arr)
that returns the percentage of all unique entries of an array (as float).
For example, the percentage of all unique entries of the 2-dimensional array [[1,2], [3,1]]
is 75.
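Reading the example as "number of distinct values divided by the total number of entries" (3 distinct values out of 4 entries gives 75), a sketch could be:

```python
import numpy as np

def compute_percentage_unique_elements(arr):
    values = np.asarray(arr)
    # np.unique flattens the input and returns each distinct value once.
    return float(np.unique(values).size / values.size * 100)

print(compute_percentage_unique_elements([[1, 2], [3, 1]]))  # 75.0
```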
Saint Nicholas Day!
Write a function called half_xmas_tree(depth)
that takes as argument an integer and prints half of a Christmas tree of the given depth
using 1s (for the xmas tree) and 0s (for the background). For example, calling half_xmas_tree(6)
should give the following output:
[[1, 0, 0, 0, 0, 0],
[1, 1, 0, 0, 0, 0],
[1, 1, 1, 0, 0, 0],
[1, 1, 1, 1, 0, 0],
[1, 1, 1, 1, 1, 0],
[1, 1, 1, 1, 1, 1]]
If you are able to, additionally write a function called whole_xmas_tree(depth) that prints an entire Christmas tree of a given depth :)
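The half tree is just a lower-triangular matrix of 1s, so np.tril can do most of the work. In this sketch the function also returns the array (an addition that makes it easier to check), and NumPy prints the rows without commas, so the output looks slightly different from the listing above.

```python
import numpy as np

def half_xmas_tree(depth):
    # np.tril keeps the lower triangle (including the diagonal) and zeroes the rest.
    tree = np.tril(np.ones((depth, depth), dtype=int))
    print(tree)
    return tree  # assumption: returning the array is optional, the task only asks to print

half_xmas_tree(6)
```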
Day 7!
Write a function called weighted_average(arr) that returns the weighted average of an input array where all elements besides the first and the last have the same weight and the weight of the two remaining ones is 5 times larger. For example, the weighted average for arr=np.array([1,2,3,4]) should be (5*1+1*2+1*3+5*4)/12 = 2.5.
If you are a beginner - assume that the input array is 1-dimensional and its length is >2.
If you are NOT a beginner, try to make sure that this function also works for multi-dimensional arrays and doesn't
break if the length of the input array is for example 1 (of course this is optional! :) )
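A sketch using np.average with an explicit weight vector. For multi-dimensional input it assumes that "first" and "last" refer to the flattened (row-major) array, which the task does not spell out, and it assumes the array is not empty.

```python
import numpy as np

def weighted_average(arr):
    # Assumption: multi-dimensional input is flattened in row-major order.
    values = np.asarray(arr, dtype=float).ravel()
    weights = np.ones(values.size)
    weights[0] = 5   # first element
    weights[-1] = 5  # last element (same entry as the first if size == 1)
    return float(np.average(values, weights=weights))

print(weighted_average(np.array([1, 2, 3, 4])))  # 2.5
```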
Day 8!
Data Scientists sometimes implement their own sophisticated metrics :) Write a function called calculate_magda_metric(arr) that takes as input a matrix and computes magda_metric, which is the mean of the values in the first column multiplied by the sum of the values in the last row. For example, magda_metric for the following matrix:
[[1,2], [3,5]]
should be equal to ((1+3)/2) * (3+5) = 2*8 = 16.
If you know how - make sure this function doesn't break if the input matrix has only one row :)
- 1) NumPy mean
- 2) NumPy sum
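Putting the two hints together, a sketch might look like this. np.atleast_2d is used so that a single row passed as a 1-dimensional array is treated as a 1 x n matrix; that is one possible reading of the "only one row" case.

```python
import numpy as np

def calculate_magda_metric(arr):
    # Assumption: a 1-dimensional input is interpreted as a single row.
    matrix = np.atleast_2d(arr)
    first_column_mean = matrix[:, 0].mean()
    last_row_sum = matrix[-1, :].sum()
    return float(first_column_mean * last_row_sum)

print(calculate_magda_metric(np.array([[1, 2], [3, 5]])))  # 16.0
```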
Day 9!
If you have ever heard about Neural Networks (NN), you might already know that weights are very important there. If you haven't yet, now is the time: a weight is a parameter within a neural network that transforms input data within the network's hidden layers. No worries - this task is not about implementing a neural network :)
One of the important choices that have to be made before training a neural network is how to initialize the weight matrices, since we don't know anything about the possible weights when we start. Write a function called initialize_weights(rows, columns) that returns a rows x columns matrix containing random weights uniformly distributed over the half-open interval [0, 2).
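One way to sketch this is with NumPy's random Generator, whose uniform method samples from the half-open interval [low, high). No seed is set here, so every call gives different numbers.

```python
import numpy as np

def initialize_weights(rows, columns):
    rng = np.random.default_rng()
    # uniform(low, high) includes low and excludes high, i.e. [0, 2).
    return rng.uniform(0, 2, size=(rows, columns))

print(initialize_weights(2, 3))
```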
Day 10!
Matrix multiplication is an important component of many scientific computing and machine learning tasks, and it is often the performance bottleneck for those tasks. Write a function called can_multiply(m1, m2) that takes as input an a x b matrix m1 and a c x d matrix m2 and returns True if it is possible to multiply m1 and m2 and False otherwise. For example, can_multiply(m1 = np.array([[1,2], [3,4]]), m2 = np.array([[1,2, 3], [4, 5, 6]])) should return True since it is possible to multiply m1 with m2 (multiplying a 2x2 matrix by a 2x3 matrix is possible). If you are a beginner, assume that the input matrices are 2-dimensional. If you are not a beginner, make sure your function doesn't break if one of the inputs is a 1-dim matrix.
- 1) NumPy shape
- 2) Remember, you can multiply two matrices if and only if the number of columns in the first matrix equals the number of rows in the second matrix.
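The rule from the second hint translates almost directly into a shape comparison. This sketch treats a 1-dimensional input as a single-row matrix (1 x n), which is an assumption; other conventions for 1-dim inputs are possible.

```python
import numpy as np

def can_multiply(m1, m2):
    # Assumption: a 1-dimensional input is interpreted as a 1 x n row matrix.
    a = np.atleast_2d(m1)
    b = np.atleast_2d(m2)
    # Columns of the first matrix must equal rows of the second.
    return a.shape[1] == b.shape[0]

print(can_multiply(np.array([[1, 2], [3, 4]]), np.array([[1, 2, 3], [4, 5, 6]])))  # True
```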
Day 11!
Write a function called multiply_with_transpose(m)
that returns matrix m
multiplied with its transpose. Think about what is special about M*M^T
:)
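A small sketch using the @ operator. Printing the result next to its own transpose hints at the special property: the product of a matrix with its transpose is always symmetric.

```python
import numpy as np

def multiply_with_transpose(m):
    matrix = np.asarray(m)
    # `@` is matrix multiplication; `.T` is the transpose.
    return matrix @ matrix.T

product = multiply_with_transpose(np.array([[1, 2], [3, 4]]))
print(product)
print(np.array_equal(product, product.T))  # True, the product is symmetric
```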
Day 12!
We are halfway through the challenge and it's high time to play a bit with Pandas, which is one of my favourite Python packages! :) Write a function called create_series(data)
that takes as input a dictionary and converts it to a Pandas series.
Day 13!
Write a function called create_and_sort_series(data)
that takes as input a list of values,
converts it to a Series, sorts it descending, adds value 2
to the series and returns it.
Day 14!
Write a function called occurrences(data)
that converts data (a dictionary) into a Series and returns the frequency count of the number 2 in this series :)
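A minimal sketch, assuming the 2s to be counted are the dictionary values (not the keys). Comparing the series with 2 gives a boolean series, and summing it counts the True entries.

```python
import pandas as pd

def occurrences(data):
    series = pd.Series(data)
    # (series == 2) is True where the value equals 2; sum() counts those.
    return int((series == 2).sum())

print(occurrences({"a": 2, "b": 3, "c": 2}))  # 2
```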
Day 15!
Write a function called where_is_2(series)
that returns the position (integer) of the number 2 in a given input series. If you know how, also handle cases where the input series has more than 1 occurrence of the number 2 - in this case you should output the LAST position.
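One way to sketch this, assuming "position" means the 0-based integer position (not the index label). Returning None when there is no 2 at all is an extra assumption; the task does not cover that case.

```python
import numpy as np
import pandas as pd

def where_is_2(series):
    # flatnonzero returns the integer positions where the condition holds.
    positions = np.flatnonzero(series.to_numpy() == 2)
    if positions.size == 0:
        return None  # assumption: nothing to report if 2 never occurs
    return int(positions[-1])  # last occurrence

print(where_is_2(pd.Series([5, 2, 7, 2, 1])))  # 3
```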
Day 16!
Write a function called get_joining_date(dict)
that takes as input a dictionary (see example below), turns it into a Pandas dataframe, sets name
as index, and returns the date on which Peter joined the challenge :)
- 2) Pandas loc
Day 17!
Write a function called get_joining_date(dict)
that takes as input a dictionary (see example below, same as yesterday), turns it into a Pandas dataframe, converts the column date_of_joining
to datetime type, sets date_of_joining
as index, and returns the name of the participant who joined the challenge on Dec 1st :)
Day 18!
Write a function called challenge_dates_left()
that returns a Pandas DatetimeIndex containing all challenge dates left - including both today and Dec 24th :)
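A sketch built on pd.date_range, which includes both endpoints by default. The optional today parameter is not part of the task; it is added here only so the function can be tried out with a fixed date. If the start date is after Dec 24th, the result is simply empty.

```python
import pandas as pd

def challenge_dates_left(today=None):
    # `today` is a hypothetical extra parameter for testing; default is the current date.
    start = pd.Timestamp.today() if today is None else pd.Timestamp(today)
    start = start.normalize()  # drop the time-of-day part
    end = pd.Timestamp(year=start.year, month=12, day=24)
    return pd.date_range(start=start, end=end, freq="D")

print(challenge_dates_left("2023-12-18"))
```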
Day 19!
Write a function called get_best_participant(dict)
that takes as input a dictionary (see example below), turns it into a Pandas dataframe, and returns the name of the participant who has solved the most tasks so far (task_finished column) :)
- 1) Pandas loc
- 2) Pandas max
Day 20!
Write a function called get_participants(dict)
that takes as input a dictionary (see example below), turns it into a Pandas dataframe, and returns the names of the participants (a list) who have solved 16 or 17 tasks :)
Day 21!
Write a function called sort_df(dict)
that takes as input a dictionary (see example below), turns it into a Pandas dataframe, sorts the dataframe first by 'attempts' in descending order, then by 'name' in ascending order and returns it :)
- 1) Pandas sort
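sort_values accepts a list of columns together with a matching list of sort directions, so a sketch could look like this. The sample dictionary is made up for illustration (the real example is not reproduced here), and the parameter is called data instead of dict to avoid shadowing the Python built-in.

```python
import pandas as pd

def sort_df(data):
    df = pd.DataFrame(data)
    # One ascending flag per column: 'attempts' descending, 'name' ascending.
    return df.sort_values(by=["attempts", "name"], ascending=[False, True])

# Hypothetical sample data, only the column names come from the task.
example = {"name": ["Cara", "Bea", "Abe"], "attempts": [1, 3, 3]}
print(sort_df(example))
```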
Day 22!
Write a function called count_values(df, value)
that takes as input a Pandas dataframe (see example below) and an integer value, and returns the frequency of the given input value in the entire dataframe. If you know how, make sure the function doesn't crash if value is not included in the dataframe at all (e.g. value = 666 in the example below).
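A sketch that compares every cell with the value and counts the matches. Because it counts True entries instead of looking the value up, a value that never occurs simply gives 0. The sample dataframe is made up for illustration.

```python
import pandas as pd

def count_values(df, value):
    # df.eq(value) is a boolean dataframe; summing its cells counts the matches.
    return int(df.eq(value).to_numpy().sum())

# Hypothetical sample data.
example = pd.DataFrame({"a": [1, 2, 2], "b": [2, 5, 6]})
print(count_values(example, 2))    # 3
print(count_values(example, 666))  # 0
```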
Day 23!
Today's task is for wine-lovers! Consider the dataframe below called wine_consumption. Write a function called get_average_wine_consumption_europe(wine_consumption) that returns how much wine people in European countries drink on average :)
- 2) Pandas mean
Day 24!
One last task to solve! Write a function called xmas_function(df) that takes as input a Pandas dataframe, sorts it in ascending order by value and returns a string which is the concatenation of all the values in the description column (no spaces). Merry Christmas!
- 1) Pandas sort
- 2) Python join
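Combining the two hints, a sketch might be as follows. It assumes the columns are literally named value and description, and the sample dataframe is made up for illustration.

```python
import pandas as pd

def xmas_function(df):
    sorted_df = df.sort_values(by="value")
    # astype(str) guards against non-string entries; "" joins without spaces.
    return "".join(sorted_df["description"].astype(str))

# Hypothetical sample data.
example = pd.DataFrame({"value": [3, 1, 2], "description": ["mas", "mer", "ryx"]})
print(xmas_function(example))  # merryxmas
```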