Pythons Advent Calendar

Data Science Edition

Day 1!

Let's start with a warm-up task! If you are not familiar with NumPy yet, go to the resources section and check them out.
Write a function called generate_arr(start, end) that takes two integers, start and end, and returns a one-dimensional NumPy array containing all integers between those values, including start and end. If you are not a beginner, provide a one-line solution here :) Note that you can test your function using check_funcion().
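A minimal sketch of one possible solution, assuming start <= end. It relies on np.arange, which excludes its stop value, so the stop is shifted by one.

```python
import numpy as np


def generate_arr(start, end):
    # np.arange stops before its second argument, so use end + 1
    # to make the range inclusive on both sides.
    return np.arange(start, end + 1)
```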

Day 2!

Missing data is an everyday problem that data scientists need to deal with. Write a function called replace_nans(array) that takes as input a NumPy array and returns it after replacing all np.nan (numpy.nan) values with -1.

1) numpy.isnan() in Python
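One way to approach this, sketched under the assumption that the input can be converted to floats. The function works on a copy, which is a design choice of this sketch and not something the task requires.

```python
import numpy as np


def replace_nans(array):
    # Build a float copy so the caller's array is left untouched.
    result = np.array(array, dtype=float)
    # np.isnan gives a boolean mask that is True wherever a NaN sits.
    result[np.isnan(result)] = -1
    return result
```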

Day 3!

Write a function called generate_matrix_9x9() that generates a 9x9 matrix with all elements equal to 2 except the one in the middle, which should be equal to 0. If you are not a beginner, set only the boundary entries to 2 and the remaining ones to 0 :)
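A possible sketch of both variants. The name generate_border_matrix_9x9 is hypothetical (the task does not name the advanced variant), and integer entries are an assumption.

```python
import numpy as np


def generate_matrix_9x9():
    matrix = np.full((9, 9), 2)
    # With zero-based indexing, the middle of a 9x9 grid is (4, 4).
    matrix[4, 4] = 0
    return matrix


def generate_border_matrix_9x9():  # hypothetical name for the advanced variant
    matrix = np.zeros((9, 9), dtype=int)
    # Fill the first and last row, then the first and last column.
    matrix[0, :] = matrix[-1, :] = 2
    matrix[:, 0] = matrix[:, -1] = 2
    return matrix
```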

Day 4!

Write a function called get_elements(arr) that returns the median of all the elements of the input array which are greater than 2. If you are able to, make sure that this function works well even if one of the elements is numpy.nan :)

1) NumPy WHERE function
2) NumPy median
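A minimal sketch using boolean masking in place of np.where. It relies on the fact that any comparison with NaN evaluates to False, so NaN entries are filtered out together with the values that are not greater than 2.

```python
import numpy as np


def get_elements(arr):
    arr = np.asarray(arr, dtype=float)
    # NaN > 2 is False, so the mask drops NaNs automatically.
    selected = arr[arr > 2]
    return np.median(selected)
```

If no element is greater than 2, np.median receives an empty array and returns NaN; whether that is the desired behaviour is left open by the task.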

Day 5!

Write a function called compute_percentage_unique_elements(arr) that returns the percentage of all unique entries of an array (as float). For example, the percentage of all unique entries of the 2-dimensional array [[1,2], [3,1]] is 75.

1) NumPy unique() function
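A sketch under the interpretation suggested by the example: the number of distinct values divided by the total number of entries. A non-empty input is assumed.

```python
import numpy as np


def compute_percentage_unique_elements(arr):
    arr = np.asarray(arr)
    # np.unique flattens the input and returns each distinct value once.
    return float(np.unique(arr).size / arr.size * 100)
```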

Saint Nicholas Day!

Write a function called half_xmas_tree(depth) that takes an integer as its argument and prints half of a Christmas tree of the given depth using 1s (for the xmas tree) and 0s (for the background). For example, calling half_xmas_tree(6) should produce the following output:
[[1, 0, 0, 0, 0, 0],
[1, 1, 0, 0, 0, 0],
[1, 1, 1, 0, 0, 0],
[1, 1, 1, 1, 0, 0],
[1, 1, 1, 1, 1, 0],
[1, 1, 1, 1, 1, 1]]
If you are able to, additionally write a function called whole_xmas_tree(depth) that prints an entire Christmas tree of a given depth :)

1) NumPy array indexing.
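The half tree is a lower-triangular matrix of ones, so one possible sketch uses np.tril where the hint suggests explicit indexing. The helper half_tree_array is a hypothetical name introduced here so the array can be inspected separately from the printing.

```python
import numpy as np


def half_tree_array(depth):  # hypothetical helper, not part of the task
    # Keep the ones on and below the main diagonal, zero out the rest.
    return np.tril(np.ones((depth, depth), dtype=int))


def half_xmas_tree(depth):
    print(half_tree_array(depth))
```

Note that printing a NumPy array shows the rows without commas, so the formatting differs slightly from the listing in the task.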

Day 7!

Write a function called weighted_average(arr) that returns the weighted average of an input array where all elements besides the first and the last have the same weight and the weight of the 2 remaining ones is 5x bigger. For example, the weighted average for arr=np.array([1,2,3,4]) should be (5*1+1*2+1*3+5*4)/12 = 2.5. If you are a beginner, assume that the input array is 1-dimensional and its length is >2. If you are NOT a beginner, try to make sure that this function also works for multi-dimensional arrays and doesn't break if the length of the input array is, for example, 1 (of course this is optional! :) )

1) NumPy average
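A possible sketch built on np.average. Two assumptions are made here that the task leaves open: multi-dimensional input is flattened first, and a single-element array simply returns that element. An empty array is not handled.

```python
import numpy as np


def weighted_average(arr):
    # Assumption: multi-dimensional input is flattened row by row.
    values = np.asarray(arr, dtype=float).ravel()
    weights = np.ones(values.size)
    # For a single element both indices point at the same entry.
    weights[0] = weights[-1] = 5
    return np.average(values, weights=weights)
```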

Day 8!

Data Scientists sometimes implement their own sophisticated metrics :) Write a function called calculate_magda_metric(arr) that takes as input a matrix and computes magda_metric, which is the mean of the values in the first column multiplied by the sum of the values in the last row. For example, magda_metric for the matrix [[1,2], [3,5]] should be equal to ((1+3)/2 * (3+5)) = 2*8 = 16. If you know how, make sure this function doesn't break if the input matrix has only one row :)

1) NumPy mean
2) NumPy sum
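A minimal sketch. np.atleast_2d is used so that a one-dimensional input is treated as a matrix with a single row, which is an assumption about how the one-row case should behave.

```python
import numpy as np


def calculate_magda_metric(arr):
    # A 1-D input such as [1, 2, 3] becomes the 1x3 matrix [[1, 2, 3]].
    matrix = np.atleast_2d(arr)
    first_column_mean = matrix[:, 0].mean()
    last_row_sum = matrix[-1, :].sum()
    return first_column_mean * last_row_sum
```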

Day 9!

If you have ever heard about Neural Networks (NN), you might already know that weights are very important there. If you haven't yet, now is the time: weight is the parameter within a neural network that transforms input data within the network's hidden layers. No worries - this task is not about implementing a neural network :)
One of the important choices to make before training a neural network is how to initialize the weight matrices, since we don't know anything about the possible weights when we start. Write a function called initialize_weights(rows, columns) that returns a rows x columns matrix of random weights uniformly distributed over the half-open interval [0, 2).

1) Numpy uniform distribution
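A short sketch using np.random.uniform, which draws from the half-open interval [low, high). No random seed is set here, so the values differ on every call.

```python
import numpy as np


def initialize_weights(rows, columns):
    # Samples are drawn from [0, 2): 0 can occur, 2 cannot.
    return np.random.uniform(low=0.0, high=2.0, size=(rows, columns))
```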

Day 10!

Matrix multiplication is an important component of many scientific computing and machine learning tasks, and it is often the performance bottleneck for those tasks. Write a function called can_multiply(m1, m2) that takes as input an a x b matrix m1 and a c x d matrix m2 and returns true if it is possible to multiply m1 and m2, and false otherwise. For example, can_multiply(m1 = np.array([[1,2], [3,4]]), m2 = np.array([[1,2, 3], [4, 5, 6]])) should return true since it is possible to multiply m1 with m2 (a 2x2 matrix multiplied by a 2x3 matrix is possible). If you are a beginner, assume that the input matrices are 2-dimensional. If you are not a beginner, make sure your function doesn't break if one of the inputs is a 1-dim matrix.

1) NumPy shape
2) Remember, you can multiply two matrices if and only if the number of columns in the first matrix equals the number of rows in the second matrix.
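The rule in hint 2 can be sketched as a shape comparison. Treating a 1-dimensional input as a single-row matrix is an assumption of this sketch; the task does not say how such inputs should be interpreted.

```python
import numpy as np


def can_multiply(m1, m2):
    # Assumption: a 1-D array of length n is treated as a 1 x n matrix.
    m1 = np.atleast_2d(m1)
    m2 = np.atleast_2d(m2)
    # Columns of the first matrix must match rows of the second.
    return m1.shape[1] == m2.shape[0]
```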

Day 11!

Write a function called multiply_with_transpose(m) that returns matrix m multiplied with its transpose. Think about what is special about M*M^T :)

1) Matrix multiplication
2) Matrix transpose
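A minimal sketch using the @ operator for matrix multiplication. As for what is special: the product of a matrix with its own transpose is always symmetric, which the usage lines below check for one example.

```python
import numpy as np


def multiply_with_transpose(m):
    m = np.atleast_2d(m)
    # (n x k) @ (k x n) always has compatible shapes and gives an n x n result.
    return m @ m.T


product = multiply_with_transpose(np.array([[1, 2], [3, 4]]))
is_symmetric = np.array_equal(product, product.T)
```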

Day 12!

We are halfway through the challenge and it's high time to play a bit with Pandas, which is one of my favourite Python packages! :) Write a function called create_series(data) that takes as input a dictionary and converts it to a Pandas series.

1) Creating Pandas series

Day 13!

Write a function called create_and_sort_series(data) that takes as input a list of values, converts it to a Series, sorts it in descending order, appends the value 2 to the series and returns it.

1) Sort values of a Series
2) Append to a Series
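A possible sketch, reading the task as appending the value 2 at the end of the sorted Series. Series.append was removed in pandas 2.0, so pd.concat is used here instead; resetting the index is a choice made for this sketch.

```python
import pandas as pd


def create_and_sort_series(data):
    series = pd.Series(data).sort_values(ascending=False)
    # pd.concat replaces the removed Series.append;
    # ignore_index renumbers the result from 0.
    return pd.concat([series, pd.Series([2])], ignore_index=True)
```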

Day 14!

Write a function called occurrences(data) that converts data (a dictionary) into a Series and returns how many times the number 2 occurs in this series :)

1) Series value count
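A minimal sketch based on value_counts. Returning 0 when the number 2 does not occur at all is an assumption; the task does not cover that case.

```python
import pandas as pd


def occurrences(data):
    counts = pd.Series(data).value_counts()
    # .get returns the fallback 0 if 2 never appears in the series.
    return int(counts.get(2, 0))
```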

Day 15!

Write a function called where_is_2(series) that returns the position (integer) of the number 2 in a given input series. If you know how, also handle cases where the input series has more than 1 occurrence of the number 2; in this case you should output the LAST position.

1) Pandas get location of an element
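One way to sketch this works on the underlying NumPy values, so the result is a zero-based position independent of the index labels. Returning None when 2 is absent is an assumption made here.

```python
import numpy as np
import pandas as pd


def where_is_2(series):
    # Zero-based positions of every entry equal to 2.
    positions = np.flatnonzero(series.to_numpy() == 2)
    # Assumption: return None if the series contains no 2 at all.
    return int(positions[-1]) if positions.size else None
```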

Day 16!

Write a function called get_joining_date(dict) that takes as input a dictionary (see example below), turns it into a Pandas dataframe, sets name as index, and returns the date on which Peter joined the challenge :)

1) Pandas set index
2) Pandas loc

Day 17!

Write a function called get_joining_date(dict) that takes as input a dictionary (see example below, same as yesterday), turns it into a Pandas dataframe, converts the column date_of_joining to datetime type, sets date_of_joining as index, and returns the name of the participant who joined the challenge on Dec 1st :)

1) Pandas set index
2) Pandas convert to datetime

Day 18!

Write a function called challenge_dates_left() that returns a Pandas DatetimeIndex containing all challenge dates left, including today and Dec 24th :)

1) Pandas date range
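A possible sketch with pd.date_range. The optional today parameter is hypothetical and was added only so the function can be tried with a fixed date; the end date is assumed to be Dec 24th of the current year.

```python
import pandas as pd


def challenge_dates_left(today=None):  # `today` is a hypothetical extra parameter
    if today is None:
        today = pd.Timestamp.today()
    # normalize() drops the time of day so the range starts at midnight.
    start = pd.Timestamp(today).normalize()
    end = pd.Timestamp(year=start.year, month=12, day=24)
    # Both endpoints are included; the result is empty after Dec 24th.
    return pd.date_range(start=start, end=end, freq="D")
```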

Day 19!

Write a function called get_best_participant(dict) that takes as input a dictionary (see example below), turns it into a Pandas dataframe, and returns the name of the participant who has solved the most tasks so far (task_finished column) :)

1) Pandas loc
2) Pandas max

Day 20!

Write a function called get_participants(dict) that takes as input a dictionary (see example below), turns it into a Pandas dataframe, and returns the names of the participants (a list) who have solved 16 or 17 tasks :)

1) Pandas between

Day 21!

Write a function called sort_df(dict) that takes as input a dictionary (see example below), turns it into a Pandas dataframe, sorts the dataframe first by 'attempts' in descending order, then by 'name' in ascending order and returns it :)

1) Pandas sort
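A minimal sketch using sort_values with one sort direction per column. The parameter is called data here to avoid shadowing Python's built-in dict, and the input is assumed to contain 'attempts' and 'name' keys as named in the task.

```python
import pandas as pd


def sort_df(data):
    df = pd.DataFrame(data)
    # One ascending flag per column: attempts descending, name ascending.
    return df.sort_values(by=["attempts", "name"], ascending=[False, True])
```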

Day 22!

Write a function called count_values(df, value) that takes as input a Pandas dataframe (see example below) and an integer value, and returns the frequency of the given input value in the entire dataframe. If you know how, make sure the function doesn't crash if value is not included in the dataframe at all (e.g. value = 666 in the example below).

1) Pandas value count
2) Pandas series ravel
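One possible sketch uses an element-wise comparison in place of the value_counts and ravel hints. A value that never occurs simply yields a count of 0, so no special handling is needed.

```python
import pandas as pd


def count_values(df, value):
    # df == value gives a boolean dataframe; summing its True entries
    # counts the matches across all columns.
    return int((df == value).to_numpy().sum())
```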

Day 23!

Today's task is for wine lovers! Consider the dataframe below called wine_consumption. Write a function called get_average_wine_consumption_europe(wine_consumption) that returns how much wine people in European countries drink on average :)

1) Pandas groupby
2) Pandas mean

Day 24!

One last task to solve! Write a function called xmas_function(df) that takes as input a Pandas dataframe, sorts it in ascending order by value and returns a string which is a concatenation of all the values in the description column (no spaces). Merry Christmas!

1) Pandas sort
2) Python join
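A short sketch combining the two hints, assuming the dataframe has columns named value and description as the task wording suggests.

```python
import pandas as pd


def xmas_function(df):
    ordered = df.sort_values(by="value", ascending=True)
    # Cast to str so non-string descriptions can still be joined.
    return "".join(ordered["description"].astype(str))
```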