ENGG1811 Lab 09: File handling, numpy

Please watch the lecture Week 08B (Video): File Handling.
You can find the lecture notes and exercises under "Week 08B", see Lectures.

Objectives

After completing this lab, students should be able to:

Assessment

This lab has three parts: Parts A to C. You need to show your tutors all three parts. 

For all the programs, we expect that you choose informative variable names and document your program.

There is also an online multiple choice question which is worth 1 mark. We suggest that you attempt this question after completing Parts A-C.

Organising your work

You should make a directory called lab09 to store your files for this lab.


Part A: File handling

The aim of the exercise is to practice using Python to read data files. For this exercise, you need to download the zip-file lab09A.zip [Note: right click to download] and move it into the directory lab09 that you created. This file contains 30 files zipped together. If you unzip this file on a Windows or an Apple machine, the computer will create a directory called lab09A within the directory lab09.

(If you are doing your lab by remotely logging into a CSE computer using Vlab, the instructions on this page apply to you. Note that the instructions were written assuming lab09, but you should still be able to follow them.)

If you use File Explorer (on Windows) or Finder (on Apple), you should find 30 files in the directory lab09A. These files have the names temp00.txt, temp01.txt, temp02.txt, ..., temp29.txt. All 30 files have the same format. If you double click on a file, a text editor window will pop up and you will be able to see the contents of the file. Take care not to edit the files, because that could change their format. The contents of temp00.txt are:

A    4
23.31    20.89    27.04    27.50    29.70

The contents of temp01.txt are:

B    2
29.65    25.46    29.44    21.81    28.75
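As an illustration of how text in this two-line layout can be broken up, the sketch below parses a string copy of temp00.txt rather than the file itself. The function name parse_two_line_record and its variable names are hypothetical. The sketch assumes only what the two samples above show: a letter and an integer on the first line, and decimal numbers separated by whitespace on the second.

```python
# A string copy of temp00.txt, so the sketch runs without the data files.
sample_text = "A    4\n23.31    20.89    27.04    27.50    29.70\n"


def parse_two_line_record(text):
    """Split the two-line layout into a label, an integer and a list of floats."""
    first_line, second_line = text.strip().split("\n")
    # split() with no argument splits on any run of spaces or tabs.
    label, integer_text = first_line.split()
    values = [float(item) for item in second_line.split()]
    return label, int(integer_text), values


label, number, values = parse_two_line_record(sample_text)
print(label, number, values)
```

Calling split() without an argument means the sketch does not depend on whether the columns are separated by spaces or by tabs.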

The format of the files is:

We ask you to write a Python program, with the name file_proc.py, whose task is to read in these files and then use their contents to:

Note that you will need to put your Python file file_proc.py in the directory lab09A so that it can read the files temp00.txt, temp01.txt etc. This is because, by default, Python looks for a file named without a directory in the current working directory, which is usually the directory that contains the program. However, it is possible to put the Python program file in one directory and the data files in a different directory. If you want to learn how to do this, you can consult this text, in particular Section 11.2.
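One possible way to generate the 30 file names and to open a file that lives in a different directory is sketched below. It is not taken from the text referred to above; it uses pathlib from the standard library. The names data_dir, file_names and first_file_lines are hypothetical, and a temporary directory holding a copy of temp00.txt stands in for lab09A so that the sketch runs anywhere.

```python
import tempfile
from pathlib import Path

# Build the names temp00.txt ... temp29.txt; ":02d" pads the number to two digits.
file_names = [f"temp{index:02d}.txt" for index in range(30)]

with tempfile.TemporaryDirectory() as temporary_root:
    # Stand-in for the real lab09A directory (an assumption for this sketch).
    data_dir = Path(temporary_root) / "lab09A"
    data_dir.mkdir()
    (data_dir / "temp00.txt").write_text(
        "A    4\n23.31    20.89    27.04    27.50    29.70\n"
    )

    # Joining the directory and the file name gives a path that works
    # regardless of where the Python program itself is stored.
    with open(data_dir / file_names[0]) as data_file:
        first_file_lines = data_file.readlines()

print(file_names[:3])
print(first_file_lines)
```

In your own program you would replace the temporary directory with the actual location of lab09A on your computer.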

If you want to check some answers for this question, see here.

Hints:

Part B: numpy computation 

This exercise is based on the Python file distance.py. We suggest that you download the file distance.py and open it in Spyder because it will make it easier for you to follow the description below. 

The Python file distance.py defines two numpy arrays with the names pos and ref. The array pos has a shape of (6,2). Each row of pos stores the position of one object. For example, pos[0,0] and pos[0,1] store, respectively, the x- and y-coordinates of the first object.

The array elements ref[0] and ref[1] store, respectively, the x- and y-coordinates of a reference point.

Your task is to compute the distance of each of the 6 objects in pos from the reference point. If the coordinates of the object are (x,y) and those of the reference point are (a,b), then their distance is given by the Euclidean distance formula:

$$d = \sqrt{(x - a)^2 + (y - b)^2}$$

We require that you complete this task using numpy functions and arithmetic operators. You are not allowed to use any loops. You can also find the expected answers in the file distance.py.

Hints: You will need to use numpy broadcasting, discussed in Week 7A's lecture, and some numpy mathematical functions (link to numpy manual page on maths functions).
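To illustrate how broadcasting removes the need for a loop, here is a minimal sketch on made-up data. The arrays toy_points and toy_anchor are hypothetical and are not the pos and ref arrays from distance.py; the sketch assumes Euclidean distance in two dimensions.

```python
import numpy as np

# Three made-up points, one per row (x, y), and one made-up anchor point.
toy_points = np.array([[4.0, 5.0],
                       [1.0, 1.0],
                       [7.0, 9.0]])
toy_anchor = np.array([1.0, 1.0])

# Broadcasting: the shape (2,) anchor is subtracted from every row of the
# shape (3, 2) array, giving the x and y differences for all points at once.
differences = toy_points - toy_anchor

# Square each difference, add the two columns of each row, take the square root.
toy_distances = np.sqrt(np.sum(differences ** 2, axis=1))
print(toy_distances)
```

The argument axis=1 makes np.sum add across the columns of each row, so the result has one distance per point.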

Part C: numpy data analysis

In Part B of Week 7's lab, you used numpy to analyse the data on the sea ice extent. In this exercise, you will use the same data set to perform some additional data analysis using numpy.

You will need to download these two files: sea_ice.txt (the data file, which was also used in Week 7) and sea_ice_lab09.py which is the Python file that you will use to complete this exercise.

The existing code in sea_ice_lab09.py does exactly the same preliminary processing on the data in sea_ice.txt that we asked you to do in Week 7. You had to type the code yourselves in Week 7 but we have done it for you this time. If you run the existing code in sea_ice_lab09.py, it will create three numpy arrays and plot a graph. The numpy arrays created are: years, months, data_sea_ice. Some information that you need to know:

The graph plotted by sea_ice_lab09.py contains 35 curves which show the annual variation of sea ice extent. You can see that the sea ice extent is larger at the beginning of the year (which corresponds to the Northern winter) than in the later part of the year.

The following are a number of questions on the data set which you should answer using numpy. There is a restriction that you must not use any loops in your answers. If you want to check your answers, some of them are on this page.

  1. Compute the mean sea ice extent using all the measurements in the years 1987 to 1999 inclusive. There are multiple methods to do this but we ask you to use Boolean indexing.
  2. Compute the mean sea ice extent using all the data in the last 3 months of all the years in the data set. We would like you to use two different methods to answer this question. The first method is to use the colon (:) notation to select the appropriate columns and the second one is to use Boolean indexing. Both methods should give you the same answer. If you get the answer 10.663 or 9.477, then something is incorrect.
  3. Compute the mean sea ice extent using all the data in the first 6 months of the years 2000-2009 inclusive. We ask you to do it with Boolean indexing.
  4. The 2-dimensional array data_sea_ice contains half-monthly data, but you want to work with monthly data. You want to obtain a 2-dimensional array sea_ice_monthly which contains monthly data. This array should have 35 rows and 12 columns. Column 1 contains the average of the two measurements in January, Column 2 the average for February, etc. The problem is to compute sea_ice_monthly from data_sea_ice without using any loops. This question is identical to Question 8 in Week 7's lab on sea ice. You were asked to use the reshape function in that exercise. In this question, we ask you to use the double colon (::) notation to select the correct columns.
  5. What is the largest decrease in sea ice extent between two consecutive half-monthly measurements?
  6. Each year, the sea ice extent peaked in a certain half-month. In which half-month was the annual peak most frequently found?
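The numpy techniques named in the questions above can be tried out on a small made-up array first. The sketch below is not a solution for the sea ice data: toy_years, toy_months and toy_data are hypothetical, with 4 rows (years) and 6 columns (two measurements for each of 3 months), and the layout of the real arrays is only assumed to be similar.

```python
import numpy as np

toy_years = np.array([2000, 2001, 2002, 2003])   # one entry per row
toy_months = np.array([1, 1, 2, 2, 3, 3])        # one entry per column
toy_data = np.array([[10.0, 12.0, 14.0, 13.0, 9.0, 7.0],
                     [11.0, 13.0, 12.0, 10.0, 8.0, 6.0],
                     [ 9.0, 11.0, 15.0, 12.0, 8.0, 4.0],
                     [10.0, 12.0, 13.0, 11.0, 7.0, 5.0]])

# Boolean indexing on rows: keep the years 2001 to 2002 inclusive.
row_mask = (toy_years >= 2001) & (toy_years <= 2002)
mean_selected_years = np.mean(toy_data[row_mask])

# Boolean indexing on columns: keep the measurements of the last month.
column_mask = toy_months == 3
mean_last_month = np.mean(toy_data[:, column_mask])

# Row and column masks together: select rows first, then columns.
mean_rows_and_columns = np.mean(toy_data[row_mask][:, toy_months == 1])

# Double colon: columns 0, 2, 4 and columns 1, 3, 5 are averaged pairwise.
toy_monthly = (toy_data[:, 0::2] + toy_data[:, 1::2]) / 2

# Changes between consecutive measurements, assuming the rows are in
# chronological order so that flattening keeps the time order.
changes = np.diff(toy_data.ravel())
largest_decrease = -np.min(changes)

# Column index of each row's maximum, then the index that occurs most often.
peak_columns = np.argmax(toy_data, axis=1)
most_common_peak = np.argmax(np.bincount(peak_columns, minlength=6))

print(mean_selected_years, mean_last_month, mean_rows_and_columns)
print(toy_monthly)
print(largest_decrease, most_common_peak)
```

Note that np.argmax returns 0-based column indices, so a result of 2 refers to the third column.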

At the End of the Lab

You should be able to show your tutor the exercises. You should be comfortable with Python file processing, and with numpy broadcasting and indexing.

Finally, do not forget to complete your online multiple choice question if you have not done it yet.  

If you have completed everything, please do not forget to log out. Simply double click on the "Log Out" icon.