Python
The main programming language we will use in this course to handle atmospheric data is Python. Before getting into “What is Python?”, here is a statement that shows why we want to learn it:
I have used a combination of Perl, Fortran, NCL, Matlab, R and others for routine research, but found out this general-purpose language, Python, can handle almost all in an efficient way from requesting data from remote online sites to statistics, and graphics.
What Is Python?
These features, perhaps, come with a minor cost of reduced language performance, but this is a trade-off the vast majority of users are willing to make in order to gain all the advantages Python has to offer.
What can you do with Python?
Python is not just for atmospheric sciences. It has a wide range of applications, and earth science is one of them. Broadly, you can group the applications into:
- Web development
- Data science
1. Web development
Python can be used to create web pages. A web page is basically a set of HTML files (not Python). Although you can edit the HTML files by hand, you can also use Python to generate them for you. This is what I did to create the ATM2106 class webpage.
Some sites wait for input from users and process the job before delivering the results back (like Amazon). Python can play an important role in these dynamic websites. For example, one of my friends built a website called Trevii, which helps you organize a trip by gathering information online. The backbone of this website is also Python!
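As a minimal sketch of generating an HTML file from Python: the page title, the lecture list, the build_page helper, and the output file name below are all hypothetical and chosen only for illustration; this is not the script used for the ATM2106 webpage.

```python
from html import escape
from pathlib import Path
import tempfile


def build_page(title, items):
    """Return a complete HTML document that lists the given items."""
    # escape() keeps characters such as < and > from being read as markup
    rows = "\n".join(f"    <li>{escape(item)}</li>" for item in items)
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        f"<head><title>{escape(title)}</title></head>\n"
        "<body>\n"
        f"  <h1>{escape(title)}</h1>\n"
        "  <ul>\n"
        f"{rows}\n"
        "  </ul>\n"
        "</body>\n"
        "</html>\n"
    )


# Hypothetical content, for illustration only
lectures = ["Introduction", "Python basics", "Plotting <maps>"]
page = build_page("Example course page", lectures)

# Write the page to a temporary directory so nothing important is overwritten
out_path = Path(tempfile.gettempdir()) / "example_index.html"
out_path.write_text(page, encoding="utf-8")
print(f"Wrote {out_path}")
```

Opening the resulting file in a browser should show a heading followed by a bulleted list. Changing the Python list and rerunning the script regenerates the page, which is the main appeal over editing the HTML by hand.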
2. Data science
In some sense, the purpose of using Python in this course is to do data science. Python is efficient at handling large datasets. It is not necessarily faster than compiled languages such as Fortran or C, as mentioned above. This is because Python has to work out the data type of each value at run time, whereas in Fortran or C the user declares it in advance. If you tell Python the data type, it can process the data at a much higher speed (approaching the speed of Fortran or C).
Here, data science includes machine learning! Here is an example from Towards Data Science.
The goal of this machine learning example is to find out how the three numbers we provide are combined. The first task is to generate a training set.
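One common way to tell Python the data type is to store the numbers in a NumPy array with an explicit dtype; whether this is the mechanism the text has in mind is an assumption. The sketch below computes the same sum of squares twice, once with a plain Python loop over a list and once with a typed array. The array size is arbitrary, and the timings depend on the machine, so no particular speed-up is claimed.

```python
import time

import numpy as np

n = 200_000  # arbitrary size, chosen only for illustration

# A plain Python list: the type of every element is checked at run time
values_list = [float(i) for i in range(n)]

# A NumPy array with an explicit data type (64-bit floating point)
values_array = np.array(values_list, dtype=np.float64)

# Sum of squares with a pure Python loop
start = time.perf_counter()
total_list = 0.0
for v in values_list:
    total_list += v * v
elapsed_loop = time.perf_counter() - start

# The same sum of squares computed on the typed array in compiled code
start = time.perf_counter()
total_array = float(np.dot(values_array, values_array))
elapsed_numpy = time.perf_counter() - start

print(f"Python loop : {elapsed_loop:.4f} s")
print(f"NumPy array : {elapsed_numpy:.4f} s")
print("Results agree:", np.isclose(total_list, total_array))
```

Both versions give the same result to within floating-point rounding. On most machines the typed-array version is expected to be considerably faster, because the loop over the elements runs in compiled code instead of the Python interpreter.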
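from random import randint
TRAIN_SET_LIMIT = 1000  # upper bound for each random input
TRAIN_SET_COUNT = 100   # number of training samples
TRAIN_INPUT = list()
TRAIN_OUTPUT = list()
for i in range(TRAIN_SET_COUNT):
    # Draw three random integers as the inputs
    a = randint(0, TRAIN_SET_LIMIT)
    b = randint(0, TRAIN_SET_LIMIT)
    c = randint(0, TRAIN_SET_LIMIT)
    # The target output is a fixed linear combination of the inputs
    op = a + (2*b) + (3*c)
    TRAIN_INPUT.append([a, b, c])
    TRAIN_OUTPUT.append(op)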
The training set consists of 100 samples, each with three numbers a, b, and c as the input and op = a + 2*b + 3*c as the target output.
You can adjust the size of the training set by modifying TRAIN_SET_COUNT.
Now, we will train the machine with this dataset. The package scikit-learn allows us to do machine learning easily.
from sklearn.linear_model import LinearRegression
predictor = LinearRegression()
predictor.fit(X=TRAIN_INPUT, y=TRAIN_OUTPUT)
The training is done, and predictor will now compute op from three inputs.
X_TEST = [[10, 20, 30]]
outcome = predictor.predict(X=X_TEST)
coefficients = predictor.coef_
print('Outcome : {}\nCoefficients : {}'.format(outcome, coefficients))
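We see that the coefficients found by machine learning are the same as those used to build the training set (1, 2, and 3), up to floating-point rounding, so the predicted outcome for [10, 20, 30] is very close to 10 + 2*20 + 3*30 = 140. There are online courses from Stanford and Caltech if you are interested in learning more.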
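To see why the coefficients are recovered so precisely, the same fit can be sketched as an ordinary least-squares problem solved with NumPy. This is an illustration under stated assumptions, not a description of what scikit-learn does internally: the fixed random seed, the explicit column of ones for the intercept, and the variable names are all choices made here for the example.

```python
from random import randint, seed

import numpy as np

seed(0)  # fixed seed, added here only so the example is reproducible

TRAIN_SET_LIMIT = 1000
TRAIN_SET_COUNT = 100

# Rebuild a training set with the same rule as above: op = a + 2*b + 3*c
inputs = [
    [randint(0, TRAIN_SET_LIMIT) for _ in range(3)]
    for _ in range(TRAIN_SET_COUNT)
]
outputs = [a + 2 * b + 3 * c for a, b, c in inputs]

X = np.array(inputs, dtype=float)
y = np.array(outputs, dtype=float)

# Append a column of ones so the fit can also estimate an intercept
A = np.column_stack([X, np.ones(len(X))])

# Solve the least-squares problem: minimize ||A @ solution - y||
solution, *_ = np.linalg.lstsq(A, y, rcond=None)
coefficients, intercept = solution[:3], solution[3]

# Predict the output for the same test input used above
prediction = np.array([10, 20, 30]) @ coefficients + intercept

print("Coefficients:", coefficients)
print("Intercept   :", intercept)
print("Prediction  :", prediction)
```

Because the training outputs contain no noise and follow an exactly linear rule, the least-squares solution reproduces that rule: the coefficients come out close to 1, 2, and 3, the intercept close to 0, and the prediction close to 140. With noisy, real-world data the fit would only approximate the underlying relationship.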