Hi there,

I am a PhD student in Operations Research and developed an Introduction to Python & Programming course for the Bachelor program at a business school.

Due to Corona, I got a chance to videotape it thoroughly (I should have done that earlier).

Maybe someone here finds my lectures (playlist: https://www.youtube.com/playlist?list=PL-2JV1G3J10lQ2xokyQow…) useful or knows someone who does. The recordings are 25 hours in total. If you do the readings and exercises, you should allocate roughly 90 to 120 hours.

The GitHub repo is really an interactive book in Jupyter notebooks: https://github.com/webartifex/intro-to-python

As I didn’t study CS myself, I think I provide another angle for newbies.

Pull requests with improvements are highly welcome. The materials contain lots of references to the Python community as well. I love this community: I started without any knowledge of Python in 2014 and the many conference talks and repos really help when learning.

Stay safe everybody, Alex

Forgive me if I missed this, but I would highly recommend talking about virtual environments, pypi, and the existence of other things like pipenv.

The environment/dependency management story for Python is such a tire fire and I wish someone introduced me to it from a reasonably high level at the very beginning of my time with Python.

It doesn’t have to dig deep. Just needs to talk about what these things are, why they exist and why it’s such a challenge.

Too often I think people skip over these topics, considering them nothing more than a means to an end. But by far my largest challenges with Python have been environment and dependency management.
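A minimal sketch of what a virtual environment is from the interpreter's point of view, using only the standard library: inside a `venv`-style environment, `sys.prefix` points at the environment while `sys.base_prefix` still points at the base installation. The helper names `in_virtualenv` and `create_env` are hypothetical; the check assumes Python 3.3+ `venv` environments (older `virtualenv` releases set `sys.real_prefix` instead), and tools like pipenv or poetry are not covered here.

```python
import sys
import tempfile
import venv
from pathlib import Path


def in_virtualenv() -> bool:
    """Return True if the running interpreter belongs to a venv-style environment."""
    # Assumption: venv-style environments; sys.prefix differs from sys.base_prefix there.
    return sys.prefix != sys.base_prefix


def create_env(target: Path) -> Path:
    """Create a bare virtual environment (without pip) and return its pyvenv.cfg path."""
    venv.create(target, with_pip=False)
    return target / "pyvenv.cfg"


if __name__ == "__main__":
    print("Running inside a virtual environment:", in_virtualenv())
    with tempfile.TemporaryDirectory() as tmp:
        cfg = create_env(Path(tmp) / "demo-env")
        # pyvenv.cfg records which base interpreter the environment was created from.
        print(cfg.read_text())
```

In day-to-day use the same thing is usually done from the shell with `python -m venv <dir>`, after which packages from PyPI are installed into that directory instead of the global installation.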

I agree that this is important.

My personal choice is to use pyenv together with poetry on a Linux machine.

Yet, please understand that I teach this course to absolute (!) beginners. That is why I chose to use the Anaconda Distribution in class, which comes with every package installed that I want my students to study.

Making your first steps in programming is hard enough, so I want to keep the “boilerplate” as small as possible.

But dependency management will be part of the more “advanced” lectures I will do in the future.

I hear you. There’s probably no one “right” answer because as I type this, I’m already conflicted with my own opinion. There’s already a mountain of important topics to teach.

What’s on my mind is that all of the “meta” parts of programming are usually seen as “boilerplate” when they make up such a critical part of “authoring” meaningful software.

I’m approaching this concern from my personal experience in university. I took a 100-level “CS for Non-CS majors.” We did all the formal CS stuff like talking about variables, functions, loops, recursion, etc. but by the end of the course I was still never taught how to take my Java applications and package them in a way that I could share with my friends. It felt very much like, “you can do so much with a computer… as long as you stay within the environment we rushed you through setting up and use the few libraries we gave you.” (I wrote a Reversi game for my girlfriend and shipped it by installing the entire development environment on her laptop! Hah.)

I guess my $0.02 is that, maybe near the start, I would have loved a little detour block: “If you want to learn how to take these Python scripts/programs we’re writing and share them with the world, check out this advanced chapter at the end of the book!” The same might be true for “If you want to play with other Python libraries, take a peek at this chapter that talks about PyPI and environments!”

As a related note, this is what I think we really lose as we move to computing environments like the iPhone with absolutely mandatory code signing.

I’ve heard it said that “oh, if you’re a kid with an iPad and you just want to learn, you can download the little Swift playgrounds app, easy!”

But the thing is that the children I’ve worked with don’t want to create stuff inside a toy app that only other people with the toy app can use—they want to create programs that feel real. Sure, maybe the only thing that real app does is display a guess-the-numbers game, but still, you can install it on the computers of your parents and friends and say “look, I made this app!”

iOS devices just don’t let you do it. Android, Windows, and macOS do, and I really hope we don’t lose that in the name of security.

It might seem painful, but the fact is students will be reading lots of material besides what you personally prepare. That material will be referencing all sorts of “advanced” boilerplate things.

One of the highest-impact things you can do is orient them in a way to understand outside material more effectively. It doesn’t have to be comprehensive, but they’re going to see things like “pipenv” and “virtualenv” in outside material sooner rather than later.

I second this. Sure, there’s a huge value in the beginner-friendly structure that Jupyter notebooks provide, but it’s also a limiting factor as it’s just a very small subset of the Python experience (I’ve personally never edited a single .ipynb file in my life)

Note, this is intended for people who will use python for data science, not for future software engineers.

Data scientists using python but not using jupyter notebooks is the exception.

That is correct.

But even then, I think in the future the data scientist role and the data engineer role will converge and the person doing this must know virtual environments.

In the future we will likely have different tools to handle dependencies and environments. If these things aren’t immediately useful, it might be worth teaching other stuff instead.

I agree. One of my most memorable unanswered questions from my first C programming class is where the things we imported came from. I wondered how to inspect them so I would know what’s available in those libraries, how to use it and to some extent how I would build my own. I moved on from C and so in the end it didn’t matter much, but when you do settle into a language long term it is really helpful to know how to navigate docs, system/standard library and packaging tools so that you can participate in the ecosystem effectively.
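For Python, the question of where an import comes from and what it offers can be answered from inside the interpreter. A minimal sketch using the standard `inspect` module and a module's `__file__` attribute; `json` is just an example module, and the helper names `where_from` and `public_functions` are hypothetical.

```python
import inspect
import json


def where_from(module) -> str:
    """Return the file a module was loaded from (built-in modules have no __file__)."""
    return getattr(module, "__file__", "<built-in>")


def public_functions(module) -> list[str]:
    """List the public functions a module exposes, sorted by name."""
    return sorted(
        name
        for name, obj in inspect.getmembers(module, inspect.isfunction)
        if not name.startswith("_")
    )


if __name__ == "__main__":
    print(where_from(json))                 # path to json/__init__.py in the standard library
    print(public_functions(json))           # includes dump, dumps, load and loads
    print(inspect.signature(json.dumps))    # the parameters json.dumps accepts
```

The built-in `help()` function shows the same information together with the docstrings.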

Amazing content, thanks for making it public! For people lurking, it has Notebooks, Quiz (review) questions, AND exercises. This is very good.

@webartifex it could help the spread of your course if you could put the Notebooks in a hosted Jupyter Notebook environment, so students don’t have to install Anaconda and all that.

Shameless plug here, I’m the creator of Notebooks.ai, which is a hosted Jupyter Lab environment for students, 100% free (we only charge big schools, so teachers and students are free to use). Here’s a quick demo of your first lecture: https://notebooks.ai/santiagobasulto/intro-to-python-demo

And aside from ours, there are other options like mybinder or Google Colab.

Good tips. I will look at this.

I am planning to use Google Colab for a totally remote class next weekend to set up group work.

At this point, I feel like there are too many “Intro to Python” courses floating around. It will take a newbie multiple hours of reading reviews and recommendations to find a course. There are many good ones out there.

I wish more effort were spent on creating intermediate courses. It would be great if more people tried to write books like the one Nicolas Rougier is working on (https://github.com/rougier/scientific-visualization-book): take a specific library and help people become proficient in it. The core contributors to a library are not always the best people to teach others how to use it.

Some example topics that took me a lot of reading from different sources to understand.

– how to write a python library that you can host for public/private use

– adding test coverage to data science python projects

– learning libraries like matplotlib, seaborn beyond what you see in tutorials

I think material for all of this exists in different sources like documentation and Stack Overflow, but it is either too detailed or too superficial. The middle (intermediate) layer is often missing.

I get you.

The plotting tools I would actually consider in an “Intro to Data Science” course, not in an “Intro to Programming”.

I started to write a library implementing Gilbert Strang’s Linear Algebra book assuming nothing but core Python. What inspired me to do that was reading Philip N. Klein’s book “Coding the Matrix”. I thought writing a LA library for fun and study purposes makes sense if you want to go on and study data science. However, that project is not yet ready to be published. Would that be what you are looking for?
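To illustrate what “nothing but core Python” can look like for linear algebra, here is a minimal sketch (not the unpublished library mentioned above) that represents a matrix as a list of rows and builds multiplication from `zip` and `sum`. The names `Matrix`, `transpose` and `matmul` are hypothetical.

```python
Matrix = list[list[float]]


def transpose(a: Matrix) -> Matrix:
    """Swap rows and columns."""
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an m x n matrix by an n x p matrix using only core Python."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    b_cols = transpose(b)
    # Entry (i, j) is the dot product of row i of a and column j of b.
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


if __name__ == "__main__":
    print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```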

I agree with you about the tools being in a data science course. What I’m pointing out is that a lot of people have spent energy creating intro to programming courses, but a similar amount of effort has not been spent on creating intermediate courses that help people become proficient with the tools of data science. It’s usually “intro” level content or library documentation.

I think writing code really helps you understand the algorithms in more depth so I’m all for exercises for LA like you suggest.

> The course’s main goal is to prepare the student for further studies in the “field” of data science.

Does the material seem to have major gaps for DS to anyone else? There’s no pandas, no matplotlib, no ML. It seems like more a tutorial with a computer science focus, with recursion and bit manipulation. Those are great programming topics but rarely used by data scientists. For that I would probably use .py files, not jupyter notebooks. It just doesn’t align with my experience as a Python-based data scientist.

Pandas, NumPy & friends will be in Chapter 9, which I have not yet had time to finish (the semester runs until the end of April and is currently at Chapter 5).

Furthermore, it is a programming (!) course, not (!) a data science course.

Please define data science to me. Is it “only” curve fitting to you? Or also optimization (e.g., in logistics)? Then, dynamic programming (and because of that recursion) is super important.
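To make the link between dynamic programming and recursion concrete, here is a minimal sketch of the textbook 0/1 knapsack problem (pick items to maximize total value without exceeding a capacity), written as a recursive function whose repeated subproblems are cached with `functools.lru_cache`. The item data are made up for illustration, and this is not taken from the course materials.

```python
from functools import lru_cache


def knapsack(values: list[int], weights: list[int], capacity: int) -> int:
    """Maximum total value of items that fit into the given capacity (0/1 knapsack)."""

    @lru_cache(maxsize=None)
    def best(i: int, remaining: int) -> int:
        # Base case: no items left to consider.
        if i == len(values):
            return 0
        # Option 1: skip item i.
        result = best(i + 1, remaining)
        # Option 2: take item i if it still fits.
        if weights[i] <= remaining:
            result = max(result, values[i] + best(i + 1, remaining - weights[i]))
        return result

    return best(0, capacity)


if __name__ == "__main__":
    print(knapsack([60, 100, 120], [10, 20, 30], 50))  # 220
```

Without the cache, the recursion re-solves the same `(i, remaining)` pairs many times; with it, each subproblem is solved once, which is the core idea of dynamic programming.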

I rely on Jupyter notebooks mainly because it is a course for people with absolutely no prior experience that also do not major in CS.

.py files are actually explained in Chapter 2 and will be used in a follow-up course.

What I never understand about intros like this (even intro courses in college), is why not start with types? Reading the content, it’s obvious there’s something different between a string and a number, and a list (or a dict) is obviously very different again. You quickly figure out that these things are also capable of doing completely different operations, but why? And what things (methods) are they capable of?

I was helping a friend who decided to start a CompSci degree recently (the 101 course was taught in Python), and she was massively struggling with type coercion and understanding type methods. Looking at the course forum, so was everybody else. I helped her understand what types were, why they were important and what the built-in type methods were (and how to read the Python documentation), and it was a major breakthrough for her. She’s now getting close to 100% on her tests.
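A minimal sketch of the kind of exploration described above, using only the built-ins `type()` and `dir()`: every value has a type, the type decides which methods and operators are available, and Python refuses to guess when types are mixed. The helper name `describe` is hypothetical.

```python
def describe(value) -> str:
    """Report a value's type and a few of its public methods."""
    methods = [name for name in dir(value) if not name.startswith("_")]
    return f"{value!r} has type {type(value).__name__}; public methods include {methods[:3]}"


if __name__ == "__main__":
    for value in ("42", 42, [4, 2], {"answer": 42}):
        print(describe(value))

    # The same operator does different things depending on the types involved.
    print("4" + "2")     # prints 42 (string concatenation)
    print(4 + 2)         # prints 6 (integer addition)

    # Mixing a str and an int raises TypeError instead of coercing silently.
    try:
        "4" + 2
    except TypeError as exc:
        print("TypeError:", exc)

    # Explicit conversion makes the intent clear.
    print(int("4") + 2)  # prints 6
    print("4" + str(2))  # prints 42
```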

I was also blown away by how archaic some of the content was, like using .format() instead of f-strings, but that’s a separate topic.

I introduce the idea of a type right in chapter 1 because of exactly the argument you make.

As Python is really more about the behavior of objects and not so much their type, I introduce these already from chapter 4 onward, for example, iterable vs. container, and many more. I actually would say that this is the essence of any dynamic language (duck typing).
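The idea of teaching behaviors rather than concrete types can be checked directly with the abstract base classes in `collections.abc`. A minimal sketch; the helper name `behaviours` is hypothetical.

```python
from collections.abc import Container, Iterable, Sized


def behaviours(obj) -> dict[str, bool]:
    """Check which abstract behaviours an object supports, regardless of its concrete type."""
    return {
        "iterable": isinstance(obj, Iterable),    # defines __iter__, so it can be looped over
        "container": isinstance(obj, Container),  # defines __contains__ for `in` checks
        "sized": isinstance(obj, Sized),          # defines __len__, so len() works
    }


if __name__ == "__main__":
    for obj in ([1, 2, 3], "abc", {"a": 1}, range(3), iter([1, 2, 3]), 42):
        print(type(obj).__name__, behaviours(obj))
```

A list, a string, a dict and a range all report all three behaviours; an iterator over a list is iterable but neither a container nor sized; an integer has none of them. This is also the vocabulary the official documentation uses, for example when a function accepts “an iterable” rather than “a list”.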

“and how to read the python documentation” -> that is an important point you raise!!! I found that beginners have real trouble reading the docs because they are screening for words like “list” instead of “iterable”. However, as I teach abstract behaviors early, they actually understand the docs.

“like using .format() instead of fStrings” -> I mention .format() but mainly use f-strings and tell the students right away that they are both faster and easier to read and that they should default to them.
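A minimal side-by-side comparison of the two styles (f-strings require Python 3.6+); the values are taken from this thread purely for illustration.

```python
name, hours = "Alex", 25

with_format = "{} recorded {:.1f} hours of video.".format(name, hours)
with_fstring = f"{name} recorded {hours:.1f} hours of video."

print(with_fstring)  # Alex recorded 25.0 hours of video.
```

Both produce the same string and accept the same format specifications (here `.1f`); the f-string simply places each expression where its value appears.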

Wow this is really thorough and well thought out. It’s not just an “intro” but goes into all the language builtins, and common usage patterns (e.g. zip, map, etc), which many “intro-to” style courses skip over.
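For readers who have not met these built-ins yet, a minimal sketch of `zip`, `map` and `enumerate` working together; the student names and scores are made-up data.

```python
students = ["Ann", "Ben", "Cem"]  # hypothetical data
scores = [88, 92, 79]

# zip pairs items from several iterables, stopping at the shortest one.
pairs = list(zip(students, scores))

# map applies a function to each item lazily; list() materializes the results.
upper = list(map(str.upper, students))

# Sort the pairs by score, highest first, and keep only the names.
ranking = [name for name, _ in sorted(pairs, key=lambda pair: pair[1], reverse=True)]

for place, name in enumerate(ranking, start=1):
    print(place, name)  # 1 Ben, then 2 Ann, then 3 Cem
```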

Code in repo is MIT licensed and videos are tagged as Creative Commons (CC BY). Finally a good OER course for people to learn Python!

The Python community is basically one big OER course. Just look at all the recordings from the many PyCons.

I have a second half for the “book” planned but “unfortunately” also need to do some research for my PhD.

Maybe someone wants to provide some more advanced chapters via a pull request?

Can I suggest adjusting it for some other consumer platforms, for example Pythonista on the iPad? That would make it easier to spread in these turbulent times, when many people are stuck using somewhat unusual devices and platforms.

Is it really this easy to become a data scientist today? Can you go from 0 to 1 with online programming tutorials like the original post and fast.ai?

It depends on who wants to hire you. For the top jobs, this is for sure not enough.

I always tell people to please make sure they know all the contents from here https://www.youtube.com/watch?v=ZK3O402wf1c&list=PL49CF3715C… (Gilbert Strang’s Linear Algebra course) before they make any claims about being a data scientist. I bet I can teach a monkey to open a CSV in pandas and call .fit() in sklearn. But do people really understand the underlying assumptions? Most self-proclaimed data scientists don’t, I am sure.

I still have a hard time calling myself a data scientist. And I am three years into a relevant PhD. The more I study, the less I feel I truly know.

You still need to know SQL, and algorithms are helpful unless you expect to be spoon-fed pristine datasets that are ready for analysis.

The applicant pool seems to be full of those who took an online course and whose “personal” projects consist entirely of projects ripped from fast.ai or similar. These days, I seem to get more spam on LinkedIn from people looking to be hired as a data scientist at my company than I do from recruiters looking to hire me. And looking over their resumes, I can see why they need to hustle so much. Successful candidates need to know how to do more than classify pet images from the Oxford dataset.

So to answer your question…Maybe? I mean, these candidates are certainly trying much harder than I ever needed to for a job. But presumably they eventually get hired into roles with less discerning companies.

Agree. I actually found myself setting up a Postgres instance to hold the data and using pandas only via SQLAlchemy. Most of what pandas does can be achieved a lot more efficiently if done directly in a real database.
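A minimal sketch of pushing the work into the database: the grouping and summing happen in SQL, and only the small aggregated result comes back to Python. It uses the standard library's `sqlite3` with an in-memory database as a stand-in for the Postgres and SQLAlchemy setup described above (an assumption made so the example is self-contained); the `orders` table and its rows are made up.

```python
import sqlite3

# In-memory SQLite stands in for a Postgres instance.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("a", 10.0), ("b", 5.0), ("a", 7.5), ("c", 2.5), ("b", 1.0)],  # made-up rows
)

# The database does the grouping and aggregation; only three rows come back.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)  # [('a', 17.5), ('b', 6.0), ('c', 2.5)]
conn.close()
```

With pandas installed, the same query string can be handed to `pandas.read_sql` together with a database connection, so the DataFrame only ever holds the aggregated result.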

I also agree about the algorithms part. For my research, I look into vehicle routing problems a lot, and there is no sklearn or anything similar for that. Maybe an idea for a future project.
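As an illustration of the kind of algorithm meant here, a minimal sketch of the simplest construction heuristic for a single-vehicle routing problem: nearest neighbor, which always drives to the closest unvisited customer. It is a greedy heuristic with no optimality guarantee, not the method used in the research mentioned above; the coordinates and the helper names `nearest_neighbor_tour` and `route_length` are hypothetical.

```python
import math

Point = tuple[float, float]


def nearest_neighbor_tour(depot: Point, customers: list[Point]) -> list[Point]:
    """Greedy route: from the current stop, always go to the closest unvisited customer."""
    route = [depot]
    unvisited = list(customers)
    while unvisited:
        current = route[-1]
        nxt = min(unvisited, key=lambda p: math.dist(current, p))
        unvisited.remove(nxt)
        route.append(nxt)
    route.append(depot)  # return to the depot at the end
    return route


def route_length(route: list[Point]) -> float:
    """Total Euclidean length of consecutive legs."""
    return sum(math.dist(a, b) for a, b in zip(route, route[1:]))


if __name__ == "__main__":
    tour = nearest_neighbor_tour((0, 0), [(2, 0), (0, 1), (2, 1)])
    print(tour, route_length(tour))  # [(0, 0), (0, 1), (2, 1), (2, 0), (0, 0)] 6.0
```

Real routing problems add capacities, time windows and several vehicles, and heuristics like this one usually only provide a starting solution for improvement methods.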

An experienced data scientist with deep knowledge in the topics you mention easily replaces 10 of those “candidates”. I feel the best way to train more serious programmers and data scientists is to teach kids to program earlier in high school. Maybe make it mandatory just as math is today. Then, a lot more students may choose CS or math as a major in college.

I consider my course a thorough course. You really learn a lot of theory. This is usually what makes students at my university drop the course after two weeks of auditing. Just because the course assumes no prior CS knowledge (or any knowledge, really, except high school math) does not mean these topics don’t get introduced.

Oh I wasn’t suggesting anything against the quality or content of your course – perhaps I misunderstood the comment I replied to, but I took it as:

> High schoolers are really just doing some web course and then getting data scientist jobs?

so my response was:

> No – or at least this isn’t evidence of that – the author is a postgraduate student and the target audience appears to be other graduates/university students.

I hope there are no companies that hire a “data scientist” that only speaks HTML 🙂

The course aims at students of business administration. What do they usually do? Maximize some profit function. So, my background is really more traditional Operations Research.

That’s a really cool project! I will probably be recommending this to beginner Python programmers I know.

As a side note, I think sharing lessons as Jupyter notebooks is a really great idea for programming education. I would like to see more of this style of course for other programming languages as well.

I agree. Although at some point, any beginner must be exposed to the command line 🙂

My observation about myself when studying is this: nothing, no video course, beats a well-written book. And such books are often “slow” reads, as you have to think a lot while reading. I found Jupyter notebooks a good instrument for putting in a lot of information. My guess is that my course’s notebooks contain three times as much information as the 25 hours of video recordings.

Awesome work!

I can’t wait to test this on my non-programmer friends who want to learn how to code.
This seems to have a somewhat different approach to the usual Python tutorial, it might do the trick!

Thanks

I tried to put in everything I myself would have wanted to know when starting in 2014.

That is why I spend so much time on the memory diagrams in the videos in particular. It literally took me years to figure that out. I have watched a lot of online lectures (e.g., from OCW), but only rarely do instructors talk about that. Maybe formally studying CS would have done the trick for me as well 🙂
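The kind of situation memory diagrams make visible fits in a few lines: names are references to objects, so assigning one name to another does not copy anything. A minimal sketch; printing `id(a)` and `id(b)` would show the shared identity as well.

```python
a = [1, 2]
b = a              # a second name for the same list object; nothing is copied
b.append(3)
print(a)           # [1, 2, 3]: the change is visible through both names
print(a is b)      # True: one object, two references

c = list(a)        # a new list object holding the same items
print(a == c, a is c)  # True False: equal contents, different objects
c.append(4)
print(a, c)        # [1, 2, 3] [1, 2, 3, 4]
```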

It’s not necessarily covered in a traditional CS program, it really depends on the instructor.

This is a good resource.

Regarding the videos: they’re hard to watch because the text is very small and looks blurry, apparently due to compression/encoding issues. In some of the videos, the audio track also has a high-pitched hum in the background that makes it very hard to listen to.

If you still have access to the source materials, you should seriously consider cleaning up the sound and re-encoding the videos for better quality (as well as making the text bigger where possible).

Two videos have the hum. This cannot be changed because the lecture capture system in the lecture room does not allow it (it’s Panopto).

The quality is actually high enough. Maybe YouTube is currently lowering the quality because of Corona?

My plan is to incorporate the feedback everyone gives in the next weeks and redo the recordings in a studio at the end of summer for the next semester.

The videos are actually intended to only support the chapters in the Jupyter notebooks.

The funny thing is this: Once I started doing the memory diagrams in the lecture, the students’ work exhibited a lot more quality. Further, absolute beginners understand, for example, the concept of an iterator vs. iterable.
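A minimal sketch of the iterator vs. iterable distinction mentioned above: a list can hand out fresh iterators again and again, while an iterator remembers its position and is used up after one pass.

```python
numbers = [1, 2, 3]    # an iterable: it can hand out a fresh iterator whenever asked
it = iter(numbers)     # an iterator: it remembers its position and gets used up

first = next(it)       # 1
rest = list(it)        # [2, 3], the remaining items
again = list(it)       # [], the iterator is exhausted
fresh = list(numbers)  # [1, 2, 3], the list itself can be looped over again

print(first, rest, again, fresh)
print(iter(it) is it)       # True: iter() on an iterator returns the iterator itself
print(iter(numbers) is it)  # False: each iter() call on the list creates a new iterator
```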

BTW: chapters 9 and 10 will be released soon on GitHub.

> The quality is actually high enough. Maybe that is due as YouTube is currently lowering the quality due to Corona?

Maybe. Also, for newer videos YouTube takes a while to expose higher qualities. I see now that the videos have very few views and that only 720p is available. This could also be the reason.

> Two videos have the hum. This cannot be changed because the lecture capture system in the lecture room does not allow this (It’s Panopto).

You can probably get rid of the hum fairly easily with some basic filtering in a video/audio program (or even a Python library).
You might also be able to re-upload just the audio tracks. Just say if you want my help with this.
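As an illustration of such “basic filtering”, a minimal sketch of a second-order IIR notch filter that suppresses one narrow frequency band, written in plain Python so it needs no third-party packages. The coefficients follow the widely used RBJ “Audio EQ Cookbook” notch formulas; the hum frequency, sample rate and Q value below are assumptions (the actual hum frequency would have to be measured first), and SciPy offers a ready-made design in `scipy.signal.iirnotch`.

```python
import math


def notch_filter(
    samples: list[float], sample_rate: float, hum_hz: float, q: float = 30.0
) -> list[float]:
    """Attenuate a narrow band around hum_hz with a biquad notch (RBJ cookbook coefficients)."""
    w0 = 2 * math.pi * hum_hz / sample_rate
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    # Numerator (b) and denominator (a) coefficients, normalised by a0.
    a0 = 1 + alpha
    b0, b1, b2 = 1 / a0, -2 * cos_w0 / a0, 1 / a0
    a1, a2 = -2 * cos_w0 / a0, (1 - alpha) / a0

    out = []
    x1 = x2 = y1 = y2 = 0.0  # previous inputs and outputs (direct form I)
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def rms(values: list[float]) -> float:
    """Root mean square, a simple loudness measure."""
    return math.sqrt(sum(v * v for v in values) / len(values))


if __name__ == "__main__":
    rate, hum = 8000, 1000.0  # assumed sample rate and hum frequency
    t = [n / rate for n in range(rate)]  # one second of samples
    hum_only = [math.sin(2 * math.pi * hum * s) for s in t]
    tone = [math.sin(2 * math.pi * 200 * s) for s in t]  # stands in for useful audio
    half = rate // 2  # skip the filter's start-up transient
    print(rms(notch_filter(hum_only, rate, hum)[half:]))  # close to 0
    print(rms(notch_filter(tone, rate, hum)[half:]))      # close to 0.707, largely preserved
```

Applying this to a recording would first require decoding the audio track into samples and writing the filtered samples back, which is left out here.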

> The funny thing is this: Once I started doing the memory diagrams in the lecture, the students’ work exhibited a lot more quality. Further, absolute beginners understand, for example, the concept of an iterator vs. iterable.

I’m not surprised. I’ve been saying for a while now, that learning programming is analogous to learning how to play an instrument (e.g. the piano) where you need to learn the right techniques and concepts early on if you want to be able to become highly skilled later on. Otherwise it’s unlearning and relearning a bunch of stuff, which is less than optimal.
Thank you for validating this idea of mine, somewhat.

The videos are actually available in 1080p.

If you can redo the audio and tell me how I can do that in YouTube without re-uploading the video (it’s embedded, for example, in the university’s Moodle), I will gladly do that. If not, as I said, I’ll do proper recordings once I have more time.
I did up to 4x 2.5 hour videos a day last week to get the lectures up on time. You can actually hear that in my voice sometimes.

I absolutely agree with your approach to learning.

Can anyone recommend a good document camera / visualizer that works on Ubuntu Linux? The reason I had to use the lecture capture system in the lecture rooms is mainly because I don’t own a document cam. Ideally, the cam would also work with Zoom. Many thanks in advance.

Is anyone aware of a repo with an introduction to Angular (2+) that’s a similar style and equally high quality?

Why does Chapter 6 have no exercises? (It’s not a complaint. Just asking in case you forgot to link them.)

They should be ready tomorrow evening. Within the next two weeks, a missing chapter 9 on arrays will also be there.
I spent around 50 hours doing the videos in the last couple of days while also holding remote office hours for the 80 students in my class at the university. Crazy situation at the moment, putting everything online due to Corona. We even had an infected student on campus, luckily a mild case.

Didn’t think anyone would comment here as there are so many Python intros out there 🙂
