The Top 5 Most Forked GitHub Repos: 2018

It’s almost 2019, and what a year it’s been.  One year ago today, I was struggling to get through the final few days before winter break as a tenth grade math teacher, and struggling to convince my advisees to get their college apps in and their college visits completed.  Today, I’m sitting comfortably on the other side of Fullstack Academy’s software engineering immersive and spending my week interviewing and completing coding challenges….whew!

As a new software engineer with some time on my hands, I found myself wondering about the most popular GitHub repos OF ALL TIME.  I’ve certainly forked a fair few in my day, but what are the best ones?  The most popular ones?  Well, here you go.  In this post, I’ll list the repos, and then try to explain and/or uncover why they’re so popular.

To find the top 5 most forked repos, you can search GitHub for ‘stars:>1’ and then sort by ‘Most forks’ on the right-hand side of the page.
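If you’d rather script that search than click around, here’s a rough sketch using GitHub’s public search API.  The function names (mostForkedUrl, fetchMostForked) are made up for this example, and it assumes an environment with a global fetch (a browser or a recent Node).  Unauthenticated requests are rate limited, and the counts you get back will be different from the ones in this post.

```javascript
// Builds a GitHub Search API URL that mirrors the manual search:
// query 'stars:>1', sorted by forks, highest first.
// (mostForkedUrl is a hypothetical helper name, not part of any library.)
function mostForkedUrl(perPage = 5) {
  const params = new URLSearchParams({
    q: 'stars:>1',
    sort: 'forks',
    order: 'desc',
    per_page: String(perPage),
  });
  return `https://api.github.com/search/repositories?${params}`;
}

// Fetches the results and keeps only the fields this post talks about.
// Assumes a global fetch is available.
async function fetchMostForked(perPage = 5) {
  const res = await fetch(mostForkedUrl(perPage), {
    headers: { Accept: 'application/vnd.github+json' },
  });
  if (!res.ok) {
    throw new Error(`GitHub API responded with status ${res.status}`);
  }
  const { items } = await res.json();
  return items.map((repo) => ({
    name: repo.full_name,
    forks: repo.forks_count,
    stars: repo.stargazers_count,
  }));
}

// Usage (makes a network request, so it is left commented out):
// fetchMostForked().then((repos) => console.table(repos));
```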

1. datasharing by jtleek

Wouldn’t it be great to be able to list the most forked repo of all time on your resume?  This one is interesting, because it’s just a README!  Like, a really good one, but still.  As of this post, it has 191,505 forks and 4,580 stars.  Issue #217 on the repo asks why it’s the most forked of all time, and the top answer says that it’s a required fork of a Coursera course that focuses on GitHub.  I asked a world-renowned data scientist (my husband) about this repo, and here was his response:

[Image: screenshot of our text exchange]
We were texting from 2 rooms away.

2. ProgrammingAssignment2 by rdpeng

This is an assignment for a Coursera course in R that asks students to cache some computations.  Is Coursera advertising this stuff somewhere?  That the top 2 GitHub repos of all time are forked by their students?  As of this post, this repo has 114,434 forks and 567 stars.  Questions raised by this repo: Where is ProgrammingAssignment1?  Does it not require a fork?

3. Spoon-Knife by octocat

This repo is an example of how to fork a repo!  Makes sense that it’s been forked a fair few times.  Also…get the name?  GET IT?  As of this post, this repo has 101,816 forks and 10,079 stars.

4. tensorflow by tensorflow

Now we’re getting to the good stuff!  From the README:

TensorFlow is an open source software library for numerical computation using data flow graphs. The graph nodes represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) that flow between them. This flexible architecture enables you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device without rewriting code. TensorFlow also includes TensorBoard, a data visualization toolkit.

This repo was created by the Google Brain team, and is used and interacted with in Python, although it performs a lot of the work using C++.  TensorFlow essentially makes machine learning easier to work with by abstracting away a lot of the messy details.  It allows a developer to focus on what they want to happen and what the logic should look like, rather than fussing around with the specific inputs and outputs of various functions that they need to string together.  Instead of learning how to create my own deep neural network from scratch, I can use TensorFlow.  It was first released by Google as an open source project about 3 years ago (November 2015).  It’s cool.  Five stars.  Here’s a video of a talk called Effective TensorFlow for Non-Experts, originally presented at Google I/O 2017, that does a good job of explaining its purpose and functionality.

As of this post, TensorFlow has 70,708 forks and 117,010 stars (and is the first repo on the list with more stars than forks – you go, Glen Coco!)

5. bootstrap by twbs

There’s something oddly pleasing to me that the #4 most forked repo is a deep, data-heavy, backend kinda tool whereas the #5 repo is a frontend framework.  It’s Bootstrap!  Bootstrap was created by Twitter developers and first released in 2011.  I’ve never actually used Bootstrap, being more of a Semantic UI kinda gal myself, but I remember hearing it thrown around all the time and eventually googling it to see what the deal was.  Essentially, Bootstrap has a ton of built-in CSS and JS that allows you to make your site look pretty without having to reinvent the wheel every time.  If you’ve used something like Semantic UI or Material UI, you’re probably familiar with the general concept of a frontend framework.

As of this post, Bootstrap has 63,605 forks and 129,469 stars (and one of the stars is from me!)

So there you go!  The top 5 most forked repos.  I think it’s fascinating to peek into what other developers are forking, and I’m already considering a future post to delve into the top 10 most forked repos (potentially ignoring any repos that are course-specific assignments…)

What’s the most useful repo you’ve forked?

Floating Point Numbers (& Mario)

When I first came across the term floating point, it was as a simple definition.  This one comes straight from JavaScript & JQuery by Jon Duckett:

floating point number is a real number that uses decimals to represent a fraction.  The term floating point refers to the decimal point.

When reading this, my math-teacher-brain said, “Okay, so…a decimal.  Cool.”  But a few months later, I ran into a StackOverflow question that challenged my simple categorization of floating point numbers.  The heart of the question was this image:

[Image: screenshot from the StackOverflow question]
The accompanying text was (and I’m paraphrasing slightly here), “WTF, JS?” 

At this point, being knee-deep in a capstone project, I added ‘JS floating point weirdness’ to my list of potential future blog posts and carried on with my day.  But now here I am, with much more time on my hands, and a quest to explore this strange JavaScript behavior.

The original StackOverflow question had an excellent response that explained a lot of the logistical reasons for these odd outputs.  I recommend reading the top answer there in its original form, but to paraphrase: computers have to store numbers, and they have to strike a balance between precision and space.  If I am 1/3″ taller than 5’9″ and I want to save my height as a variable, it’s unreasonable (and impossible) for the computer to save that endlessly repeating 0.333… decimal, because it doesn’t really matter and would take up A TON of (well…infinite) space.  Enter floating point to strike a compromise.

The next bit of this explanation comes thanks to The Floating Point Guide.  A floating point number is made up of two parts.  The significand stores the number’s digits and can be positive or negative.  The exponent says where the decimal point is placed in relation to the significand.  If you think back to learning about scientific notation in high school chemistry, you’re on the right track.  Here’s a handy diagram, again from The Floating Point Guide:

[Image: diagram from The Floating Point Guide]
If this is all making sense to you, send a thank-you email to your 10th grade science teacher!
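To make the significand-and-exponent idea concrete, here’s a tiny sketch that splits a number into those two parts using plain old base-10 scientific notation.  The helper name toScientific is made up for this example.  It leans on JavaScript’s built-in toExponential, and it’s only meant to illustrate the idea, not how the computer actually stores anything.

```javascript
// Splits a number into a base-10 significand and exponent,
// e.g. 12345.678 is 1.2345678 x 10^4.
// (toScientific is a hypothetical helper name for illustration.)
function toScientific(x) {
  // toExponential() returns a string like '1.2345678e+4'
  const [significand, exponent] = x.toExponential().split('e');
  return {
    significand: Number(significand),
    exponent: Number(exponent),
  };
}

console.log(toScientific(12345.678)); // { significand: 1.2345678, exponent: 4 }
console.log(toScientific(-0.00042));  // { significand: -4.2, exponent: -4 }
```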

There’s also a whole standard called IEEE 754 that dictates exactly how all of this goes down, which you are free to dive into.  After reading all about scientific notation, I still wanted to know EXACTLY WHY I wasn’t getting 0.3 when I typed in 0.1 + 0.2.  WHY?
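If you want to see the weirdness for yourself, here’s a minimal sketch you can paste into a console.  The nearlyEqual helper is my own made-up name, and its tolerance is just one reasonable choice (assuming your numbers aren’t wildly different in size), not an official fix.

```javascript
// The classic surprise: the sum is not exactly 0.3.
const sum = 0.1 + 0.2;
console.log(sum);          // 0.30000000000000004
console.log(sum === 0.3);  // false

// Printing extra digits shows that 0.1 was never stored exactly.
console.log((0.1).toFixed(20)); // 0.10000000000000000555

// A common workaround: compare within a small tolerance instead of using ===.
// (nearlyEqual is a hypothetical helper; the tolerance is an assumption.)
function nearlyEqual(a, b, tolerance = Number.EPSILON) {
  const scale = Math.max(1, Math.abs(a), Math.abs(b));
  return Math.abs(a - b) <= tolerance * scale;
}

console.log(nearlyEqual(sum, 0.3)); // true
```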

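Well….floating point numbers are just a variation of scientific notation.  Essentially, they use base 2 instead of base 10.  And just like 1/3 can’t be written out exactly in base 10, 0.1 and 0.2 can’t be written out exactly in base 2, so each one gets rounded a tiny bit before the addition even happens.  In the single-precision format, they’re stored as 32 bits: the first bit is for the sign, the next 8 bits are for the exponent, and the remaining 23 bits (called the mantissa…nerdy baby name anyone?) are for the significant digits of the number.  (JavaScript numbers actually use the 64-bit double-precision version of the same idea, with 1 sign bit, 11 exponent bits, and 52 mantissa bits.)  Here’s a visual: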

[Image: the 32 bits of a float, split into sign, exponent, and mantissa]

There’s an equation to calculate the value of the number given the sign bit, the mantissa, and the exponent, but I’ll spare you.
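For anyone who doesn’t want to be spared, here’s a sketch that pulls apart the 32 bits described above and puts them back together.  JavaScript’s own numbers are 64-bit, so this uses a DataView to get at the 32-bit version.  The function names are made up for this example, and the formula here only covers “normal” numbers (not zero, infinity, NaN, or the very tiniest values).

```javascript
// Splits a number (rounded to 32-bit float precision) into its three fields.
// (float32Parts and float32Value are hypothetical helper names.)
function float32Parts(x) {
  const view = new DataView(new ArrayBuffer(4));
  view.setFloat32(0, x);            // store as a 32-bit float
  const bits = view.getUint32(0);   // read the same 4 bytes as an integer
  return {
    sign: bits >>> 31,               // 1 bit
    exponent: (bits >>> 23) & 0xff,  // next 8 bits
    mantissa: bits & 0x7fffff,       // last 23 bits
  };
}

// Rebuilds the value from the fields, for normal numbers only:
// value = (+/-1) * (1 + mantissa / 2^23) * 2^(exponent - 127)
function float32Value({ sign, exponent, mantissa }) {
  const signFactor = sign === 1 ? -1 : 1;
  return signFactor * (1 + mantissa / 2 ** 23) * 2 ** (exponent - 127);
}

const parts = float32Parts(0.1);
console.log(parts); // { sign: 0, exponent: 123, mantissa: 5033165 }
console.log(float32Value(parts) === Math.fround(0.1)); // true
```

Math.fround(0.1) is the closest 32-bit float to 0.1, so the last line confirms that the three fields really do describe the stored value.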

Getting bored?  Cool.  Let’s talk Mario!  Seriously.

Super Mario 64 has a crazy glitch that is CAUSED BY FLOATING POINT NUMBERS BEING CRAZY.  Up above, we described the floating point system as a compromise: I want to be able to store a ton of numbers but not use up a ton of space.  One of the weird implications, though, is that the numbers you can store are not evenly spaced across a number line.  As it turns out, I can represent a ton of numbers that are close to 0, but fewer and fewer as we move out towards infinity.


Between each power of 2 and the next, there is an equal number of possible float values, which means that as you climb the powers of 2, you also drastically (exponentially!) increase the distance between float values.  Cool.  For small-ish numbers, this really doesn’t matter much.  If Mario’s coordinates are represented as X and Y where both are floating point numbers, we can just round to the nearest one.  If Mario is very close to (0, 0), the rounding is going to be teeny tiny and completely unnoticeable.  As he gets farther away, the game might look a bit jumpy as his X coordinate stays the same…then suddenly rounds up to the next floating point value.  At some point, he can’t move anymore!  The distance between floating point numbers is so great that poor Mario is stuck.
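Here’s a small sketch of that growing gap, assuming 32-bit floats.  The helper names, the coordinates, and the step size are all made up for illustration and aren’t taken from the actual game.  It only handles positive, finite numbers.

```javascript
// Returns the next 32-bit float above x (positive, finite x only).
// (nextFloat32, gapAt, and moveRight are hypothetical helper names.)
function nextFloat32(x) {
  const view = new DataView(new ArrayBuffer(4));
  view.setFloat32(0, x);
  view.setUint32(0, view.getUint32(0) + 1); // bump the bit pattern by one
  return view.getFloat32(0);
}

// Distance from x (rounded to a 32-bit float) to the next float above it.
function gapAt(x) {
  const rounded = Math.fround(x);
  return nextFloat32(rounded) - rounded;
}

console.log(gapAt(1));        // 1.1920928955078125e-7
console.log(gapAt(1024));     // 0.0001220703125
console.log(gapAt(16777216)); // 2

// A made-up "move right by half a unit" step, stored as a 32-bit float.
function moveRight(x, step = 0.5) {
  return Math.fround(x + step);
}

console.log(moveRight(10));       // 10.5
console.log(moveRight(16777216)); // 16777216 (stuck!)
```

Near 1, the floats are about a ten-millionth apart.  Out at 16,777,216 they’re 2 apart, so a half-unit step rounds right back to where it started.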

For a truly awesome explanation of floating point math and examples of what this looks like in gameplay, check out this video by UncommentatedPannen.

[Image: Mario]
Happy coding, from me and Mario!