The Top 5 Most Forked GitHub Repos: 2018

It’s almost 2019, and what a year it’s been.  One year ago today, I was struggling to get through the final few days before winter break as a tenth grade math teacher, and struggling to convince my advisees to get their college apps in and their college visits completed.  Today, I’m sitting comfortably on the other side of Fullstack Academy’s software engineering immersive and spending my week interviewing and completing coding challenges… whew!

As a new software engineer with some time on my hands, I found myself wondering about the most popular GitHub repos OF ALL TIME.  I’ve certainly forked a fair few in my day, but what are the best ones?  The most popular ones?  Well, here you go.  In this post, I’ll list the repos, and then try to explain and/or uncover why they’re so popular.

[Screenshot]
To find the top 5 most forked repos, you can search GitHub for ‘stars:>1’ and then sort by ‘Most forks’ on the right-hand side of the page.
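
The same search can probably be scripted instead of clicked through. Here is a minimal sketch that builds the equivalent request URL, assuming GitHub's public REST search endpoint (/search/repositories) rather than the website search box; the function name buildMostForkedUrl is made up for this example.

```javascript
// Build a GitHub REST API search URL for the most-forked repositories.
// Assumption: this targets the public /search/repositories endpoint,
// not the website search page described above.
function buildMostForkedUrl(count) {
  const params = new URLSearchParams({
    q: 'stars:>1',           // the same query typed into the search box
    sort: 'forks',           // rank results by fork count
    order: 'desc',           // largest first
    per_page: String(count), // how many repos to return
  });
  return `https://api.github.com/search/repositories?${params}`;
}

// buildMostForkedUrl is a hypothetical helper, not part of any library.
const mostForkedUrl = buildMostForkedUrl(5);
console.log(mostForkedUrl);
```

Passing that URL to fetch and reading the items array from the JSON response should give each repo's full_name and forks_count. Unauthenticated requests are rate limited, so this is best for occasional curiosity rather than heavy use.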

1. datasharing by jtleek

Wouldn’t it be great to be able to list the most forked repo of all time on your resume?  This one is interesting, because it’s just a README!  Like, a really good one, but still.  As of this post, it has 191,505 forks and 4,580 stars.  Issue #217 on the repo asks why it’s the most forked of all time, and the top answer says that it’s a required fork of a Coursera course that focuses on GitHub.  I asked a world-renowned data scientist (my husband) about this repo, and here was his response:

[Screenshot of his text message reply]
We were texting from 2 rooms away.

2. ProgrammingAssignment2 by rdpeng

This is an assignment for a Coursera course in R that asks students to cache some computations.  Is Coursera advertising this stuff somewhere?  That the top 2 GitHub repos of all time are forked by their students?  As of this post, this repo has 114,434 forks and 567 stars.  Questions raised by this repo: Where is ProgrammingAssignment1?  Does it not require a fork?

3. Spoon-Knife by octocat

This repo is an example of how to fork a repo!  Makes sense that it’s been forked a fair few times.  Also…get the name?  GET IT?  As of this post, this repo has 101,816 forks and 10,079 stars.

4. tensorflow by tensorflow

Now we’re getting to the good stuff!  From the README:

TensorFlow is an open source software library for numerical computation using data flow graphs. The graph nodes represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) that flow between them. This flexible architecture enables you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device without rewriting code. TensorFlow also includes TensorBoard, a data visualization toolkit.

This repo was created by the Google Brain team, and is used and interacted with in Python, although it performs a lot of the work using C++.  TensorFlow essentially makes machine learning easier to work with by abstracting away a lot of the messy details.  It allows a developer to focus on what they want to happen and what the logic should look like, rather than fussing around with the specific inputs and outputs of various functions that they need to string together.  Instead of learning how to create my own deep neural network from scratch, I can use TensorFlow.  It was first released by Google as an open source project about 3 years ago (November 2015).  It’s cool.  Five stars.  Here’s a video of a talk called Effective TensorFlow for Non-Experts, originally presented at Google I/O 2017, that does a good job of explaining its purpose and functionality.

As of this post, TensorFlow has 70,708 forks and 117,010 stars (and is the first repo on the list with more stars than forks – you go, Glen Coco!)

5. bootstrap by twbs

There’s something oddly pleasing to me that the #4 most forked repo was a deep, data-heavy, backend kinda tool whereas the #5 repo is a frontend framework.  It’s Bootstrap!  Bootstrap was created by Twitter developers and first released in 2011.  I’ve never actually used Bootstrap, being more of a Semantic UI kinda gal myself, but I remember hearing it thrown around all the time and eventually googling it to see what the deal was.  Essentially, Bootstrap has a ton of built-in CSS and JS that allows you to make your site look pretty without having to reinvent the wheel every time.  If you’ve used something like Semantic UI or Material UI, you’re probably familiar with the general concept of a frontend framework.

As of this post, Bootstrap has 63,605 forks and 129,469 stars (and one of the stars is from me!)

So there you go!  The top 5 most forked repos.  I think it’s fascinating to peek into what other developers are forking, and I’m already considering a future post to delve into the top 10 most forked repos (potentially ignoring any repos that are course-specific assignments…)

What’s the most useful repo you’ve forked?

The Mysterious Magic of Webpack

This week we finished up our e-commerce group projects on Wednesday and then started a short solo hackathon sprint.  I decided to explore a machine learning API called Clarifai to build an app that can recognize houses, faces, and houses that look like they have faces.  It’s not quite done yet, so more on that later.

I started this project using boilerplate code, and I realized that there was a lot of magic going on behind the scenes that I didn’t fully understand.  Generally, I knew what all of the different libraries were doing (Travis was doing something with testing before deploy, webpack was bundling code, etc.).  I’ve decided to investigate each of these libraries a bit more deeply, because I want to be able to provide more eloquent explanations of the tools I’m using to support my code.

First up: the mysterious Webpack.

After reading through the documentation, I’m using a metaphor of a vacation packing list to help me understand what webpack is doing with my JavaScript files.

  • Webpack is a static module bundler.  Modules are chunks of functionality.  Imagine that my packing list is on my computer and I’m using links to other, smaller packing lists that live elsewhere.  Let’s say that my Clothes packing list module includes a link to a Beachwear list, some items I’ve just typed in, and another link to a Hiking list.  The bundle created would include all of the other lists referenced, which are called dependencies.  Bundling allows the browser to load fewer chunks of information in order to display your app – instead of having to click through to see all of these packing lists, it just gets one master list to display.
  • The webpack entry point is like saying, ‘Hey application!  I’m gonna give you some bundled instructions, and to do them, you gotta start at the Clothes part of the list.’  Webpack defaults to using ./src/index.js as its entry point, just like I default to starting by packing clothes first.  Both of us are flexible if given other specific instructions.  Maybe you’d prefer to start packing your makeup first.
  • The webpack output is like saying, ‘Hey application!  I made you all these nice instructions on how to pack for your trip, and I’m putting them all together in a list named [something].bundle.js.’  Webpack defaults to ./dist/main.js, but again, that can be changed in the config file.
  • Loaders allow webpack to interpret files it’s not typically familiar with.  Without loaders, webpack can handle JavaScript and JSON.  Loaders are like giving webpack the tools to read some of the packing list that’s in French without freaking out.  It can still include those files in its master packing list file.

There are other components of webpack that I haven’t touched on here, but going through the basics and comparing them to my imperfect-but-helpful packing example definitely gives me a clearer picture of what webpack is doing with my application.  It’s essentially a streamlining tool that takes a whole mess of files that all reference each other and creates a simpler to-do list, with fewer layers, for the browser to work through.
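
The packing-list metaphor can itself be sketched in a few lines. This toy model only illustrates the idea of following links from an entry point and flattening everything into one master list; it is not how webpack is actually implemented, and the item names are invented for the example.

```javascript
// Toy model of bundling: each "module" is a packing list that may link to others.
// Illustration only; webpack's real dependency graph is far more involved.
// The item names (socks, boots, etc.) are hypothetical.
const lists = {
  Clothes: { items: ['socks', 't-shirts'], links: ['Beachwear', 'Hiking'] },
  Beachwear: { items: ['swimsuit', 'sunscreen'], links: [] },
  Hiking: { items: ['boots'], links: [] },
};

// Start at the entry list and follow every link once, collecting items.
function bundle(entry, seen = new Set()) {
  if (seen.has(entry)) return []; // already packed; also guards against cycles
  seen.add(entry);
  const { items, links } = lists[entry];
  // Linked lists (the dependencies) are gathered before the list's own items.
  const linked = links.flatMap((name) => bundle(name, seen));
  return [...linked, ...items];
}

const masterList = bundle('Clothes');
console.log(masterList);
```

Starting from Clothes plays the role of the entry point, and masterList plays the role of the single bundle the browser receives.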