Creating a Seed File with Neo4j

Neo4j is a graph database that differs from a more traditional relational database (like a SQL database, for example) in the way that it stores relationships between nodes.  While a relational database can hold foreign keys that reference other model instances, Neo4j has edges that connect nodes, and these edges are POWERFUL.  An edge isn’t just a blank connection: it can hold information about the connection, and it has a direction.  If I follow you on Twitter, there might be an edge connecting our nodes that says how long I’ve been following you (I have no idea how Twitter works, is that obvious?).  If I’m following you and you’re not following me, a single edge would point from my node to yours; if we’re mutually following one another, there would be an edge in each direction.  Queries are also written differently, in a language called Cypher.

Just as thinking about database relationships in graph form involves a bit of a paradigm shift, thinking about seeding data requires similar flexibility.

A bare-bones PostgreSQL seed file may have a function like this:

Screen Shot 2018-11-16 at 10.10.30 AM

I’ve imported my database (db) where I defined all of my models, and I’m mapping over a campus array and a student array.  Both of these arrays contain instances that match up to the way I set up the models.
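For readers who can't see the screenshot, here is a rough sketch of what a seed function along those lines might look like, assuming Sequelize-style models.  The parameter names (`db`, `Campus`, `Student`, `campuses`, `students`) are stand-ins chosen for illustration, and the models are passed in as arguments only so the example runs on its own.

```javascript
// Sketch of a Sequelize-style seed function. `db`, `Campus`, and `Student`
// are passed in so the example stays self-contained; in a real project they
// would be imported from the module where the models are defined.
async function seed(db, Campus, Student, campuses, students) {
  // Drop and recreate every table so seeding starts from a clean slate.
  await db.sync({ force: true });

  // Create one row per object in each array.
  const createdCampuses = await Promise.all(
    campuses.map(campus => Campus.create(campus))
  );
  const createdStudents = await Promise.all(
    students.map(student => Student.create(student))
  );

  console.log(
    `seeded ${createdCampuses.length} campuses and ${createdStudents.length} students`
  );
  return { createdCampuses, createdStudents };
}
```

The objects in each array need to have the same fields as the model they are fed into, which is why the arrays and the model definitions have to be kept in step.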

Neo4j seeding is a similar idea, but different in actual implementation.  The similarity is in what you’re feeding into your seed file.  It’s still expecting some data structure that contains a bunch of future-nodes – the things you’re putting in your database.  For this particular project, I was seeding my database with recipes and ingredients.

Here’s what one recipe looks like.  The recipes were generated as a result of scraping a website, so they were kind of wonky JSON objects:

Screen Shot 2018-11-16 at 10.15.02 AM

Okay, onto the actual seeding.  The function that is actually invoked within this seed file is called runSeed.  It’s essentially a helper function with some helpful console logs and 2 asynchronous functions, recipeSeeder and seed, nested inside of a try/catch.

Screen Shot 2018-11-16 at 10.17.16 AM
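In case the screenshot doesn't come through, a wrapper of that shape could be sketched roughly like this.  The log messages are made up, and the two seeders are passed in as arguments here (an assumption made so the snippet runs by itself) rather than being defined in the same file.

```javascript
// Sketch of a runSeed-style wrapper: some console logs around two async
// seeders, all inside a try/catch. The seeders are parameters here so the
// example is self-contained.
async function runSeed(recipeSeeder, seed) {
  console.log('seeding...');
  try {
    await recipeSeeder(); // Neo4j side: recipes and ingredients
    await seed();         // relational side
    console.log('seeding complete');
  } catch (err) {
    // If either seeder rejects, we land here and the rest is skipped.
    console.error('seeding failed:', err.message);
  }
}
```

Because both calls sit inside the same try block, a failure in the first seeder means the second one never runs.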

The seed function being invoked within runSeed is just like the PostgreSQL campus/student seed file discussed above.  For this project, we used the Neo4j database for nearly everything, but we kept our users in a relational database because our boilerplate code already included login/sign-up functionality that relied on it.  Why fix what ain’t broke?  We kept it.

The recipeSeeder function is what we wrote to create the rest.  It does 4 major tasks:

  1. It finds all nodes currently in the database, and deletes them.  This is the equivalent of running db.sync({force: true}).  We get a clean slate to start seeding.
  2. It makes assertions for the nodes we’re about to create.  We asserted that the recipe names were unique and that the ingredient names were unique.
  3. It maps over all the recipes in the data it receives as an argument and runs a Cypher query to create a new Recipe node if it doesn’t exist already.
  4. It maps over each recipe’s ingredients and runs another query to create a new Ingredient node if it doesn’t exist already.

Here’s the function as a whole:

Screen Shot 2018-11-16 at 10.36.11 AM

I thought about deleting the comments before adding this picture, but I added them to explain to my group members what was happening and I think that they’re helpful.  Same with my linter’s yellow squiggles – this is in-progress code here!

Lines 37 – 43 are doing the housekeeping: clean up all prior nodes, and make sure Neo4j knows we don’t want any duplicates.
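As a rough idea of what that housekeeping can look like in Cypher, here is a sketch.  The constraint statements use the older ASSERT syntax from Neo4j 3.x (newer versions write this as CREATE CONSTRAINT ... FOR ... REQUIRE), and `resetDatabase` and `housekeepingQueries` are names invented for this example.  The `session` argument is assumed to be anything with a `run(query)` method, like a session from the official neo4j-driver package.

```javascript
// Cypher for the housekeeping step: wipe every node (and its relationships),
// then declare that Recipe and Ingredient names must be unique.
const housekeepingQueries = [
  'MATCH (n) DETACH DELETE n',
  'CREATE CONSTRAINT ON (r:Recipe) ASSERT r.name IS UNIQUE',
  'CREATE CONSTRAINT ON (i:Ingredient) ASSERT i.name IS UNIQUE',
];

// `session` is assumed to expose run(query) and return a promise.
async function resetDatabase(session) {
  for (const query of housekeepingQueries) {
    await session.run(query); // one at a time, in order
  }
}
```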

Lines 46 – 57 are creating the Recipe nodes.  The Cypher query is in yellow.  MERGE finds an existing node or creates a new node with the parameters included in the braces, and it returns the node it finds or creates.  You could use template literals within the query to have the query interpret variables, but the structure used here is more secure and protects against injection.
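To make the injection point concrete, here is a small sketch of the parameterized approach.  `buildRecipeMerge` is a made-up helper, and the `$name` placeholder syntax is an assumption about how the query was written.  The idea is that the query text never changes, and the recipe name travels separately.

```javascript
// Build the MERGE query for one recipe. The query text is fixed; the recipe
// name is sent separately as a parameter, so it is never spliced into the
// Cypher string.
function buildRecipeMerge(recipe) {
  return {
    text: 'MERGE (r:Recipe {name: $name}) RETURN r',
    params: { name: recipe.name },
  };
}

// With a driver session, this pair would be passed as session.run(text, params).
const { text, params } = buildRecipeMerge({ name: 'Grandma\'s "Best" Chili' });
console.log(text); // MERGE (r:Recipe {name: $name}) RETURN r
console.log(params.name);
```

With a template literal, the quotes in that recipe name would end up inside the query itself, which is exactly where things break (or get exploited).  With parameters, the name is only ever treated as a value.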

Lines 58 – 78 are creating the Ingredient nodes.  We iterate through the ingredients on the recipe and run another MERGE query, then we build the relationships.  MATCH looks for a Recipe node and an Ingredient node, and the WHERE clause specifies what it should be looking for.  Here, we want the name of the Recipe to be our current recipe and the name of the Ingredient to be our current ingredient.  Then, we use MERGE again to check if there’s already a relationship.  If there’s not, MERGE builds it.  We decided to place 2 pieces of information on the edge/relationship created: quantity and type.  If my recipe calls for 2 tablespoons of honey, quantity would be 2 and type would be tablespoons.  This will allow us to implement sliders for increasing recipe quantities later on.
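Here is a sketch of the two queries that step describes.  The relationship type `HAS_INGREDIENT`, the helper name, and the shape of the ingredient object (`name`, `quantity`, `type`) are all assumptions for illustration; the scraped data in the original project may have looked different.

```javascript
// Build the two queries for one ingredient of one recipe:
// 1) MERGE the Ingredient node, 2) MATCH both nodes and MERGE the edge.
function buildIngredientQueries(recipeName, ingredient) {
  const mergeIngredient = {
    text: 'MERGE (i:Ingredient {name: $name}) RETURN i',
    params: { name: ingredient.name },
  };

  const mergeRelationship = {
    text: [
      'MATCH (r:Recipe), (i:Ingredient)',
      'WHERE r.name = $recipeName AND i.name = $ingredientName',
      // HAS_INGREDIENT is a hypothetical relationship type.
      'MERGE (r)-[rel:HAS_INGREDIENT {quantity: $quantity, type: $type}]->(i)',
      'RETURN rel',
    ].join('\n'),
    params: {
      recipeName,
      ingredientName: ingredient.name,
      quantity: ingredient.quantity,
      type: ingredient.type,
    },
  };

  return [mergeIngredient, mergeRelationship];
}

// 2 tablespoons of honey, as in the example above.
const queries = buildIngredientQueries('Honey Cake', {
  name: 'honey',
  quantity: 2,
  type: 'tablespoons',
});
console.log(queries[1].text);
```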

Lines 79 – 83 are more housekeeping: close the session, and close the driver.  Tie up everything with a nice, neat little bow.  Seeding is done!
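One way to make sure that cleanup always happens, even when a query fails partway through, is to put it in a finally block.  This is a sketch of that idea rather than what the original function does; `withSession` is a hypothetical helper, and `driver` is assumed to behave like a neo4j-driver driver (a `session()` method, plus `close()` on both the session and the driver).

```javascript
// Run some seeding work with a fresh session, and always close the session
// and the driver afterwards, whether the work succeeded or threw.
async function withSession(driver, work) {
  const session = driver.session();
  try {
    return await work(session);
  } finally {
    await session.close();
    await driver.close();
  }
}
```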

Of course, errors are possible.  Before we used MERGE we were using CREATE (because that is a logical word to use when you’re trying to create nodes), but because of our uniqueness constraints, Neo4j threw an error whenever we tried to create a node that already existed.  MERGE solved that problem for us.  Another quirk is that you’ll get an error if you try to seed the database when your connection isn’t open – makes sense, but it always takes me by surprise.  In our setup, that meant going into the Neo4j browser and connecting before running any queries.

So that’s it!  I’ve been using Neo4j for a grand total of 3.5 days at this point, so I’m sure I will look back on this article in a few weeks and groan, but I think it’s beneficial to chronicle my experiences with new technologies nonetheless.

 

The Mysterious Magic of Webpack

This week we finished up our e-commerce group projects on Wednesday and then started a short solo hackathon sprint.  I decided to explore a machine learning API called Clarifai to build an app that can recognize houses, faces, and houses that look like they have faces.  It’s not quite done yet, so more on that later.

I started this project using boilerplate code, and I realized that there was a lot of magic going on behind the scenes that I didn’t fully understand.  Generally, I knew what all of the different libraries were doing (Travis was doing something with testing before deploy, webpack was bundling code, etc.).  I’ve decided to investigate each of the libraries I came across a bit more deeply, because I want to be able to give more eloquent explanations of the tools I’m using to support my code.

First up: the mysterious Webpack.

After reading through the documentation, I’m using a metaphor of a vacation packing list to help me understand what webpack is doing with my JavaScript files.

  • Webpack is a static module bundler.  Modules are chunks of functionality.  Imagine that my packing list is on my computer and I’m using links to other, smaller packing lists that live elsewhere.  Let’s say that my Clothes packing list module includes a link to a Beachwear list, some items I’ve just typed in, and another link to a Hiking list.  The bundle created would include all of the other lists referenced, which are called dependencies.  Bundling allows the browser to load fewer chunks of information in order to display your app – instead of having to click through to see all of these packing lists, it just gets one master list to display.
  • The webpack entry point is like saying, ‘Hey application!  I’m gonna give you some bundled instructions, and to do them, you gotta start at the Clothes part of the list.’  Webpack defaults to using ./src/index.js as its entry point, just like I default to starting by packing clothes first.  Both of us are flexible if given other specific instructions.  Maybe you’d prefer to start packing your makeup first.
  • The webpack output is like saying, ‘Hey application!  I made you all these nice instructions on how to pack for your trip, and I’m putting them all together in a list named [something].bundle.js.’  Webpack defaults to ./dist/main.js, but again, that can be changed in the config file.
  • Loaders allow webpack to interpret files it’s not typically familiar with.  Out of the box, webpack only understands JavaScript and JSON.  Loaders are like giving webpack the tools to read some of the packing list that’s in French without freaking out, so it can still include those files in its master packing list file.

There are other components of webpack that I haven’t touched on here, but going through the basics and comparing them to my imperfect-but-helpful packing example definitely makes it clearer to me what webpack is doing with my application.  It’s essentially a streamlining tool that takes a whole mess of files that all reference each other and creates a simpler to-do list with fewer layers for the browser to work through.
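To push the packing metaphor one step further, here is a toy version of the "follow all the links and produce one master list" idea.  This is only an illustration and not how webpack is implemented: real webpack reads import and require statements out of files, while this sketch uses plain objects that list their own dependencies.  The list names and items are invented.

```javascript
// Toy illustration only. Each "module" is a packing list with some items of
// its own and links (deps) to other lists.
const modules = {
  clothes: { items: ['t-shirts', 'socks'], deps: ['beachwear', 'hiking'] },
  beachwear: { items: ['swimsuit', 'sunscreen'], deps: [] },
  hiking: { items: ['boots', 'water bottle'], deps: [] },
};

// Start at the entry list, follow every link once, and flatten everything
// into a single master list. `seen` stops a list from being included twice.
function bundle(entry, seen = new Set()) {
  if (seen.has(entry)) return [];
  seen.add(entry);
  const { items, deps } = modules[entry];
  const fromDeps = deps.flatMap(dep => bundle(dep, seen));
  return [...fromDeps, ...items];
}

console.log(bundle('clothes'));
// [ 'swimsuit', 'sunscreen', 'boots', 'water bottle', 't-shirts', 'socks' ]
```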

 

Node.js 101

Node.js is a server environment.  It’s open source, free, runs on a bunch of different platforms, and uses JavaScript.  It works well because it’s asynchronous and event based!

Server Environment?

Think of a server as a server at a restaurant.  A customer asks for stuff, and the server delivers the stuff.  In computing, it’s similar.  The server is still called a server, but the customer is called a client.  Instead of the stuff being a meatloaf dinner, the stuff is known as a service – like sharing data or a resource.  The server could be a computer program or a device – it’s not always the giant room of big machines you may be envisioning.  Node.js allows you to write computer programs that can function as a server.
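To give a sense of how little code that takes, here is a minimal sketch using Node's built-in http module.  The response text and the port number in the comment are arbitrary choices for the example.

```javascript
const http = require('http');

// Called once per request: `req` is what the client asked for,
// `res` is how we deliver the stuff.
function handleRequest(req, res) {
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end(`You asked for ${req.url}, here it is!`);
}

const server = http.createServer(handleRequest);

// Uncomment to start taking orders; 3000 is an arbitrary port.
// server.listen(3000);
```

With the last line uncommented, visiting http://localhost:3000/meatloaf in a browser would make the browser the client and this little program the server.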

Okay…Asynchronous?

Here’s a metaphor.  You’re working the register at a sandwich shop.  You get an order for a PB&J, and there are 7 more people in line.  The synchronous strategy to handle this would be to ask the sandwich-maker to make the PB&J, wait while she makes it, give it to the hungry customer, then take the next order.  You keep going this way until everyone in line has a sandwich.  The asynchronous strategy would be to pass along the first order, then take another order.  Pass along the second order, then take another.  At some point, the first PB&J is done and you can deliver it, but you’re taking orders and passing them along to the sandwich-maker the entire time.

Node runs things asynchronously.  It gets a request (“I want PBJ.html!”) and lets the computer’s file system know about the request (“computer #1 wants PBJ.html, can you take care of that?”).  Then it’s free to take more requests while the file system deals with the first one – it needs to read the file and respond with PBJ.html.  Other server environments, like PHP or ASP, typically handle a request from start to finish: whatever is handling it waits until PBJ.html has been delivered to the requester before picking up anything else.  They’re the first strategy in the sandwich shop metaphor, while Node.js is the second.

So that’s the asynchronous part.  Sometimes it’s also referred to as “blocking” (synchronous, one sandwich at a time) vs. “non-blocking” (asynchronous, multiple orders open at once.)
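Here is a small sketch of the two strategies side by side, using Node's built-in fs module.  The file is written to the temp directory first only so that the example has something to read; the file name and the log messages are made up to match the metaphor.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Create a small file to read, so the example is self-contained.
const file = path.join(os.tmpdir(), 'PBJ.html');
fs.writeFileSync(file, '<h1>PB&J</h1>');

const log = [];

// Blocking: nothing else happens until the whole file has been read.
fs.readFileSync(file, 'utf8');
log.push('sync read finished');

// Non-blocking: hand the job to the file system and move on. The callback
// runs later, once the file is ready.
fs.readFile(file, 'utf8', (err) => {
  if (err) throw err;
  log.push('async read finished');
  console.log(log);
});
log.push('took the next order');
```

The "took the next order" entry always lands in the log before "async read finished", because the callback can't run until the code that is currently executing has finished.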

Uhh…Event Based?

This is what makes Node.js fast.  In event-driven programming, the code only runs when certain events happen.  We’re not sending PBJ.html all the time, only when a request is made for it.  The events can be things like mouse clicks, pressing a key, or trying to access a certain port (more on that later).  In any case, the code isn’t running unless there’s a reason to run.
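Node ships with an EventEmitter class that shows this pattern in miniature.  In this sketch the event name 'order' and the sandwiches are invented; the point is that the handler sits idle until an event actually fires.

```javascript
const EventEmitter = require('events');

const shop = new EventEmitter();
const made = [];

// Register what should happen when an 'order' event comes in.
// Nothing runs yet; we're just listening.
shop.on('order', sandwich => {
  made.push(sandwich);
  console.log(`making a ${sandwich}`);
});

console.log('no orders yet, nothing to do');

// Each emit triggers the handler once.
shop.emit('order', 'PB&J');
shop.emit('order', 'grilled cheese');
```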

So What Does It Do?

Node can generate the content on a web page, it can manipulate files, it can play nicely with databases (add, delete, or modify info), or it can collect data from a form.  Cool!  Even cooler – it’s just a JavaScript file.  The file has instructions on what to do when certain events happen.

Arguably the best part of Node.js is the Node Package Manager, or npm.  It’s the Amazon of web development (you can get anything!) and contains over 600,000 packages that do lots of cool things.  You can import these packages into your code and take advantage of their functionality.  There are packages to help test your code, to build databases, to allow for real-time communication, to send automated emails to users, and to make front-end development simpler.  There are also a number of built-in modules that you don’t even have to install.

Node.js works best in conjunction with other frameworks from npm – using pure Node.js to build the back-end of a website isn’t great, but using it with Express, Sequelize, React, Redux…now we’re getting somewhere (more on all of those later).

Next up – Express!