If you're new to graph databases, you can think of "Classes" and "Properties" as roughly analogous to, respectively, "Table names" and "Table field lists".
If you first need to clear out your test database, one of the cells below (currently commented out) lets you do so.
import os
import sys
import getpass
import pandas as pd
from brainannex import GraphAccess, GraphSchema
# In case of problems, try a sys.path.append(directory) , where directory is your project's root directory
GraphAccess library
# Save your credentials here - or use the prompts given by the next cell
host = "bolt://localhost" # EXAMPLES: bolt://123.456.789.012 OR bolt://localhost
# (CAUTION: do NOT include the port number!)
password = ""
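For readers who prefer not to hard-code credentials, here is a minimal sketch of a fallback chain: use the value typed above if present, otherwise an environment variable, otherwise an interactive prompt. The helper name `resolve_credential` and the variable name `NEO4J_PASSWORD` are assumptions made for illustration; they are not part of the brainannex library.

```python
import os
import getpass

def resolve_credential(value, env_var, prompt, ask=getpass.getpass):
    """
    Hypothetical helper (not part of brainannex).
    Return `value` if non-empty; else the environment variable `env_var` if set;
    else ask the user interactively (input is hidden by getpass).
    """
    if value:
        return value
    from_env = os.environ.get(env_var, "")
    if from_env:
        return from_env
    return ask(prompt)

# Example use (commented out, because it may wait for keyboard input):
# password = resolve_credential(password, "NEO4J_PASSWORD", "Neo4j password: ")
```

The `ask` parameter exists only so that the prompt can be replaced in non-interactive settings, such as automated runs.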
db = GraphAccess(host=host,
                 credentials=("neo4j", password), debug=False)   # Notice the debug option being OFF
Connection to Neo4j database established.
print("Version of the Neo4j driver: ", db.version())
Version of the Neo4j driver: 4.4.13
GraphSchema library
# CLEAR OUT THE DATABASE
#db.empty_dbase() # UNCOMMENT IF DESIRED ***************** WARNING: USE WITH CAUTION!!! ************************
GraphSchema.set_database(db)
# Create a "City" Class node - together with its Properties, based on the data to import
GraphSchema.create_class_with_properties(name="City", properties=["City ID", "name"])
(3408, 'schema-1')
# Likewise for a "State" Class node - together with its Properties, based on the data to import
GraphSchema.create_class_with_properties(name="State", properties=["State ID", "name", "2-letter abbr"])
(3411, 'schema-4')
# Now add a relationship named "IS_IN", from the "City" Class to the "State" Class
GraphSchema.create_class_relationship(from_class="City", to_class="State", rel_name="IS_IN")
We'll pass our data as pandas data frames; these could easily be read in from CSV files, for example.
city_df = pd.DataFrame({"City ID": [1, 2, 3, 4], "name": ["Berkeley", "Chicago", "San Francisco", "New York City"]})
city_df
| | City ID | name |
|---|---|---|
| 0 | 1 | Berkeley |
| 1 | 2 | Chicago |
| 2 | 3 | San Francisco |
| 3 | 4 | New York City |
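As an illustration of the CSV route, the sketch below builds the same city data frame with `pd.read_csv()`. The CSV text is an in-memory stand-in (an assumption for the sake of a self-contained example); in practice you would pass a file path instead.

```python
import io
import pandas as pd

# In-memory stand-in for a CSV file; pd.read_csv() also accepts a file path
csv_text = """City ID,name
1,Berkeley
2,Chicago
3,San Francisco
4,New York City
"""

city_df_from_csv = pd.read_csv(io.StringIO(csv_text))
print(city_df_from_csv)
```

The column headers in the CSV must match the Property names declared for the "City" Class, since those are the fields being imported.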
state_df = pd.DataFrame({"State ID": [1, 2, 3], "name": ["California", "Illinois", "New York"], "2-letter abbr": ["CA", "IL", "NY"]})
state_df
| | State ID | name | 2-letter abbr |
|---|---|---|---|
| 0 | 1 | California | CA |
| 1 | 2 | Illinois | IL |
| 2 | 3 | New York | NY |
# In this example, we assume a separate table ("join table") with the data about the relationships;
# this would always be the case for many-to-many relationships;
# 1-to-many relationships, like we have here, could also be stored differently
state_city_links_df = pd.DataFrame({"State ID": [1, 1, 2, 3], "City ID": [1, 3, 2, 4]})
state_city_links_df
| | State ID | City ID |
|---|---|---|
| 0 | 1 | 1 |
| 1 | 1 | 3 |
| 2 | 2 | 2 |
| 3 | 3 | 4 |
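To make the remark about 1-to-many relationships concrete: one common alternative layout stores the state's ID directly in each city row (a foreign-key column). The sketch below assumes such a layout, which is not the one used in this notebook, and derives the equivalent join table from it. The names `city_with_state_df` and `derived_links_df` are made up for the example.

```python
import pandas as pd

# Assumed alternative layout: each city row carries the ID of its state
city_with_state_df = pd.DataFrame({
    "City ID":  [1, 2, 3, 4],
    "name":     ["Berkeley", "Chicago", "San Francisco", "New York City"],
    "State ID": [1, 2, 1, 3],
})

# The join table is then simply a two-column projection of the city table
derived_links_df = city_with_state_df[["State ID", "City ID"]]
print(derived_links_df)
```

This holds the same (State ID, City ID) pairs as the join table above, only in a different row order.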
GraphSchema.import_pandas_nodes(df=city_df, class_name="City") # Import the 4 cities
import_pandas_nodes(): importing 4 records in 1 batch(es) of size up to 1000...
Importing batch # 1 : 4 row(s)
Interim status: at the end of this batch, imported a grand total of 4 record(s), and created a grand total of 4 new node(s)
FINISHED importing 4 record(s), and created 4 new node(s) in the process
{'number_nodes_created': 4, 'affected_nodes_ids': [3415, 3416, 3417, 3418]}
GraphSchema.import_pandas_nodes(df=state_df, class_name="State") # Import the 3 states
import_pandas_nodes(): importing 3 records in 1 batch(es) of size up to 1000...
Importing batch # 1 : 3 row(s)
Interim status: at the end of this batch, imported a grand total of 3 record(s), and created a grand total of 3 new node(s)
FINISHED importing 3 record(s), and created 3 new node(s) in the process
{'number_nodes_created': 3, 'affected_nodes_ids': [3419, 3420, 3421]}
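Before creating the links, it can be worth checking that every ID in the join table refers to a row that was actually imported. The following is a pandas-only sanity check, sketched with the same data as above; it does not touch the database.

```python
import pandas as pd

city_df = pd.DataFrame({"City ID": [1, 2, 3, 4],
                        "name": ["Berkeley", "Chicago", "San Francisco", "New York City"]})
state_df = pd.DataFrame({"State ID": [1, 2, 3],
                         "name": ["California", "Illinois", "New York"],
                         "2-letter abbr": ["CA", "IL", "NY"]})
state_city_links_df = pd.DataFrame({"State ID": [1, 1, 2, 3], "City ID": [1, 3, 2, 4]})

# IDs referenced by the join table but absent from the node tables
missing_cities = set(state_city_links_df["City ID"]) - set(city_df["City ID"])
missing_states = set(state_city_links_df["State ID"]) - set(state_df["State ID"])

print("Unknown City IDs: ", missing_cities)
print("Unknown State IDs:", missing_states)
```

Both sets should come out empty for this data; a non-empty set would point to link rows with no matching node.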
# Finally, link up the cities to the states, using links named "IS_IN"
GraphSchema.import_pandas_links(df=state_city_links_df,
                                class_from="City", class_to="State",
                                col_from="City ID", col_to="State ID",
                                link_name="IS_IN")
import_pandas_links(): importing 4 links in 1 batch(es) of max size 1000...
Result of running batch : {'_contains_updates': True, 'relationships_created': 4, 'returned_data': [{'link_id': 1954}, {'link_id': 1955}, {'link_id': 1956}, {'link_id': 1957}]}
FINISHED importing a total of 4 links
[1954, 1955, 1956, 1957]
This is what we have created with our import:
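One way to preview the expected city-to-state links without querying the database is to join the three data frames in pandas. This is a sketch of what the import should have produced, not output read back from the graph itself.

```python
import pandas as pd

city_df = pd.DataFrame({"City ID": [1, 2, 3, 4],
                        "name": ["Berkeley", "Chicago", "San Francisco", "New York City"]})
state_df = pd.DataFrame({"State ID": [1, 2, 3],
                         "name": ["California", "Illinois", "New York"],
                         "2-letter abbr": ["CA", "IL", "NY"]})
state_city_links_df = pd.DataFrame({"State ID": [1, 1, 2, 3], "City ID": [1, 3, 2, 4]})

# Resolve each (State ID, City ID) pair to the corresponding names;
# the suffixes disambiguate the two "name" columns
expected_links = (state_city_links_df
                  .merge(city_df, on="City ID")
                  .merge(state_df, on="State ID", suffixes=("_city", "_state")))

for row in expected_links.itertuples(index=False):
    print(f"({row.name_city}) -[IS_IN]-> ({row.name_state})")
```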
