
What is Git Version Control?

January 5, 2019

What is Git?

- There are a lot of tutorials out there for Git. So what's new in this post?

- Well, it's true that you'll find a lot of blog posts about Git, but they are scattered across many places. So I thought, why not write one post that puts it all in the same place?

By the way, this post is not a Git tutorial. For a tutorial, go to this link.

[Figure: Git storage mechanism]

Definition

Git is known as a "distributed version control system", but to understand Git we need to know how it stores information. At its core, Git is a key-value store: the key is the hash of the data, and the value is the data itself. You can retrieve the data using its key.
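To make the key-value idea concrete, here is a minimal in-memory sketch in Python. The class `ToyObjectStore` and its `put`/`get` methods are hypothetical names made up for illustration; real Git also prepends a header before hashing and keeps objects on disk, which this sketch deliberately skips.

```python
import hashlib


class ToyObjectStore:
    """Hypothetical content-addressable store: each key is the SHA1 of its value."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # The key is computed from the data itself, so storing the same
        # bytes twice gives the same key and keeps only one copy.
        key = hashlib.sha1(data).hexdigest()
        self._objects[key] = data
        return key

    def get(self, key: str) -> bytes:
        # Look the data up by its key (raises KeyError for unknown keys).
        return self._objects[key]


if __name__ == "__main__":
    store = ToyObjectStore()
    key = store.put(b"Hello, World\n")
    print(key)
    print(store.get(key))  # b'Hello, World\n'
```

Because the key depends only on the content, identical files are stored once, which is one reason this design suits version control.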

Key

The key is produced by a cryptographic hash function called SHA1, which turns the data given to it into a 40-digit hexadecimal string. The key is always the same if the input is the same, which is why the key itself is usually just called the SHA1 of the data.
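A quick Python check of the two properties above: a fixed length of 40 hexadecimal digits, and the same input always giving the same key. It uses only the standard `hashlib` module; `sha1_hex` is a helper name chosen for this sketch.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA1 digest of data as a hexadecimal string."""
    return hashlib.sha1(data).hexdigest()


if __name__ == "__main__":
    a = sha1_hex(b"Hello, World\n")
    b = sha1_hex(b"Hello, World\n")
    c = sha1_hex(b"Hello, World!\n")

    print(len(a))  # 40 (hex digits, i.e. 160 bits)
    print(a == b)  # True: same input -> same key
    print(a == c)  # False: a one-character change -> a completely different key
```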

Blob

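Git stores the content of a file in a blob object, together with a small header. A stored blob is made of:

  • the object type (blob)
  • the size of the content in bytes
  • a \0 (null byte) delimiter
  • the content itself

The SHA1 key is computed over all of this, header included.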

[Figure: Git folder architecture]

Architecture

After Git generates the key, it stores the data in .git/objects/folder_name. The folder name is the first 2 characters of the hash, and the rest of the hash becomes the file name. For example, if the generated hash starts with 8ab686eaf..., the object is stored inside a folder named 8a. So far we have talked about files, but what about folders?
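The path rule described above fits in a few lines of Python. `loose_object_path` is a hypothetical helper for illustration, and the demo hashes some arbitrary bytes rather than a real Git object.

```python
import hashlib
from pathlib import PurePosixPath


def loose_object_path(sha: str, git_dir: str = ".git") -> PurePosixPath:
    """Map a 40-character hex SHA1 to its location under .git/objects.

    The first 2 characters become the folder name and the remaining
    38 characters become the file name inside that folder.
    """
    if len(sha) != 40:
        raise ValueError("expected a full 40-character SHA1")
    return PurePosixPath(git_dir, "objects", sha[:2], sha[2:])


if __name__ == "__main__":
    sha = hashlib.sha1(b"example data").hexdigest()
    print(sha)
    print(loose_object_path(sha))  # .git/objects/<first 2 chars>/<other 38 chars>
```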

Well, for folders Git creates something called a tree. A tree points to blobs or to other trees. Each entry in a tree stores the SHA1 of the blob or tree it points to, along with some metadata:

  • Type of pointer (tree or blob)
  • File name or directory name
  • Mode (regular file, executable file, symbolic link, ...)
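Here is a Python sketch of how a tree could be serialized and hashed, assuming Git's raw tree format: one `<mode> <name>\0<20-byte binary SHA1>` record per entry, sorted by name, behind a `tree <size>\0` header. (`git cat-file -p <tree>` shows a human-readable version of the same entries.) `TreeEntry`, `tree_object`, and `tree_hash` are hypothetical names. In this raw format the pointer type is not stored separately; it follows from the mode.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class TreeEntry:
    mode: str  # "100644" regular file, "100755" executable, "120000" symlink, "40000" subdirectory
    name: str  # file or directory name
    sha: str   # 40-char hex SHA1 of the blob or tree this entry points to


def tree_object(entries: list[TreeEntry]) -> bytes:
    """Serialize entries in Git's tree format and prepend the 'tree <size>' header."""

    def sort_key(e: TreeEntry) -> bytes:
        # Git compares directory names as if they ended with "/".
        return (e.name + ("/" if e.mode == "40000" else "")).encode()

    body = b"".join(
        e.mode.encode() + b" " + e.name.encode() + b"\0" + bytes.fromhex(e.sha)
        for e in sorted(entries, key=sort_key)
    )
    return b"tree " + str(len(body)).encode() + b"\0" + body


def tree_hash(entries: list[TreeEntry]) -> str:
    """SHA1 of the serialized tree object."""
    return hashlib.sha1(tree_object(entries)).hexdigest()


if __name__ == "__main__":
    content = b"Hello, World\n"
    blob_sha = hashlib.sha1(b"blob %d\0" % len(content) + content).hexdigest()
    print(tree_hash([TreeEntry("100644", "hello.txt", blob_sha)]))
```

In a repository where hello.txt is the only staged file, `git write-tree` prints the hash of the corresponding tree, which is a handy way to compare against this sketch.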

So the whole thing is a directed graph: a tree entry points to another tree in the case of a subdirectory, or to a blob if it points to a file. I am not going to show the data structure in detail, because that is not the main goal here; I just want to show how things work inside Git.

Git objects are stored in compressed form. Git also creates 'pack files' by compressing many objects together. Pack files store objects, and they can also store 'deltas', which hold the differences in a file from one version to the next. Pack files are usually generated when the repository has too many loose objects, during garbage collection, or when pushing to a remote. I am not going to talk about how garbage collection works!

Git Commits

Next, I am going to talk about Git commits.

Commit Object

Just like other Git objects, a commit object is stored under its SHA1. It points to a tree and contains some metadata:

  • Author and committer
  • Date
  • Message
  • Parent commit(s): none for the very first commit, one for a normal commit, and two or more for a merge commit

The SHA1 of the commit is the hash of all this information, including the hash of the tree it points to.

[Figure: Git commit example]

Example:

In the example shown in the picture, the green balls are commits and the other balls are trees. The first commit, 'fae12', is the root commit: it points only to its tree and nothing else. How do we know it is the first commit? Because it does not point to any other green ball (commit). On the contrary, the second commit, 'ab29d', points to the first commit (its parent) and to its own tree. This is how commits are ordered. Each commit records how the repository looked at the time of the commit. So, in short, each commit is a snapshot of the repository, and the changes between two commits can be found by comparing their snapshots.
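Since each commit stores the hashes of its parents, history can be walked backwards from any commit. A toy Python sketch using the short hashes from the picture as stand-ins; the `parents` dictionary is hypothetical and is not how Git stores commits on disk. Real commits can have several parents, and this sketch follows only the first one, similar to `git log --first-parent`.

```python
# Hypothetical commit graph mirroring the picture: each commit maps to its parents.
# 'fae12' is the first (root) commit, so its parent list is empty.
parents: dict[str, list[str]] = {
    "fae12": [],
    "ab29d": ["fae12"],
}


def history(start: str, graph: dict[str, list[str]]) -> list[str]:
    """Follow first-parent links from `start` back to the root commit."""
    chain = [start]
    while graph[chain[-1]]:  # stop at a commit with no parents
        chain.append(graph[chain[-1]][0])
    return chain


if __name__ == "__main__":
    print(history("ab29d", parents))  # ['ab29d', 'fae12']
```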

We cannot change a commit. Changing anything it contains (for example, a file, and therefore the tree it points to) produces a new SHA1 hash, which means a new commit. Can anyone tell me what will happen if no files are changed? Will it be the same commit or a different one?
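To see why a commit cannot be edited in place, here is a Python sketch that builds a commit object in Git's text layout (tree, parent, author, committer, a blank line, then the message, as `git cat-file -p` shows for a commit) and hashes it. The author string, the timestamps, and the all-zero tree hash are made-up placeholders, and the `+0000` timezone is fixed for simplicity. Because the timestamp is part of the hashed data, two commits with an identical tree still get different hashes.

```python
import hashlib


def commit_object(tree: str, parents: list[str], author: str, timestamp: int, message: str) -> bytes:
    """Build commit bytes in Git's text format (author and committer are the same person here)."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines.append(f"author {author} {timestamp} +0000")
    lines.append(f"committer {author} {timestamp} +0000")
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode()
    return b"commit " + str(len(body)).encode() + b"\0" + body


def commit_hash(tree: str, parents: list[str], author: str, timestamp: int, message: str) -> str:
    """SHA1 of the serialized commit object."""
    return hashlib.sha1(commit_object(tree, parents, author, timestamp, message)).hexdigest()


if __name__ == "__main__":
    tree = "0" * 40                      # placeholder tree hash, purely illustrative
    who = "Jane Doe <jane@example.com>"  # hypothetical author
    first = commit_hash(tree, [], who, 1546646400, "Initial Commit")
    again = commit_hash(tree, [], who, 1546646401, "Initial Commit")
    print(first == again)  # False: same tree, one second later -> different commit
```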

Initial Commit

Now let's take a closer look at what happens when you create the first commit. Let's say you created a folder named sample and want to initialize Git version control in it. To do that, run the following command:

git init

If there is no problem, Git will print "Initialized empty Git repository in /folder/name/project/.git/". Now create a text file and add it to Git. To do this, run these commands:

echo 'Hello, World' > hello.txt
git add hello.txt
git commit -m "Initial Commit"

[Figure: Output after a successful git commit]

As you can see, after I ran these commands in the terminal, Git reported that it created a commit with the hash af11c91. Keep in mind that your hash will be different from mine: even if your file is identical, the author and the date and time of the commit are part of the commit, so its hash will be different. After committing the file, go to the .git/objects directory, and you will find that several folders have been created there. Large projects contain a huge number of objects, and keeping them all in a single directory would be messy, so Git creates subdirectories named after the first 2 characters of each hash and stores the objects inside them.

If you look at the picture, you will see 3 folders inside the objects folder, one for each object created so far.

[Figure: Folder structure inside the .git/objects folder]

As discussed earlier, the objects folder can hold 3 types of objects: blobs, trees, and commits. To see what type an object is, we can run a few commands:

git cat-file -t 8ab68
blob
git cat-file -p 8ab68
Hello, World
git cat-file -t af11c
commit
git cat-file -t bc225
tree

In these commands, -t prints the type of the object (blob, commit, or tree), whereas -p prints its contents. I only showed -p in one case, but you are free to run it on the other objects too.
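For the curious, here is a rough Python sketch of what happens behind `git cat-file` for a loose object: expand the short hash, open the file under .git/objects, inflate it with zlib, and split off the header. The helper names are hypothetical, and the sketch handles only loose objects (not pack files) and returns raw bytes rather than cat-file's pretty-printed tree listing.

```python
import zlib
from pathlib import Path


def resolve_prefix(git_dir: str, prefix: str) -> str:
    """Expand an abbreviated hash such as '8ab68' to the full loose-object hash."""
    folder = Path(git_dir) / "objects" / prefix[:2]
    matches = []
    if folder.is_dir():
        matches = [prefix[:2] + f.name for f in folder.iterdir() if f.name.startswith(prefix[2:])]
    if len(matches) != 1:
        raise LookupError(f"{prefix!r} matches {len(matches)} loose objects")
    return matches[0]


def read_loose_object(git_dir: str, sha: str) -> tuple[str, bytes]:
    """Return (type, content): roughly what `git cat-file -t` and `-p` show for a blob."""
    path = Path(git_dir) / "objects" / sha[:2] / sha[2:]
    raw = zlib.decompress(path.read_bytes())   # loose objects are zlib-compressed
    header, _, content = raw.partition(b"\0")  # header looks like b"blob 13"
    obj_type = header.split(b" ")[0].decode()
    return obj_type, content
```

Inside the sample repository, `read_loose_object(".git", resolve_prefix(".git", "8ab68"))` should return the type blob and the bytes of hello.txt, mirroring the cat-file output above.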

References - Pointer to commits

There are several types of references, each of which points to a specific commit. These references are:

  • Tags
  • Branches
  • HEAD (a pointer to the current commit)

HEAD

HEAD is a special type of pointer. It points to the current commit. When you check out a branch, HEAD points to that branch, and the branch in turn points to its latest commit. In short, HEAD tells Git which commit you are currently on. By the way, a repository has only one HEAD.

We already learned about objects in the previous section. Now let's look at the pointers. Run these commands:

cat .git/HEAD
ref: refs/heads/master

cat .git/refs/heads/master
af11c91600f0cbb.....

The first command shows where HEAD points; right now it points to refs/heads/master. If we look at that file, we see that it contains a hash. Can you remember which hash it is? If you can, good job! It's the commit we just created. So HEAD points to master, and master currently points to that commit (HEAD -> master -> af11c91).
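The HEAD -> master -> commit chain can be followed with a few lines of Python. This is a simplified sketch: `resolve_head` is a hypothetical helper, and it ignores refs that Git has moved into .git/packed-refs. In practice, `git rev-parse HEAD` is the command that does this job.

```python
from pathlib import Path


def resolve_head(git_dir: str = ".git") -> str:
    """Follow HEAD to a commit hash: HEAD -> refs/heads/<branch> -> <sha>.

    Handles a symbolic HEAD ("ref: refs/heads/master") and a detached
    HEAD (a raw hash), but not refs stored only in packed-refs.
    """
    git = Path(git_dir)
    head = (git / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        ref_path = head[len("ref: "):]  # e.g. "refs/heads/master"
        return (git / ref_path).read_text().strip()
    return head  # detached HEAD stores the commit hash directly
```

Run from the sample repository, `resolve_head()` should return the same hash that `cat .git/refs/heads/master` printed.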

Now, like me, you may be wondering what this master is and where it came from. It is the name of a branch. When we initialize Git inside a folder, it automatically uses a branch called master, and if we do not create any other branch, our commits are stored on that branch, the master branch. To put this to the test, let's go to the refs directory.

git branch

In this folder, you will see a file named master inside the heads folder.

[Figure: git branch after adding a new branch]

Now run the following command to create a new branch:

git branch a_new_branch

Now, if you look at the heads folder again, you will see that a new file named a_new_branch has been created inside it.
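Since a branch is just a file under refs/heads that holds a commit hash, listing branches can be sketched as reading that folder. `list_branches` is a hypothetical helper for illustration; it misses branches stored only in .git/packed-refs, which `git branch` does show.

```python
from pathlib import Path


def list_branches(git_dir: str = ".git") -> dict[str, str]:
    """Map each branch name to the commit hash stored in its refs/heads file.

    Sketch only: branches that Git has moved into .git/packed-refs
    (for example after `git gc` or `git pack-refs`) will not show up here.
    """
    heads = Path(git_dir) / "refs" / "heads"
    return {
        p.relative_to(heads).as_posix(): p.read_text().strip()  # names like "feature/x" keep their "/"
        for p in sorted(heads.rglob("*"))
        if p.is_file()
    }
```

In the sample repository, right after `git branch a_new_branch`, you would expect both master and a_new_branch to map to the same hash, because a new branch starts at the commit HEAD currently points to.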

That's it for today. Next time, I will write about the 3 areas where our code lives. Till then, goodbye, and happy coding!
