
GitHub to GitLab: The Easy Way

Submitted by oliver on July 23, 2018

OlinData has a number of private repositories on GitHub that we want to migrate to our GitLab server, mostly to save costs and to centralise on one source control solution.

My first thought was to just clone the repositories from GitHub, add a new remote on our GitLab server, then push. This is how git is designed; there's no single master place where data is stored.

git clone git@github.com:olindata/somerepo
cd somerepo
git remote add olindata git@gitlab.olindata.com:olindata/somerepo
git push -u olindata

Job done.
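
One caveat: the commands above push only the branch that is currently checked out. A minimal sketch of one way to carry over every branch and tag follows, assuming a bare clone is acceptable as an intermediate copy. The function name copy_all_refs is made up for this example, and the URLs in the comment are the ones used above.

```shell
#!/bin/sh
# Sketch: copy all branches and tags from one remote to another.
# copy_all_refs is a hypothetical helper name, not part of git.
copy_all_refs() {
    src=$1
    dst=$2
    tmp=$(mktemp -d) || return 1
    # A bare clone stores the source's branches directly under refs/heads.
    git clone --quiet --bare "$src" "$tmp/repo.git" &&
        git -C "$tmp/repo.git" push --quiet --all "$dst" &&
        git -C "$tmp/repo.git" push --quiet --tags "$dst"
    status=$?
    rm -rf "$tmp"
    return $status
}

# Example call:
# copy_all_refs git@github.com:olindata/somerepo git@gitlab.olindata.com:olindata/somerepo
```

The destination project has to exist already (or the server has to allow creating projects on push), which this sketch does not check.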

GitHub and GitLab do more than just host repositories, though. I needed to migrate the history of pull requests and issues too. This is extra data, linked to the repositories but not kept in git.

We all know migrations are painful. It doesn't matter what kind of data or what tools you have available to you; there's always a catch. A very 'stupid' way to make migrations easier is to just minimise the amount of data to migrate. I asked myself: "how many repositories actually have any pull requests or issues associated with them?" So I wrote a script, portableghrepo, which tells me the answer.

usage: portableghrepo repo

Assuming you have your GitHub personal access token saved in the file "githubtoken" in the current directory, you can test whether the repository "0intro/plan9" has any pull requests or issues against it like so:

portableghrepo 0intro/plan9

If there are none, it prints the repository name (the repository is "portable" using plain git); otherwise, it prints nothing.

I had a list of repositories in the file "repos". So I looped through all of them:

while read -r r; do portableghrepo "$r"; done < repos

Using output redirection in the shell you can save the output to a file:

while read -r r; do portableghrepo "$r"; done <repos >portablerepos

Turns out the vast majority of private repositories didn't have any associated issues or pull requests, so I was very happy: I now had a lot less work to do.
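
The repositories that still need the full import are the ones listed in "repos" but not in "portablerepos". A small sketch of how that list could be produced, assuming both files hold one repository name per line; the function name needs_import is invented for this example.

```shell
#!/bin/sh
# Sketch: print the lines of file $1 that do not appear in file $2.
# -F: fixed strings, -x: match whole lines, -v: invert, -f: read patterns from file.
needs_import() {
    grep -vxFf "$2" "$1"
}

# Example call:
# needs_import repos portablerepos > importrepos
```

Here "importrepos" is just an illustrative file name for the remaining work list.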

Finally, here's portableghrepo:

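#!/bin/sh
# portableghrepo: print the repository name if it has no pull requests or issues

# the personal access token is read from the file "githubtoken"
read -r token < githubtoken

if test -z "$1"
then
    echo "no repository specified" >&2
    exit 1
fi

# GitHub no longer accepts the token as an access_token query parameter,
# so it is sent in the Authorization header instead
pulls=$(curl -s -H "Authorization: token $token" \
    "https://api.github.com/repos/$1/pulls/1")

# numbering starts at 1, so if a repository has any pull requests or issues,
# asking for number 1 returns a JSON object containing the key "number"
if echo "$pulls" | grep -q '"number"'
then
    exit
fi

issues=$(curl -s -H "Authorization: token $token" \
    "https://api.github.com/repos/$1/issues/1")

if echo "$issues" | grep -q '"number"'
then
    exit
else
    # no pull requests or issues
    echo "$1"
fi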

See also GitHub's HTTP API documentation.
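
One weakness of grepping the response body is that an error reply (a bad token or a rate limit, for example) also lacks the key "number", so the repository would be reported as portable. A hedged sketch of an alternative is to look at the HTTP status code instead. The function names api_status and classify_status are made up for this example, and $token is assumed to hold the personal access token as in the script above. Note that GitHub can also answer 404 for a private repository the token cannot see, so a 404 is only meaningful when the token has access.

```shell
#!/bin/sh
# Sketch: decide from the HTTP status code rather than the response body.

# Hypothetical helper: print only the HTTP status code for an API URL.
# Assumes $token holds a GitHub personal access token.
api_status() {
    curl -s -o /dev/null -w '%{http_code}' \
        -H "Authorization: token $token" "$1"
}

# Return 0 if the item exists, 1 if it was not found, 2 for anything else.
classify_status() {
    case $1 in
        200) return 0 ;;
        404) return 1 ;;
        *)   echo "unexpected HTTP status: $1" >&2
             return 2 ;;
    esac
}

# Example call:
# classify_status "$(api_status https://api.github.com/repos/0intro/plan9/issues/1)"
```

Treating every unexpected status as an error means a failed request stops the run instead of silently adding a repository to the portable list.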


Repositories with Pull Requests, Issues etc.

GitLab provides a GitHub import tool which can magically import data from GitHub into GitLab. It worked almost 100% out of the box on our self-hosted GitLab instance; the one catch was that, for reasons I didn't track down, I could only import data as personal projects.

But this is where my favourite trick came into play. Remember that there were now far fewer GitHub repositories that needed to be migrated. So instead of trawling through GitLab logs and doing a deep dive on OAuth, I just imported the rest of the repositories as personal projects, then moved each one over by hand. Not glamorous, but it was good enough for this one-off job. It reminds me of one of my favourite quotes by the famous Ken Thompson:
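
Moving by hand was enough here, but for a longer list the same move could be scripted against the GitLab API, which has a project transfer endpoint. The following is only a sketch under assumptions: GITLAB_URL and GITLAB_TOKEN are hypothetical variable names, the https address is guessed from the SSH host used earlier, the function names are made up, and the token would need API access on the instance.

```shell
#!/bin/sh
# Sketch: move a personal project into a group via the GitLab API.
# GITLAB_URL and GITLAB_TOKEN are assumed names, not from the original post.
GITLAB_URL=${GITLAB_URL:-https://gitlab.olindata.com}

# URL-encode the slash in "user/project" so the path can serve as a project ID.
encode_path() {
    printf '%s' "$1" | sed 's|/|%2F|g'
}

# Build the URL of the transfer endpoint for a project path.
transfer_url() {
    printf '%s/api/v4/projects/%s/transfer' "$GITLAB_URL" "$(encode_path "$1")"
}

# Transfer project $1 (e.g. "oliver/somerepo") to namespace $2 (e.g. "olindata").
transfer_project() {
    curl -s -X PUT -H "PRIVATE-TOKEN: $GITLAB_TOKEN" \
        --data-urlencode "namespace=$2" "$(transfer_url "$1")"
}

# Example call:
# transfer_project oliver/somerepo olindata
```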

When in doubt, use brute force.