Lab 2 - Introduction to Git/GitHub

Authors

Charles Lehnen

Melissa Guzman

Introduction to Git/GitHub

In this lab, we will explore the fundamentals of Git and GitHub, crucial tools for modern-day software development and data science. Git is a distributed version control system that enables you to track changes in your codebase, collaborate with others, and maintain a history of changes to your project.

This is not a comprehensive guide to Git/GitHub, but it should introduce you to the basics.

Why use Git?

Version control systems are a category of software tools that help an individual or team manage changes to source code over time. Git, as a version control system, has several benefits:

  • Track Changes: Every change made to your source code can be tracked along with who made the change and why. If something breaks, you can easily find out which change caused the issue.

    • Example: track changes in Microsoft Word
  • Historical Backup: You can go back in time and restore previous versions of your project, which allows you to recover lost functionality.

    • Examples:
      • Version history in Google Docs
      • Edit history in individual cells in Google Sheets
      • Browsing previous versions of a document on GitHub.com
  • Cloud Backup: Hosting services like GitHub provide a remote location to store your repositories, ensuring that your work is safe and up to date across devices.

  • Team Development: Git allows multiple team members to work on the same files concurrently. Git is designed to handle potential conflicts by providing tools for merging changes made by different developers.

  • Hosting Project Online: Share your work with others around the globe in order to contribute to your field, as a living resume of your capabilities, and to gain insight from others on how to improve your work.

    • As a living resume:

      • Make sure to make your private repo contributions visible:

    • Examples:

    • Because of this aspect, intuitive organization is important!

      • Here is a template format you can use:

      • You can also follow a template like this for coursework:

      • Activity: Take 5 minutes to better organize the folder/file structure of your repo

GitHub Warnings!
  • Note: Git tracks files, not folders, so an empty folder will not be committed or appear on GitHub.

  • Also, be careful with name changes: renaming a file can detach it from its version history. It is best to rename files through GitHub.com.
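
The empty-folder behavior can be seen in a throwaway repository. The following is a minimal sketch, not a required workflow: the folder names are hypothetical, and the `.gitkeep` placeholder file is only a common naming convention, not a Git feature.

```shell
set -e
# Work in a temporary directory so nothing on your machine is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q

mkdir data figures            # two empty folders (hypothetical names)
git add .                     # stages nothing: Git tracks files, not folders
git status --short            # prints nothing

# A placeholder file gives Git something to track inside the folder.
touch data/.gitkeep           # ".gitkeep" is a convention, not a Git feature
git add data/.gitkeep
git status --short            # now lists data/.gitkeep as a new file
```

The `figures` folder still does not appear, because it contains no files.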

Git vs. GitHub

While Git and GitHub are often mentioned together, they are distinct entities:

  • Git: A version control system that runs locally on your computer. You can use Git without GitHub to manage your version control locally or with a different remote repository host.

  • GitHub: An online service that hosts your Git repositories. It adds many of its own features including a web interface.

Git Workflow

Personal Use

The basic Git workflow for personal use involves the following steps:

  1. Working Directory: Where you do work on your local device.
  2. Staging Area: A file in the .git directory that stores information about what you have chosen to go into your next commit.
  3. Local Repository: The .git history on your local device.
  4. Remote Repository: The .git history in the cloud, which should be kept in sync across your devices.

Source: Git & GitHub - Workflow Fundamentals
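
The four stages above can be traced with a few commands. This is a minimal sketch under stated assumptions: a bare repository on your own disk stands in for GitHub, and the file name, identity, and branch name `main` are placeholders.

```shell
set -e
# Throwaway sandbox; a local bare repository stands in for GitHub (assumption).
sandbox=$(mktemp -d)
git init -q --bare "$sandbox/remote.git"
git init -q "$sandbox/project"
cd "$sandbox/project"
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

# 1. Working directory: edit files as usual.
echo "# My project" > README.md
git status --short        # "?? README.md": untracked, only in the working directory

# 2. Staging area: choose what goes into the next commit.
git add README.md
git status --short        # "A  README.md": staged

# 3. Local repository: record the snapshot in .git.
git commit -q -m "Add README"
git branch -M main        # name the branch explicitly so the push is unambiguous

# 4. Remote repository: sync the local history to the remote.
git remote add origin "$sandbox/remote.git"
git push -q -u origin main
git log --oneline origin/main
```

With a real project, the `origin` URL would be the address GitHub shows for your repository instead of a local path.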

Team Use

For team projects, the workflow includes additional steps to facilitate collaboration:

  1. Forking/Branching: Forking creates your own copy of the main repo on GitHub; branching creates a separate line of work within a repo.
  2. Cloning: Copying a repo to your local machine.
  3. Committing: Saving snapshots of your changes in the local repository.
  4. Pushing: Sending your committed changes to a remote repository.
  5. Pull Requests: Requesting that the project maintainer pulls the changes you’ve made to the main repo.
  6. Merging: The project maintainer reviews and merges your changes into the main project.

Source: Vulnerable GitHub Actions Workflows Part 1: Privilege Escalation Inside Your CI/CD Pipeline
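
The team steps can be rehearsed locally. The sketch below is an approximation under several assumptions: a bare repository on disk plays the role of the main repo on GitHub, the branch name `add-methods` is hypothetical, and the pull request step is only noted in a comment because it happens on GitHub.com rather than through a Git command.

```shell
set -e
# Placeholder identity for every commit made in this sandbox.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

team=$(mktemp -d)
# Set up a shared "main repo" holding one commit.
git init -q "$team/seed"
cd "$team/seed"
echo "# Team project" > README.md
git add README.md
git commit -q -m "Initial commit"
git branch -M main
git clone -q --bare "$team/seed" "$team/main-repo.git"

# Cloning: copy the repo to the contributor's machine.
git clone -q "$team/main-repo.git" "$team/contributor"
cd "$team/contributor"

# Branching: start a separate line of work (hypothetical branch name).
git checkout -q -b add-methods

# Committing: save a snapshot in the local repository.
echo "Methods: to be written" >> README.md
git commit -q -am "Add methods placeholder to README"

# Pushing: send the branch to the remote.
git push -q -u origin add-methods

# Pull request: opened on GitHub.com, so there is no command for it here.

# Merging: the maintainer reviews and merges the branch into main.
git clone -q "$team/main-repo.git" "$team/maintainer"
cd "$team/maintainer"
git merge -q --no-ff -m "Merge branch 'add-methods'" origin/add-methods
git push -q origin main
git log --oneline --graph
```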

Git files

Understanding the structure of a Git repository is crucial:

  • .git folder: This hidden folder in your repository contains all version control history.
  • .gitignore: A text file that tells Git which untracked files or folders to ignore, so that they are left out of staging and commits.
  • Licenses: Choosing a license dictates how others can use, modify, or distribute your code, especially with regard to commercial use.
  • README: A markdown file that introduces and explains a project. It can include instructions on how to use your code, who maintains it, and other relevant information.
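
To illustrate how a .gitignore file behaves, here is a minimal sketch in a throwaway repository. The file names and the two ignore patterns are hypothetical examples, not a recommended set for your project.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# Hypothetical project files.
echo "x <- 1" > analysis.R
echo "scratch" > notes.tmp
mkdir output
echo "1,2" > output/results.csv

# Ignore every .tmp file and the whole output/ folder.
printf '%s\n' '*.tmp' 'output/' > .gitignore

git status --short               # lists .gitignore and analysis.R only
git check-ignore -v notes.tmp    # shows which .gitignore rule matched the file
```

Note that the .gitignore file itself is an ordinary file and is usually committed along with the project.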

Basic CLI Commands in Git

Here are some common commands used in the command-line interface (CLI) of Git:

  • git init: Create a new local repository.
  • git clone [url]: Copy a repository from GitHub to your local machine.
  • git add [file]: Add a file as it looks now to your next commit (stage).
  • git commit -m "[descriptive message]": Commit your staged content as a new commit snapshot.
  • git push [alias] [branch]: Transmit local branch commits to the remote repository branch.
  • git pull: Fetch and merge any commits from the tracking remote branch.

Best Practices

When to Commit Changes

When working with Git, committing changes is like setting a checkpoint in your development process which you can return to if needed. Here are some best practices for when to commit your changes:

  • Logical Changes: Make a commit when you complete a logical section of work, such as fixing a bug, adding a new function, or improving performance.
  • Successes: Commit your changes when your code works. This practice ensures that you are committing a stable version of the code.
  • End of Work Session: It’s good to commit at the end of a work session. This creates a restore point to which you can return.
  • Before New Tasks: Before switching to a new branch or starting a new task, commit your current changes to keep your work organized.

Avoid committing half-done work that could break functionality. Use Git’s “stash” feature or work on a separate branch if you need to switch contexts temporarily.
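
As an illustration of the stash feature, the sketch below shelves an unfinished edit and restores it later. The file name, its contents, and the stash message are hypothetical.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

echo "mean_value <- mean(c(1, 2, 3))" > analysis.R   # hypothetical file
git add analysis.R
git commit -q -m "Add mean calculation"

# A half-done edit that is not ready to commit.
echo "median_value <- " >> analysis.R

git stash push -q -m "WIP: median calculation"   # shelve the unfinished edit
git status --short                               # prints nothing: the tree is clean again
# ...switch branches or handle the other task here...
git stash pop -q                                 # bring the unfinished edit back
git status --short                               # " M analysis.R"
```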

Commit Messages

Commit messages should be clear and descriptive. They should explain what was changed and why. If you used a resource, it is good to cite it in the description of your commit. Here are some examples of good commit messages:

- git commit -m "Fixed bug causing incorrect output"

- git commit -m "Added new function to calculate average"

- git commit -m "Updated README with instructions"
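
When a one-line message is not enough to explain why a change was made or to cite a resource, a second `-m` flag adds a longer body. This is a small sketch; the file, the wording, and the citation are placeholders.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

echo 'average <- function(x) sum(x) / length(x)' > stats.R   # hypothetical file
git add stats.R

# The first -m is the short subject line; the second -m becomes the body.
git commit -q -m "Add function to calculate average" \
              -m "Needed for the summary table. Adapted from a course example (placeholder citation)."

git log -1 --format=%B                    # print the full message of the last commit
```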

When to Pull

Regularly pulling changes from the remote repository keeps your local copy synchronized with your own work in the cloud and with the work of your teammates. Here’s when to pull:

  • Beginning of Work: Pull the latest changes at the start of your work session to ensure you’re working with the most up-to-date code.
  • Frequent Updates: Pull frequently while working, especially before starting new tasks, to avoid large merges.
  • Before Pushing: Pull right before you push your own changes to make sure that you can resolve any conflicts on your local machine first.

Use git pull to fetch and merge changes from the remote branch to your local branch. For a more controlled update, you can use git fetch to retrieve remote changes and git merge to combine them with your branch.
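
The more controlled fetch-then-merge update might look like the sketch below. It assumes a bare repository on disk in place of GitHub and two local folders, `teammate` and `me`, as stand-ins for two collaborators; all names are placeholders.

```shell
set -e
# Placeholder identity for every commit made in this sandbox.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

base=$(mktemp -d)
git init -q "$base/teammate"
cd "$base/teammate"
echo "# Shared project" > README.md
git add README.md
git commit -q -m "Add README"
git branch -M main
git clone -q --bare "$base/teammate" "$base/hub.git"   # stand-in for GitHub
git remote add origin "$base/hub.git"
git clone -q "$base/hub.git" "$base/me"

# A teammate pushes a new commit to the remote.
echo "Teammate notes" >> README.md
git commit -q -am "Add teammate notes"
git push -q origin main

# Your side: fetch first, inspect, then merge.
cd "$base/me"
git fetch -q origin                    # download new commits; your files are unchanged
git log --oneline main..origin/main    # review what would be merged
git merge -q origin/main               # combine the fetched commits with your branch
```

Inspecting the fetched commits before merging gives you a chance to see what changed before it touches your working files.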

When to Push

Pushing changes to the remote repository should be done regularly to back up your work and keep your collaborators up to date with your progress. Here’s when to push:

  • After a Series of Commits: Once you’ve made several commits that constitute a section of work, push these changes to the remote repository.
  • At the End of your Work Session: Ensure that your daily progress is saved remotely by pushing your commits at the end of a work session.
  • Collaborative Work: If you’re working with a team, push your commits often to minimize merge conflicts and keep the remote repository current with everyone’s changes.
  • After Pulling: Always push your latest commits after pulling new changes to ensure smooth integration of the work.

If working on a shared branch, be considerate about pushing only stable and tested changes to avoid disrupting your teammates’ work.

Activity

  1. Edit your README and save your changes
  2. Commit your changes with a descriptive message
  3. Pull and push your changes
  4. Edit your README online
  5. Pull your changes to your local device
  6. Create a new file called test.R and add it to your .gitignore file
  7. Now what happens when you try to commit your changes?
  8. Delete your test.R file from your directory and remove its entry from your .gitignore file
  9. Edit your README
  10. Commit, but do not push
  11. Edit your README again
  12. Amend your previous commit. What happens?
  13. Pull and push your changes
  14. Go to GitHub.com and browse previous versions of your repo
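
The amend step uses a command not listed earlier, so here is a hedged sketch of steps 9 to 12 in a throwaway repository. The file contents and messages are placeholders; compare the two `git log` outputs yourself to answer the question in step 12.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

echo "# Lab 2" > README.md
git add README.md
git commit -q -m "Add README"

echo "First edit" >> README.md            # step 9: edit the README
git commit -q -am "Update README"         # step 10: commit, but do not push
git log --oneline                         # history before amending

echo "Second edit" >> README.md           # step 11: edit the README again
git add README.md
git commit -q --amend --no-edit           # step 12: amend the previous commit, keeping its message
git log --oneline                         # compare with the history above
```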
