
Git Tip of the day -- Remove accidentally added files


We've all been there. You're working late, feeling productive, and you run git add . without thinking. Then you realize you just committed your .env file with production API keys, or that embarrassing debug log with customer data, or your personal notes with colorful commentary about the codebase.

The good news? This is fixable. The bad news? Simply deleting the file and making a new commit doesn't solve the problem. That sensitive data is still sitting in your Git history, waiting for someone with git log and too much curiosity.

Here's how to actually remove files from Git history, when to use different approaches, and how to avoid this mess in the future.

Understanding the problem

When you add a file to Git and commit it, that file becomes part of your repository's permanent history. Even if you delete it in a subsequent commit, the original content remains accessible through Git's history.

This is normally a feature — Git's ability to recover deleted files is incredibly useful. But when the file contains sensitive information, this becomes a security liability.
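To make that concrete, here is a minimal sketch that builds a throwaway repository, commits a fake secret, deletes it in a follow-up commit, and then reads it back out of history. The file name, the key value, and the `demo_repo` and `recovered` variable names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: a file deleted in a later commit is still readable from history.
# Everything happens in a temporary repository; the "secret" is fake.
set -e

demo_repo=$(mktemp -d)
cd "$demo_repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# Commit a fake secret, then "remove" it with a second commit.
echo 'API_KEY=not-a-real-key' > .env
git add -f .env
git commit -q -m "Add config"
git rm -q .env
git commit -q -m "Remove config"

# The file is gone from the working tree...
[ -e .env ] || echo "Working tree: .env is gone"

# ...but the previous commit still holds its full contents.
recovered=$(git show HEAD~1:.env)
echo "Recovered from history: $recovered"
```

Anyone who can clone the repository can run that same `git show`, which is why the rest of this post focuses on rewriting history rather than just deleting files.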

Why this matters more than you think

Security implications: API keys, passwords, and tokens in Git history can be discovered by anyone with repository access, including future team members or attackers who gain access.

Compliance issues: Many security frameworks require that sensitive data never be stored in version control, even temporarily.

Code scanning tools: GitHub, GitLab, and other platforms run automated scans that can detect secrets in your history and flag them.

Third-party access: Services like GitHub Copilot, code analysis tools, and CI/CD systems may have access to your full repository history.

The right approach for different scenarios

The solution depends on what you've committed, where you've pushed it, and how much history you're willing to rewrite.

Scenario 1: File just committed, not pushed yet

This is the easiest case. You can simply amend your last commit or reset to before the problematic commit.

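# If the unwanted file is in your most recent commit
git reset --soft HEAD~1 # Undo the commit but keep its changes staged
git restore --staged sensitive-file.txt # Unstage the unwanted file (Git 2.23+)
echo 'sensitive-file.txt' >> .gitignore
git add .gitignore
git commit -m "Add gitignore and proper files"

# Or if you want to amend the existing commit
git rm --cached sensitive-file.txt # Stop tracking the file but keep it on disk
echo 'sensitive-file.txt' >> .gitignore
git add .gitignore
git commit --amend --no-edit # Rewrite the last commit, keeping its message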
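Before you rely on this scenario, it's worth confirming that the commit really hasn't left your machine. The sketch below uses a hypothetical helper named `head_is_pushed` that checks whether any remote-tracking branch already contains the current commit. It assumes you've run `git fetch` recently so those branches are up to date.

```shell
# Sketch: report whether the current commit already exists on a remote-tracking branch.
# head_is_pushed is a hypothetical helper name, not a Git command.
head_is_pushed() {
    # `git branch -r --contains HEAD` lists remote-tracking branches that include HEAD.
    [ -n "$(git branch -r --contains HEAD 2>/dev/null)" ]
}

# Only report when we are actually inside a Git repository.
if git rev-parse --git-dir >/dev/null 2>&1; then
    if head_is_pushed; then
        echo "HEAD is on a remote: treat this as Scenario 2 or 3."
    else
        echo "HEAD is local only: amending or resetting is safe."
    fi
fi
```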

Scenario 2: File pushed to remote, small team

If you've pushed to a shared repository but your team is small and communicative, you can rewrite history and force-push.

⚠️ Warning: This rewrites Git history. Everyone with the repository will need to reset their local copies.

# Add file to gitignore first
echo 'sensitive-file.txt' >> .gitignore
git add .gitignore
git commit -m "Add sensitive file to gitignore"

# Remove the file from every commit reachable from the current branch (HEAD)
git filter-branch --index-filter \
"git rm -rf --cached --ignore-unmatch sensitive-file.txt" HEAD

# Force push the cleaned history
git push --force-with-lease
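Once the force push has landed, each teammate has to bring their clone in line with the rewritten branch. One way to do that is sketched below. `resync_branch` is a hypothetical helper, and it assumes the remote is named `origin`. It leaves a local backup branch behind so unpushed work isn't lost by accident.

```shell
# Sketch: bring a local branch in line with the rewritten remote branch.
# resync_branch is a hypothetical helper; it assumes the remote is called "origin".
# WARNING: `git reset --hard` discards uncommitted changes on the branch.
resync_branch() {
    local branch="$1"
    git fetch origin
    git checkout "$branch"
    # Keep the old commits reachable locally in case they hold unpushed work.
    git branch -f "backup/$branch-before-rewrite"
    git reset --hard "origin/$branch"
}
```

Usage would look like `resync_branch main`. The backup branch still contains the sensitive file, so delete it with `git branch -D` once you've recovered anything you need from it.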

Scenario 3: File in production repository

For repositories with many contributors or production deployments, consider these safer approaches:

Option A: BFG Repo-Cleaner

BFG is faster and safer than git filter-branch for large repositories:

# Install BFG (macOS with Homebrew)
brew install bfg

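# Stop tracking the file and add it to gitignore first
# (by default BFG leaves the contents of your latest commit untouched)
git rm --cached sensitive-file.txt
echo 'sensitive-file.txt' >> .gitignore
git add .gitignore
git commit -m "Add sensitive file to gitignore"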

# Clean the repository
bfg --delete-files sensitive-file.txt
git reflog expire --expire=now --all
git gc --prune=now --aggressive

# Push the cleaned repository
git push --force-with-lease

Option B: Invalidate and Rotate

Sometimes it's safer to invalidate the exposed secrets rather than rewrite history:

# Simply remove the file going forward
git rm sensitive-file.txt
echo 'sensitive-file.txt' >> .gitignore
git add .gitignore
git commit -m "Remove sensitive file and add to gitignore"

# Then immediately:
# 1. Rotate any API keys or passwords that were exposed
# 2. Update all systems using those credentials
# 3. Monitor for unauthorized access

Modern tools and better approaches

git-filter-repo is the modern replacement for git filter-branch:

# Install git-filter-repo
pip install git-filter-repo

# Remove file from entire history
git filter-repo --path sensitive-file.txt --invert-paths

# Add to gitignore
echo 'sensitive-file.txt' >> .gitignore
git add .gitignore
git commit -m "Add gitignore for sensitive files"
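One thing that tends to surprise people: git-filter-repo removes the `origin` remote after rewriting, as a guard against pushing by accident. A minimal sketch of reattaching it and publishing the rewritten history follows. `republish_history` is a hypothetical helper, and the URL you pass in is your own.

```shell
# Sketch: reattach the remote that git-filter-repo removed and publish the rewritten history.
# republish_history is a hypothetical helper; pass it your real remote URL.
republish_history() {
    local remote_url="$1"
    # Re-add origin only if it is missing.
    git remote get-url origin >/dev/null 2>&1 || git remote add origin "$remote_url"
    git push origin --force --all    # every rewritten branch
    git push origin --force --tags   # tags now point at rewritten commits too
}
```

This uses a plain `--force` because there are no remote-tracking branches left for `--force-with-lease` to compare against once the remote has been removed, so coordinate with your team before running it.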

Using GitHub's built-in tools

GitHub provides tools for removing sensitive data:

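  1. GitHub CLI approach (this deletes a branch on GitHub that still points at the bad commits; it does not rewrite history by itself, and gh fills in {owner}, {repo}, and {branch} from the current repository and branch):

    gh api repos/{owner}/{repo}/git/refs/heads/{branch} \
    --method DELETE

  2. GitHub web interface: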

    • Go to repository settings
    • Navigate to "Security & analysis"
    • Use "Secret scanning" alerts to identify issues

Automated prevention with pre-commit hooks

Set up pre-commit hooks to catch sensitive files before they're committed:

# Install pre-commit
pip install pre-commit

# Create .pre-commit-config.yaml
cat > .pre-commit-config.yaml << EOF
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
    hooks:
      - id: check-added-large-files
      - id: check-merge-conflict
      - id: detect-private-key
  - repo: https://github.com/Yelp/detect-secrets
    rev: v1.4.0
    hooks:
      - id: detect-secrets
        args: ['--baseline', '.secrets.baseline']
EOF

# Install the hooks
pre-commit install
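If you'd rather not add a dependency, a plain Git hook can cover the most obvious cases. The sketch below is not how pre-commit or detect-secrets work internally. It only checks the names of newly staged files against a short pattern list, and both that list and the helper names are assumptions to adapt.

```shell
# Sketch of a filename check that a .git/hooks/pre-commit script could run.
# The pattern list is illustrative, not exhaustive; it looks at names, not contents.
staged_sensitive_files() {
    # List files newly added to the index and keep those with suspicious names.
    git diff --cached --name-only --diff-filter=A \
        | grep -E -i '(^|/)\.env($|\.)|secret|password|\.(pem|key|p12)$' || true
}

block_sensitive_commit() {
    local hits
    hits=$(staged_sensitive_files)
    if [ -n "$hits" ]; then
        echo "Refusing to commit files that look sensitive:" >&2
        echo "$hits" >&2
        return 1
    fi
}
```

To use it, put both functions in an executable `.git/hooks/pre-commit` file and end the file with a call to `block_sensitive_commit`. A non-zero exit status from the hook aborts the commit.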

Comprehensive cleanup strategy

Here's a systematic approach to cleaning up a repository that may have multiple sensitive files:

Step 1: Audit your repository

# Search for common sensitive patterns
git log --all --full-history -- "*.env*"
git log --all --full-history -- "*secret*"
git log --all --full-history -- "*password*"
git log --all --full-history -- "*key*"

# Use git-secrets to scan history
git secrets --scan-history
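The `git log` commands above print whole commits, which gets noisy fast. As a complement, here is a small sketch that prints just the unique paths ever added on any branch whose names match a pattern. `history_paths_matching` is a hypothetical helper name, and the pattern is a case-insensitive extended regex.

```shell
# Sketch: print every path that was ever added on any branch and whose name
# matches a case-insensitive extended regex.
history_paths_matching() {
    local pattern="$1"
    # --diff-filter=A keeps only additions; --pretty=format: suppresses commit headers.
    git log --all --diff-filter=A --name-only --pretty=format: \
        | grep -v '^$' | sort -u | grep -E -i -- "$pattern" || true
}
```

For example, `history_paths_matching '\.env|secret|\.pem$'` gives you a short list of paths to feed into the cleanup step, including files that were deleted long ago.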

Step 2: Create a comprehensive gitignore

# Add comprehensive patterns to .gitignore
cat >> .gitignore << EOF
# Environment files
.env
.env.local
.env.*.local

# Secret files
*secret*
*password*
*.pem
*.p12
*.key
*.crt

# Configuration files that might contain secrets
config/database.yml
config/secrets.yml
*.conf

# IDE and editor files
.vscode/settings.json
.idea/

# OS generated files
.DS_Store
Thumbs.db
EOF
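Keep in mind that .gitignore only affects untracked files, so anything already committed stays tracked even after a matching pattern is added. The sketch below lists those leftovers and stops tracking them while leaving the copies on disk alone. The two helper names are hypothetical.

```shell
# Sketch: find files that are still tracked even though an ignore rule now matches them.
tracked_but_ignored() {
    git ls-files --cached --ignored --exclude-standard
}

# Stop tracking each of them; the files themselves stay on disk.
# Note: paths with unusual characters may be printed quoted by git ls-files.
untrack_ignored() {
    tracked_but_ignored | while IFS= read -r path; do
        git rm --cached --quiet -- "$path"
    done
}
```

After running `untrack_ignored`, commit the result. This only fixes things going forward, so the files still need to be removed from history as described in the next step.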

Step 3: Clean the repository systematically

# Using BFG for multiple files
bfg --delete-files "*.{env,key,pem,p12}"
bfg --replace-text passwords.txt # File listing the strings to replace, one per line (prefix a line with regex: for a pattern)

# Or using git-filter-repo (a single run covering both globs)
git filter-repo --path-glob '*.env' --path-glob '*.key' --invert-paths

Step 4: Verify and communicate

# Verify the cleanup worked
git log --oneline --all | head -20
git log --all --full-history -- "*.env*"

# Communicate with your team
cat > CLEANUP_NOTICE.md << EOF
# Repository Cleanup Notice

This repository has been cleaned to remove sensitive files from Git history.

## What happened
- Sensitive files were accidentally committed
- Full Git history has been rewritten
- All sensitive credentials have been rotated

## What you need to do
1. Back up any local changes
2. Delete your local repository
3. Fresh clone from origin
4. Reapply any local changes

## New security measures
- Pre-commit hooks installed
- Comprehensive .gitignore added
- Regular security scanning enabled
EOF

Advanced techniques for specific situations

Removing files from specific branches only

# Clean only the main branch
git filter-repo --path sensitive-file.txt --invert-paths --refs refs/heads/main

# Clean multiple branches (refs are separated by spaces)
git filter-repo --path sensitive-file.txt --invert-paths \
--refs refs/heads/main refs/heads/develop

Handling large repositories efficiently

# Use partial clones for large repositories
git clone --filter=blob:none <repository-url>

# Clean with BFG using specific object size limits
bfg --strip-blobs-bigger-than 50M
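Before picking a size threshold, it helps to know what the biggest objects in your history actually are. A common way to find out is sketched here. `largest_blobs` is a hypothetical helper name, and it prints the size in bytes followed by the path.

```shell
# Sketch: list the largest blobs anywhere in history as "<size-in-bytes> <path>".
largest_blobs() {
    local count="${1:-10}"
    # rev-list emits "<sha> <path>"; cat-file looks up each object's type and size.
    git rev-list --objects --all \
        | git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' \
        | sed -n 's/^blob //p' \
        | sort -rn \
        | head -n "$count"
}
```

Running `largest_blobs 20` shows the top twenty, which makes it easier to judge whether a limit like 50M would catch the files you care about without stripping ones you need.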

Preserving important history while cleaning

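# Keep a backup outside the repository before cleaning
# (git filter-repo rewrites every branch, so a backup branch would be rewritten too)
git clone --mirror . ../backup-before-cleanup.git

# Use git-filter-repo with path renaming
# (this renames the path throughout history but keeps the file's contents)
git filter-repo --path-rename 'secret.txt:secret.txt.REMOVED'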

Communication and team coordination

When you need to rewrite shared repository history, communication is crucial:

Before the cleanup

## Security Cleanup Notice

**URGENT:** We need to clean sensitive data from our Git history.

**Timeline:**
- [Date/Time]: Cleanup begins
- [Date/Time]: Force push completed
- [Date/Time]: All team members should re-clone

**Your action required:**
1. Push any important local changes before [time]
2. Fresh clone after [time]
3. Verify your development environment still works

**What's being removed:**
- [List sensitive files]
- [Any other cleanup details]

After the cleanup

## Cleanup Complete

The repository has been cleaned. Please:

1. **Delete your local copy**
2. **Fresh clone from origin**
3. **Recreate any local branches**
4. **Update any deployment scripts that reference old commit hashes**

**Security measures added:**
- Pre-commit hooks for secret detection
- Updated .gitignore
- Automated scanning enabled

Prevention strategies

The best way to handle accidentally committed secrets is to prevent them from being committed in the first place:

Environment setup

# Use direnv for environment management
echo 'eval "$(direnv hook bash)"' >> ~/.bashrc

# Create .envrc for project
cat > .envrc << EOF
export API_KEY="development-key-here"
export DATABASE_URL="postgres://localhost/myapp_dev"
EOF

# Allow the environment
direnv allow .

Git configuration

# Set up global gitignore
git config --global core.excludesFile ~/.gitignore_global

# Add common patterns to global gitignore
cat > ~/.gitignore_global << EOF
.env
.env.*
*secret*
*.key
*.pem
.DS_Store
EOF

IDE and editor setup

Configure your development environment to treat sensitive files specially. The settings below stop VS Code from watching them for changes and open them as plain text:

VS Code settings.json:

{
  "files.watcherExclude": {
    "**/.env*": true,
    "**/secret*": true
  },
  "files.associations": {
    "*.env*": "plaintext"
  }
}

Recovery and verification

After cleaning your repository, verify the cleanup was successful:

Verification commands

# Check that files are gone from history
git log --all --full-history -- "sensitive-file.txt"
git rev-list --all | xargs git grep "API_KEY" || echo "Not found - good!"

# Verify current gitignore works
touch .env
git add .env # Should fail or warn

# Check repository size reduction
git count-objects -vH

Testing your cleanup

# Clone the cleaned repository to a fresh location
git clone /path/to/cleaned/repo /tmp/test-cleanup

# Search for sensitive patterns
cd /tmp/test-cleanup
git log --all --oneline | grep -i secret
git log --all --patch | grep -i "api_key\|password\|secret"
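Grepping the full patch output works, but it can be slow on big repositories. Another option, sketched below, is Git's pickaxe search (`-S`), which lists only the commits where the number of occurrences of a string changed. `commits_touching_string` is a hypothetical helper name.

```shell
# Sketch: list commits on any branch that added or removed a given literal string.
commits_touching_string() {
    local needle="$1"
    # -S reports commits where the count of occurrences of the string changes.
    git log --all --format='%h %s' -S"$needle"
}
```

After a successful cleanup, something like `commits_touching_string 'sk_live_'` (substitute a fragment of your own leaked value) should print nothing. Any output means the value is still reachable from some branch or tag.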

The key to handling accidentally committed secrets is acting quickly and thoroughly. Don't just remove the file — eliminate it from history, rotate any exposed credentials, and put measures in place to prevent future incidents.

Remember: once something is committed to Git, assume it's potentially compromised until you've completely rewritten the history and rotated any sensitive values. Better to be paranoid and safe than sorry and breached.