Preston's request to delete/purge the sql files

Description

Preston wrote: Somewhat related: these SQL files are huge and causing issues. Can they be removed and purged from the git history? It's not apparent from the names which ones are current and necessary, and the errors are blockers. If they are actually getting baked into the build images, they need to be externalized. GitHub won't even accept the source code due to the hard errors.

remote: warning: File reference-api-mysql/src/main/resources/db/mysql/hspc_5_stu3_default_dataset.sql is 70.20 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB
remote: warning: File 360-migration/370-migration/temp-dump.sql is 55.76 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB
remote: warning: File reference-api-webapp-base/src/main/resources/db/hspc_4_dstu2_default_dataset.sql is 54.93 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB
remote: warning: File reference-api-webapp-base/src/main/resources/db/hspc_4_initial_dataset.sql is 60.97 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB
remote: warning: File reference-api-webapp-base/src/main/resources/db/hspc_4_initial_dataset.sql is 52.19 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB
remote: warning: File reference-api-mysql/src/main/resources/db/mysql/hspc_8_dstu2_default_dataset.sql is 76.07 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB
remote: warning: File reference-api-webapp/src/main/resources/db/mysql/hspc_8_dstu2_default_dataset.sql is 76.07 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB
remote: warning: File reference-api-mysql/src/main/resources/db/mysql/hspc_8_dstu2_default_dataset.sql is 76.06 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB
remote: warning: File reference-api-mysql/src/main/resources/db/mysql/hspc_5_dstu2_default_dataset.sql is 76.07 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB
remote: warning: File reference-api-mysql/src/main/resources/db/mysql/hspc_5_dstu2_default_dataset.sql is 90.17 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB
remote: warning: File reference-api-mysql/src/main/resources/db/mysql/hspc_5_stu3_default_dataset.sql is 55.75 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB
remote: warning: File reference-api-mysql/src/main/resources/db/mysql/hspc_8_stu3_default_dataset.sql is 55.35 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB
remote: warning: File reference-api-webapp/src/main/resources/db/mysql/hspc_8_stu3_default_dataset.sql is 55.35 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB
remote: warning: File reference-api-webapp/src/main/resources/db/mysql/hspc_8_r4_default_dataset.sql is 54.73 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB
remote: warning: File 360-migration/370-migration/temp-dump.sql is 60.77 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB
remote: warning: File reference-api-mysql/src/main/resources/db/mysql/hspc_8_stu3_default_dataset.sql is 54.58 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB
remote: warning: File reference-api-mysql/src/main/resources/db/mysql/hspc_8_r4_default_dataset.sql is 54.73 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB
remote: warning: File reference-api-mysql/src/main/resources/db/mysql/hspc_8_r4_default_dataset.sql is 54.61 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB
remote: warning: File reference-api-mysql/src/main/resources/db/mysql/hspc_8_r4_default_dataset.sql is 54.58 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB
remote: warning: File 360-migration/temp-dump.sql is 55.75 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB
remote: warning: File 360-migration/temp-dump.sql is 55.35 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB
remote: warning: File reference-api-mysql/src/main/resources/db/mysql/hspc_8_r4_default_dataset.sql is 54.72 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB
remote: warning: File reference-api-mysql/src/main/resources/db/mysql/hspc_5_r4_default_dataset.sql is 54.72 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB
remote: warning: File reference-api-mysql/src/main/resources/db/mysql/hspc_5_stu3_default_dataset.sql is 55.33 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB
remote: warning: File reference-api-mysql/src/main/resources/db/mysql/hspc_5_stu3_default_dataset.sql is 52.89 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB
remote: warning: File reference-api-webapp-base/src/main/resources/db/hspc_5_stu3_default_dataset.sql is 70.12 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB
remote: warning: File reference-api-webapp-base/src/main/resources/db/hspc_5_stu3_default_dataset.sql is 70.11 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB
remote: warning: File reference-api-webapp-base/src/main/resources/db/hspc_5_stu3_default_dataset.sql is 53.62 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB
remote: warning: File reference-api-webapp-base/src/main/resources/db/hspc_4_stu3_default_dataset.sql is 60.97 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB
remote: warning: File reference-api-webapp-base/src/main/resources/db/hspc_4_initial_dataset.sql is 60.94 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB
remote: error: GH001: Large files detected. You may want to try Git Large File Storage - https://git-lfs.github.com.
remote: error: Trace: 67c535da692156a74a1829a867c711b13b3bee1305563292a930822e49bbbb16
remote: error: See http://git.io/iEPt8g for more information.
remote: error: File reference-api-webapp/src/main/resources/db/migration/migration_360/370-migration/output.log is 344.60 MB; this exceeds GitHub's file size limit of 100.00 MB
remote: error: File reference-api-mysql/src/main/resources/db/mysql/hspc_5_dstu2_default_dataset.sql is 101.77 MB; this exceeds GitHub's file size limit of 100.00 MB
remote: error: File reference-api-webapp-base/src/main/resources/db/hspc_5_dstu2_default_dataset.sql is 101.73 MB; this exceeds GitHub's file size limit of 100.00 MB
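
Many paths appear several times in the output above because each historical version of a file is reported separately. As a rough aid for triage, the helper below condenses such output to one line per path with the largest size reported. It is only a sketch: `summarize_large_files` is a name made up here, and it assumes paths contain no spaces.

```shell
# summarize_large_files: read `git push` output on stdin and print, for each
# path GitHub reported, the largest size seen (in MB), biggest first.
# Hypothetical helper; assumes paths contain no whitespace.
summarize_large_files() {
  awk '
    /remote: (warning|error): File / {
      # Fields: remote: <level>: File <path> is <size> MB; ...
      path = $4
      size = $6 + 0
      if (!(path in max) || size > max[path]) max[path] = size
    }
    END { for (p in max) printf "%.2f\t%s\n", max[p], p }
  ' | sort -rn
}
```

Git prints the `remote:` lines on stderr, so a typical call would be `git push upstream 2>&1 | summarize_large_files`.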

Activity

Shilpy Sharma
November 16, 2020, 5:39 PM

Replied to Preston: These are the datasets of patients, providers, and patient-related conditions that we provide to our users as the default dataset for the DSTU2, STU3 and R4 sandboxes. We can look into whether these can be purged.

The hspc_4_ and hspc_5_ datasets are old and redundant, so they are okay to delete.
The hspc_8_ datasets are the current ones that we provide. These are also over the limit, and it would be hard to reduce their size, so we would need to externalize them.

Shilpy Sharma
November 16, 2020, 5:39 PM

Need to confirm with Jacob whether these files are okay to purge from the git history.

Nikolai Schwertner
November 19, 2020, 5:54 PM

Decision made: Compress the files and move them into a separate Git repo. Then purge them from the commit history of the current repo (rewriting the history) while otherwise preserving the commit sequence and content.
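
The compression step could look roughly like the sketch below. It is an illustration only: `compress_datasets` is a hypothetical helper, the destination directory stands in for a checkout of the separate repo (which does not exist yet), and gzip is assumed as the compression format.

```shell
# compress_datasets SRC_DIR DEST_DIR
# Gzip every hspc_8_*_dataset.sql found under SRC_DIR into DEST_DIR, leaving
# the originals untouched. DEST_DIR is assumed to be a checkout of the
# separate repo. Files with the same name overwrite each other in DEST_DIR,
# so run it once per module directory.
compress_datasets() {
  src=$1
  dest=$2
  mkdir -p "$dest" || return 1
  find "$src" -type f -name 'hspc_8_*_dataset.sql' | while IFS= read -r f; do
    gzip -9 -c "$f" > "$dest/$(basename "$f").gz" || exit 1
  done
}
```

For example, `compress_datasets reference-api-mysql/src/main/resources/db/mysql ../datasets-repo/mysql`, where `../datasets-repo` is a placeholder path. Whether the compressed files end up below GitHub's limits would still need to be checked.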

Gopal Menon
November 19, 2020, 6:25 PM
Edited

Shilpy and I made a quick attempt yesterday to remove files larger than 50 MB from the git history, using a tool called BFG Repo-Cleaner. However, this wiped out the entire history (in a local copy). In the IPM today, Preston said that he had used the same tool earlier and it did not wipe out the history. We need to do a little more research on this.

https://rtyley.github.io/bfg-repo-cleaner/
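
Before rerunning any cleanup, it may help to see which blobs in the history are actually over the threshold. The sketch below uses only stock git commands; `list_large_blobs` is a name made up for this note, and it assumes paths without embedded newlines.

```shell
# list_large_blobs MIN_BYTES
# Print "<size-in-bytes> <path>" for every blob reachable from any ref that
# is at least MIN_BYTES, largest first. Run it inside the repository.
list_large_blobs() {
  min=$1
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    awk -v min="$min" '$1 == "blob" && $2 >= min { sub(/^blob /, ""); print }' |
    sort -rn
}
```

For a 50 MB cutoff, something like `list_large_blobs $((50 * 1024 * 1024))` should do. Running it before and after a cleanup on a local copy also gives a quick check of what was removed.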

Gopal Menon
November 25, 2020, 11:47 PM
Edited

Shilpy and I worked on this and were able to upload the repo to GitHub. We first needed to remove large files from git history. We also needed to remove a log file that was in the current code base.

Add the GitHub repo as a remote repo

git remote add upstream <GitHub repo URL>

Remove blobs larger than 100 MB

java -jar /Users/gopalmenon/Downloads/bfg-1.13.0.jar --strip-blobs-bigger-than 100M reference-api.git 

Remove specific file from history

java -jar /Users/gopalmenon/Downloads/bfg-1.13.0.jar --delete-files output.log 

Then the file reference-api-webapp/src/main/resources/db/migration/migration_360/370-migration/output.log was removed and the change committed.

git push upstream

git push --tags upstream

We need to redo this because the master branch on Bitbucket still contains the 300 MB+ output.log file; so far it has been removed only from the develop branch. We need to merge the develop branch into master, make a release, and then repeat the steps above to copy the repo to GitHub, after first removing the existing copy from GitHub.
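
For the rerun, the steps above could be collected into one script along these lines. This is a sketch under assumptions, not what we actually ran: `run` and `purge_and_push` are hypothetical helpers, both URLs are placeholders, and the mirror clone plus the reflog/gc step follow the usage described on the BFG page rather than our earlier session. As far as I understand the BFG documentation, it leaves the contents of the latest commit alone by default, which is why output.log has to be deleted with a normal commit first.

```shell
# Print each command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# purge_and_push BFG_JAR SOURCE_URL TARGET_URL
# SOURCE_URL (Bitbucket) and TARGET_URL (GitHub) are placeholders.
purge_and_push() {
  bfg_jar=$1
  source_url=$2
  target_url=$3
  mirror=reference-api.git

  # Work on a fresh bare mirror so every branch and tag gets rewritten.
  run git clone --mirror "$source_url" "$mirror" || return 1
  run java -jar "$bfg_jar" --strip-blobs-bigger-than 100M "$mirror" || return 1
  run java -jar "$bfg_jar" --delete-files output.log "$mirror" || return 1
  # BFG rewrites the commits; expiring reflogs and running gc drops the blobs.
  run git -C "$mirror" reflog expire --expire=now --all || return 1
  run git -C "$mirror" gc --prune=now --aggressive || return 1
  run git -C "$mirror" remote add upstream "$target_url" || return 1
  run git -C "$mirror" push upstream --all || return 1
  run git -C "$mirror" push upstream --tags
}
```

Setting `DRY_RUN=1` before calling `purge_and_push` prints the commands without touching anything, which is a cheap way to review the sequence first.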

Assignee

Shilpy Sharma

Reporter

Shilpy Sharma

Labels

None

Priority

Major