{"id":6017,"date":"2024-05-10T12:33:09","date_gmt":"2024-05-10T12:33:09","guid":{"rendered":"https:\/\/ultratendencyaca-urouz8wsum.live-website.com\/?p=6017"},"modified":"2024-05-14T08:40:17","modified_gmt":"2024-05-14T08:40:17","slug":"galore-speichereffizientes-llm-training-durch-gradient-low-rank-projection","status":"publish","type":"post","link":"https:\/\/ultratendency.academy\/de\/2024\/05\/10\/galore-speichereffizientes-llm-training-durch-gradient-low-rank-projection\/","title":{"rendered":"GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection"},"content":{"rendered":"<p><div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-1 fusion-flex-container has-pattern-background has-mask-background nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-margin-bottom:3%;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap\" style=\"max-width:1216.8px;margin-left: calc(-4% \/ 2 );margin-right: calc(-4% \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-0 fusion_builder_column_1_1 1_1 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:0px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;\"><div class=\"fusion-column-wrapper fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-1\"><p>Training large language models (LLMs) requires substantial amounts of memory and compute. 
For example, pre-training a LLaMA 7B model from scratch with a batch size of one requires at least 58 GB of memory. One method that can ease this memory pressure is Low-Rank Adaptation (LoRA). This approach adds trainable low-rank matrices to each layer, which reduces the number of trainable parameters. However, it can restrict the parameter search to low-rank subspaces, alter the training dynamics, and may even require a full-rank warm start, which can lead to worse performance than training with full-rank weights.<\/p>\n<\/div><\/div><\/div><\/div><\/div><div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-2 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap\" style=\"max-width:1216.8px;margin-left: calc(-4% \/ 2 );margin-right: calc(-4% \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-1 fusion_builder_column_1_1 1_1 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:0px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;\"><div class=\"fusion-column-wrapper fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-2\"><h2>GaLore: Gradient Low-Rank Projection<\/h2>\n<p>Gradient Low-Rank Projection (GaLore) is a 
training strategy that allows full-parameter learning while being more memory-efficient than common low-rank adaptation methods such as LoRA. GaLore reduces memory usage in the optimizer states by up to 65.5% while maintaining both efficiency and performance for pre-training on the LLaMA 1B and 7B architectures and for fine-tuning RoBERTa on the GLUE tasks. Notably, 8-bit GaLore reduces optimizer memory by up to 82.5% and total training memory by 63.3% compared to a BF16 baseline. In particular, it makes it feasible for the first time to pre-train a 7B model on consumer GPUs such as the NVIDIA RTX 4090 without model parallelism, checkpointing, or offloading strategies.<\/p>\n<\/div><div class=\"fusion-image-element \" style=\"--awb-margin-bottom:1%;--awb-caption-title-font-family:var(--body_typography-font-family);--awb-caption-title-font-weight:var(--body_typography-font-weight);--awb-caption-title-font-style:var(--body_typography-font-style);--awb-caption-title-size:var(--body_typography-font-size);--awb-caption-title-transform:var(--body_typography-text-transform);--awb-caption-title-line-height:var(--body_typography-line-height);--awb-caption-title-letter-spacing:var(--body_typography-letter-spacing);\"><span class=\" fusion-imageframe imageframe-none imageframe-1 hover-type-none\"><img decoding=\"async\" width=\"939\" height=\"390\" title=\"_5aa65a819e5c59267a503c71-GaLore Memory-Efficient LLM Training by Gradient Low-Rank Projection-150424-155539 (1)\" src=\"https:\/\/ultratendency.academy\/wp-content\/uploads\/2024\/05\/5aa65a819e5c59267a503c71-GaLore-Memory-Efficient-LLM-Training-by-Gradient-Low-Rank-Projection-150424-155539-1.jpg\" 
data-orig-src=\"\/wp-content\/uploads\/2024\/05\/5aa65a819e5c59267a503c71-GaLore-Memory-Efficient-LLM-Training-by-Gradient-Low-Rank-Projection-150424-155539-1.jpg\" alt class=\"lazyload img-responsive wp-image-5999\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27939%27%20height%3D%27390%27%20viewBox%3D%270%200%20939%20390%27%3E%3Crect%20width%3D%27939%27%20height%3D%27390%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-srcset=\"https:\/\/ultratendency.academy\/wp-content\/uploads\/2024\/05\/5aa65a819e5c59267a503c71-GaLore-Memory-Efficient-LLM-Training-by-Gradient-Low-Rank-Projection-150424-155539-1-200x83.jpg 200w, https:\/\/ultratendency.academy\/wp-content\/uploads\/2024\/05\/5aa65a819e5c59267a503c71-GaLore-Memory-Efficient-LLM-Training-by-Gradient-Low-Rank-Projection-150424-155539-1-400x166.jpg 400w, https:\/\/ultratendency.academy\/wp-content\/uploads\/2024\/05\/5aa65a819e5c59267a503c71-GaLore-Memory-Efficient-LLM-Training-by-Gradient-Low-Rank-Projection-150424-155539-1-600x249.jpg 600w, https:\/\/ultratendency.academy\/wp-content\/uploads\/2024\/05\/5aa65a819e5c59267a503c71-GaLore-Memory-Efficient-LLM-Training-by-Gradient-Low-Rank-Projection-150424-155539-1-800x332.jpg 800w, https:\/\/ultratendency.academy\/wp-content\/uploads\/2024\/05\/5aa65a819e5c59267a503c71-GaLore-Memory-Efficient-LLM-Training-by-Gradient-Low-Rank-Projection-150424-155539-1.jpg 939w\" data-sizes=\"auto\" data-orig-sizes=\"(max-width: 1024px) 100vw, (max-width: 640px) 100vw, 939px\" \/><\/span><\/div><div class=\"fusion-text fusion-text-3\"><p style=\"text-align: center;\"><em>Memory consumption of pre-training a LLaMA 7B model with a token batch size of 256 on a single device, without activation checkpointing or memory offloading.<\/em><\/p>\n<\/div><div class=\"fusion-text fusion-text-4\"><p>GaLore enables full-parameter learning to overcome the limitations of 
LoRA while being considerably more memory-efficient than common low-rank adaptation methods. The key idea is to exploit the slowly changing low-rank structure of the gradient G of the weight matrix W, rather than trying to approximate the weight matrix itself as low-rank.<\/p>\n<\/div><div class=\"fusion-image-element \" style=\"text-align:center;--awb-margin-bottom:1%;--awb-caption-title-font-family:var(--body_typography-font-family);--awb-caption-title-font-weight:var(--body_typography-font-weight);--awb-caption-title-font-style:var(--body_typography-font-style);--awb-caption-title-size:var(--body_typography-font-size);--awb-caption-title-transform:var(--body_typography-text-transform);--awb-caption-title-line-height:var(--body_typography-line-height);--awb-caption-title-letter-spacing:var(--body_typography-letter-spacing);\"><span class=\" fusion-imageframe imageframe-none imageframe-2 hover-type-none\"><img decoding=\"async\" width=\"603\" height=\"531\" title=\"_5aa65a819e5c59267a503c71-GaLore Memory-Efficient LLM Training by Gradient Low-Rank Projection-150424-155539 (2)\" src=\"https:\/\/ultratendency.academy\/wp-content\/uploads\/2024\/05\/5aa65a819e5c59267a503c71-GaLore-Memory-Efficient-LLM-Training-by-Gradient-Low-Rank-Projection-150424-155539-2.jpg\" data-orig-src=\"\/wp-content\/uploads\/2024\/05\/5aa65a819e5c59267a503c71-GaLore-Memory-Efficient-LLM-Training-by-Gradient-Low-Rank-Projection-150424-155539-2.jpg\" alt class=\"lazyload img-responsive wp-image-6003\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27603%27%20height%3D%27531%27%20viewBox%3D%270%200%20603%20531%27%3E%3Crect%20width%3D%27603%27%20height%3D%27531%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" 
data-srcset=\"https:\/\/ultratendency.academy\/wp-content\/uploads\/2024\/05\/5aa65a819e5c59267a503c71-GaLore-Memory-Efficient-LLM-Training-by-Gradient-Low-Rank-Projection-150424-155539-2-200x176.jpg 200w, https:\/\/ultratendency.academy\/wp-content\/uploads\/2024\/05\/5aa65a819e5c59267a503c71-GaLore-Memory-Efficient-LLM-Training-by-Gradient-Low-Rank-Projection-150424-155539-2-400x352.jpg 400w, https:\/\/ultratendency.academy\/wp-content\/uploads\/2024\/05\/5aa65a819e5c59267a503c71-GaLore-Memory-Efficient-LLM-Training-by-Gradient-Low-Rank-Projection-150424-155539-2-600x528.jpg 600w, https:\/\/ultratendency.academy\/wp-content\/uploads\/2024\/05\/5aa65a819e5c59267a503c71-GaLore-Memory-Efficient-LLM-Training-by-Gradient-Low-Rank-Projection-150424-155539-2.jpg 603w\" data-sizes=\"auto\" data-orig-sizes=\"(max-width: 1024px) 100vw, (max-width: 640px) 100vw, 603px\" \/><\/span><\/div><div class=\"fusion-text fusion-text-5\"><p style=\"text-align: center;\"><em>Learning through low-rank subspaces \u0394W_{T_1} and \u0394W_{T_2} with GaLore<\/em><\/p>\n<\/div><div class=\"fusion-text fusion-text-6\"><p><strong>Weight update formula: <\/strong>The weight W_t at a given training step t is updated according to the formula:<\/p>\n<\/div><div class=\"fusion-image-element \" style=\"text-align:center;--awb-margin-bottom:1%;--awb-caption-title-font-family:var(--body_typography-font-family);--awb-caption-title-font-weight:var(--body_typography-font-weight);--awb-caption-title-font-style:var(--body_typography-font-style);--awb-caption-title-size:var(--body_typography-font-size);--awb-caption-title-transform:var(--body_typography-text-transform);--awb-caption-title-line-height:var(--body_typography-line-height);--awb-caption-title-letter-spacing:var(--body_typography-letter-spacing);\"><span class=\" fusion-imageframe imageframe-none imageframe-3 hover-type-none\"><img decoding=\"async\" width=\"350\" height=\"32\" 
title=\"weightupdateformula\" src=\"https:\/\/ultratendency.academy\/wp-content\/uploads\/2024\/05\/weightupdateformula.png\" data-orig-src=\"\/wp-content\/uploads\/2024\/05\/weightupdateformula.png\" alt class=\"lazyload img-responsive wp-image-6006\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27350%27%20height%3D%2732%27%20viewBox%3D%270%200%20350%2032%27%3E%3Crect%20width%3D%27350%27%20height%3D%2732%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-srcset=\"https:\/\/ultratendency.academy\/wp-content\/uploads\/2024\/05\/weightupdateformula-200x18.png 200w, https:\/\/ultratendency.academy\/wp-content\/uploads\/2024\/05\/weightupdateformula.png 350w\" data-sizes=\"auto\" data-orig-sizes=\"(max-width: 1024px) 100vw, (max-width: 640px) 100vw, 350px\" \/><\/span><\/div><div class=\"fusion-text fusion-text-7\"><p><strong>Subspace Switching:<\/strong> The model switches dynamically between low-rank subspaces during training. 
The active subspace is selected according to a schedule determined by <img decoding=\"async\" class=\"lazyload alignnone size-full wp-image-6008\" src=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27174%27%20height%3D%2735%27%20viewBox%3D%270%200%20174%2035%27%3E%3Crect%20width%3D%27174%27%20height%3D%2735%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-orig-src=\"\/wp-content\/uploads\/2024\/05\/subspaceswitchting.png\" alt=\"\" width=\"174\" height=\"35\">, where T_i denotes the number of updates within the i-th subspace.<\/p>\n<p>The trajectory of <img decoding=\"async\" class=\"lazyload alignnone size-full wp-image-6011\" src=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%2720%27%20height%3D%2725%27%20viewBox%3D%270%200%2020%2025%27%3E%3Crect%20width%3D%2720%27%20height%3D%2725%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-orig-src=\"\/wp-content\/uploads\/2024\/05\/Gt.png\" alt=\"\" width=\"20\" height=\"25\"> through multiple low-rank subspaces is shown in the figure above. Allowing the model to traverse multiple low-rank subspaces is crucial for successful pre-training of LLMs. The technique targets the training of large neural networks: by moving through different subspaces, the model can explore the weight space more efficiently, which may lead to better generalization and performance. 
The method accounts for the complexity of the weight space of LLMs and uses mathematical tools such as singular value decomposition (SVD) to make training more effective and efficient.<\/p>\n<\/div><div class=\"fusion-image-element \" style=\"text-align:center;--awb-margin-bottom:3%;--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\" fusion-imageframe imageframe-none imageframe-4 hover-type-none\"><img decoding=\"async\" width=\"744\" height=\"254\" title=\"_5aa65a819e5c59267a503c71-GaLore Memory-Efficient LLM Training by Gradient Low-Rank Projection-150424-155539 (4)\" src=\"https:\/\/ultratendency.academy\/wp-content\/uploads\/2024\/05\/5aa65a819e5c59267a503c71-GaLore-Memory-Efficient-LLM-Training-by-Gradient-Low-Rank-Projection-150424-155539-4.jpg\" data-orig-src=\"\/wp-content\/uploads\/2024\/05\/5aa65a819e5c59267a503c71-GaLore-Memory-Efficient-LLM-Training-by-Gradient-Low-Rank-Projection-150424-155539-4.jpg\" alt class=\"lazyload img-responsive wp-image-6014\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27744%27%20height%3D%27254%27%20viewBox%3D%270%200%20744%20254%27%3E%3Crect%20width%3D%27744%27%20height%3D%27254%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-srcset=\"https:\/\/ultratendency.academy\/wp-content\/uploads\/2024\/05\/5aa65a819e5c59267a503c71-GaLore-Memory-Efficient-LLM-Training-by-Gradient-Low-Rank-Projection-150424-155539-4-200x68.jpg 200w, 
https:\/\/ultratendency.academy\/wp-content\/uploads\/2024\/05\/5aa65a819e5c59267a503c71-GaLore-Memory-Efficient-LLM-Training-by-Gradient-Low-Rank-Projection-150424-155539-4-400x137.jpg 400w, https:\/\/ultratendency.academy\/wp-content\/uploads\/2024\/05\/5aa65a819e5c59267a503c71-GaLore-Memory-Efficient-LLM-Training-by-Gradient-Low-Rank-Projection-150424-155539-4-600x205.jpg 600w, https:\/\/ultratendency.academy\/wp-content\/uploads\/2024\/05\/5aa65a819e5c59267a503c71-GaLore-Memory-Efficient-LLM-Training-by-Gradient-Low-Rank-Projection-150424-155539-4.jpg 744w\" data-sizes=\"auto\" data-orig-sizes=\"(max-width: 1024px) 100vw, (max-width: 640px) 100vw, 744px\" \/><\/span><\/div><div class=\"fusion-text fusion-text-8\"><p style=\"text-align: center;\"><em>Comparison with low-rank algorithms on pre-training LLaMA models of various sizes on the C4 dataset. <\/em><em>Validation perplexity is reported, along with a memory estimate of the total of parameters and optimizer states based on the BF16 format.<\/em><\/p>\n<\/div><div class=\"fusion-text fusion-text-9\"><p>This table shows GaLore's memory efficiency based on experimental results. 
The experimental setup is as follows:<\/p>\n<ul>\n<li>For GaLore, the subspace change frequency T is set to 200 and the scale factor \u03b1 to 0.25 for all model sizes listed in the table.<\/li>\n<li>For each model size, the same rank r is chosen for all low-rank methods, and these methods are applied to all multi-head attention layers and feed-forward layers in the models.<\/li>\n<li>Training uses the Adam optimizer with its default hyperparameters.<\/li>\n<li>Memory usage is estimated based on the BF16 format and covers the memory required for the weight parameters and the optimizer states.<\/li>\n<\/ul>\n<p>The table shows that GaLore outperforms the other low-rank methods and achieves performance comparable to full-rank training. For the 1B model size in particular, GaLore even outperforms the full-rank baseline when r = 1024 is used instead of r = 512. In addition, GaLore requires less memory than LoRA and ReLoRA for storing the model parameters and optimizer states.<\/p>\n<\/div><div class=\"fusion-text fusion-text-10\"><h2>Conclusion<\/h2>\n<p>GaLore significantly reduces memory usage during pre-training and fine-tuning of large language models (LLMs) while maintaining performance. This suggests a reduced dependence on large compute systems and points to potential for substantial cost savings. 
However, the paper acknowledges that GaLore still faces open problems: applying it to the training of other types of models such as vision transformers and diffusion models, further improving memory efficiency through quantization and special parameterization, and exploring the potential for elastic data-distributed training on consumer-grade hardware.<br \/>\nDespite these open problems, GaLore makes it easier to train LLMs on consumer hardware and thus enables broader participation. This improved accessibility could accelerate progress in LLM research. With stronger community involvement, GaLore will hopefully overcome its current challenges and develop into a valuable tool for the LLM community.<\/p>\n<p>Reference:<br \/>\n<a href=\"https:\/\/arxiv.org\/pdf\/2403.03507\" target=\"_blank\" rel=\"noopener\">GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection (arxiv.org)<\/a><\/p>\n<\/div><\/div><\/div><\/div><\/div><div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-3 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap\" style=\"max-width:1216.8px;margin-left: calc(-4% \/ 2 );margin-right: calc(-4% \/ 2 
);\"><\/div><\/div><\/p>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":3,"featured_media":6040,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[51],"tags":[],"class_list":["post-6017","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-de"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.6 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>GaLore: Speichereffizientes LLM-Training durch Gradient Low-Rank Projection - Ultra Tendency Academy<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/ultratendency.academy\/de\/2024\/05\/10\/galore-speichereffizientes-llm-training-durch-gradient-low-rank-projection\/\" \/>\n<meta property=\"og:locale\" content=\"de_DE\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"GaLore: Speichereffizientes LLM-Training durch Gradient Low-Rank Projection - Ultra Tendency Academy\" \/>\n<meta property=\"og:url\" content=\"https:\/\/ultratendency.academy\/de\/2024\/05\/10\/galore-speichereffizientes-llm-training-durch-gradient-low-rank-projection\/\" \/>\n<meta property=\"og:site_name\" content=\"Ultra Tendency Academy\" \/>\n<meta property=\"article:published_time\" content=\"2024-05-10T12:33:09+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-05-14T08:40:17+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/ultratendency.academy\/wp-content\/uploads\/2024\/05\/5.png\" \/>\n\t<meta property=\"og:image:width\" content=\"750\" \/>\n\t<meta property=\"og:image:height\" content=\"500\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Sally Bo Hatter\" \/>\n<meta name=\"twitter:card\" 
content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Verfasst von\" \/>\n\t<meta name=\"twitter:data1\" content=\"Sally Bo Hatter\" \/>\n\t<meta name=\"twitter:label2\" content=\"Gesch\u00e4tzte Lesezeit\" \/>\n\t<meta name=\"twitter:data2\" content=\"18\u00a0Minuten\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/ultratendency.academy\\\/de\\\/2024\\\/05\\\/10\\\/galore-speichereffizientes-llm-training-durch-gradient-low-rank-projection\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/ultratendency.academy\\\/de\\\/2024\\\/05\\\/10\\\/galore-speichereffizientes-llm-training-durch-gradient-low-rank-projection\\\/\"},\"author\":{\"name\":\"Sally Bo Hatter\",\"@id\":\"https:\\\/\\\/ultratendency.academy\\\/de\\\/#\\\/schema\\\/person\\\/b417acb6e3e5e24ff1b0c5941e419ea9\"},\"headline\":\"GaLore: Speichereffizientes LLM-Training durch Gradient Low-Rank 
Projection\",\"datePublished\":\"2024-05-10T12:33:09+00:00\",\"dateModified\":\"2024-05-14T08:40:17+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/ultratendency.academy\\\/de\\\/2024\\\/05\\\/10\\\/galore-speichereffizientes-llm-training-durch-gradient-low-rank-projection\\\/\"},\"wordCount\":3667,\"commentCount\":0,\"image\":{\"@id\":\"https:\\\/\\\/ultratendency.academy\\\/de\\\/2024\\\/05\\\/10\\\/galore-speichereffizientes-llm-training-durch-gradient-low-rank-projection\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/ultratendency.academy\\\/wp-content\\\/uploads\\\/2024\\\/05\\\/5.png\",\"articleSection\":[\"AI\"],\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/ultratendency.academy\\\/de\\\/2024\\\/05\\\/10\\\/galore-speichereffizientes-llm-training-durch-gradient-low-rank-projection\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/ultratendency.academy\\\/de\\\/2024\\\/05\\\/10\\\/galore-speichereffizientes-llm-training-durch-gradient-low-rank-projection\\\/\",\"url\":\"https:\\\/\\\/ultratendency.academy\\\/de\\\/2024\\\/05\\\/10\\\/galore-speichereffizientes-llm-training-durch-gradient-low-rank-projection\\\/\",\"name\":\"GaLore: Speichereffizientes LLM-Training durch Gradient Low-Rank Projection - Ultra Tendency 
Academy\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/ultratendency.academy\\\/de\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/ultratendency.academy\\\/de\\\/2024\\\/05\\\/10\\\/galore-speichereffizientes-llm-training-durch-gradient-low-rank-projection\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/ultratendency.academy\\\/de\\\/2024\\\/05\\\/10\\\/galore-speichereffizientes-llm-training-durch-gradient-low-rank-projection\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/ultratendency.academy\\\/wp-content\\\/uploads\\\/2024\\\/05\\\/5.png\",\"datePublished\":\"2024-05-10T12:33:09+00:00\",\"dateModified\":\"2024-05-14T08:40:17+00:00\",\"author\":{\"@id\":\"https:\\\/\\\/ultratendency.academy\\\/de\\\/#\\\/schema\\\/person\\\/b417acb6e3e5e24ff1b0c5941e419ea9\"},\"breadcrumb\":{\"@id\":\"https:\\\/\\\/ultratendency.academy\\\/de\\\/2024\\\/05\\\/10\\\/galore-speichereffizientes-llm-training-durch-gradient-low-rank-projection\\\/#breadcrumb\"},\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/ultratendency.academy\\\/de\\\/2024\\\/05\\\/10\\\/galore-speichereffizientes-llm-training-durch-gradient-low-rank-projection\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/ultratendency.academy\\\/de\\\/2024\\\/05\\\/10\\\/galore-speichereffizientes-llm-training-durch-gradient-low-rank-projection\\\/#primaryimage\",\"url\":\"https:\\\/\\\/ultratendency.academy\\\/wp-content\\\/uploads\\\/2024\\\/05\\\/5.png\",\"contentUrl\":\"https:\\\/\\\/ultratendency.academy\\\/wp-content\\\/uploads\\\/2024\\\/05\\\/5.png\",\"width\":750,\"height\":500},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/ultratendency.academy\\\/de\\\/2024\\\/05\\\/10\\\/galore-speichereffizientes-llm-training-durch-gradient-low-rank-projection\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Startseite\",\"item\":\"https:\\\/\\\/ultratendency.academy\\\/de\\\/\"},{
\"@type\":\"ListItem\",\"position\":2,\"name\":\"GaLore: Speichereffizientes LLM-Training durch Gradient Low-Rank Projection\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/ultratendency.academy\\\/de\\\/#website\",\"url\":\"https:\\\/\\\/ultratendency.academy\\\/de\\\/\",\"name\":\"Ultra Tendency Academy\",\"description\":\"News &amp; Expertentipps aus der IT-Branche\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/ultratendency.academy\\\/de\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"de\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/ultratendency.academy\\\/de\\\/#\\\/schema\\\/person\\\/b417acb6e3e5e24ff1b0c5941e419ea9\",\"name\":\"Sally Bo Hatter\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/6af8a92f6ae6021d8e0786d04c66cacfb1c012d43877d0715f99e0fb5a379d7a?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/6af8a92f6ae6021d8e0786d04c66cacfb1c012d43877d0715f99e0fb5a379d7a?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/6af8a92f6ae6021d8e0786d04c66cacfb1c012d43877d0715f99e0fb5a379d7a?s=96&d=mm&r=g\",\"caption\":\"Sally Bo Hatter\"},\"url\":\"https:\\\/\\\/ultratendency.academy\\\/de\\\/author\\\/sallybohattar\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"GaLore: Speichereffizientes LLM-Training durch Gradient Low-Rank Projection - Ultra Tendency Academy","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/ultratendency.academy\/de\/2024\/05\/10\/galore-speichereffizientes-llm-training-durch-gradient-low-rank-projection\/","og_locale":"de_DE","og_type":"article","og_title":"GaLore: Speichereffizientes LLM-Training durch Gradient Low-Rank Projection - Ultra Tendency Academy","og_url":"https:\/\/ultratendency.academy\/de\/2024\/05\/10\/galore-speichereffizientes-llm-training-durch-gradient-low-rank-projection\/","og_site_name":"Ultra Tendency Academy","article_published_time":"2024-05-10T12:33:09+00:00","article_modified_time":"2024-05-14T08:40:17+00:00","og_image":[{"width":750,"height":500,"url":"https:\/\/ultratendency.academy\/wp-content\/uploads\/2024\/05\/5.png","type":"image\/png"}],"author":"Sally Bo Hatter","twitter_card":"summary_large_image","twitter_misc":{"Verfasst von":"Sally Bo Hatter","Gesch\u00e4tzte Lesezeit":"18\u00a0Minuten"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/ultratendency.academy\/de\/2024\/05\/10\/galore-speichereffizientes-llm-training-durch-gradient-low-rank-projection\/#article","isPartOf":{"@id":"https:\/\/ultratendency.academy\/de\/2024\/05\/10\/galore-speichereffizientes-llm-training-durch-gradient-low-rank-projection\/"},"author":{"name":"Sally Bo Hatter","@id":"https:\/\/ultratendency.academy\/de\/#\/schema\/person\/b417acb6e3e5e24ff1b0c5941e419ea9"},"headline":"GaLore: Speichereffizientes LLM-Training durch Gradient Low-Rank 
Projection","datePublished":"2024-05-10T12:33:09+00:00","dateModified":"2024-05-14T08:40:17+00:00","mainEntityOfPage":{"@id":"https:\/\/ultratendency.academy\/de\/2024\/05\/10\/galore-speichereffizientes-llm-training-durch-gradient-low-rank-projection\/"},"wordCount":3667,"commentCount":0,"image":{"@id":"https:\/\/ultratendency.academy\/de\/2024\/05\/10\/galore-speichereffizientes-llm-training-durch-gradient-low-rank-projection\/#primaryimage"},"thumbnailUrl":"https:\/\/ultratendency.academy\/wp-content\/uploads\/2024\/05\/5.png","articleSection":["AI"],"inLanguage":"de","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/ultratendency.academy\/de\/2024\/05\/10\/galore-speichereffizientes-llm-training-durch-gradient-low-rank-projection\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/ultratendency.academy\/de\/2024\/05\/10\/galore-speichereffizientes-llm-training-durch-gradient-low-rank-projection\/","url":"https:\/\/ultratendency.academy\/de\/2024\/05\/10\/galore-speichereffizientes-llm-training-durch-gradient-low-rank-projection\/","name":"GaLore: Speichereffizientes LLM-Training durch Gradient Low-Rank Projection - Ultra Tendency 
Academy","isPartOf":{"@id":"https:\/\/ultratendency.academy\/de\/#website"},"primaryImageOfPage":{"@id":"https:\/\/ultratendency.academy\/de\/2024\/05\/10\/galore-speichereffizientes-llm-training-durch-gradient-low-rank-projection\/#primaryimage"},"image":{"@id":"https:\/\/ultratendency.academy\/de\/2024\/05\/10\/galore-speichereffizientes-llm-training-durch-gradient-low-rank-projection\/#primaryimage"},"thumbnailUrl":"https:\/\/ultratendency.academy\/wp-content\/uploads\/2024\/05\/5.png","datePublished":"2024-05-10T12:33:09+00:00","dateModified":"2024-05-14T08:40:17+00:00","author":{"@id":"https:\/\/ultratendency.academy\/de\/#\/schema\/person\/b417acb6e3e5e24ff1b0c5941e419ea9"},"breadcrumb":{"@id":"https:\/\/ultratendency.academy\/de\/2024\/05\/10\/galore-speichereffizientes-llm-training-durch-gradient-low-rank-projection\/#breadcrumb"},"inLanguage":"de","potentialAction":[{"@type":"ReadAction","target":["https:\/\/ultratendency.academy\/de\/2024\/05\/10\/galore-speichereffizientes-llm-training-durch-gradient-low-rank-projection\/"]}]},{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/ultratendency.academy\/de\/2024\/05\/10\/galore-speichereffizientes-llm-training-durch-gradient-low-rank-projection\/#primaryimage","url":"https:\/\/ultratendency.academy\/wp-content\/uploads\/2024\/05\/5.png","contentUrl":"https:\/\/ultratendency.academy\/wp-content\/uploads\/2024\/05\/5.png","width":750,"height":500},{"@type":"BreadcrumbList","@id":"https:\/\/ultratendency.academy\/de\/2024\/05\/10\/galore-speichereffizientes-llm-training-durch-gradient-low-rank-projection\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Startseite","item":"https:\/\/ultratendency.academy\/de\/"},{"@type":"ListItem","position":2,"name":"GaLore: Speichereffizientes LLM-Training durch Gradient Low-Rank Projection"}]},{"@type":"WebSite","@id":"https:\/\/ultratendency.academy\/de\/#website","url":"https:\/\/ultratendency.academy\/de\/","name":"Ultra Tendency 
Academy","description":"News &amp; Expertentipps aus der IT-Branche","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/ultratendency.academy\/de\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"de"},{"@type":"Person","@id":"https:\/\/ultratendency.academy\/de\/#\/schema\/person\/b417acb6e3e5e24ff1b0c5941e419ea9","name":"Sally Bo Hatter","image":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/secure.gravatar.com\/avatar\/6af8a92f6ae6021d8e0786d04c66cacfb1c012d43877d0715f99e0fb5a379d7a?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/6af8a92f6ae6021d8e0786d04c66cacfb1c012d43877d0715f99e0fb5a379d7a?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/6af8a92f6ae6021d8e0786d04c66cacfb1c012d43877d0715f99e0fb5a379d7a?s=96&d=mm&r=g","caption":"Sally Bo Hatter"},"url":"https:\/\/ultratendency.academy\/de\/author\/sallybohattar\/"}]}},"_links":{"self":[{"href":"https:\/\/ultratendency.academy\/de\/wp-json\/wp\/v2\/posts\/6017","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ultratendency.academy\/de\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ultratendency.academy\/de\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ultratendency.academy\/de\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/ultratendency.academy\/de\/wp-json\/wp\/v2\/comments?post=6017"}],"version-history":[{"count":0,"href":"https:\/\/ultratendency.academy\/de\/wp-json\/wp\/v2\/posts\/6017\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ultratendency.academy\/de\/wp-json\/wp\/v2\/media\/6040"}],"wp:attachment":[{"href":"https:\/\/ultratendency.academy\/de\/wp-json\/wp\/v2\/media?parent=6017"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ultratendency.academy\/de\/wp-json\/wp\/v2\/categori
es?post=6017"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ultratendency.academy\/de\/wp-json\/wp\/v2\/tags?post=6017"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}