Splitwise: Efficient LLM Inference Through Phase Splitting

Splitwise improves LLM inference by running prompt computation and token generation on separate machines, achieving higher throughput at lower cost and power consumption.


1. Introduction

Generative large language models (LLMs) have transformed natural language processing, but their computational demands make efficient inference a major challenge. Splitwise addresses this challenge by identifying and exploiting the distinct computational characteristics of the two main phases of LLM inference.

2. Background and Motivation

2.1 LLM Inference Phases

LLM inference consists of two distinct phases:

  • Prompt computation phase: compute-intensive; processes all input tokens in parallel
  • Token generation phase: memory-intensive; produces output tokens sequentially, one at a time
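
The two phases can be illustrated with a toy generation loop. This is only a sketch: `next_token` is a hypothetical stand-in for a model forward pass, the `trace` list is an illustrative way to record how many tokens each pass handles, and the choice to have the prompt phase emit the first output token is an assumption of this sketch.

```python
def generate(prompt_tokens, max_new_tokens, next_token):
    """Toy two-phase loop; `next_token(cache)` is a hypothetical model call."""
    if max_new_tokens < 1:
        raise ValueError("max_new_tokens must be at least 1")

    cache = list(prompt_tokens)

    # Prompt phase: a single pass covers every input token at once
    # and yields the first output token.
    first = next_token(cache)
    trace = [("prompt", len(prompt_tokens))]
    cache.append(first)
    outputs = [first]

    # Token phase: one pass per new token, each depending on the previous one.
    for _ in range(max_new_tokens - 1):
        token = next_token(cache)
        trace.append(("token", 1))
        cache.append(token)
        outputs.append(token)

    return outputs, trace


# Dummy "model" that returns the current cache length as the next token.
outputs, trace = generate([10, 11, 12], 3, lambda cache: len(cache))
```

The trace shows one wide pass followed by a series of single-token passes, which is the shape that makes the first phase parallel and the second sequential.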

2.2 Hardware Limitations

GPU Specification Comparison

H100 relative to A100: 3.43× more compute, but only 1.64× more memory bandwidth

Modern GPUs show a widening imbalance between compute throughput and memory capability, which leads to inefficiency in LLM inference.
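
As a rough worked example, the two ratios above can be combined to see how far compute has pulled ahead of memory bandwidth between the two GPU generations. The function name is illustrative, and treating each ratio as an upper bound on the speed-up of a purely compute-bound or purely memory-bound phase is a simplifying assumption.

```python
COMPUTE_GAIN = 3.43    # H100 vs. A100 compute, from the comparison above
BANDWIDTH_GAIN = 1.64  # H100 vs. A100 memory bandwidth, from the comparison above


def compute_to_bandwidth_shift(compute_gain, bandwidth_gain):
    """How much faster compute grew than memory bandwidth."""
    return compute_gain / bandwidth_gain


shift = compute_to_bandwidth_shift(COMPUTE_GAIN, BANDWIDTH_GAIN)
print(f"{shift:.2f}")  # prints 2.09
```

Under that assumption, a memory-bound phase can gain at most about 1.64× from the newer GPU, while a compute-bound phase can gain up to 3.43×, so the newer hardware is roughly twice as skewed toward compute.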

3. Splitwise Design

3.1 Architecture Overview

Splitwise places prompt computation and token generation on separate machines, each matched to the needs of its phase.

3.2 Phase-Specific Resource Management

High-compute GPUs (H100) serve the prompt phase; more cost-effective GPUs serve the token generation phase.

4. Technical Implementation

4.1 Mathematical Foundation

The attention mechanism in transformers can be expressed as:

$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$

where $Q$, $K$, and $V$ denote the queries, keys, and values respectively, and $d_k$ is the dimension of the keys.
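
The formula can be traced step by step in plain Python. This is a minimal, unoptimized sketch that represents matrices as lists of rows; the helper names `softmax` and `attention` are illustrative, and a real implementation would use a tensor library instead.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    scale = math.sqrt(d_k)
    result = []
    for q in Q:
        # One score per key: dot(q, k) / sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in K]
        weights = softmax(scores)
        # Weighted sum of the value rows
        result.append(
            [sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))]
        )
    return result


# A zero query scores every key equally, so the output is the mean of the values.
out = attention([[0.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]])
```

In the prompt phase `Q` holds one row per input token, while in the token phase it holds a single row for the newest token and `K` and `V` are read back from the stored state.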

4.2 Code Implementation

class SplitwiseScheduler:
    def schedule_request(self, request):
        # Route by phase: prompt computation and token generation use separate machines.
        if request.phase == "prompt":
            return self.assign_to_prompt_machine(request)
        else:
            return self.assign_to_token_machine(request)

    def transfer_state(self, prompt_output, token_machine):
        # Efficient state transfer using RDMA
        return token_machine.load_state(prompt_output)
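
The scheduler outline above leaves the assignment helpers undefined. The following self-contained sketch fills that gap under stated assumptions: the `Request` and `Machine` classes, the least-loaded assignment policy, and the machine names are all hypothetical and are not taken from the Splitwise implementation. State transfer is modeled as a plain dictionary write rather than RDMA.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    request_id: str
    phase: str  # "prompt" or "token"


@dataclass
class Machine:
    name: str
    queue: list = field(default_factory=list)
    state: dict = field(default_factory=dict)

    def load_state(self, request_id, prompt_output):
        # Stand-in for receiving prompt-phase state over the network.
        self.state[request_id] = prompt_output
        return True


class PhaseSplitScheduler:
    def __init__(self, prompt_machines, token_machines):
        self.pools = {"prompt": prompt_machines, "token": token_machines}

    def schedule_request(self, request):
        # Pick the pool for the request's phase, then the least-loaded machine.
        pool = self.pools[request.phase]  # raises KeyError for an unknown phase
        machine = min(pool, key=lambda m: len(m.queue))
        machine.queue.append(request.request_id)
        return machine

    def transfer_state(self, request_id, prompt_output, token_machine):
        return token_machine.load_state(request_id, prompt_output)


scheduler = PhaseSplitScheduler(
    [Machine("prompt-0"), Machine("prompt-1")], [Machine("token-0")]
)
m1 = scheduler.schedule_request(Request("r1", "prompt"))
m2 = scheduler.schedule_request(Request("r2", "prompt"))
m3 = scheduler.schedule_request(Request("r1", "token"))
moved = scheduler.transfer_state("r1", {"cache": [1, 2, 3]}, m3)
```

Least-loaded selection is only one possible policy; it is used here because it keeps the example short and deterministic.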

5. Experimental Results

Splitwise achieves:

  • 1.4× higher throughput at 20% lower cost
  • 2.35× more throughput under the same power and cost budgets
  • More consistent latency and better resource utilization
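
To put the first result on a single scale, the throughput gain can be divided by the relative cost. This is a back-of-the-envelope sketch: it assumes both figures are measured against the same baseline, and the function name is illustrative.

```python
def throughput_per_cost_gain(throughput_gain, cost_ratio):
    """Throughput gain divided by relative cost (both versus the same baseline)."""
    if cost_ratio <= 0:
        raise ValueError("cost_ratio must be positive")
    return throughput_gain / cost_ratio


# 1.4x throughput at 20% lower cost, i.e. 0.8x the baseline cost.
gain = throughput_per_cost_gain(1.4, 1 - 0.20)
print(f"{gain:.2f}x")  # prints 1.75x
```

Under that assumption, the first bullet corresponds to roughly 1.75× more throughput per unit of cost.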

6. Analysis and Discussion

Splitwise represents a significant advance in efficient LLM inference by addressing the fundamental mismatch between computational demands and hardware capabilities. The approach draws on distributed-systems principles similar to those used in MapReduce and other parallel processing frameworks. By recognizing that the token generation phase is bound by memory rather than compute, Splitwise enables resource allocation that matches the actual computational needs of each inference phase.

This work builds on established principles in computer systems, notably the memory wall problem identified by Wulf and McKee in 1995, which highlighted the growing gap between processor speed and memory performance. The transformer attention mechanism, first introduced by Vaswani et al. in the 2017 paper "Attention is All You Need," gives rise to these two distinct computational phases, but earlier optimization efforts focused mostly on compression and quantization rather than on architectural separation.

Compared with conventional deployments that run both phases on the same machine, Splitwise's phase splitting shows how specialized hardware can be used effectively, much as Google's TPU pods are optimized for specific ML workloads. The 1.4× throughput improvement and 20% cost reduction matter most at the scale of modern LLM deployments, where even small efficiency gains translate into substantial operational savings.

The approach fits the broader trend toward heterogeneous computing, in which systems combine different kinds of processors optimized for specific tasks. As LLMs continue to grow in size and complexity, approaches like Splitwise will become important for efficient AI deployment, addressing both the economic and the environmental costs of large-scale inference.

7. Future Applications

Future directions include:

  • Optimized inference for multimodal models
  • Edge computing deployments
  • Real-time adaptive resource allocation
  • Integration with emerging hardware architectures

8. References

  1. Vaswani, A., et al. "Attention is All You Need." NeurIPS 2017.
  2. Brown, T., et al. "Language Models are Few-Shot Learners." NeurIPS 2020.
  3. Wulf, W. A., & McKee, S. A. "Hitting the memory wall: implications of the obvious." ACM SIGARCH Computer Architecture News, 1995.
  4. NVIDIA Corporation. "NVIDIA H100 Tensor Core GPU Architecture." 2022.
  5. Dean, J., & Ghemawat, S. "MapReduce: Simplified data processing on large clusters." OSDI 2004.