Mirror of https://github.com/fauxpilot/fauxpilot.git
Synced 2025-07-08 05:51:26 -07:00

Commit de71bb6ff5: Resolve merge conflicts

6 changed files with 53 additions and 35 deletions
README.md

@@ -18,9 +18,11 @@ Note that the VRAM requirements listed by `setup.sh` are *total* -- if you have
 
 lmao
 
+Okay, fine, we now have some minimal information on [the wiki](https://github.com/moyix/fauxpilot/wiki) and a [discussion forum](https://github.com/moyix/fauxpilot/discussions) where you can ask questions. Still no formal support or warranty though!
+
 ## Setup
 
-Run the setup script to choose a model to use. This will download the model from Huggingface and then convert it for use with FasterTransformer.
+Run the setup script to choose a model to use. This will download the model from [Huggingface/Moyix](https://huggingface.co/Moyix) in GPT-J format and then convert it for use with FasterTransformer.
 
 ```
 $ ./setup.sh
@@ -173,7 +175,7 @@ In [2]: openai.api_key = 'dummy'
 
 In [3]: openai.api_base = 'http://127.0.0.1:5000/v1'
 
-In [4]: result = openai.Completion.create(engine='codegen', prompt='def hello', max_tokens=16, temperature=0.1, stop=["\n\n"])
+In [4]: result = openai.Completion.create(model='codegen', prompt='def hello', max_tokens=16, temperature=0.1, stop=["\n\n"])
 
 In [5]: result
 Out[5]:
@@ -212,4 +214,6 @@ Perhaps more excitingly, you can configure the official [VSCode Copilot plugin](
 
 And you should be able to use Copilot with your own locally hosted suggestions! Of course, probably a lot of stuff is subtly broken. In particular, the probabilities returned by the server are partly fake. Fixing this would require changing FasterTransformer so that it can return log-probabilities for the top k tokens rather than just the chosen token.
 
+Another issue with using the Copilot plugin is that its tokenizer (the component that turns text into a sequence of integers for the model) is slightly different from the one used by CodeGen, so the plugin will sometimes send a request that is longer than CodeGen can handle. You can work around this by replacing the `vocab.bpe` and `tokenizer.json` found in the Copilot extension (something like `.vscode/extensions/github.copilot-[version]/dist/`) with the ones found [here](https://github.com/moyix/fauxpilot/tree/main/copilot_proxy/cgtok/openai_format).
+
 Have fun!
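A note on the "partly fake" probabilities mentioned in the README paragraph above: the same `openai` client shown in the README can request per-token log-probabilities with the `logprobs` parameter, and those are exactly the values the proxy cannot fully populate until FasterTransformer exposes top-k log-probabilities. Below is a minimal sketch, assuming the locally hosted endpoint and `codegen` model from the README; whether the FauxPilot server honors `logprobs` at all is not guaranteed.

```python
import openai

# Point the client at the locally hosted FauxPilot proxy, as in the README.
openai.api_key = 'dummy'
openai.api_base = 'http://127.0.0.1:5000/v1'

# Ask for log-probabilities of the top 5 tokens at each position.
# With the current server these values may be missing or only partly real,
# which is the limitation the README describes.
result = openai.Completion.create(
    model='codegen',
    prompt='def hello',
    max_tokens=16,
    temperature=0.1,
    logprobs=5,
    stop=["\n\n"],
)

print(result.choices[0].text)
print(result.choices[0].logprobs)  # token_logprobs / top_logprobs, if returned
```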
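The tokenizer workaround in the added README paragraph amounts to overwriting two files inside the installed Copilot extension. Here is a rough sketch using Python's standard library; the extension path varies by Copilot version and operating system, and both paths below are placeholders to adjust.

```python
import glob
import os
import shutil

# CodeGen tokenizer files in OpenAI format, from the fauxpilot checkout
# (copilot_proxy/cgtok/openai_format). Adjust to your clone location.
CGTOK_DIR = 'copilot_proxy/cgtok/openai_format'

# Locate the installed Copilot extension; the version suffix varies.
pattern = os.path.expanduser('~/.vscode/extensions/github.copilot-*/dist')
matches = glob.glob(pattern)
if not matches:
    raise SystemExit('Copilot extension not found; adjust the glob pattern above.')

for ext_dist in matches:
    for name in ('vocab.bpe', 'tokenizer.json'):
        src = os.path.join(CGTOK_DIR, name)
        dst = os.path.join(ext_dist, name)
        shutil.copyfile(src, dst)  # overwrite the plugin's tokenizer file
        print(f'Replaced {dst}')
```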
@@ -111,7 +111,7 @@ class CodeGenProxy:
         if stop_words is None:
             stop_words = []
         if stop_words:
-            stop_word_list = np.repeat(to_word_list_format([stop_words], self.tokenizer), input_start_ids.shape[0],
+            stop_word_list = np.repeat(self.to_word_list_format([stop_words], self.tokenizer), input_start_ids.shape[0],
                                        axis=0)
         else:
             stop_word_list = np.concatenate([np.zeros([input_start_ids.shape[0], 1, 1]).astype(
@@ -18,12 +18,15 @@ services:
               count: all
               capabilities: [gpu]
   copilot_proxy:
-    image: moyix/copilot_proxy:latest
+    # For dockerhub version
+    # image: moyix/copilot_proxy:latest
+    # command: python3 -m flask run --host=0.0.0.0 --port=5000
     # For local build
-    # build:
-    #   context: .
-    #   dockerfile: copilot_proxy/Dockerfile
+    build:
+      context: .
+      dockerfile: copilot_proxy/Dockerfile
     env_file:
+      # Automatically created via ./setup.sh
       - .env
     ports:
       - "${API_EXTERNAL_PORT}:5000"
@@ -1,4 +1,4 @@
-TRITON_HOST=localhost
+TRITON_HOST=triton
 TRITON_PORT=8001
 API_HOST=0.0.0.0
 API_PORT=5000
@@ -8,9 +8,8 @@ fi
 source .env
 
 # On newer versions, docker-compose is docker compose
-DOCKER_COMPOSE=$(command -v docker-compose)
-if [ -z "$DOCKER_COMPOSE" ]; then
-    DOCKER_COMPOSE="docker compose"
+if command -v docker-compose > /dev/null; then
+    docker compose up
+else
+    docker-compose up
 fi
-
-$DOCKER_COMPOSE up
setup.sh (12 changes)

@@ -12,6 +12,18 @@ if [ -f .env ]; then
 fi;
 fi
 
+function check_dep(){
+    echo "Checking for $1 ..."
+    which "$1" 2>/dev/null || {
+        echo "Please install $1."
+        exit 1
+    }
+}
+check_dep curl
+check_dep zstd
+check_dep docker
+
+
 echo "Models available:"
 echo "[1] codegen-350M-mono (2GB total VRAM required; Python-only)"
 echo "[2] codegen-350M-multi (2GB total VRAM required; multi-language)"