mirror of
https://github.com/fauxpilot/fauxpilot.git
synced 2025-07-07 13:31:55 -07:00
Resolve merge conflicts
This commit is contained in: commit de71bb6ff5
6 changed files with 53 additions and 35 deletions

README.md | 50
@@ -18,12 +18,14 @@ Note that the VRAM requirements listed by `setup.sh` are *total* -- if you have
 lmao
+
+Okay, fine, we now have some minimal information on [the wiki](https://github.com/moyix/fauxpilot/wiki) and a [discussion forum](https://github.com/moyix/fauxpilot/discussions) where you can ask questions. Still no formal support or warranty though!
 
 ## Setup
 
-Run the setup script to choose a model to use. This will download the model from Huggingface and then convert it for use with FasterTransformer.
+Run the setup script to choose a model to use. This will download the model from [Huggingface/Moyix](https://huggingface.co/Moyix) in GPT-J format and then convert it for use with FasterTransformer.
 
 ```
 $ ./setup.sh
 Models available:
 [1] codegen-350M-mono (2GB total VRAM required; Python-only)
 [2] codegen-350M-multi (2GB total VRAM required; multi-language)
@@ -40,7 +42,7 @@ Downloading and converting the model, this will take a while...
 Converting model codegen-350M-multi with 1 GPUs
 Loading CodeGen model
 Downloading config.json: 100%|██████████| 996/996 [00:00<00:00, 1.25MB/s]
 Downloading pytorch_model.bin: 100%|██████████| 760M/760M [00:11<00:00, 68.3MB/s]
 Creating empty GPTJ model
 Converting...
 Conversion complete.
@@ -67,23 +69,23 @@ Done! Now run ./launch.sh to start the FauxPilot server.
 Then you can just run `./launch.sh`:
 
 ```
 $ ./launch.sh
 [+] Running 2/0
 ⠿ Container fauxpilot-triton-1 Created 0.0s
 ⠿ Container fauxpilot-copilot_proxy-1 Created 0.0s
 Attaching to fauxpilot-copilot_proxy-1, fauxpilot-triton-1
 fauxpilot-triton-1 |
 fauxpilot-triton-1 | =============================
 fauxpilot-triton-1 | == Triton Inference Server ==
 fauxpilot-triton-1 | =============================
 fauxpilot-triton-1 |
 fauxpilot-triton-1 | NVIDIA Release 22.06 (build 39726160)
 fauxpilot-triton-1 | Triton Server Version 2.23.0
 fauxpilot-triton-1 |
 fauxpilot-triton-1 | Copyright (c) 2018-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 fauxpilot-triton-1 |
 fauxpilot-triton-1 | Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 fauxpilot-triton-1 |
 fauxpilot-triton-1 | This container image and its contents are governed by the NVIDIA Deep Learning Container License.
 fauxpilot-triton-1 | By pulling and using the container, you accept the terms and conditions of this license:
 fauxpilot-triton-1 | https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
@@ -93,12 +95,12 @@ fauxpilot-copilot_proxy-1 | * Running on all addresses (0.0.0.0)
 fauxpilot-copilot_proxy-1 | WARNING: This is a development server. Do not use it in a production deployment.
 fauxpilot-copilot_proxy-1 | * Running on http://127.0.0.1:5000
 fauxpilot-copilot_proxy-1 | * Running on http://172.18.0.3:5000 (Press CTRL+C to quit)
 fauxpilot-triton-1 |
 fauxpilot-triton-1 | ERROR: This container was built for NVIDIA Driver Release 515.48 or later, but
 fauxpilot-triton-1 | version was detected and compatibility mode is UNAVAILABLE.
 fauxpilot-triton-1 |
 fauxpilot-triton-1 | [[]]
 fauxpilot-triton-1 |
 fauxpilot-triton-1 | I0803 01:51:02.690042 93 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f6104000000' with size 268435456
 fauxpilot-triton-1 | I0803 01:51:02.690461 93 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
 fauxpilot-triton-1 | I0803 01:51:02.692434 93 model_repository_manager.cc:1191] loading: fastertransformer:1
@@ -112,28 +114,28 @@ fauxpilot-triton-1 | {
 fauxpilot-triton-1 | I0803 01:51:04.711929 93 libfastertransformer.cc:321] After Loading Model:
 fauxpilot-triton-1 | I0803 01:51:04.712427 93 libfastertransformer.cc:537] Model instance is created on GPU NVIDIA RTX A6000
 fauxpilot-triton-1 | I0803 01:51:04.712694 93 model_repository_manager.cc:1345] successfully loaded 'fastertransformer' version 1
 fauxpilot-triton-1 | I0803 01:51:04.712841 93 server.cc:556]
 fauxpilot-triton-1 | +------------------+------+
 fauxpilot-triton-1 | | Repository Agent | Path |
 fauxpilot-triton-1 | +------------------+------+
 fauxpilot-triton-1 | +------------------+------+
 fauxpilot-triton-1 |
 fauxpilot-triton-1 | I0803 01:51:04.712916 93 server.cc:583]
 fauxpilot-triton-1 | +-------------------+-----------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
 fauxpilot-triton-1 | | Backend           | Path                                                                        | Config                                                                                                                                                         |
 fauxpilot-triton-1 | +-------------------+-----------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
 fauxpilot-triton-1 | | fastertransformer | /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so | {"cmdline":{"auto-complete-config":"false","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}} |
 fauxpilot-triton-1 | +-------------------+-----------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
 fauxpilot-triton-1 |
 fauxpilot-triton-1 | I0803 01:51:04.712959 93 server.cc:626]
 fauxpilot-triton-1 | +-------------------+---------+--------+
 fauxpilot-triton-1 | | Model             | Version | Status |
 fauxpilot-triton-1 | +-------------------+---------+--------+
 fauxpilot-triton-1 | | fastertransformer | 1       | READY  |
 fauxpilot-triton-1 | +-------------------+---------+--------+
 fauxpilot-triton-1 |
 fauxpilot-triton-1 | I0803 01:51:04.738989 93 metrics.cc:650] Collecting metrics for GPU 0: NVIDIA RTX A6000
 fauxpilot-triton-1 | I0803 01:51:04.739373 93 tritonserver.cc:2159]
 fauxpilot-triton-1 | +----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 fauxpilot-triton-1 | | Option                           | Value                                                                                                                                                                                        |
 fauxpilot-triton-1 | +----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
@@ -151,7 +153,7 @@ fauxpilot-triton-1 | | min_supported_compute_capability | 6.0
 fauxpilot-triton-1 | | strict_readiness                 | 1 |
 fauxpilot-triton-1 | | exit_timeout                     | 30 |
 fauxpilot-triton-1 | +----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 fauxpilot-triton-1 |
 fauxpilot-triton-1 | I0803 01:51:04.740423 93 grpc_server.cc:4587] Started GRPCInferenceService at 0.0.0.0:8001
 fauxpilot-triton-1 | I0803 01:51:04.740608 93 http_server.cc:3303] Started HTTPService at 0.0.0.0:8000
 fauxpilot-triton-1 | I0803 01:51:04.781561 93 http_server.cc:178] Started Metrics Service at 0.0.0.0:8002
@@ -163,7 +165,7 @@ Once everything is up and running, you should have a server listening for reques
 
 ```python
 $ ipython
 Python 3.8.10 (default, Mar 15 2022, 12:22:08)
 Type 'copyright', 'credits' or 'license' for more information
 IPython 8.2.0 -- An enhanced Interactive Python. Type '?' for help.
 
@@ -173,10 +175,10 @@ In [2]: openai.api_key = 'dummy'
 
 In [3]: openai.api_base = 'http://127.0.0.1:5000/v1'
 
-In [4]: result = openai.Completion.create(engine='codegen', prompt='def hello', max_tokens=16, temperature=0.1, stop=["\n\n"])
+In [4]: result = openai.Completion.create(model='codegen', prompt='def hello', max_tokens=16, temperature=0.1, stop=["\n\n"])
 
 In [5]: result
 Out[5]:
 <OpenAIObject text_completion id=cmpl-6hqu8Rcaq25078IHNJNVooU4xLY6w at 0x7f602c3d2f40> JSON: {
   "choices": [
     {
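The `In [4]` call above boils down to a plain JSON payload sent to FauxPilot's OpenAI-compatible endpoint. As a rough sketch (the `completion_request` helper name is ours, not from the repo), the request body looks like:

```python
# Sketch only: build the JSON body an OpenAI-style completion call sends
# to a FauxPilot server. Helper name and defaults are illustrative.
def completion_request(prompt, max_tokens=16, temperature=0.1, stop=None):
    return {
        "model": "codegen",           # the model served behind Triton
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "stop": stop if stop is not None else ["\n\n"],
    }

body = completion_request("def hello")
print(body["model"], body["max_tokens"])
```

The diff's rename from `engine=` to `model=` tracks the `openai` client's parameter naming; the payload shape is the same either way.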
|
@ -212,4 +214,6 @@ Perhaps more excitingly, you can configure the official [VSCode Copilot plugin](
|
||||||
|
|
||||||
And you should be able to use Copilot with your own locally hosted suggestions! Of course, probably a lot of stuff is subtly broken. In particular, the probabilities returned by the server are partly fake. Fixing this would require changing FasterTransformer so that it can return log-probabilities for the top k tokens rather that just the chosen token.
|
And you should be able to use Copilot with your own locally hosted suggestions! Of course, probably a lot of stuff is subtly broken. In particular, the probabilities returned by the server are partly fake. Fixing this would require changing FasterTransformer so that it can return log-probabilities for the top k tokens rather that just the chosen token.
|
||||||
|
|
||||||
|
Another issue with using the Copilot plugin is that its tokenizer (the component that turns text into a sequence of integers for the model) is slightly different from the one used by CodeGen, so the plugin will sometimes send a request that is longer than CodeGen can handle. You can work around this by replacing the `vocab.bpe` and `tokenizer.json` found in the Copilot extension (something like `.vscode/extensions/github.copilot-[version]/dist/`) with the ones found [here](https://github.com/moyix/fauxpilot/tree/main/copilot_proxy/cgtok/openai_format).
|
||||||
|
|
||||||
Have fun!
|
Have fun!
|
||||||
|
|
|
@@ -111,7 +111,7 @@ class CodeGenProxy:
         if stop_words is None:
             stop_words = []
         if stop_words:
-            stop_word_list = np.repeat(to_word_list_format([stop_words], self.tokenizer), input_start_ids.shape[0],
+            stop_word_list = np.repeat(self.to_word_list_format([stop_words], self.tokenizer), input_start_ids.shape[0],
                                        axis=0)
         else:
            stop_word_list = np.concatenate([np.zeros([input_start_ids.shape[0], 1, 1]).astype(
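The `np.repeat(...)` in the hunk above tiles one encoded stop-word tensor across the batch dimension. A minimal sketch of that tiling step (the token ids and shapes here are made up for illustration; the real layout comes from `to_word_list_format`):

```python
import numpy as np

# Hypothetical (1, 2, max_len) stop-word encoding: one row of token ids,
# one row of offsets. Values are illustrative only.
stop_word_list = np.array([[[198, 198], [2, -1]]], dtype=np.int32)

batch_size = 4  # i.e. input_start_ids.shape[0]
tiled = np.repeat(stop_word_list, batch_size, axis=0)
print(tiled.shape)  # every batch entry gets the same stop words
```

The diff itself only changes the call to use the `self.to_word_list_format` method rather than an undefined free function, fixing a `NameError` on the stop-words path.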
@@ -18,12 +18,15 @@ services:
             count: all
             capabilities: [gpu]
   copilot_proxy:
-    image: moyix/copilot_proxy:latest
+    # For dockerhub version
+    # image: moyix/copilot_proxy:latest
+    # command: python3 -m flask run --host=0.0.0.0 --port=5000
     # For local build
-    # build:
-    #   context: .
-    #   dockerfile: copilot_proxy/Dockerfile
+    build:
+      context: .
+      dockerfile: copilot_proxy/Dockerfile
     env_file:
+      # Automatically created via ./setup.sh
       - .env
     ports:
       - "${API_EXTERNAL_PORT}:5000"
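Applied, the `copilot_proxy` service ends up looking roughly like this (a sketch assembled from the +/− lines above; sibling keys such as `services:` and the triton service are omitted):

```yaml
copilot_proxy:
  # For dockerhub version
  # image: moyix/copilot_proxy:latest
  # command: python3 -m flask run --host=0.0.0.0 --port=5000
  # For local build
  build:
    context: .
    dockerfile: copilot_proxy/Dockerfile
  env_file:
    # Automatically created via ./setup.sh
    - .env
  ports:
    - "${API_EXTERNAL_PORT}:5000"
```

The effect of the change is to build the proxy image locally by default, keeping the Docker Hub image as a commented-out alternative.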
@@ -1,4 +1,4 @@
-TRITON_HOST=localhost
+TRITON_HOST=triton
 TRITON_PORT=8001
 API_HOST=0.0.0.0
 API_PORT=5000
@@ -8,9 +8,8 @@ fi
 source .env
 
 # On newer versions, docker-compose is docker compose
-DOCKER_COMPOSE=$(command -v docker-compose)
-if [ -z "$DOCKER_COMPOSE" ]; then
-    DOCKER_COMPOSE="docker compose"
+if command -v docker-compose > /dev/null; then
+    docker compose up
+else
+    docker-compose up
 fi
-
-$DOCKER_COMPOSE up
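The new launch logic branches on whether a `docker-compose` binary is on `PATH`, choosing between it and the `docker compose` plugin form. A dry-run sketch of the same detection pattern (it only echoes the chosen command instead of starting containers; variable name is ours):

```shell
# Detect which Compose flavor is installed; echo rather than run it.
if command -v docker-compose > /dev/null 2>&1; then
    compose_cmd="docker-compose"
else
    compose_cmd="docker compose"
fi
echo "Would run: $compose_cmd up"
```

`command -v` is the POSIX way to test for a binary; its exit status drives the branch without printing anything when redirected.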
setup.sh | 12
@@ -12,6 +12,18 @@ if [ -f .env ]; then
 fi;
 fi
 
+
+function check_dep(){
+    echo "Checking for $1 ..."
+    which "$1" 2>/dev/null || {
+        echo "Please install $1."
+        exit 1
+    }
+}
+check_dep curl
+check_dep zstd
+check_dep docker
+
 echo "Models available:"
 echo "[1] codegen-350M-mono (2GB total VRAM required; Python-only)"
 echo "[2] codegen-350M-multi (2GB total VRAM required; multi-language)"
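The `check_dep` helper added to setup.sh is a reusable guard: probe for a binary on `PATH` and abort with an install hint if it is missing. The same pattern, demonstrated with `sh` (present on any POSIX system) instead of curl/zstd/docker so it succeeds anywhere:

```shell
# Same pattern as the setup.sh addition: look the tool up and abort
# with an install hint if the lookup fails.
check_dep() {
    echo "Checking for $1 ..."
    which "$1" 2>/dev/null || {
        echo "Please install $1."
        exit 1
    }
}

check_dep sh   # sh exists everywhere, so execution continues
echo "deps ok"
```

Running the checks up front fails fast with a clear message, instead of letting a missing `zstd` or `curl` surface as a cryptic error mid-download.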