{"id":3303,"date":"2025-12-23T13:58:18","date_gmt":"2025-12-23T18:58:18","guid":{"rendered":"https:\/\/blog.lufamily.ca\/kang\/?p=3303"},"modified":"2025-12-23T13:58:19","modified_gmt":"2025-12-23T18:58:19","slug":"new-ai-server","status":"publish","type":"post","link":"https:\/\/blog.lufamily.ca\/kang\/2025\/12\/23\/new-ai-server\/","title":{"rendered":"New AI Server"},"content":{"rendered":"\n<p>In a previous <a href=\"https:\/\/blog.lufamily.ca\/kang\/2025\/12\/08\/ai-server-bytes-the-dust\/\" data-type=\"post\" data-id=\"3282\">post<\/a>, I commented on our AI server containing an old P40 GPU failed. We replaced our server with the following parts.<\/p>\n\n\n\n<figure class=\"wp-block-table kl-small-font\"><table><thead><tr><th>Component<\/th><th>Description<\/th><\/tr><\/thead><tbody><tr><td>CPU<\/td><td>AMD Ryzen 9 9900X 4.4 GHz 12-Core Processor<\/td><\/tr><tr><td>CPU Cooler<\/td><td>Thermalright Peerless Assassin 120 SE 66.17 CFM CPU Cooler<\/td><\/tr><tr><td>Motherboard<\/td><td>Asus B650E MAX GAMING WIFI ATX AM5 Motherboard<\/td><\/tr><tr><td>Memory<\/td><td>Crucial Pro 64 GB (2 x 32 GB) DDR5-6000 CL40 Memory<\/td><\/tr><tr><td>Storage<\/td><td>Samsung 990 Pro 2 TB M.2-2280 PCle 4.0 X4 NVME Solid State Drive<\/td><\/tr><tr><td>GPU<\/td><td>2 x EVGA FTW3 ULTRA GAMING GeForce RTX 3090 24 GB Video Card (refurbished)<\/td><\/tr><tr><td>Case<\/td><td>Fractal Design Meshify 3 XL ATX Full Tower Case<\/td><\/tr><tr><td>Power Supply<\/td><td>SeaSonic PRIME TX-1600 ATX 3.1 1600 W 80+ Titanium Certified Fully Modular ATX Power Supply<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>I purchased all of our components at Amazon and the total (including shipping and taxes) came to to be $6,271.22. The most expensive parts were the GPU ($2,979.98), the power supply ($903.95), and then the memory ($843.19). 
\n\n\n\n<p>I had no issues building the computer.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/blog.lufamily.ca\/kang\/wp-content\/uploads\/sites\/3\/2025\/12\/IMG_8995-scaled.jpeg\"><img loading=\"lazy\" decoding=\"async\" width=\"768\" height=\"1024\" src=\"https:\/\/blog.lufamily.ca\/kang\/wp-content\/uploads\/sites\/3\/2025\/12\/IMG_8995-768x1024.jpeg\" alt=\"\" class=\"wp-image-3304\" srcset=\"https:\/\/blog.lufamily.ca\/kang\/wp-content\/uploads\/sites\/3\/2025\/12\/IMG_8995-768x1024.jpeg 768w, https:\/\/blog.lufamily.ca\/kang\/wp-content\/uploads\/sites\/3\/2025\/12\/IMG_8995-225x300.jpeg 225w, https:\/\/blog.lufamily.ca\/kang\/wp-content\/uploads\/sites\/3\/2025\/12\/IMG_8995-1152x1536.jpeg 1152w, https:\/\/blog.lufamily.ca\/kang\/wp-content\/uploads\/sites\/3\/2025\/12\/IMG_8995-1536x2048.jpeg 1536w, https:\/\/blog.lufamily.ca\/kang\/wp-content\/uploads\/sites\/3\/2025\/12\/IMG_8995-1200x1600.jpeg 1200w, https:\/\/blog.lufamily.ca\/kang\/wp-content\/uploads\/sites\/3\/2025\/12\/IMG_8995-scaled.jpeg 1920w\" sizes=\"auto, (max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 984px) 61vw, (max-width: 1362px) 45vw, 600px\" \/><\/a><\/figure>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/blog.lufamily.ca\/kang\/wp-content\/uploads\/sites\/3\/2025\/12\/IMG_8997-scaled.jpeg\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"768\" src=\"https:\/\/blog.lufamily.ca\/kang\/wp-content\/uploads\/sites\/3\/2025\/12\/IMG_8997-1024x768.jpeg\" alt=\"\" class=\"wp-image-3305\" srcset=\"https:\/\/blog.lufamily.ca\/kang\/wp-content\/uploads\/sites\/3\/2025\/12\/IMG_8997-1024x768.jpeg 1024w, https:\/\/blog.lufamily.ca\/kang\/wp-content\/uploads\/sites\/3\/2025\/12\/IMG_8997-300x225.jpeg 300w, https:\/\/blog.lufamily.ca\/kang\/wp-content\/uploads\/sites\/3\/2025\/12\/IMG_8997-768x576.jpeg 768w, https:\/\/blog.lufamily.ca\/kang\/wp-content\/uploads\/sites\/3\/2025\/12\/IMG_8997-1536x1152.jpeg 1536w, https:\/\/blog.lufamily.ca\/kang\/wp-content\/uploads\/sites\/3\/2025\/12\/IMG_8997-2048x1536.jpeg 2048w, https:\/\/blog.lufamily.ca\/kang\/wp-content\/uploads\/sites\/3\/2025\/12\/IMG_8997-1200x900.jpeg 1200w\" sizes=\"auto, (max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 1362px) 62vw, 840px\" \/><\/a><\/figure>\n\n\n\n<p>As you can see above, once the CPU cooler and the GPUs were installed, you can barely see the motherboard. Although there are still PCIe slots available, there is no room left to actually place new PCIe cards. We do still have two free DIMM slots, so we can consider a future memory upgrade.<\/p>\n\n\n\n<p>One of my main concerns was plugging this computer into an electrical socket without tripping any of my breakers. The 1,600 W power supply is awfully close to the theoretical maximum of a 15 A breaker in our house, which at 120 V works out to 1,800 W. This server is too powerful for any of my current UPS units or power bars, so it has to be connected directly to a wall outlet on a circuit that is not loaded by other appliances.<\/p>\n\n\n\n<p>After testing the memory using <a href=\"https:\/\/www.memtest86.com\" target=\"_blank\" rel=\"noreferrer noopener\">MemTest<\/a>, I installed Ubuntu Server 24.04.3 LTS. To prepare the machine for AI workloads, the next step was to install NVIDIA CUDA.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Installing CUDA<\/h2>\n\n\n\n<p>The first step was to install NVIDIA CUDA. I followed the steps <a href=\"https:\/\/docs.nvidia.com\/cuda\/cuda-installation-guide-linux\/#ubuntu\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a> for Ubuntu, specifically the Network Repository Installation directions.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\u276f sudo su -\n# wget https:\/\/developer.download.nvidia.com\/compute\/cuda\/repos\/ubuntu2404\/x86_64\/cuda-keyring_1.1-1_all.deb\n# dpkg -i cuda-keyring_1.1-1_all.deb\n\n# apt install cuda-toolkit\n# apt install nvidia-gds\n# reboot<\/code><\/pre>\n\n\n\n<p>After the reboot, I tested CUDA by running the following:<\/p>\n\n\n\n<pre class=\"wp-block-code kl-small-font\"><code># nvidia-smi<\/code><\/pre>
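\n\n\n\n<p>If <code>nvidia-smi<\/code> lists both RTX 3090s, the driver side is working. As an additional sanity check, you can also confirm the toolkit itself installed. This is a minimal sketch assuming the default install location: the network repository packages place the CUDA binaries under <code>\/usr\/local\/cuda\/bin<\/code>, which is not on the default <code>PATH<\/code>.<\/p>\n\n\n\n<pre class=\"wp-block-code kl-small-font\"><code># the toolkit lives under \/usr\/local\/cuda, which is not on PATH by default\n# \/usr\/local\/cuda\/bin\/nvcc --version<\/code><\/pre>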
\n\n\n\n<h2 class=\"wp-block-heading\">Installing vLLM<\/h2>\n\n\n\n<p>I then proceeded to install vLLM using the <a href=\"https:\/\/docs.vllm.ai\/en\/latest\/getting_started\/quickstart\/\" target=\"_blank\" rel=\"noreferrer noopener\">Quick Start<\/a> guide for NVIDIA CUDA.<\/p>\n\n\n\n<p>However, I used the Quick Start guide only as guidance. Ultimately, I followed these steps:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\u276f sudo apt install build-essential python3.12-dev\n\n\u276f mkdir py_vllm\n\u276f cd py_vllm\n\n\u276f python3 -m venv vllm_cuda13_env\n\u276f source vllm_cuda13_env\/bin\/activate\n\n\u276f pip install torch-c-dlpack-ext\n\u276f pip install torch torchvision --index-url https:\/\/download.pytorch.org\/whl\/cu130\n\u276f pip install vllm --pre --extra-index-url https:\/\/wheels.vllm.ai\/nightly<\/code><\/pre>\n\n\n\n<p>I had first tried to run vLLM using Podman, but I kept running out of memory with certain models, so I chose the Python method of deployment instead.<\/p>\n\n\n\n<p>I then tried running it with <a href=\"https:\/\/huggingface.co\/Qwen\/Qwen3-14B\" target=\"_blank\" rel=\"noreferrer noopener\">Qwen\/Qwen3-14B<\/a> from Hugging Face. Since I have two GPUs, I set the <code>tensor-parallel-size<\/code> to 2.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\u276f export VLLM_USE_V1=1\n\u276f vllm serve Qwen\/Qwen3-14B --tensor-parallel-size=2<\/code><\/pre>\n\n\n\n<p>It took a minute or two to download the model and initialize the GPUs. Once it was up and running, I verified it with a simple <code>curl<\/code> command.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\u276f curl http:\/\/localhost:8000\/v1\/models | jq .\n  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n                                 Dload  Upload   Total   Spent    Left  Speed\n100   463  100   463    0     0   615k      0 --:--:-- --:--:-- --:--:--  452k\n{\n  \"object\": \"list\",\n  \"data\": &#91;\n    {\n      \"id\": \"Qwen\/Qwen3-14B\",\n      \"object\": \"model\",\n      \"created\": 1766511858,\n      \"owned_by\": \"vllm\",\n      \"root\": \"Qwen\/Qwen3-14B\",\n      \"parent\": null,\n      \"max_model_len\": 40960,\n      \"permission\": &#91;\n        {\n          \"id\": \"modelperm-bc2e247073d50d67\",\n          \"object\": \"model_permission\",\n          \"created\": 1766511858,\n          \"allow_create_engine\": false,\n          \"allow_sampling\": true,\n          \"allow_logprobs\": true,\n          \"allow_search_indices\": false,\n          \"allow_view\": true,\n          \"allow_fine_tuning\": false,\n          \"organization\": \"*\",\n          \"group\": null,\n          \"is_blocking\": false\n        }\n      ]\n    }\n  ]\n}<\/code><\/pre>
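\n\n\n\n<p>Beyond listing models, you can send an actual request through the OpenAI-compatible chat completions endpoint. This is just a quick smoke test; the prompt and <code>max_tokens<\/code> value here are arbitrary.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\u276f curl http:\/\/localhost:8000\/v1\/chat\/completions \\\n  -H \"Content-Type: application\/json\" \\\n  -d '{\n    \"model\": \"Qwen\/Qwen3-14B\",\n    \"messages\": &#91;{\"role\": \"user\", \"content\": \"Say hello in five words.\"}],\n    \"max_tokens\": 64\n  }' | jq -r '.choices&#91;0].message.content'<\/code><\/pre>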
\n\n\n\n<p>To deploy the model as a service, I created the following <code>systemd<\/code> unit file in <code>\/etc\/systemd\/system<\/code> called <code>vllm.service<\/code>. This way, vLLM will automatically start when the host is rebooted.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>&#91;Unit]\nDescription=vLLM OpenAI Compatible Server\nAfter=network.target\n\n&#91;Service]\n# User and Group to run the service as (e.g., 'youruser', 'yourgroup')\nUser=kang\nGroup=kang\n# Set the working directory\nWorkingDirectory=\/home\/kang\/py_vllm\nEnvironment=VLLM_USE_V1=1\n\n# The command to start the vLLM server\nExecStart=\/home\/kang\/py_vllm\/vllm_cuda13_env\/bin\/python -m vllm.entrypoints.openai.api_server --model Qwen\/Qwen3-14B --host 0.0.0.0 --port 8000 --tensor-parallel-size 2 --enable-auto-tool-choice --tool-call-parser hermes\n\n# Restart the service if it fails\nRestart=always\n\n&#91;Install]\nWantedBy=multi-user.target<\/code><\/pre>\n\n\n\n<p>I used <code>0.0.0.0<\/code> as the host so that any machine on the network can connect to the service. If you use <code>127.0.0.1<\/code>, only local sessions can connect.<\/p>\n\n\n\n<p>To enable the service, I ran the following:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\u276f sudo systemctl daemon-reload\n\u276f sudo systemctl enable vllm.service\n\u276f sudo systemctl start vllm.service<\/code><\/pre>\n\n\n\n<p>I also enabled tool calling (the <code>--enable-auto-tool-choice<\/code> and <code>--tool-call-parser<\/code> options above) for my <a href=\"https:\/\/opencode.ai\" target=\"_blank\" rel=\"noreferrer noopener\">Opencode.ai<\/a> experiments. vLLM ended up using all 48 GB of VRAM across both GPUs, for the Qwen LLM as well as for caching. Impressive!<\/p>
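\n\n\n\n<p>Because the server runs with <code>--enable-auto-tool-choice<\/code>, the API also accepts OpenAI-style tool definitions. Here is a minimal sketch of what a tool-calling request looks like; the <code>get_weather<\/code> function is a made-up example rather than a real tool on this server, and whether the model actually emits a tool call depends on the prompt.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\u276f curl http:\/\/localhost:8000\/v1\/chat\/completions \\\n  -H \"Content-Type: application\/json\" \\\n  -d '{\n    \"model\": \"Qwen\/Qwen3-14B\",\n    \"messages\": &#91;{\"role\": \"user\", \"content\": \"What is the weather in Toronto?\"}],\n    \"tools\": &#91;{\n      \"type\": \"function\",\n      \"function\": {\n        \"name\": \"get_weather\",\n        \"description\": \"Get the current weather for a city\",\n        \"parameters\": {\n          \"type\": \"object\",\n          \"properties\": {\"city\": {\"type\": \"string\"}},\n          \"required\": &#91;\"city\"]\n        }\n      }\n    }]\n  }' | jq '.choices&#91;0].message.tool_calls'<\/code><\/pre>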
\n\n\n\n<h2 class=\"wp-block-heading\">Installing Podman and Preparing for Quadlets<\/h2>\n\n\n\n<p>For everyday chats, I also configured a version of <a href=\"https:\/\/github.com\/ItzCrazyKns\/Perplexica\" target=\"_blank\" rel=\"noreferrer noopener\">Perplexica<\/a>. I chose to install this with Podman, specifically using a <a href=\"https:\/\/docs.podman.io\/en\/latest\/markdown\/podman-systemd.unit.5.html\" target=\"_blank\" rel=\"noreferrer noopener\">Podman Quadlet<\/a>. The idea is to run Perplexica under my user ID (<code>kang<\/code>) instead of running it as <code>root<\/code>. Our first step is to install Podman and prepare the user account for quadlets.<\/p>\n\n\n\n<p>Note that aside from explicit <code>sudo<\/code> references, all other commands are run as the user.<\/p>\n\n\n\n<p>Install Podman:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo apt install podman<\/code><\/pre>\n\n\n\n<p>Rootless containers need their own user and group ID ranges, so we map subordinate ID spaces to my user account.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo usermod --add-subuids 100000-170000 --add-subgids 100000-170000 ${USER}\n\ncat \/etc\/subuid\ncat \/etc\/subgid<\/code><\/pre>\n\n\n\n<p>The container needs an active user session even after a reboot, so we enable lingering for my account.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo loginctl enable-linger ${USER}<\/code><\/pre>\n\n\n\n<p>We also proactively increase the kernel keyring limit to avoid quota errors such as <code>\u201cDisk quota exceeded: OCI runtime error\u201d<\/code>, not just for this container but for any future containers as well.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>echo \"kernel.keys.maxkeys=1000\" | sudo tee -a \/etc\/sysctl.d\/custom.conf<\/code><\/pre>
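\n\n\n\n<p>Writing the file alone does not change the running kernel. To apply the new setting without a reboot, you can reload the sysctl configuration; <code>sysctl --system<\/code> re-reads all configuration files, including the one just written.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo sysctl --system\n\n# confirm the new limit took effect\nsysctl kernel.keys.maxkeys<\/code><\/pre>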
\n\n\n\n<p>Lastly, we need to prepare two directories for the containers. The first will house the <code>systemd<\/code> unit definitions of the containers. The second will act as local storage for the containers.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>mkdir -p $HOME\/.config\/containers\/systemd\nmkdir -p $HOME\/containers\/storage<\/code><\/pre>\n\n\n\n<p>If any previous containers are running, we would need to perform a system migrate. I did not do this, because I made sure no other Podman containers were running. You can also enable Podman&#8217;s auto-update feature. I did not do this either, as I prefer to update manually.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>podman system migrate\nsystemctl --user enable --now podman-auto-update<\/code><\/pre>\n\n\n\n<p>For more control over networking behaviour, we want to create our own container network. This will also help with DNS resolution. We create the network definition in <code>$HOME\/.config\/containers\/systemd\/${USER}.network<\/code>; be sure to replace the <code>${USER}<\/code> references below with the actual user account name.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>&#91;Unit]\nDescription=${USER} network\nAfter=podman-user-wait-network-online.service\n \n&#91;Network]\nNetworkName=${USER}\nSubnet=10.168.0.0\/24\nGateway=10.168.0.1\nDNS=192.168.168.198\n \n&#91;Install]\nWantedBy=default.target<\/code><\/pre>\n\n\n\n<p>We can then enable this network with the following commands:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>systemctl --user daemon-reload\nsystemctl --user start ${USER}-network\n\npodman network ls<\/code><\/pre>\n\n\n\n<p>The last command just verifies that the network is defined and visible to Podman.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Installing Perplexica<\/h2>\n\n\n\n<p>Now that our Quadlet environment for the user account is all prepared, we can proceed to install Perplexica.<\/p>\n\n\n\n<p>First, we create the two local directories that Perplexica will use.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>mkdir -p $HOME\/containers\/storage\/perplexica\/data\nmkdir -p $HOME\/containers\/storage\/perplexica\/uploads<\/code><\/pre>\n\n\n\n<p>We then define the container in <code>$HOME\/.config\/containers\/systemd\/perplexica.container<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>&#91;Unit]\nDescription=Perplexica\n\n&#91;Container]\nContainerName=perplexica\nImage=docker.io\/itzcrazykns1337\/perplexica:latest\nAutoUpdate=registry\n\nHealthCmd=curl http:\/\/localhost:3000\nHealthInterval=15m\nUserNS=keep-id:uid=1000,gid=1000\n\nNetwork=kang.network\nHostName=perplexica\nPublishPort=3000:3000\n\nVolume=%h\/containers\/storage\/perplexica\/data:\/home\/perplexica\/data\nVolume=%h\/containers\/storage\/perplexica\/uploads:\/home\/perplexica\/uploads\n\n&#91;Service]\nRestart=always\nTimeoutStartSec=300\n\n&#91;Install]\nWantedBy=default.target<\/code><\/pre>\n\n\n\n<p>Be sure to double-check that your account&#8217;s <code>uid<\/code> and <code>gid<\/code> are 1000. If not, replace the values above appropriately.<\/p>\n\n\n\n<p>Now we can start Perplexica.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>systemctl --user daemon-reload\nsystemctl --user start perplexica<\/code><\/pre>\n\n\n\n<p>Note that the above commands are run as the user account, not with <code>sudo<\/code> or as <code>root<\/code>; hence the <code>--user<\/code> option.<\/p>
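\n\n\n\n<p>Before pointing a browser at it, you can confirm that the container came up and that its health check (the <code>HealthCmd<\/code> defined above) is passing. This is just a quick sanity check:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># show the container name and its status, including health state\npodman ps --format \"{{.Names}} {{.Status}}\"\n\n# the Web UI should answer on port 3000\ncurl -sI http:\/\/localhost:3000 | head -n 1<\/code><\/pre>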
\n\n\n\n<p>Once the service is running, you can view its logs with the following:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>journalctl --user -u perplexica<\/code><\/pre>\n\n\n\n<p>You can also see all containers running as quadlets using:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>systemctl --user status<\/code><\/pre>\n\n\n\n<p>With Perplexica running, we can open its Web UI (<code>http:\/\/localhost:3000<\/code>) in a browser and point it to our vLLM instance by creating an <strong>OpenAI<\/strong> connection type.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"726\" height=\"538\" src=\"https:\/\/blog.lufamily.ca\/kang\/wp-content\/uploads\/sites\/3\/2025\/12\/Screenshot-2025-12-23-at-1.22.07-PM.png\" alt=\"\" class=\"wp-image-3317\" srcset=\"https:\/\/blog.lufamily.ca\/kang\/wp-content\/uploads\/sites\/3\/2025\/12\/Screenshot-2025-12-23-at-1.22.07-PM.png 726w, https:\/\/blog.lufamily.ca\/kang\/wp-content\/uploads\/sites\/3\/2025\/12\/Screenshot-2025-12-23-at-1.22.07-PM-300x222.png 300w\" sizes=\"auto, (max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 984px) 61vw, (max-width: 1362px) 45vw, 600px\" \/><\/figure>\n\n\n\n<p>Once the connection is established, you can proceed to add the Chat and Embedding Models. In our case, I used <code>Qwen\/Qwen3-14B<\/code> as the model key, which is the same as the model ID that vLLM is currently serving. The model name can be anything you assign.<\/p>\n\n\n\n<p>That is it! We now have a local chat service with Perplexica, and I can use the OpenAI-compatible API with vLLM.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"561\" src=\"https:\/\/blog.lufamily.ca\/kang\/wp-content\/uploads\/sites\/3\/2025\/12\/Screenshot-2025-12-23-at-1.31.25-PM-1024x561.png\" alt=\"\" class=\"wp-image-3319\" srcset=\"https:\/\/blog.lufamily.ca\/kang\/wp-content\/uploads\/sites\/3\/2025\/12\/Screenshot-2025-12-23-at-1.31.25-PM-1024x561.png 1024w, https:\/\/blog.lufamily.ca\/kang\/wp-content\/uploads\/sites\/3\/2025\/12\/Screenshot-2025-12-23-at-1.31.25-PM-300x164.png 300w, https:\/\/blog.lufamily.ca\/kang\/wp-content\/uploads\/sites\/3\/2025\/12\/Screenshot-2025-12-23-at-1.31.25-PM-768x421.png 768w, https:\/\/blog.lufamily.ca\/kang\/wp-content\/uploads\/sites\/3\/2025\/12\/Screenshot-2025-12-23-at-1.31.25-PM.png 1124w\" sizes=\"auto, (max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 1362px) 62vw, 840px\" \/><\/figure>
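\n\n\n\n<p>Other OpenAI-compatible tools can be pointed at this server the same way. Often all that is required is a base URL and a placeholder API key; for example, the official OpenAI SDKs read the following environment variables (the key value here is an arbitrary placeholder, since this vLLM instance was started without authentication):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>export OPENAI_BASE_URL=\"http:\/\/localhost:8000\/v1\"\nexport OPENAI_API_KEY=\"not-needed\"<\/code><\/pre>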
\n\n\n\n<p>Here is an example of using <code>curl<\/code> with the API:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\u276f curl -X POST \"http:\/\/localhost:8000\/v1\/responses\" \\\n  -H \"Content-Type: application\/json\" \\\n  -d '{\n    \"model\": \"Qwen\/Qwen3-14B\",\n    \"input\": \"How are you today?\"\n  }' | jq -r '.output&#91;0].content&#91;0].text'\n  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n                                 Dload  Upload   Total   Spent    Left  Speed\n100  1791  100  1721  100    70    448     18  0:00:03  0:00:03 --:--:--   466\n&lt;think>\nOkay, the user asked, \"How are you today?\" I need to respond appropriately. First, I should acknowledge their greeting and express that I'm doing well. Since I don't have feelings, I can't experience emotions, but I can simulate a friendly response. I should keep it positive and open-ended to encourage further conversation. Maybe add an emoji to keep it friendly. Also, I should invite them to ask questions or share something. Let me check if the response is natural and not too robotic. Avoid any technical jargon. Make sure it's concise but warm. Alright, that should work.\n&lt;\/think>\n\nI'm doing great, thank you! I'm always ready to chat and help out. How about you? \ud83d\ude0a What's on your mind today?<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>In a previous post, I mentioned that our AI server, which contained an old P40 GPU, had failed. We replaced the server with the following parts. Component Description CPU AMD Ryzen 9 9900X 4.4 GHz 12-Core Processor CPU Cooler Thermalright Peerless Assassin 120 SE 66.17 CFM CPU Cooler Motherboard Asus B650E MAX GAMING WIFI ATX AM5 Motherboard &hellip; <a href=\"https:\/\/blog.lufamily.ca\/kang\/2025\/12\/23\/new-ai-server\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;New AI Server&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[111],"tags":[142,165,141,196,28,195],"class_list":["post-3303","post","type-post","status-publish","format-standard","hentry","category-tech","tag-ai","tag-cuda","tag-llm","tag-perplexica","tag-technology","tag-vllm"],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p7V6i8-Rh","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/blog.lufamily.ca\/kang\/wp-json\/wp\/v2\/posts\/3303","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.lufamily.ca\/kang\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.lufamily.ca\/kang\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.lufamily.ca\/kang\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.lufamily.ca\/kang\/wp-json\/wp\/v2\/comments?post=3303"}],"version-history":[{"count":15,"href":"https:\/\/blog.lufamily.ca\/kang\/wp-json\/wp\/v2\/posts\/3303\/revisions"}],"predecessor-version":[{"id":3324,"href":"https:\/\/blog.lufamily.ca\/kang\/wp-json\/wp\/v2\/posts\/3303\/revisions\/3324"}],"wp:attachment":[{"href":"https:\/\/blog.lufamily.ca\/kang\/wp-json\/wp\/v2\/media?parent=3303"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.lufamily.ca\/kang\/wp-json\/wp\/v2\/categories?post=3303"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.lufamily.ca\/kang\/wp-json\/wp\/v2\/tags?post=3303"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}