<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.3.2">Jekyll</generator><link href="/blog/feed.xml" rel="self" type="application/atom+xml" /><link href="/blog/" rel="alternate" type="text/html" /><updated>2025-01-28T18:04:23+00:00</updated><id>/blog/feed.xml</id><title type="html">Make Art with Python</title><subtitle>Programming for Creative People</subtitle><author><name>Kirk Kaiser</name></author><entry><title type="html">Why Social Media has Captured Our Attention</title><link href="/blog/social-media-is-the-final-film/" rel="alternate" type="text/html" title="Why Social Media has Captured Our Attention" /><published>2024-12-31T00:00:00+00:00</published><updated>2024-12-31T00:00:00+00:00</updated><id>/blog/social-media-is-the-final-film</id><content type="html" xml:base="/blog/social-media-is-the-final-film/"><![CDATA[<!-- Include Mermaid library -->
<script src="https://cdnjs.cloudflare.com/ajax/libs/mermaid/10.6.1/mermaid.min.js"></script>

<style>
    .video-container {
      position: relative;
      width: 100%;
      max-width: 800px;
      margin: 20px 0;
    }
    .lazy-video {
      width: 100%;
      height: auto;
    }
  </style>

<script>
  // video-optimizer.js
class VideoOptimizer {
  constructor(options = {}) {
    this.options = {
      videoSelector: '.lazy-video',
      threshold: 0.1,
      rootMargin: '50px',
      ...options
    };
    
    this.videos = new Map(); // Store video elements and their observers
    this.init();
  }

  init() {
    // Find all video elements with the specified selector
    const videoElements = document.querySelectorAll(this.options.videoSelector);
    
    videoElements.forEach(video => {
      // Store original video source
      const videoSrc = video.getAttribute('data-src');
      if (!videoSrc) return;

      // Set up video element
      video.autoplay = true;
      video.muted = true;
      video.loop = true;
      video.playsInline = true;
      video.preload = 'metadata';

      // Create loading placeholder
      this.createPlaceholder(video);

      // Set up intersection observer
      this.observeVideo(video, videoSrc);
    });
  }

  createPlaceholder(video) {
    const placeholder = document.createElement('div');
    placeholder.className = 'video-placeholder';
    placeholder.style.cssText = `
      position: absolute;
      top: 0;
      left: 0;
      width: 100%;
      height: 100%;
      background-color: #f3f4f6;
      transition: opacity 0.3s ease;
    `;
    
    video.parentElement.style.position = 'relative';
    video.parentElement.insertBefore(placeholder, video);
    
    // Store placeholder reference
    video.placeholder = placeholder;
  }

  observeVideo(video, videoSrc) {
    const observer = new IntersectionObserver(
      entries => {
        entries.forEach(entry => {
          if (entry.isIntersecting) {
            this.loadVideo(video, videoSrc);
            observer.unobserve(video); // Stop observing once loaded
          }
        });
      },
      {
        threshold: this.options.threshold,
        rootMargin: this.options.rootMargin
      }
    );

    // Store observer reference
    this.videos.set(video, observer);
    observer.observe(video);
  }

  loadVideo(video, videoSrc) {
    // Create and add source element
    const source = document.createElement('source');
    source.src = videoSrc;
    source.type = 'video/mp4';
    video.appendChild(source);

    // Handle video loaded
    video.addEventListener('loadeddata', () => {
      if (video.placeholder) {
        video.placeholder.style.opacity = '0';
        setTimeout(() => {
          video.placeholder.remove();
          delete video.placeholder;
        }, 300);
      }
    });

    // Set up play/pause based on visibility
    const playbackObserver = new IntersectionObserver(
      entries => {
        entries.forEach(entry => {
          if (entry.isIntersecting) {
            video.play().catch(error => {
              console.warn('Autoplay failed:', error);
            });
          } else {
            video.pause();
          }
        });
      },
      { threshold: 0.1 }
    );

    playbackObserver.observe(video);
    this.videos.set(video, playbackObserver);
  }

  destroy() {
    // Clean up all observers
    this.videos.forEach((observer, video) => {
      observer.unobserve(video);
      observer.disconnect();
    });
    this.videos.clear();
  }
}
    // Initialize Mermaid to render any diagrams on the page.
    // (The VideoOptimizer itself is instantiated at the end of the post.)
    mermaid.initialize({ startOnLoad: true });
</script>

<h2 id="the-agency-eating-machine">The Agency Eating Machine</h2>

<div class="video-container">
	<video class="lazy-video" data-src="https://pub-f17786433d2849ff86c458a4019a0ed6.r2.dev/FilterAnimation.mp4"></video>
  </div>

<p>Despite a sensory information stream of 1 gigabit per second, humans are only capable of <a href="https://arxiv.org/abs/2408.10234">thinking at 10 bits per second</a>.</p>

<p>So even though we can see, feel, and experience a great many things at any moment, we can only consciously follow a single stream of thought at a time, and only at a rate of 10 bits per second.</p>

<p>This means every person’s daily budget for logical thinking is only around 576,000 bits of unique thought, assuming they’re awake for 16 hours out of the day, and in complete control of their consciousness.</p>

<p>At best, our maximum capacity for thought in a day adds up to little more than half a <em>millisecond’s</em> worth of the sensory information we receive.</p>
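<p>The arithmetic is simple enough to check in a few lines of Python, using the paper’s round numbers:</p>

<pre><code class="language-python"># Back-of-the-envelope numbers from "The Unbearable Slowness of Being"
SENSORY_BITS_PER_SECOND = 1_000_000_000  # ~1 gigabit/s of raw sensory input
THOUGHT_BITS_PER_SECOND = 10             # conscious, deliberate throughput
WAKING_SECONDS = 16 * 60 * 60            # a 16-hour waking day

daily_thought_budget = THOUGHT_BITS_PER_SECOND * WAKING_SECONDS
print(daily_thought_budget)  # 576000 bits of deliberate thought per day

# Expressed as an equivalent slice of sensory time:
print(daily_thought_budget / SENSORY_BITS_PER_SECOND)  # ~0.000576 seconds
</code></pre>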

<p>Focused, conscious attention is a very valuable, fragile, and limited resource.</p>

<p>But for most of my life, I’ve been expected to conjure it up on demand, every working day.</p>

<h2 id="how-knowledge-work-works">How Knowledge Work… Works</h2>

<div class="video-container">
	<video class="lazy-video" data-src="https://pub-f17786433d2849ff86c458a4019a0ed6.r2.dev/IdeaLoadingAnimation2.mp4"></video>
  </div>

<p>That’s because, for the past 20+ years I’ve written software systems.</p>

<p>This has proven challenging, as creating software requires summoning a special sort of focused attention that is exhausting to build and maintain, and very easy to lose.</p>

<p>Successful software development requires consistently blocking out distractions, mentally loading the problem, and systematically planning a solution.</p>

<p>Fortunately, in software you can usually tell very quickly whether or not your work has “solved” the problem.</p>

<p>But notably, without a <strong>minimum threshold of focused attention, it’s impossible to write any software</strong>.</p>

<p>Most other knowledge work is also like this. Without the ability to focus and load up problems into our mental framework, we can’t get anything done. Think of it as a hill. <strong>Without the ability to get over the hill of required initial attention, it’s impossible to get any actual knowledge work done.</strong></p>

<p>But, now that’s changing!</p>

<h2 id="enter-the-thinking-machines">Enter the Thinking Machines</h2>

<div class="video-container">
	<video class="lazy-video" data-src="https://pub-f17786433d2849ff86c458a4019a0ed6.r2.dev/curly.mp4"></video>
  </div>

<p>Because now, we’ve got a new alien intelligence to work with and against, called LLMs.</p>

<p>Despite being trained on massive, internet-scale datasets, LLMs are like us: they can only hold a finite set of symbols in mind while solving a given problem. We call these symbols “tokens”.</p>

<p>Each time we interact with them we have a limited window within which to explain and set the stage for our unique problems, using the language of their tokens. This is called a “context window”.</p>

<p>Each LLM is different, but for example, Claude has a context window of 200,000 tokens for Pro users.</p>

<p>This means if we want to extract answers and explore ideas with Claude, we must do so within a 200,000 token context window.</p>
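<p>For a rough feel of what fits, a common rule of thumb is about four characters of English text per token. The exact count depends on each model’s tokenizer, so treat this sketch as an estimate, not a measurement:</p>

<pre><code class="language-python">def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English prose.
    # Real tokenizers vary; use your provider's token-counting API
    # when an exact number matters.
    return len(text) // 4

CONTEXT_WINDOW = 200_000  # Claude's advertised window for Pro users

problem_description = "some long problem statement... " * 1_000  # stand-in text
used = estimate_tokens(problem_description)
print(f"~{used} tokens used, ~{CONTEXT_WINDOW - used} left for the conversation")
</code></pre>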

<h2 id="how-llms-load-context-for-your-problems">How LLMs Load Context for Your Problems</h2>

<p>This context for describing and navigating our problems is so critical to the performance of LLMs that Anthropic has released a whole new protocol, called “Model Context Protocol”, which allows developers to empower the LLM to autonomously decide when it needs to grab more context for any given query.</p>

<div class="video-container">
	<video class="lazy-video" data-src="https://pub-f17786433d2849ff86c458a4019a0ed6.r2.dev/curly-mcp.mp4"></video>
  </div>

<p>This is because LLM model weights are “frozen” in production. The LLM doesn’t know about, and can’t reason over, anything that has changed since it was trained.</p>

<p><a href="https://modelcontextprotocol.io/introduction">Model Context Protocol</a> lets us change that, by exposing <a href="https://modelcontextprotocol.io/docs/concepts/tools">Tools</a>, <a href="https://modelcontextprotocol.io/docs/concepts/prompts">Prompts</a>, and <a href="https://modelcontextprotocol.io/docs/concepts/resources">Resources</a>.</p>

<p>Models can use these to autonomously decide to search the web, access a database, or create new things like web pages, so long as each action and its results fit within the context window.</p>

<p>This makes the LLMs much more powerful at solving novel problems autonomously.</p>
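<p>To make this concrete, here’s a minimal sketch of an MCP server exposing a single Tool, using the official Python SDK’s FastMCP helper. The server name and tool are illustrative, not part of any real product:</p>

<pre><code class="language-python"># Minimal MCP server sketch; assumes the official `mcp` Python SDK.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("video-search")  # illustrative server name

@mcp.tool()
def search_videos(query: str) -> list[str]:
    """Return titles of videos matching the query (stubbed for this example)."""
    # A real implementation would hit a database or search index here.
    return [f"Placeholder result for '{query}'"]

if __name__ == "__main__":
    mcp.run()  # the client model decides when to call the tool
</code></pre>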

<h2 id="the-human-model-context-protocol-of-media">The Human Model Context Protocol of Media</h2>

<div class="video-container">
	<video class="lazy-video" data-src="https://pub-f17786433d2849ff86c458a4019a0ed6.r2.dev/MultipleRadarCharts.mp4"></video>
  </div>

<p>Similarly, we humans generally rely on media platforms to load new context into our own thinking spaces.</p>

<p>What we read, see, and watch largely explains the context we have for understanding and relating to the world.</p>

<p>So how do we decide what we load into our own personal context?</p>

<p>In 1976, a man named <a href="https://en.wikipedia.org/wiki/Schramm%27s_model_of_communication">Wilbur Schramm</a> created a theory for how people decide which media to consume:</p>

<blockquote>
  <p>Expected value divided by the required effort.</p>
</blockquote>
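<p>Expressed as code, Schramm’s “fraction of selection” is a one-liner. A medium wins our attention whenever its ratio beats the alternatives (the numbers below are invented, purely for illustration):</p>

<pre><code class="language-python">def fraction_of_selection(expected_reward: float, required_effort: float) -> float:
    # Schramm: we pick the medium with the best reward-to-effort ratio.
    return expected_reward / required_effort

# Toy comparison with made-up scores:
tv = fraction_of_selection(expected_reward=7.0, required_effort=5.0)           # 1.4
short_video = fraction_of_selection(expected_reward=6.0, required_effort=0.5)  # 12.0
</code></pre>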

<p>At the time, Schramm saw television as a promising new medium.</p>

<p>Television could potentially educate an entire population at once, delivering sensory-rich content (critical for effective learning) at a fixed production cost.</p>

<p>This content could be built by world experts, designed to have maximum educational impact, and shape a shared social context and value system for an entire population at a time.</p>

<p>Programs like <a href="https://en.wikipedia.org/wiki/Sesame_Street">Sesame Street</a> tried to achieve this goal, and educate the young population.</p>

<p>This dream came with a trade-off: the viewers’ focused attention for 30-minute blocks, interrupted by commercials, in exchange for polished entertainment.</p>

<p>For decades, this agreement worked.</p>

<p>But then social media arrived, and the math on our collective attention spans changed.</p>

<h2 id="attention-at-30-seconds-or-less">Attention at 30 Seconds or Less</h2>

<div class="video-container">
	<video class="lazy-video" data-src="https://pub-f17786433d2849ff86c458a4019a0ed6.r2.dev/media-decision1.mp4"></video>
  </div>

<p>Where television asked for thirty-minute units of our attention, TikTok and Instagram demanded just thirty seconds for a novel experience.</p>

<p>Better still, once we’re on these platforms, <em>they eliminate the need for conscious choice.</em> Their algorithms can usually find a better piece of content than we’d be able to find ourselves, without any conscious effort on our end.</p>

<p>This might seem like a simple shift in duration and cost of decision, but it fundamentally breaks Schramm’s ratio and lowers our ability to choose media experiences outside of these algorithms.</p>

<p>Traditional media has to create shows and films that target a very large audience, so that the cost of creation pays itself back. Social media flipped this approach, making the audience fund its own entertainment, and allowing it to discover and create far more niche content than was previously feasible in mass media.</p>

<p>When the required effort of a media platform approaches zero, and content is infinitely personalized, the other media choices become even less relevant.</p>

<p>Why invest attention elsewhere, for a lower possible reward on average?</p>

<p>TikTok’s unique breakthrough, steering recommendations by whether or not we swipe “next” on a video, has produced an algorithm that now has the average teenager spending 3.5 hours per day locked in a feed.</p>

<p>(For reference, teenage television viewership peaked in 1995 at around 2.95 hours per day.)</p>
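<p>A toy model of that feedback loop, purely illustrative and nothing like TikTok’s actual system, shows the shape of it: every watch or skip nudges a preference vector, with no conscious choice required:</p>

<pre><code class="language-python">import numpy as np

def update_preferences(prefs: np.ndarray, video_embedding: np.ndarray,
                       watched: bool, lr: float = 0.1) -> np.ndarray:
    """Nudge the preference vector toward watched videos, away from skips."""
    direction = 1.0 if watched else -1.0
    prefs = prefs + lr * direction * (video_embedding - prefs)
    return prefs / np.linalg.norm(prefs)  # keep it unit length

def next_video(prefs: np.ndarray, candidates: np.ndarray) -> int:
    """Serve whichever candidate scores highest against current preferences."""
    return int(np.argmax(candidates @ prefs))
</code></pre>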

<p>Of course, we should mention there are positive associated benefits:</p>

<blockquote>
  <p>A majority of adolescents report that social media helps them feel more accepted (58%), like they have people who can support them through tough times (67%), like they have a place to show their creative side (71%), and more connected to what’s going on in their friends’ lives (80%)</p>
</blockquote>

<p>But there’s also something fundamentally different about how social media affects their mental health.</p>

<p>Around ⅓ of teens report using social media “almost constantly”, and:</p>

<blockquote>
  <p>adolescents who spent more than 3 hours per day on social media faced double the risk of experiencing poor mental health outcomes including symptoms of depression and anxiety</p>
</blockquote>

<p>(There’s plenty more from the <a href="https://www.hhs.gov/sites/default/files/sg-youth-mental-health-social-media-advisory.pdf">Youth Mental Health Social Media Advisory</a>.)</p>

<p>So what could be responsible for this rapid increase in misery and isolation?</p>

<h2 id="schramms-model-of-how-we-communicate-with-each-other">Schramm’s Model of How we Communicate With Each Other</h2>

<div class="video-container">
	<video class="lazy-video" data-src="https://pub-f17786433d2849ff86c458a4019a0ed6.r2.dev/communicate.mp4"></video>
  </div>

<p>To answer this, we can look at another model Schramm created for explaining <a href="https://en.wikipedia.org/wiki/Schramm%27s_model_of_communication">how communication occurs between people</a>.</p>

<p>Critically, when we want to send a message to someone, we must first encode it, using our shared experience as a medium.</p>

<p>Platforms like television and books give us a common set of myths, used as a shorthand to tell our inner stories, and relate to one another.</p>

<p>But through this lens, social media seems to be a <strong>shared experience lowering tool</strong>.</p>

<p>Social media is an isolating system by design, as the feeds become increasingly personalized. The ideal feed shrinks to fit our unique needs, minimizing the size of our shared collective experience and pushing each of us into isolated, self-reinforcing idea bubbles.</p>

<p>Social media customizes itself using our non-verbal, non-thinking cues. Again, these happen quicker than our verbal, conscious thinking process. They continuously steer us further away from a shared cultural reference point, and further split us into a set of feedback loops that induce emotional responses and increase stickiness.</p>

<p>Calling the current social feeds “algorithms” doesn’t really cut it anymore. The recommender systems being built are advanced forms of artificial intelligence, trained to addict us in exchange for the opportunity to sell some ad space.</p>

<h2 id="better-living-through-llms">Better Living Through LLMs</h2>

<div class="video-container">
	<video class="lazy-video" data-src="https://pub-f17786433d2849ff86c458a4019a0ed6.r2.dev/ContextWindowAnimation.mp4"></video>
  </div>

<p>Because once again, AI has an unequal footing with our biological systems. We can only process deliberate thoughts at our 10 bits per second, and only when we give them our full attention, while the software systems built around us operate at a much faster pace.</p>

<p>LLMs and future media could increasingly short circuit our more basic processes and turn us into media addicted zombies if left unsupervised.</p>

<p>We’ve all seen people in ideal sensory zones– the beach, concerts, on dates… choosing to be on their phone, rather than in their current physical space.</p>

<p>But what if we wanted to build a media platform that challenged us, challenged each other to be better, instead?</p>

<p>What might that look like? How might we be pulled away from the endless loop of personal gratification and distraction addiction?</p>

<p>How could we take the positive things from social media, and add them to the emerging, lower cost of thinking powers that LLMs give us?</p>

<h2 id="llms-are-already-a-new-media-platform">LLMs are Already a New Media Platform</h2>

<iframe width="560" height="315" src="https://www.youtube.com/embed/hjEAOlaauT0?si=jOmA-LgrwRGxsobG" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen=""></iframe>

<p><br /></p>

<p>For me, the video above shows social media at its best: people working together to create something new, inviting each of us to participate in something absurd and fun.</p>

<p>Although we’re still in the early innings with LLMs, they do seem to be actively rewriting large portions of how we interact with and build new media.</p>

<p>We can see this when we ask LLMs to do work for us that seems tedious, or when we feel stuck on a problem. They lower the costs of asking a “dumb” question to zero.</p>

<p>This means that instead of suffering the usual anxiety of avoiding a problem, we can ask an LLM to take a first stab at it. We can then step in to correct the places that feel wrong.</p>

<p>And LLMs at their best allow us to be more ambitious in our work, writing software we might not have otherwise written, or exploring ideas that would otherwise be inaccessible.</p>

<p><strong>LLMs lower the cost of thinking new ideas.</strong></p>

<h2 id="so-whats-the-problem">So what’s the problem?</h2>

<p>LLMs by default have the same social problems that we see with social media.</p>

<p>They fundamentally shrink our shared context.</p>

<p>Added to this, they have a knowledge cutoff date, and are currently “frozen” in time.</p>

<p>Some models can incorporate search results at prompt time, but they cannot “learn” from their conversations with you in real time, and they cannot “learn” from the current cultural climate.</p>

<p>But they <em>have made</em> the process of <strong>exploring and linking new ideas together cheaper</strong>. LLMs allow us to explore new ideas with a fundamentally lower cost of required attention investment. We can ask an LLM to link together two disparate ideas, have them translated into our knowledge domain, and see whether the initial results match what we’d hoped for.</p>

<p><strong>If we give LLMs the ability to better fill up their context windows with information about us, and about the people we care about, they don’t have to be socially isolating.</strong></p>

<p>This will allow us to collectively think a lot of new thoughts that previously would have had too high of an attention cost to explore effectively, while minimizing the social costs.</p>

<p>I dream of making the tool that would combine the fun of the video above with the power and flexibility of LLMs to explore new ideas. In fact, that’s what I’ve been working on.</p>

<h2 id="drawing-the-owl-of-the-next-media-platform">Drawing the Owl of the Next Media Platform</h2>

<iframe width="560" height="315" src="https://www.youtube.com/embed/bEDA9lkkGVM?si=1YFhvQJNc_oFE8N-" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen=""></iframe>

<p><br /></p>

<p>Can a better media platform emerge from the best parts of social media and LLMs?</p>

<p>I’ve spent the past year and a half focused on this very idea, trying to come up with something that could be fundamentally better.</p>

<p>One of the last <a href="https://finance.yahoo.com/quote/ILLR/">social media platforms</a> I built started with video editing.</p>

<p>It’s become apparent that using LLMs to search through piles of video for us is useful. We can use Model Context Protocol and custom Tools to suss out the interesting bits, or to find every moment when something related to our ideas happens.</p>

<p>We can also isolate objects within video now. Combined with the idea exploration LLMs are so good at, this makes for a much more dynamic, interactive, experimental platform.</p>

<p>I’ve started to put these ideas together, but it’s still incredibly early, with a lot of unanswered questions.</p>

<p>For example, what is the LLM medium equivalent of the TikTok recommendation algorithm for ideas?</p>

<p>On the day I published this post, almost every top app in the app store was an LLM chat tool or a media platform. (Temu and a VPN app were the only two non-media apps.)</p>

<p><strong>LLMs are already a new media platform, as proven by their spots in the app store. The question is how we will decide to use them going forward.</strong></p>

<p>If you’re interested in an alternative, you can sign up for the waitlist for some new tools at <a href="https://www.video-jungle.com/">video-jungle.com</a>.</p>

<p>You can also reach out, and follow me on the existing media platforms. Because they are where everyone is, for now.</p>

<p>In the meantime, in order to reach people we will need to continue to perform, speak, and act in the ways the addiction algorithms reward.</p>

<p>Until we don’t.</p>

<p><em>Thanks to Erica Dohring for reading an early version of this blog post.</em></p>

<p><em>All the animations on this blog post were made with the help of Claude and Manim, and the source code for them lives on <a href="https://github.com/burningion/manim-animations/">Github</a>.</em></p>

<script>
  	const optimizer = new VideoOptimizer({
  	videoSelector: '.lazy-video',
  	threshold: 0.1,
  	rootMargin: '100px'
	});
  </script>]]></content><author><name>Kirk Kaiser</name></author><summary type="html"><![CDATA[(And what we might be able to do about it with the help of LLMs)]]></summary></entry><entry><title type="html">What I’ve Learned in the Past Year Spent Building an AI Video Editor</title><link href="/blog/a-year-of-showing-up/" rel="alternate" type="text/html" title="What I’ve Learned in the Past Year Spent Building an AI Video Editor" /><published>2024-08-27T00:00:00+00:00</published><updated>2024-08-27T00:00:00+00:00</updated><id>/blog/a-year-of-showing-up</id><content type="html" xml:base="/blog/a-year-of-showing-up/"><![CDATA[<h2 id="an-unexpected-year-spent-in-ai">An Unexpected Year Spent in AI</h2>

<p><a href="https://github.com/burningion/"><img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/d393e5a5-3cbc-47b8-9f17-90ebf3f21300/webscale" alt="Github contributions for the past year" /></a></p>

<p>Last year I was let go after just 6 months in a new role.</p>

<p>I had left a great company and boss to take a chance on a startup, and before I’d even begun, it was over.</p>

<p>I decided to take the event as an opportunity, and explore what was now becoming possible in video with LLMs, Diffusion models, and the growing number of other open models.</p>

<p>See, years ago I’d helped build a <a href="https://en.wikipedia.org/wiki/Triller_(app)#Launch_and_Early_years">generative video editor that became a unicorn</a>, and had ideas from then I’d wanted to see built.</p>

<p>These ideas were mostly unreasonable back in 2015, but given LLM and computer vision model progress, were now becoming possible.</p>

<h2 id="the-gpu-crunch-and-local-first-multi-modal-generative-ai">The GPU Crunch and Local First, Multi-Modal Generative AI</h2>

<p><a href="https://www.makeartwithpython.com/blog/building-an-ai-video-editor/"><img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/5fd535b2-841a-41a6-6ae0-4161706b0a00/webscale" alt="Local Editor" /></a></p>

<p>So I initially focused on building a local video editor improved with multi-modal artificial intelligence. It used <a href="https://segment-anything.com">computer vision</a> to detect, extract, and track objects in video, combined with Diffusion models to add and animate new objects into videos.</p>

<p>Five years earlier, I’d done daily video sketches using <a href="https://github.com/matterport/Mask_RCNN">Mask-RCNN</a>, experimenting with skateboard videos:</p>

<center><blockquote class="instagram-media" data-instgrm-permalink="https://www.instagram.com/p/BgIPFGsleyT/?utm_source=ig_embed&amp;utm_campaign=loading" data-instgrm-version="14" style=" background:#FFF; border:0; border-radius:3px; box-shadow:0 0 1px 0 rgba(0,0,0,0.5),0 1px 10px 0 rgba(0,0,0,0.15); margin: 1px; max-width:540px; min-width:326px; padding:0; width:99.375%; width:-webkit-calc(100% - 2px); width:calc(100% - 2px);"><div style="padding:16px;"> <a href="https://www.instagram.com/p/BgIPFGsleyT/?utm_source=ig_embed&amp;utm_campaign=loading" style=" background:#FFFFFF; line-height:0; padding:0 0; text-align:center; text-decoration:none; width:100%;" target="_blank"> <div style=" display: flex; flex-direction: row; align-items: center;"> <div style="background-color: #F4F4F4; border-radius: 50%; flex-grow: 0; height: 40px; margin-right: 14px; width: 40px;"></div> <div style="display: flex; flex-direction: column; flex-grow: 1; justify-content: center;"> <div style=" background-color: #F4F4F4; border-radius: 4px; flex-grow: 0; height: 14px; margin-bottom: 6px; width: 100px;"></div> <div style=" background-color: #F4F4F4; border-radius: 4px; flex-grow: 0; height: 14px; width: 60px;"></div></div></div><div style="padding: 19% 0;"></div> <div style="display:block; height:50px; margin:0 auto 12px; width:50px;"><svg width="50px" height="50px" viewBox="0 0 60 60" version="1.1" xmlns="https://www.w3.org/2000/svg" xmlns:xlink="https://www.w3.org/1999/xlink"><g stroke="none" stroke-width="1" fill="none" fill-rule="evenodd"><g transform="translate(-511.000000, -20.000000)" fill="#000000"><g><path d="M556.869,30.41 C554.814,30.41 553.148,32.076 553.148,34.131 C553.148,36.186 554.814,37.852 556.869,37.852 C558.924,37.852 560.59,36.186 560.59,34.131 C560.59,32.076 558.924,30.41 556.869,30.41 M541,60.657 C535.114,60.657 530.342,55.887 530.342,50 C530.342,44.114 535.114,39.342 541,39.342 C546.887,39.342 551.658,44.114 551.658,50 C551.658,55.887 546.887,60.657 541,60.657 M541,33.886 C532.1,33.886 524.886,41.1 524.886,50 C524.886,58.899 532.1,66.113 541,66.113 C549.9,66.113 557.115,58.899 557.115,50 C557.115,41.1 549.9,33.886 541,33.886 M565.378,62.101 C565.244,65.022 564.756,66.606 564.346,67.663 C563.803,69.06 563.154,70.057 562.106,71.106 C561.058,72.155 560.06,72.803 558.662,73.347 C557.607,73.757 556.021,74.244 553.102,74.378 C549.944,74.521 548.997,74.552 541,74.552 C533.003,74.552 532.056,74.521 528.898,74.378 C525.979,74.244 524.393,73.757 523.338,73.347 C521.94,72.803 520.942,72.155 519.894,71.106 C518.846,70.057 518.197,69.06 517.654,67.663 C517.244,66.606 516.755,65.022 516.623,62.101 C516.479,58.943 516.448,57.996 516.448,50 C516.448,42.003 516.479,41.056 516.623,37.899 C516.755,34.978 517.244,33.391 517.654,32.338 C518.197,30.938 518.846,29.942 519.894,28.894 C520.942,27.846 521.94,27.196 523.338,26.654 C524.393,26.244 525.979,25.756 528.898,25.623 C532.057,25.479 533.004,25.448 541,25.448 C548.997,25.448 549.943,25.479 553.102,25.623 C556.021,25.756 557.607,26.244 558.662,26.654 C560.06,27.196 561.058,27.846 562.106,28.894 C563.154,29.942 563.803,30.938 564.346,32.338 C564.756,33.391 565.244,34.978 565.378,37.899 C565.522,41.056 565.552,42.003 565.552,50 C565.552,57.996 565.522,58.943 565.378,62.101 M570.82,37.631 C570.674,34.438 570.167,32.258 569.425,30.349 C568.659,28.377 567.633,26.702 565.965,25.035 C564.297,23.368 562.623,22.342 560.652,21.575 C558.743,20.834 556.562,20.326 553.369,20.18 C550.169,20.033 549.148,20 541,20 C532.853,20 531.831,20.033 528.631,20.18 C525.438,20.326 
523.257,20.834 521.349,21.575 C519.376,22.342 517.703,23.368 516.035,25.035 C514.368,26.702 513.342,28.377 512.574,30.349 C511.834,32.258 511.326,34.438 511.181,37.631 C511.035,40.831 511,41.851 511,50 C511,58.147 511.035,59.17 511.181,62.369 C511.326,65.562 511.834,67.743 512.574,69.651 C513.342,71.625 514.368,73.296 516.035,74.965 C517.703,76.634 519.376,77.658 521.349,78.425 C523.257,79.167 525.438,79.673 528.631,79.82 C531.831,79.965 532.853,80.001 541,80.001 C549.148,80.001 550.169,79.965 553.369,79.82 C556.562,79.673 558.743,79.167 560.652,78.425 C562.623,77.658 564.297,76.634 565.965,74.965 C567.633,73.296 568.659,71.625 569.425,69.651 C570.167,67.743 570.674,65.562 570.82,62.369 C570.966,59.17 571,58.147 571,50 C571,41.851 570.966,40.831 570.82,37.631"></path></g></g></g></svg></div><div style="padding-top: 8px;"> <div style=" color:#3897f0; font-family:Arial,sans-serif; font-size:14px; font-style:normal; font-weight:550; line-height:18px;">View this post on Instagram</div></div><div style="padding: 12.5% 0;"></div> <div style="display: flex; flex-direction: row; margin-bottom: 14px; align-items: center;"><div> <div style="background-color: #F4F4F4; border-radius: 50%; height: 12.5px; width: 12.5px; transform: translateX(0px) translateY(7px);"></div> <div style="background-color: #F4F4F4; height: 12.5px; transform: rotate(-45deg) translateX(3px) translateY(1px); width: 12.5px; flex-grow: 0; margin-right: 14px; margin-left: 2px;"></div> <div style="background-color: #F4F4F4; border-radius: 50%; height: 12.5px; width: 12.5px; transform: translateX(9px) translateY(-18px);"></div></div><div style="margin-left: 8px;"> <div style=" background-color: #F4F4F4; border-radius: 50%; flex-grow: 0; height: 20px; width: 20px;"></div> <div style=" width: 0; height: 0; border-top: 2px solid transparent; border-left: 6px solid #f4f4f4; border-bottom: 2px solid transparent; transform: translateX(16px) translateY(-4px) rotate(30deg)"></div></div><div style="margin-left: auto;"> <div style=" width: 0px; border-top: 8px solid #F4F4F4; border-right: 8px solid transparent; transform: translateY(16px);"></div> <div style=" background-color: #F4F4F4; flex-grow: 0; height: 12px; width: 16px; transform: translateY(-4px);"></div> <div style=" width: 0; height: 0; border-top: 8px solid #F4F4F4; border-left: 8px solid transparent; transform: translateY(-4px) translateX(8px);"></div></div></div> <div style="display: flex; flex-direction: column; flex-grow: 1; justify-content: center; margin-bottom: 24px;"> <div style=" background-color: #F4F4F4; border-radius: 4px; flex-grow: 0; height: 14px; margin-bottom: 6px; width: 224px;"></div> <div style=" background-color: #F4F4F4; border-radius: 4px; flex-grow: 0; height: 14px; width: 144px;"></div></div></a><p style=" color:#c9c8cd; font-family:Arial,sans-serif; font-size:14px; line-height:17px; margin-bottom:0; margin-top:8px; overflow:hidden; padding:8px 0 7px; text-align:center; text-overflow:ellipsis; white-space:nowrap;"><a href="https://www.instagram.com/p/BgIPFGsleyT/?utm_source=ig_embed&amp;utm_campaign=loading" style=" color:#c9c8cd; font-family:Arial,sans-serif; font-size:14px; font-style:normal; font-weight:normal; line-height:17px; text-decoration:none;" target="_blank">A post shared by Kirk Kaiser (@zothcorp)</a></p></div></blockquote> <script async="" src="//www.instagram.com/embed.js"></script></center>

<p>These video sketches allowed me to explore the medium of AI-assisted video editing, without any strong expectations.</p>

<p>I assumed building a tool to continue exploring this work would prove fruitful:</p>

<p><img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/b95dc1ca-e677-4068-c588-8e13b0ae0200/public" alt="The Editor at Work" /></p>

<p>And indeed it did! I was soon playing with video as a more fluid medium, one that felt a bit more editable. I began to understand how the new vision models worked, and how the GPU could be used to speed up rendering, inference, and video.</p>

<p>By using a combination of models, I was able to create a prompt for adding unique, diffusion-generated objects into videos, already masked off.</p>

<p>You can read about <a href="https://www.makeartwithpython.com/blog/building-an-ai-video-editor/">that process</a> in a previous blog post.</p>

<h2 id="a-sidequest-for-safety-not-the-ai-kind">A Sidequest for Safety (Not the AI Kind)</h2>

<p><a href="https://makeartwithpython.com/blog/submitting-sbir-application/"><img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/fb3853a3-f361-4d15-0b01-da8a53768b00/webscale" alt="Bicyclist Safety" /></a></p>

<p>But as I was building out computer vision pipelines and prototypes for the video editor, I experienced a string of tragic local deaths.</p>

<p>Bicyclists and pedestrians kept getting hit by cars.</p>

<p>So as a side project, I started researching cyclist safety, and soon discovered just how terrible the pedestrian infrastructure is in the United States. I wondered if there might be a technical solution to reduce or eliminate these deaths, as the statistics showed they were rapidly increasing.</p>

<p>So on a whim I put together a proposal to address this using artificial intelligence and robotics, and submitted it to the <a href="https://www.sbir.gov/">NSF’s SBIR program</a>.</p>

<p>To my surprise, they invited me to submit a formal, <a href="https://new.nsf.gov/funding/opportunities/nsf-small-business-innovation-research-small-0">Phase I proposal</a>.</p>

<p>If accepted, this meant I could get up to $2 million to pursue and develop my technology, <em>without</em> the government taking any equity.</p>

<p><strong>So of course, I did that.</strong></p>

<p>It took a <a href="https://www.makeartwithpython.com/blog/submitting-sbir-application/">few months worth of work</a>, and brought me far out of my comfort zone.</p>

<p>But! One of the conditions of submitting my proposal was that I had to pause all of my open source work related to the project, as the government couldn’t give me a grant for work already done.</p>

<p>This frustrated me, as pedestrians continued to die in my town. I felt guilty for each additional person who got hit.</p>

<p>As a consolation, since I’d already done the tedious paperwork to form and qualify a company to accept government grants like the SBIR (via <a href="https://sam.gov/content/home">SAM.gov</a>), most of the labor to create another proposal was already taken care of.</p>

<p>So I also submitted an SBIR proposal to the <a href="https://its.dot.gov/csai/">Department of Transportation for Complete Streets AI</a>.</p>

<p>This proposal imagined using smartphones and computer vision to help fill in gaps of pedestrian infrastructure knowledge at the DOT.</p>

<p>6+ months later, I found out the final answer from both the NSF and DOT:</p>

<p><img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/1b609042-4aa5-4958-0cb9-42c85986a300/public" alt="Being Rated by Anonymous Reviewers Sucks." /></p>

<p><strong>“No.”</strong></p>

<p>(On a positive note, this means I’m once again able to be <a href="https://github.com/burningion/bicyclist-defense-jetson/">public</a> about this work, and solicit help.)</p>

<h2 id="taking-a-step-back-from-the-obvious">Taking a Step Back From the Obvious</h2>

<p>And after six months of working on the local video editor, I also hit a wall.</p>

<p><a href="https://www.magiceye.com/stwkdisp.htm"><img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/b7a79f9b-3506-420d-d76f-3334ebceb600/public" alt="If you Stare into a Screen Long Enough, You can See the Future" /></a></p>

<p>It became apparent that AI as a layer “on top of” the existing video editor workflows didn’t make much sense, given how different ML workflows incorporating large language models had become, and how much engineering had gone into <em>everything else</em> around modern, flagship desktop video editors.</p>

<p>More powerful vision and audio models could of course be used to add features that reduce toil in existing workflows, but the underlying assumptions behind the user interface of the video editors seemed to be constraining the discovery of potential new methods for video creation, <em>and more importantly, the evolution of video as a medium</em>.</p>

<p>It seemed the process of video creation itself had to be rethought, using the power and possibility of LLMs, multi-modal embeddings / search, and computer vision / diffusion models as a collaborator.</p>

<p>Which led me to a thought:</p>

<p><strong>What if video was more personal? More malleable? More collaborative?</strong></p>

<p>This meant rethinking current video editing workflows according to the strengths of these models.</p>

<h2 id="building-a-generative-video-platform">Building A Generative Video Platform</h2>

<p>So I went back to the drawing board, and rethought the whole concept of a video editor. When I did, a thought occurred to me:</p>

<blockquote>
  <p>What if, instead of a <em>single</em> video coming out of the video editing process, we had a <strong>video generator</strong> come out of it?</p>
</blockquote>

<p>What if instead of a single, static artifact at the end of editing, we had a video generator, capable of rendering each video tailored to the viewer? Created on demand, according to the specific needs of the viewer?</p>

<p><img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/5bc9c33d-5504-45fa-dae2-65255f986f00/webscale" alt="Generative Video Pipeline" /></p>

<p>What if we allowed for the user to collaborate in the experience of their video?</p>

<p>I imagined videos that would no longer be static outputs, but would instead be like code: dynamic, and generated for a specific viewer or audience. Video would become a medium for play and interaction, rather than the typical, passive consumption model.</p>

<h2 id="creating-a-dynamic-video-generation-pipeline-with-promptflow">Creating a Dynamic Video Generation Pipeline with Promptflow</h2>

<p>With that, I started building a new prototype of generative video using Microsoft’s LLM framework <a href="https://github.com/microsoft/promptflow">Promptflow</a>. It allows mixing calls to an LLM with code, building up whole graph-based pipelines for generative AI workflows.</p>

<p><a href="https://microsoft.github.io/promptflow/concepts/concept-flows.html"><img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/4815de71-3fd2-4fbb-d981-b70f2d94bf00/webscale" alt="Promptflow DAG" /></a></p>

<p>You define these workflows via a <code class="language-plaintext highlighter-rouge">yaml</code> file, in which you declare the variables you’d like passed into your prompts. The results from these prompts can then be passed back to Python, or used to generate more LLM calls.</p>

<p>With this tool as a basis, I built an initial horoscope video generator, using the most basic approach to generative video. It took a prompt, injected the user’s variables, called out to an LLM to generate a video script, and then generated images, transformed them into videos, added a voice narrator and subtitles, and finally put together a video edit.</p>

<p>The flow looks something like this:</p>

<p><img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/64f92c17-3a60-4c38-4b72-261dd17d1900/webscale" alt="Early Promptflow Prototype" /></p>

<p>In this generator, the Video Generator takes in an <strong>Astrological Sign</strong>, a <strong>Date</strong>, and a <strong>Random Seed</strong>. (LLMs are known to have <a href="https://www.alignmentforum.org/posts/t9svvNPNmFf5Qa3TA/mysteries-of-mode-collapse">difficulty</a> in generating <a href="https://people.csail.mit.edu/renda/llm-sampling-paper">random numbers</a>.)</p>

<p>These are all used to run a pipeline that generates a <strong>unique, on demand video horoscope reading for the user</strong>.</p>
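<p>Stripped of the Promptflow wrapper, the shape of that pipeline is roughly the sketch below. Every helper function is a stand-in stub for a real tool (LLM call, diffusion model, text-to-speech, editor), not the production code:</p>

<pre><code class="language-python"># Pseudocode sketch of the horoscope generator's shape; all helpers are stubs.

def call_llm(prompt: str) -> str:
    return f"A horoscope script for: {prompt}"  # stub: would call a real LLM

def split_into_scenes(script: str) -> list[str]:
    return script.split(". ")                   # stub: one prompt per shot

def generate_image(scene: str) -> str:
    return f"{scene[:20]}.png"                  # stub: diffusion model output

def image_to_clip(image: str) -> str:
    return image.replace(".png", ".mp4")        # stub: animate the still

def synthesize_narration(script: str) -> str:
    return "narration.wav"                      # stub: TTS pass

def make_subtitles(script: str) -> str:
    return "subtitles.srt"                      # stub

def assemble_edit(clips: list[str], voice: str, subs: str) -> str:
    return "final_render.mp4"                   # stub: the editing step

def generate_horoscope_video(sign: str, date: str, seed: int) -> str:
    script = call_llm(f"Write a {sign} horoscope for {date}. Style seed: {seed}")
    clips = [image_to_clip(generate_image(s)) for s in split_into_scenes(script)]
    return assemble_edit(clips, synthesize_narration(script), make_subtitles(script))
</code></pre>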

<p>From this initial prototype, I immediately ran into a few limitations.</p>

<p>The Promptflow design expected users to build wrappers on a service like ChatGPT, with mostly static flows. Think of things like customer service bots, using RAG to fill in dynamic information necessary to answer a query.</p>

<p>This didn’t match the level of dynamic video generation and editing processes I envisioned getting to. The design and writing of these static flows didn’t feel like the right layer of abstraction.</p>

<p><strong>So I switched my Promptflow over to a different workflow execution engine, <a href="https://temporal.io">Temporal</a>.</strong></p>

<h2 id="generative-workflows-with-temporal">Generative Workflows with Temporal</h2>

<p>Temporal allowed me to restructure the Generative processes I was building with <a href="https://temporal.io/blog/building-reliable-distributed-systems-in-node-js-part-2">durable execution</a> as a primitive.</p>

<p><img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/95d1c046-0f45-452f-866e-4d2703b00b00/webscale" alt="Workflow of Video Analysis" /></p>

<p>Rather than building for static graphs of execution, I could build out individual tools / processes, and later allow for the user to decide <em>how</em> to link and execute the tools together for their specific process. (In Temporal, these are called <a href="https://docs.temporal.io/workflows">“Workflows”</a>, and come with automatic retries and more).</p>

<p>With these as a base, writing ML generation workflows becomes a bit more straightforward. I could define Activities as discrete units with retries, and synchronize the execution of these Activities via Workflows.</p>

<p>To give an example, let’s say we want to analyze a video file that has been uploaded. (We want to know what’s in the video, as well as generate embeddings, text, and tags.)</p>

<p>There are <em>many</em> places where a failure could occur. A web request may fail to download a file, a file may have an unsupported encoding, or a GPU machine may not be schedulable. Each of these events would normally require logic to handle failure, have a set number of retries, and decide how to gracefully fail.</p>

<p>With <a href="https://temporal.io/">Temporal</a>, we instead set our number of retries and failure conditions across each activity. The Temporal platform automatically handles retries when things go wrong:</p>

<p><img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/ba6da346-832c-4d6a-bea0-f3105f2d7f00/public" alt="Workflow automatically retrying 10 times" /></p>
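<p>Here’s a minimal sketch of what that wiring looks like in Temporal’s Python SDK. The activity body is stubbed and the names are illustrative, but the retry policy mirrors the screenshot above:</p>

<pre><code class="language-python">from datetime import timedelta
from temporalio import activity, workflow
from temporalio.common import RetryPolicy

@activity.defn
async def analyze_video(url: str) -> dict:
    # Download, transcode, run models... any exception raised here
    # triggers an automatic retry, per the policy below.
    return {"url": url, "tags": [], "embedding_id": None}  # stubbed result

@workflow.defn
class VideoAnalysisWorkflow:
    @workflow.run
    async def run(self, url: str) -> dict:
        return await workflow.execute_activity(
            analyze_video,
            url,
            start_to_close_timeout=timedelta(minutes=10),
            retry_policy=RetryPolicy(maximum_attempts=10),
        )
</code></pre>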

<p>This is especially useful during development, where if an error occurs, I can usually spot and fix it, and then restart the worker.</p>

<p>The workflow then retries from the last successful execution, and usually completes.</p>

<p>This substantially speeds up the development flow for my graph execution workflows.</p>

<h2 id="the-challenges-of-building-with-sometimes-unpredictable-llms">The Challenges of Building with (sometimes unpredictable) LLMs</h2>

<p><a href="https://docs.anthropic.com/en/docs/quickstart#start-with-the-workbench"><img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/29c748d6-8353-4b27-b94d-a5242c60be00/webscale" alt="Anthropic Workbench" /></a></p>

<p>Creating language model prompts for software workflows is a fuzzy process.</p>

<p>Designing a prompt requires a significant time investment to understand how your chosen model performs for your specific use case. And even once you’ve decided on a prompt, each model seems to have its own quirks about how it interprets instructions, and whether and how it follows them.</p>

<p>This means one prompt may end up being more appropriate for one specific model, versus another.</p>

<p>It’s tough to tell ahead of time if one model may be more appropriate for your task than another.</p>

<p>People try to address this by writing <a href="https://cookbook.openai.com/examples/evaluation/getting_started_with_openai_evals">evals</a>.</p>

<p>Evals are tests to see whether an LLM comes up with an appropriate answer. You can write evals in code, or by asking the LLM to judge whether its answers are correct.</p>
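<p>A code-based eval can be as simple as asserting properties of the output; an LLM-graded eval swaps the assertions for a judge prompt. Here’s a rough sketch of both, with purely illustrative checks:</p>

<pre><code class="language-python">from typing import Callable

def eval_horoscope(output: str, sign: str) -> bool:
    # Code-based eval: cheap, deterministic assertions on the output.
    return sign.lower() in output.lower() and 50 < len(output.split()) < 300

def eval_with_judge(output: str, task: str, judge: Callable[[str], str]) -> bool:
    # LLM-as-judge: `judge` wraps whatever model you trust to grade answers.
    verdict = judge(f"Task: {task}\nAnswer: {output}\nReply PASS or FAIL.")
    return verdict.strip().upper().startswith("PASS")
</code></pre>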

<p>To help with this, Anthropic now has what it calls a <a href="https://console.anthropic.com/workbench">“Workbench”</a>, from which you can use Claude to generate and analyze specific prompts:</p>

<p><img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/c8161d24-8d66-423e-9ab0-16c0c16b8300/public" alt="Analysis for your test set in prompts" /></p>

<p>Using Workbench allows you to get a feel for how you can approach your chosen prompt task, and what sort of outputs you can expect while developing.</p>

<p>You can quickly evaluate the performance of these generated prompts against one another within the user interface.</p>

<p>Thanks to the example prompts from Workbench, I added a process for generating prompts into my Video Generator, using Anthropic’s <a href="https://github.com/anthropics/anthropic-cookbook/blob/main/misc/metaprompt.ipynb">metaprompting example</a> from Github.</p>

<p>These metaprompts are a great starting point for helping users get started with a prompt template to build from.</p>

<p><strong>Amazingly, asking an advanced model to generate its own prompts seems to mostly work.</strong></p>

<h2 id="embeddings-might-not-be-the-solution-you-think-they-are">Embeddings Might Not Be the Solution You Think They Are</h2>

<p><a href="https://vickiboykis.com/what_are_embeddings/index.html"><img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/86582e3e-f39d-4c37-d0ab-fd3ec846f600/public" alt="Embeddings Model from What Are Embeddings?" /></a></p>

<p><a href="https://vickiboykis.com/what_are_embeddings/index.html">Vicky Boykis</a> has written an amazing, free book on building embeddings models.</p>

<p><strong>Prior to reading it, I assumed vector databases would dominate search and retrieval for anyone working with LLMs.</strong> The hype sold in 2023 was that vector databases would be the future of information retrieval.</p>

<p>But as I began working with embeddings and vector databases, the results didn’t seem to match the hype.</p>

<p>As I dug in, I discovered this is because embeddings are fundamentally a <em>compressing</em> technology, squashing the unique features in your dataset into a fixed-length vector output across the embedding space.</p>

<p>How well these dimensions map to the data related to your business use case depends on how well you’ve built your embedding space.</p>

<p>But! Most people getting started <em>aren’t</em> training their own embeddings models for their specific use case, and are instead relying on off-the-shelf, generic embeddings models to apply to their business problems.</p>

<p>This blind application of generalized embeddings models over traditional search can lead to worse results, and less easily debuggable systems.</p>
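<p>You can see the compression directly: whatever the input’s length, an off-the-shelf model squashes it into the same fixed-size vector. This sketch assumes the sentence-transformers library and one of its stock models:</p>

<pre><code class="language-python">from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # generic, off-the-shelf model

vectors = model.encode([
    "stop",         # a single word
    "stop " * 500,  # a wall of text
])
print(vectors.shape)  # (2, 384): everything becomes a 384-dimensional vector
</code></pre>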

<p>Let’s take a concrete example.</p>

<h2 id="a-song-search-example">A Song Search Example</h2>

<p><strong>Years ago I built a search engine for songs.</strong></p>

<p><img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/07193c6d-399c-4675-d8c9-b22448e15100/webscale" alt="Search Example" /></p>

<p>One of the challenges I faced was bootstrapping relevant results for very generic search terms.</p>

<p>See, it turns out <strong>most song titles aren’t very unique</strong>, and so naive text search is terrible on song names, albums, and artists.</p>

<p>Because of this, a generic text embeddings model would be especially challenged to give decent results.</p>

<p>Say our user is searching for the term “stop”:</p>

<p>There may be tens of thousands of songs, albums, and artists with the word “stop” in it.</p>

<p><strong>How do you begin to determine which ones should be most relevant?</strong></p>

<p>To solve the problem, I turned to music top charts. These have been published <a href="https://en.wikipedia.org/wiki/Billboard_charts">since the 1940s</a>, and contain some of the most culturally important songs.</p>

<p>By adding a weight or bias score to the songs previously in the top charts, I could help bootstrap an initial search system.</p>
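<p>The fix was a scoring function, not a model: blend plain text relevance with a popularity prior from the charts. The weights below are illustrative; in practice you’d tune them against real usage data:</p>

<pre><code class="language-python">def song_score(text_match: float, weeks_on_chart: int, peak_position: int) -> float:
    # Blend lexical relevance with a chart-based prior.
    # peak_position of 101 means "never charted" (zero boost).
    chart_boost = weeks_on_chart * 0.02 + (101 - peak_position) * 0.005
    return text_match + chart_boost

# With equal text relevance, the song that charted wins:
print(song_score(text_match=1.0, weeks_on_chart=12, peak_position=16))  # 1.665
print(song_score(text_match=1.0, weeks_on_chart=0, peak_position=101))  # 1.0
</code></pre>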

<p>If I had instead started with an embeddings model, I’m not sure I would have as easily built a solution. Maybe an off-the-shelf embedding model already has partial knowledge of the top charts, but how much?</p>

<p>Similarly, in building an automatic video editor, I’ve discovered it’s necessary to have a mix of embeddings models, traditional search ideas, and a bit of domain-specific experimentation.</p>

<h2 id="the-wonderful-totally-great-process-of-building-something-new">The Wonderful, Totally Great Process of Building Something New</h2>

<p><a href="https://en.wikipedia.org/wiki/The_Myth_of_Sisyphus#Chapter_4:_The_Myth_of_Sisyphus"><img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/231b15ae-3158-4eff-e251-b6e9fc71a900/public" alt="Showing up" /></a></p>

<p>Any time I see someone finish a thing, I try to go out of my way to congratulate them.</p>

<p>Getting <em>anything</em> out the door always includes an unseen number of challenges, and entropy works against all of us.</p>

<p>So of course, as I’ve worked towards a vision of building something new the past year, I’ve taken a few detours.</p>

<p>There is a great quote from Jensen Huang: asked whether or not he’d start NVIDIA again, he says he wouldn’t, because he now knows how difficult it is:</p>

<!-- Courtesy of embedresponsively.com //-->
<div class="responsive-video-container">

  <iframe src="https://www.youtube.com/embed/URgncvVxxFU" frameborder="0" allowfullscreen=""></iframe>

</div>

<p><strong>Similarly, over the past year I’ve wondered if I’ve been too selfish, too naive in attempting to build something new on my own, rather than build off my existing success and luck, and playing it safe with a full time job.</strong></p>

<p>I don’t know the answer yet, but I am grateful for the chance to find out.</p>]]></content><author><name>Kirk Kaiser</name></author><summary type="html"><![CDATA[Lessons from An Unexpected Year in AI]]></summary></entry><entry><title type="html">$2 million for your idea with no equity– writing your first SBIR application</title><link href="/blog/submitting-sbir-application/" rel="alternate" type="text/html" title="$2 million for your idea with no equity– writing your first SBIR application" /><published>2024-03-04T00:00:00+00:00</published><updated>2024-03-04T00:00:00+00:00</updated><id>/blog/submitting-sbir-application</id><content type="html" xml:base="/blog/submitting-sbir-application/"><![CDATA[<h1 id="discovering-americas-seed-fund">Discovering America’s Seed Fund</h1>

<p><a href="https://www.makeartwithpython.com/blog/building-a-data-pipeline-for-robotics/">In the last post</a>, I introduced how I took my idea for a <a href="https://www.makeartwithpython.com/blog/building-a-remote-controlled-skate-ramp/">self-driving skatepark</a>, and turned it into an invitation to submit an SBIR Phase I proposal.</p>

<p>In this post, I’ll go through the process of preparing and submitting that Phase I proposal from scratch, as someone who had never written a grant proposal before.</p>

<p><img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/4d7a3faf-dda8-49f8-32e2-2878b4ff9d00/webscale" alt="Americas Seed Fund" /></p>

<p>If you’re not familiar, the SBIR program gives up to $2,000,000 (with zero dilution!) to small businesses based in the United States to develop and bring to market potentially breakthrough technology.</p>

<p>It starts with a <a href="https://seedfund.nsf.gov/project-pitch/">Project Pitch</a>, where someone from the NSF will take your idea, and see whether or not it will be a fit for their currently open programs. My initial proposal was a self-driving robot to protect cyclists, under the AI solicitation, and it was accepted.</p>

<p>When I received that acceptance email, I immediately became intimidated.</p>

<p>I assumed submitting a formal proposal would mean writing something very similar to an academic paper, something I’d never done before. (I didn’t attend university, opting instead to live in Central America and farm for a few years.)</p>

<h1 id="a-stroke-of-luck-meeting-an-expert">A Stroke of Luck, Meeting an Expert</h1>

<p>In a stroke of luck, a few weeks later I met someone who had won two Phase I awards. This person was an amazing help, and shared their full, successful proposals. This gave me a giant leg up, boosted my confidence, and provided important context for just how high the standards for acceptance are.</p>

<p>When I read his proposals, it became clear successful proposals <em>do</em> read like academic papers.</p>

<p>But beyond being academic, these papers also had to make a convincing argument for a potential technical breakthrough, show a sustainable business built on said breakthrough, show a market need for the technology, and lay out a realistic budget and team to execute the vision.</p>

<p>This is a lot to take on!</p>

<p>My industry experience has taught me that ideas and strategies must first be tested in the real world. So I set out to start building the physical pieces of what would eventually become my strategy with the SBIR application.</p>

<h1 id="researching-the-problem-space">Researching the Problem Space</h1>

<p>So I started with my initial goal: How do we protect cyclists?</p>

<p>As I dove into the data around cyclist safety, I found that cyclists and pedestrians <a href="https://www.bicycling.com/news/a46119463/pedestrian-and-cyclist-deaths-at-night-on-the-rise/">tend to get hit at night</a>. Given that computer vision doesn’t work at night (low-light sensors are expensive, and infrared models need high-powered lights and custom training), my original computer-vision-only approach wouldn’t work.</p>

<p>Instead, I’d need a sensor that could augment the computer vision pipeline from my original application in order to be truly effective.</p>

<p>It turns out there’s a sensor that works in the dark, and is capable of augmenting computer vision.</p>

<p>mmWave radar allows for accurate tracking of objects that are occluded, and allows for more accurate speed estimation.</p>

<p>Using a computer vision pipeline augmented with mmWave radar would allow for an extremely resilient approach to tracking objects around a cyclist.</p>

<p>Luckily, the investment in self-driving vehicles has created a market for low-cost mmWave sensors. This means we can take off the shelf sensors meant for tracking vehicles while driving, and repurpose them for bicycle protection.</p>

<h1 id="applying-the-problem-to-the-state-of-the-art-in-artificial-intelligence">Applying the Problem to the State of the Art in Artificial Intelligence</h1>

<p>Of all the industries I know of with rapid change, artificial intelligence in late 2023 and early 2024 has to be in the lead.</p>

<p>Over those months of research, state-of-the-art architectures for route planning and object tracking moved from transformers, to diffusion models, to diffusion transformers. I continuously updated my technical approach, given the improvements I read about in the papers, and tried out a set of models that would work together for my approach.</p>

<p>As I worked through it, it became clear that part of my solution would involve training a custom model. This meant I had to have a realistic approach for how I’d gather my data, train my model, and test that my model would indeed perform well in the expected environment.</p>

<h1 id="bootstrapping-a-data-pipeline">Bootstrapping A Data Pipeline</h1>

<p><img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/fb3853a3-f361-4d15-0b01-da8a53768b00/webscale" alt="recording from a cycle run" /></p>

<p>How do we bootstrap a training set to build a model (or ensemble of models) to protect cyclists?</p>

<p>I decided that I’d need to collect data from a fleet of cyclists in the real world. It seemed a bicycle-mounted device would be the best way to collect a dataset.</p>

<p>So I built a prototype using an NVIDIA Jetson Orin Nano, a depth camera, a DeWalt 20v battery, and a mmWave radar. I built a <a href="https://github.com/burningion/bicyclist-defense-jetson">pipeline for recording bicyclists</a>, and scoped out a computer vision pipeline for live tracking of hazards.</p>

<p><img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/d8b1112b-5132-4aed-bfe1-216878cdd600/webscale" alt="cyclist prototype" /></p>

<p>As I was writing this up for my proposal, I ran into a cyclist who had an interesting device attached to his bicycle. I asked him about it, and it turned out to be a radar device from Garmin.</p>

<p>It seemed my idea for mmWave radar was on the right path.</p>

<h1 id="doing-the-paperwork">Doing the Paperwork</h1>

<p>But as the technical details were progressing, the necessary paperwork was also there, waiting.</p>

<p>The NSF has a few requirements for any company that applies for funding. In the extremely helpful webinars they give, they go over everything your company will need.</p>

<p>In particular, one of the most frustrating parts of the paperwork process is getting a UEI from SAM.gov. You need to provide a set of documents that identify your company, and then follow up in the web application to ensure the UEI is actually created. I almost missed this, because it’s not clear in the UI which stage you’re in, or how you progress!</p>

<p>Additionally, you need to come up with a budget for your proposal. In the webinars (again, extremely helpful!), they emphasized that they’d help with creating a realistic budget if they were interested in funding the approach.</p>

<p>Despite this, I took the time to draft what I thought would be a realistic budget on my own. Thinking through the costs of accomplishing my Phase I goals actually helped a lot with my overall strategy.</p>

<h1 id="my-biggest-challenge-industry-support">My Biggest Challenge, Industry Support</h1>

<p>A key part of a successful Phase I proposal is having industry recommendations. They’re not required, but they’re highly encouraged. This can take the form of a letter from another company saying there would be a market for the product, should it be built.</p>

<p>This is one area where I really struggled. If I’m completely honest, I’d say I failed here.</p>

<p>I started by emailing the person in charge of cyclist safety in Florida. It turns out there’s an elected official whose entire job is to minimize cyclist deaths.</p>

<p>After five emails and five phone calls to his office and assistants, I got nothing back from this person.</p>

<p>Frustrated, I turned to my contact who’d already won two Phase I awards. He said it would be hard to get a politician to say something on the record, and recommended I reach out to a local cycling shop.</p>

<p>At the first cycling shop I went to, the staff immediately launched into a diatribe about how cyclists getting hit was their own fault. When I presented my approach for reducing cyclist deaths, they simply said “we don’t do technology”.</p>

<p>I went to another bicycle shop, and this time they immediately “got” it. They’d ridden earlier in the day with their Garmin radar device, and knew about the cyclists who’d been hit in the past month.</p>

<p>It seemed that, after a few frustrating first interactions, I’d finally found my letter of support.</p>

<p>This turned out to not be the case.</p>

<p>Despite meeting multiple times, they never followed up. As the deadline window was approaching, I submitted without their letter of recommendation.</p>

<p>In hindsight, I should have been more aggressive about this key component. I wanted to wait until I had a clear vision of what the device would look like, and by the time that vision existed, it was too late in the process to court a company.</p>

<p>If I could do it again, I’d start with a smaller ask of a few companies at the beginning. I’d try to get them to publicly commit to public safety, and build a relationship from there. Cold-asking for support with such an unfamiliar process turned out to be a bridge too far.</p>

<h1 id="pressing-the-submit-button">Pressing the Submit Button</h1>

<p>By this point, I had what I hoped was a great proposal that matched the style and rigor of the proposal that had been shared with me. I was proud, and confident it was as good as I’d be able to muster.</p>

<p>When I went to submit my proposal in the actual portal for submissions, it became apparent I’d written it in the wrong format. It turned out that the grants.gov site expects the proposal to be broken up into specific PDF pieces. I reworked what I’d written to fit the format they expected.</p>

<p>But once I’d finished it up, I still had a few days remaining. I tried once more to get a letter from a business, but just didn’t get one in time.</p>

<p>I submitted my proposal despite this; we’ll see whether or not it matters.</p>

<h1 id="was-it-worth-it">Was it worth it?</h1>

<p>The SBIR process requires a person to be in a unique situation. They must have a small business, and be willing to commit to working on the proposed project at least 50% of the time for the next 6 months.</p>

<p>They also need a market opportunity, and a team that seems capable of both building a breakthrough technology <em>and</em> bringing it to market.</p>

<p>It’s also a non-zero amount of work to put together this proposal. Given the proposal <em>must be</em> ambitious, it can feel as though you’re not being realistic, and that the proposal itself is mostly a pie-in-the-sky idea.</p>

<p>I certainly struggled with that feeling throughout the process, and realistically, there was a real cost to attempting the proposal.</p>

<p>But the process of taking an initially scary idea to a full proposal was an amazing learning experience. It helped me form a holistic approach to thinking about a disruptive robot technology, and what market conditions must be in place for it to succeed. It also forced me to put myself out there, despite the lack of buy-in from the industry.</p>

<p>I hope my story inspires you to pursue ideas that scare you, no matter the outcome.</p>

<p>If you’re interested in writing a proposal too and have questions, don’t hesitate to reach out via <a href="https://twitter.com/burningion">Twitter</a>.</p>

<p>In the meantime, I’ll share my updates as my proposal gets reviewed.</p>

<center>
<div id="mc_embed_shell">
      <link href="//cdn-images.mailchimp.com/embedcode/classic-061523.css" rel="stylesheet" type="text/css" />
  <style type="text/css">
        #mc_embed_signup{background:#fff; clear:left; font:14px Helvetica,Arial,sans-serif; width: 600px;}
        /* Add your own Mailchimp form style overrides in your site stylesheet or in this style block.
           We recommend moving this block and the preceding CSS link to the HEAD of your HTML file. */
</style>
<div id="mc_embed_signup">
    <form action="https://makeartwithpython.us6.list-manage.com/subscribe/post?u=3feb377469b9e8fab8d52bd3f&amp;id=a998076775&amp;f_id=00ec22e3f0" method="post" id="mc-embedded-subscribe-form" name="mc-embedded-subscribe-form" class="validate" target="_blank">
        <div id="mc_embed_signup_scroll">
            <div class="indicates-required"><span class="asterisk">*</span> indicates required</div>
            <div class="mc-field-group"><label for="mce-EMAIL">Email Address <span class="asterisk">*</span></label><input type="email" name="EMAIL" class="required email" id="mce-EMAIL" required="" value="" /></div>
<div hidden=""><input type="hidden" name="tags" value="3379007" /></div>
        <div id="mce-responses" class="clear">
            <div class="response" id="mce-error-response" style="display: none;"></div>
            <div class="response" id="mce-success-response" style="display: none;"></div>
        </div><div style="position: absolute; left: -5000px;" aria-hidden="true"><input type="text" name="b_3feb377469b9e8fab8d52bd3f_a998076775" tabindex="-1" value="" /></div><div class="clear"><input type="submit" name="subscribe" id="mc-embedded-subscribe" class="button" value="Notify Me" /></div>
    </div>
</form>
</div>
<script type="text/javascript" src="//s3.amazonaws.com/downloads.mailchimp.com/js/mc-validate.js"></script><script type="text/javascript">(function($) {window.fnames = new Array(); window.ftypes = new Array();fnames[0]='EMAIL';ftypes[0]='email';fnames[1]='FNAME';ftypes[1]='text';fnames[2]='LNAME';ftypes[2]='text';fnames[3]='ADDRESS';ftypes[3]='address';fnames[4]='PHONE';ftypes[4]='phone';}(jQuery));var $mcj = jQuery.noConflict(true);</script></div>
</center>]]></content><author><name>Kirk Kaiser</name></author><summary type="html"><![CDATA[How I went from an idea to submitted Phase I proposal]]></summary></entry><entry><title type="html">Building a Robot to Protect Cyclists from Bad Drivers</title><link href="/blog/building-a-data-pipeline-for-robotics/" rel="alternate" type="text/html" title="Building a Robot to Protect Cyclists from Bad Drivers" /><published>2024-01-26T00:00:00+00:00</published><updated>2024-01-26T00:00:00+00:00</updated><id>/blog/building-a-data-pipeline-for-robotics</id><content type="html" xml:base="/blog/building-a-data-pipeline-for-robotics/"><![CDATA[<h1 id="what-if-wed-invested-in-safer-cars-instead-of-autonomy">What if we’d invested in safer cars <em>instead of</em> autonomy?</h1>

<p>Last year a question fell into my brain that I haven’t been able to get rid of since:</p>

<blockquote>
  <p>What if we’d invested in safety instead of autonomy?</p>
</blockquote>

<p>According to McKinsey, more than <a href="https://www.mckinsey.com/industries/automotive-and-assembly/our-insights/mobilitys-future-an-investment-reality-check">$100 billion</a> has been spent so far on attempting to build fully autonomous vehicles.</p>

<p>What if we’d invested that money in vehicle safety instead? What if, rather than autonomy, we attempted to get the risk of death for drivers, cyclists, and pedestrians to zero? What if we attempted to do it in a way that still allowed for the <em>feeling of</em> 99% autonomy for drivers?</p>

<p>I became obsessed with this idea last year. Could a technical solution exist to eliminate these deaths? Would it be closer, or further away than autonomous driving?</p>

<p>I decided to go looking.</p>

<h1 id="how-are-people-dying-from-vehicles">How are people dying from vehicles?</h1>

<p>Safety advances have mostly benefitted people riding in modern, luxury vehicles.</p>

<p>If you are lucky enough to drive a BMW X3 4WD, Nissan Pathfinder 2WD, or a Lexus ES 350, the IIHS reports a driver death rate of 0 for you.</p>

<p><a href="https://www.iihs.org/ratings/driver-death-rates-by-make-and-model"><img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/14c92322-f735-40ca-e983-237ec7202300/webscale" alt="IHTS Safety" /></a></p>

<p>Conversely, if you’re driving a Ram 3500 Crew Cab long bed 4WD, you are more than 5x as likely to kill someone else with your car as yourself!</p>

<p><a href="https://www.iihs.org/news/detail/latest-driver-death-rates-highlight-dangers-of-muscle-cars"><img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/d5552755-444c-422d-53b6-1ead28b19700/webscale" alt="IHTS" /></a></p>

<p>It seems the vehicles most likely to kill you while behind the wheel are big pickup trucks, muscle cars with RWD, or small cars.</p>

<blockquote>
  <p>Sidenote, as a sports car owner: two-wheel drive cars with high horsepower can lose control if you step on the pedal hard. The high-horsepower muscle cars with AWD aren’t nearly as deadly for drivers.</p>
</blockquote>

<p>But most alarmingly in all this data, between 2010 and 2021, bicycle fatalities in traffic increased by 58%.</p>

<p>Pedestrians made up 17% (!) of all traffic fatalities in 2021, with 77% of those deaths occurring in the dark.</p>

<p>Clearly we need to do a better job protecting cyclists and pedestrians from cars, and reverse the trend of deaths.</p>

<h1 id="what-are-we-doing-to-prevent-deaths">What are we doing to prevent deaths?</h1>

<p><a href="https://www.iihs.org/media/0bf99f5d-b132-42f3-92c3-4272211ebf8a/JXEC8w/Ratings/Protocols/future%20programs/PAEB_test_protocol_Version4_Draft.pdf"><img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/83d8011f-9937-461d-8200-1d585df14300/webscale" alt="IIHS" /></a></p>

<p>Thankfully, new cars and trucks are getting a layer of protection for pedestrians and cyclists.</p>

<p>It’s not yet standard across all vehicles, but <a href="https://www.iihs.org/media/0bf99f5d-b132-42f3-92c3-4272211ebf8a/JXEC8w/Ratings/Protocols/future%20programs/PAEB_test_protocol_Version4_Draft.pdf">Pedestrian Autonomous Emergency Braking</a> allows for automatic braking when a collision with a pedestrian is detected as imminent. There’s even a whole set of tests for cars, included in the PDF linked above.</p>

<p>Unfortunately, the average age of vehicles on the road is <a href="https://www.spglobal.com/mobility/en/research-analysis/average-age-of-vehicles-in-the-us-increases-to-122-years.html">12.2 years</a>, so even if we added the technology to all vehicles tomorrow, we’d still be facing another 12 years to catch up.</p>

<p>Another possibility is building roads to give a larger physical buffer of protection to cyclists.</p>

<p>But building safer roads is prohibitively expensive in the United States. There are some programs looking to bring the cost down, but they cost an average of <a href="https://nextcity.org/urbanist-news/how-five-u.s.-cities-built-335-miles-of-bike-lanes-in-24-months">$133k per mile, and take two years to build if rushed</a>.</p>

<h1 id="what-can-we-scale-quickly-to-prevent-these-deaths">What can we scale quickly to prevent these deaths?</h1>

<p>Given the costs and time frames of the existing approaches to safety, we aren’t realistically going to reverse the trend of deaths.</p>

<p>What if we had a low-cost, widely distributable way to protect cyclists and pedestrians?</p>

<h1 id="how-high-risk-high-impact-technology-gets-funding-in-the-united-states">How High Risk, High Impact Technology Gets Funding in the United States</h1>

<p><a href="https://seedfund.nsf.gov/"><img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/4d7a3faf-dda8-49f8-32e2-2878b4ff9d00/webscale" alt="Americas Seed Fund" /></a></p>

<p>Last year I found out about <a href="https://seedfund.nsf.gov/">America’s Seed Fund</a>.</p>

<p>The SBIR program seeks to fund research and development at high-risk, deep-tech companies in the United States. Given I’d been interested in cyclist and pedestrian protection, I applied on a whim.</p>

<p>I created a proposal for a self-driving robot platform to protect cyclists and pedestrians.</p>

<p>I imagined a self-driving robot which followed a target person, creating a physical buffer between them and vehicles on the road. The idea scared me, because it’s a very ambitious goal, and not something straightforward to implement.</p>

<p>Ignoring that fear, I submitted my <a href="https://seedfund.nsf.gov/project-pitch/">pitch</a>.</p>

<h1 id="getting-into-just-enough-trouble">Getting into Just Enough Trouble</h1>

<p>Of course, two weeks later I was formally invited to submit a Phase I proposal.</p>

<p>I made up my mind to pursue this idea further. I needed to think through what an actual approach to eliminating cyclist and pedestrian deaths would look like, and how I’d get it done.</p>

<p>With that, I jumped into the problem. At first, I didn’t really believe it was possible. But over time, I started to see a path to making the first steps of the project work.</p>

<h1 id="jumping-in-to-the-deep-end">Jumping In to the Deep End</h1>

<p><img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/fb3853a3-f361-4d15-0b01-da8a53768b00/webscale" alt="rerun" /></p>

<p>It turns out there’s a <a href="https://smoosavi.org/datasets/us_accidents">lot of data</a> about how and where vehicles are crashing in the United States. Every crash is logged into a database with the <a href="https://crashviewer.nhtsa.dot.gov/">NHTSA</a>, among other places. A <a href="https://arxiv.org/abs/1906.05409">great paper</a> showed how we could supplement the existing database with auxiliary data to build an even richer dataset for training.</p>

<p>Imagine if you could get an alert that you were about to ride into a high-risk zone on your bicycle. How might you change your cycling approach?</p>

<p>We already have applications to predict the weather; why not have cycling risk predictions too?</p>
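<p>A crude version of that prediction is a weekend project. As a sketch (the column names come from the US Accidents dataset linked above; the grid resolution is an arbitrary choice of mine), you can bucket historical crashes into roughly 100-meter cells and look up the cell you’re about to ride through:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import pandas as pd

# US Accidents dataset (Moosavi et al.); only three columns are needed here
crashes = pd.read_csv("US_Accidents.csv",
                      usecols=["Start_Lat", "Start_Lng", "Severity"])

# Round coordinates to ~0.001 degrees (roughly 100 m) to form grid cells,
# then sum severity as a crude risk score per cell
crashes["cell"] = (crashes["Start_Lat"].round(3).astype(str) + ","
                   + crashes["Start_Lng"].round(3).astype(str))
risk = crashes.groupby("cell")["Severity"].sum()

def risk_at(lat, lng):
    """Severity-weighted crash count for the grid cell containing a point."""
    return int(risk.get(f"{round(lat, 3)},{round(lng, 3)}", 0))
</code></pre></div></div>

<p>A real predictor would fold in time of day, lighting, and weather, as the paper above does, but even a lookup table like this is enough to power an alert.</p>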

<p><a href="https://github.com/burningion/bicyclist-defense-jetson"><img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/d8b1112b-5132-4aed-bfe1-216878cdd600/webscale" alt="Bicycle Mounted Sensor Fusion" /></a></p>

<p>I started building a platform to ingest mmWave radar, depth, and camera data from a cyclist. I used the latest <a href="https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-orin/">NVIDIA Jetson Orin Nano</a> as my platform, with a DeWalt 20v battery as my power supply.</p>

<p>I built a data ingestion pipeline using <a href="https://www.rerun.io/">rerun</a>, and got a computer vision model up and running with live feedback over a bicycle network. You can read more about some of the process at my <a href="https://github.com/burningion/bicyclist-defense-jetson">Github repo</a>.</p>
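<p>For a sense of what the ingestion side looks like, here’s a minimal sketch of logging one timestep into rerun, assuming a recent <code class="language-plaintext highlighter-rouge">rerun-sdk</code> with the archetype API:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import rerun as rr  # pip install rerun-sdk

rr.init("bicyclist_defense", spawn=True)  # spawns the viewer locally

def log_frame(t, rgb, depth, radar_xyz):
    """Stream one timestep of camera and radar data into the rerun viewer."""
    rr.set_time_seconds("ride", t)
    rr.log("camera/rgb", rr.Image(rgb))                      # (H, W, 3) uint8
    rr.log("camera/depth", rr.DepthImage(depth, meter=1.0))  # depth in meters
    rr.log("radar/targets", rr.Points3D(radar_xyz, radii=0.1))
</code></pre></div></div>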

<p>But people in my city kept getting hit by vehicles in broad daylight.</p>

<h1 id="the-stakes-get-higher">The Stakes Get Higher</h1>

<p class="align-right"><iframe src="https://giphy.com/embed/fLnmUWmUTSc6GT5Xlu" width="400" height="480" frameborder="0" class="giphy-embed" allowfullscreen=""></iframe></p>

<p>Since I started work on this project, there’s been a steady stream of pedestrians hit by vehicles in my town. In one day, a child was hit by a school bus and two people were critically injured.</p>

<p>Frustrated, I reached out to the person responsible for cyclist safety in Florida for help with my proposal. I figured nobody would be more interested in helping me out than the person elected to eliminate cyclist deaths.</p>

<p>In my research, I discovered Florida has the highest rate of cyclist fatalities in the entire United States, despite a goal of zero incidents. Given the rapid population growth of the state, it appears to only be set to get worse.</p>

<p>After four emails without any response and three phone calls to the office, I gave up. I wasn’t going to get a response from my elected official who was responsible for safety, and so I needed to find help elsewhere.</p>

<p>I ended up finding my first help where I least expected it.</p>

<blockquote>
  <p>Continued in <a href="https://makeartwithpython.com/blog/submitting-sbir-application/">Part 2</a> where I go in to detail about how to submit a Phase I SBIR proposal.</p>
</blockquote>

<center>
<div id="mc_embed_shell">
      <link href="//cdn-images.mailchimp.com/embedcode/classic-061523.css" rel="stylesheet" type="text/css" />
  <style type="text/css">
        #mc_embed_signup{background:#fff; clear:left; font:14px Helvetica,Arial,sans-serif; width: 600px;}
        /* Add your own Mailchimp form style overrides in your site stylesheet or in this style block.
           We recommend moving this block and the preceding CSS link to the HEAD of your HTML file. */
</style>
<div id="mc_embed_signup">
    <form action="https://makeartwithpython.us6.list-manage.com/subscribe/post?u=3feb377469b9e8fab8d52bd3f&amp;id=a998076775&amp;f_id=00ec22e3f0" method="post" id="mc-embedded-subscribe-form" name="mc-embedded-subscribe-form" class="validate" target="_blank">
        <div id="mc_embed_signup_scroll">
            <div class="indicates-required"><span class="asterisk">*</span> indicates required</div>
            <div class="mc-field-group"><label for="mce-EMAIL">Email Address <span class="asterisk">*</span></label><input type="email" name="EMAIL" class="required email" id="mce-EMAIL" required="" value="" /></div>
<div hidden=""><input type="hidden" name="tags" value="3379007" /></div>
        <div id="mce-responses" class="clear">
            <div class="response" id="mce-error-response" style="display: none;"></div>
            <div class="response" id="mce-success-response" style="display: none;"></div>
        </div><div style="position: absolute; left: -5000px;" aria-hidden="true"><input type="text" name="b_3feb377469b9e8fab8d52bd3f_a998076775" tabindex="-1" value="" /></div><div class="clear"><input type="submit" name="subscribe" id="mc-embedded-subscribe" class="button" value="Notify Me" /></div>
    </div>
</form>
</div>
<script type="text/javascript" src="//s3.amazonaws.com/downloads.mailchimp.com/js/mc-validate.js"></script><script type="text/javascript">(function($) {window.fnames = new Array(); window.ftypes = new Array();fnames[0]='EMAIL';ftypes[0]='email';fnames[1]='FNAME';ftypes[1]='text';fnames[2]='LNAME';ftypes[2]='text';fnames[3]='ADDRESS';ftypes[3]='address';fnames[4]='PHONE';ftypes[4]='phone';}(jQuery));var $mcj = jQuery.noConflict(true);</script></div>
</center>

<p>I’d also love to hear from you if you have any ideas or want to help. Please reach out via <a href="https://twitter.com/burningion">Twitter</a>.</p>]]></content><author><name>Kirk Kaiser</name></author><summary type="html"><![CDATA[Part 1: A Data Pipeline to Understand Cyclist Dangers]]></summary></entry><entry><title type="html">10 Things I Didn’t Expect Before Building Generative AI for Six Months</title><link href="/blog/thoughts-on-generative-ai/" rel="alternate" type="text/html" title="10 Things I Didn’t Expect Before Building Generative AI for Six Months" /><published>2023-12-04T00:00:00+00:00</published><updated>2023-12-04T00:00:00+00:00</updated><id>/blog/thoughts-on-generative-ai</id><content type="html" xml:base="/blog/thoughts-on-generative-ai/"><![CDATA[<h1 id="10-things-i-didnt-expect-before-building-generative-ai-for-six-months">10 Things I Didn’t Expect Before Building Generative AI for Six Months</h1>

<p>Six months ago, I started working on a <a href="https://www.makeartwithpython.com/blog/building-an-ai-video-editor/">Generative AI video editor</a>. I began with the assumption that the newest machine learning models would unlock a new sort of software that wasn’t previously possible.</p>

<p>Of course, I didn’t know <em>what</em> that new software would look like, so I decided to start building with the smallest idea I had. Since then, the number of models applicable to video, audio, and text has exploded. It seems every week there is a new, more efficient model I should rewrite my application for.</p>

<p>Despite 15 years of experience developing backend applications, the past 6 months of working in the AI space have shown me that it’s fundamentally different from the rise of cloud and mobile. There are still many open questions about what the next generation of applications built using these models will look like, and how they’ll be deployed.</p>

<p>But, I want to share some of the things that have surprised me so far about building AI applications.</p>

<h1 id="morality-is-a-top-business-and-product-concern-when-building-ai-products">Morality is a Top Business <em>and</em> Product Concern When Building AI Products</h1>

<p><img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/9a02485d-802a-4c38-29d6-07dcba34ca00/webscale" class="align-right" width="45%" /></p>

<p>Given I’m working on a video editor, there are <em>many</em> moral and ethical questions which need to be addressed when I’m choosing what to build.</p>

<p>Models can now <a href="https://github.com/topics/voice-cloning">clone voices</a>, <a href="https://github.com/topics/faceswap">replace faces</a>, and <a href="https://replicate.com/lucataco/magic-animate">move bodies</a> in whatever way the user wants. <strong>Giving users the tools to do this on their own is not just a question of morality and consent, but also of long term business feasibility.</strong></p>

<p>What happens if a lawsuit blames you for abuse done by users? It’s easy enough to do simplistic blocks for nudity, but what else? And beyond the legal repercussions, can you live with the potential mob of people who distrust the technology, and believe it’s enabling a loss of personal consent?</p>

<p>There’s no better example of the potential long-term consequences of Generative AI than what’s been happening with the cryptocurrency space. Companies operated on fuzzy moral and legal grounds for a long time. The legal system eventually caught up with the ecosystem, and the things which were assumed legal because of a lack of enforcement, weren’t. I expect we’ll see the same with AI products eventually.</p>

<h1 id="the-models-dont-actually-matter-because-better-ones-will-be-here-next-week">The Models Don’t Actually Matter, Because Better Ones Will be Here Next Week</h1>

<p>This is a wild insight, because six months ago I would have laughed at the possibility. But the models don’t really matter much in practice.</p>

<p>Building <em>any</em> model is a race to the bottom and a Red Queen’s Race. At this point a lot of incredibly intelligent people are building models with billions of dollars worth of compute and resources. These mega models will be built largely without input from the many <a href="https://www.semianalysis.com/p/google-gemini-eats-the-world-gemini">GPU poors</a>, and will mostly be heavily censored, opaque black boxes for end users.</p>

<p>What remains for the rest of us is to use an ever improving ensemble of smaller models, and to build tools and interfaces atop of them.</p>

<p>And building these interfaces is really where the value lies. Open models will continue to improve, and the state of the art will continue to be pushed. The models themselves must be interchangeable, as a better model will come about sooner than you think.</p>
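<p>In practice, that means hiding each model behind the narrowest interface you can. A minimal sketch of the pattern (the names here are mine, not from any library):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from typing import Protocol

class Captioner(Protocol):
    """The only thing the rest of the app is allowed to know about the model."""
    def caption(self, image_path: str) -&gt; str: ...

class StubCaptioner:
    """Stand-in implementation; swap in BLIP-2, LLaVA, etc. behind the same method."""
    def caption(self, image_path: str) -&gt; str:
        return f"a video frame from {image_path}"

def describe_clip(model: Captioner, frame_paths: list[str]) -&gt; list[str]:
    # The pipeline never imports a model directly, so upgrades are one-line swaps
    return [model.caption(p) for p in frame_paths]

print(describe_clip(StubCaptioner(), ["frame_0001.png"]))
</code></pre></div></div>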

<p>But these models as they exist are currently unpredictable and difficult to understand. Bringing understanding, or at least predictability to models shows potential as a moat. (But of course, if AGI comes, there will be no moats, anywhere, unless you have a source of energy, compute, and water larger than your competitors. Oh and your model is better at stealing their model’s weights and…)</p>

<h1 id="the-immediate-future-is-going-to-be-weird">The Immediate Future is Going to Be Weird</h1>

<p><img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/4c9fa3ae-ca97-40d7-9006-d0fe77302000/webscale" class="align-left" style="padding-top: 15px;" width="45%" /></p>

<p>At this point, the internet as a source of training data is being filled with generated text and images, and the noise from this model-generated output will only continue to grow.</p>

<p>We can assume the output will eventually become so good that humans won’t be able to tell the difference between human-generated content and not.</p>

<p>What does this mean for us as humans, if the place where most of our social interactions occur is no longer primarily human content?</p>

<p>Depending on your viewpoint, we may already have the answer. The algorithms for social media platforms are already incredibly sticky and good at capturing our attention.</p>

<p>What if a combination of the algorithms and Generative AI builds the perfect <a href="https://en.wikipedia.org/wiki/Operant_conditioning_chamber">Skinner box</a>? Then the algorithms will compete to see who can give the best, most tailored emotional experience currently desired by the user. We’d then have a carefully orchestrated algorithm to subtly shape the behavior of humanity.</p>

<h1 id="open-source-is-in-a-fundamentally-weaker-position">Open Source is in a Fundamentally Weaker Position</h1>

<p><img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/4363a6da-949c-4d9a-e933-433ed20a1500/webscale" /></p>

<p>When I started out building software, Open Source gave me the tools that I couldn’t otherwise afford as a young person. A compiler was hundreds of dollars, but a Linux CD gave me access to all the tools I’d need to start building, right away.</p>

<p>Fast forward the decade or two I’ve been in software development, and Open Source powered the cloud. Trillions of dollars in economic value were generated off the back of Open Source contributions.</p>

<p>But for machine learning, there are two fundamental constraints, both of which require access to capital to scale. Building a large model can cost millions of dollars at the low end, and grows from there for a state-of-the-art model. Building a home-scale computer for training smaller models can cost thousands, especially if you need multiple high-end GPUs.</p>

<p>Additionally, the datasets to train on are large! In the past six months I’ve been working, I’ve hit bandwidth caps for my ISP multiple times, <em>just using models</em>. If I were to take in training data from the internet, things would look even worse.</p>

<p>Given the high costs of participation, the barrier to entry for Open Source is higher than it’s ever been. There are only so many deep learning labs capable of working with these large models. This means Open Source models will have fewer eyes on them and be smaller, until the hardware becomes cheaper, the data is more evenly distributed, or development is subsidized, either by venture capital or governments.</p>

<h1 id="incumbents-have-strong-advantages">Incumbents Have Strong Advantages</h1>

<p><img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/25fb932f-1f35-4291-d011-40b7572e9f00/webscale" /></p>

<p>One of the conventional narratives about startups is that they are small and nimble, and can build things faster than big companies.</p>

<p>But with Generative AI, this just isn’t true.</p>

<p>Building a better model currently requires access to capital and data. Large companies have both.</p>

<p>Building a good end user experience means having tooling around your models for easy exploration and better human understanding of the model’s behavior. Again, incumbent companies already have interfaces built up over years, which can be used to augment data before and after inference.</p>

<p>But here there is a real weakness in how the best of developers have been treated over the past two years.</p>

<p>With layoffs making the rounds, companies have parted ways with some of the best, most talented developers on their rosters. Without them to navigate the boundary between the existing product and the new possibilities created by these models, large companies will lose, despite these inherent advantages.</p>

<h1 id="we-dont-know-where-the-moat-will-come-from">We Don’t Know Where the Moat will Come From</h1>

<p><img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/c91cbccf-2e32-4b2f-a6a0-d6e870dd2800/webscale" class="align-right" width="45%" style="padding-top: 15px;" /></p>

<p>If you look at AI right now, there <em>appears to be</em> relatively few moats.</p>

<p>NVIDIA, of course, seems to have the biggest one. They’ve built the GPUs, but more importantly, also the libraries and software to support researchers and builders. And they’ve been building the infrastructure for them for over a decade.</p>

<p>No other company was making as deep of an investment in the tooling to build accelerated computing, with as consistent of a vision.</p>

<p>Since then, of course, there is the story of OpenAI, who built a model that brought them a billion dollars in revenue in a year.</p>

<p>But how defensible is OpenAI’s moat? Open Source models are catching up, and at OpenAI’s last demo day they showcased products like <a href="https://twitter-thread.com/t/1725712220955586899">LaundryBuddy</a>, a far cry from the next step after GPT-4 to AGI.</p>

<p>The truth is, we don’t know where the moat will come from with Generative AI. In the meantime, the pickaxe and shovel companies will do well. Platforms like <a href="https://modal.com/">Modal</a> and <a href="https://replicate.com/">Replicate</a> will make ML tooling approachable for developers, and we’ll soon see what the Uber of machine learning looks like.</p>

<h1 id="robotics-are-probably-the-next-moat">Robotics Are Probably the Next Moat</h1>

<p><img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/8e215c04-4c97-44d6-993a-aa3a21e34800/webscale" /></p>

<p>Building and testing robots is expensive, as the real world is much more difficult than software to model. A <a href="https://hello-robot.com/product">basic robot for automation</a> can start at $20k+, and the development iteration loop can be extremely slow, when you factor in having to test each software change in the real world, and hardware which can break unexpectedly.</p>

<p>To address this, NVIDIA has been building and pushing its next generation platform, Omniverse.</p>

<p>Omniverse is a platform to model and simulate environments. As an example, you could use <a href="https://www.nvidia.com/en-us/omniverse/solutions/digital-twins/">digital twins</a> to recreate and test your drone’s performance in a high-resolution scan of Seattle.</p>

<p>Using ray-tracing and digital environments, you can model, test, and more importantly generate realistic training data for your robots, allowing you to run tens of thousands of simulated tests before a robot ever touches the real world.</p>

<p>Between this and the growth of model capabilities, a sharp team who can navigate the boundaries of physical, cloud, and models should be able to build an iPhone like technical coordination moat. It remains to be seen whether this is a bipedal robot, or something else.</p>

<h1 id="nobody-can-keep-up-with-progress">Nobody Can Keep Up with Progress</h1>

<p><img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/d982f69a-affd-4a96-3822-048187df2c00/webscale" width="45%" class="align-right" /></p>

<p>Even the most intelligent and deliberate of my peers can’t seem to keep up with the speed of advancements in the space. It seems every week we get a new breakthrough, one which <em>may</em> have applications, or contribute to a breakthrough in the current problem we’re solving.</p>

<p>Because of this, it’s easy to develop an underlying unease about our chosen problem spaces. Is it a dead end? Is there somewhere else that might have better results? Is there a completely different architecture I should be chasing?</p>

<p>Being in technology, there has always been an unease about the pace of learning the latest technology. But in the AI space, this feels faster than anything I’ve ever experienced. Staying focused while not getting locked into dead ends is a core part of navigating the space effectively.</p>

<h1 id="there-are-more-vibes-than-hard-data-at-the-edges">There Are More Vibes than Hard Data at the Edges</h1>

<p><img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/d9f416b0-d9db-4456-b8e5-58cda7e9fa00/webscale" /></p>

<p>How do you measure the performance of a Large Language Model?</p>

<p>More importantly, how do you measure it against another language model?</p>

<p>Right now there are tradeoffs across the available models, and there are tools to try them all out using the same prompt, to see the difference in results.</p>

<p>But largely, opinions on the “correctness” of an output from the highest-performing models are gut opinions. And over time, people have <a href="https://twitter.com/emollick/status/1729358803425001702">opinions</a> that models have changed for the worse, while black box model providers insist nothing has changed. (There are, of course, formal tests of a model’s capabilities, but most experts agree these are flawed.)</p>

<p>Because of the relative gap between a “correct” answer, and the one a person <em>personally</em> deems correct, there won’t really be an absolute measure of what the “correct” answer is. For instance, if a street level drug dealer asked your language model questions about strategies for growing their market presence, what response should be deemed “correct”?</p>
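<p>Even so, the vibes can at least be collected systematically. Here’s a minimal sketch of the side-by-side comparison tooling I mean, with a hypothetical <code class="language-plaintext highlighter-rouge">complete()</code> function standing in for whichever model APIs you actually call:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import random

def complete(model: str, prompt: str) -&gt; str:
    """Hypothetical stand-in for a real model API call."""
    return f"[{model}] response to: {prompt}"

def blind_compare(models, prompts):
    """Show shuffled, unlabeled outputs and tally which model a human prefers."""
    votes = {m: 0 for m in models}
    for prompt in prompts:
        outputs = [(m, complete(m, prompt)) for m in models]
        random.shuffle(outputs)  # hide which model produced which answer
        for i, (_, text) in enumerate(outputs):
            print(f"({i}) {text}")
        choice = int(input("Which output is better? "))
        votes[outputs[choice][0]] += 1
    return votes
</code></pre></div></div>

<p>It’s still gut opinion, but blinding and tallying it keeps you honest about whether the shiny new model is actually better for <em>your</em> prompts.</p>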

<h1 id="secrecy-and-security-matter-in-ways-they-dont-normally">Secrecy and Security Matter in Ways They Don’t Normally</h1>

<p><img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/b2a92f0a-b463-44ba-368a-5528dc6c7f00/webscale" width="45%" class="align-left" /></p>

<p>What do you think the market value for the raw weights of GPT-4 is?</p>

<p>If someone leaked them as a Torrent (like <a href="https://github.com/shawwn/llama-dl/">LLama</a>), how soon would it be before it was optimized to run on consumer hardware?</p>

<p>The current moat of language model companies revolves around the premise of their weights never being leaked. This means they have to trust their cloud providers, their employees, and the security of their systems to protect each layer of their infrastructure, and the entirety of their business.</p>

<p>I’m certain intelligence agencies from all over the world are interested in the applications of these advanced language models, and in the employees building them. I also don’t expect these companies to be defending their technology on their own.</p>

<p>Between this and hundred-million-dollar-plus training runs, the upper echelons of machine learning are a bit scary! Throw into the mix people who are convinced the training runs are potentially catastrophic for humanity’s future, and there’s sure to be a bit of intrigue in these companies.</p>

<h1 id="finding-your-position-in-the-coming-ai-landscape">Finding <em>Your</em> Position in the Coming AI Landscape</h1>

<p><img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/e76c9944-05d9-4fa6-93f9-dd710155da00/webscale" /></p>

<p>I was recently interviewed on a podcast, and asked about the future of machine learning.</p>

<p>At the time, I felt uncertain about offering any sort of advice. After six months of working in the space, I didn’t really have any purely optimistic, encouraging advice for people entering the space. There are genuine traps here for builders, and incumbents really <em>do</em> have non-trivial edges in the space, in ways they didn’t for cloud or mobile.</p>

<p>Despite this, I still want to build, and encourage others to do the same. When software began taking over the world, it had the potential to alienate people who didn’t understand how it was built, and thus couldn’t model how it misbehaved.</p>

<p>But AI threatens to do the same to everyone else. Except for a few thousand engineers and researchers, the rest of humanity will be captive to the decisions made about what to prioritize, censor, and mark as the correct answer for these giant models. That’s too important of a collective decision to be left to so few.</p>

<p>Although Open Source and the GPU Poors may not have the same advantages, I believe we must try.</p>

<p><em>all the artwork in this post by <a href="https://en.wikipedia.org/wiki/Ren%C3%A9_Magritte">magritte</a></em></p>]]></content><author><name>Kirk Kaiser</name></author><summary type="html"><![CDATA[A lot of short term wins, but no enduring victories]]></summary></entry><entry><title type="html">Building an AI Video Editor Prototype in 100 Days</title><link href="/blog/building-an-ai-video-editor/" rel="alternate" type="text/html" title="Building an AI Video Editor Prototype in 100 Days" /><published>2023-09-19T00:00:00+00:00</published><updated>2023-09-19T00:00:00+00:00</updated><id>/blog/building-an-ai-video-editor</id><content type="html" xml:base="/blog/building-an-ai-video-editor/"><![CDATA[<h1 id="building-an-ai-video-editor-prototype-in-100-daysish">Building an AI Video Editor Prototype in 100 Days(ish)</h1>

<video width="350" autoplay="" muted="" loop="" playsinline="" class="align-left" style="padding-bottom: 15px;">
    <source src="https://pub-f17786433d2849ff86c458a4019a0ed6.r2.dev/better-first-smaller.mp4" type="video/mp4" />
</video>

<p><strong>The current iteration of Generative AI doesn’t feel built for the benefit of artists.</strong></p>

<p>Instead, the focus seems to be on maximizing shareholder value. Training the models used in Generative AI costs large amounts of capital, data, electricity, and water.</p>

<p>The models powering this generation are trained on a giant corpus of art, but with very <a href="https://newsroom.gettyimages.com/en/getty-images/getty-images-launches-commercially-safe-generative-ai-offering?mkt_tok=MTU2LU9GTi03NDIAAAGPIEzdeqmKiSLH_bZP0gHsH2_5tETo8EIo3QZlKWKfFlpF5vGeyWo20RZzckl0RGBkemXJAw1eX9-epjFBrn62p4RxsQLnLhU92zADFFVil10xLrzLAA">few</a> exceptions, aren’t compensating the source artists at all.</p>

<p>But it doesn’t have to be this way. Generative AI can be used instead to empower artists and make art more valuable.</p>

<h1 id="artists-and-their-relationship-with-disruptive-technology">Artists and their Relationship with Disruptive Technology</h1>

<video width="65%" autoplay="" muted="" loop="" playsinline="" class="align-right" style="padding-bottom: 15px;">
    <source src="https://pub-f17786433d2849ff86c458a4019a0ed6.r2.dev/alltogether.mp4" type="video/mp4" />
</video>

<p>When streaming rose to prominence, the creatives who had created shows and <a href="https://www.latimes.com/entertainment-arts/music/story/2021-04-19/spotify-artists-royalty-rate-apple-music">music</a> were mostly left out of streaming deals. <a href="https://www.indiewire.com/features/general/dave-chappelle-pulls-chappelles-show-netflix-paid-1234600532/">Dave Chappelle</a> famously had his show streamed on Netflix without ever seeing compensation, and fought hard to convince executives to take down the show, on the principle of fairness. <a href="https://www.pon.harvard.edu/daily/dispute-resolution/dispute-resolution-with-spotify-taylor-swift-shakes-it-off/">Taylor Swift</a> also had to negotiate hard to establish fair royalties, pulling her catalog off Spotify completely until Spotify agreed to restructure rates paid to her.</p>

<p><strong>Given this context, generative tools for creatives have to thread a very thin needle.</strong> If they are to be embraced by artists they need to show that they can be used to enhance the value of art, and grow the market for independent artists.</p>

<p>Capital tends to squeeze artists wherever possible. Any new technology will inevitably be used to try to put even <em>more</em> financial pressure on artists (who mostly lack the capital to defend themselves legally), and to maximize returns to existing pools of capital.</p>

<p>Storytelling, music, and the visual arts enrich our collective human experience. Generative AI designed poorly has the potential to muddy our main collective commons (the internet) with bland secondary generated content, created en masse for pennies.</p>

<h1 id="giving-artists-a-fighting-chance-with-generative-ai">Giving Artists a Fighting Chance with Generative AI</h1>

<p>Given such a daunting set of challenges coming for artists, where do you begin? <strong>If Generative AI is going to disrupt creative processes, how do we ensure artists get a seat at the table?</strong></p>

<p>I don’t have any answers.</p>

<p>But, I’m willing to explore.</p>

<p>I’ve found that when I want to learn about something, it helps to just start building a thing. And when it comes to building something new, it’s best to start with the tiniest possible idea.</p>

<p>Rather than building a giant machine to empower artists, what if we used the existing models to enable new methods of creativity?</p>

<p>My experience is mostly with computer vision, so I started there. I’ve always been a fan of artists like <a href="https://www.youtube.com/user/cyriak">cyriak</a>, and would love to build a tool to make the world a bit more cyriakish.</p>

<p>As a start, I saw a style I was impressed by:</p>

<center><blockquote class="instagram-media" data-instgrm-captioned="" data-instgrm-permalink="https://www.instagram.com/p/B_LPm9MHxxn/?utm_source=ig_embed&amp;utm_campaign=loading" data-instgrm-version="14" style=" background:#FFF; border:0; border-radius:3px; box-shadow:0 0 1px 0 rgba(0,0,0,0.5),0 1px 10px 0 rgba(0,0,0,0.15); margin: 1px; max-width:540px; min-width:326px; padding:0; width:99.375%; width:-webkit-calc(100% - 2px); width:calc(100% - 2px);"><div style="padding:16px;"> <a href="https://www.instagram.com/p/B_LPm9MHxxn/?utm_source=ig_embed&amp;utm_campaign=loading" style=" background:#FFFFFF; line-height:0; padding:0 0; text-align:center; text-decoration:none; width:100%;" target="_blank"> <div style=" display: flex; flex-direction: row; align-items: center;"> <div style="background-color: #F4F4F4; border-radius: 50%; flex-grow: 0; height: 40px; margin-right: 14px; width: 40px;"></div> <div style="display: flex; flex-direction: column; flex-grow: 1; justify-content: center;"> <div style=" background-color: #F4F4F4; border-radius: 4px; flex-grow: 0; height: 14px; margin-bottom: 6px; width: 100px;"></div> <div style=" background-color: #F4F4F4; border-radius: 4px; flex-grow: 0; height: 14px; width: 60px;"></div></div></div><div style="padding: 19% 0;"></div> <div style="display:block; height:50px; margin:0 auto 12px; width:50px;"><svg width="50px" height="50px" viewBox="0 0 60 60" version="1.1" xmlns="https://www.w3.org/2000/svg" xmlns:xlink="https://www.w3.org/1999/xlink"><g stroke="none" stroke-width="1" fill="none" fill-rule="evenodd"><g transform="translate(-511.000000, -20.000000)" fill="#000000"><g><path d="M556.869,30.41 C554.814,30.41 553.148,32.076 553.148,34.131 C553.148,36.186 554.814,37.852 556.869,37.852 C558.924,37.852 560.59,36.186 560.59,34.131 C560.59,32.076 558.924,30.41 556.869,30.41 M541,60.657 C535.114,60.657 530.342,55.887 530.342,50 C530.342,44.114 535.114,39.342 541,39.342 C546.887,39.342 551.658,44.114 551.658,50 C551.658,55.887 546.887,60.657 541,60.657 M541,33.886 C532.1,33.886 524.886,41.1 524.886,50 C524.886,58.899 532.1,66.113 541,66.113 C549.9,66.113 557.115,58.899 557.115,50 C557.115,41.1 549.9,33.886 541,33.886 M565.378,62.101 C565.244,65.022 564.756,66.606 564.346,67.663 C563.803,69.06 563.154,70.057 562.106,71.106 C561.058,72.155 560.06,72.803 558.662,73.347 C557.607,73.757 556.021,74.244 553.102,74.378 C549.944,74.521 548.997,74.552 541,74.552 C533.003,74.552 532.056,74.521 528.898,74.378 C525.979,74.244 524.393,73.757 523.338,73.347 C521.94,72.803 520.942,72.155 519.894,71.106 C518.846,70.057 518.197,69.06 517.654,67.663 C517.244,66.606 516.755,65.022 516.623,62.101 C516.479,58.943 516.448,57.996 516.448,50 C516.448,42.003 516.479,41.056 516.623,37.899 C516.755,34.978 517.244,33.391 517.654,32.338 C518.197,30.938 518.846,29.942 519.894,28.894 C520.942,27.846 521.94,27.196 523.338,26.654 C524.393,26.244 525.979,25.756 528.898,25.623 C532.057,25.479 533.004,25.448 541,25.448 C548.997,25.448 549.943,25.479 553.102,25.623 C556.021,25.756 557.607,26.244 558.662,26.654 C560.06,27.196 561.058,27.846 562.106,28.894 C563.154,29.942 563.803,30.938 564.346,32.338 C564.756,33.391 565.244,34.978 565.378,37.899 C565.522,41.056 565.552,42.003 565.552,50 C565.552,57.996 565.522,58.943 565.378,62.101 M570.82,37.631 C570.674,34.438 570.167,32.258 569.425,30.349 C568.659,28.377 567.633,26.702 565.965,25.035 C564.297,23.368 562.623,22.342 560.652,21.575 C558.743,20.834 556.562,20.326 553.369,20.18 C550.169,20.033 549.148,20 541,20 C532.853,20 531.831,20.033 
528.631,20.18 C525.438,20.326 523.257,20.834 521.349,21.575 C519.376,22.342 517.703,23.368 516.035,25.035 C514.368,26.702 513.342,28.377 512.574,30.349 C511.834,32.258 511.326,34.438 511.181,37.631 C511.035,40.831 511,41.851 511,50 C511,58.147 511.035,59.17 511.181,62.369 C511.326,65.562 511.834,67.743 512.574,69.651 C513.342,71.625 514.368,73.296 516.035,74.965 C517.703,76.634 519.376,77.658 521.349,78.425 C523.257,79.167 525.438,79.673 528.631,79.82 C531.831,79.965 532.853,80.001 541,80.001 C549.148,80.001 550.169,79.965 553.369,79.82 C556.562,79.673 558.743,79.167 560.652,78.425 C562.623,77.658 564.297,76.634 565.965,74.965 C567.633,73.296 568.659,71.625 569.425,69.651 C570.167,67.743 570.674,65.562 570.82,62.369 C570.966,59.17 571,58.147 571,50 C571,41.851 570.966,40.831 570.82,37.631"></path></g></g></g></svg></div><div style="padding-top: 8px;"> <div style=" color:#3897f0; font-family:Arial,sans-serif; font-size:14px; font-style:normal; font-weight:550; line-height:18px;">View this post on Instagram</div></div><div style="padding: 12.5% 0;"></div> <div style="display: flex; flex-direction: row; margin-bottom: 14px; align-items: center;"><div> <div style="background-color: #F4F4F4; border-radius: 50%; height: 12.5px; width: 12.5px; transform: translateX(0px) translateY(7px);"></div> <div style="background-color: #F4F4F4; height: 12.5px; transform: rotate(-45deg) translateX(3px) translateY(1px); width: 12.5px; flex-grow: 0; margin-right: 14px; margin-left: 2px;"></div> <div style="background-color: #F4F4F4; border-radius: 50%; height: 12.5px; width: 12.5px; transform: translateX(9px) translateY(-18px);"></div></div><div style="margin-left: 8px;"> <div style=" background-color: #F4F4F4; border-radius: 50%; flex-grow: 0; height: 20px; width: 20px;"></div> <div style=" width: 0; height: 0; border-top: 2px solid transparent; border-left: 6px solid #f4f4f4; border-bottom: 2px solid transparent; transform: translateX(16px) translateY(-4px) rotate(30deg)"></div></div><div style="margin-left: auto;"> <div style=" width: 0px; border-top: 8px solid #F4F4F4; border-right: 8px solid transparent; transform: translateY(16px);"></div> <div style=" background-color: #F4F4F4; flex-grow: 0; height: 12px; width: 16px; transform: translateY(-4px);"></div> <div style=" width: 0; height: 0; border-top: 8px solid #F4F4F4; border-left: 8px solid transparent; transform: translateY(-4px) translateX(8px);"></div></div></div> <div style="display: flex; flex-direction: column; flex-grow: 1; justify-content: center; margin-bottom: 24px;"> <div style=" background-color: #F4F4F4; border-radius: 4px; flex-grow: 0; height: 14px; margin-bottom: 6px; width: 224px;"></div> <div style=" background-color: #F4F4F4; border-radius: 4px; flex-grow: 0; height: 14px; width: 144px;"></div></div></a><p style=" color:#c9c8cd; font-family:Arial,sans-serif; font-size:14px; line-height:17px; margin-bottom:0; margin-top:8px; overflow:hidden; padding:8px 0 7px; text-align:center; text-overflow:ellipsis; white-space:nowrap;"><a href="https://www.instagram.com/p/B_LPm9MHxxn/?utm_source=ig_embed&amp;utm_campaign=loading" style=" color:#c9c8cd; font-family:Arial,sans-serif; font-size:14px; font-style:normal; font-weight:normal; line-height:17px; text-decoration:none;" target="_blank">A post shared by Clay Boonthanakit (@claydohboon)</a></p></div></blockquote></center>
<script async="" src="//www.instagram.com/embed.js"></script>

<p>Facebook’s <a href="https://segment-anything.com/">Segment Anything</a> model would be a great tool for making this sort of effect easier.</p>

<p>The original artist made it in After Effects, with a lot of patience and manual masking of himself. Of course, Clay is an incredibly talented dancer, and made his own story line to fit the effect too. But(!) the manipulation of the outlines of people and the masking is a chore.</p>

<p>A tool to help you creatively explore the segments of your video with better masking would be enough of a start.</p>
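<p>Segment Anything makes the core of that masking step surprisingly small. Here’s a sketch of cutting a subject out of a single frame with SAM’s promptable interface (the checkpoint name comes from the official repo; the click coordinates are made up):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

frame = cv2.cvtColor(cv2.imread("frame.png"), cv2.COLOR_BGR2RGB)
predictor.set_image(frame)

# One positive click roughly on the subject
masks, scores, _ = predictor.predict(
    point_coords=np.array([[640, 360]]),
    point_labels=np.array([1]),
    multimask_output=True,
)
best = masks[scores.argmax()]  # boolean (H, W) mask

# Composite the masked subject onto a transparent background
cutout = np.dstack([frame, (best * 255).astype(np.uint8)])
cv2.imwrite("cutout.png", cv2.cvtColor(cutout, cv2.COLOR_RGBA2BGRA))
</code></pre></div></div>

<p>Run something like that across every frame of a clip, and you have the raw material for Clay’s effect without the manual rotoscoping.</p>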

<p>So with that, I was off.. (ish)</p>

<h2 id="building-an-open-source-ai-video-editor-in-python">Building an Open Source AI Video Editor in… Python?</h2>

<p><img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/5fd535b2-841a-41a6-6ae0-4161706b0a00/webscale" alt="AI video editor" /></p>

<p>Given the progress of machine learning models, <strong>it seemed the shortest path to building a piece of software capable of helping an artist make a video <em>like</em> Clay’s was to try to make as much of it in Python</strong>. Python is a lot of things, but I’ve mostly used it for backend infrastructure development, <em>not</em> desktop applications.</p>

<p>So I started by looking for a toolkit to build a video editor in Python, to see whether or not it would even be possible.</p>

<p>It turns out, <a href="https://moderngl.readthedocs.io/en/5.8.2/">ModernGL</a> along with <a href="https://moderngl-window.readthedocs.io/en/latest/">ModernGL-Window</a> make for a great way to get an OpenGL interface across Mac, Windows, and Linux. NVIDIA also has a <a href="https://github.com/NVIDIA/VideoProcessingFramework">Python library</a> for hardware decoding and encoding of videos when using its video cards. Given the two, it seemed like enough to get started.</p>

<p>Of course, I’d need some way to interact with the videos I’m editing. For that, I used <a href="https://github.com/pyimgui/pyimgui">PyImgui</a> to build a user interface on top of my OpenGL window.</p>

<h2 id="building-a-prototype-interface-with-imgui">Building a Prototype Interface with Imgui</h2>

<p><img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/90b8865b-c821-41d6-a745-4907d0d16500/webscale" alt="ModernGL with Imgui" /></p>

<p>With moderngl-window, you’re given a few functions to define and build your render loop:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">moderngl_window</span>
<span class="kn">from</span> <span class="n">moderngl_window.text.bitmapped</span> <span class="kn">import</span> <span class="n">TextWriter2D</span>


<span class="k">class</span> <span class="nc">App</span><span class="p">(</span><span class="n">moderngl_window</span><span class="p">.</span><span class="n">WindowConfig</span><span class="p">):</span>
    <span class="n">title</span> <span class="o">=</span> <span class="s">"Text"</span>
    <span class="n">aspect_ratio</span> <span class="o">=</span> <span class="bp">None</span>

    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="nf">super</span><span class="p">().</span><span class="nf">__init__</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
        <span class="n">self</span><span class="p">.</span><span class="n">writer</span> <span class="o">=</span> <span class="nc">TextWriter2D</span><span class="p">()</span>
        <span class="n">self</span><span class="p">.</span><span class="n">writer</span><span class="p">.</span><span class="n">text</span> <span class="o">=</span> <span class="s">"Hello ModernGL!"</span>

    <span class="k">def</span> <span class="nf">render</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">time</span><span class="p">,</span> <span class="n">frame_time</span><span class="p">):</span>
        <span class="n">self</span><span class="p">.</span><span class="n">writer</span><span class="p">.</span><span class="nf">draw</span><span class="p">((</span><span class="mi">240</span><span class="p">,</span> <span class="mi">380</span><span class="p">),</span> <span class="n">size</span><span class="o">=</span><span class="mi">120</span><span class="p">)</span>


<span class="n">App</span><span class="p">.</span><span class="nf">run</span><span class="p">()</span>
</code></pre></div></div>

<p>In your <code class="language-plaintext highlighter-rouge">__init__</code> you can set the resolution of your window, along with any other configurations you may need.</p>

<p>Moderngl-window also comes with an imgui integration. Adding it is easy enough:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">moderngl_window</span>
<span class="kn">import</span> <span class="n">imgui</span>
<span class="kn">from</span> <span class="n">moderngl_window.integrations.imgui</span> <span class="kn">import</span> <span class="n">ModernglWindowRenderer</span>
<span class="kn">from</span> <span class="n">moderngl_window.text.bitmapped</span> <span class="kn">import</span> <span class="n">TextWriter2D</span>


<span class="k">class</span> <span class="nc">App</span><span class="p">(</span><span class="n">moderngl_window</span><span class="p">.</span><span class="n">WindowConfig</span><span class="p">):</span>
    <span class="n">title</span> <span class="o">=</span> <span class="s">"Text"</span>
    <span class="n">aspect_ratio</span> <span class="o">=</span> <span class="bp">None</span>

    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="nf">super</span><span class="p">().</span><span class="nf">__init__</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
        <span class="n">imgui</span><span class="p">.</span><span class="nf">create_context</span><span class="p">()</span>
        <span class="n">self</span><span class="p">.</span><span class="n">imgui</span> <span class="o">=</span> <span class="nc">ModernglWindowRenderer</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">wnd</span><span class="p">)</span>
        <span class="n">self</span><span class="p">.</span><span class="n">writer</span> <span class="o">=</span> <span class="nc">TextWriter2D</span><span class="p">()</span>
        <span class="n">self</span><span class="p">.</span><span class="n">writer</span><span class="p">.</span><span class="n">text</span> <span class="o">=</span> <span class="s">"Hello ModernGL!"</span>

    <span class="k">def</span> <span class="nf">render</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">time</span><span class="p">,</span> <span class="n">frame_time</span><span class="p">):</span>
        <span class="n">self</span><span class="p">.</span><span class="n">writer</span><span class="p">.</span><span class="nf">draw</span><span class="p">((</span><span class="mi">240</span><span class="p">,</span> <span class="mi">380</span><span class="p">),</span> <span class="n">size</span><span class="o">=</span><span class="mi">120</span><span class="p">)</span>
        <span class="n">imgui</span><span class="p">.</span><span class="nf">new_frame</span><span class="p">()</span>
        <span class="n">imgui</span><span class="p">.</span><span class="nf">begin</span><span class="p">(</span><span class="s">"Custom window"</span><span class="p">)</span>
        <span class="n">imgui</span><span class="p">.</span><span class="nf">text</span><span class="p">(</span><span class="s">"hello world"</span><span class="p">)</span>
        <span class="n">imgui</span><span class="p">.</span><span class="nf">end</span><span class="p">()</span>
        <span class="n">imgui</span><span class="p">.</span><span class="nf">render</span><span class="p">()</span>
        <span class="n">self</span><span class="p">.</span><span class="n">imgui</span><span class="p">.</span><span class="nf">render</span><span class="p">(</span><span class="n">imgui</span><span class="p">.</span><span class="nf">get_draw_data</span><span class="p">())</span>

    <span class="k">def</span> <span class="nf">resize</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">width</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">height</span><span class="p">:</span> <span class="nb">int</span><span class="p">):</span>
        <span class="n">self</span><span class="p">.</span><span class="n">imgui</span><span class="p">.</span><span class="nf">resize</span><span class="p">(</span><span class="n">width</span><span class="p">,</span> <span class="n">height</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">key_event</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">key</span><span class="p">,</span> <span class="n">action</span><span class="p">,</span> <span class="n">modifiers</span><span class="p">):</span>
        <span class="n">self</span><span class="p">.</span><span class="n">imgui</span><span class="p">.</span><span class="nf">key_event</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">action</span><span class="p">,</span> <span class="n">modifiers</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">mouse_position_event</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">dx</span><span class="p">,</span> <span class="n">dy</span><span class="p">):</span>
        <span class="n">self</span><span class="p">.</span><span class="n">imgui</span><span class="p">.</span><span class="nf">mouse_position_event</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">dx</span><span class="p">,</span> <span class="n">dy</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">mouse_drag_event</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">dx</span><span class="p">,</span> <span class="n">dy</span><span class="p">):</span>
        <span class="n">self</span><span class="p">.</span><span class="n">imgui</span><span class="p">.</span><span class="nf">mouse_drag_event</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">dx</span><span class="p">,</span> <span class="n">dy</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">mouse_scroll_event</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">x_offset</span><span class="p">,</span> <span class="n">y_offset</span><span class="p">):</span>
        <span class="n">self</span><span class="p">.</span><span class="n">imgui</span><span class="p">.</span><span class="nf">mouse_scroll_event</span><span class="p">(</span><span class="n">x_offset</span><span class="p">,</span> <span class="n">y_offset</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">mouse_press_event</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">button</span><span class="p">):</span>
        <span class="n">self</span><span class="p">.</span><span class="n">imgui</span><span class="p">.</span><span class="nf">mouse_press_event</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">button</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">mouse_release_event</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">x</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">y</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">button</span><span class="p">:</span> <span class="nb">int</span><span class="p">):</span>
        <span class="n">self</span><span class="p">.</span><span class="n">imgui</span><span class="p">.</span><span class="nf">mouse_release_event</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">button</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">unicode_char_entered</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">char</span><span class="p">):</span>
        <span class="n">self</span><span class="p">.</span><span class="n">imgui</span><span class="p">.</span><span class="nf">unicode_char_entered</span><span class="p">(</span><span class="n">char</span><span class="p">)</span>

<span class="n">App</span><span class="p">.</span><span class="nf">run</span><span class="p">()</span>
</code></pre></div></div>

<p>We’ve added a lot of code, but it’s mostly just passing events through to our imgui instance, so that imgui knows when and where we clicked.</p>

<p>Imgui uses immediate mode rendering, which lets you define your UI directly within the render loop. That makes for quicker iteration when building a prototype: you can build and test new ideas very rapidly.</p>
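<p>To make the immediate mode style concrete, here’s a hedged sketch of a widget inside <code class="language-plaintext highlighter-rouge">render</code>: the slider is re-declared every frame and hands back its current value. The text-size slider is my own illustration, not part of the editor, and it assumes <code class="language-plaintext highlighter-rouge">self.text_size</code> was initialized in <code class="language-plaintext highlighter-rouge">__init__</code>:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    def render(self, time, frame_time):
        imgui.new_frame()
        imgui.begin("Controls")
        # immediate mode: declare the widget each frame, read its state back
        changed, self.text_size = imgui.slider_float("text size", self.text_size, 10.0, 200.0)
        imgui.end()

        self.writer.draw((240, 380), size=int(self.text_size))

        imgui.render()
        self.imgui.render(imgui.get_draw_data())
</code></pre></div></div>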

<h2 id="loading-and-scrubbing-video-in-opengl">Loading and Scrubbing Video in OpenGL</h2>

<p><img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/7407c181-1702-429e-5e6f-89be6b09df00/webscale" alt="Architecture of Video loading with NVIDIA VideoProcessingFramework" /></p>

<p>Of course, any creative tool is really only useful if it feels real time.</p>

<p>To do this effectively, I used the hardware accelerated decoder built into most recent NVIDIA graphics cards.</p>

<p>This lets us use a dedicated chip on the GPU to decode each video frame, so we can seek and play back with lower latency, usually without having to convert loaded files first.</p>

<p>On NVIDIA hardware, you can just use the great <a href="https://github.com/NVIDIA/VideoProcessingFramework/">VideoProcessingFramework</a>. It even comes with example code, showcasing how to decode a video from the input colorspace to either a PyTorch tensor or an OpenGL texture:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># to seek to frame
</span><span class="n">src_surface</span> <span class="o">=</span> <span class="n">nvDec</span><span class="p">.</span><span class="nc">DecodeSingleSurface</span><span class="p">(</span><span class="n">nvc</span><span class="p">.</span><span class="nc">SeekContext</span><span class="p">(</span><span class="n">frame_no</span><span class="p">))</span>

<span class="c1"># convert to rgb color space using pipeline
</span><span class="n">rgb_pln</span> <span class="o">=</span> <span class="n">to_rgb</span><span class="p">.</span><span class="nf">run</span><span class="p">(</span><span class="n">src_surface</span><span class="p">)</span>

<span class="c1"># convert to pytorch tensor
</span><span class="n">src_tensor</span> <span class="o">=</span> <span class="nf">surface_to_tensor</span><span class="p">(</span><span class="n">rgb_pln</span><span class="p">)</span>

<span class="c1"># push to CPU, as a numpy array
</span><span class="n">b</span> <span class="o">=</span> <span class="n">src_tensor</span><span class="p">.</span><span class="nf">cpu</span><span class="p">().</span><span class="nf">numpy</span><span class="p">()</span>
</code></pre></div></div>

<p>This gives us something we can then manipulate in <a href="https://pillow.readthedocs.io/en/stable/">Pillow</a> or PyTorch / Numpy while prototyping, and save out as frames.</p>
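<p>For instance, once a frame is on the CPU, saving it out is a one-liner. A sketch, assuming the array from the snippet above has already been reshaped to an interleaved <code class="language-plaintext highlighter-rouge">(height, width, 3)</code> uint8 layout:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from PIL import Image

# b is the decoded frame as a (height, width, 3) uint8 numpy array
Image.fromarray(b).save(f"frames/frame_{frame_no:05d}.png")
</code></pre></div></div>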

<p>With this, we can then hook up Segment Anything, and use imgui to pick our points within the OpenGL frame. But before we can do that, there’s a wrinkle: moderngl-window supports MacOS, but MacOS doesn’t have any NVIDIA hardware at all.</p>

<h2 id="building-hardware-video-decoding-and-stable-diffusion-xl-in-macos">Building Hardware Video Decoding and Stable Diffusion XL in MacOS</h2>

<div class="align-right">
<img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/a716a1ec-dcbd-4849-6118-f8fe7a53d200/webscale" width="350px" />
</div>

<p>Despite looking for a while, I couldn’t find a way to do hardware video decoding from Python on MacOS, even though the M1 and M2 series processors support it in hardware.</p>

<p>So I decided to try and write a Python hardware playback library for MacOS. Apple has <a href="https://developer.apple.com/documentation/avfoundation/media_playback?language=objc">frameworks written in Objective-C to do hardware decoding</a>, but unfortunately hasn’t released a Python interface to these APIs.</p>

<p>To do this, I used Cython, and borrowed from <a href="https://github.com/openframeworks/openFrameworks/blob/cce8428e1b6754f0457b14a81aa19d7434be06a3/addons/ofxiOS/src/video/AVFoundationVideoPlayer.m">openFrameworks’ implementation</a> of their video player. This let me convert the Objective-C code into C++, and then into a Python API that could play back the video.</p>

<p>After a lot of trial and error, I eventually had a library I could import and run on my MacOS machine, with a reasonable API that mostly matched NVIDIA’s VideoProcessingFramework:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">videoplayback</span>

<span class="n">player</span> <span class="o">=</span> <span class="n">videoplayback</span><span class="p">.</span><span class="nc">AVFPlayer</span><span class="p">()</span>
<span class="n">player</span><span class="p">.</span><span class="nf">load</span><span class="p">(</span><span class="s">"filename"</span><span class="p">)</span>

<span class="c1"># unfortunately haven't made loading a file synchronous yet
</span><span class="n">time</span><span class="p">.</span><span class="nf">sleep</span><span class="p">(.</span><span class="mi">3</span><span class="p">)</span>

<span class="n">numFrames</span> <span class="o">=</span> <span class="n">player</span><span class="p">.</span><span class="nf">length_in_frames</span><span class="p">()</span>
<span class="n">w</span> <span class="o">=</span> <span class="n">player</span><span class="p">.</span><span class="nf">width</span><span class="p">()</span>
<span class="n">h</span> <span class="o">=</span> <span class="n">player</span><span class="p">.</span><span class="nf">height</span><span class="p">()</span>

<span class="n">destination_frame</span> <span class="o">=</span> <span class="mi">1</span>
<span class="c1"># get a frame 
</span><span class="n">player</span><span class="p">.</span><span class="nf">seek</span><span class="p">(</span><span class="n">destination_frame</span><span class="p">)</span>
<span class="n">image</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">asarray</span><span class="p">(</span><span class="n">player</span><span class="p">.</span><span class="nf">imageframe</span><span class="p">())</span>
<span class="n">image</span> <span class="o">=</span> <span class="n">image</span><span class="p">.</span><span class="nf">view</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">uint8</span><span class="p">).</span><span class="nf">reshape</span><span class="p">(</span><span class="n">image</span><span class="p">.</span><span class="n">shape</span> <span class="o">+</span> <span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,))</span>
<span class="c1"># shape (1080, 1920, 4)
</span><span class="n">image</span> <span class="o">=</span> <span class="n">image</span><span class="p">[:,:,</span><span class="mi">0</span><span class="p">:</span><span class="mi">3</span><span class="p">]</span>
<span class="n">b</span><span class="p">[:,:,[</span><span class="mi">0</span><span class="p">,</span><span class="mi">2</span><span class="p">]]</span> <span class="o">=</span> <span class="n">b</span><span class="p">[:,:,[</span><span class="mi">2</span><span class="p">,</span><span class="mi">0</span><span class="p">]]</span>
<span class="c1"># shape (1080, 1920, 3)
</span><span class="n">image</span> <span class="o">=</span> <span class="n">image</span><span class="p">.</span><span class="nf">copy</span><span class="p">(</span><span class="n">order</span><span class="o">=</span><span class="s">'C'</span><span class="p">)</span>
</code></pre></div></div>

<p>As for Stable Diffusion XL, Apple has released a repository with <em>some</em> optimizations, making it possible to generate images using hardware acceleration on M1 and M2 hardware.</p>

<p>Unfortunately, even with <a href="https://github.com/apple/ml-stable-diffusion/pull/277">these optimizations</a>, generating a single image takes around 2 minutes on my M1 MacBook Pro with 64GB of memory. Not a great feedback loop for creatives. For reference, generating a Stable Diffusion XL image on my desktop computer with a 4090 takes a few seconds.</p>

<p>However, using a service like <a href="https://modal.com/">Modal</a>, I was able to get inference down to a few seconds by using a serverless GPU instance.</p>
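<p>The shape of that looks roughly like the sketch below. The decorators match Modal’s documented API, but the function body, GPU choice, and names here are placeholders rather than the editor’s actual code:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import modal

app = modal.App("sdxl-worker")
image = modal.Image.debian_slim().pip_install("torch", "diffusers", "transformers")

@app.function(gpu="A10G", image=image)
def generate(prompt: str):
    # load the SDXL pipeline and return the generated image as PNG bytes
    ...

@app.local_entrypoint()
def main():
    png = generate.remote("a ufo hovering over a beach")  # runs on a rented GPU
</code></pre></div></div>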

<h2 id="getting-early-feedback">Getting Early Feedback</h2>

<video width="350" autoplay="" muted="" loop="" playsinline="" class="align-right">
    <source src="https://pub-f17786433d2849ff86c458a4019a0ed6.r2.dev/transparent_out.webm" type="video/mp4" />
    <source src="https://pub-f17786433d2849ff86c458a4019a0ed6.r2.dev/transparent_out.mov" type="video/webm" />
</video>

<p>When I showed a version of the editor prototype to a friend, he was excited about the ability to segment objects out of a video while they were moving.</p>

<p>So I built out the tools to do this, using the <a href="https://arxiv.org/abs/2210.09782">De-AOT</a> model. It can predict around the next 20 frames of a video at a time, and mostly works as well as you could ask.</p>
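<p>In practice that means propagating masks through the video in chunks. Here’s a sketch of the control flow, where <code class="language-plaintext highlighter-rouge">get_frame</code> and <code class="language-plaintext highlighter-rouge">propagate_masks</code> are hypothetical helpers standing in for the frame source and the De-AOT tracker, not its real API:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>CHUNK = 20  # roughly how far De-AOT predicts reliably in one shot

all_masks = []
last_mask = initial_mask  # the SAM mask the user picked on frame 0
for start in range(0, num_frames, CHUNK):
    frames = [get_frame(i) for i in range(start, min(start + CHUNK, num_frames))]
    # seed the tracker with the last known mask, predict forward through the chunk
    predicted = propagate_masks(frames, seed_mask=last_mask)
    all_masks.extend(predicted)
    last_mask = predicted[-1]
</code></pre></div></div>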

<p>As I built this out, I started seeing ways Stable Diffusion could be a tool to empower creatives, rather than one that imitates their work whole cloth.</p>

<h2 id="building-a-model-pipeline-for-asset-generation">Building a Model Pipeline for Asset Generation</h2>

<p><img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/a3dece87-5763-4ddc-d217-2d8f4642ca00/webscale" alt="Stable Diffusion in Action" /></p>

<p><img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/bad3a98c-1815-461d-1719-2f5e654fa600/webscale" alt="Stable Diffusion Asset Pipeline" /></p>

<p>Normally, diffusion models aren’t built to live within an existing frame or context. Instead, they’re built to create an image whole cloth from a token prompt, iteratively imagining from noise what a picture might look like.</p>

<p>In order to generate a transparent asset for a video (like a UFO or an arrow, or…), we’d need to be able to isolate and segment what we want out of an image, hopefully automatically.</p>

<p>With this pipeline, we can now generate and preview assets:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">generate_segmented_diffusion</span><span class="p">(</span><span class="nb">object</span><span class="p">,</span> <span class="n">prompt</span><span class="p">,</span> <span class="n">negative_prompt</span><span class="o">=</span><span class="s">""</span><span class="p">,</span> <span class="n">auto</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">seed</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>
    <span class="k">if</span> <span class="n">seed</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span><span class="p">:</span>
        <span class="n">generator</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="nc">Generator</span><span class="p">(</span><span class="n">device</span><span class="o">=</span><span class="s">"cuda"</span><span class="p">).</span><span class="nf">manual_seed</span><span class="p">(</span><span class="n">seed</span><span class="p">)</span> 
        <span class="n">pipeline_text2image</span> <span class="o">=</span> <span class="n">AutoPipelineForText2Image</span><span class="p">.</span><span class="nf">from_pretrained</span><span class="p">(</span>
            <span class="s">"stabilityai/stable-diffusion-xl-base-1.0"</span><span class="p">,</span> <span class="n">torch_dtype</span><span class="o">=</span><span class="n">torch</span><span class="p">.</span><span class="n">float16</span><span class="p">,</span> <span class="n">variant</span><span class="o">=</span><span class="s">"fp16"</span><span class="p">,</span> <span class="n">use_safetensors</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
            <span class="n">cache_dir</span><span class="o">=</span><span class="s">"/app/.cache"</span><span class="p">,</span> <span class="n">generator</span><span class="o">=</span><span class="n">generator</span>
            <span class="p">).</span><span class="nf">to</span><span class="p">(</span><span class="s">"cuda"</span><span class="p">)</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="n">pipeline_text2image</span> <span class="o">=</span> <span class="n">AutoPipelineForText2Image</span><span class="p">.</span><span class="nf">from_pretrained</span><span class="p">(</span>
            <span class="s">"stabilityai/stable-diffusion-xl-base-1.0"</span><span class="p">,</span> <span class="n">torch_dtype</span><span class="o">=</span><span class="n">torch</span><span class="p">.</span><span class="n">float16</span><span class="p">,</span> <span class="n">variant</span><span class="o">=</span><span class="s">"fp16"</span><span class="p">,</span> <span class="n">use_safetensors</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
            <span class="n">cache_dir</span><span class="o">=</span><span class="s">"/app/.cache"</span>
            <span class="p">).</span><span class="nf">to</span><span class="p">(</span><span class="s">"cuda"</span><span class="p">)</span>

    <span class="n">img</span> <span class="o">=</span> <span class="nf">pipeline_text2image</span><span class="p">(</span><span class="n">prompt</span><span class="o">=</span><span class="n">prompt</span><span class="p">,</span><span class="n">negative_prompt</span><span class="o">=</span><span class="n">negative_prompt</span><span class="p">).</span><span class="n">images</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>

    <span class="k">if</span> <span class="n">auto</span><span class="p">:</span> <span class="c1"># just take first mask from SAM
</span>        <span class="n">model</span> <span class="o">=</span> <span class="nf">get_dino_model</span><span class="p">()</span>
        <span class="n">img_ground</span> <span class="o">=</span> <span class="nf">transform_pil_image_for_grounding</span><span class="p">(</span><span class="n">img</span><span class="p">)</span>
        <span class="n">b</span><span class="p">,</span> <span class="n">p</span> <span class="o">=</span> <span class="nf">get_grounding_output</span><span class="p">(</span><span class="n">model</span><span class="o">=</span><span class="n">model</span><span class="p">,</span> <span class="n">image</span><span class="o">=</span><span class="n">img_ground</span><span class="p">,</span> <span class="n">caption</span><span class="o">=</span><span class="nb">object</span><span class="p">,</span> <span class="n">box_threshold</span><span class="o">=</span><span class="p">.</span><span class="mi">35</span><span class="p">,</span> <span class="n">text_threshold</span><span class="o">=</span><span class="p">.</span><span class="mi">25</span><span class="p">)</span>
        <span class="n">boxes</span> <span class="o">=</span> <span class="n">b</span> <span class="o">*</span> <span class="mi">1024</span> <span class="c1"># 1024 x 1024 for sdxl images
</span>        <span class="n">boxes</span> <span class="o">=</span> <span class="nf">box_convert</span><span class="p">(</span><span class="n">boxes</span><span class="o">=</span><span class="n">boxes</span><span class="p">,</span> <span class="n">in_fmt</span><span class="o">=</span><span class="s">"cxcywh"</span><span class="p">,</span> <span class="n">out_fmt</span><span class="o">=</span><span class="s">"xyxy"</span><span class="p">).</span><span class="nf">numpy</span><span class="p">()</span>
        <span class="n">predictor</span><span class="p">.</span><span class="nf">set_image</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="nf">asarray</span><span class="p">(</span><span class="n">img</span><span class="p">))</span>
        <span class="n">masks</span><span class="p">,</span> <span class="n">_</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">predictor</span><span class="p">.</span><span class="nf">predict</span><span class="p">(</span><span class="n">box</span><span class="o">=</span><span class="n">boxes</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span> <span class="c1"># just take first for auto
</span>        <span class="n">masks</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">where</span><span class="p">(</span><span class="n">masks</span><span class="p">,</span> <span class="mi">255</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
        <span class="n">masks</span> <span class="o">=</span> <span class="n">masks</span><span class="p">.</span><span class="nf">copy</span><span class="p">(</span><span class="n">order</span><span class="o">=</span><span class="s">'C'</span><span class="p">)</span>
        <span class="n">img_cutout</span> <span class="o">=</span> <span class="n">Image</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="s">'RGBA'</span><span class="p">,</span> <span class="p">(</span><span class="mi">1024</span><span class="p">,</span> <span class="mi">1024</span><span class="p">),</span> <span class="n">color</span><span class="o">=</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">))</span>
        <span class="n">mask</span> <span class="o">=</span> <span class="n">Image</span><span class="p">.</span><span class="nf">fromarray</span><span class="p">(</span><span class="n">masks</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="nf">astype</span><span class="p">(</span><span class="s">'uint8'</span><span class="p">))</span>
        <span class="n">img_cutout</span><span class="p">.</span><span class="nf">paste</span><span class="p">(</span><span class="n">img</span><span class="p">,</span> <span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">),</span> <span class="n">mask</span><span class="o">=</span><span class="n">mask</span><span class="p">)</span>
        <span class="n">img_cutout</span> <span class="o">=</span> <span class="n">img_cutout</span><span class="p">.</span><span class="nf">transpose</span><span class="p">(</span><span class="n">Image</span><span class="p">.</span><span class="n">Transpose</span><span class="p">.</span><span class="n">FLIP_TOP_BOTTOM</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">img_cutout</span>
    <span class="k">else</span><span class="p">:</span> <span class="c1"># return image with masks
</span>        <span class="n">model</span> <span class="o">=</span> <span class="nf">get_dino_model</span><span class="p">()</span>
        <span class="n">img_ground</span> <span class="o">=</span> <span class="nf">transform_pil_image_for_grounding</span><span class="p">(</span><span class="n">img</span><span class="p">)</span>
        <span class="n">b</span><span class="p">,</span> <span class="n">p</span> <span class="o">=</span> <span class="nf">get_grounding_output</span><span class="p">(</span><span class="n">model</span><span class="o">=</span><span class="n">model</span><span class="p">,</span> <span class="n">image</span><span class="o">=</span><span class="n">img_ground</span><span class="p">,</span> <span class="n">caption</span><span class="o">=</span><span class="nb">object</span><span class="p">,</span> <span class="n">box_threshold</span><span class="o">=</span><span class="p">.</span><span class="mi">35</span><span class="p">,</span> <span class="n">text_threshold</span><span class="o">=</span><span class="p">.</span><span class="mi">25</span><span class="p">)</span>
        <span class="n">boxes</span> <span class="o">=</span> <span class="n">b</span> <span class="o">*</span> <span class="mi">1024</span> <span class="c1"># 1024 x 1024 for sdxl images
</span>        <span class="n">boxes</span> <span class="o">=</span> <span class="nf">box_convert</span><span class="p">(</span><span class="n">boxes</span><span class="o">=</span><span class="n">boxes</span><span class="p">,</span> <span class="n">in_fmt</span><span class="o">=</span><span class="s">"cxcywh"</span><span class="p">,</span> <span class="n">out_fmt</span><span class="o">=</span><span class="s">"xyxy"</span><span class="p">).</span><span class="nf">numpy</span><span class="p">()</span>
        <span class="n">predictor</span><span class="p">.</span><span class="nf">set_image</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="nf">asarray</span><span class="p">(</span><span class="n">img</span><span class="p">))</span>
        <span class="n">masks</span><span class="p">,</span> <span class="n">_</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">predictor</span><span class="p">.</span><span class="nf">predict</span><span class="p">(</span><span class="n">box</span><span class="o">=</span><span class="n">boxes</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span> <span class="c1"># just take first for auto
</span>        <span class="n">masks</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">where</span><span class="p">(</span><span class="n">masks</span><span class="p">,</span> <span class="mi">255</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
        <span class="n">masks</span> <span class="o">=</span> <span class="n">masks</span><span class="p">.</span><span class="nf">copy</span><span class="p">(</span><span class="n">order</span><span class="o">=</span><span class="s">'C'</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">img</span><span class="p">,</span> <span class="n">masks</span>

</code></pre></div></div>
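<p>Calling it then looks something like this (an illustrative invocation with made-up prompt values):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ufo = generate_segmented_diffusion(
    "ufo",
    prompt="a chrome flying saucer, studio lighting",
    negative_prompt="blurry, cropped",
    seed=42,
)
ufo.save("ufo.png")  # an RGBA cutout, ready to composite over video
</code></pre></div></div>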

<p>Rather than creating a final image whole cloth, we can now explore our existing artistic vision with this. (Sidenote: I mentioned my skepticism of Stable Diffusion and its training data set at the beginning. I think (I might be wrong!) that incorporating feedback like this is a better approach than just generating things whole cloth. I’m still not sure here.)</p>

<h1 id="using-a-controlnet-to-collaborate-with-your-image">Using a ControlNet to Collaborate with your Image</h1>

<video width="95%" autoplay="" muted="" loop="" playsinline="">
    <source src="https://pub-f17786433d2849ff86c458a4019a0ed6.r2.dev/taildoneanother.mp4" type="video/mp4" />
</video>

<p>ControlNets are a way to steer and direct the diffusion process as it occurs. You can train a model to steer output using your own input, giving you a bit more control over what your diffusion model generates.</p>

<p>For example, you can use a model called OpenPose to show how the people in your images should be posed. You can use ControlNet to specify exactly how many people should be in your image, and how they should be oriented.</p>

<p>Alternatively, if you’ve got an object you’d like to transform with diffusion, you can use something like Canny edge detection to get an outline of your object as part of steering the generation.</p>
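<p>With the diffusers library, that Canny-steered generation looks roughly like the sketch below. The pipeline classes and checkpoint names are the publicly released ones, but the file name and prompts are placeholders, and this isn’t necessarily the exact setup used here:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

# build an edge map of the object whose proportions we want to keep
source = np.asarray(Image.open("barrier.png").convert("RGB"))
edges = cv2.Canny(source, 100, 200)
control_image = Image.fromarray(np.stack([edges] * 3, axis=-1))  # 3 channels

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16).to("cuda")

result = pipe("a concrete barrier", image=control_image).images[0]
</code></pre></div></div>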

<p>We can use these models to change assets within our video, while retaining their original proportions. In the video above I’ve turned an orange barrier into a concrete barrier.</p>

<p>But if you look at it closely, there’s still a few issues. Ideally I’d have a virtual 3D asset I could move with the camera’s movement.</p>

<h1 id="using-a-diffusion-network-for-video-with-tokenflow">Using a Diffusion Network for Video with TokenFlow</h1>

<video width="350" autoplay="" muted="" loop="" playsinline="" class="align-right" style="padding-bottom: 15px;">
  <source src="https://pub-f17786433d2849ff86c458a4019a0ed6.r2.dev/tail-rerender.mp4" type="video/mp4" />  
</video>

<p>But using a ControlNet on its own doesn’t work very well for video. If you look at the video on the right, you’ll see consistently flickering frames, as the diffusion model guesses a different outcome based upon the input noise. Even if you keep the same initial seed consistent across frames, the way diffusion models work means you’ll get flicker.</p>

<p>There are techniques to reduce the flicker. A commonly used one is <a href="https://github.com/guoyww/animatediff/">AnimateDiff</a>, which adds a temporal layer in the middle of the diffuser. But these techniques aren’t perfect.</p>

<p>The best implementation for temporal stability I’ve seen so far is TokenFlow, which uses a <em>different</em> method called <a href="https://arxiv.org/abs/2211.12572">Plug and Play</a> to steer the generation of an image while retaining the spatial meaning. You can see a snippet of the above video with a prompt of “a man surfing a wave”.</p>

<video width="350" autoplay="" muted="" loop="" playsinline="" class="align-left" style="padding-bottom: 15px;">
  <source src="https://pub-f17786433d2849ff86c458a4019a0ed6.r2.dev/tokenflow_PnP_fps_30.MP4" type="video/mp4" />
</video>

<p>Plug and Play takes an initial image, inverts it to noise via DDIM, and runs it through the diffusion process, extracting the features relevant to generating the image. These features are then injected into the self-attention layers of the diffusion model, while reusing the inverted noise of the original image.</p>

<p>By combining this technique with frame to frame correlation, you can then have a temporally consistent video, relatively free of flicker, as shown above. But still, both of these techniques look very artificial, and carry the stigma associated with “AI video”: that telltale look of artificially generated artifacts.</p>

<h2 id="building-a-model-exporter-for-ebsynth">Building a Model Exporter for ebsynth</h2>

<p><img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/190d07bc-b142-4f94-491a-1cec977dd400/webscale" alt="Stable Diffusion Controlet for ebsynth" /></p>

<video width="350" autoplay="" muted="" loop="" playsinline="" class="align-right" style="padding-bottom: 15px;">
    <source src="https://pub-f17786433d2849ff86c458a4019a0ed6.r2.dev/longer.mp4" type="video/mp4" />
</video>

<p><a href="https://ebsynth.com/">Ebsynth</a> is a non-deep learning method for replacing the textures of your video with another art style. It works by taking an input series of video frames, along with some keyframes that have been painted to match the style you’d like your video to be replaced by.</p>

<p>We can use ControlNet along with Stable Diffusion to take our video, and explore different text based ideas for textures.</p>

<p>In the example here, I used “red hot lava volcano fire flames” as my prompt, as I wanted to create a glowing effect around myself.</p>

<p>For the ControlNet, I detect whether or not there are people present in the image, and if so, add a weighted ControlNet for OpenPose, in addition to the Canny Image ControlNet.</p>

<p>Given all the other tools we’ve already built, adding ebsynth generation is straightforward enough. We select a subset of frames as shown below, and then run Stable Diffusion on them, which ensures the keyframes all come from the same pathway generated by the diffusion model. I’ve also added a mask mode to the video, letting us isolate a specific person for texture creation rather than applying it to the whole video.</p>
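<p>The keyframe selection itself can stay simple, something like every Nth frame. A sketch, where <code class="language-plaintext highlighter-rouge">get_frame</code> and <code class="language-plaintext highlighter-rouge">stylize_frame</code> are hypothetical helpers wrapping the ControlNet pipeline above:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>KEYFRAME_INTERVAL = 24  # one keyframe per second at 24fps, tune to taste

for i in range(0, num_frames, KEYFRAME_INTERVAL):
    frame = get_frame(i)
    # reuse the same seed so all keyframes come from the same pathway
    styled = stylize_frame(frame, prompt="red hot lava volcano fire flames", seed=7)
    styled.save(f"keys/{i:05d}.png")  # ebsynth picks these up as style keyframes
</code></pre></div></div>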

<p><img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/086cc792-476a-4ecd-12dc-57ec550b2f00/webscale" alt="Architecture Diagram of Ebsynth and Stable Diffusion" /></p>

<p>Again, this lets us generate special effects while ensuring a single effect is applied to a single subject in our video.</p>

<p>With this, we can put the generated versions of our characters back into our videos.</p>

<h2 id="building-ai-for-artists">Building AI for Artists</h2>

<video width="95%" autoplay="" muted="" loop="" playsinline="">&gt;
    <source src="https://pub-f17786433d2849ff86c458a4019a0ed6.r2.dev/verycoolhippie.mp4" type="video/mp4" />
</video>

<p><strong>After having spent a few months using computer vision models and generative AI to build an editor, what have I learned?</strong></p>

<p>Video is inherently multi-modal. You have a sequence of images, audio, and a narrative. Some machine learning models are now capable of working with multi-modal inputs, but the day-to-day work of video creation doesn’t fit well into a single model just yet. The most interesting results I’ve gotten have come from a mixture of traditional video editing, along with an orchestration of models.</p>

<p><strong>But these tools don’t replace creativity!</strong> Creativity is mostly just continuing to show up for work, every day. Some days you show up and the results seem to come easily, and on others it feels impossible to do anything.</p>

<p>We’re still so early in the process of generative AI that I can’t really tell what the future will look like. I can see the tools starting to take shape, but it’s not clear yet what will win, or how things will work.</p>

<p><strong>I have hope that we’ll be able to beat the massive pools of capital seeking to replace creatives, and that the creatives will win.</strong></p>

<p>If you want to follow along as I keep exploring, I encourage you to share this article, and sign up below for early access to the video editor.</p>

<center>
<div id="mc_embed_shell">
      <link href="//cdn-images.mailchimp.com/embedcode/classic-061523.css" rel="stylesheet" type="text/css" />
  <style type="text/css">
        #mc_embed_signup{background:#fff; clear:left; font:14px Helvetica,Arial,sans-serif; width: 600px;}
        /* Add your own Mailchimp form style overrides in your site stylesheet or in this style block.
           We recommend moving this block and the preceding CSS link to the HEAD of your HTML file. */
</style>
<div id="mc_embed_signup">
    <form action="https://makeartwithpython.us6.list-manage.com/subscribe/post?u=3feb377469b9e8fab8d52bd3f&amp;id=a998076775&amp;f_id=00ec22e3f0" method="post" id="mc-embedded-subscribe-form" name="mc-embedded-subscribe-form" class="validate" target="_blank">
        <div id="mc_embed_signup_scroll">
            <div class="indicates-required"><span class="asterisk">*</span> indicates required</div>
            <div class="mc-field-group"><label for="mce-EMAIL">Email Address <span class="asterisk">*</span></label><input type="email" name="EMAIL" class="required email" id="mce-EMAIL" required="" value="" /></div>
<div hidden=""><input type="hidden" name="tags" value="3379007" /></div>
        <div id="mce-responses" class="clear">
            <div class="response" id="mce-error-response" style="display: none;"></div>
            <div class="response" id="mce-success-response" style="display: none;"></div>
        </div><div style="position: absolute; left: -5000px;" aria-hidden="true"><input type="text" name="b_3feb377469b9e8fab8d52bd3f_a998076775" tabindex="-1" value="" /></div><div class="clear"><input type="submit" name="subscribe" id="mc-embedded-subscribe" class="button" value="Notify Me" /></div>
    </div>
</form>
</div>
<script type="text/javascript" src="//s3.amazonaws.com/downloads.mailchimp.com/js/mc-validate.js"></script><script type="text/javascript">(function($) {window.fnames = new Array(); window.ftypes = new Array();fnames[0]='EMAIL';ftypes[0]='email';fnames[1]='FNAME';ftypes[1]='text';fnames[2]='LNAME';ftypes[2]='text';fnames[3]='ADDRESS';ftypes[3]='address';fnames[4]='PHONE';ftypes[4]='phone';}(jQuery));var $mcj = jQuery.noConflict(true);</script></div>
</center>

<p>I’d also love to hear from you if you have any ideas, please reach out via <a href="https://twitter.com/burningion">Twitter</a>.</p>]]></content><author><name>Kirk Kaiser</name></author><summary type="html"><![CDATA[Jumping into the deep end of the (GPU poor) machine learning pool]]></summary></entry><entry><title type="html">Open Source and the Battle of the GPU Poors</title><link href="/blog/what-is-happening-with-gpus/" rel="alternate" type="text/html" title="Open Source and the Battle of the GPU Poors" /><published>2023-09-07T00:00:00+00:00</published><updated>2023-09-07T00:00:00+00:00</updated><id>/blog/what-is-happening-with-gpus</id><content type="html" xml:base="/blog/what-is-happening-with-gpus/"><![CDATA[<h1 id="the-ai-battle-really-begins">The AI Battle <em>Really</em> Begins</h1>

<p><a href="https://www.washingtonpost.com/business/2023/07/28/crypto-and-ai-my-eyeball-met-with-sam-altman-s-worldcoin-iris-scanner/a06c47f6-2d01-11ee-a948-a5b8a9b62d84_story.html"><img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/cb8e5a5b-7ec7-4c1e-8acc-668eaccf3f00/webscale" alt="worldcoin" /></a></p>

<p>In late 2022, there were a ton of <a href="https://www.theregister.com/2022/07/04/azure_capacity_issues/">Azure capacity issues</a> reported in the media. At the time, the lack of capacity was blamed on “supply chain” problems.</p>

<p>But a quarter or two later, OpenAI released <a href="https://openai.com/gpt-4">GPT-4</a>, which showcased new capabilities previously unseen in Large Language Models (LLMs), and apparently required a <a href="https://blogs.microsoft.com/blog/2023/01/23/microsoftandopenaiextendpartnership/">massive amount of computation</a> from Azure to train.</p>

<p>The novelty and usefulness of GPT-4 were apparent immediately. An incredible <a href="https://arxiv.org/abs/2303.12712">early paper from Microsoft</a> showcased how GPT-4 <em>could be</em> the beginning of Artificial General Intelligence, a multi-modal, general purpose reasoning machine with a mostly human level understanding of the world.</p>

<p>GPT-4 was so disruptive, in fact, that OpenAI is now on track to generate <a href="https://www.businessinsider.com/how-much-money-does-chatgpt-openai-make-2023-8?op=1">$1 billion</a> in annual revenue from its ChatGPT product less than a year after the launch.</p>

<p>Other companies are now waking up.</p>

<h1 id="the-high-costs-of-doing-machine-learning">The High Costs of Doing Machine Learning</h1>

<p><a href="https://www.nvidia.com/en-us/data-center/h100/"><img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/1f3fa6b8-976b-4e03-42a0-501806c4c400/webscale" alt="H100 GPU Cluster" /></a></p>

<p>If you’re not actively following the AI space, you may have missed NVIDIA’s Q2 2024 results. They were unbelievably good, given the size and scale of the company. Revenue was <a href="https://nvidianews.nvidia.com/news/nvidia-announces-financial-results-for-second-quarter-fiscal-2024">up 141% from the previous quarter, and up 171% from a year ago</a>.</p>

<p><strong>It seems every large company is now scrambling to catch up in the AI game, and in the process are spending outlandish sums of money to build datacenters filled with the latest NVIDIA GPUs necessary for training.</strong></p>

<p>To be clear, each of NVIDIA’s latest GPUs (the <a href="https://www.nvidia.com/en-us/data-center/h100/">H100</a>) costs around $34,000, and they’re generally deployed 8 to a machine, in systems designed for clusters. <strong>These machines cost between $300k-400k each.</strong></p>

<p>For reference, a <em>single</em> training run for a 70B parameter language model (<a href="https://scontent-lax3-1.xx.fbcdn.net/v/t39.2365-6/10000000_662098952474184_2584067087619170692_n.pdf?_nc_cat=105&amp;ccb=1-7&amp;_nc_sid=3c67a6&amp;_nc_ohc=LDQRf06eBIEAX_2jIdd&amp;_nc_ht=scontent-lax3-1.xx&amp;oh=00_AfD1bavGQroZhVWQdHJbAJMlSvicJ8KwhORQldio7GtrVg&amp;oe=6501A37F">Llama 2</a> in this example) uses 1,720,320 GPU hours worth of compute, at 400W of energy usage per GPU.</p>

<p>(In this case, Llama 2 was trained on A100s, the prior generation of GPU. But the hours and investment are comparable for training a given model.)</p>

<p><strong>This would take 10,240 GPUs, running 24/7 (at a cost of around $350 mil in GPUs alone!) to train the model in a week.</strong></p>

<p>If we wanted to avoid <a href="https://www.investopedia.com/terms/c/capitalexpenditure.asp">CapEx</a> and instead use <a href="https://aws.amazon.com/">AWS</a>, a <a href="https://aws.amazon.com/ec2/instance-types/p4/">p4d.24xlarge</a> costs $32.77 per hour, and comes with 8 A100 GPUs each. That’s an affordable $7 million to train one model run, ignoring the costs of data transfer, debugging, and setting up of data pipelines. (<em>If</em> there are any <a href="https://gpus.llm-utils.org/nvidia-h100-gpus-supply-and-demand/">available</a>).</p>
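<p>The arithmetic checks out (using only the numbers quoted above):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gpu_hours = 1_720_320

# one week on a fixed fleet: GPU hours needed / hours in a week
gpus_needed = gpu_hours / (7 * 24)       # 10,240 GPUs
gpu_cost = gpus_needed * 34_000          # ~$348 million in GPUs alone

# renting instead: a p4d.24xlarge has 8 A100s at $32.77 per hour
instance_hours = gpu_hours / 8           # 215,040 instance hours
aws_cost = instance_hours * 32.77        # ~$7 million
</code></pre></div></div>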

<p>Given the extremely high costs associated with developing these state of the art models, <a href="https://www.semianalysis.com/p/google-gemini-eats-the-world-gemini">SemiAnalysis</a> has coined the term <strong>“GPU rich”</strong> vs <strong>“GPU poor”</strong>. Companies which have invested heavily in GPU infrastructure prior to the GPT-4 explosion are considered GPU rich, and capable of building state of the art models, while everyone else is considered GPU poor and scrambling to catch up, but currently locked out of building these models.</p>

<p><strong>This is a wild experience, because it’s the first time in my life it’s been <em>impossible</em> to develop a kind of software without access to a giant pool of capital and proprietary data.</strong></p>

<p><strong>The thing that made software interesting to me as a young person was the lack of costs and low barrier to entry.</strong> <em>Anyone</em> anywhere in the world could build software and contribute to the global conversation of software.</p>

<p>A compiler to build programs was free, and an operating system to run them was free, thanks to GNU and Linux. Everyone working in or contributing to open source could take from and give back to the overall value available in open source. This open ecosystem led to the amazing growth of the cloud and software in general.</p>

<p><strong>Trillions of dollars in tech company valuations and returns were only possible because of the open source ecosystem.</strong></p>

<h1 id="ai-success-is-a-perfect-storm-for-capital-concentration-and-increased-inequality">AI Success Is a Perfect Storm for Capital Concentration and Increased Inequality</h1>

<p><a href="https://www.goldengate.org/bridge/photo-gallery/golden-gate-bridge-contemporary-photos/"><img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/5307924b-cff9-420a-1eca-651eee96e100/webscale" alt="funding secured" /></a></p>

<p>Given the massive costs associated with building or renting a GPU cluster, the potential moat available to companies with access to the giant pools of capital necessary for GPU clusters seems unparalleled.</p>

<p>And indeed, this is why we’re seeing startups that are pre-product raising <a href="https://www.cnbc.com/2022/05/16/inflection-ai-linkedin-and-deepmind-co-founders-raise-225-million.html">hundreds of millions of dollars</a>, again <strong>because the cost of entry into the state of the art machine learning model business is literally hundreds of millions in compute costs.</strong></p>

<p>For someone used to the previous, cloud based and open source software paradigm, there couldn’t be a wider difference.</p>

<p>Suddenly the costs of building a product aren’t concentrated in the humans necessary to build the software itself, but rather in the raw costs of materials, energy, and data necessary to participate.</p>

<p>Which sets artificial intelligence up for a socially dangerous feedback loop.</p>

<p>There will only be so many superclusters in the world capable of training these giant models, and there will be only a relative few engineers who are using and gaining experience with these superclusters.</p>

<p>The talent pool will concentrate and shrink, and the pressure to deliver will grow on the people running and training these models.</p>

<p><strong>As models like GPT-4 show the potential to automate and destroy <em>the vast majority of</em> the best paying jobs available to workers (that is, knowledge work in general), we’re looking at a future where existing capital structures have the potential to become permanently self-perpetuating, largely by training on data taken from the public’s work (which comprises the majority of training data fed to the GPUs).</strong></p>

<p>As AI’s capabilities grow, a substantial number of previously well employed knowledge workers may be permanently captured by these models, trained on data taken from them, without compensation for their efforts.</p>

<h1 id="the-open-source-machine-learning-hero-is-meta">The Open Source Machine Learning Hero is… Meta?!</h1>

<p><a href="https://ai.meta.com/blog/llama-2/"><img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/5b50cb2d-fa3c-4841-aec0-28728826dc00/webscale" alt="Llama-2" /></a></p>

<p>Given the costs associated with training these models, the moat for OpenAI’s GPT-4 appeared to be enormous and durable. But then Meta (previously Facebook) released <a href="https://ai.meta.com/blog/large-language-model-llama-meta-ai/">Llama</a>.</p>

<p>Llama was originally supposed to be a large language model released just to researchers, on a case by case basis. Inevitably though, the model leaked, and showed up on Torrent websites. <strong>Soon enough, a model that cost millions to create was in the hands of everyone who wanted to try it out, and knew how to use a torrent link.</strong></p>

<p><strong>This led to a rapid explosion in the <em>public</em> progress around large language models.</strong> Researchers fine tuned the leaked model on GPT-4 output and made it public. Shortly afterwards a reasonably useful model was made to run on consumer level hardware, thanks to projects like <a href="https://github.com/ggerganov/llama.cpp">llama.cpp</a>.</p>

<p>With that rapid public progress, Meta has continued to <a href="https://segment-anything.com/">release</a> <a href="https://ai.meta.com/blog/code-llama-large-language-model-coding/">powerful</a> <a href="https://github.com/facebookresearch/co-tracker">models</a>, focused on computer vision, language, and more.</p>

<p><strong>Better still, they’ve released some of their models in a way that allows for commercial use. And most recently, they released a completely free large language model, Llama 2.</strong></p>

<p>The moat inherent in machine learning’s costs is becoming less apparent. Especially when you consider a trained model could always leak.</p>

<h1 id="the-moat-for-ai-is-now-secrecy-and-paranoia">The Moat for AI is Now Secrecy and Paranoia</h1>

<p><a href="https://youtu.be/Nlkk3glap_U?t=2620"><img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/6909f497-609e-4409-c25e-dbfefbdcd600/webscale" alt="Dario Amodei" /></a></p>

<p>In an interview, the CEO of Anthropic, <a href="https://www.dwarkeshpatel.com/i/135814349/cybersecurity">Dario Amodei</a>, spoke a bit about his approach to cyber security when training his models.</p>

<p>He mentioned that some of the edge for his company relies on what he calls “compute multipliers”. The core idea is that they have discovered training optimizations that are the equivalent of having more compute.</p>

<p>At his organization, <strong>he’s implemented a compartmentalization strategy, similar to a spy agency</strong>, where no one person could leak the overall approaches necessary to train the Anthropic Claude models.</p>

<p>What’s interesting is that in the same interview, he tends to dance around <em>how much</em> the business model of AI relies on the model <em>never</em> being leaked or hacked. There is an acknowledgement that governments will be able to hack and steal any model they like, provided the model in question shows high enough value. But there isn’t an equivalent belief that someone will <em>leak</em> the same.</p>

<p>Similarly, for now there is a relatively minor moat in running larger language models: it may require multiple $34,000 GPUs to do inference after training. But again, the open source work of projects like <a href="https://github.com/ggerganov/llama.cpp">llama.cpp</a>, <a href="https://huggingface.co/">huggingface</a>, and others is chipping away at that moat.</p>

<h1 id="the-vibes-of-training-ai-models-are-kinda-wack">The Vibes of Training AI Models are Kinda Wack</h1>

<p><a href="https://www.youtube.com/watch?v=w7aSybHRa6s"><img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/893427e4-2b90-49dd-a4d1-182aa8410e00/webscale" alt="the future is great" /></a></p>

<p>So let’s recap the dynamics of privately funded AI:</p>

<ul>
  <li><a href="https://inflection.ai/inflection-ai-announces-1-3-billion-of-funding">Extremely high costs</a></li>
  <li>Secrecy concerns / Model Leaks</li>
  <li><a href="https://arstechnica.com/security/2023/09/hack-of-a-microsoft-corporate-account-led-to-azure-breach-by-chinese-hackers/">Defense / Spy considerations</a></li>
  <li><a href="https://en.wikipedia.org/wiki/AI_alignment">Alignment risks</a></li>
  <li><a href="https://www.techrepublic.com/article/openai-microsoft-class-action/">Source training data liabilities</a></li>
</ul>

<p>All of this is happening behind closed doors, with giant pools of capital building super machines, using humanity’s collective knowledge and knowledge labor as input.</p>

<p>To capital, AI seems to be the ultimate sort of privatization of public knowledge, along with the ability to steer the source of what will certainly become “the reference” for an “unbiased” source of information for a substantial portion of humanity.</p>

<p>Control of the knowledge base humanity uses as its reference is, of course, slightly interesting to intelligence agencies and governments.</p>

<p>Given the vibes and given the costs, why even try to participate? Why not leave the AI game to the “grown ups”, and let them tell you the future?</p>

<h1 id="open-source-and-the-battle-against-techno-disillusionment">Open Source And the Battle Against Techno-Disillusionment</h1>

<p><a href="https://www.youtube.com/watch?v=w7aSybHRa6s"><img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/4b72f8bb-4885-4842-7e64-e84598da7e00/webscale" alt="a real battle" /></a></p>

<p><strong>I’ve been lucky enough over the past decade’s worth of technological improvements to have had the tiniest seat at the table.</strong> I’ve been able to steer at least a part of the conversation for what the future of technology and software will look like.</p>

<p>My peers outside of tech, however, have mostly not had a voice. In an increasingly online and software driven world, most people living within it don’t have a say in how it should work.</p>

<p><strong>Instead, it’s mostly been venture funded organizations seeking massive growth who have built the digital worlds everyone inhabits.</strong></p>

<p>This mostly worked out fine for the past 10 years because VC was mostly in the business of giving money away. Apps like Uber were famously money losing, scaling in an effort to build out market dominance. In the meantime, humanity got a sweet discount on rides.</p>

<p>But this new generation of technological improvements threatens to turn the disillusionment dials up to 11. Imagine only a few thousand people steering <em>all of</em> the most important software used by the rest of the world. Imagine the sorts of power games and pressure with such a limited number of people working in the space. Imagine the immense pressure to return capital on investments of billions.</p>

<p>And add the lack of accountability inherent in a machine learning model that is incomprehensible to humans.</p>

<p>The stakes are generally too high for all of us to have artificial intelligence <em>not be</em> developed in the open.</p>

<p><strong>The GPU Poors are our best hope.</strong></p>

<p><strong>Let’s build a better future together, in the open.</strong></p>

<!-- Begin MailChimp Signup Form -->
<link href="//cdn-images.mailchimp.com/embedcode/horizontal-slim-10_7.css" rel="stylesheet" type="text/css" />

<style type="text/css">
 #mc_embed_signup{background:#fff; clear:left; font:14px Helvetica,Arial,sans-serif; width:100%;}
 /* Add your own MailChimp form style overrides in your site stylesheet or in this style block.
	  We recommend moving this block and the preceding CSS link to the HEAD of your HTML file. */
</style>

<div id="mc_embed_signup">
    <form action="https://buddhamindapp.us6.list-manage.com/subscribe/post?u=3feb377469b9e8fab8d52bd3f&amp;id=fb4cd887a4" method="post" id="mc-embedded-subscribe-form" name="mc-embedded-subscribe-form" class="validate" target="_blank" novalidate="">
        <div id="mc_embed_signup_scroll">
	          <label for="mce-EMAIL">Enter Your Email to Receive More Posts Like This</label>
	          <input type="email" value="" name="EMAIL" class="email" id="mce-EMAIL" placeholder="email address" required="" />
            <!-- real people should not fill this in and expect good things - do not remove this or risk form bot signups-->
            <div style="position: absolute; left: -5000px;" aria-hidden="true"><input type="text" name="b_3feb377469b9e8fab8d52bd3f_fb4cd887a4" tabindex="-1" value="" /></div>
            <div class="clear"><input type="submit" value="Subscribe" name="subscribe" id="mc-embedded-subscribe" class="button" /></div>
        </div>
    </form>
</div>

<!--End mc_embed_signup-->]]></content><author><name>Kirk Kaiser</name></author><summary type="html"><![CDATA[Will the GPU Poors Be Allowed to Participate in Artificial Intelligence?]]></summary></entry><entry><title type="html">Building a remote controlled skateboard ramp</title><link href="/blog/building-a-remote-controlled-skate-ramp/" rel="alternate" type="text/html" title="Building a remote controlled skateboard ramp" /><published>2023-03-25T00:00:00+00:00</published><updated>2023-03-25T00:00:00+00:00</updated><id>/blog/building-a-remote-controlled-skate-ramp</id><content type="html" xml:base="/blog/building-a-remote-controlled-skate-ramp/"><![CDATA[<h1 id="creating-a-platform-for-a-self-driving-skatepark-with-python">Creating a Platform for a Self-Driving Skatepark with Python</h1>

<p><div style="width:100%;height:0;padding-bottom:56%;position:relative;"><iframe src="https://giphy.com/embed/iAjLGkSFBW6B4ErCCL" width="100%" height="100%" style="position:absolute" frameborder="0" class="giphy-embed" allowfullscreen=""></iframe></div></p>

<p>Ever since I first played <a href="https://en.wikipedia.org/wiki/Paperboy_(video_game)">Paperboy</a> as a kid, I’ve wanted a way to transform the streets of suburbia into something more exciting. My current neighborhood has a sidewalk along the entire street, and I often take my dog on runs, where he pulls me on my skateboard.</p>

<p>When we ride together, I’ve often wished there were ramps sprinkled at houses along the way, like in Paperboy. On one of these runs, the thought occurred to me that I could possibly build a skate ramp on a motorized platform to ride with us. This idea grew, and eventually I came up with the goal to make an entire self-driving skatepark that could journey with you to your destination, ramps taking turns stopping and getting back in front of you along the way.</p>

<p>In today’s blog post we’ll step through the process so far, from idea to a few prototypes, and how I’ve built a platform for remote controlled skate ramps. At the end there will be a Github repository if you want to help out, or try building your own.</p>

<p>With that, let’s get started!</p>

<h2 id="architecture-of-a-self-driving-skate-ramp">Architecture of a Self-Driving Skate Ramp</h2>

<figure>
<img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/7f14e69c-4180-4032-c802-f48c3f666600/webscale" />
</figure>

<p>When I first got the idea for a self-driving skate ramp, I immediately thought of a golf cart being repurposed into a ramp on wheels. There was something very appealing about the absurdity of a gigantic skate ramp driving down the street.</p>

<p>But as I looked for <a href="https://neilnie.com/self-driving-golf-cart/">prior art</a>, I quickly realized that the <a href="https://neilnie.com/2018/12/15/electronically-control-golf-cart-steering-using-a-linear-actuator-part-1/">linear actuator</a> and mechanics of the steering system seemed like a bit too much of an investment to get to a prototype to test an idea. So instead, I focused on a mix between an electric skateboard platform and the NVIDIA <a href="https://github.com/NVIDIA-AI-IOT/jetracer">JetRacer</a>.</p>

<p>The NVIDIA JetRacer is a <a href="https://amzn.to/3nsJhcV">Jetson Nano</a> powered self-driving RC car. It runs on ROS, and follows a basic track around a loop, using computer vision. It seemed promising as prior art.</p>

<p>Really, the JetRacer was a backup to prove that if I could get the hardware down, the software <em>should</em> be implementable, given the performance of JetRacers. (It also helped that I have two Jetson Nanos lying around, with the shortage of chips everywhere.)</p>

<p>With that decision, it was time to figure out how to get started. So I went on eBay and bought a broken electric skateboard to begin my journey.</p>

<h2 id="controlling-electric-skateboard-motors-in-software">Controlling Electric Skateboard Motors in Software</h2>

<p><div style="width:100%;height:0;padding-bottom:133%;position:relative;"><iframe src="https://giphy.com/embed/CZPJzxGgKlvM9IzYh2" width="100%" height="100%" style="position:absolute" frameborder="0" class="giphy-embed" allowfullscreen=""></iframe></div></p>

<p>I spent some time last year building FPV drones, but when I began this project I didn’t have any familiarity with electric skateboards. Luckily, it seems there’s an open source movement for their <a href="https://vesc-project.com/">motor control systems</a>, similar to the one found on <a href="https://betaflight.com/">hobbyist drones</a>.</p>

<p>Electric skateboards mostly use <a href="https://en.wikipedia.org/wiki/Brushless_DC_electric_motor">BLDC motors</a> to provide an enormous amount of power in combination with LiPo batteries. The trio of motor controller software, motor, and LiPo batteries is largely what powers the eBike, eScooter, eSkate, and drone markets. So the architecture across platforms is awfully similar.</p>

<p>Getting the motors controlled via USB in Python was straightforward enough. I hooked up my benchtop power supply, strapped down my skateboard trucks, and was off to the races with the pyVESC library:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">pyvesc</span>

<span class="n">serial_port</span> <span class="o">=</span> <span class="s">'/dev/ttyACM1'</span>
<span class="n">front_motor</span> <span class="o">=</span> <span class="n">pyvesc</span><span class="p">.</span><span class="nc">VESC</span><span class="p">(</span><span class="n">serial_port</span><span class="o">=</span><span class="n">serial_port</span><span class="p">)</span>

<span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
  <span class="n">front_motor</span><span class="p">.</span><span class="nf">set_rpm</span><span class="p">(</span><span class="mi">3000</span><span class="p">)</span>
  <span class="nf">sleep</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span>
  <span class="n">front_motor</span><span class="p">.</span><span class="nf">set_rpm</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
</code></pre></div></div>

<p>My original plan for steering the platform was to use two different motors, mounted on opposite sides of my skateboard. Depending on which direction I needed to turn, I’d spin them at different speeds.</p>
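
<p>As a rough sketch of the idea (the <code class="language-plaintext highlighter-rouge">BASE_RPM</code> constant and the normalized inputs here are hypothetical, not values from my actual code), differential steering maps one throttle and one steer value onto two motor speeds:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>BASE_RPM = 3000  # hypothetical cruising speed

def differential_rpm(throttle, steer):
    """Map throttle and steer (each in -1..1) to left/right motor RPMs."""
    left = int(BASE_RPM * throttle * (1 + steer))
    right = int(BASE_RPM * throttle * (1 - steer))
    return left, right

# steer=0.2 spins the left wheel faster, arcing the platform to the right
left_rpm, right_rpm = differential_rpm(0.5, 0.2)
</code></pre></div></div>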

<p>This really didn’t work well in practice. When I built the first version, inconsistent grip from my wheels meant I couldn’t steer reliably at all. Instead, my ramp kind of hopped and spun out in either direction. That wouldn’t work for driving on and off sidewalks.</p>

<p><div style="width:100%;height:0;padding-bottom:133%;position:relative;"><iframe src="https://giphy.com/embed/5COCCG4SoHOvNWLBNn" width="100%" height="100%" style="position:absolute" frameborder="0" class="giphy-embed" allowfullscreen=""></iframe></div></p>

<p>Another problem I had was with the synchronization of my motors. When I first tried out my platform, I hooked both motors up via USB. This led to a bit of a delay in activation for the motors, and a slight difference in speeds. To fix this, I wired the controllers in a master / slave configuration with CANBUS.</p>

<p>This proved to be a bit tricky to do over USB, as it wasn’t very well documented, and was completely unsupported by pyVESC. So I ended up searching a bit, and eventually found the command to send a CANBUS slave message over USB (and control the second controller via the first):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">SetRPM</span><span class="p">(</span><span class="n">metaclass</span><span class="o">=</span><span class="n">pyvesc</span><span class="p">.</span><span class="n">VESCMessage</span><span class="p">):</span>
    <span class="s">"""
    Sets the RPM on a CANBUS connected device. 
    Messages have to be sent repeatedly, as CANBUS has a timeout that's configurable
    within the VESC application. In my case, I set it to 5 seconds, which means the
    motor only works for 5 seconds after sending this signal.
    """</span>
    <span class="nb">id</span> <span class="o">=</span> <span class="mi">34</span>
    <span class="n">fields</span> <span class="o">=</span> <span class="p">[</span>
        <span class="p">(</span><span class="s">'motor_id'</span><span class="p">,</span> <span class="s">'B'</span><span class="p">),</span> <span class="c1"># my slave is set to 1
</span>        <span class="p">(</span><span class="s">'command'</span><span class="p">,</span> <span class="s">'B'</span><span class="p">),</span> <span class="c1"># 8, thanks to this page: https://www.vesc-project.com/node/774
</span>        <span class="p">(</span><span class="s">'rpm'</span><span class="p">,</span> <span class="s">'i'</span><span class="p">)</span> <span class="c1"># because we're assuming RPM setting
</span>    <span class="p">]</span>
</code></pre></div></div>

<p>Because of the requirements for rider safety, you can’t just send a single command via CANBUS on the VESC platform. Instead, you need to continuously send the message.</p>

<p>In order to do this, I had to make some updates to the way my Python program ran. I added a thread to send a heartbeat of the current message:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">send_target_message</span><span class="p">():</span>
    <span class="n">myData</span> <span class="o">=</span> <span class="n">threading</span><span class="p">.</span><span class="nf">local</span><span class="p">()</span>
    <span class="n">myData</span><span class="p">.</span><span class="n">drive_speed</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
        <span class="k">try</span><span class="p">:</span>
            <span class="n">val</span> <span class="o">=</span> <span class="n">queue</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">block</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
            <span class="n">myData</span><span class="p">.</span><span class="n">drive_speed</span> <span class="o">=</span> <span class="n">val</span>
        <span class="k">except</span> <span class="nb">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
            <span class="n">time</span><span class="p">.</span><span class="nf">sleep</span><span class="p">(.</span><span class="mi">001</span><span class="p">)</span>
        <span class="n">front_motor</span><span class="p">.</span><span class="nf">write</span><span class="p">(</span><span class="nf">encode</span><span class="p">(</span><span class="nc">SetRPM</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="n">myData</span><span class="p">.</span><span class="n">drive_speed</span><span class="p">)))</span>
        <span class="n">time</span><span class="p">.</span><span class="nf">sleep</span><span class="p">(.</span><span class="mi">001</span><span class="p">)</span>
        <span class="n">front_motor</span><span class="p">.</span><span class="nf">set_rpm</span><span class="p">(</span><span class="n">myData</span><span class="p">.</span><span class="n">drive_speed</span><span class="p">)</span>

<span class="n">controller</span> <span class="o">=</span> <span class="nc">RampController</span><span class="p">(</span><span class="n">interface</span><span class="o">=</span><span class="s">"/dev/input/js0"</span><span class="p">,</span> <span class="n">connecting_using_ds4drv</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
<span class="n">t</span> <span class="o">=</span> <span class="n">threading</span><span class="p">.</span><span class="nc">Thread</span><span class="p">(</span><span class="n">target</span><span class="o">=</span><span class="n">send_target_message</span><span class="p">)</span>
<span class="n">t</span><span class="p">.</span><span class="nf">start</span><span class="p">()</span>
<span class="n">controller</span><span class="p">.</span><span class="nf">listen</span><span class="p">()</span>
</code></pre></div></div>

<p>With this, I’m able to send messages to both controllers via a single USB connection.</p>

<h2 id="learning-from-the-first-iteration-prototype">Learning from the First Iteration Prototype</h2>

<p><div style="width:100%;height:0;padding-bottom:75%;position:relative;"><iframe src="https://giphy.com/embed/7v3YBsdEbT7dUTdeih" width="100%" height="100%" style="position:absolute" frameborder="0" class="giphy-embed" allowfullscreen=""></iframe></div></p>

<p>Despite the obvious shortcomings, I took my ramp for a spin. In practice, it ended up being <em>much</em> scarier than I’d anticipated.</p>

<p>Normal skate ramps don’t move at all. In this case, when I hit the ramp, it moved on two axes: one towards me, and the other side to side. Between this and the lack of steering, I didn’t really have a ramp capable of steering itself. At best, I’d have a ramp that could pull me somewhere, steered by leaning on it.</p>

<p><div style="width:100%;height:0;padding-bottom:56%;position:relative;"><iframe src="https://giphy.com/embed/3aBhEpgQiCpVUieajS" width="100%" height="100%" style="position:absolute" frameborder="0" class="giphy-embed" allowfullscreen=""></iframe></div></p>

<p>My next version had to have some sort of way to put the ramp on the ground completely for when I skated it, and also be reasonably steerable. Both of these led me to my first metalworking projects.</p>

<h2 id="bringing-steering-and-ramp-lifting--lowering-to-the-platform">Bringing Steering and Ramp Lifting / Lowering to the Platform</h2>

<figure>
<img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/da6aa95a-9715-4ce8-195e-21acb9250c00/webscale" />
</figure>

<p>Given the failures, I didn’t want to give up the ecosystem of parts available with electric skateboards with the second prototype. So I settled on trying to attach a <a href="https://amzn.to/3KaFSbS">linear actuator</a> to the trucks on the skateboard for steering, using a <a href="https://amzn.to/3Kcz5OF">motor mount</a>.</p>

<p>This worked surprisingly well, but it built up some serious force on my screws and ripped them right out. I realized I needed to run bolts all the way through the plywood base, to keep the linear actuator from ripping the mounts right off.</p>

<p>This led me to the next problem: how to raise and lower the ramp itself. It seemed like overkill to get another linear actuator and fabricate another mount. Instead, I went looking, and found an interesting <a href="https://amzn.to/42Em085">electric jack</a> on Amazon.</p>

<p>I ordered the jack, and decided to fabricate a metal mount between the skateboard and the ramp.</p>

<figure>
<img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/f5e4eed5-d003-4d3b-cc77-86ef07c5bf00/webscale" />
</figure>

<p>This was my first welding attempt ever, and also my first time using a metal brake. Both mostly went off without a hitch for the first round. YouTube helped with learning MIG welding, along with a low-cost <a href="https://www.harborfreight.com/welding/welders/mig-flux-welders/mig-170-professional-welder-with-120240v-input-57864.html">Harbor Freight flux-core welder</a>. (I’d always wanted to learn welding, but assumed it would be thousands of dollars to get started.)</p>

<p><div style="width:100%;height:0;padding-bottom:178%;position:relative;"><iframe src="https://giphy.com/embed/3XKuzeoLTK3y9P89Tu" width="100%" height="100%" style="position:absolute" frameborder="0" class="giphy-embed" allowfullscreen=""></iframe></div></p>

<p>Once the parts were fabricated, the only thing left was to build the software to control a linear actuator and an electric jack.</p>

<p>Luckily, both of these are basically the same simple circuit: run current through one way to extend, reverse the polarity to retract. Two relays wired correctly are enough to drive the linear actuator out or in, and the jack up or down. We can control the relays with a Teensy microcontroller by setting two pins either <code class="language-plaintext highlighter-rouge">HIGH</code> or <code class="language-plaintext highlighter-rouge">LOW</code> for each actuator:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">&lt;Packet.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;PacketCRC.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;SerialTransfer.h&gt;</span><span class="cp">
</span>
<span class="n">SerialTransfer</span> <span class="n">steerTransfer</span><span class="p">;</span>

<span class="k">const</span> <span class="kt">long</span> <span class="n">interval</span> <span class="o">=</span> <span class="mi">1000</span><span class="p">;</span>
<span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">previousMillis</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">ledState</span> <span class="o">=</span> <span class="n">LOW</span><span class="p">;</span>

<span class="k">struct</span> <span class="nc">__attribute__</span><span class="p">((</span><span class="n">packed</span><span class="p">))</span> <span class="n">STRUCT</span> <span class="p">{</span>
  <span class="kt">char</span> <span class="n">d</span><span class="p">;</span>
  <span class="kt">int</span> <span class="n">x</span><span class="p">;</span>
<span class="p">}</span> <span class="n">steerStruct</span><span class="p">;</span>

<span class="kt">void</span> <span class="nf">setup</span><span class="p">()</span> <span class="p">{</span>
  <span class="n">Serial</span><span class="p">.</span><span class="n">begin</span><span class="p">(</span><span class="mi">115200</span><span class="p">);</span>
  <span class="n">steerTransfer</span><span class="p">.</span><span class="n">begin</span><span class="p">(</span><span class="n">Serial</span><span class="p">);</span>
  <span class="c1">// left and right</span>
  <span class="n">pinMode</span><span class="p">(</span><span class="n">A9</span><span class="p">,</span> <span class="n">OUTPUT</span><span class="p">);</span>
  <span class="n">pinMode</span><span class="p">(</span><span class="n">A8</span><span class="p">,</span> <span class="n">OUTPUT</span><span class="p">);</span>
  <span class="c1">// up and down</span>
  <span class="n">pinMode</span><span class="p">(</span><span class="n">A7</span><span class="p">,</span> <span class="n">OUTPUT</span><span class="p">);</span>
  <span class="n">pinMode</span><span class="p">(</span><span class="n">A6</span><span class="p">,</span> <span class="n">OUTPUT</span><span class="p">);</span>
  <span class="n">pinMode</span><span class="p">(</span><span class="mi">13</span><span class="p">,</span> <span class="n">OUTPUT</span><span class="p">);</span>
  <span class="n">digitalWrite</span><span class="p">(</span><span class="n">A9</span><span class="p">,</span> <span class="n">LOW</span><span class="p">);</span>
  <span class="n">digitalWrite</span><span class="p">(</span><span class="n">A8</span><span class="p">,</span> <span class="n">LOW</span><span class="p">);</span>
  <span class="n">digitalWrite</span><span class="p">(</span><span class="n">A7</span><span class="p">,</span> <span class="n">LOW</span><span class="p">);</span>
  <span class="n">digitalWrite</span><span class="p">(</span><span class="n">A6</span><span class="p">,</span> <span class="n">LOW</span><span class="p">);</span>
<span class="p">}</span>

<span class="kt">void</span> <span class="nf">loop</span><span class="p">()</span> <span class="p">{</span>
 <span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">currentMillis</span> <span class="o">=</span> <span class="n">millis</span><span class="p">();</span>

 <span class="k">if</span> <span class="p">(</span><span class="n">currentMillis</span> <span class="o">-</span> <span class="n">previousMillis</span> <span class="o">&gt;=</span> <span class="n">interval</span><span class="p">)</span> <span class="p">{</span>
  <span class="n">previousMillis</span> <span class="o">=</span> <span class="n">currentMillis</span><span class="p">;</span> 
  <span class="c1">// blink if teensy has power</span>
  <span class="k">if</span> <span class="p">(</span><span class="n">ledState</span> <span class="o">==</span> <span class="n">LOW</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">ledState</span> <span class="o">=</span> <span class="n">HIGH</span><span class="p">;</span>
  <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
    <span class="n">ledState</span> <span class="o">=</span> <span class="n">LOW</span><span class="p">;</span>
  <span class="p">}</span>
  <span class="n">digitalWrite</span><span class="p">(</span><span class="mi">13</span><span class="p">,</span> <span class="n">ledState</span><span class="p">);</span>
 <span class="p">}</span>
 
 <span class="k">if</span><span class="p">(</span><span class="n">steerTransfer</span><span class="p">.</span><span class="n">available</span><span class="p">())</span> <span class="p">{</span>
  <span class="kt">uint16_t</span> <span class="n">recSize</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
  <span class="n">recSize</span> <span class="o">=</span> <span class="n">steerTransfer</span><span class="p">.</span><span class="n">rxObj</span><span class="p">(</span><span class="n">steerStruct</span><span class="p">,</span> <span class="n">recSize</span><span class="p">);</span>
  <span class="k">if</span> <span class="p">(</span><span class="n">steerStruct</span><span class="p">.</span><span class="n">d</span> <span class="o">==</span> <span class="sc">'R'</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">steerStruct</span><span class="p">.</span><span class="n">x</span> <span class="o">==</span> <span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
      <span class="n">digitalWrite</span><span class="p">(</span><span class="n">A9</span><span class="p">,</span> <span class="n">HIGH</span><span class="p">);</span>
    <span class="p">}</span>
    <span class="k">else</span> <span class="p">{</span>
      <span class="n">digitalWrite</span><span class="p">(</span><span class="n">A9</span><span class="p">,</span> <span class="n">LOW</span><span class="p">);</span>
      <span class="n">digitalWrite</span><span class="p">(</span><span class="n">A8</span><span class="p">,</span> <span class="n">LOW</span><span class="p">);</span>
    <span class="p">}</span>
  <span class="p">}</span>
  <span class="k">else</span> <span class="nf">if</span> <span class="p">(</span><span class="n">steerStruct</span><span class="p">.</span><span class="n">d</span> <span class="o">==</span> <span class="sc">'L'</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">steerStruct</span><span class="p">.</span><span class="n">x</span> <span class="o">==</span> <span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
      <span class="n">digitalWrite</span><span class="p">(</span><span class="n">A8</span><span class="p">,</span> <span class="n">HIGH</span><span class="p">);</span>
    <span class="p">}</span>
    <span class="k">else</span> <span class="p">{</span>
      <span class="n">digitalWrite</span><span class="p">(</span><span class="n">A9</span><span class="p">,</span> <span class="n">LOW</span><span class="p">);</span>
      <span class="n">digitalWrite</span><span class="p">(</span><span class="n">A8</span><span class="p">,</span> <span class="n">LOW</span><span class="p">);</span>
    <span class="p">}</span>
  <span class="p">}</span>
    <span class="k">else</span> <span class="nf">if</span> <span class="p">(</span><span class="n">steerStruct</span><span class="p">.</span><span class="n">d</span> <span class="o">==</span> <span class="sc">'U'</span><span class="p">)</span> <span class="p">{</span>
      <span class="k">if</span> <span class="p">(</span><span class="n">steerStruct</span><span class="p">.</span><span class="n">x</span> <span class="o">==</span> <span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">digitalWrite</span><span class="p">(</span><span class="n">A6</span><span class="p">,</span> <span class="n">HIGH</span><span class="p">);</span>
      <span class="p">}</span>
      <span class="k">else</span> <span class="p">{</span>
        <span class="n">digitalWrite</span><span class="p">(</span><span class="n">A7</span><span class="p">,</span> <span class="n">LOW</span><span class="p">);</span>
        <span class="n">digitalWrite</span><span class="p">(</span><span class="n">A6</span><span class="p">,</span> <span class="n">LOW</span><span class="p">);</span>
      <span class="p">}</span>
    <span class="p">}</span>
    <span class="k">else</span> <span class="nf">if</span> <span class="p">(</span><span class="n">steerStruct</span><span class="p">.</span><span class="n">d</span> <span class="o">==</span> <span class="sc">'D'</span><span class="p">)</span> <span class="p">{</span>
      <span class="k">if</span> <span class="p">(</span><span class="n">steerStruct</span><span class="p">.</span><span class="n">x</span> <span class="o">==</span> <span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">digitalWrite</span><span class="p">(</span><span class="n">A7</span><span class="p">,</span> <span class="n">HIGH</span><span class="p">);</span>
      <span class="p">}</span>
      <span class="k">else</span> <span class="p">{</span>
        <span class="n">digitalWrite</span><span class="p">(</span><span class="n">A7</span><span class="p">,</span> <span class="n">LOW</span><span class="p">);</span>
        <span class="n">digitalWrite</span><span class="p">(</span><span class="n">A6</span><span class="p">,</span> <span class="n">LOW</span><span class="p">);</span>
      <span class="p">}</span>
    <span class="p">}</span>
 <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>If you read the code, you’ll notice the commands I’m sending to the Teensy from Python: <code class="language-plaintext highlighter-rouge">U</code> or <code class="language-plaintext highlighter-rouge">D</code> for up or down, and <code class="language-plaintext highlighter-rouge">L</code> or <code class="language-plaintext highlighter-rouge">R</code> for left or right. To send these custom commands from Python, I used one of my new favorite libraries, <a href="https://github.com/PowerBroker2/SerialTransfer">SerialTransfer</a>.</p>

<p>It allows you to specify a custom command to send, and it just works. Sending my message from Python is literally this easy:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">pySerialTransfer</span> <span class="kn">import</span> <span class="n">pySerialTransfer</span>

<span class="n">link</span> <span class="o">=</span> <span class="n">pySerialTransfer</span><span class="p">.</span><span class="nc">SerialTransfer</span><span class="p">(</span><span class="s">'/dev/ttyACM0'</span><span class="p">)</span>
<span class="n">link</span><span class="p">.</span><span class="nf">open</span><span class="p">()</span>

<span class="c1"># define our data structure (direction and on or off)
</span><span class="k">class</span> <span class="nc">dataStruct</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
    <span class="n">d</span> <span class="o">=</span> <span class="s">'L'</span>
    <span class="n">x</span> <span class="o">=</span> <span class="mi">1</span>

<span class="c1"># send the up command on 
</span><span class="n">testStruct</span> <span class="o">=</span> <span class="n">dataStruct</span>
<span class="n">testStruct</span><span class="p">.</span><span class="n">d</span> <span class="o">=</span> <span class="s">'U'</span>
<span class="n">testStruct</span><span class="p">.</span><span class="n">x</span> <span class="o">=</span> <span class="mi">1</span>
<span class="n">sendSize</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">sendSize</span> <span class="o">=</span> <span class="n">link</span><span class="p">.</span><span class="nf">tx_obj</span><span class="p">(</span><span class="n">testStruct</span><span class="p">.</span><span class="n">d</span><span class="p">,</span> <span class="n">start_pos</span><span class="o">=</span><span class="n">sendSize</span><span class="p">)</span>
<span class="n">sendSize</span> <span class="o">=</span> <span class="n">link</span><span class="p">.</span><span class="nf">tx_obj</span><span class="p">(</span><span class="n">testStruct</span><span class="p">.</span><span class="n">x</span><span class="p">,</span> <span class="n">start_pos</span><span class="o">=</span><span class="n">sendSize</span><span class="p">)</span>
<span class="n">link</span><span class="p">.</span><span class="nf">send</span><span class="p">(</span><span class="n">sendSize</span><span class="p">)</span>
</code></pre></div></div>

<p>On the bench, this setup works great. The ramp platform is super responsive, with about as good a performance as I could ask for. But in practice, once I throw a ramp on top of the platform, there’s still so much to do.</p>

<h2 id="debugging-a-remote-control-ramp">Debugging a Remote Control Ramp</h2>

<figure class="full">
<img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/92b67013-8d47-46b6-2bca-30b73e165c00/webscale" />
</figure>

<p>Once a ramp is placed on the robot, the platform itself becomes largely inaccessible. This makes debugging especially tricky, given the wireless setup. Sometimes there seems to be a delay in the Bluetooth link controlling the robot itself. As of right now, the platform is controlled with a PS4 controller over a Bluetooth connection to the Jetson Nano.</p>

<p>This ended up being a challenge on my maiden voyage with the new platform. As I drove the ramp into place, I pressed the button on the PS4 controller to lower the ramp. Nothing happened, so I pressed it again. Still nothing. Curious, I started walking towards the ramp, only to have it start lowering, and then keep lowering itself until the jack ripped its threaded rod out entirely and broke.</p>

<figure class="full">
<img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/ef40fed2-c347-44c0-f35e-12cc48bf3600/webscale" />
</figure>

<p>Disaster! This meant I had to completely refabricate the jack and mount, which really is the most work of any component on the ramp. This time, I added limit switches to my Arduino code. (But if you look closely, they still don’t work!)</p>

<p>Choosing a PS4 controller means I have a limited communication range with the platform, especially when the ramp is on it. One of my next moves will be to put a proper RC radio receiver on it, paired with a real transmitter.</p>

<p>Initially, I had my Jetson connect via Wifi to my home network. I’ve since added <a href="https://tailscale.com/">Tailscale</a> to the Jetson Nano, and made it a permanent machine, which allows me to access it remotely from any network. I then added a long-lived ephemeral key to <a href="https://www.gitpod.io/">Gitpod</a> as a secret. With this, I’m able to remotely log in and debug from a Wifi hotspot on my iPhone, so I can take longer runs with my ramp, away from home, as long as there’s cellular reception. (And my cell phone is around.)</p>

<figure class="full">
<img src="https://imagedelivery.net/_KKQ2p8Uk9OruvuF07KWqw/01dc0867-e08b-4893-ab55-8f37a46cb600/webscale" />
</figure>

<h2 id="a-second-successful-voyage">A Second, Successful Voyage</h2>

<p><div style="width:100%;height:0;padding-bottom:56%;position:relative;"><iframe src="https://giphy.com/embed/SERoiwqA2GJxOLIRgp" width="100%" height="100%" style="position:absolute" frameborder="0" class="giphy-embed" allowfullscreen=""></iframe></div></p>

<p>Finally, I rebuilt my ramp jack, and wired up limit switches to prevent disaster. And last Friday, I was finally able to raise, drive, lower, and skate my ramp! There’s still a lot to be done before we get to self-driving, but hey, let’s celebrate this milestone.</p>

<p>As always, the repository for the project is on <a href="https://github.com/burningion/self-driving-skate-ramp">Github</a>, and has some more technical details. You should be able to mostly run with the project, as the code itself is pretty straightforward for now, and launchable within <a href="https://gitpod.io/#https://github.com/burningion/self-driving-skate-ramp">Gitpod</a>. (You obviously won’t be able to connect to my Tailscale network.)</p>

<p>Just in case, the steps to get it running are: connect power for the Jetson and linear actuators, connect the PS4 controller via Bluetooth, connect the 10s battery power to the ESCs, and then run the <code class="language-plaintext highlighter-rouge">loop_with_canbus.py</code> Python script. Then pressing <code class="language-plaintext highlighter-rouge">R1</code> on the PS4 disables the safety switch, and allows you to steer / control the platform.</p>
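
<p>For the curious, the controller code is built on what looks like the <a href="https://github.com/ArturSpirin/pyPS4Controller">pyPS4Controller</a> library (the <code class="language-plaintext highlighter-rouge">RampController(interface="/dev/input/js0", ...)</code> call earlier gives it away). Here’s a minimal sketch of what that safety gating could look like; the R2 throttle mapping and the 5000 RPM ceiling are my assumptions for illustration, not the repo’s exact code:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from queue import Queue

from pyPS4Controller.controller import Controller

queue = Queue()  # shared with the heartbeat thread shown earlier

class RampController(Controller):
    """Gates motor commands behind an R1 safety toggle."""

    def __init__(self, **kwargs):
        Controller.__init__(self, **kwargs)
        self.armed = False  # motors ignore input until R1 arms them

    def on_R1_press(self):
        self.armed = not self.armed  # toggle the safety switch

    def on_R2_press(self, value):
        # R2 reports roughly -32767 (released) to 32767 (fully pressed);
        # scale it to a target RPM for the heartbeat thread to send
        if self.armed:
            queue.put(int((value + 32767) / 65534 * 5000))

    def on_R2_release(self):
        queue.put(0)  # always stop when the trigger is released
</code></pre></div></div>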

<p>If you’d like to be notified of the next blog post, enter your info below:</p>

<!-- Begin MailChimp Signup Form -->
<link href="//cdn-images.mailchimp.com/embedcode/horizontal-slim-10_7.css" rel="stylesheet" type="text/css" />

<style type="text/css">
 #mc_embed_signup{background:#fff; clear:left; font:14px Helvetica,Arial,sans-serif; width:100%;}
 /* Add your own MailChimp form style overrides in your site stylesheet or in this style block.
	  We recommend moving this block and the preceding CSS link to the HEAD of your HTML file. */
</style>

<div id="mc_embed_signup">
    <form action="https://buddhamindapp.us6.list-manage.com/subscribe/post?u=3feb377469b9e8fab8d52bd3f&amp;id=fb4cd887a4" method="post" id="mc-embedded-subscribe-form" name="mc-embedded-subscribe-form" class="validate" target="_blank" novalidate="">
        <div id="mc_embed_signup_scroll">
	          <label for="mce-EMAIL">Enter Your Email to Receive More Posts Like This</label>
	          <input type="email" value="" name="EMAIL" class="email" id="mce-EMAIL" placeholder="email address" required="" />
            <!-- real people should not fill this in and expect good things - do not remove this or risk form bot signups-->
            <div style="position: absolute; left: -5000px;" aria-hidden="true"><input type="text" name="b_3feb377469b9e8fab8d52bd3f_fb4cd887a4" tabindex="-1" value="" /></div>
            <div class="clear"><input type="submit" value="Subscribe" name="subscribe" id="mc-embedded-subscribe" class="button" /></div>
        </div>
    </form>
</div>

<!--End mc_embed_signup-->]]></content><author><name>Kirk Kaiser</name></author><summary type="html"><![CDATA[Creating a platform for a self-driving skatepark]]></summary></entry><entry><title type="html">Tools Matter More than You Think</title><link href="/blog/tools-matter-more-than-you-think/" rel="alternate" type="text/html" title="Tools Matter More than You Think" /><published>2022-09-10T00:00:00+00:00</published><updated>2022-09-10T00:00:00+00:00</updated><id>/blog/tools-matter-more-than-you-think</id><content type="html" xml:base="/blog/tools-matter-more-than-you-think/"><![CDATA[<h2 id="tools-dont-matter">Tools Don’t Matter</h2>

<p><strong>Guitarists have this idea: “tone” is in the fingertips.</strong></p>

<p>The argument is, the distinctive sound of your favorite musician isn’t because of some expensive guitar, fancy pedals, or boutique amplifier.</p>

<p>Instead, their talent comes from years of practice, and no amount of expensive equipment is a shortcut to sounding great. So for new practitioners, it’s best to focus on your competency and technique, and forget about the tools.</p>

<p>Regardless of the tools you use, you’ll become great with practice, and with mastery, that greatness will carry over to whatever tools you pick up.</p>

<h2 id="but-tools-can-improve-the-quality-of-practice">But Tools Can Improve the Quality of Practice</h2>

<p>Believing this to be the case, I spent the first 7 years of playing electric guitar with a decent, but not great amplifier.</p>

<p>Why invest in an expensive amplifier and guitar when your skills don’t justify it?</p>

<p>But then the pandemic hit, and I got bored. Many of the “reasonable” guitars I wanted were sold out, and weren’t expected to be in stock any time soon. So I started looking at more expensive guitars and amplifiers, as they were the only ones left.</p>

<p>Out of desperation I bought a substantially more expensive setup than I ever dreamed of: an <a href="https://www.sweetwater.com/store/detail/EIIHORQMFRRDB--esp-e-ii-horizon-fr-reindeer-blue">ESP E-II Horizon FR</a> and a <a href="https://www.sweetwater.com/store/detail/Mark535C--mesa-boogie-mark-five35-35-25-10-watt-1x12-inch-tube-combo-amp-black-taurus">Mesa Boogie Mark V:35</a>. When I plugged in and started playing, my mind was blown.</p>

<p>Why didn’t anyone tell me the difference was so big?!</p>

<p>I could now sound just like the musicians I grew up with, and my practice routine went from being a grind to a pleasurable experience. Thanks to my improved tools, I also started picking up on subtle sloppiness in my playing.</p>

<p>I literally bought myself a better practice routine, and a substantially better sound. Tone may be in the fingertips, but it helps to be able to hear what’s happening in those fingertips!</p>

<h2 id="the-best-says-he-cant-work-without-his-tools">The Best Says He Can’t Work without His Tools</h2>

<p>Given this personal discovery, I was again surprised while watching Tom Morello’s Masterclass on guitar playing. For those of you who aren’t familiar, Tom is widely considered to be one of the greatest guitar players of all time.</p>

<p>In his Masterclass session, Tom recounts a time when he and the rest of Rage Against the Machine went to South America, separated from their specific instruments, pedals, and amplifiers. They went to practice on rented instruments, and by his account, they sounded terrible.</p>

<p>If you’re one of the best guitarists in the world, and tools don’t matter, how could a guitar and amplifier make you sound terrible?</p>

<h2 id="choosing-tools-as-creative-constraints">Choosing Tools as Creative Constraints</h2>

<p>The answer in Tom’s case is that he has deliberately chosen his tools as his constraints. Tom’s sounds are unique to his tools, built through their limitations. <strong>Those constraints are what have determined his creative boundaries and possibilities.</strong></p>

<p>Tom made a bold choice early on, deciding not to fiddle with settings on his amplifier, chasing the “perfect” tone.</p>

<p>Instead, he’s got a few specific guitar pedals, and importantly, a killswitch (which lets him mute with a press) built into his guitars that he’s decided as the constraints of what will shape his sound.</p>

<p>Early on, he decided to stop bouncing back and forth between tweaking and optimizing tools, and instead focused on what was possible using a deliberate, static set of tools. He dialed in a specific setting for his amplifier, and left it at that.</p>

<p>By minimizing his toolset, he <strong>determined the boundaries he’d explore within his creative world.</strong></p>

<h2 id="tools-shape-creative-possibilities">Tools Shape Creative Possibilities</h2>

<p>I’ve since become increasingly convinced that a few deliberately selected tools as constraints, pursued to mastery, is what pushes people into creative breakthroughs and outlier performance.</p>

<p>And again, developers specifically have wildly different, deeply held beliefs about all of their tools.</p>

<p>These beliefs about tools are often in conflict with other, competent developers. A given tool is seen as either terrible or great, depending on the person.</p>

<p>With Python, I feel as though <strong>I’m supposed to explore my way to a solution</strong>, rather than understand an entire problem space on the first go. I feel as though I can stumble along, and discover what my data structures should be as I write.</p>

<p>Python allows me to create a rough sketch for what my program should do, and iterate my way to an understanding of the solution.</p>

<p>But importantly, very little of the code I’m writing nowadays lives in any kind of production systems with tight requirements for performance or collaboration.</p>

<p>Someone who is working in a different problem space, with different constraints might find my tools quite terrible, or even impossible for the job.</p>

<h2 id="tool-switching-has-a-deceptively-high-cost-and-were-doing-it-all-the-time">Tool Switching Has a Deceptively High Cost (and we’re doing it all the time)</h2>

<p>What has only recently become apparent to me is how often we developers are expected to substantially change our tools and processes. Tom Morello’s thousands of hours with his limited tool set empowered him to create his own sound of excellence within his chosen tools.</p>

<p>It seems every new engineering role comes with a new build process, release methods, and methods of working.</p>

<p>And in those jobs we’re usually working within legacy systems, attempting to navigate our way around an old set of constraints, while introducing new constraints closer to the ways we think we’ll be most effective.</p>

<p>The way people are gated for these roles is usually via a coding challenge, in an interactive notebook, with minimal context. (We’re expected to solve a problem with an algorithm we’ve got memorized on the spot.)</p>

<p>Of course, the distance to the role itself has been a topic of debate, and the transferable nature of the work doubly so.</p>

<p>But what of the tools of the role? What about that specific knowledge?</p>

<h2 id="the-musician-and-the-music">The Musician and the Music</h2>

<p>There is a difference between being able to play a piece of music, and being able to write one.</p>

<p>Some people achieve technical proficiency at playing music, without the context for writing their own. They master the restatement, but none of the creation.</p>

<p>What is the role we’re optimizing for? Is it obedience and adherence to the plan? Or is it something else?</p>

<p>Ideally, a person is capable of building the music to be played. Creating new music, writing new things, is the role itself. And the way you build up context for building new things is by building things.</p>

<p>And in an interview, we should be able to “jam” just like in a musical try out for a band. We’re seeing whether or not we can make music together.</p>

<h2 id="were-imprecise-when-we-talk-about-and-evaluate-tools">We’re Imprecise When We Talk About and Evaluate Tools</h2>

<p>In my current role, I often get the chance to see people experiment with new tech stacks.</p>

<p>And without fail, the people trying out new tech stacks underestimate the secondary knowledge costs of the change. Beyond the programming language itself, there’s the secondary knowledge: how the new language’s package manager works, which libraries are best, which build and release processes are most effective, and how to debug. It all shows up at once, and ends up destroying productivity.</p>

<p>And when we start paying attention to tools as constraints, it becomes apparent that each software development role we get into truly requires discovering its own set of tools and approaches.</p>

<p><strong>Your tools should narrow the fences of possibility</strong>, so you can focus on the immediate problem at hand.</p>

<p>Hypothetically, individual tools are fungible, and with enough talent and time, all of them are interchangeable. You could write great software in Python and Javascript and Rust and everything else. You could spend the time to relearn the intricacies of how Python’s pip, Node’s NPM, and Rust’s Cargo all behave, how their debuggers work best for your workflow, and everything else.</p>

<p>But to truly break out creatively, you need to close doors aggressively. You need to cut until there is only the right amount of context, and nothing else.</p>

<p>At the very least, be deliberate about the tools you decide to pursue excellence in.</p>

<!-- Begin MailChimp Signup Form -->
<link href="//cdn-images.mailchimp.com/embedcode/horizontal-slim-10_7.css" rel="stylesheet" type="text/css" />

<style type="text/css">
 #mc_embed_signup{background:#fff; clear:left; font:14px Helvetica,Arial,sans-serif; width:100%;}
 /* Add your own MailChimp form style overrides in your site stylesheet or in this style block.
	  We recommend moving this block and the preceding CSS link to the HEAD of your HTML file. */
</style>

<div id="mc_embed_signup">
    <form action="https://buddhamindapp.us6.list-manage.com/subscribe/post?u=3feb377469b9e8fab8d52bd3f&amp;id=fb4cd887a4" method="post" id="mc-embedded-subscribe-form" name="mc-embedded-subscribe-form" class="validate" target="_blank" novalidate="">
        <div id="mc_embed_signup_scroll">
	          <label for="mce-EMAIL">Enter Your Email to Receive More Posts Like This</label>
	          <input type="email" value="" name="EMAIL" class="email" id="mce-EMAIL" placeholder="email address" required="" />
            <!-- real people should not fill this in and expect good things - do not remove this or risk form bot signups-->
            <div style="position: absolute; left: -5000px;" aria-hidden="true"><input type="text" name="b_3feb377469b9e8fab8d52bd3f_fb4cd887a4" tabindex="-1" value="" /></div>
            <div class="clear"><input type="submit" value="Subscribe" name="subscribe" id="mc-embedded-subscribe" class="button" /></div>
        </div>
    </form>
</div>

<!--End mc_embed_signup-->]]></content><author><name>Kirk Kaiser</name></author><summary type="html"><![CDATA[Tools determine what's possible]]></summary></entry><entry><title type="html">Is Engineering Management Bullshit?</title><link href="/blog/is-engineering-management-bullshit/" rel="alternate" type="text/html" title="Is Engineering Management Bullshit?" /><published>2022-05-30T00:00:00+00:00</published><updated>2022-05-30T00:00:00+00:00</updated><id>/blog/is-engineering-management-bullshit</id><content type="html" xml:base="/blog/is-engineering-management-bullshit/"><![CDATA[<h2 id="what-is-management-supposed-to-do-anyways">What is “Management” Supposed to Do Anyways?</h2>

<p>Modern management largely has its genesis in the work of <a href="https://en.wikipedia.org/wiki/Frederick_Winslow_Taylor">Frederick Winslow Taylor</a>, who created the idea of Scientific Management.</p>

<p>His main insight was that industrial labor wasn’t being nearly as rigorously designed as the work being done by manufacturing machines.</p>

<p>In his mind, all human labor should be deliberately measured, studied, and orchestrated by expert observers.</p>

<p>The goal of these observers should be to <strong>optimize the overall system of human labor for higher quality and volume output, with lower labor costs</strong>. These newly designed, higher output systems should then justify higher wages for fewer, vastly more effective laborers than non-designed labor. (He recommended shooting for 60% more pay for these deliberately orchestrated workers than the industry average).</p>

<p>His main tool for implementing these designs was accurate time study, watching and observing the most effective people as they went about their day. In his book <a href="https://books.google.com/books?id=Am4I-N4XN2QC">Shop Management</a>, he shows a case study giving a <strong>70% increase in wages to the workers</strong>, and by a partial re-design of their work, having each worker moving <strong>3.5x the regular worker’s tonnage</strong> of work through the factory.</p>
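
<p>The arithmetic behind that case study is worth a quick back-of-the-envelope check (a sketch using the book’s rounded figures):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Taylor's Shop Management case study, roughly
wage_multiplier = 1.70    # workers paid ~70% more
output_multiplier = 3.5   # each worker moves ~3.5x the tonnage

labor_cost_per_ton = wage_multiplier / output_multiplier
print(f"{labor_cost_per_ton:.0%}")  # ~49%: labor cost per ton roughly halves
</code></pre></div></div>

<p>On paper everyone wins: the worker takes home 70% more, while the owner’s labor cost per unit of output drops by about half.</p>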

<p>So, if we look at modern management, the genesis of the work of the manager is the <em>design</em> of the work to be done. Management focused on the choreography of work, so work <em>could more effectively be accomplished</em>.</p>

<h2 id="the-problem-with-applying-scientific-management-to-software-we-dont-know-how-to-quantifiably-measure-output">The Problem with Applying Scientific Management to Software: We Don’t Know how to Quantifiably Measure Output</h2>

<p>But there is a major problem with this model in software development: we are not moving a measurable amount of coal from one part of the factory to another.</p>

<p>An experienced software developer anticipates the secondary effects of every change, and weighs tradeoffs between a multitude of approaches in solving problems. By continuously modeling software systems, the organization they work for, and customer behavior, the most effective developers readjust their approach to problems daily.</p>

<p>And that daily work tends to be unique and vague, with a large search space for solutions. This means it’s difficult to observe and optimize the system of software development itself. <a href="https://dl.acm.org/doi/pdf/10.1145/3454122.3454124">Despite</a> years of effort, and individual software companies worth trillions of dollars, we’ve yet to formalize a way of measuring how effective a developer is or isn’t. <strong>So we can’t measure our way to higher team performance</strong>.</p>

<p>But if we can’t use an adaptation of scientific management for improving the effectiveness of software creation, what can we use? How do we optimize the system of work for something <em>like</em> high wages and low costs, when we can’t even determine whether the work is good or not, done reasonably quickly, or not?</p>

<h2 id="its-sometimes-worse-than-not-being-able-to-measure-output">It’s (Sometimes) Worse than Not Being Able to Measure Output</h2>

<p>Of course, one of the other fundamental assumptions of scientific management is that minimizing costs and maximizing profits and wages is our goal. This assumes that we work in a business which needs profitability to survive, and that imperative must be held higher than any other value.</p>

<p>In practice, this isn’t the case for many venture funded technology companies.</p>

<p>Instead, depending on the maturity of the company, the main goal may be: capturing market share, gaining users, increasing revenue, or absurdly, building headcount. All of these approaches smell of unsustainability, but importantly, <em>can become the actual goal of management</em>. Stranger still, some of the people who make it into management at companies losing billions per year later position themselves as thought leaders for effective engineering management.</p>

<p>So if profitability sometimes doesn’t matter, and we can’t measure engineers, how does someone in management build a framework for effective engineering management? And what the hell is that management doing all day anyways?</p>

<h2 id="managers-should-gather-and-push-context-to-the-edges">Managers Should Gather and Push Context to the Edges</h2>

<p>Because measurement for effective creative work is such an obvious, big problem, <em>somebody</em> has thought about it before. In 1967, <a href="https://en.wikipedia.org/wiki/Peter_Drucker">Peter Drucker</a> created the idea of “knowledge work”, and more importantly, the “knowledge worker” in <a href="https://amzn.to/3ORjF1R">The Effective Executive</a>. In it, he redefined the role of the worker, and the supervisor’s relation to work.</p>

<p><strong>Even in 1967, humans had begun working on technologies where it just wasn’t possible for the people directing and managing the work to have enough context to orchestrate its entire implementation.</strong> Looking ahead, the complexity inherent in modern engineering projects was only going to grow.</p>

<p>The solution, he decided, is that knowledge workers had to become <em>their own</em> managers (in the book referred to as executives), <em>both</em> designing the work to be done, while thinking through the tradeoffs and minding the overall business context.</p>

<p>Critically, he defined what the goal of effective knowledge work <em>was</em> in this world devoid of absolutes:</p>

<blockquote>
  <p>Knowledge work is not defined by quantity. Neither is it defined by its costs. It is defined by its results.</p>
</blockquote>

<p>Results. That’s an odd and fuzzy definition for an output for a manager to optimize for. But it seems closer to the reality both engineering managers and their subordinates find themselves in. Leadership expects management to deliver “results”, but the definition of “results” changes constantly.</p>

<p>It is then management’s job to ensure there is enough context for everyone on the team to be able to pursue the correct results. They do this by gathering, and then distributing context for the team. This ideally empowers the team to lead themselves, directing their efforts to the most effective outputs, given the current situation at hand. (For a good example of this in practice, check out <a href="https://amzn.to/3wMbygE">Turn the Ship Around</a>)</p>

<p>But how do managers identify the right “results”? How do they know what to build, given the multitude of options of things to build? And after things are built, how do we evaluate how effectively (or not!) those results were delivered?</p>

<h2 id="help-the-team-identify-and-pursue-objectives-and-key-results">Help the Team Identify and Pursue “Objectives and Key Results”</h2>

<p>Depending on where you work, many tech organizations are steered by quarterly objectives. These are known as OKRs, and are a method for communicating what quantifiable results you are going to chase after, and achieve as a group for the next three months. They’re also famously a part of <a href="https://amzn.to/3azxe7P">High Output Management</a>, often referred to as one of the best modern engineering management books, written by the ex-CEO of Intel.</p>

<p>OKRs are meant to be ambitious, and by default are meant to have a significant portion of them fail every quarter. They are a medium for continuously identifying new <em>results</em> for your business unit to achieve. Part of leadership’s job is to evaluate these and ensure the new results being pursued will be effective for the current strategy of the company. But really, the <em>process</em> of thinking through what would be the most effective result for your group is the point of the exercise.</p>

<p>At the end of the quarter, results are “graded”, and the manager (and team) are evaluated on how well they accomplished the results they set out to do. There is an unwritten, somewhat arbitrary rule that an effective team should be making <strong>70% of their objectives</strong>. (This hypothetically helps ensure the team is being ambitious enough in their goal setting.)</p>

<p>So we’ve seen the role of a software engineering manager is to build context, push context to the edges, and create objectives for the team. Finally, the team is kept in line by being held to account for what was promised in the agreed upon objectives.</p>

<p>But what about that failure rate for OKRs? In practice, since the graded OKRs are measurable (you hit a certain percentage of your goals), we’ve <strong>transformed the unmeasurable productivity of a developer into a measurable thing</strong>. You told us you’d accomplish something, and you either hit, or didn’t hit the things you’d promised. The grade you get from your OKRs gives a definite number for your team, in a space that is fundamentally not measurable.</p>

<p>Is this good? Does this lead to increased effectiveness?</p>

<h2 id="but-whats-missing-from-the-objectives-and-key-results-loop">But What’s Missing from the Objectives and Key Results Loop?</h2>

<p>Taylor focused on more effective physical labor, with the goal of lower overall cost, whereas Drucker focused on more effective knowledge work, with the goal of more effective <em>results</em>.</p>

<p>But without an overall strategy, those results don’t matter.</p>

<p>For the overall organization, the user growth must lead to another round of investment, or the new features shipped must reduce user churn, or the latest marketing copy must lower the cost of acquisition. So even with perfectly delivered results, there is no guarantee of <em>effectiveness for the overall organization</em>.</p>

<p>This brings up another few problems:</p>

<p>As an organization grows, some of the results we deliver might be the wrong ones for the <em>effectiveness of the overall organization</em>. They may also not get delivered for reasons outside of our control: maybe we needed to coordinate with another team who didn’t have bandwidth, or maybe a critical team member quit.</p>

<p>Worse still, if <strong>two teams must coordinate</strong> on a goal, and each team averages 70% goal accomplishment, we should expect only a <strong>49% chance of success</strong> (0.7 × 0.7 = 0.49). Increase the number of teams who must coordinate, and you rapidly diminish the chances of the goal ever being hit. And as an organization grows, the number of teams coordinating tends to grow too.</p>
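<p>To see how quickly the coordination penalty compounds, here’s a toy model. It assumes each team delivers its share of the work 70% of the time, and that the teams succeed or fail independently (the independence is my simplification):</p>

<pre><code class="language-python"># Toy model: a shared goal lands only if every coordinating team
# delivers its piece, each with an independent 70% hit rate.
def chance_of_shared_goal(teams: int, hit_rate: float = 0.7) -> float:
    """Probability that all coordinating teams deliver."""
    return hit_rate ** teams

for teams in range(1, 6):
    print(f"{teams} team(s): {chance_of_shared_goal(teams):.0%}")

# 1 team(s): 70%
# 2 team(s): 49%
# 3 team(s): 34%
# 4 team(s): 24%
# 5 team(s): 17%
</code></pre>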

<p>Again, even if we do get past the coordination costs with the right goal, the business environment may change, rendering our results useless. Luckily, in a large enough organization, the broader business’s general effectiveness can help smooth over a missed quarter.</p>

<h2 id="the-arena-of-bullshit">The Arena of Bullshit</h2>

<p>We can do all the right things well, and still be wrong in creative work. <strong>The uncertainty of success is part of doing truly creative work.</strong></p>

<p>There are, of course, ways to minimize the risk of doing the wrong things. But what happens when your team delivers the “wrong” results two quarters in a row? Three? Your focus becomes the narrative of the work: why externalities are conspiring against you, or the resources you need that you don’t have.</p>

<p>Because, as we’ve seen, there is no way of measuring creative output well and consistently. So <strong>crafting narratives about why things worked, or what went wrong despite short-term results, is <em>really</em> the only thing that matters when everyone else is executing well</strong>. Or if your organization has built itself a monopoly position. Or if you have access to a ton of cheap capital and losses don’t matter. Or…</p>

<p>Consider two teams, one responsible for delivering a critical component for the company. Midway through the quarter, it becomes apparent they need the support of an adjacent team, and none of that work was captured in the adjacent team’s OKRs.</p>

<p>Should that adjacent team help? Should they drop their committed results for the health of the overall organization? Or should they protect their department and keep the undelivered results contained in the other team? Should that other team have planned better?</p>

<h2 id="the-effective-rent-seekers">The “Effective” Rent Seekers</h2>

<p>My favorite book about the <em>rationality</em> of “ineffective” management is <a href="https://amzn.to/3a8V6yG">Barbarians at the Gate</a>. It’s the story of how <a href="https://en.wikipedia.org/wiki/F._Ross_Johnson">F. Ross Johnson</a> became CEO of RJR Nabisco through the merger of Nabisco with R.J. Reynolds, and then proceeded to massively enrich himself and his friends at the expense of the company itself.</p>

<p>The book goes through the details of how, after failing to deliver results via the stock price, he decided to take the company private through a <a href="https://en.wikipedia.org/wiki/Leveraged_buyout">leveraged buyout</a>.</p>

<p>His original deal would have netted him over <a href="https://money.cnn.com/magazines/fortune/fortune_archive/1989/04/24/71880/index.htm">$100 million</a> for the transaction. (Potentially worth over a billion if revenue targets were hit in the following years.) By the end of the book, it’s apparent <strong>he attempted to extract as much as possible for himself, his friends, and the bankers involved in the deal</strong>. His flagrant greed led the board to pursue competing deals to sell the company, eventually driving the purchase price far above his original offer. After the board’s efforts, Johnson walked away with a mere $23 million instead of his potential billion.</p>

<p>For those of us who expect to contribute a net positive with our labor, the story of Barbarians at the Gate flips the model on its head.</p>

<p>Seeing how pure self-interest, to the detriment of the whole, can work in practice was an eye-opener for me. How do you effectively defend against hiring these kinds of people?</p>

<p>Larger organizations tend to accumulate smaller-scale versions of these people. Rather than focus on delivering results (which, again, is difficult(!), might be wrong, and often isn’t easily recognized by leadership), they spend their time making alliances and crafting narratives about outcomes instead of delivering. <strong>Title acquisition becomes the goal: maximizing the expected organizational rewards at the expense of direct results, all while working to minimize accountability.</strong></p>

<p>Importantly for <em>all</em> managers, this game of bullshit management is a <em>game to be played and defended against</em>, regardless of your overall strategy and values.</p>

<p><strong>You may genuinely care about delivering results, and they may be the right ones, but without a compelling narrative and web of support, your influence and overall effectiveness are stripped from you.</strong> Eventually, without your fellow teams continuing to deliver results, the larger organization suffers, and worst case, eventually dies.</p>

<h2 id="so-is-engineering-management-bullshit">So, Is Engineering Management Bullshit?</h2>

<p>We’ve seen that portions of engineering management’s responsibilities are unquestionably bullshit. In organizations of a large enough size, the “ineffective” management persona occupies a certain percentage of the organization. Their jockeying for political capital and stolen results inflicts an increasing cost on the “effective” managers, and mitigating their impact becomes operational overhead. Without leadership holding management accountable, the problem only grows.</p>

<p>So if you’ve wondered why so much of management seems to be occupied by <a href="https://www.ribbonfarm.com/2009/10/07/the-gervais-principle-or-the-office-according-to-the-office/">sociopaths</a>, it’s because of this very pattern. <strong>The minority of people who deliver results are drowned out by the people focusing on controlling the narratives. And for those who stick around, the operational overhead of the ineffective management only grows.</strong></p>

<p>But there is a flip side. If we allow the sociopaths to capture all the organizational authority and power, there’s nobody else left. We’re stuck in dysfunctional organizations everywhere, with no ability to effect change, other than dropping out of society.</p>

<p>Increasingly, the organizations themselves become rent seeking, rather than value producing.</p>

<h2 id="keeping-bad-leadership-at-bay">Keeping Bad Leadership at Bay</h2>

<p>It feels as though bullshit leadership continues to extract growing damages from society while dodging any real consequences for said “bad” behavior. So how do we minimize its impact, and effect change in a way that improves the world we live in?</p>

<p><strong>As consumers, we can’t see the costs of good or bad management in the price of our products.</strong> We can’t look at two different boxes of cereal and decide to pay an extra dollar for the one with management that doesn’t blame and inflict psychological violence on their employees.</p>

<p>As someone who cares (you do, right?!), it’s your job to run towards accountability and responsibility. It’s also your duty to be aware of the methods of sociopaths, along with their strategies for influence and blame.</p>

<p>Minimizing your personal costs while maximizing your personal effectiveness grants you optionality. Being the person who does the work this time gives you the ability to do it better next time. Eventually, you end up a key person, capable of choosing <em>where</em> to invest your specific talents.</p>

<p>Be deliberate about these investments, and they will have a very real impact.</p>

<h2 id="so-what-is-engineering-managments-job">So what is Engineering Managment’s Job?</h2>

<p>We’ve touched upon a vague set of goals, but haven’t settled on what an effective manager’s daily job really is.</p>

<p>Going back to Drucker, a <strong>manager’s job is to prepare people to perform, and to give them freedom to do so</strong>.</p>

<p>Creative people need psychological safety in order to perform. An effective manager provides this, along with coaching to help reports become more effective versions of themselves. This should then allow them to reap long-term rewards.</p>

<p>An effective manager builds trust and space for their reports to execute, along with guidance for where their efforts will have the largest impact, helping them to continue growing in their career.</p>

<p>An effective manager cares for their team, their work, and its effect beyond the immediate organization. They are stewards first, realizing that much is beyond their control.</p>

<p>The ones who care are just trying to keep the bullshit at bay.</p>

]]></content><author><name>Kirk Kaiser</name></author><summary type="html"><![CDATA[Gather all of the authority, none of the accountability]]></summary></entry></feed>