Power-aware scheduler for AI clusters on variable energy, turning stranded megawatts into sellable GPU capacity.
AI infrastructure builders increasingly have megawatts that are intermittent, remote, or operationally unusual, but still sell compute as if power were flat and guaranteed. Generic schedulers optimize GPUs, not the changing power envelope behind them, so operators either overprovision, idle expensive hardware, or miss customer SLAs.
Why now
- Power availability is now explicitly identified as the bottleneck for larger AI models and new data centers.
- Offshore data-center infrastructure is moving from thought experiment to funded deployment, creating a new class of operators with novel scheduling needs.
- Renewable and ocean-energy companies are now designing directly for AI compute demand instead of only selling electricity.
- Lead investors are financing alternative AI-power architectures, which increases the number of sites that need software to commercialize variable energy.
Catalyst. Panthalassa's $140 million funding round shows that AI operators are actively moving compute toward nontraditional power sources because power availability has become the limiting factor.
The idea
The product sits above Slurm, Kubernetes, or proprietary cluster managers and turns real-time power availability into sellable compute tiers. It forecasts how many GPU-hours a site can safely promise, separates premium workloads from flex workloads, and reprices queue slots as energy conditions change. For operators, it converts stranded or intermittent power into a new revenue line without risking core SLAs. For customers, it exposes cheaper flex capacity with transparent completion windows and energy provenance. Over time, the company can build the benchmark dataset for how variable-power sites actually translate megawatts into AI throughput.
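To make the tiering idea concrete, here is a minimal sketch of how an hourly power forecast could be split into premium and flex GPU-hours. The per-GPU power draw, the firm power floor, and the sample forecast are all hypothetical values chosen for illustration; a real site would derive them from its own telemetry.

```python
# Illustrative sketch only: all numbers below are assumptions, not site data.
WATTS_PER_GPU = 1500  # assumed all-in draw per GPU, including cooling overhead


def gpus_supported(power_kw: int, watts_per_gpu: int = WATTS_PER_GPU) -> int:
    """Number of GPUs that a given power level (in kW) can run."""
    return power_kw * 1000 // watts_per_gpu


def split_tiers(hourly_kw: list[int], firm_floor_kw: int) -> dict[str, int]:
    """Split forecast capacity into premium and flex GPU-hours.

    Premium is sized to a power floor the operator is willing to contract on;
    flex is whatever the forecast offers above that floor in each hour.
    """
    premium = 0
    flex = 0
    for kw in hourly_kw:
        firm_kw = min(kw, firm_floor_kw)
        premium += gpus_supported(firm_kw)
        flex += gpus_supported(kw) - gpus_supported(firm_kw)
    return {"premium_gpu_hours": premium, "flex_gpu_hours": flex}


# Hypothetical four-hour forecast with a 6 MW firm floor.
tiers = split_tiers([6000, 9000, 12000, 4500], firm_floor_kw=6000)
print(tiers)  # {'premium_gpu_hours': 15000, 'flex_gpu_hours': 6000}
```

In this toy forecast the last hour dips below the floor, so premium capacity falls short of the 4,000 GPUs contracted for that hour. That is the kind of shortfall a conservative floor, or storage, would need to absorb.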
What's different. Existing cluster schedulers assume power is a static constraint, while utility software assumes compute demand is someone else's problem. This company is built around the translation layer between volatile megawatts and contractual GPU products. The moat comes from site-level data on how power variability, queue mix, and hardware behavior interact, which improves forecasting, pricing, and SLA design over time. That dataset becomes valuable not just to operators but to financiers, insurers, and customers evaluating nontraditional AI capacity.
| Beachhead | Independent GPU cloud operators commissioning first clusters on offshore, behind-the-meter renewable, or battery-backed power and initially selling batch training and offline inference jobs with flexible completion windows |
|---|---|
| Wedge | A power-aware scheduling and commercialization layer that ingests live power envelopes, forecasts available GPU-hours, and automatically admits, prices, and routes only the jobs that fit that energy profile |
| Non-obvious insight | The scarce resource in AI is no longer just GPUs but bankable power predictability. As offshore, battery-backed, and behind-the-meter sites become viable, the winning control layer will convert variable megawatts into contractual GPU products rather than assume grid-like stability. |
| Venture-scale path | Start as the control plane for flexible AI workloads on variable-power clusters, then expand into standard scheduling, energy provenance, capacity marketplaces, and financing-grade performance data for all power-constrained AI infrastructure. |
| Primary user | VP Infrastructure or Head of Capacity Engineering at an independent GPU cloud operator bringing 10 to 100 MW of nontraditional power-backed compute online |
|---|---|
| Secondary user | Capacity planning lead at a foundation model lab willing to buy discounted flexible training capacity |
| Economic buyer | COO, VP Infrastructure, or GM of cloud capacity at an AI infrastructure provider |
| First customer | A Series A or B GPU cloud startup launching its first 20 to 50 MW cluster on behind-the-meter renewable, battery-backed, or offshore-adjacent power and trying to sell enterprise training contracts before utilization is stable |
|---|---|
| Buying trigger | A new site comes online with non-flat power delivery, and the sales team needs a credible way to contract capacity without overcommitting uptime |
| Current alternative | Generic cluster schedulers plus spreadsheet-based capacity planning, manual throttling, and conservative overprovisioning |
| Switching reason | The wedge lets operators monetize power variability as discounted flex compute while protecting premium queues, so they can sell more capacity faster than with manual gating |
| Pricing hypothesis | Platform fee per MW under management plus a usage fee on GPU-hours scheduled into flex capacity tiers |
Jobs to be done
| Job | Current alternative | Success metric |
|---|---|---|
| When a new variable-power AI site goes live, help the infrastructure operator promise the right jobs to the right customers, so they can sell more capacity without missing SLAs. | Manual throttling and conservative overprovisioning inside generic schedulers | Increase in sellable utilization and flex-capacity revenue with no rise in missed job commitments |
flowchart LR
Buyer[GPU cloud operator] --> Pain[Variable power makes capacity hard to sell]
Pain --> Product[Power-aware GPU scheduler]
Product --> Outcome[More sellable utilization without SLA failures]
- Signal · 5/5: The cluster is anchored in a large financing event and repeated source language that power is now the AI bottleneck.
- Pain · 4/5: Operators can lose millions through idle GPUs or broken contracts when power variability is unmanaged.
- Wedge · 5/5: Power-aware admission control for flexible AI workloads is a narrow first product tied to a concrete workflow.
- Defense · 4/5: Proprietary operating data linking power envelopes to throughput and SLA outcomes can compound into a strong moat.
- Scale · 5/5: If nontraditional power becomes a major source of AI capacity, the scheduling and commercialization layer can become core infrastructure.
Key partners
- GPU cloud startups
- Energy asset developers
- Data-center operators
- Cluster software vendors
Key activities
- Integrating with cluster control planes
- Forecasting power envelopes
- Packaging compute tiers
Key resources
- Scheduling software
- Power-to-throughput forecasting models
- Integrations with cluster managers
Value propositions
- Turn intermittent power into sellable compute
- Protect premium SLAs while monetizing flex workloads
- Forecast and prove energy-backed capacity
Customer relationships
- High-touch deployment
- Technical account management
- Capacity planning reviews
Channels
- Direct sales
- Data-center and energy project partners
- GPU cloud ecosystem
Customer segments
- Independent GPU cloud operators
- AI infrastructure developers
- Foundation model labs buying flex capacity
Cost structure
- Engineering
- Deployment and support
- Cloud infrastructure
- Sales to AI infrastructure operators
Revenue streams
- Platform fee per MW
- Usage fee per GPU-hour scheduled
- Premium analytics modules
Market
| TAM | $74.6M. Bottom-up, conservative visible-MW method: 900 MW announced at Crusoe's new Abilene campus plus 166 MW for Soluna's Project Kati = 1,066 MW of publicly visible energy-first AI capacity; applying an estimated $70k per MW-year software spend yields about $74.6M. This intentionally undercounts other sites and assumes pricing that is a small fraction of implied GPU compute revenue. [22][25] |
|---|---|
| SAM | $30.0M. Constrain the TAM to the first-wave beachhead: roughly 500 MW of North American independent GPU-cloud and renewable-backed sites likely to sell third-party capacity rather than internalize the capability; at $60k per MW-year, SAM is about $30.0M. [1][22][24][25] |
| SOM | $4.8M. Reachable year-3 SOM assumes 80 MW under management at roughly $60k per MW-year, equivalent to 8 sites averaging 10 MW or 4 sites averaging 20 MW. This is modest relative to visible public site sizes and fits a beachhead-first rollout. [16][18][22][25] |
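The sizing arithmetic in the table above can be reproduced in a few lines. This is only a restatement of the figures already given (visible MW and assumed software spend per MW-year); the variable and function names are illustrative.

```python
def annual_software_spend(managed_mw: int, usd_per_mw_year: int) -> int:
    """Annual software spend implied by a MW base and a per-MW-year price."""
    return managed_mw * usd_per_mw_year


# Inputs taken from the TAM / SAM / SOM rows above.
visible_mw = 900 + 166                           # Crusoe Abilene + Soluna Project Kati
tam = annual_software_spend(visible_mw, 70_000)  # 1,066 MW at $70k per MW-year
sam = annual_software_spend(500, 60_000)         # first-wave beachhead
som = annual_software_spend(80, 60_000)          # year-3 managed MW target

print(f"TAM ${tam / 1e6:.1f}M, SAM ${sam / 1e6:.1f}M, SOM ${som / 1e6:.1f}M")
print(f"SOM as share of SAM: {som / sam:.0%}")   # 16%
```

On these inputs the year-3 target corresponds to 16% of the serviceable market, which is worth keeping in mind when judging how narrow the initial buyer set is.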
Executive takeaways
- The beachhead is real but narrow: public energy-first AI buildouts now span fresh Panthalassa funding and site announcements that jumped from tens of MW to 166 MW and 900 MW-class projects, but the first buyer set is still a small club of neoclouds and power-backed operators.
- The strongest wedge is commercialization, not scheduling alone: hyperscaler spot/preemptible products prove demand for discounted flexible compute, while generic schedulers still treat power as a static constraint rather than a sellable product attribute.
- Incumbent risk is feature absorption by cloud and scheduler vendors, so the startup must own live power forecasting, job admission, and SLA packaging specific to variable-energy sites rather than compete as yet another cluster dashboard.
- Public GPU pricing shows that even a thin software take-rate can matter economically because each MW of AI capacity can produce high revenue density before any flex-capacity uplift.
- Adoption friction is operational rather than conceptual: infra teams already buy managed Slurm, Kubernetes, and cluster tooling, but they are cautious about inserting a new control layer into the critical path.
- Why now is stronger than the original offshore narrative alone: batteries, onsite generation, and renewable-curtailment strategies are becoming part of AI infrastructure design, expanding the number of sites that need a power-aware control plane.
Market definition
The market is software that turns constrained or variable-power AI clusters into sellable compute products for independent GPU clouds, renewable-backed data-center developers, and other operators who cannot assume flat grid-like capacity. The first buyer geography is North America, where public spot/preemptible compute markets are mature and many visible energy-first AI projects are being announced. Included are live power forecasting, admission control, flex-tier packaging, and queue orchestration above existing cluster managers. Excluded are generic schedulers, general DCIM/energy-management tools, retail GPU brokers, and vertically integrated clouds where the operator internalizes the problem instead of selling neutral control software. [3][4][6][8][10][12][22][24][27]
Customer and buyer
Primary users are capacity-engineering and cluster-operations teams who must decide which jobs can be promised under a non-flat power envelope. The economic buyer is typically the COO, VP Infrastructure, or GM of capacity at a GPU cloud or power-backed compute operator. Their urgent jobs are to protect premium SLAs, monetize flexible workloads, and stop sales teams from overcommitting a site whose available power, cooling, or energization schedule is still changing. Current alternatives are generic Slurm/Kubernetes-based queues, manual throttling, and spreadsheet capacity planning. [3][4][6][8][9][10][11][12][14][17][19][21]
Buying triggers
- A new site or expansion phase comes online with non-flat power delivery or staged energization, and the operator needs a credible way to contract capacity before utilization is stable. [1][22][25]
- The sales team wants to offer discounted flex capacity analogous to spot/preemptible compute rather than leave intermittent capacity idle. [3][6][8][20]
- Existing Slurm or Kubernetes queues stop being enough because the operator now needs policy around who gets scarce capacity, when preemption is acceptable, and how to expose those tradeoffs commercially. [10][11][12][14][17][19]
Willingness to pay
Willingness to pay is supported indirectly by existing spend on flexible compute and cluster operations. AWS and Google market interruptible capacity at discounts of up to roughly 90% off on-demand pricing, while Lambda and Crusoe publish H100/H200 pricing in the roughly $3.9 to $6.2 per GPU-hour range. That means even modest improvements in sellable utilization or safer flex-capacity packaging can justify meaningful software spend without requiring a new budget category. [3][6][16][20][32]
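A rough worked example shows why a per-MW fee can be small relative to compute revenue. The sketch below uses the low end of the published price range ($3.9 per GPU-hour) and the $60k per MW-year fee from the market sizing; the 1,500 W all-in draw per GPU is a hypothetical assumption, and real sites will differ.

```python
HOURS_PER_YEAR = 8760


def revenue_per_mw_year(price_per_gpu_hour: float, watts_per_gpu: int,
                        sold_utilization: float = 1.0) -> float:
    """Gross GPU revenue one MW can generate per year at a given sold utilization."""
    gpus_per_mw = 1_000_000 // watts_per_gpu
    return gpus_per_mw * HOURS_PER_YEAR * price_per_gpu_hour * sold_utilization


def breakeven_utilization_points(fee_per_mw_year: float, price_per_gpu_hour: float,
                                 watts_per_gpu: int) -> float:
    """Percentage points of extra sold utilization needed to cover the software fee."""
    full = revenue_per_mw_year(price_per_gpu_hour, watts_per_gpu)
    return 100 * fee_per_mw_year / full


# Assumption: 1,500 W per GPU all-in (hypothetical); price and fee come from the text.
full_revenue = revenue_per_mw_year(3.9, watts_per_gpu=1500)
points = breakeven_utilization_points(60_000, 3.9, watts_per_gpu=1500)
print(f"Revenue per MW-year at full utilization: ${full_revenue:,.0f}")
print(f"Utilization lift needed to cover the fee: {points:.2f} points")
```

Under these assumptions one MW supports roughly $22.8M of list-price GPU revenue per year, so the fee is covered by well under one percentage point of additional sold utilization. Discounted flex pricing would raise that threshold proportionally.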
Category dynamics
Tailwinds
- Spot and preemptible compute markets already normalize the idea that some AI workloads can trade certainty for lower cost.
- Energy-first operators are pairing AI compute with onsite generation, storage, or curtailed renewables, increasing the number of non-flat-power sites.
- Scheduler primitives are commoditizing, making a commercialization layer more plausible than building a net-new cluster manager.
Headwinds
- The initial buyer set is still small, and some of the most sophisticated operators may prefer vertically integrated solutions or in-house tooling.
- Any software that touches admission control faces trust, integration, and rollback concerns from operators running expensive GPU fleets.
Validation signals
- Panthalassa's $140M round shows investors will finance nontraditional AI-power architectures rather than assume all future capacity sits on conventional grid-backed campuses.
- Crusoe announced a new 900 MW AI campus for Microsoft and says the broader Abilene site is projected to reach 2.1 GW, signaling campus-scale demand for power-shaped AI infrastructure.
- Crusoe and Form Energy announced 12 GWh of iron-air batteries for AI data centers, indicating power-shaping is moving into the operating design of new sites.
- Soluna explicitly pitches AI loads as flexible demand and says Project Kati is a 166 MW wind-powered data center with an 83 MW first phase.
- Lambda and Runpod both expose managed Slurm / cluster offerings, which confirms operators already buy higher-level control and operational tooling on top of raw GPUs.
- AWS, Google, and Azure all market interruptible capacity products, proving demand for lower-cost compute with weaker delivery guarantees.
Regulatory & technical constraints
- Data-center energy intensity and large-load planning remain real constraints, so any commercialization layer must map promised compute to actual site energy and cooling limits.
- Flexible-capacity products only work for workloads tolerant of interruption, preemption, or completion windows.
- Integration risk is high because the product must ingest or influence Slurm, Kubernetes, or managed-cluster control planes without creating instability.
- Forecast quality depends on access to live power, cooling, and queue telemetry that often lives in separate operational systems.
- Security and operator trust matter because the software touches scheduling policy, capacity allocation, and potentially customer-facing commitments.
Competition
Competition comes from four directions: cloud platforms that already sell interruptible or reserved capacity, open-source schedulers that own job placement, neoclouds that bundle operations with capacity sales, and vertically integrated energy-first operators that solve the problem inside their own stack. The startup should not try to replace Slurm or become another GPU broker; it should own the translation layer between volatile megawatts and contractual GPU products. [3][4][8][10][11][16][18][20][21][29][35]
| Competitor | Stage | Wedge | Pricing | Strength | Weakness vs. us |
|---|---|---|---|---|---|
| SchedMD / Slurm | incumbent | Default open-source workload manager for HPC and AI clusters with built-in QoS, reservations, and preemption. | Open-source software; enterprise support and services sold separately. | Deep operational credibility and broad deployment in GPU/HPC environments. | Optimizes scheduling primitives but does not package variable-power capacity into customer-facing products or revenue-aware admission rules. |
| AWS EC2 Spot + Capacity Blocks | incumbent | Commercializes interruptible and reserved GPU capacity directly inside a hyperscaler cloud. | Spot priced at up to 90% below on-demand; Capacity Blocks reserve ML GPU capacity. | Buyer familiarity, mature APIs, and immediate proof that flexible compute is a real buying behavior. | Only solves the problem inside AWS and does not help third-party operators monetize their own variable-power sites. |
| Lambda | scale-up | Neocloud with public GPU pricing and managed Slurm/Kubernetes offerings. | Public H100 and B200 instance / cluster pricing; H100 cluster pricing published from roughly $5.54-$6.16 per GPU-hour for larger reserved clusters. | Transparent pricing, operational maturity, and managed cluster distribution into exactly the kind of buyer the startup wants. | Lambda sells capacity and managed infrastructure, not a neutral power-aware commercialization layer for third-party sites. |
| Runpod | scale-up | Developer-friendly GPU cloud with instant clusters and Slurm-based cluster docs. | Public pricing for pods and serverless plus instant-cluster documentation. | Fast time-to-value and strong fit for operators or buyers who prioritize speed and flexibility. | Not centered on power-envelope forecasting or flex-capacity packaging tied to site energy constraints. |
| Crusoe | scale-up | Vertically integrated energy-first AI cloud combining data centers, power strategy, cloud pricing, and operations tooling. | Public cloud pricing including H100/H200 rates and spot / on-demand options. | Most complete substitute because it can solve the problem via vertical integration rather than neutral software. | A neutral startup can serve operators that want the capability without buying Crusoe's full infrastructure stack or competing clouds. |
Why incumbents do not win by default
- Cloud platforms. AWS, Google Cloud, and Azure already train buyers to accept spot, preemptible, and reserved GPU capacity, but their products are tied to their own infrastructure and do not optimize around site-specific power envelopes, phased energization, or neutral multi-site commercialization.
- Open-source schedulers. Slurm, Kueue, and Kubernetes handle quotas, preemption, queue fairness, and job placement, but they do not natively convert changing power availability into product tiers, delivery windows, or revenue-aware admission rules.
- Neocloud operators. Lambda and Runpod sell managed clusters and capacity quickly, yet their economic center is selling standard compute inventory; a neutral control plane can still win at sites that need commercialization software before they look like a conventional cloud region.
- Vertically integrated power-first AI clouds. Crusoe is the strongest strategic substitute because it combines energy, data centers, cloud pricing, and an operations platform, but that vertical model does not win by default when operators want the software capability without outsourcing the entire infrastructure stack.
- In-house operations. Many operators can stitch together scheduler controls, spreadsheets, and manual throttling, but the operational burden of forecasting, packaging, and proving flexible capacity is precisely where specialist software should add value.
Business plan
This company should sell a power-aware commercialization layer to North American GPU cloud operators bringing 10-50 MW variable-power clusters online and needing to contract capacity before site performance is stable. The initial product is not a new scheduler; it is a read-only forecasting, flex-queue, and SLA-packaging layer above Slurm, Kubernetes, or managed cluster control planes. The first customer should be a Series A or B neocloud or power-backed operator launching a behind-the-meter, battery-backed, curtailed-renewable, or offshore-adjacent site and trying to protect premium queues while monetizing cheaper flexible training capacity. The buying trigger is a site launch or expansion with staged energization or non-flat power delivery, because that is when spreadsheet planning and generic schedulers stop being commercially sufficient. Research supports a real wedge and clear willingness to pay, but the near-term market is still narrow at roughly $30.0M SAM, so expansion into additional power-constrained AI infrastructure categories is required for venture scale. Product and GTM therefore need to sequence from read-only proof, to flex-capacity packaging, to admission control only after the company has site-level forecasting data and operator trust. The biggest open market risk is volume: the inputs do not establish how many non-hyperscaler variable-power AI sites will be commercially live in the next 24 months, so early success should be judged by paid pilots and managed MW rather than broad logo count.
Problem
- Independent GPU clouds and energy-backed AI sites increasingly have usable megawatts that are intermittent, staged, or operationally unusual, but they still need to sell compute with credible delivery commitments.
- Generic schedulers and manual capacity planning allocate jobs, but they do not convert changing power envelopes into sellable product tiers, pricing, and SLA rules, so operators either idle GPUs, overconstrain sales, or risk missed commitments.
Solution
- Provide a control layer above existing cluster managers that forecasts available GPU-hours from live power and site telemetry, packages premium and flex capacity tiers, and recommends which workloads fit the current envelope.
- Start with read-only forecasting, flex-queue creation, and commercial policy tooling, then graduate to automated admission control once operators trust the forecasts and exception handling.
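The read-only recommendation step described above could look something like the following sketch. It assumes flex jobs are checkpointable and can pause between hours, and it uses a simple earliest-deadline-first heuristic; the `FlexJob` fields, the `recommend` function, and the sample forecast are hypothetical and not tied to any Slurm or Kubernetes API.

```python
from dataclasses import dataclass


@dataclass
class FlexJob:
    name: str
    gpus: int      # GPUs needed while the job is running
    hours: int     # run hours needed (job may pause between hours)
    deadline: int  # completion window, in hours from now


def recommend(jobs: list[FlexJob], flex_gpus_by_hour: list[int]) -> dict:
    """Return, per job, the hours it could run in, or None if it does not fit.

    This only produces a recommendation; nothing is submitted to a scheduler.
    """
    free = list(flex_gpus_by_hour)
    plan = {}
    # Earliest deadline first: tightest completion windows get first pick.
    for job in sorted(jobs, key=lambda j: j.deadline):
        window = range(min(job.deadline, len(free)))
        slots = [h for h in window if free[h] >= job.gpus][: job.hours]
        if len(slots) == job.hours:
            for h in slots:
                free[h] -= job.gpus
            plan[job.name] = slots
        else:
            plan[job.name] = None  # recommend deferring or re-quoting the window
    return plan


# Hypothetical six-hour flex forecast (GPUs available above the premium floor).
forecast = [0, 2000, 4000, 0, 1000, 3000]
jobs = [
    FlexJob("A", gpus=2000, hours=2, deadline=4),
    FlexJob("B", gpus=1000, hours=3, deadline=6),
    FlexJob("C", gpus=3000, hours=2, deadline=3),
]
plan = recommend(jobs, forecast)
print(plan)  # {'C': None, 'A': [1, 2], 'B': [2, 4, 5]}
```

Keeping the output as a plan rather than an action matches the sequencing argument: operators can compare recommendations against their manual decisions before any automated admission control is enabled.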
Why we win
- The beachhead pain is tied to a specific trigger, buyer, and workflow: new variable-power sites need a way to contract capacity safely before utilization is stable.
- Incumbent schedulers own queue mechanics, and clouds own their own spot products, but neither gives third-party operators a neutral layer that translates site-specific power volatility into contractual GPU products.
- Early deployments can compound a proprietary dataset linking power variability, queue mix, and delivered GPU-hours, which improves forecasting, pricing, and SLA design over time.
| Beachhead | North American independent GPU cloud operators launching their first 10-50 MW behind-the-meter renewable, battery-backed, curtailed-power, or offshore-adjacent clusters and initially selling batch training or offline inference with flexible completion windows. |
|---|---|
| Wedge rationale | This slice has the clearest buying trigger, a small number of reachable technical buyers, and proof metrics such as managed MW, sellable utilization lift, forecast error, flex-queue fill rate, and SLA miss avoidance; it is faster to validate than selling a generic AI infrastructure optimization platform. |
| Sequencing | Start with forecasting and flex-capacity packaging because operators will tolerate a read-only overlay sooner than a new control-plane dependency, then add admission control, broader site integrations, and expansion analytics only after the company proves commercial lift and earns operational trust. |
| Not yet | Full replacement of Slurm, Kubernetes, or managed cluster control planes. · Retail marketplace aggregation for end customers across many providers. · Always-on latency-sensitive inference or premium enterprise workloads that require flat-power assumptions. · Financing, insurance, or energy-provenance analytics sold as standalone products before core control software is deployed. |
| Wedge | Sell a paid pilot to the infrastructure or capacity leader at a GPU cloud bringing a variable-power site online, package the software as the fastest way to create a discounted flex-compute product without risking premium SLAs, and convert to annual production pricing once the pilot demonstrates higher sellable utilization and reliable completion-window performance. |
|---|---|
| Channels | Founder-led direct sales to neocloud, AI infrastructure, and power-backed site operators. · Energy and data-center developers that market AI-specific sites and can introduce new deployments before energization. · Managed Slurm, Kubernetes, and NVIDIA ecosystem partners that already sit in cluster operations. |
| Funnel targets | 12-15 target accounts per year -> 30-40% qualified pilot discussions -> 25-35% paid pilot rate -> 50%+ pilot-to-production conversion -> 60%+ of production customers expanding managed MW or adding a second site within 12 months. |
| Pricing | Paid pilot followed by annual software pricing anchored to MW under active management, with a secondary usage fee on flex GPU-hours scheduled through the platform; the rationale is that research indicates roughly $60k-$70k per MW-year is supportable if the software materially increases sellable utilization and reduces SLA risk. |
| MVP | The MVP should ingest live or periodic power-envelope inputs plus queue telemetry, forecast safe sellable GPU-hours, create premium versus flex capacity rules, and surface a read-only planning console with audit logs for one cluster running on Slurm, Kubernetes, or an equivalent managed control plane. It should not own full job placement at first; the proof point is that operators can contract and route flex workloads more confidently before inserting the product into the critical path. |
|---|---|
| 6 months | Land 1-2 paid pilots, support one production-like cluster integration, ship flex-queue policy templates for batch training and offline inference, and prove forecast accuracy and utilization lift on live workloads. |
| 12 months | Add guarded admission-control automation, site-level SLA policy management, customer-facing flex-capacity quoting workflows, and reusable integrations for the most common cluster telemetry and scheduler stacks seen in pilots. |
| 24 months | Standardize a multi-site control layer across several operators, add benchmarking for interruption tolerance and energy-linked throughput, and expand from single-site flex packaging into portfolio-level capacity planning and expansion inside existing customers. |
| Key bets | A read-only overlay can show enough ROI to win deployment before operators trust live admission control. · Early sites will have enough interruption-tolerant batch training and offline inference demand to fill flex tiers. · Buyers will pay on managed MW and commercial uplift rather than view this as a low-value scheduler feature. · Site-level forecasting accuracy will improve materially with cross-customer operating data, creating a defensible advantage over in-house tooling. |
| Revenue streams | Annual platform subscription priced by managed MW of variable-power AI capacity. · Usage-based fees on flex GPU-hours scheduled or contracted through the platform. · Implementation and integration fees for first-site deployment. · Premium analytics modules for forecasting, SLA reporting, and portfolio benchmarking. |
|---|---|
| Unit of value | Managed MW of variable-power AI capacity. |
| Target gross margin | 70% |
| Expansion levers | Grow from one pilot site to more MW and additional sites inside the same operator. · Add admission-control and commercial-policy modules after forecasting is trusted. · Expand from batch training into additional interruption-tolerant workload classes as benchmark data improves. · Sell benchmarking and planning analytics to existing operator customers once multi-site data exists. |
| North-star metric | Contracted flex GPU-hours delivered within promised completion windows from managed variable-power sites. |
|---|---|
| Input metrics | Managed MW under active forecasting. · Forecast error between promised and delivered GPU-hours. · Sellable utilization lift versus pre-deployment baseline. · Premium queue SLA miss rate. · Flex-queue fill rate and repeat purchase rate. · Paid pilot to production conversion rate. |
| Moats to build | Proprietary dataset linking site power envelopes, queue mix, and delivered GPU throughput. · Commercial-policy templates for packaging premium versus flex AI capacity by workload class. · Reusable integrations and rollback-safe deployment playbooks above common cluster control planes. |
| Kill criteria | If the first 3 pilots cannot show at least 10% sellable utilization lift or materially lower overcommit risk without increasing premium SLA misses, the wedge is too weak. · If no operator converts to production pricing above a meaningful managed-MW contract within 12 months of pilot start, the category is likely too narrow or too easy to internalize. · If buyers consistently require full control-plane replacement before paying, the product will be too integration-heavy for efficient pre-seed execution. |
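The funnel targets in the table above imply a fairly small number of production customers per year. The sketch below simply multiplies through the stated ranges, treating the "50%+" conversion as its 50% floor; the function name is illustrative.

```python
def funnel(accounts: int, qualified_rate: float, pilot_rate: float,
           production_rate: float) -> dict[str, float]:
    """Expected counts at each funnel stage for one year of target accounts."""
    qualified = accounts * qualified_rate
    pilots = qualified * pilot_rate
    production = pilots * production_rate
    return {"qualified": qualified, "paid_pilots": pilots, "production": production}


# Low and high ends of the stated funnel targets.
low = funnel(12, 0.30, 0.25, 0.50)
high = funnel(15, 0.40, 0.35, 0.50)

for label, result in (("low", low), ("high", high)):
    print(f"{label}: {result['paid_pilots']:.1f} paid pilots, "
          f"{result['production']:.2f} production customers per year")
```

On these inputs the plan yields roughly one to two paid pilots and about half to one production customer per year, so reaching several production customers depends on conversion above the stated floors or on a larger account list.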
Milestones
- Secure 2 design partners and convert at least 1 into a paid overlay pilot.
- Prove a read-only deployment that improves sellable utilization or commitment confidence without increasing premium SLA misses.
- Ship one repeatable integration path above a common cluster control stack and complete a production-ready security package.
- Convert the first pilot into an annual production contract with a defined managed-MW expansion path.
- Reach 3-4 production customers and roughly 30-40 MW under active management.
- Launch guarded admission control for flex workloads at production customers.
- Show at least one customer expansion from the first site into additional MW or a second site.
- Build internal benchmarking on forecast accuracy and interruption tolerance across multiple deployments.
- Reach the researched year-3 target of roughly 80 MW under management.
- Establish the company as the default commercialization layer for third-party variable-power AI sites in its beachhead segment.
- Expand product scope into portfolio-level planning and benchmarking while preserving neutral-control-plane positioning.
flowchart LR
Wedge[Variable-power site launch wedge] --> MVP[Read-only forecasting and flex-queue MVP]
MVP --> Proof[Forecast accuracy, utilization lift, first paid pilot]
Proof --> Expansion[Admission control, multi-site expansion, benchmarking]
Founding team
| Role | Start timing | Rationale |
|---|---|---|
| Founding eng | Month 0 | Owns forecasting engine, telemetry ingestion, policy logic, and first operator integrations. |
| Founder CEO | Month 0 | Required for founder-led sales into a concentrated technical buyer set and for design-partner recruitment. |
| Founding product/infrastructure | Month 0 | Bridges operator workflows, pilot metrics, and scheduler integration details so the roadmap stays tied to commercial proof. |
| Integration / solutions engineer | Month 4-6 | Needed once pilots start to handle scheduler integrations, deployment safety, and customer-specific telemetry mapping without slowing core product work. |
| Customer success / implementation lead | Month 9-12 | Supports pilot-to-production conversion and creates a repeatable rollout process as customers add more MW or sites. |
Experiment roadmap
| Horizon | Experiment | Hypothesis | Success metric | Owner |
|---|---|---|---|---|
| 0–90 days | Interview 12-15 target operators and site developers to build a named beachhead account list with launch timing, MW size, and buy-versus-build preferences. | The market contains enough imminent variable-power sites to support at least 3 paid pilot opportunities in the next 12 months. | At least 10 qualified target accounts and 3 active pilot-scope discussions. | Founder CEO |
| 0–90 days | Build a simulation or shadow-mode prototype that converts sample power-envelope changes into revised sellable GPU-hour forecasts and flex-tier recommendations. | Operators will see clear commercial value from forecasting and packaging before the product touches live admission control. | At least 2 design partners agree the prototype is valuable enough to scope a paid overlay pilot. | Founding eng |
| 3–6 months | Run the first paid overlay pilot on one live cluster with forecast dashboards, flex-queue policy rules, and post hoc comparison against manual planning. | The product can increase safe sellable utilization without increasing premium SLA misses. | At least 10% utilization lift or equivalent reduction in overcommit risk with no increase in premium queue SLA misses during the pilot window. | Founding product/infrastructure |
| 6–12 months | Add guarded admission-control recommendations or automation for flex workloads at the first production candidate. | Once overlay value is proven, operators will trust limited control-path automation for non-premium jobs. | First customer authorizes production use for at least one flex workload class and signs an annual contract. | Integration / solutions engineer |
| 12–18 months | Benchmark interruption tolerance, price sensitivity, and completion-window acceptance across early workload types and sites. | Cross-customer data will reveal repeatable flex-capacity product templates that raise win rate and forecast quality. | Publish internal benchmarks used in at least 2 expansion deals and show better forecast accuracy than site-specific manual rules alone. | Founder CEO |
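The shadow-mode prototype in the 0–90 day row could start from something as small as the sketch below. It assumes, hypothetically, that premium capacity is sized to the weakest hour of a power forecast and that everything above that floor is offered as flex. The function name `split_gpu_hours`, the `kw_per_gpu` and `safety_margin` parameters, and the sample forecast are illustrative assumptions, not part of the plan.

```python
def split_gpu_hours(power_mw, kw_per_gpu, safety_margin):
    """Split an hourly power forecast into premium and flex GPU-hours.

    power_mw: forecast site power for each hour, in MW (hypothetical input).
    kw_per_gpu: assumed all-in draw per GPU, including cooling overhead.
    safety_margin: fraction of forecast power held back as headroom.
    """
    # GPUs that can run in each hour after holding back the safety margin.
    gpus_per_hour = [
        int(mw * 1000 * (1 - safety_margin) // kw_per_gpu) for mw in power_mw
    ]
    # Premium tier: only what the weakest hour can sustain for the whole window.
    premium_gpus = min(gpus_per_hour)
    premium_hours = premium_gpus * len(gpus_per_hour)
    # Flex tier: whatever each hour can power above the premium floor.
    flex_hours = sum(g - premium_gpus for g in gpus_per_hour)
    return premium_hours, flex_hours


# Made-up four-hour forecast, purely for illustration.
forecast_mw = [10, 8, 12, 6]
premium, flex = split_gpu_hours(forecast_mw, kw_per_gpu=2.0, safety_margin=0.25)
print(premium, flex)  # 9000 4500
```

A real prototype would likely replace the single worst-hour floor with a probabilistic forecast, but even this version makes the premium-versus-flex trade-off visible to a design partner.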
Risk assessment
- R1. The number of commercially relevant variable-power AI sites may be too small for fast ARR growth. Mitigation: Sell to any power-constrained or staged-energization AI site with the same commercialization problem, not just offshore deployments.
- R2. Operators may refuse to trust a startup in the scheduling control path. Mitigation: Begin with read-only forecasting and flex packaging, prove value on non-premium workloads, and add automation gradually.
- R3. Vertically integrated providers or in-house teams may absorb the functionality. Mitigation: Focus on neutral third-party operators and compound a forecasting plus commercialization dataset that is hard to recreate quickly.
- R4. Flex-capacity demand may not be deep enough to justify the pricing model. Mitigation: Validate interruption-tolerant workload classes early and tune product packaging around concrete completion-window preferences.
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| The number of commercially relevant variable-power AI sites may be too small for fast ARR growth. | High | High | Sell to any power-constrained or staged-energization AI site with the same commercialization problem, not just offshore deployments. |
| Operators may refuse to trust a startup in the scheduling control path. | High | High | Begin with read-only forecasting and flex packaging, prove value on non-premium workloads, and add automation gradually. |
| Vertically integrated providers or in-house teams may absorb the functionality. | Medium | High | Focus on neutral third-party operators and compound a forecasting plus commercialization dataset that is hard to recreate quickly. |
| Flex-capacity demand may not be deep enough to justify the pricing model. | Medium | Medium | Validate interruption-tolerant workload classes early and tune product packaging around concrete completion-window preferences. |
| Title | Capacity leader at a variable-power neocloud |
|---|---|
| Profile | A North American GPU cloud startup commissioning its first 20-50 MW site with staged energization or non-flat behind-the-meter power and selling batch training capacity to external customers. |
| Trigger | The site is nearing launch, the sales team wants to contract capacity, and operations cannot yet promise flat always-on delivery with confidence. |
| Buyer | VP Infrastructure |
| Initial contract | Assumed $75k-$150k paid pilot on one site, converting to annual production pricing based on managed MW and flex GPU-hour volume once the operator proves safe utilization lift; credible early production range is roughly $300k-$800k annualized depending on live MW and usage. |
What must be true
- At least 5-10 non-hyperscaler variable-power AI sites in the target geography will be commercially relevant buyers within the next 24 months.
- Operators will pay for a read-only forecasting and flex-capacity layer before demanding full scheduler replacement or full in-house builds.
- One or more early workload classes such as batch training or offline inference will accept completion windows large enough to create repeatable flex demand.
- The platform can improve sellable utilization or commitment confidence enough to justify roughly $60k-$70k per MW-year pricing.
- Early deployments will generate forecasting and commercialization data that meaningfully outperform manual planning and generic scheduler rules over time.
Open diligence questions
- How many live or funded sites in the next 24 months actually match the beachhead profile and third-party software buying model?
- What quantitative proof would make an infra leader trust this product in overlay mode, and later in admission control?
- Why will Crusoe-like vertical providers or in-house scheduler extensions not win the first deal by default?
- Which workload types create the highest early flex fill rate without unacceptable customer support burden?
- Does pricing by managed MW map to how buyers budget, or will they insist on pure usage pricing or bundled services?
| Call | Watch |
|---|---|
| Conviction | Medium-low. The product wedge is crisp and timely, but the first-wave buyer pool and standalone market size may be too small unless expansion happens quickly. |
| Why believe | Power-constrained AI infrastructure is becoming real, and a neutral layer that converts volatile megawatts into contractual compute products addresses a concrete operational pain that generic schedulers do not solve. |
| Why doubt | The inputs still leave open whether enough third-party variable-power sites will launch soon, and whether operators will buy neutral software instead of building internally or defaulting to vertically integrated providers. |
| Next diligence | Confirm that at least two target operators with live or imminent variable-power clusters will fund a paid overlay pilot before the company asks for deep control-plane authority. |
Financial model
| Year | Revenue | EBITDA | Cash EOP |
|---|---|---|---|
| Year 1 | $495K | -$658K | $1.34M |
| Year 2 | $1.93M | -$498K | $844K |
| Year 3 | $4.18M | $353K | $1.20M |
| Metric | Value |
|---|---|
| ARPU (annual) | $660K |
| Gross margin | 70% |
| CAC | $301K |
| CAC payback | 7.8 months |
| LTV | $2.57M |
| LTV / CAC | 8.5x |
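The unit-economics figures above are consistent with a simple gross-margin LTV calculation using the 1.5% monthly churn (A27) and $300.8K blended CAC (A28) from the key assumptions. The sketch below reproduces them under that assumption; the formula is inferred from the numbers rather than documented in the model, and the function name `unit_economics` is illustrative.

```python
def unit_economics(arpu_annual_k, gross_margin, cac_k, monthly_churn):
    """Return LTV, CAC payback, and LTV/CAC from four inputs (all money in $K)."""
    # Gross profit contributed by one customer per month.
    monthly_gross_profit_k = arpu_annual_k / 12 * gross_margin
    # Expected lifetime in months is 1 / churn, so LTV is gross profit / churn.
    ltv_k = monthly_gross_profit_k / monthly_churn
    return {
        "ltv_k": ltv_k,
        "payback_months": cac_k / monthly_gross_profit_k,
        "ltv_to_cac": ltv_k / cac_k,
    }


base = unit_economics(
    arpu_annual_k=660, gross_margin=0.70, cac_k=300.8, monthly_churn=0.015
)
print(round(base["ltv_k"]))              # 2567, i.e. about $2.57M
print(round(base["payback_months"], 1))  # 7.8
print(round(base["ltv_to_cac"], 1))      # 8.5
```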
| Round | pre-seed · $2.0M |
|---|---|
| Runway | 30 months |
| Milestone | Reach 4 production customers and roughly 40 MW under management, ship guarded admission control for flex workloads, and show the first multi-site expansion before a seed round. |
Model sanity
- Revenue engine. Base-case growth comes from scaling from 2 to 8 active operators at about 10 managed MW each and roughly $660K blended annual revenue per operator.
- Must go right. The first paid pilots have to convert within about two quarters so the company reaches 4 production customers and ~40 MW under management by the end of Y2.
- Model breaks if. The beachhead pipeline supports only 6 active operators, or gross margin stays in the high 60s; in either case downside cash falls close to the floor before the seed milestone is proven.
- Next-round proof. A seed round is justified by 4 production customers, a guarded admission-control module in market, and at least one multi-site expansion reference.
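The revenue engine can be sanity-checked with a few lines of arithmetic. The sketch below takes the customer ramp (A6), managed MW per operator (A3), and blended ARPU (A4) from the key assumptions and compares the implied year-end run-rate with the revenue the model reports for each year. The run-rate assumes every active operator is already at mature ARPU, which is a simplification introduced here and not a figure from the model.

```python
MW_PER_OPERATOR = 10   # A3: mature managed MW per active operator
ARPU_K = 660           # A4: blended annual revenue per mature customer, $K
operators_at_year_end = {"Y1": 2, "Y2": 4, "Y3": 8}          # A6
reported_revenue_k = {"Y1": 495, "Y2": 1930, "Y3": 4180}     # financial model

summary = {}
for year, operators in operators_at_year_end.items():
    managed_mw = operators * MW_PER_OPERATOR
    # Hypothetical run-rate if every operator were at mature ARPU at year end.
    exit_run_rate_k = operators * ARPU_K
    # Share of that run-rate actually recognized during the year.
    recognized_share = reported_revenue_k[year] / exit_run_rate_k
    summary[year] = (managed_mw, exit_run_rate_k, round(recognized_share, 2))
    print(year, managed_mw, "MW", exit_run_rate_k, "$K run-rate", summary[year][2])
```

Reported revenue sits below the year-end run-rate in every year, which is what one would expect when customers land partway through the year and begin as pilots.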
[Chart: Revenue (line, area), Cash EOP (dashed), EBITDA (bars, gray = loss)]
[Chart legend: Founder CEO, Engineering, Product/Infrastructure, Solutions Engineer, Customer Success/Implementation, Sales, G&A/Finance]
Scenarios
| Scenario | Y3 revenue | Y3 EBITDA | Cash low point | Description |
|---|---|---|---|---|
| Downside | $3.13M | -$186K | $214K | Pilot conversion stretches by a quarter, one planned site expansion does not close, and the revenue mix lands closer to core managed-MW pricing than the blended package. |
| Base | $4.18M | $353K | $827K | Founder-led year 1 lands two revenue-generating operators, year 2 reaches 4 production customers / ~40 MW, and year 3 expands to 8 active operators / ~80 MW. |
| Upside | $4.95M | $920K | $910K | The first production customer expands faster, referenceability compresses the sales cycle, and one extra operator lands in Y3 without materially increasing fixed cost. |
Sensitivity
| Variable | Downside | Base | Upside |
|---|---|---|---|
| ARPU | $600K blended annual revenue per customer | $660K blended annual revenue per customer | $720K blended annual revenue per customer |
| CAC | CAC rises toward ~$340K as the first rep needs more field time and references | Forecast CAC of ~$301K per net new operator | Reference-led selling keeps CAC near ~$260K |
| churn | Monthly churn trends toward 2.5% and Y3 exits with one fewer retained operator | Monthly churn stays at 1.5% with expansions offsetting early logo risk | Monthly churn holds near 1.0% with strong renewal and expansion |
| sales cycle | Pilot-to-production conversion slips by ~1 quarter because security and ops reviews take longer | Pilot-to-production conversion stays inside roughly 2 quarters after proof of value | Reference customers compress new-logo sales by ~1 quarter |
| gross margin | Gross margin stays at 67% because implementation remains service-heavy | Gross margin reaches the 70% business-plan target | Gross margin improves to 72% with cleaner integrations and less support load |
| hiring pace | A second AE and extra engineering hire are pulled forward before repeatability is proven | Hires stay milestone-based around pilots, conversions, and multi-site proof | One non-core hire is delayed until after Q4Y2 conversion targets are met |
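To see which of the quantitative variables above moves unit economics most, the sketch below varies ARPU, CAC, churn, and gross margin one at a time while holding the others at base, and reports the resulting LTV/CAC. It reuses the simple gross-margin LTV formula, which is an assumption rather than the model's documented method, and it leaves out sales cycle and hiring pace because they affect timing and cash rather than per-customer economics.

```python
BASE = {"arpu_annual_k": 660, "gross_margin": 0.70, "cac_k": 301, "monthly_churn": 0.015}

# variable label -> (input name, downside value, upside value), from the table above
CASES = {
    "ARPU": ("arpu_annual_k", 600, 720),
    "CAC": ("cac_k", 340, 260),
    "churn": ("monthly_churn", 0.025, 0.010),
    "gross margin": ("gross_margin", 0.67, 0.72),
}


def ltv_to_cac(arpu_annual_k, gross_margin, cac_k, monthly_churn):
    """LTV/CAC with LTV taken as monthly gross profit divided by monthly churn."""
    ltv_k = arpu_annual_k / 12 * gross_margin / monthly_churn
    return ltv_k / cac_k


results = {}
for label, (key, downside, upside) in CASES.items():
    # Override one input at a time, keeping the rest at base.
    results[label] = tuple(
        round(ltv_to_cac(**{**BASE, key: value}), 1) for value in (downside, upside)
    )
    print(f"{label}: downside {results[label][0]}x, upside {results[label][1]}x")
```

Under these assumptions churn produces the widest swing (about 5.1x to 12.8x), so retention evidence from early pilots would do more to de-risk the model than modest ARPU or margin gains.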
Key assumptions (30)
| ID | Name | Value | Unit | Source |
|---|---|---|---|---|
| A1 | Model start month | 2026-06 | month | [business-plan.yaml date] startup-finance heuristic: first full month after plan date |
| A2 | Opening cash at M1 | 2000 | USDK | [business-plan.yaml fundingAsk.targetFundingRangeUsd] selects the low end of the stated $2–4M pre-seed range because the base case reaches the next milestone before cash drops below ~$0.8M |
| A3 | Mature managed MW per active operator | 10 | MW/customer | [research.yaml market.som] 80 MW year-3 SOM mapped to 8 active operators in the base case |
| A4 | Blended annual revenue per mature customer | 660 | USDK/year | [business-plan.yaml gtm.pricing; businessModel.revenueStreams; research.yaml bottomUpSizingDrivers] 10 MW x ~$60K/MW-year plus ~10% usage and analytics uplift |
| A5 | Revenue recognized in landing month | 50 | percent of monthly ARPU | startup-finance heuristic: enterprise pilots and first production ramps start mid-month on average |
| A6 | Active paying customer ramp | 2 by Y1 / 4 by Y2 / 8 by Y3 | customers | [business-plan.yaml milestones; research.yaml validationPlan] conservative path inside the stated 4–8 early commercial deployments and 80 MW year-3 target |
| A7 | Gross margin target | 70 | percent | [business-plan.yaml businessModel.targetGrossMarginPct] |
| A8 | Founder CEO loaded annual cash cost | 144 | USDK/year | startup-finance heuristic: $120K cash salary plus 20% payroll tax and benefits |
| A9 | Engineer loaded annual cash cost | 180 | USDK/year | startup-finance heuristic: $150K salary plus 20% payroll tax and benefits for GPU infrastructure talent |
| A10 | Product/infrastructure loaded annual cash cost | 168 | USDK/year | startup-finance heuristic: $140K salary plus 20% payroll tax and benefits |
| A11 | Solutions engineer loaded annual cash cost | 156 | USDK/year | startup-finance heuristic: $130K salary plus 20% payroll tax and benefits |
| A12 | Customer success / implementation loaded annual cash cost | 132 | USDK/year | startup-finance heuristic: $110K salary plus 20% payroll tax and benefits |
| A13 | Sales AE loaded annual base cost | 168 | USDK/year | startup-finance heuristic: $140K base plus 20% payroll tax and benefits; commissions are modeled separately |
| A14 | G&A / finance loaded annual cash cost | 120 | USDK/year | startup-finance heuristic: $100K salary plus 20% payroll tax and benefits |
| A15 | R&D non-payroll spend | 10 in Y1, 12 in Y2, 14 in Y3 | USDK/month | startup-finance heuristic: cloud telemetry, observability, and security tooling for infrastructure software |
| A16 | Sales and marketing non-payroll spend | 6 in Y1, 8 in Y2, 10 in Y3 + 1.5 per AE + 8% of revenue | USDK/month | startup-finance heuristic: founder travel, references, conferences, and commissions for high-touch enterprise selling |
| A17 | G&A non-payroll spend | 7 in Y1, 9 in Y2 pre-finance hire, 10 after finance hire, and 11 in Y3 | USDK/month | startup-finance heuristic: legal, insurance, audit, and back-office overhead for critical infrastructure customers |
| A18 | First solutions engineer hire | Month 5 | month | [business-plan.yaml team] integration / solutions engineer needed once pilots start |
| A19 | Second engineer hire | Month 9 | month | startup-finance heuristic tied to [business-plan.yaml milestones 0–12 months] to finish repeatable integration and production security packaging before first conversion |
| A20 | First customer success hire | Month 10 | month | [business-plan.yaml team] customer success / implementation lead starts in the month 9–12 window |
| A21 | First AE hire | Month 14 | month | [business-plan.yaml gtm.funnelTargets; team] startup-finance heuristic: add dedicated sales only after the first production contract proves the wedge |
| A22 | Third engineer hire | Month 16 | month | [business-plan.yaml milestones 12–24 months] supports guarded admission control and multi-site benchmark ingestion |
| A23 | Finance / compliance hire | Month 22 | month | startup-finance heuristic tied to [business-plan.yaml milestones 12–24 months] when 3–4 production customers require more vendor management and finance ops |
| A24 | Fourth engineer hire | Month 22 | month | [business-plan.yaml milestones 24–36 months] supports portfolio analytics and second-site expansion readiness |
| A25 | Second customer success hire | Month 28 | month | startup-finance heuristic tied to [business-plan.yaml milestones 24–36 months] when customers add more MW or second sites |
| A26 | Second AE hire | Month 31 | month | startup-finance heuristic: add a second seller only after referenceable production deployments exist |
| A27 | Steady-state monthly churn for unit economics | 1.5 | percent | startup-finance heuristic: conservative early-stage enterprise infrastructure-software renewal risk |
| A28 | Blended CAC | 300.8 | USDK/customer | calculated from forecast Y2–Y3 sales and marketing spend of $1.805M divided by 6 net new active operators |
| A29 | Cash conversion timing | In-period collection | policy | startup-finance heuristic; flagged because infrastructure operators may still pay on 45–60 day terms |
| A30 | Funding ask | 2.0 | USDM | [business-plan.yaml fundingAsk] aligns with the low end of the stated range while preserving a >$0.8M cash floor in the base case |
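Several of the assumptions above are derived values, and the arithmetic behind them can be checked directly. The sketch below recomputes the loaded payroll costs (A8 through A14), the blended ARPU (A4), and the blended CAC (A28) from the inputs quoted in the Source column. Variable names are illustrative, and the $60K/MW-year figure is the approximate price stated in A4.

```python
LOAD_FACTOR = 1.20  # 20% payroll tax and benefits, per A8 through A14

# Base cash salaries in $K, as quoted in the Source column.
base_salary_k = {
    "A8 Founder CEO": 120,
    "A9 Engineer": 150,
    "A10 Product/infrastructure": 140,
    "A11 Solutions engineer": 130,
    "A12 Customer success / implementation": 110,
    "A13 Sales AE (base)": 140,
    "A14 G&A / finance": 100,
}
loaded_cost_k = {role: round(s * LOAD_FACTOR) for role, s in base_salary_k.items()}

# A4: 10 MW x ~$60K per MW-year, plus ~10% usage and analytics uplift.
arpu_k = round(10 * 60 * 1.10)

# A28: forecast Y2-Y3 sales and marketing spend divided by net new operators.
cac_k = round(1805 / 6, 1)

for role, cost in loaded_cost_k.items():
    print(f"{role}: {cost}")
print("A4 ARPU:", arpu_k)  # 660
print("A28 CAC:", cac_k)   # 300.8
```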
Flowchart (left to right): Target accounts → Paid pilots → Managed MW → Production customers → Revenue → Gross profit → Cash
Flags
- The initial SAM is still narrow, so the base case depends on winning 8 of a small number of North American variable-power operator deployments by Q4Y3.
- Revenue is modeled as a blended per-operator ARPU and does not separately show subscription, usage, and implementation mix by contract.
- Cash collection is assumed in-period even though infrastructure operators may still pay on 45–60 day terms, which would reduce the base-case cash cushion.
- Holding 70% gross margin from Y2 onward assumes site integrations become meaningfully repeatable instead of remaining service-heavy.
Top risks
- Market still early. There may be too few variable-power AI clusters in production this year to support fast initial ARR growth. Mitigation: Start with any behind-the-meter or curtailed-power GPU sites, not only offshore deployments, while keeping the same product architecture.
- Integration friction. Infrastructure teams may resist inserting a new control layer into mission-critical cluster operations. Mitigation: Launch as a read-only forecasting and flex-queue product first, then earn trust before taking over admission control.
- Customers may prefer fixed-power contracts. Enterprise buyers could avoid flex capacity if the operational tradeoff feels too complex. Mitigation: Focus early sales on batch training, backfills, and offline inference workloads where price savings clearly outweigh timing flexibility.