• RVM on Ubuntu 22.04 Jammy Jellyfish

    I had to set up RVM recently on my Ubuntu 22.04 desktop. However, the original installation instructions were plagued with issues, chiefly the OpenSSL version shipped with Ubuntu 22.04, which conflicted with the Ruby installation.

    To fix this issue we need to:

    • Install the Ubuntu version of RVM
    • Install an older openssl version as a package in rvm
    • Reference the above openssl package during the rvm install

    Firstly, I had to install the Ubuntu version of RVM, following the original instructions in the README. Ensure that any existing RVM installations are removed first.
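
    For reference, the Ubuntu package install in the README looked roughly like this at the time of writing (the PPA name comes from the ubuntu_rvm project, so double-check the README):

    sudo apt-add-repository -y ppa:rael-gc/rvm
    
    sudo apt-get update
    
    sudo apt-get install rvm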

    After RVM is installed properly, these were the steps I took to install ruby 3.0.0:

    rvm pkg install openssl
    
    rvm install ruby-3.0.0 --with-openssl-dir=/usr/share/rvm/usr
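
    As a quick sanity check (not part of the original steps), the new Ruby can be activated and verified:

    rvm use ruby-3.0.0
    
    ruby -v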

    These were the only steps that got RVM working for me.

    Hope it helps someone!

    rvm ubuntu
  • Python 3.12 'pkgutil has no attribute ImpImporter' error

    While using a virtual env created with Python 3.12, I installed a package which resulted in pip throwing an error of:

    AttributeError: module 'pkgutil' has no attribute 'ImpImporter'. Did you mean: 'zipimporter'?

    This occurred after installing setuptools, which was a dependency of another package. As a result, I was unable to use pip itself to remove setuptools.

    Rather than re-create the venv, the way to resolve this issue is to reinstall pip by downloading the install script from https://bootstrap.pypa.io/get-pip.py and running it:

    curl -O https://bootstrap.pypa.io/get-pip.py
    
    python get-pip.py

    By doing so, I was able to get pip working again without rebuilding the entire venv.
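
    Once pip is functional again, the offending setuptools can be upgraded to a release that supports Python 3.12 (my assumption; the original fix stopped at reinstalling pip):

    python -m pip install --upgrade setuptools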

    This is highlighted in the CPython GitHub repo: https://github.com/python/cpython/issues/95299

    python pip venv
  • Forward User IP from Cloudfront distribution

    In a recent project, I was troubleshooting an issue with a CloudFront distribution not passing the right request headers to the origin.

    According to HTTP request headers and CloudFront behavior, for the Host header:

    CloudFront sets the value to the domain name of the origin that is associated with the requested object.

    For X-Forwarded-Proto:

    CloudFront removes the header.

    By default, CloudFront will forward the IP of the distribution to the origin and not the real user’s IP. In addition, it will also remove the X-Forwarded-Proto header.

    To resolve the issue we need to add those two headers to the distribution via a custom policy.

    But which policy do we add them to? A cache policy? An origin request policy?

    To provide some context, recent changes to CloudFront encourage the use of policies to control the behaviour of the cache key, origin requests, and response headers.

    As per their CloudFront policy blog post, cache policies are generally used for caching assets. Origin request policies should be used instead to modify request headers, since they are applied during a cache miss or revalidation. In my use case, I don’t want the user’s IP to be cached but instead forwarded to the origin, so an origin request policy is more appropriate.

    Since the CloudFront distribution was built using Terraform, I was able to create a custom origin request policy and attach it to the distribution:

    resource "aws_cloudfront_origin_request_policy" "example" {
      name    = "example-policy"
      comment = "example comment"
      cookies_config {
        cookie_behavior = "none"
      }
    
      headers_config {
        header_behavior = "whitelist"
        headers {
          items = ["Host", "Cloudfront-Forwarded-Proto"]
        }
      }
    
      query_strings_config {
        query_string_behavior = "none"
      }
    }
    
    # Attach the above policy to the distribution
    resource "aws_cloudfront_distribution" "s3_distribution" {
      ....
    
      enabled             = true
      is_ipv6_enabled     = true
      comment             = "Some comment"
    
      ...
    
      default_cache_behavior {
        ...
    
        # Reference the custom policy created above
        origin_request_policy_id = aws_cloudfront_origin_request_policy.example.id
      }
      ...
    
    }

    We added the Host and Cloudfront-Forwarded-Proto headers to the custom policy.

    In my use case, the client IP now shows up in the origin’s logs in the format x_forwarded_for: <my ip>.
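
    As a quick sanity check (assuming the AWS CLI is configured), the custom policy can be listed once deployed:

    aws cloudfront list-origin-request-policies --type custom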

    aws cloudfront terraform
  • Redact PII from Cloudfront logs

    In a recent project, I was asked to investigate how to redact or remove personally identifiable information stored in CloudFront logs via AWS WAF, for audit purposes.

    Using an aws_wafv2_web_acl_logging_configuration resource, we are able to declare a redacted_fields block to identify which part of the request to remove. Within the block, we can only declare one of the arguments method, query_string, single_header, or uri_path.

    Only the single_header argument takes a name attribute, which is what I need in my use case.

    By entering each header name in an individual block, I was able to filter them out of the CloudFront logs:

    resource "aws_wafv2_web_acl_logging_configuration" "example" {
      log_destination_configs = [aws_kinesis_firehose_delivery_stream.example.arn]
      resource_arn            = aws_wafv2_web_acl.example.arn
      redacted_fields {
        single_header {
          name = "header-1"
        }
    
        single_header {
          name = "header-2"
        }
      }
    }

    To test that it works, I triggered fake requests to be sent via Kinesis Firehose, which populated the logs. Then I accessed the logs in S3 and checked that the redacted headers were marked as REDACTED.
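
    For example, a fake request could be as simple as the following (the domain is AWS’s documented example domain; header-1 matches the redacted field above):

    curl -H "header-1: some-sensitive-value" https://d111111abcdef8.cloudfront.net/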

    More information can be found in the WAF logging management documentation.

    aws terraform
  • Fix docker network build issues

    When running a docker build after the Docker daemon was updated, the build kept failing with:

    Could not connect to archive.ubuntu.com:80 (185.125.190.36), connection timed out
    Could not connect to archive.ubuntu.com:80 (91.189.91.39), connection timed out
    Could not connect to archive.ubuntu.com:80 (185.125.190.39), connection timed out
    ...
    

    It turns out that the Docker daemon was unable to use the host’s networking to do an apt-get update within the Ubuntu container during the build process, and as such was unable to reach the remote host.

    To fix the issue system-wide, we can create an /etc/docker/daemon.json file with the right nameserver entries and restart the Docker daemon.

    Firstly, run the following to get the host’s DNS server IP:

    nmcli dev show | grep 'IP4.DNS'
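
    On my machine this printed something like the following (the address is just an example):

    IP4.DNS[1]:                             192.168.1.1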

    Create a file at /etc/docker/daemon.json with the following entries:

    {
    	"dns": ["my-nameserver-ip-from-above", "8.8.8.8"]
    }

    Restart the docker daemon

    sudo systemctl restart docker.service
    
    sudo systemctl status docker.service

    As a test, we can run the following image to see if it can do an nslookup of google.com from within a container:

    docker run busybox nslookup google.com

    The response should include the DNS server address from above:

    Server:		X.X.X.X
    Address:	X.X.X.X:53
    
    Non-authoritative answer:
    Name:	google.com
    Address: 172.217.16.238
    
    Non-authoritative answer:
    Name:	google.com
    Address: 2a00:1450:4009:819::200e

    Hope it helps someone!

    docker networking
  • Get list of Availability Zones in a given region

    A recent Terraform deployment for a VPC failed with an error of Resource not available in given availability zone.

    I used the aws-cli to get the list of AZs for the region in question:

    aws ec2 describe-availability-zones --region eu-west-1

    It returns a list as follows:

    {
        "AvailabilityZones": [
            {
                "State": "available",
                "OptInStatus": "opt-in-not-required",
                "Messages": [],
                "RegionName": "eu-west-1",
                "ZoneName": "eu-west-1a",
                "ZoneId": "euw1-az2",
                "GroupName": "eu-west-1",
                "NetworkBorderGroup": "eu-west-1",
                "ZoneType": "availability-zone"
            },
            {
                "State": "available",
                "OptInStatus": "opt-in-not-required",
                "Messages": [],
                "RegionName": "eu-west-1",
                "ZoneName": "eu-west-1b",
                "ZoneId": "euw1-az3",
                "GroupName": "eu-west-1",
                "NetworkBorderGroup": "eu-west-1",
                "ZoneType": "availability-zone"
            },
            {
                "State": "available",
                "OptInStatus": "opt-in-not-required",
                "Messages": [],
                "RegionName": "eu-west-1",
                "ZoneName": "eu-west-1c",
                "ZoneId": "euw1-az1",
                "GroupName": "eu-west-1",
                "NetworkBorderGroup": "eu-west-1",
                "ZoneType": "availability-zone"
            }
        ]
    }

    That allowed me to identify the missing AZ and switch to another region with the right number of AZs.
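
    As an aside, the --query flag can trim the output down to just the zone names:

    aws ec2 describe-availability-zones --region eu-west-1 --query 'AvailabilityZones[].ZoneName' --output text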

    aws aws-cli
  • Setting up custom learning rate schedulers in TF 2.0

    In ML training, it is essential to understand and make use of an approach to adjusting the learning rate of a model, as it helps apply regularization to the model to prevent overfitting.

    Learning rate decay is one example of such a regularization technique: it dynamically reduces the learning rate of the model over epochs or steps during the training process.

    There are 2 main approaches to using learning rate schedulers in TF 2.0:

    • Using the LearningRateScheduler callback and applying your own function

    • Creating a custom subclass of tf.keras.optimizers.schedules.LearningRateSchedule

    What is the difference? The main difference is that approach 1 is meant to be passed via the callbacks kwarg of the model.fit call, whereas the second approach allows you to pass it directly to the optimizer’s learning_rate kwarg.

    1. Using the LearningRateScheduler callback

    The callback class requires a function of the form:

    def my_lr_scheduler(epoch, lr):
      # custom code to adjust the learning rate;
      # as an example, halve it every 10 epochs
      if epoch > 0 and epoch % 10 == 0:
        return lr * 0.5

      # return the new (or unchanged) learning rate
      return lr

    The custom function needs to handle two parameters: epoch and lr (learning rate). This callback will be invoked at the beginning of every epoch, passing in the current epoch and the optimizer’s learning rate. The custom function needs to return the new learning rate value, which the callback uses to update the learning rate of the optimizer.

    To invoke the example callback above:

    from tensorflow.keras.callbacks import LearningRateScheduler
    ...
    
    mymodel.fit(
    	...
    	callbacks=[LearningRateScheduler(my_lr_scheduler)]
    )

    2. Subclass the LearningRateSchedule base class

    The LearningRateSchedule base class adjusts the learning rate per step / batch of training, rather than over an entire epoch. This is useful if you are training your model in steps rather than epochs, for example in GAN training.

    Example of creating a custom LR scheduler class:

    from tensorflow.keras.optimizers.schedules import LearningRateSchedule
    
    class LinearLRSchedule(LearningRateSchedule):
        def __init__(self, initial_learning_rate, max_iters, **kwargs):
            super(LinearLRSchedule, self).__init__(**kwargs)
            self.initial_learning_rate = initial_learning_rate
            self.max_iters = max_iters
    
        def __call__(self, step):
            new_lr = self.initial_learning_rate * (1 - (step / float(self.max_iters)))
            return new_lr
    
        def get_config(self):
            return {
                "initial_learning_rate": self.initial_learning_rate,
                "max_iters": self.max_iters
            }

    During training, an instance of the subclass is passed directly into the learning_rate kwarg of an optimizer object:

    import tensorflow as tf
    
    optimizer = tf.keras.optimizers.SGD(learning_rate=LinearLRSchedule(1e-1, 100))

    Resources

    LearningRateScheduler Callback

    LearningRateSchedule Class

    machine-learning tensorflow tf2.0
  • Using Rails encrypted credentials in Rails 5.2

    From Rails 5.2 onwards, a config/secrets.yml file is no longer created when a new Rails app is generated. The default mechanism is now credentials, which unify the management and storage of confidential information.

    Within a new Rails 5.2 app, you will see a config/credentials.yml.enc file, which is encrypted by default using config/master.key. The master.key file is used to encrypt/decrypt the data stored within credentials.yml.enc and, as such, is added to .gitignore by default.

    To view the contents of the encrypted file, you need to run the following:

    EDITOR="vim" bin/rails credentials:show
    

    This displays the kind of contents that previously lived in a file such as config/secrets.yml.

    Mine contains the following out of the box:

    # aws:
    #   access_key_id: 123
    #   secret_access_key: 345
    
    # Used as the base secret for all MessageVerifiers in Rails, including the one protecting cookies.
    secret_key_base: e86bd7e58727da9b818f0f5a8851e8e2c99679bb9ab0728e6d87fbf31febc26ff8b649dda74e8b5632d16521afb30066254a2e4d6869e2fb57cb93f072b3e0ef
    

    To edit/add new entries to the file:

    EDITOR="vim" bin/rails credentials:edit
    

    This will allow you to edit/update the entries within config/credentials.yml.enc.

    You can still use the old YAML syntax to declare variables. For example:

    EDITOR="vim" bin/rails credentials:edit
    
    # Add the following snippet below
    foo:
      bar: baz
    

    To access any of the data during runtime, we can use Rails.application.credentials, which returns an ActiveSupport::EncryptedConfiguration object.

    For example, to access the default secret_key_base:

    Rails.application.credentials.secret_key_base
    

    To access nested values, we can use:

    Rails.application.credentials.foo[:bar] # => baz
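
    In production, where config/master.key is typically not committed or deployed, the same key can instead be supplied via the RAILS_MASTER_KEY environment variable:

    RAILS_MASTER_KEY=<contents of config/master.key> bin/rails server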
    
    ruby rails encryption
  • Fixing dep update reference not a tree error

    When running dep ensure -update <dependency> -v to update a dependency, one might run into the following error:

    Unable to update checked out version: fatal: reference is not a tree:
    

    This is due to the cached version of the dependency in $GOPATH/pkg/dep/sources/<depname> being in a detached HEAD state.

    To fix this, cd into the dep cache folder and update it manually:

    cd $GOPATH/pkg/dep/sources/<depname>
    
    git checkout master # or branch specified in Gopkg.toml
    
    git pull
    

    Run dep ensure -update <dependency> again and it should now succeed.

    This is tracked as an open issue on the golang dep repository.

    golang dep
  • Using journalctl to check hardware / bootup errors

    While trying to figure out a hardware issue during startup, I discovered that on systemd systems, the journald daemon collects logs from early in the boot process.

    One can use journalctl to view systemd logs for issues:

    sudo journalctl -b -p err
    

    One can then page through the list of errors, if any, and work through resolving them.
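
    To inspect errors from an earlier boot instead, which is handy for intermittent hardware issues, list the recorded boots and target one by its relative offset:

    journalctl --list-boots
    
    sudo journalctl -b -1 -p err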

    Resources

    Wikipedia

    DigitalOcean tutorial

    redhat linux
  • yum update Protected multilib versions

    A recent yum update failed with multiple lines of:

    Error: Protected multilib versions:
    
    iptables-1.4.21-24.el7.i686 != iptables-1.4.21-23.el7.x86_64
    
    ( more error lines like above )
    ....
    
    

    The error means that, for a specific package, version “X” of an RPM is installed for architecture x86_64 while yum was also instructed to install version “Y” of that same RPM for architecture i686.

    Rather than resolving each dependency manually, I enabled the Workstation Optional RPMs repo to locate the missing RPMs, which resolved the problem:

    sudo subscription-manager repos --enable=rhel-7-workstation-optional-rpms
    
    sudo yum clean all
    
    sudo rm -rf /var/cache/yum
    
    sudo yum update
    

    Additional Resources

    RedHat support ticket

    Same Issue from HPE

    redhat linux
  • Using kubectl proxy to access deployments

    When we create a service on a k8s cluster, it is often initialized with a type of ClusterIP.

    We can still access the service using kubectl proxy.

    kubectl proxy allows one to interact with the API without the need for a Bearer token.

    Assuming we have a service called guestbook, we can access it as below:

    kubectl proxy > /dev/null &
    
    KC_PROXY_PID=$!
    
    SERVICE_PREFIX=http://localhost:8001/api/v1/proxy
    
    GUESTBOOK_URL=$SERVICE_PREFIX/namespaces/default/services/guestbook
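
    We can then hit the service through the proxy, for example with curl:

    curl $GUESTBOOK_URL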
    
    kubernetes kubectl proxy
  • Resolving dep ensure conflicts

    Sometimes when collaborating on a golang project, it is possible to get dependency conflicts after running dep ensure.

    The following is the approach I take to resolve them, combined into a single session after the list:

    • Run dep ensure -v to get verbose output and debug the issue.

    • Delete the repo’s Gopkg.lock

    • Clear out the $GOPATH/pkg/dep directory

    • Re-run dep ensure -v
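
    Taken together, the steps above look roughly like this (assuming a standard $GOPATH layout):

    dep ensure -v
    
    rm Gopkg.lock
    
    rm -rf $GOPATH/pkg/dep
    
    dep ensure -v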

    golang dep
  • Using virtualenv in python

    When working with python, sometimes it is important to create isolated environments due to compatibility issues with the libs being used. Some examples that come to mind are certbot’s dependency on the pyOpenSSL lib, or setting up a deep learning environment.

    To install virtualenv:

    pip install virtualenv
    

    To create an isolated environment based on a specific python version:

    virtualenv -p /usr/bin/python2.7 <path to env>
    

    Without the -p option, virtualenv defaults to the current python version.

    To activate the virtualenv:

    source <path to env>/bin/activate
    

    You should see the name of the virtualenv in brackets to the left of the terminal prompt. As an extra step, run python -V to check that the version is the one specified above.

    To exit the virtualenv and return to the terminal:

    deactivate
    

    Also install virtualenvwrapper as it provides some useful utility commands to list and create virtualenvs:

    pip install virtualenvwrapper
    

    To list all available virtualenvs, for example:

    lsvirtualenv
    

    Additional Resources

    virtualenv docs

    virtualenvwrapper docs

    python virtualenv
  • golang pointer receiver error

    Assuming we have an interface declaration in Go like so:

    type Stringer interface {
      String() string
    }
    

    We can create a custom struct to implement the interface like so:

    type MyStruct struct {
      Value string
    }
    
    func (m *MyStruct) String() string {
      return m.Value
    }
    

    If we try to assign a value of type MyStruct to the Stringer interface, we will receive an error of MyStruct does not implement Stringer (String method has pointer receiver):

    m := MyStruct{Value: "test"}
    
    var s Stringer
    s = m // throws the error above
    

    This is because the String method is defined on the pointer type *MyStruct and not on the value type MyStruct.

    To fix the error we just need to use the pointer type:

    m := MyStruct{Value: "test"}
    
    var s Stringer
    s = &m // no errors
    
    go golang error
  • Vendoring private github repos using dep

    When using dep for vendoring dependencies in a go project, I came across the issue of pulling down a private github repo. dep ensure -v kept reporting an error with the repo.

    To overcome this, you can create a ~/.netrc file with your credentials to access the private repo. For example, when using GitHub, you first need to create a personal GitHub token within your Account Settings. Then create a ~/.netrc file with the following format:

    machine github.com
        login [GITHUB USERNAME]
        password [GITHUB TOKEN]
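
    Since the token grants access to your repos, it is worth locking the file down so that only you can read it (standard .netrc hygiene, not from the dep docs):

    chmod 600 ~/.netrc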
    

    This is also documented in the dep repo:

    Private git repos using github token

    go golang dep
  • Kubectl and KUBECONFIG

    While working on a kubernetes based project, I had to set the $KUBECONFIG env variable in order to access a private cluster.

    Later, I started minikube and ran kubectl config view. This resulted in all my kubectl calls to the private cluster failing.

    This is due to the way kubectl config behaves when it detects the $KUBECONFIG env variable. According to the docs:

    kubectl config -h
    
    Modify kubeconfig files using subcommands like "kubectl config set current-context my-context"
    
    The loading order follows these rules:
    
      1. If the --kubeconfig flag is set, then only that file is loaded.  The flag may only be set once
    and no merging takes place.
    
      2. If $KUBECONFIG environment variable is set, then it is used a list of paths (normal path
    delimitting rules for your system).  These paths are merged.  When a value is modified, it is
    modified in the file that defines the stanza.  When a value is created, it is created in the first
    file that exists.  If no files in the chain exist, then it creates the last file in the list.
    
      3. Otherwise, ${HOME}/.kube/config is used and no merging takes place.

    As stated in point 2 above, since my $KUBECONFIG was still set when I started minikube, the minikube settings were merged into the file pointed to by $KUBECONFIG, and minikube was set as the current context. This is why all the kubectl calls were going to the minikube cluster only.

    As a note to self, I need to remember to

    unset KUBECONFIG
    kubernetes kubectl
  • Redhat Subscription Renewal

    Recently I had to update my Redhat subscription.

    Afterwards, the subscription-manager application kept showing as “No valid subscriptions found”. This was caused by a mismatch between the type of RHEL system I was running and the actual subscription type itself.

    To ensure that one renews with the right subscription type, first check the type of RHEL system in use:

    cat /etc/redhat-release
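
    On my machine this printed something like the following (the exact release string will vary):

    Red Hat Enterprise Linux Workstation release 7.6 (Maipo)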
    redhat linux
  • Checking for open ports in go lang

    I recently had to create a periodic status check for an open port on a specific host for a service I was building. The status check had to work both for localhost in development and for the remote host.

    Using the net package in Go, I was able to come up with the following snippet for testing the localhost port:

    package main
    
    import (
      "fmt"
      "net"
    )
    
    func main() {
      // port to check; hardcoded here for illustration
      port := "8080"
    
      l, err := net.Listen("tcp", ":"+port)
      if err != nil {
        // Log or report the error here, e.g. the port is already in use
        fmt.Printf("Error: %s\n", err)
        return
      }
      defer l.Close()
    }

    We use Listen above as it works for localhost only.

    For testing the remote port, we can use DialTimeout as it accepts a custom timeout parameter, which we can use to check for timeout errors:

    package main
    
    import (
      "fmt"
      "net"
      "time"
    )
    
    func main() {
      // host is the remote host:port; hardcoded here for illustration
      host := "example.com:80"
      // timeoutSecs is the timeout value in seconds
      timeoutSecs := 5
    
      conn, err := net.DialTimeout("tcp", host, time.Duration(timeoutSecs)*time.Second)
      if opErr, ok := err.(*net.OpError); ok && opErr.Timeout() {
        fmt.Printf("Timeout error: %s\n", opErr)
        return
      }
    
      if err != nil {
        // Log or report the error here
        fmt.Printf("Error: %s\n", err)
        return
      }
      defer conn.Close()
    }

    Resources

    net pkg

    net pkg Listen

    net pkg DialTimeout

    golang network
